Sunday, November 6, 2016

Query that returns a large ResultSet using Hive JDBC takes ages to complete

You are trying to execute a query whose result set is huge. Run from the beeline CLI, its execution time is fine, but through Hive JDBC it takes ages to complete. The Hive and Spark logs don't show any errors, but you might see lots of Kryo messages in the Hive debug logs. To get those debug logs, it's highly recommended to start Hive using the following command:
hiveserver2 --hiveconf hive.root.logger=DEBUG,console
This usually happens when Hive is configured to run on Spark: the time is spent in Kryo serialization/deserialization of the result set.
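To check whether Kryo messages really dominate the debug output, a small script can count them. This is only a rough sketch: it assumes the console output of the command above was redirected to a file (the name hs2-debug.log is hypothetical) and that Kryo-related lines contain the word "kryo" somewhere, which is a guess about the message format rather than something taken from Hive itself.

```python
import re

# Assumption: Kryo-related debug lines contain the word "kryo"
# (case-insensitive). The exact Hive message format is not assumed.
KRYO_PATTERN = re.compile(r"kryo", re.IGNORECASE)


def count_kryo_lines(lines):
    """Return (kryo_line_count, total_line_count) for an iterable of log lines."""
    kryo = 0
    total = 0
    for line in lines:
        total += 1
        if KRYO_PATTERN.search(line):
            kryo += 1
    return kryo, total


def count_kryo_in_file(path):
    """Count Kryo-related lines in a saved log file (the path is hypothetical)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_kryo_lines(handle)
```

For example, count_kryo_in_file("hs2-debug.log") returns a pair such as (kryo lines, all lines); a high share of Kryo lines while the JDBC query is running is consistent with the diagnosis above.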
In such cases I recommend executing the query directly with Spark; the end-to-end process is much faster and takes about as long as the beeline execution.
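One possible way to run the query with Spark is through PySpark with Hive support enabled. The sketch below is an assumption about the setup, not the exact method used here: it assumes Spark 2.x or later with pyspark installed and access to the Hive metastore, and the helper names build_session and run_and_time are made up for illustration.

```python
import time


def build_session(app_name="large-result-query"):
    """Create a SparkSession that can read Hive tables (requires pyspark)."""
    # Imported here so run_and_time() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # lets spark.sql() see tables in the Hive metastore
        .getOrCreate()
    )


def run_and_time(spark, query):
    """Run `query`, pull every row to the driver, return (row_count, seconds)."""
    start = time.perf_counter()
    rows = 0
    # toLocalIterator() streams the result partition by partition, so the
    # whole result set is fetched without holding it all in memory at once.
    for _ in spark.sql(query).toLocalIterator():
        rows += 1
    elapsed = time.perf_counter() - start
    return rows, elapsed
```

Usage would look like rows, seconds = run_and_time(build_session(), "SELECT * FROM my_db.big_table"), where my_db.big_table is a placeholder for your own table. The measured time can then be compared with the beeline and JDBC timings.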

Good luck
