HIVE : Why does Hive generate mapreduce job on select column from tablename Vs not generating mapred

Question:

Why does Hive generate a MapReduce job for select column from tablename, but not for select * from tablename?

Answer1:

When a simple statement such as select * from tablename is executed, Hive does not need to process the rows at all: it just reads the table's files from HDFS and prints the rows in tabular form. Conceptually, this is similar to running one of the following, depending on the file format:

hadoop fs -cat hdfs://schemaname/tablename.txt
hadoop fs -cat hdfs://schemaname/tablename.rc
hadoop fs -cat hdfs://schemaname/tablename.orc

Or in whichever format your table's file is stored.

If you select specific columns, add a where clause, or use an aggregate on the table, Hive has to process every row (to project, filter, or group it), so a MapReduce job is launched to do that work.

Answer2:

Whenever you run a plain 'select *', Hive creates a fetch task rather than a MapReduce task. The fetch task simply dumps the data as it is, without doing any processing on it. When you run a 'select column' query, a map job is run internally to pick out that particular column and produce the output.

There was also a JIRA issue filed to make 'select column' queries run without MapReduce. Check the details here: https://issues.apache.org/jira/browse/HIVE-887
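To see which path Hive takes for a given query, you can inspect the plan with EXPLAIN. The sketch below assumes a Hive version that supports the hive.fetch.task.conversion property (values none, minimal, more); col1 is a hypothetical column name, and the exact plan output varies between versions, so treat this as an illustration rather than a definitive recipe.

```sql
-- Assumption: this Hive version supports hive.fetch.task.conversion.
-- 'none' disables fetch-task conversion, so even simple queries
-- are compiled into a job.
SET hive.fetch.task.conversion=none;
EXPLAIN SELECT col1 FROM tablename;   -- col1 is a hypothetical column

-- 'minimal' allows a fetch task for select *, filters on partition
-- columns, and limit only.
SET hive.fetch.task.conversion=minimal;
EXPLAIN SELECT * FROM tablename;

-- 'more' also allows a fetch task for simple column projections,
-- filters, and limit (no aggregates or joins).
SET hive.fetch.task.conversion=more;
EXPLAIN SELECT col1 FROM tablename;
```

If the plan contains only a fetch stage, the query is served directly from the table's files. If it contains a map (or map/reduce) stage, a job is launched.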
