How to load data into a Hive table and make it accessible in Impala as well

Question:

I have a table in Hive:

CREATE EXTERNAL TABLE sr2015(
  creation_date STRING,
  status STRING,
  first_3_chars_of_postal_code STRING,
  intersection_street_1 STRING,
  intersection_street_2 STRING,
  ward STRING,
  service_request_type STRING,
  division STRING,
  section STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'colelction.delim'='\u0002',
  'field.delim'=',',
  'mapkey.delim'='\u0003',
  'serialization.format'=',',
  'skip.header.line.count'='1',
  'quoteChar'= "\""
)

The data is loaded into the table this way:

LOAD DATA INPATH "hdfs:///user/rxie/SR2015.csv" INTO TABLE sr2015;

Why is the table only accessible in Hive? When I try to access it in the HUE/Impala editor, I get the following error:

<blockquote>

AnalysisException: Could not resolve table reference: 'sr2015'

</blockquote>

which seems to say there is no such table, yet the table does show up in the left panel.

In impala-shell, the error is different:

<blockquote>

ERROR: AnalysisException: Failed to load metadata for table: 'sr2015' CAUSED BY: TableLoadingException: Failed to load metadata for table: sr2015 CAUSED BY: InvalidStorageDescriptorException: Impala does not support tables of this type. REASON: SerDe library 'org.apache.hadoop.hive.serde2.OpenCSVSerde' is not supported.

</blockquote>

I had always thought that a Hive table and an Impala table are essentially the same, the only difference being that Impala is a more efficient query engine.

Can anyone help sort it out? Thank you very much.

Answer1:

Assuming that sr2015 is in a database called db, to make the table visible in Impala you need to issue either

<blockquote>

invalidate metadata;

</blockquote>

or

<blockquote>

invalidate metadata db.sr2015;

</blockquote>

in the Impala shell.

However, in your case the likely cause is that the version of Impala you're using doesn't support this table format at all, as the SerDe error shows.

Answer2:

Unfortunately, Impala <a href="https://www.cloudera.com/documentation/enterprise/latest/topics/impala_faq.html#faq_features__faq_unsupported" rel="nofollow">doesn't support</a> custom Hive Serializer/Deserializer classes (SerDes). It can only work with a limited set of file formats supported by its built-in SerDes.

This seems to leave you with only one option: CREATE another table (in Hive or Impala) using a standard file format, and copy the data over from your Hive-only table sr2015. Since Impala cannot read sr2015, the copy itself (the INSERT) has to run in Hive.

CREATE TABLE unquoted ( ... ) ROW FORMAT DELIMITED STORED AS TEXTFILE;

INSERT INTO unquoted SELECT * FROM sr2015;
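For illustration, here is a fuller sketch of the same workaround using the column list from the question. The table name sr2015_text and the tab delimiter are assumptions, not part of the original answer; the delimiter only works if no field value contains a tab character.

```sql
-- Run in Hive: Impala cannot read sr2015 because of its SerDe.
-- sr2015_text is a hypothetical name; '\t' assumes no field contains a tab.
CREATE TABLE sr2015_text (
  creation_date STRING,
  status STRING,
  first_3_chars_of_postal_code STRING,
  intersection_street_1 STRING,
  intersection_street_2 STRING,
  ward STRING,
  service_request_type STRING,
  division STRING,
  section STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Hive parses the quoted CSV through OpenCSVSerde and rewrites the rows
-- as plain delimited text.
INSERT INTO TABLE sr2015_text SELECT * FROM sr2015;

-- Run in impala-shell: pick up the table that was created outside Impala.
INVALIDATE METADATA sr2015_text;
SELECT COUNT(*) FROM sr2015_text;
```

If delimiter collisions are a concern, a columnar format such as STORED AS PARQUET should also work, since Impala reads Parquet natively.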

FWIW, regarding your original issue (Impala couldn't handle quotes in a CSV file), there is a stagnant <a href="https://issues.apache.org/jira/browse/IMPALA-2148" rel="nofollow">Impala JIRA</a> that hasn't gotten any traction in the last 3 years :(
