
Question:
I have a table in Hive:
CREATE EXTERNAL TABLE sr2015(
creation_date STRING,
status STRING,
first_3_chars_of_postal_code STRING,
intersection_street_1 STRING,
intersection_street_2 STRING,
ward STRING,
service_request_type STRING,
division STRING,
section STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES (
'colelction.delim'='\u0002',
'field.delim'=',',
'mapkey.delim'='\u0003',
'serialization.format'=',', 'skip.header.line.count'='1',
'quoteChar'= "\"")
The table is loaded data this way:
LOAD DATA INPATH "hdfs:///user/rxie/SR2015.csv" INTO TABLE sr2015;
Why the table is only accessible in Hive? when I attempt to access it in HUE/Impala Editor I got the following error:
<blockquote>AnalysisException: Could not resolve table reference: 'sr2015'
</blockquote>which seems saying there is no such a table, but the table does show up in the left panel.
In Impala-shell, error is different as below:
<blockquote>ERROR: AnalysisException: Failed to load metadata for table: 'sr2015' CAUSED BY: TableLoadingException: Failed to load metadata for table: sr2015 CAUSED BY: InvalidStorageDescriptorException: Impala does not support tables of this type. REASON: SerDe library 'org.apache.hadoop.hive.serde2.OpenCSVSerde' is not supported.
</blockquote>I have always been thinking Hive table and Impala table are essentially the same and difference is Impala is a more efficient query engine.
Can anyone help sort it out? Thank you very much.
Answer1:Assuming that sr2015
is located in DB called db
, in order to make the table visible in Impala, you need to either issue
invalidate metadata db;
</blockquote>or
<blockquote>invalidate metadata db.sr2015;
</blockquote>in Impala shell
However in your case, the reason is probably the version of Impala you're using, since it doesn't support the table format altogether
Answer2:Unfortunately, Impala <a href="https://www.cloudera.com/documentation/enterprise/latest/topics/impala_faq.html#faq_features__faq_unsupported" rel="nofollow">doesn't support</a> custom Hive Serializer/Deserializer classes (SerDes). It can only work with a limited set of file formats supported by its built-in SerDes.
Seems like this leaves you with the only option -- to CREATE another table (in Hive or Impala) using standard file format, and copy the data over from your Hive-only table sr2015
.
CREATE TABLE unquoted (
...
)
ROW FORMAT DELIMITED
STORED AS TEXTFILE;
INSERT INTO unquoted SELECT * FROM sr2015;
FWIW regarding your original issue (that Impala couldn't handle quotes in CSV file) there is a stagnant <a href="https://issues.apache.org/jira/browse/IMPALA-2148" rel="nofollow">Impala JIRA</a> which didn't get any traction for the last 3 years :(