
AWS Athena json_extract query from string field returns empty values

Question:

I have a table in Athena with this structure:

CREATE EXTERNAL TABLE `json_test`(
  `col0` string,
  `col1` string,
  `col2` string,
  `col3` string,
  `col4` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'quoteChar'='\"',
  'separatorChar'='\;')

A JSON-like string such as this is stored in col4:

{'email': 'test_email@test_email.com', 'name': 'Andrew', 'surname': 'Test Test'}

I'm trying to run a json_extract query:

SELECT json_extract(col4 , '$.email') as email FROM "default"."json_test"

But the query returns empty values.

Any help would be appreciated.

Answer1:

JSON requires double quotes (") around strings, for both keys and values. Single-quoted strings are not valid JSON, so json_extract returns NULL for them.

Compare:

presto> SELECT json_extract('{"email": "test_email@test_email.com", "name": "Andrew"}', '$.email');
            _col0
-----------------------------
 "test_email@test_email.com"

and

presto> SELECT json_extract('{''email'': ''test_email@test_email.com'', ''name'': ''Andrew''}', '$.email');
 _col0
-------
 NULL

(Note: '' inside a SQL varchar literal means a single ' in the constructed value, so the literal here has the same format as the one in the question.)

If your string value is "JSON with single quotes", you can try to fix it with <a href="https://prestodb.io/docs/current/functions/string.html" rel="nofollow">replace(string, search, replace) → varchar</a>.
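As a rough sketch of that idea, assuming the table and column from the question and that no key or value in col4 contains an apostrophe or a double quote of its own, the replacement could be applied inline before extraction. json_extract_scalar is used here instead of json_extract so the result comes back as a plain varchar rather than a JSON-encoded value with surrounding quotes; that choice is an assumption about what the asker wants.

```sql
-- Sketch only: swap every single quote for a double quote, then extract.
-- '''' is the SQL literal for one single-quote character.
-- Caveat: a value such as O'Brien would also be rewritten, which produces
-- invalid JSON for that row, so check the data before relying on this.
SELECT json_extract_scalar(replace(col4, '''', '"'), '$.email') AS email
FROM "default"."json_test";
```

This leaves the stored data untouched. If the files can be regenerated, writing valid JSON at the source is likely the more robust fix.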

Answer2:

The problem was the single-quote character used in the stored JSON string:

{'email': 'test_email@test_email.com', 'name': 'Andrew', 'surname': 'Test Test'}

Changing it to double quotes:

{"email": "test_email@test_email.com", "name": "Andrew", "surname": "Test Test"}

With that change, the Athena query works properly:

SELECT json_extract(col4 , '$.email') as email FROM "default"."json_test"
