![Pandas KeyError: “['value'] not in index”](https://www.xszz.org/skin/wt/rpic/t12.jpg)
Question:
I'm having some issues with the index from a Pandas data frame. What I'm trying to do is load data from a JSON file, create a Pandas data frame and then select specific fields from that data frame and send it to my database.
The following is a link to what's in the JSON file so you can see the fields actually exist: <a href="https://pastebin.com/Bzatkg4L" rel="nofollow">https://pastebin.com/Bzatkg4L</a>
import pandas as pd
from pandas.io import sql
import MySQLdb
from sqlalchemy import create_engine
# Open and read the text file where all the Tweets are
with open('US_tweets.json') as f:
tweets = f.readlines()
# Convert the list of Tweets into a structured dataframe
df = pd.DataFrame(tweets)
# Attributes needed should be here
df = df[['created_at', 'screen_name', 'id', 'country_code', 'full_name', 'lang', 'text']]
# To create connection and write table into MySQL
engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
.format(user="blah",
pw="blah",
db="blah"))
df.to_sql(con=engine, name='US_tweets_Table', if_exists='replace', flavor='mysql')
Thanks for your help!
Answer1:Pandas doesn't map every object in the JSON file to a column in the dataframe. Your example file contains 24 columns:
with open('tweets.json') as f:
df = pd.read_json(f, lines = True)
df.columns
Returns:
Index(['contributors', 'coordinates', 'created_at', 'entities',
'favorite_count', 'favorited', 'geo', 'id', 'id_str',
'in_reply_to_screen_name', 'in_reply_to_status_id',
'in_reply_to_status_id_str', 'in_reply_to_user_id',
'in_reply_to_user_id_str', 'is_quote_status', 'lang', 'metadata',
'place', 'retweet_count', 'retweeted', 'source', 'text', 'truncated',
'user'],
dtype='object')
To dig deeper into the JSON data, I found this solution, but I hope a more elegant approach exists: <a href="https://stackoverflow.com/questions/18665284/how-do-i-access-embedded-json-objects-in-a-pandas-dataframe" rel="nofollow">How do I access embedded json objects in a Pandas DataFrame?</a>
For example, df['entities'].apply(pd.Series)['urls'].apply(pd.Series)[0].apply(pd.Series)['indices'][0][0]
returns 117
.
To access full_name
and copy it to the df, try this:
df['full_name'] = df['place'].apply(pd.Series)['full_name']
, which returns 0 Austin, TX
.