Pandas KeyError: “['value'] not in index”

Question:

I'm having some issues with the index from a Pandas data frame. What I'm trying to do is load data from a JSON file, create a Pandas data frame and then select specific fields from that data frame and send it to my database.

The following is a link to what's in the JSON file so you can see the fields actually exist: https://pastebin.com/Bzatkg4L

    import pandas as pd
    from pandas.io import sql
    import MySQLdb
    from sqlalchemy import create_engine

    # Open and read the text file where all the Tweets are
    with open('US_tweets.json') as f:
        tweets = f.readlines()

    # Convert the list of Tweets into a structured dataframe
    df = pd.DataFrame(tweets)

    # Attributes needed should be here
    df = df[['created_at', 'screen_name', 'id', 'country_code', 'full_name', 'lang', 'text']]

    # To create connection and write table into MySQL
    engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                           .format(user="blah", pw="blah", db="blah"))
    df.to_sql(con=engine, name='US_tweets_Table', if_exists='replace', flavor='mysql')

Thanks for your help!

Answer1:

Pandas maps only the top-level keys of each JSON object to columns; nested objects stay inside a single column instead of being expanded into columns of their own. Your example file yields 24 columns:

    with open('tweets.json') as f:
        df = pd.read_json(f, lines=True)
    df.columns

Returns:

Index(['contributors', 'coordinates', 'created_at', 'entities', 'favorite_count', 'favorited', 'geo', 'id', 'id_str', 'in_reply_to_screen_name', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'is_quote_status', 'lang', 'metadata', 'place', 'retweet_count', 'retweeted', 'source', 'text', 'truncated', 'user'], dtype='object')
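To make the cause of the KeyError concrete, here is a minimal sketch. The two-line sample is made up and only stands in for the real file (one JSON object per line is an assumption based on the lines=True call above). It compares what readlines() hands to the DataFrame constructor with what read_json produces, then tries to select a nested field:

```python
import io
import pandas as pd

# Hypothetical two-line sample standing in for the real file
# (assumed layout: one JSON object per line).
raw = (
    '{"id": 1, "lang": "en", "text": "hello", '
    '"place": {"full_name": "Austin, TX", "country_code": "US"}}\n'
    '{"id": 2, "lang": "en", "text": "world", '
    '"place": {"full_name": "Dallas, TX", "country_code": "US"}}\n'
)

# What the question does: every line stays an unparsed string,
# so the frame has a single column labelled 0.
lines = io.StringIO(raw).readlines()
df_raw = pd.DataFrame(lines)
print(df_raw.columns.tolist())  # [0]

# Parsing each line as JSON yields one column per top-level key.
df = pd.read_json(io.StringIO(raw), lines=True)
print(df.columns.tolist())

# full_name lives inside the 'place' dict, so it is not a column
# and selecting it raises KeyError.
caught = None
try:
    df[['id', 'full_name']]
except KeyError as exc:
    caught = exc
    print('KeyError:', exc)
```

Either way the selection fails: in the first frame none of the requested names exist, and in the second only the top-level ones do.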

To dig deeper into the JSON data, I found this solution, but I hope a more elegant approach exists: How do I access embedded json objects in a Pandas DataFrame? (https://stackoverflow.com/questions/18665284/how-do-i-access-embedded-json-objects-in-a-pandas-dataframe)

For example, df['entities'].apply(pd.Series)['urls'].apply(pd.Series)[0].apply(pd.Series)['indices'][0][0] returns 117.

To copy full_name into a top-level column of df, try this: df['full_name'] = df['place'].apply(pd.Series)['full_name']. For the example file, df['full_name'] then contains 0 Austin, TX.
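A possibly tidier alternative to chaining apply(pd.Series) is pd.json_normalize (top-level in pandas 1.0 and later), which flattens nested dicts into dotted column names. The sketch below is not from the original answer. The sample records are invented, and the nesting (screen_name under user, full_name and country_code under place) is an assumption based on the usual Twitter API layout, so check it against your own file:

```python
import pandas as pd

# Hypothetical records; the nesting of screen_name, full_name and
# country_code is assumed, not taken from the original file.
tweets = [
    {
        "created_at": "Mon Apr 03 18:00:00 +0000 2017",
        "id": 1,
        "lang": "en",
        "text": "hello",
        "user": {"screen_name": "alice"},
        "place": {"full_name": "Austin, TX", "country_code": "US"},
    },
    {
        "created_at": "Mon Apr 03 18:05:00 +0000 2017",
        "id": 2,
        "lang": "en",
        "text": "world",
        "user": {"screen_name": "bob"},
        "place": {"full_name": "Dallas, TX", "country_code": "US"},
    },
]

# Nested keys become dotted column names such as 'place.full_name'.
flat = pd.json_normalize(tweets)

# Map flattened names to the short names the question asks for.
wanted = {
    "created_at": "created_at",
    "user.screen_name": "screen_name",
    "id": "id",
    "place.country_code": "country_code",
    "place.full_name": "full_name",
    "lang": "lang",
    "text": "text",
}
df = flat[list(wanted)].rename(columns=wanted)
print(df.columns.tolist())
```

With a real line-delimited file, the list of dicts could be built with [json.loads(line) for line in f] before calling pd.json_normalize.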
