Convert GeoJSON with nested lists to a pandas dataframe

Question:

I have a massive GeoJSON in this form:

{'features': [{'properties': {'MARKET': 'Albany', 'geometry': {'coordinates': [[[-74.264948, 42.419877, 0], [-74.262041, 42.425856, 0], [-74.261175, 42.427631, 0], [-74.260384, 42.429253, 0]]], 'type': 'Polygon'}}}, {'properties': {'MARKET': 'Albany', 'geometry': {'coordinates': [[[-73.929627, 42.078788, 0], [-73.929114, 42.081658, 0]]], 'type': 'Polygon'}}}, {'properties': {'MARKET': 'Albuquerque', 'geometry': {'coordinates': [[[-74.769198, 43.114089, 0], [-74.76786, 43.114496, 0], [-74.766474, 43.114656, 0]]], 'type': 'Polygon'}}}], 'type': 'FeatureCollection'}

After reading the json:

import json

with open('x.json') as f:
    data = json.load(f)

I read the values into a list and then into a dataframe:

from itertools import chain
import pandas as pd

# to get the set of all markets
mkt = set([f['properties']['MARKET'] for f in data['features']])

# to create a list of (market, associated lat/long points);
# in the sample above, geometry is nested under 'properties'
markets = [(market, list(chain.from_iterable(f['properties']['geometry']['coordinates'])))
           for f in data['features']
           for market in mkt
           if f['properties']['MARKET'] == market]

df = pd.DataFrame(markets[0:], columns=['a', 'b'])

First few rows of df are:

        a                                                  b
0  Albany  [[-74.264948, 42.419877, 0], [-74.262041, 42.4...
1  Albany  [[-73.929627, 42.078788, 0], [-73.929114, 42.0...
2  Albany  [[-74.769198, 43.114089, 0], [-74.76786, 43.11...

Then to unnest the nested list in column b, I used pandas concat:

df1 = pd.concat([df.iloc[:,0:1], df['b'].apply(pd.Series)], axis=1)

But this creates 8070 columns with many NaNs. Is there a way to group all the latitudes and longitudes by market (column a)? The desired result is a long dataframe of about a million rows, with one latitude and one longitude per row.

Desired output:

mkt          lat        long
Albany       42.419877  -74.264948
Albany       42.078788  -73.929627
..
Albuquerque  35.105361  -106.640342

Please note that the zero in each list element (e.g. [-74.769198, 43.114089, 0]) needs to be ignored.

Answer 1:

Something like this?

import pandas as pd

# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize
df = pd.json_normalize(geojson["features"])
coords = 'properties.geometry.coordinates'

# keep only the first two values of each point in the outer ring,
# then stack so that each point gets its own row
df2 = (df[coords].apply(lambda r: [(i[0], i[1]) for i in r[0]])
       .apply(pd.Series).stack()
       .reset_index(level=1).rename(columns={0: coords, "level_1": "point"})
       .join(df.drop(columns=coords), how='left')).reset_index(level=0)

# each point is stored longitude first, so the tuple is (long, lat)
df2[['long', 'lat']] = df2[coords].apply(pd.Series)
df2

Outputs:

   index  point properties.geometry.coordinates properties.MARKET  \
0      0      0         (-74.264948, 42.419877)            Albany
1      0      1         (-74.262041, 42.425856)            Albany
2      0      2         (-74.261175, 42.427631)            Albany
3      0      3         (-74.260384, 42.429253)            Albany
4      1      0         (-73.929627, 42.078788)            Albany
5      1      1         (-73.929114, 42.081658)            Albany
6      2      0         (-74.769198, 43.114089)       Albuquerque
7      2      1          (-74.76786, 43.114496)       Albuquerque
8      2      2         (-74.766474, 43.114656)       Albuquerque

  properties.geometry.type       long        lat
0                  Polygon -74.264948  42.419877
1                  Polygon -74.262041  42.425856
2                  Polygon -74.261175  42.427631
3                  Polygon -74.260384  42.429253
4                  Polygon -73.929627  42.078788
5                  Polygon -73.929114  42.081658
6                  Polygon -74.769198  43.114089
7                  Polygon -74.767860  43.114496
8                  Polygon -74.766474  43.114656

This assumes geojson is defined as:

geojson = {'features': [{'properties': {'MARKET': 'Albany', 'geometry': {'coordinates': [[[-74.264948, 42.419877, 0], [-74.262041, 42.425856, 0], [-74.261175, 42.427631, 0], [-74.260384, 42.429253, 0]]], 'type': 'Polygon'}}}, {'properties': {'MARKET': 'Albany', 'geometry': {'coordinates': [[[-73.929627, 42.078788, 0], [-73.929114, 42.081658, 0]]], 'type': 'Polygon'}}}, {'properties': {'MARKET': 'Albuquerque', 'geometry': {'coordinates': [[[-74.769198, 43.114089, 0], [-74.76786, 43.114496, 0], [-74.766474, 43.114656, 0]]], 'type': 'Polygon'}}}], 'type': 'FeatureCollection'}
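
If only the three columns from the question (mkt, lat, long) are needed, a plain list comprehension may be enough and avoids the wide intermediate frame. This is a minimal sketch, not part of the answer above. It assumes every feature is a Polygon with geometry nested under 'properties' as in the sample, it walks every ring rather than only the outer one, and the helper name features_to_long_frame and the shortened sample data are made up for illustration.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data (assumption:
# geometry sits under 'properties' and every geometry is a Polygon).
geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'properties': {'MARKET': 'Albany',
                        'geometry': {'type': 'Polygon',
                                     'coordinates': [[[-74.264948, 42.419877, 0],
                                                      [-74.262041, 42.425856, 0]]]}}},
        {'properties': {'MARKET': 'Albuquerque',
                        'geometry': {'type': 'Polygon',
                                     'coordinates': [[[-74.769198, 43.114089, 0]]]}}},
    ],
}


def features_to_long_frame(collection):
    """Hypothetical helper: one row per polygon vertex, columns mkt, lat, long."""
    rows = [
        # each point is [long, lat, 0]; the trailing zero is dropped
        (props['MARKET'], point[1], point[0])
        for props in (f['properties'] for f in collection['features'])
        for ring in props['geometry']['coordinates']  # every ring of the polygon
        for point in ring
    ]
    return pd.DataFrame(rows, columns=['mkt', 'lat', 'long'])


long_df = features_to_long_frame(geojson)
print(long_df)
```

Building the rows as plain tuples first and constructing the dataframe once tends to scale better than apply(pd.Series) on a column of lists, which is what produced the thousands of NaN-padded columns in the question.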
