
Create Pandas DataFrame from (row, column, value) data

Question:

I have a Pandas DataFrame with three columns: row, column, value. The row values are all integers below some N, and the column values are all integers below some M. The values are all positive integers.

How do I efficiently create a DataFrame with N rows and M columns that holds the value val at position (i, j) whenever (i, j, val) is a row in my original DataFrame, and a default value (0) everywhere else? Furthermore, is it possible to create a sparse DataFrame directly? The data is already quite large, and N*M is about 10 times the size of my data.

Answer1:

A NumPy-based approach suits this well for performance:

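    import numpy as np
    import pandas as pd

    a = df.values                          # columns in order: row, col, val
    m, n = a[:, :2].max(0) + 1             # shape inferred from the largest row/col index
    out = np.zeros((m, n), dtype=a.dtype)  # default value 0 everywhere
    out[a[:, 0], a[:, 1]] = a[:, 2]        # scatter each val to its (row, col) position
    df_out = pd.DataFrame(out)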

Sample run -

    In [58]: df
    Out[58]:
        row  col  val
    0     7    1   30
    1     3    3    0
    2     4    8   30
    3     5    8   18
    4     1    3    6
    5     1    6   48
    6     0    2    6
    7     4    7    6
    8     5    0   48
    9     8    1   48
    10    3    2   12
    11    6    8   18

    In [59]: df_out
    Out[59]:
        0   1   2  3  4  5   6  7   8
    0   0   0   6  0  0  0   0  0   0
    1   0   0   0  6  0  0  48  0   0
    2   0   0   0  0  0  0   0  0   0
    3   0   0  12  0  0  0   0  0   0
    4   0   0   0  0  0  0   0  6  30
    5  48   0   0  0  0  0   0  0  18
    6   0   0   0  0  0  0   0  0  18
    7   0  30   0  0  0  0   0  0   0
    8   0  48   0  0  0  0   0  0   0
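
The NumPy version above materialises the full dense array and infers the shape from the largest index it sees, so it neither saves memory nor guarantees exactly N rows and M columns. For the sparse part of the question, one possible sketch is to go through a SciPy COO matrix and hand it to pandas. This is a different technique from the answer above, not part of it. It assumes SciPy is installed, that the columns are named row, col and val as in the sample run, and that N and M are passed in explicitly; the helper name triplets_to_sparse_frame is made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def triplets_to_sparse_frame(df, n_rows, n_cols):
    """Build a sparse DataFrame from a (row, col, val) triplet frame.

    Hypothetical helper: the column names "row", "col", "val" are
    assumptions taken from the sample run; adjust them to the real data.
    """
    # COO format stores exactly the (row, col, val) triplets we already have.
    mat = sparse.coo_matrix(
        (df["val"].to_numpy(), (df["row"].to_numpy(), df["col"].to_numpy())),
        shape=(n_rows, n_cols),  # explicit N x M instead of max index + 1
    )
    # Each column becomes a pandas SparseArray with fill value 0.
    return pd.DataFrame.sparse.from_spmatrix(mat)


# Same data as in the sample run above.
triplets = pd.DataFrame({
    "row": [7, 3, 4, 5, 1, 1, 0, 4, 5, 8, 3, 6],
    "col": [1, 3, 8, 8, 3, 6, 2, 7, 0, 1, 2, 8],
    "val": [30, 0, 30, 18, 6, 48, 6, 6, 48, 48, 12, 18],
})
sparse_df = triplets_to_sparse_frame(triplets, n_rows=9, n_cols=9)

# Densify only for inspection; on large data this defeats the purpose.
dense = sparse_df.sparse.to_dense()
```

Note that if the same (row, col) pair appears more than once, the COO matrix sums the duplicate values when it is converted, whereas the NumPy assignment above keeps only one of them.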
