62020

Complex Dataframe Merge Python Pandas

Question:

I am trying to merge 2 dataframes and can't quite get what I'm looking for.

Dataframe 1 looks like this.

Index Date Data1 Data2 A 2007-07-21 76 32 A 2007-08-13 nan 23 B 2007-06-15 53 nan B 2007-07-15 87 39

Dataframe 2 looks like this:

Index Date Data3 Data4 A 2007-07-24 14 nan A 2007-08-13 67 51 B 2007-06-21 32 36 B 2007-07-15 nan 91

The same indicies are in both dataframes. The index labels contain duplicates. There is some overlap in the dates but each dataframe also contains unique dates.

What I'd like in my result is the following: rows with the same Index and Date appear ONCE in the result with combined values (Data1, Data2, Data3, Data4). If an Index/Date combination appears once in either the left dataframe or right dataframe, that combination appears along with the relevant data from respective data frame and nans in columns from the dataframe where values don't exist.

From the above dataframes the result would look like this:

Index Date Data1 Data2 Data3 Data4 A 2007-07-21 76 32 nan nan A 2007-07-24 nan nan 14 nan A 2007-08-13 nan 23 67 51 B 2007-06-15 53 nan nan nan B 2007-06-21 nan nan 32 36 B 2007-07-15 87 39 nan 91

This exercise has aspects of a left join but also an outer join. Not sure how to get this using pd.merge or pd.concat.

Thank in advance for insight.

Answer1:

set_index + concat

pd.concat([df1.set_index(['Index','Date']),df2.set_index(['Index','Date'])],1).reset_index() Out[1145]: Index Date Data1 Data2 Data3 Data4 0 A 2007-07-21 76.0 32.0 NaN NaN 1 A 2007-07-24 NaN NaN 14.0 NaN 2 A 2007-08-13 NaN 23.0 67.0 51.0 3 B 2007-06-15 53.0 NaN NaN NaN 4 B 2007-06-21 NaN NaN 32.0 36.0 5 B 2007-07-15 87.0 39.0 NaN 91.0

Or we can use merge

df1.merge(df2,on=['Index','Date'],how='outer') Out[1147]: Index Date Data1 Data2 Data3 Data4 0 A 2007-07-21 76.0 32.0 NaN NaN 1 A 2007-08-13 NaN 23.0 67.0 51.0 2 B 2007-06-15 53.0 NaN NaN NaN 3 B 2007-07-15 87.0 39.0 NaN 91.0 4 A 2007-07-24 NaN NaN 14.0 NaN 5 B 2007-06-21 NaN NaN 32.0 36.0

Recommend

  • What is the purpose of Double.IsNaN(double d)? [closed]
  • Pandas: Delete duplicated items in a specific column
  • How to define an extern, C struct returning function in C++ using MSVC?
  • Combining two double[] arrays into double[,]
  • Webpack use UglifyJSPlugin to ONLY remove comments
  • Pattern combining type test and literal
  • Pipelining between two SEPARATE Powershell processes
  • How read between delimiters in php DOM of a XML file?
  • Slowly changing dimension - What is Pure type 6 implementation
  • SQL append distinct values from two columns and two tables
  • Hibernate: Inheritance and relationship mapping + generics
  • Should I use composite primary keys in Grails?
  • Xmonad multiple submap key combos
  • VBA Excel, loop through variables
  • Primefaces :radioButton inside a ui:repeat
  • How to autopopulate a field in SugarCRM form
  • R convert summary result (statistics with all dataframe columns) into dataframe
  • Differences in dis-assembled C code of GCC and Borland?
  • Breaking out column by groups in Pandas
  • Unable to get column index with table.getColumn method using custom table Model
  • Listbox within Listbox and scrolling trouble in Windows Phone 7 Silverlight
  • Disable Enter in editText android
  • Get one-time binding to work for ng-if
  • Is there any way to access browser form field suggestions from JavaScript?
  • Why ng-show works with ng-repeat but ng-if doesn't? [duplicate]
  • How to add a column to a Pandas dataframe made of arrays of the n-preceding values of another column
  • How to check if every primary key value is being referenced as foreign key in another table
  • vba code to select only visible cells in specific column except heading
  • Convert array of 8 bytes to signed long in C++
  • Do I've to free mysql result after storing it?
  • php design question - will a Helper help here?
  • Transpose CSV data with awk (pivot transformation)
  • AngularJs get employee from factory
  • python regex in pyparsing
  • Understanding cpu registers
  • IndexOutOfRangeException on multidimensional array despite using GetLength check
  • Authorize attributes not working in MVC 4
  • Recursive/Hierarchical Query Using Postgres
  • Running Map reduces the dimensions of the matrices
  • Sorting a 2D array using the second column C++