Converting long table to wide and creating columns according to the rows

I have a data frame that looks like this:

    Customer_ID  Category  Products
    1            Veg       A
    2            Veg       B
    3            Fruit     A
    3            Fruit     B
    3            Veg       B
    1            Fruit     A
    3            Veg       C
    1            Fruit     C

I want to find out, for each customer ID and each category, which products were bought, and create a column for each product accordingly. The output would look like this:

    Customer_ID  Category  Pro_1  Pro_2  Pro_3
    1            Veg       A      NA     NA
    1            Fruit     A      NA     C
    2            Veg       NA     B      NA
    3            Veg       NA     B      C
    3            Fruit     A      B      NA

Answer1:

Use groupby with unstack. Note that if the data contains duplicate rows, their values are concatenated together:

    df = df.groupby(['Customer_ID','Category','Products'])['Products'].sum().unstack()
    df.columns = ['Pro_{}'.format(x) for x in range(1, len(df.columns)+1)]
    df = df.reset_index()
    print (df)

       Customer_ID Category Pro_1 Pro_2 Pro_3
    0            1    Fruit     A  None     C
    1            1      Veg     A  None  None
    2            2      Veg  None     B  None
    3            3    Fruit     A     B  None
    4            3      Veg  None     B     C
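To illustrate the concatenation caveat, here is a minimal sketch using a made-up frame (`dup` is a hypothetical name, not part of the question's data) in which the row (1, Veg, A) occurs twice. Summing strings joins them, so that cell ends up holding 'AA' rather than 'A'. The Products column is cast to object dtype explicitly; that is an assumption made here so that string summation behaves the same across pandas versions.

```python
import pandas as pd

# Hypothetical data: the triple (1, 'Veg', 'A') appears twice.
dup = pd.DataFrame({
    "Customer_ID": [1, 1, 1],
    "Category": ["Veg", "Veg", "Veg"],
    "Products": ["A", "A", "B"],
}).astype({"Products": object})

wide = (
    dup.groupby(["Customer_ID", "Category", "Products"])["Products"]
       .sum()        # for strings, sum means concatenation
       .unstack()    # move the Products level into the columns
)
print(wide)
```

If that doubling is not wanted, dropping duplicate rows before reshaping avoids it.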

Another solution uses a helper column; here the (Customer_ID, Category, Products) triples have to be unique:

    # if the triples are not unique, remove duplicates first
    df = df.drop_duplicates(['Customer_ID','Category','Products'])
    df['a'] = df['Products']
    df = df.set_index(['Customer_ID','Category','Products'])['a'].unstack()
    df.columns = ['Pro_{}'.format(x) for x in range(1, len(df.columns)+1)]
    df = df.reset_index()
    print (df)

       Customer_ID Category Pro_1 Pro_2 Pro_3
    0            1    Fruit     A  None     C
    1            1      Veg     A  None  None
    2            2      Veg  None     B  None
    3            3    Fruit     A     B  None
    4            3      Veg  None     B     C
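The uniqueness requirement comes from unstack itself, which cannot reshape an index that contains duplicate entries. The sketch below uses made-up data with one repeated triple (the frame contents and the flag name `reshaped_with_duplicates` are illustrative assumptions) to show the failure and how drop_duplicates resolves it.

```python
import pandas as pd

# Hypothetical data with one repeated (Customer_ID, Category, Products) triple.
df = pd.DataFrame({
    "Customer_ID": [1, 1, 2],
    "Category": ["Veg", "Veg", "Veg"],
    "Products": ["A", "A", "B"],
})
keys = ["Customer_ID", "Category", "Products"]
df["a"] = df["Products"]  # helper column holding the values to spread out

try:
    df.set_index(keys)["a"].unstack()
    reshaped_with_duplicates = True
except ValueError as err:
    # pandas refuses to unstack an index that contains duplicate entries
    reshaped_with_duplicates = False
    print("unstack failed:", err)

# After removing the repeated triple, the reshape goes through.
deduped = df.drop_duplicates(keys)
wide = deduped.set_index(keys)["a"].unstack()
print(wide)
```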

Answer2:

Another option is to use crosstab:

pd.crosstab([df['Customer_ID'],df['Category']], df['Products'])

output:

    Products              A  B  C
    Customer_ID Category
    1           Fruit     1  0  1
                Veg       1  0  0
    2           Veg       0  1  0
    3           Fruit     1  1  0
                Veg       0  1  1

Afterwards you can reset the index to get a result similar to what you wanted.

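    # reset the index of the crosstab result, not of the original frame
    df = pd.crosstab([df['Customer_ID'],df['Category']], df['Products']).reset_index()

    Products  Customer_ID Category  A  B  C
    0                   1    Fruit  1  0  1
    1                   1      Veg  1  0  0
    2                   2      Veg  0  1  0
    3                   3    Fruit  1  1  0
    4                   3      Veg  0  1  1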
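The counts can also be mapped back to the layout the question asked for. The following is a minimal sketch and not part of the original answer: it builds a frame of product labels with the same shape as the crosstab (the names `counts`, `labels` and `wide` are illustrative) and keeps a label only where the count is positive, leaving missing values elsewhere.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Customer_ID": [1, 2, 3, 3, 3, 1, 3, 1],
    "Category": ["Veg", "Veg", "Fruit", "Fruit", "Veg", "Fruit", "Veg", "Fruit"],
    "Products": ["A", "B", "A", "B", "B", "A", "C", "C"],
})

counts = pd.crosstab([df["Customer_ID"], df["Category"]], df["Products"])

# Every row of `labels` repeats the product names, one per column.
labels = pd.DataFrame(
    [list(counts.columns)] * len(counts),
    index=counts.index,
    columns=counts.columns,
)

# Keep a label where the product was bought at least once, NaN otherwise.
wide = labels.where(counts > 0)
wide.columns = ["Pro_{}".format(i) for i in range(1, len(wide.columns) + 1)]
wide = wide.reset_index()
print(wide)
```

Because this starts from counts, a product bought several times still appears once in its column.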

Answer3:

Try this (don't mind the StringIO part; it is only there to make copy/paste easy):

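    import pandas as pd
    from io import StringIO

    df = pd.read_csv(StringIO("""
    Customer_ID Category Products
    1 Veg A
    2 Veg B
    3 Fruit A
    3 Fruit B
    3 Veg B
    1 Fruit A
    3 Veg C
    1 Fruit C"""), sep=r'\s+')

    # one indicator column per product, stored as integers so they can be summed
    df = df.join(pd.get_dummies(df['Products'], dtype=int))
    # sum only the indicator columns; the string column 'Products' is left out
    g = df.groupby(['Customer_ID', 'Category'])[['A', 'B', 'C']].sum()
    print(g)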

output:

                          A  B  C
    Customer_ID Category
    1           Fruit     1  0  1
                Veg       1  0  0
    2           Veg       0  1  0
    3           Fruit     1  1  0
                Veg       0  1  1
