Group by one column and show the availability of specific values from another column

Question:

I have this dataframe:

df1:

drug_id    illness
lexapro.1  HD
lexapro.1  MS
lexapro.2  HDED
lexapro.2  MS
lexapro.2  MS
lexapro.3  CD
lexapro.3  Sweat
lexapro.4  HD
lexapro.5  WD
lexapro.5  FN

I want to group the data by drug_id and, for each group, check whether HD, MS, and FN appear in the illness column. The result should fill a second data frame like this:

df2:

drug_id    HD  MS  FN
lexapro.1   1   1   0
lexapro.2   0   1   0
lexapro.3   0   0   0
lexapro.4   1   0   0
lexapro.5   0   0   1

This is my code for grouping.

df1.groupby('drug_id', sort=False).isin('HD')

but I do not know how to assign 1 to df2['HD'] for each drug_id when 'HD' is present for that drug_id in df1.

Thank you.

Answer1:

Option 1: crosstab

pd.crosstab(df.drug_id, df.illness)[['HD', 'MS', 'FN']].ge(1).astype(int)

illness    HD  MS  FN
drug_id
lexapro.1   1   1   0
lexapro.2   0   1   0
lexapro.3   0   0   0
lexapro.4   1   0   0
lexapro.5   0   0   1

Option 2: groupby + value_counts + unstack

df.groupby('drug_id').illness.value_counts() \
    .unstack()[['HD', 'MS', 'FN']].ge(1).astype(int)

illness    HD  MS  FN
drug_id
lexapro.1   1   1   0
lexapro.2   0   1   0
lexapro.3   0   0   0
lexapro.4   1   0   0
lexapro.5   0   0   1

Option 3: get_dummies + sum

df.set_index('drug_id').illness.str.get_dummies() \
    .groupby(level=0).sum()[['HD', 'MS', 'FN']].ge(1).astype(int)

           HD  MS  FN
drug_id
lexapro.1   1   1   0
lexapro.2   0   1   0
lexapro.3   0   0   0
lexapro.4   1   0   0
lexapro.5   0   0   1

Thanks to Scott Boston for the improvement!
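For anyone who wants to run this end to end, here is a minimal sketch of Option 1 with the sample data rebuilt from the question. The variable names `wanted`, `counts`, and `df2` are my own, not from the original answer. It also swaps the `[['HD', 'MS', 'FN']]` column selection for `reindex(columns=..., fill_value=0)`, on the assumption that you would rather get a column of zeros than a KeyError when one of the wanted values never occurs in the data.

```python
import pandas as pd

# Sample data rebuilt from the question's df1.
df1 = pd.DataFrame({
    "drug_id": ["lexapro.1", "lexapro.1", "lexapro.2", "lexapro.2", "lexapro.2",
                "lexapro.3", "lexapro.3", "lexapro.4", "lexapro.5", "lexapro.5"],
    "illness": ["HD", "MS", "HDED", "MS", "MS", "CD", "Sweat", "HD", "WD", "FN"],
})

# The illness values we want indicator columns for (illustrative name).
wanted = ["HD", "MS", "FN"]

# Count every (drug_id, illness) pair, keep only the wanted columns,
# and turn "count >= 1" into a 0/1 flag.
counts = pd.crosstab(df1["drug_id"], df1["illness"])
df2 = counts.reindex(columns=wanted, fill_value=0).ge(1).astype(int)

print(df2)
```

If the flags are needed as ordinary columns next to drug_id, `df2.reset_index()` should do it.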

Answer2:

df.groupby(['drug_id', 'illness']).illness.count().unstack(-1) \
    .reindex(columns=['HD', 'MS', 'FN']).ge(1).astype(int)

illness    HD  MS  FN
drug_id
lexapro.1   1   1   0
lexapro.2   0   1   0
lexapro.3   0   0   0
lexapro.4   1   0   0
lexapro.5   0   0   1
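A variant that stays closer to the question's original idea (flag matches first, then group) might look like the sketch below. As far as I know, a groupby object has no `isin` method, which is why the attempt in the question fails; here the comparison is done on the illness column itself and the grouping comes afterwards. The names `wanted`, `flags`, and `df2` are illustrative, and this is one possible approach rather than the method used in either answer.

```python
import pandas as pd

# Sample data rebuilt from the question's df1.
df1 = pd.DataFrame({
    "drug_id": ["lexapro.1", "lexapro.1", "lexapro.2", "lexapro.2", "lexapro.2",
                "lexapro.3", "lexapro.3", "lexapro.4", "lexapro.5", "lexapro.5"],
    "illness": ["HD", "MS", "HDED", "MS", "MS", "CD", "Sweat", "HD", "WD", "FN"],
})

wanted = ["HD", "MS", "FN"]

# One boolean column per wanted value: True where that row's illness matches exactly.
flags = pd.DataFrame({value: df1["illness"].eq(value) for value in wanted})

# Group the row-level flags by drug_id and ask whether any row in the group matched.
# sort=False keeps the drug_ids in their order of first appearance, as in the question.
df2 = flags.groupby(df1["drug_id"], sort=False).any().astype(int)

print(df2)
```

Because the match is exact, 'HDED' does not count as 'HD', which agrees with the expected output for lexapro.2.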
