
How do I compare two columns that each hold lists of strings and create a new column with the unique items?

Question:

I have two columns, each containing lists of strings. One column, df['products'], holds product names in all capitals. The other column, df['desc'], holds the product description.

I want to check which items in df['products'] are present in df['desc'] and build a new column from them.

I tried the following code:

df['uniq'] = df.apply(lambda x : [i for i in x['products'] if i.lower() in x['desc']])

I checked the other similar questions and built the above code, but it's not working.

The data looks something like this:

<a href="https://i.stack.imgur.com/mpt0Y.png" rel="nofollow"><img alt="sample data" src="https://i.stack.imgur.com/mpt0Y.png" /></a>

Answer1:

It seems you need to add axis=1 so that the function is applied to each row:

import pandas as pd

df = pd.DataFrame({'products': [['A','B'],['D','C']], 'desc': [['a', 'c'],['c', 'e']]})
# axis=1 passes each row to the lambda, so x['products'] and x['desc'] are that row's lists
df['uniq'] = df.apply(lambda x: [i for i in x['products'] if i.lower() in x['desc']], axis=1)
print(df)

     desc products uniq
0  [a, c]   [A, B]  [A]
1  [c, e]   [D, C]  [C]
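To see why axis=1 matters: with the default axis=0, apply hands each whole column to the function, so x['products'] becomes a label lookup inside a column whose index is 0, 1 and fails. A minimal sketch, reusing the sample frame above; match_row and failed_without_axis are illustrative names, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({
    "products": [["A", "B"], ["D", "C"]],
    "desc": [["a", "c"], ["c", "e"]],
})

def match_row(x):
    # Keep the products whose lowercase form appears in the description list
    return [i for i in x["products"] if i.lower() in x["desc"]]

# Default axis=0: each call receives one column as a Series indexed 0, 1,
# so the label "products" does not exist and the lookup raises KeyError.
try:
    df.apply(match_row)
    failed_without_axis = False
except KeyError:
    failed_without_axis = True

# axis=1: each call receives one row as a Series indexed by column name.
uniq = df.apply(match_row, axis=1)
```

Here failed_without_axis ends up True, while uniq holds one list of matches per row.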

Answer2:

Don't use apply() when you don't absolutely need to. It's slow.

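Instead, do it the vectorized way. Note that this form assumes each cell holds a single string rather than a list, and that isin() tests each product against the whole desc column rather than row by row:

desc_upper = df.desc.str.upper()
matches = df.products.isin(desc_upper)
result = df.products[matches]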
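For list-valued cells like those in the question, the .str accessor and isin() do not work on the elements inside each list. One apply-free alternative is a plain list comprehension over the two columns. This is a minimal sketch, assuming the sample frame from Answer1; matching_products is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

df = pd.DataFrame({
    "products": [["A", "B"], ["D", "C"]],
    "desc": [["a", "c"], ["c", "e"]],
})

def matching_products(products, desc):
    # A set gives constant-time membership checks for longer descriptions
    desc_set = set(desc)
    return [p for p in products if p.lower() in desc_set]

# zip walks the two columns in parallel, one row at a time, without apply()
df["uniq"] = [
    matching_products(products, desc)
    for products, desc in zip(df["products"], df["desc"])
]
```

This is still a Python-level loop rather than true vectorization, but it avoids building a Series for every row the way apply(axis=1) does.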
