How to compare two columns that both contain lists of strings and create a new column with the matching items?


I have two columns that both contain lists of strings. One column, df['products'], holds product names in all capitals. The other column, df['desc'], holds the product description.

I want to check which items in df['products'] are present in df['desc'] and make a new column out of them.

I tried the following code:

df['uniq'] = df.apply(lambda x : [i for i in x['products'] if i.lower() in x['desc']])

I checked the other similar questions and built the above code, but it's not working.

The data looks something like this:

<a href="https://i.stack.imgur.com/mpt0Y.png" rel="nofollow"><img alt="enter image description here" src="https://i.stack.imgur.com/mpt0Y.png" /></a>


It seems you need to add axis=1 so that the function is applied per row:

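    import pandas as pd

    df = pd.DataFrame({'products':[['A','B'],['D','C']],
                       'desc':[['a', 'c'],['c', 'e']]})

    # axis=1 passes each row to the lambda, so x['products'] and x['desc'] are that row's lists
    df['uniq'] = df.apply(lambda x: [i for i in x['products'] if i.lower() in x['desc']], axis=1)
    print (df)

    # Output (column order may differ by pandas version):
         desc products uniq
    0  [a, c]   [A, B]  [A]
    1  [c, e]   [D, C]  [C]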
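To see why the original attempt fails without axis=1, here is a minimal sketch of what apply hands to the function in each mode. The names col_names and row_names are only for illustration, and the columns argument is added just to pin the column order.

```python
import pandas as pd

df = pd.DataFrame(
    {'products': [['A', 'B'], ['D', 'C']], 'desc': [['a', 'c'], ['c', 'e']]},
    columns=['products', 'desc'],  # fix the column order explicitly
)

# Default axis=0: the function receives one column at a time,
# so x.name is a column label and x['products'] would be a failed row lookup.
col_names = df.apply(lambda x: x.name).tolist()

# axis=1: the function receives one row at a time,
# so x.name is the row label and x['products'] is that row's list.
row_names = df.apply(lambda x: x.name, axis=1).tolist()

print(col_names)  # ['products', 'desc']
print(row_names)  # [0, 1]
```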


Don't use apply() when you don't absolutely need to. It's slow.

Instead, do it the vectorized way:

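    # Note: this assumes each cell holds a single string, not a list,
    # and isin() tests against the whole desc column, not the same row.
    desc_upper = df.desc.str.upper()
    matches = df.products.isin(desc_upper)
    result = df.products[matches]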
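For the list-valued columns described in the question, one possible way to avoid apply is a plain list comprehension over the two columns zipped together. This is not a vectorized pandas operation, only a sketch of a common alternative; the helper name products_in_desc is hypothetical, and the sample data is the same as in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'products': [['A', 'B'], ['D', 'C']],
    'desc': [['a', 'c'], ['c', 'e']],
})


def products_in_desc(products, desc):
    """Return the products whose lowercase form appears in desc (hypothetical helper)."""
    desc_set = set(desc)  # set lookup avoids rescanning the list for every product
    return [p for p in products if p.lower() in desc_set]


# Walk both columns row by row without DataFrame.apply
df['uniq'] = [
    products_in_desc(products, desc)
    for products, desc in zip(df['products'], df['desc'])
]

print(df['uniq'].tolist())  # [['A'], ['C']]
```

The result matches the apply version with axis=1; whether it is noticeably faster depends on the size of the data, so it is worth timing on the real frame.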


