pandas search for substring over multiple columns

Question

I have a df such that

       c_name  f_name 
0      abc     abc12  
1      xyz     abc1  
2      mnq     mnq2

The goal is to find a substring across the two columns and know which column it belongs to. Preference should go to c_name: if the substring is in both columns, c_name takes precedence. For example, if I search for abc in the above dataframe, I should get row 0 (abc) for c_name and row 1 (abc1) for f_name.

To solve this I started with df[df['c_name'].str.contains('abc', case=False)], which gives me the results for c_name. The question now is how to exclude the rows I already have when performing the same operation on f_name. Any help is greatly appreciated!

Does this answer your question? pandas dataframe str.contains() AND operation
– Abu Shoeb, Commented Apr 26, 2021 at 18:25

Shijo · Accepted Answer · 2017-01-17 18:21:22Z

2

import pandas as pd

row = [['abcx', 'abcy'],
       ['efg', 'abcz'],
       ['higj', 'UK']]
df = pd.DataFrame(row, columns=['c_name', 'f_name'])

# Rows where c_name contains the substring (case-insensitive)
print(df[df['c_name'].str.contains('abc', case=False)])

# Of the remaining rows, those where f_name contains it
delta_df = df[~df['c_name'].str.contains('abc', case=False)]
print(delta_df[delta_df['f_name'].str.contains('abc', case=False)])

output

  c_name f_name
0   abcx   abcy
  c_name f_name
1    efg   abcz
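The two filtered frames above can also be folded into a single labeled result. This is only a minimal sketch, assuming the same sample data; the column name `matched_in` is made up for illustration, and `np.select` is swapped in for the two-step filtering rather than being part of the answer itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["abcx", "abcy"], ["efg", "abcz"], ["higj", "UK"]],
    columns=["c_name", "f_name"],
)

# One boolean mask per column; case=False mirrors the answer above.
in_c = df["c_name"].str.contains("abc", case=False).to_numpy()
in_f = df["f_name"].str.contains("abc", case=False).to_numpy()

# np.select uses the first condition that is True for each row, so
# listing c_name first gives it precedence when both columns match.
labels = np.select([in_c, in_f], ["c_name", "f_name"], default="")

# "matched_in" is a hypothetical column name; keep only rows with a match.
matched = df.assign(matched_in=labels)[in_c | in_f]
print(matched)
```

Putting `in_c` ahead of `in_f` in the condition list is what encodes the precedence, so reordering the list changes which column wins.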

answered Jan 17, 2017 at 18:21

Shijo


piRSquared · Accepted Answer · 2017-01-18 01:44:52Z

2

stack into a series
str.contains to get the truth value of the substring match
unstack to get back a dataframe
subset the results to rows with at least one match
idxmax(axis=1) gets the first True across the columns

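def find_subtext(df, txt):
    # True/False per cell, reshaped back to the original columns
    contains = df.stack().str.contains(txt).unstack()
    # keep rows with at least one match, then take the first matching column
    return contains[contains.any(axis=1)].idxmax(axis=1)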

find_subtext(df, 'abc')

0    c_name
1    f_name
dtype: object

df.assign(abc=find_subtext(df, 'abc'))

  c_name f_name     abc
0    abc  abc12  c_name
1    xyz   abc1  f_name
2    mnq   mnq2     NaN
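A variant of the function above, sketched under the assumptions that the search text should be matched literally (not as a regular expression), that the searched columns hold strings, and that missing values should count as non-matches. The name `find_subtext_literal` and its `columns` parameter are hypothetical.

```python
import pandas as pd

def find_subtext_literal(df, txt, columns=None):
    """For each matching row, return the first column (in order) containing txt."""
    cols = list(df.columns) if columns is None else list(columns)
    # One boolean column per searched column. regex=False treats txt as a
    # plain substring; na=False counts missing values as non-matches.
    contains = df[cols].apply(
        lambda s: s.str.contains(txt, regex=False, na=False)
    )
    hits = contains[contains.any(axis=1)]
    # idxmax returns the first column holding the maximum (True), so the
    # order of `cols` sets the precedence.
    return hits.idxmax(axis=1)

df = pd.DataFrame(
    {"c_name": ["abc", "xyz", "mnq"], "f_name": ["abc12", "abc1", "mnq2"]}
)
result = find_subtext_literal(df, "abc")
print(result)
```

Passing `columns=["f_name", "c_name"]` would reverse the precedence without touching the dataframe.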

answered Jan 18, 2017 at 1:44

piRSquared


1 Comment

Fizi Over a year ago

I can always rely on you to come through with a response :) Thank you very much. It's an interesting approach.

yuewu008 · Accepted Answer · 2017-01-17 18:16:18Z

0

Mark the rows from your first search with a value such as 2. That mark is then overridden (by 1) wherever the next search also matches.
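One possible reading of this idea, as a rough sketch: mark the lower-priority column (f_name) first, then let the higher-priority column (c_name) overwrite the mark. The column name `match_rank` and the use of 0 for "no match" are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"c_name": ["abc", "xyz", "mnq"], "f_name": ["abc12", "abc1", "mnq2"]}
)

in_c = df["c_name"].str.contains("abc", case=False)
in_f = df["f_name"].str.contains("abc", case=False)

# Start with 0 (no match) and mark the lower-priority column first...
df["match_rank"] = 0
df.loc[in_f, "match_rank"] = 2
# ...then let the higher-priority column override that mark.
df.loc[in_c, "match_rank"] = 1

# Rows with rank 1 matched in c_name; rows with rank 2 matched only in f_name.
print(df[df["match_rank"] > 0])
```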

answered Jan 17, 2017 at 18:16

yuewu008


1 Comment

Fizi Over a year ago

"How" is the question :) I know what to do, I just don't know how to do it since I am not proficient at pandas.

Aziz Alto · Accepted Answer · 2019-02-08 22:11:57Z

0

Here is another simple way:

concatenate the target "string" columns into a new single column, e.g.
```
df['new_col'] = df['c_name'] + ' ' + df['f_name']
```

search the new_col for the substring, e.g.

results = df[df['new_col'].str.contains('abc')]

delete new_col after it has completed its mission:
```
del results['new_col']
```

Here is an example:

>>> df= pd.DataFrame(row, columns=['c_name', 'f_name'])
>>> df
  c_name f_name
0   abcx   abcy
1    efg   abcz
2   higj     UK
>>> df['new_col'] = df['c_name'] + ' ' + df['f_name']
>>> results = df[df['new_col'].str.contains('abc')]
>>> del df['new_col'], results['new_col']
>>> results
  c_name f_name
0   abcx   abcy
1    efg   abcz
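The same search can be done without adding and deleting a column, by building the combined text as a temporary Series. A small sketch, assuming the sample data above; the variable name `combined` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [["abcx", "abcy"], ["efg", "abcz"], ["higj", "UK"]],
    columns=["c_name", "f_name"],
)

# Build the combined text as a standalone Series; df itself is not modified.
# na_rep="" keeps a missing value in one column from blanking the whole row.
combined = df["c_name"].str.cat(df["f_name"], sep=" ", na_rep="")

results = df[combined.str.contains("abc")]
print(results)
```

Note that a search over the combined text shows which rows match but not which column matched, and a search term containing the separator could match across the column boundary.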

edited Feb 8, 2019 at 22:11

answered Feb 8, 2019 at 21:48

Aziz Alto
