pandas isin using multiple column values

Question

I would like to assign True/False value to each row in my pandas dataframe if the row contains certain values for columns that I specified. For example, suppose I had the following dataframe:

d = {
    "col1": ["alpha", "beta", "alpha", "gamma", "alpha"],
    "col2": [1, 2, 2, 3, 1],
    "col3": ["a", "a", "b", "c", "d"],
}
df = pd.DataFrame(d)
df

    col1  col2 col3
0  alpha     1    a
1   beta     2    a
2  alpha     2    b
3  gamma     3    c
4  alpha     1    d

I know that if I wanted to create a row mask using values for a single column I can use pandas's isin. For example, if I want all rows that contain alpha in col1, I can do the following: df['col1'].isin(["alpha"]).

0     True
1    False
2    False
3    False
4     True

How do I get similar behaviour but with multiple columns? What if I wanted to set all rows that have alpha in col1 AND 1 in col2 to True and everything else False?

I would like to have a function that takes in a dataframe and a dictionary as an input. The dictionary would have keys corresponding to column names and values as lists of values that I want. The function returns a pandas Series of booleans where each row is True if the corresponding row in the dataframe contains values in the dictionary and False otherwise. For example:

def multi_column_isin(df, d):
    <implementation>
    s = pandas Series where each row is True if it has values in d else False
    return s

s = multi_column_isin(df, {"col1": ["gamma", "alpha"], "col2": [1, 3]})
s

0     True
1    False
2    False
3     True
4     True

Can you give an example of the dictionary you would like to pass to the function? — le_camerone
– le_camerone, Commented Apr 14, 2022 at 0:32
The dictionary is provided in the question: {"col1": ["gamma", "alpha"], "col2": [1, 3]} — yippingAppa
– yippingAppa, Commented Apr 14, 2022 at 0:42

sammywemmy · Accepted Answer · 2022-04-14 01:12:53Z

4

You can script a function to do this:

def multi_column_isin(df, d):
    # get rid of the index to avoid index alignment
    booleans = [df[col].isin(arr).array for col, arr in d.items()]
    return np.logical_and.reduce(booleans)

multi_column_isin(df, {"col1": ["gamma", "alpha"], "col2": [1, 3]})
array([ True, False, False,  True,  True])


# alternative route with pipe
 df.pipe(multi_column_isin, {"col1": ["gamma", "alpha"], "col2": [1, 3]})
array([ True, False, False,  True,  True])

A simpler way would be to pass a dictionary to isin - sometimes though, this can result in incorrect/unwanted output, depending on index alignment:

(df
.drop(columns='col3')
.isin({"col1": ["gamma", "alpha"], "col2": [1, 3]})
.all(1)
)

0     True
1    False
2    False
3     True
4     True
dtype: bool

edited Apr 14, 2022 at 1:12

answered Apr 14, 2022 at 1:05

sammywemmy

28.9k4 gold badges21 silver badges35 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

yippingAppa Over a year ago

Thank you! Can you give me an example where passing a dictionary to isin would result in incorrect output?

sammywemmy Over a year ago

The one that comes to mind is if the values in the dictionary are series and not array/list. Index alignment will occur first and if the indices differ, you might get unlikely errors. I might be wrong, but it is something I felt I should mention.

le_camerone · Accepted Answer · 2022-04-14 00:02:30Z

0

IIUC:

(df.col1 == 'alpha') & (df.col2 == 1)

Should get you there.

answered Apr 14, 2022 at 0:02

le_camerone

6285 silver badges17 bronze badges

Collectives™ on Stack Overflow

pandas isin using multiple column values

2 Answers 2

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related