0

I would like to assign True/False value to each row in my pandas dataframe if the row contains certain values for columns that I specified. For example, suppose I had the following dataframe:

d = {
    "col1": ["alpha", "beta", "alpha", "gamma", "alpha"],
    "col2": [1, 2, 2, 3, 1],
    "col3": ["a", "a", "b", "c", "d"],
}
df = pd.DataFrame(d)
df

    col1  col2 col3
0  alpha     1    a
1   beta     2    a
2  alpha     2    b
3  gamma     3    c
4  alpha     1    d

I know that if I wanted to create a row mask using values for a single column I can use pandas's isin. For example, if I want all rows that contain alpha in col1, I can do the following: df['col1'].isin(["alpha"]).

0     True
1    False
2    False
3    False
4     True

How do I get similar behaviour but with multiple columns? What if I wanted to set all rows that have alpha in col1 AND 1 in col2 to True and everything else False?

I would like to have a function that takes in a dataframe and a dictionary as an input. The dictionary would have keys corresponding to column names and values as lists of values that I want. The function returns a pandas Series of booleans where each row is True if the corresponding row in the dataframe contains values in the dictionary and False otherwise. For example:

def multi_column_isin(df, d):
    <implementation>
    s = pandas Series where each row is True if it has values in d else False
    return s

s = multi_column_isin(df, {"col1": ["gamma", "alpha"], "col2": [1, 3]})
s

0     True
1    False
2    False
3     True
4     True
2
  • Can you give an example of the dictionary you would like to pass to the function? Commented Apr 14, 2022 at 0:32
  • The dictionary is provided in the question: {"col1": ["gamma", "alpha"], "col2": [1, 3]} Commented Apr 14, 2022 at 0:42

2 Answers 2

4

You can script a function to do this:

def multi_column_isin(df, d):
    # get rid of the index to avoid index alignment
    booleans = [df[col].isin(arr).array for col, arr in d.items()]
    return np.logical_and.reduce(booleans)

multi_column_isin(df, {"col1": ["gamma", "alpha"], "col2": [1, 3]})
array([ True, False, False,  True,  True])


# alternative route with pipe
 df.pipe(multi_column_isin, {"col1": ["gamma", "alpha"], "col2": [1, 3]})
array([ True, False, False,  True,  True])

A simpler way would be to pass a dictionary to isin - sometimes though, this can result in incorrect/unwanted output, depending on index alignment:

(df
.drop(columns='col3')
.isin({"col1": ["gamma", "alpha"], "col2": [1, 3]})
.all(1)
)

0     True
1    False
2    False
3     True
4     True
dtype: bool
Sign up to request clarification or add additional context in comments.

2 Comments

Thank you! Can you give me an example where passing a dictionary to isin would result in incorrect output?
The one that comes to mind is if the values in the dictionary are series and not array/list. Index alignment will occur first and if the indices differ, you might get unlikely errors. I might be wrong, but it is something I felt I should mention.
0

IIUC:

(df.col1 == 'alpha') & (df.col2 == 1)

Should get you there.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.