I am trying to write a function that takes a DataFrame and a dictionary mapping column names to lists of values. The function should select rows so that it returns only those whose values match every entry in the dictionary. For example: df_isr13 = filterby_criteria(df, {"Area":["USA"], "Year":[2013]}) Only rows with "Year"=2013 and "Area"="USA" are included in the output.

I tried:

def filterby_criteria(df, criteria):
    for key, values in criteria.items():
        return df[df[key].isin(values)]

but this applies only the first criterion. How can I get a new DataFrame that satisfies all of the criteria using pd.DataFrame.isin()?

  • Something like criteria = {"Area":["USA"], "Year":[2013]}; df[np.logical_and.reduce([df[k].isin(v) for k, v in criteria.items()])]? Commented Jun 12, 2019 at 19:27
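The comment's suggestion can be spelled out as a short, runnable sketch. The sample frame below is made up for illustration, and the sketch assumes `criteria` is non-empty and that every key is a column of `df`. The masks are collected in a list because `np.logical_and.reduce` needs a sequence it can stack into an array.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "Area": ["USA", "USA", "Israel", "Israel"],
    "Year": [2013, 2014, 2013, 2014],
    "Value": [1.0, 2.0, 3.0, 4.0],
})
criteria = {"Area": ["USA"], "Year": [2013]}

# One boolean mask per criterion, collected in a list.
masks = [df[k].isin(v) for k, v in criteria.items()]

# Element-wise AND across all masks gives one boolean array per row.
combined = np.logical_and.reduce(masks)

filtered = df[combined]
print(filtered)
```

Because this is plain boolean indexing, the rows that survive keep their original index labels.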

2 Answers


You can use a for loop and apply each criterion with the pandas merge function:

def filterby_criteria(df, criteria):
    for key, values in criteria.items():
        df = pd.merge(df[df[key].isin(values)], df, how='inner')
    return df

2 Comments

One should never grow objects in a loop including using merge, concat, append inside loops. This leads to excessive copying in memory.
@Parfait is absolutely correct. This is a terrible answer.
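The objection above is to re-merging the frame on every iteration. As a rough sketch of one alternative (not the answerer's method), the loop can accumulate a boolean mask instead and index the frame once at the end. The sample data is hypothetical, and the sketch assumes every key in `criteria` is a column of `df`.

```python
import pandas as pd

def filterby_criteria(df, criteria):
    # Start with an all-True mask and narrow it with each criterion.
    mask = pd.Series(True, index=df.index)
    for key, values in criteria.items():
        mask &= df[key].isin(values)
    # Index the frame once, after every criterion has been applied.
    return df[mask]

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "Area": ["USA", "USA", "Israel", "Israel"],
    "Year": [2013, 2014, 2013, 2014],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

result = filterby_criteria(df, {"Area": ["USA"], "Year": [2013]})
print(result)
```

Only one boolean Series is updated inside the loop, so the frame itself is copied just once, when the final selection is made.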

Consider a simple merge of two data frames, since by default merge joins on all column names the two frames share:

from itertools import product
import pandas as pd

def filterby_criteria(df, criteria):
    # EXTRACT DICT ITEMS
    k,v = criteria.keys(), criteria.values()
    # BUILD DF OF ALL POSSIBLE MATCHES (ONE ROW PER COMBINATION OF VALUES)
    all_matches = pd.DataFrame(list(product(*v)), columns=list(k))
    # RETURN MERGED DF
    return df.merge(all_matches)
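To make the intermediate step concrete, here is a small sketch of what the frame of all possible matches contains and how the default merge uses it. The four-row `df` is hypothetical and only there to keep the example short.

```python
from itertools import product
import pandas as pd

criteria = {"Tool": ["python", "r"], "Year": [2013, 2015]}

# One row per (Tool, Year) combination allowed by the criteria.
all_matches = pd.DataFrame(list(product(*criteria.values())),
                           columns=list(criteria.keys()))
print(all_matches)

# Hypothetical sample data, not the seeded frame used elsewhere.
df = pd.DataFrame({
    "Tool": ["r", "sas", "python", "r"],
    "Year": [2013, 2013, 2015, 2014],
    "Int": [1, 2, 3, 4],
})

# With no `on=` argument, merge joins on the shared columns (Tool, Year);
# the default how='inner' keeps only rows present in both frames.
merged = df.merge(all_matches)
print(merged)
```

One design consequence worth knowing: a column merge returns a fresh 0..n-1 index, whereas filtering with a boolean `isin` mask keeps the original index labels. If the original index matters, the mask-based approach may be the better fit.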

To demonstrate with random, seeded data:

Data

import numpy as np
import pandas as pd

np.random.seed(61219)

tools = ['sas', 'stata', 'spss', 'python', 'r', 'julia']
years = list(range(2013, 2019))
random_df = pd.DataFrame({'Tool': np.random.choice(tools, 500),
                          'Int': np.random.randint(1, 10, 500),
                          'Num': np.random.uniform(1, 100, 500),
                          'Year': np.random.choice(years, 500)
                          })

print(random_df.head(10))
#      Tool  Int        Num  Year
# 0    spss    4  96.465327  2016
# 1     sas    7  23.455771  2016
# 2       r    5  87.349825  2014
# 3   julia    4  18.214028  2017
# 4   julia    7  17.977237  2016
# 5   stata    3  41.196579  2013
# 6   stata    8  84.943676  2014
# 7  python    4  60.576030  2017
# 8    spss    4  47.024075  2018
# 9   stata    3  87.271072  2017

Function call

criteria = {"Tool":["python", "r"], "Year":[2013, 2015]}

def filterby_criteria(df, criteria):
    k,v = criteria.keys(), criteria.values()
    all_matches = pd.DataFrame(list(product(*v)), columns=list(k))
    return df.merge(all_matches)

final_df = filterby_criteria(random_df, criteria)

Output

print(final_df)
#       Tool  Int        Num  Year
# 0   python    8  96.611384  2015
# 1   python    7  66.782828  2015
# 2   python    9  73.638629  2015
# 3   python    4  70.763264  2015
# 4   python    2  28.311917  2015
# 5   python    3  69.888967  2015
# 6   python    8  97.609694  2015
# 7   python    3  59.198276  2015
# 8   python    3  64.497017  2015
# 9   python    8  87.672138  2015
# 10  python    9  33.605467  2015
# 11  python    8  25.225665  2015
# 12       r    3  72.202364  2013
# 13       r    1  62.192478  2013
# 14       r    7  39.264766  2013
# 15       r    3  14.599786  2013
# 16       r    4  22.963723  2013
# 17       r    1  97.647922  2013
# 18       r    5  60.457344  2013
# 19       r    5  15.711207  2013
# 20       r    7  80.273330  2013
# 21       r    7  74.190107  2013
# 22       r    7  37.923396  2013
# 23       r    2  91.970678  2013
# 24       r    4  31.489810  2013
# 25       r    1  37.580665  2013
# 26       r    2   9.686955  2013
# 27       r    6  56.238919  2013
# 28       r    6  72.820625  2015
# 29       r    3  61.255351  2015
# 30       r    4  45.690621  2015
# 31       r    5  71.143601  2015
# 32       r    6  54.744846  2015
# 33       r    1  68.171978  2015
# 34       r    5   8.521637  2015
# 35       r    7  87.027681  2015
# 36       r    3  93.614377  2015
# 37       r    7  37.918881  2015
# 38       r    3   7.715963  2015
# 39  python    1  42.681928  2013
# 40  python    6  57.354726  2013
# 41  python    1  48.189897  2013
# 42  python    4  12.201131  2013
# 43  python    9   1.078999  2013
# 44  python    9  75.615457  2013
# 45  python    8  12.631277  2013
# 46  python    9  82.227578  2013
# 47  python    7  97.802213  2013
# 48  python    1  57.103964  2013
# 49  python    1   1.941839  2013
# 50  python    3  81.981437  2013
# 51  python    1  56.869551  2013

