Slice Pandas dataframe by index values that are (not) in a list

Question

I have a pandas dataframe, df.

I want to select all indices in df that are not in a list, blacklist.

Currently, I use a list comprehension to build the labels I want to select.

ix=[i for i in df.index if i not in blacklist]  
df_select=df.loc[ix]

This works fine, but it gets clumsy if I need to do it often.

Is there a better way to do this?

Possible duplicate of dropping rows from dataframe based on a "not in" condition
– Jim G., Commented Sep 11, 2019 at 18:34

Hooked · Accepted Answer · 2017-07-07 14:55:54Z

190

Use isin on the index and invert the boolean index to perform label selection:

In [239]:

df = pd.DataFrame({'a':np.random.randn(5)})
df
Out[239]:
          a
0 -0.548275
1 -0.411741
2 -1.187369
3  1.028967
4 -2.755030
In [240]:

t = [2,4]
df.loc[~df.index.isin(t)]
Out[240]:
          a
0 -0.548275
1 -0.411741
3  1.028967
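Since the question mentions doing this often, the same idea can be wrapped in a small helper. This is only a sketch: the function name `exclude_index` and the sample data are made up for illustration and are not part of pandas.

```python
import pandas as pd


def exclude_index(df: pd.DataFrame, labels) -> pd.DataFrame:
    """Return the rows of df whose index label is not in labels (hypothetical helper)."""
    # Index.isin gives a boolean array aligned with df.index; ~ inverts it.
    return df.loc[~df.index.isin(labels)]


df = pd.DataFrame({"a": [10, 20, 30, 40, 50]})
blacklist = [2, 4, 99]  # 99 is not in the index and is simply ignored

df_select = exclude_index(df, blacklist)
print(df_select.index.tolist())  # [0, 1, 3]
```

Because the mask is computed from the index itself, the original row order is kept and labels missing from the index do not raise an error.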

edited Jul 7, 2017 at 14:55

Hooked


answered Mar 19, 2015 at 8:47

EdChum


ASGM · Accepted Answer · 2015-03-18 23:44:58Z

26

You could use set() to create the difference between your original indices and those that you want to remove:

df.loc[list(set(df.index) - set(blacklist))]  # wrap in list(): recent pandas versions reject a set as an indexer

It has the advantage of being parsimonious, as well as being easier to read than a list comprehension.
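One caveat with the set approach is that a set has no defined order, so the selected rows may not come back in their original order. A minimal sketch, with made-up sample data, that sorts the remaining labels explicitly:

```python
import pandas as pd

df = pd.DataFrame({"D": [5, 6, 7, 8]}, index=[4, 1, 3, 2])
blacklist = [1, 3]

# Sort the remaining labels so the result order is deterministic.
# (Use a boolean mask instead if the original row order must be kept.)
keep = sorted(set(df.index) - set(blacklist))
df_select = df.loc[keep]
print(df_select.index.tolist())  # [2, 4]
```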

answered Mar 18, 2015 at 23:44

ASGM


4b0 · Accepted Answer · 2020-10-22 08:51:56Z

6

import pandas as pd
df = pd.DataFrame(data=[5,6,7,8], index=[1,2,3,4], columns=['D',])
blacklist = [2,3]

# drop the rows whose index labels are in blacklist (axis=0 means rows)
df.drop(blacklist, axis=0)
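One difference from the isin approach is that drop raises a KeyError when a label in the list is not present in the index. The sketch below, which adds a made-up missing label 99 to the blacklist, shows that behaviour and how errors="ignore" avoids it:

```python
import pandas as pd

df = pd.DataFrame(data=[5, 6, 7, 8], index=[1, 2, 3, 4], columns=["D"])
blacklist = [2, 3, 99]  # 99 is not an index label (illustrative)

# By default, drop raises KeyError if any label is missing.
try:
    df.drop(index=blacklist)
    raised = False
except KeyError:
    raised = True

# errors="ignore" silently skips labels that are not found.
df_select = df.drop(index=blacklist, errors="ignore")
print(raised, df_select.index.tolist())  # True [1, 4]
```

Writing index=blacklist instead of axis=0 also makes it explicit that rows, not columns, are being removed.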

edited Oct 22, 2020 at 8:51

4b0


answered Oct 22, 2020 at 8:50

George Xiong


3 Comments

4b0 Over a year ago

Code-only answers are not particularly helpful. Please include a brief description of how this code solves the problem.

ZF007 Over a year ago

Please don't post only code as answer, but also provide an explanation what your code does and how it solves the problem of the question. Answers with an explanation are usually more helpful and of better quality, and are more likely to attract upvotes.

Corey Levinson Over a year ago

This, in my opinion, is a lot prettier/more elegant than doing df.loc[~df.index.isin(blacklist)]. However, it is less interpretable, since usually people only use drop to remove columns (so axis=1).

Hagrid67 · Accepted Answer · 2016-12-06 19:29:15Z

4

Thanks to ASGM; I found that I needed to turn the set into a list to make it work with a MultiIndex:

mi1 = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)])
df1 = pd.DataFrame(data={"aaa":[1,2,3,4]}, index=mi1)
setValid = set(df1.index) - set([("a", 2)])
df1.loc[list(setValid)] # works
df1.loc[setValid] # fails

(sorry can't comment, insufficient rep)
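As an alternative that avoids the set-to-list conversion, the boolean-mask approach also appears to work with a MultiIndex, since isin accepts a list of tuples. A sketch reusing the data above (the name blacklist is assumed here):

```python
import pandas as pd

mi1 = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)])
df1 = pd.DataFrame(data={"aaa": [1, 2, 3, 4]}, index=mi1)

blacklist = [("a", 2)]  # full index tuples to exclude

# MultiIndex.isin compares whole tuples; ~ keeps everything else.
df1_select = df1.loc[~df1.index.isin(blacklist)]
print(df1_select.index.tolist())  # [('a', 1), ('b', 1), ('b', 2)]
```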

edited Dec 6, 2016 at 19:29

answered Dec 6, 2016 at 18:34

Hagrid67


zmag · Accepted Answer · 2019-12-04 04:25:02Z

4

If you are looking for a way to select all rows that are outside a condition you can use np.invert() given that the condition returns an array of booleans.

df.loc[np.invert((condition_1) & (condition_2))]  # condition_1, condition_2 are placeholders for boolean arrays
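To make the placeholders concrete, here is a small sketch with two made-up conditions on made-up columns a and b. For boolean Series, np.invert gives the same result as the ~ operator.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [5, 4, 3, 2, 1]})

cond_1 = df["a"] > 1  # illustrative condition
cond_2 = df["b"] > 1  # illustrative condition

# Keep the rows where the combined condition is NOT satisfied.
df_select = df.loc[np.invert(cond_1 & cond_2)]
print(df_select.index.tolist())  # [0, 4]
```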

edited Dec 4, 2019 at 4:25

zmag


answered Dec 4, 2019 at 4:03

Hector Garcia L


Alexander Martins · Accepted Answer · 2022-03-16 12:29:43Z

4

You could use difference() to obtain the difference between your original indices and those that you want to exclude:

df.loc[df.index.difference(blacklist), :]

It has the advantage of being easier to read.
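Note that Index.difference sorts its result by default where possible, so the rows can come back in a different order than in df. A sketch with made-up string labels, using the sort=False parameter to keep the original order:

```python
import pandas as pd

df = pd.DataFrame({"D": [5, 6, 7, 8]}, index=["d", "a", "c", "b"])
blacklist = ["a"]

# Default behaviour: the remaining labels are sorted.
sorted_labels = df.index.difference(blacklist)
# sort=False keeps the labels in the order they appear in df.index.
original_order = df.index.difference(blacklist, sort=False)

print(sorted_labels.tolist())   # ['b', 'c', 'd']
print(original_order.tolist())  # ['d', 'c', 'b']
df_select = df.loc[original_order, :]
```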

answered Mar 16, 2022 at 12:29

Alexander Martins


rachwa · Accepted Answer · 2022-06-21 19:45:52Z

2

Inside query you can access your variable blacklist using @:

df.query('index != @blacklist')

# Or alternatively:
df.query('index not in @blacklist')
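A self-contained sketch of the query approach, using made-up sample data and assuming an unnamed index (in which case query refers to it as index):

```python
import pandas as pd

df = pd.DataFrame(data=[5, 6, 7, 8], index=[1, 2, 3, 4], columns=["D"])
blacklist = [2, 3]

# @blacklist refers to the local Python variable of that name.
df_select = df.query("index not in @blacklist")
df_rejected = df.query("index in @blacklist")

print(df_select.index.tolist())    # [1, 4]
print(df_rejected.index.tolist())  # [2, 3]
```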

edited Jun 21, 2022 at 19:45

answered May 7, 2022 at 13:32

rachwa


Dyno Fu · Accepted Answer · 2015-03-19 00:05:10Z

1

import pandas as pd
df = pd.DataFrame(data=[5,6,7,8], index=[1,2,3,4], columns=['D',])
blacklist = [2,3]
#your current way ...
ix=[i for i in df.index if i not in blacklist]  
df_select=df.loc[ix]

# use a mask: one boolean per row, True where the label is not blacklisted
mask = [x not in blacklist for x in df.index]
df.loc[mask]

Both loc and iloc accept a boolean array, in this case the mask (see http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-label). You can reuse this mask from then on, which should be more efficient.
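A sketch of reusing one mask for several selections. It builds the mask with Index.isin rather than a Python loop, which is a substitution for the list comprehension above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(data=[5, 6, 7, 8], index=[1, 2, 3, 4], columns=["D"])
blacklist = [2, 3]

# NumPy boolean array with one entry per row of df.
mask = ~df.index.isin(blacklist)

kept = df.loc[mask]       # rows not in the blacklist
removed = df.loc[~mask]   # the complementary rows, from the same mask
kept_iloc = df.iloc[mask]  # iloc accepts the same boolean array

print(kept.index.tolist(), removed.index.tolist())  # [1, 4] [2, 3]
```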

edited Mar 19, 2015 at 0:05

answered Mar 18, 2015 at 23:41

Dyno Fu


Dharman · Accepted Answer · 2021-09-27 12:41:24Z

0

You can use the np.setdiff1d function which finds the set difference of two arrays.

index = np.array(blacklist)
not_index = np.setdiff1d(df.index.to_numpy(), index)
# setdiff1d returns index labels, not positions, so select with loc
df.loc[not_index]
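The difference between labels and positions matters as soon as the index is not the default 0..n-1 range. A sketch with a made-up index of 10, 20, 30, 40, where the result of np.setdiff1d works with loc but not with iloc:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[5, 6, 7, 8], index=[10, 20, 30, 40], columns=["D"])
blacklist = [20, 30]

# setdiff1d returns the sorted, unique labels of df.index not in blacklist.
remaining = np.setdiff1d(df.index.to_numpy(), np.array(blacklist))
df_select = df.loc[remaining]

# The same array used as positions is out of bounds for a 4-row frame.
try:
    df.iloc[remaining]
    iloc_failed = False
except IndexError:
    iloc_failed = True

print(remaining.tolist(), iloc_failed)  # [10, 40] True
```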

edited Sep 27, 2021 at 12:41

Dharman♦


answered Sep 27, 2021 at 12:36

hamnghi
