27

If I have a pandas DataFrame with a multi-level index, how can I filter by one of the levels of that index? For example:

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2], "time": [1, 1, 2, 2], "val": [1, 2, 3, 4]})
df.set_index(keys=["id", "time"], inplace=True)

I would like to do something like:

df[df["time"] > 1]

but time is no longer a column. I could keep it as a column but I don't want to drag around copies of data.

2
  • Are you talking about something along the lines of iterating through a pandas dataframe? Commented May 23, 2018 at 19:53
  • pandas.pydata.org/pandas-docs/stable/… Commented May 23, 2018 at 20:01

2 Answers

30
In [17]: df[df.index.get_level_values('time') > 1]
Out[17]:
         val
id time
1  2       3
2  2       4

@piRSquared's solution is more idiomatic though...
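
As a minimal sketch of the mask-based approach above (rebuilding the question's frame; `wanted_ids` is a hypothetical list introduced only for illustration), the same pattern also covers membership tests on a level via `Index.isin`:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2], "time": [1, 1, 2, 2], "val": [1, 2, 3, 4]})
df = df.set_index(["id", "time"])

# Build a boolean mask from a single index level; "time" stays in the index.
time_level = df.index.get_level_values("time")
later = df[time_level > 1]

# The same style works for membership tests on another level.
wanted_ids = [2]  # hypothetical list of ids, not from the original question
subset = df[df.index.get_level_values("id").isin(wanted_ids)]

print(later)
print(subset)
```

The mask is an ordinary boolean array, so it can be combined with other conditions using `&` and `|` in the usual way.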


2 Comments

Interesting, this is along the lines of what I would have guessed the answer should look like. Will keep in mind that query is standard practice.
^ Exactly what I thought. The idiomatic approach gets complicated for someone who is new to that method but knows the usual way of filtering a df, especially when you want to compare with .isin() or something similar. I imagined it wouldn't work out of the box since there aren't enough illustrations on pandas.pydata.org/pandas-docs/stable/reference/api/… but then I found stackoverflow.com/a/33991869/1332401 (so df[df.A.isin(list_ids)] vs df.query('A in @list_ids'))
21

query

df.query('time > 1')

         val
id time     
1  2       3
2  2       4
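
A hedged sketch of how `query` can take values from Python variables instead of hard-coded strings. It assumes the question's frame; `threshold` and `level` are illustrative names, not part of the original answer. Inside the expression, `@name` refers to the value of a local variable, so a level name held in a variable has to be formatted into the string rather than passed with `@`:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2], "time": [1, 1, 2, 2], "val": [1, 2, 3, 4]})
df = df.set_index(["id", "time"])

# Named index levels can be referenced in query just like columns.
hardcoded = df.query("time > 1")

# "@threshold" pulls the value of the local variable into the expression.
threshold = 1  # illustrative variable
by_value = df.query("time > @threshold")

# A level name stored in a variable is interpolated into the expression text.
level = "time"  # illustrative variable
by_name = df.query(f"{level} > @threshold")

print(by_name)
```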

IndexSlice

The DataFrame index must be lexsorted before slicing on an inner level, hence the sort_index() call:

df.sort_index().loc[pd.IndexSlice[:, 2:], :]

         val
id time     
1  2       3
2  2       4
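
A small sketch of the slicing approach, again assuming the question's frame. Note that the label slice `2:` is inclusive of 2, so it means `time >= 2`; that matches `time > 1` here only because the levels are integers. The spelling with plain `slice` objects is shown as an equivalent alternative to `pd.IndexSlice`:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 1, 2], "time": [1, 1, 2, 2], "val": [1, 2, 3, 4]})
df = df.set_index(["id", "time"])

# Sort first so that label slicing on the inner level is allowed.
sorted_df = df.sort_index()

# All ids, and every time label from 2 onward (inclusive).
idx = pd.IndexSlice
sliced = sorted_df.loc[idx[:, 2:], :]

# The same selection written with explicit slice objects.
sliced_plain = sorted_df.loc[(slice(None), slice(2, None)), :]

print(sliced)
```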

7 Comments

really? you have to pass strings around instead of having it in code?
This is one approach. Happens to also be a fast one at scale. stackoverflow.com/a/46165056/2336654
@Alex Note: query also works with variables, e.g. thresh = 1, then df.query("time > @thresh").
@piRSquared: alright, I'm sold on the query thing. Looks like the best way to go for me. Do people in the pandas world generally just build up strings dynamically in code and pass those around? Ironically, that kind of thing tends to be discouraged in R.
@coldspeed: good tip, was just looking at that in the docs, thanks!