Preserving DataFrame subclass type during pandas groupby().aggregate()

Question

I'm subclassing pandas DataFrame in a project of mine. Most pandas operations preserve the subclass type, but df.groupby().agg() does not. Is this a bug? Is there a known workaround?

import pandas as pd

class MySeries(pd.Series):
    pass

class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDataFrame
    _constructor_sliced = MySeries

MySeries._constructor_expanddim = MyDataFrame

df = MyDataFrame({"a": reversed(range(10)), "b": list('aaaabbbccc')})

print(type(df.groupby("b").sum()))
# <class '__main__.MyDataFrame'>

print(type(df.groupby("b").agg({"a": "sum"})))
# <class 'pandas.core.frame.DataFrame'>

It looks like there was an issue (described here) that fixed subclassing for df.groupby, but as far as I can tell df.groupby().agg() was missed. I'm using pandas version 2.0.3.

Comment (rasputin, Aug 19, 2024 at 19:34): Looping in @alkasm and @grge, who worked on the linked issue.

rasputin · Accepted Answer · 2024-08-20 17:13:47Z

The workaround I'm currently using is to wrap the aggregation result in the subclass again and then call __finalize__, which propagates metadata (the attributes listed in _metadata) from the original object to the new one.

MyDataFrame(my_df.groupby("b").agg({"a": "sum"})).__finalize__(other=my_df)
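If the re-wrapping is needed in several places, it can be pulled into a small helper. The following is only a minimal sketch: `TaggedFrame` and `rewrap_like` are hypothetical names introduced for illustration, and the class-level default for `my_attr` is an assumption made to keep the example short (it avoids a custom `__init__`).

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical stand-in for MyDataFrame with one metadata field."""

    _metadata = ["my_attr"]  # attribute names that __finalize__ copies over
    my_attr = None           # class-level default so the attribute always exists

    @property
    def _constructor(self):
        return TaggedFrame


def rewrap_like(result, template):
    """Wrap `result` in the template's class and copy the template's metadata."""
    return type(template)(result).__finalize__(template)


df = TaggedFrame({"a": list(range(9, -1, -1)), "b": list("aaaabbbccc")})
df.my_attr = "foo"

# Works whether agg() returned a plain DataFrame or already a TaggedFrame.
out = rewrap_like(df.groupby("b").agg({"a": "sum"}), df)
print(type(out).__name__, out.my_attr)  # prints: TaggedFrame foo
```

The helper does not depend on what type agg() returns, so it should be harmless to leave in place on pandas versions where the subclass is already preserved.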

OP case

First, I've added a custom attribute to MyDataFrame:

import pandas as pd

class MySeries(pd.Series):
    _metadata = ['my_attr']

class MyDataFrame(pd.DataFrame):
    _metadata = ['my_attr']
    
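    def __init__(
            self,
            data,
            my_attr=None,
            index=None,
            columns=None,
            dtype=None,
            copy=None
        ):
        # my_attr is listed in _metadata, so it can be set before pandas initializes.
        self.my_attr = my_attr
        # Pass the remaining arguments by keyword so they cannot be misaligned.
        super().__init__(data, index=index, columns=columns, dtype=dtype, copy=copy)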
    
    @property
    def _constructor(self):
        return MyDataFrame
    _constructor_sliced = MySeries

MySeries._constructor_expanddim = MyDataFrame

Now we can check where the subclass type and the custom attribute are preserved, and where they are lost:

my_df = MyDataFrame(
    {"a": reversed(range(10)), "b": list('aaaabbbccc')},
    my_attr='foo'
)
assert isinstance(my_df, MyDataFrame)
# Success!
assert isinstance(my_df.sample(3), MyDataFrame)
# Success!
assert isinstance(my_df.copy(), MyDataFrame)
# Success!

new_df = my_df.groupby("b").sum()
assert isinstance(new_df, MyDataFrame)
# Success! - fixed by issue linked in question

new_df = my_df.groupby("b").agg({"a": "sum"})
assert isinstance(new_df, MyDataFrame)
# AssertionError
assert new_df.my_attr == 'foo'
# AttributeError

new_df = my_df.groupby("b").agg({"a": "sum"})
new_df = MyDataFrame(new_df).__finalize__(other=my_df)
assert isinstance(new_df, MyDataFrame)
# Success!
assert new_df.my_attr == 'foo'
# Success!


Comment (rasputin, over a year ago):

Creating another object seems far from ideal, as I want end users to be able to use groupby().agg() on my DataFrame subclass without running into bugs. I'm open to accepting other answers that do a better job of patching groupby().agg() on the back end.

rasputin · Accepted Answer · 2024-09-03 14:41:00Z

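It turns out that groupby().agg() builds its result by combining Series into a DataFrame, so the Series subclass needs properly defined constructors of its own: _constructor (returning MySeries) and _constructor_expanddim (returning MyDataFrame). See this documentation.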

The following code runs with no errors:

import pandas as pd

class MySeries(pd.Series):
    @property
    def _constructor(self):
        return MySeries

    @property
    def _constructor_expanddim(self):
        return MyDataFrame

class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDataFrame

    @property
    def _constructor_sliced(self):
        return MySeries


df = MyDataFrame({"a": reversed(range(10)), "b": list('aaaabbbccc')})

assert isinstance(df.groupby("b").agg({"a": "sum"}), MyDataFrame)
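A quick way to check that each constructor hook is wired up is to walk a value through all three conversions. This is a sketch that repeats the class definitions above so it runs on its own; the variable names `column`, `doubled`, and `frame` are just for illustration.

```python
import pandas as pd


class MySeries(pd.Series):
    @property
    def _constructor(self):
        return MySeries

    @property
    def _constructor_expanddim(self):
        return MyDataFrame


class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return MyDataFrame

    @property
    def _constructor_sliced(self):
        return MySeries


df = MyDataFrame({"a": list(range(9, -1, -1)), "b": list("aaaabbbccc")})

column = df["a"]           # DataFrame -> Series, via _constructor_sliced
doubled = column * 2       # Series -> Series, via MySeries._constructor
frame = column.to_frame()  # Series -> DataFrame, via _constructor_expanddim

for label, obj in [("df['a']", column), ("column * 2", doubled), ("to_frame()", frame)]:
    print(label, "->", type(obj).__name__)
```

If any of these lines reports a plain Series or DataFrame, the corresponding hook is a likely place where the subclass is being dropped.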
