How To Select Identical rows from pandas dataframe based on certain columns

Question

I'm new to pandas and I'm having trouble selecting rows from a DataFrame.

This is my DataFrame:

   Index  Column1  Column2  Column3   Column4
   0      1234     500      NEWYORK   NY
   1      5678     700      AUSTIN    TX
   2      1234     300      NEWYORK   NY
   3      8910     235      RICHMOND  FL

I want to select the rows that have the same values in Column1, Column3 and Column4 (that is, rows that are identical in terms of these three columns). The output DataFrame should therefore contain the rows with index 0 and 2.

Can anyone help me with a step-by-step procedure for this custom selection?

cs95 · Accepted Answer · 2017-11-01 13:50:44Z

3

Use df.duplicated to build a boolean mask and index into df with it:

c = ['Column1', 'Column3', 'Column4']
df = df[df[c].duplicated(keep=False)]

df

   Index  Column1  Column2  Column3 Column4
0      0     1234      500  NEWYORK      NY
2      2     1234      300  NEWYORK      NY

keep=False marks every row of a duplicated group as True, including the first occurrence, so the filter keeps all of them.
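To make the effect of keep=False concrete, here is a minimal self-contained sketch that rebuilds the question's data and compares it with the default keep='first'. The variable names (mask_all, mask_first, result) are only illustrative, and passing the columns through subset= is assumed to be interchangeable with selecting them first via df[c].

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Index": [0, 1, 2, 3],
    "Column1": [1234, 5678, 1234, 8910],
    "Column2": [500, 700, 300, 235],
    "Column3": ["NEWYORK", "AUSTIN", "NEWYORK", "RICHMOND"],
    "Column4": ["NY", "TX", "NY", "FL"],
})

c = ["Column1", "Column3", "Column4"]

# keep=False: every member of a duplicated group is flagged.
mask_all = df.duplicated(subset=c, keep=False)

# Default keep='first': the first occurrence is NOT flagged.
mask_first = df.duplicated(subset=c)

print(mask_all.tolist())    # [True, False, True, False]
print(mask_first.tolist())  # [False, False, True, False]

# Boolean indexing keeps only the flagged rows (index 0 and 2).
result = df[mask_all]
print(result)
```

With the default setting only row 2 would be returned, which is why keep=False is needed to get both rows 0 and 2.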


5 Comments

cs95 Over a year ago

Downvoter, would appreciate feedback on the downvote to improve the answer, thanks!

BENY Over a year ago

I received a lot of downvotes yesterday too ... and no reason was given.

cs95 Over a year ago

@Wen Everyone loses with anonymous downvoting... they lose 1 rep, and I lose the opportunity to know where I made a mistake so I could improve...

BENY Over a year ago

That is what I care about: what am I doing wrong in the answer?

Avinash Clinton Over a year ago

Thanks a lot. I was using the following approach:

Avinash Clinton · Accepted Answer · 2017-11-01 14:16:55Z

0

Earlier I was using the following approach:

d = df.T.to_dict()  # {row_label: {column_name: value}}

keys = ['Column1', 'Column3', 'Column4']
dup = set()
for i in d:
    for j in d:
        # compare every pair of distinct rows on the three key columns
        if i != j and all(d[i][k] == d[j][k] for k in keys):
            dup.add(i)  # remember the label of the matching row

dup_rows = df.loc[sorted(dup)]
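As a sanity check, the pairwise loop and df.duplicated can be run side by side on the question's data. This is only a sketch: it uses df.to_dict("index") in place of df.T.to_dict(), and names such as loop_labels and vectorized_result are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Index": [0, 1, 2, 3],
    "Column1": [1234, 5678, 1234, 8910],
    "Column2": [500, 700, 300, 235],
    "Column3": ["NEWYORK", "AUSTIN", "NEWYORK", "RICHMOND"],
    "Column4": ["NY", "TX", "NY", "FL"],
})
keys = ["Column1", "Column3", "Column4"]

# Pairwise comparison: quadratic in the number of rows.
rows = df.to_dict("index")  # {row_label: {column_name: value}}
loop_labels = {
    i
    for i in rows
    for j in rows
    if i != j and all(rows[i][k] == rows[j][k] for k in keys)
}
loop_result = df.loc[sorted(loop_labels)]

# Vectorized selection with a boolean mask.
vectorized_result = df[df.duplicated(subset=keys, keep=False)]

print(loop_result.index.tolist())        # [0, 2]
print(vectorized_result.index.tolist())  # [0, 2]
```

Both approaches select the same rows here; the loop compares every pair of rows, so the mask-based version is likely the better choice for anything beyond a handful of rows.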

