
If I have a pandas df which looks like this:

+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| 75   | 84   | A    |
| 84   | 68   | B    |
| 75   | 84   | C    |
| 75   | 84   | A    |
+------+------+------+

I want the output to be

+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| 75   | 84   | A    |
| 75   | 84   | C    |
+------+------+------+

i.e. keep the rows where the values of Col1 and Col2 are the same but Col3 is different. I have tried

df[df.duplicated(['ID'], keep=False)]

But this does not identify duplicates based on similarity in only two columns.
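For reference, a minimal sketch of the example DataFrame above (values copied from the question's table):

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Col1': [75, 84, 75, 75],
    'Col2': [84, 68, 84, 84],
    'Col3': ['A', 'B', 'C', 'A'],
})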

2 Answers


First get all rows that are duplicated by Col1 and Col2, and then remove duplicates across all columns with DataFrame.drop_duplicates:

df = df[df.duplicated(['Col1', 'Col2'], keep=False)].drop_duplicates()
print (df)
   Col1  Col2 Col3
0    75    84    A
2    75    84    C
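For context, keep=False in DataFrame.duplicated flags every occurrence of a duplicated (Col1, Col2) pair rather than all but the first, which is why both the A and C rows survive the first filter. As a rough alternative sketch (not part of the answer above), the same rows can be selected by keeping only the (Col1, Col2) groups where Col3 is not constant and then dropping exact duplicate rows; note this variant discards a group whose rows all share the same Col3 value, whereas the filter above would keep one of them:

import pandas as pd

df = pd.DataFrame({'Col1': [75, 84, 75, 75],
                   'Col2': [84, 68, 84, 84],
                   'Col3': ['A', 'B', 'C', 'A']})

# Keep (Col1, Col2) groups where Col3 takes more than one value,
# then drop rows that are exact duplicates across all columns
out = (df.groupby(['Col1', 'Col2'])
         .filter(lambda g: g['Col3'].nunique() > 1)
         .drop_duplicates())
print(out)
#    Col1  Col2 Col3
# 0    75    84    A
# 2    75    84    C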

In [288]: df[df.duplicated(['Col1', 'Col2'], keep=False)].drop_duplicates()
Out[288]: 
   Col1  Col2 Col3
0    75    84    A
2    75    84    C

