
If I have a pandas df which looks like this:

+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| 75   | 84   | A    |
| 84   | 68   | B    |
| 75   | 84   | C    |
| 75   | 84   | A    |
+------+------+------+

I want the output to be

+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| 75   | 84   | A    |
| 75   | 84   | C    |
+------+------+------+

i.e. keep the rows where the values of Col1 and Col2 are the same but Col3 is different. I have tried

df[df.duplicated(['ID'], keep=False)]

But this does not identify duplicates based on similarity in only two columns.
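For reference, a minimal sketch of the example DataFrame above (values copied from the question's table):

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Col1': [75, 84, 75, 75],
    'Col2': [84, 68, 84, 84],
    'Col3': ['A', 'B', 'C', 'A'],
})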

2 Answers


First get all rows that are duplicated by Col1 and Col2, and then remove duplicates across all columns with DataFrame.drop_duplicates:

df = df[df.duplicated(['Col1', 'Col2'], keep=False)].drop_duplicates()
print (df)
   Col1  Col2 Col3
0    75    84    A
2    75    84    C
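For context, keep=False in DataFrame.duplicated flags every occurrence of a duplicated (Col1, Col2) pair rather than all but the first, which is why both the A and C rows survive the first filter. As a rough alternative sketch (not part of the answer above), the same rows can be selected by keeping only the (Col1, Col2) groups where Col3 is not constant and then dropping exact duplicate rows; note this variant discards a group whose rows all share the same Col3 value, whereas the filter above would keep one of them:

import pandas as pd

df = pd.DataFrame({'Col1': [75, 84, 75, 75],
                   'Col2': [84, 68, 84, 84],
                   'Col3': ['A', 'B', 'C', 'A']})

# Keep (Col1, Col2) groups where Col3 takes more than one value,
# then drop rows that are exact duplicates across all columns
out = (df.groupby(['Col1', 'Col2'])
         .filter(lambda g: g['Col3'].nunique() > 1)
         .drop_duplicates())
print(out)
#    Col1  Col2 Col3
# 0    75    84    A
# 2    75    84    C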

In [288]: df[df.duplicated(['Col1', 'Col2'], keep=False)].drop_duplicates()
Out[288]: 
   Col1  Col2 Col3
0    75    84    A
2    75    84    C

