I'm new to pandas and I'm having a problem selecting rows from a DataFrame.

This is my DataFrame:

   Index  Column1  Column2   Column3 Column4
       0     1234      500   NEWYORK      NY
       1     5678      700    AUSTIN      TX
       2     1234      300   NEWYORK      NY
       3     8910      235  RICHMOND      FL

I want to select the rows that share the same values in Column1, Column3 and Column4 (rows that are identical in terms of these three columns). The output DataFrame should therefore contain the rows with index 0 and 2.

Can anyone help me with a step-by-step procedure for this custom selection?

2 Answers

Call duplicated on the three columns to build a boolean mask, then use that mask to index into df:

c = ['Column1', 'Column3', 'Column4']
df = df[df[c].duplicated(keep=False)]

df

   Index  Column1  Column2  Column3 Column4
0      0     1234      500  NEWYORK      NY
2      2     1234      300  NEWYORK      NY

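keep=False marks every row of a duplicated group as True, including the first occurrence, so all of them survive the filter. With the default keep='first', the first occurrence would be left unmarked and row 0 would be dropped.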
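To see what the mask looks like before it is applied, here is a minimal, self-contained sketch. It rebuilds the question's data by hand (leaving out the redundant Index column, which is an assumption about how the frame was created), and the names mask, mask_first and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; the "Index" column is omitted here
# and the default RangeIndex (0..3) is used instead (an assumption).
df = pd.DataFrame({
    "Column1": [1234, 5678, 1234, 8910],
    "Column2": [500, 700, 300, 235],
    "Column3": ["NEWYORK", "AUSTIN", "NEWYORK", "RICHMOND"],
    "Column4": ["NY", "TX", "NY", "FL"],
})

c = ["Column1", "Column3", "Column4"]

# Boolean Series aligned with df's index: True wherever the combination of
# values in the three columns occurs more than once.
mask = df[c].duplicated(keep=False)

# Keep only the rows where the mask is True.
result = df[mask]

# For comparison, the default keep="first" leaves the first occurrence unmarked.
mask_first = df[c].duplicated()

print(mask.tolist())        # [True, False, True, False]
print(mask_first.tolist())  # [False, False, True, False]
print(result)
```

Column2 is deliberately left out of c, so rows 0 and 2 count as duplicates even though their Column2 values differ.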

5 Comments

Downvoter, would appreciate feedback on the downvote to improve the answer, thanks!
I received a lot of downvotes yesterday too, with no reason given.
@Wen Everyone loses with anonymous downvoting... they lose 1 rep, and I lose the opportunity to know where I made a mistake so I could improve...
That is what I care about: what am I doing wrong in the answer?
Thanks a lot. I was using the following approach:

Earlier I was using the following approach:

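d = df.T.to_dict()  # {row label: {column name: value}}

dup = set()  # Column1 values that belong to at least one duplicated row
for i in d:
    for j in d:
        if i != j:
            # Compare every pair of distinct rows on the three key columns
            if (d[i]['Column1'] == d[j]['Column1']
                    and d[i]['Column3'] == d[j]['Column3']
                    and d[i]['Column4'] == d[j]['Column4']):
                dup.add(d[i]['Column1'])

# Note: this filters on Column1 only, so it can also return rows that
# match on Column1 but differ in Column3 or Column4
dup_rows = df[df['Column1'].isin(dup)]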
