How to add entries in Pandas DataFrame?

Question

Basically I have census data of US that I have read in Pandas from a csv file. Now I have to write a function that finds counties in a specific manner (not gonna explain that because that's not what the question is about) from the table I have gotten from csv file and return those counties.

MY TRY:

What I did is that I created lists with the names of the columns (that the function has to return), then applied the specific condition in the for loop using if-statement to read the entries of all required columns in their respective list. Now I created a new DataFrame and I want to read the entries from lists into this new DataFrame. I tried the same for loop to accomplish it, but all in vain, tried to make Series out of those lists and tried passing them as a parameter in the DataFrame, still all in vain, made DataFrames out of those lists and tried using append() function to concatenate them, but still all in vain. Any help would be appreciated.

CODE:

#idxl = list()
#st = list()
#cty = list()
idx2 = 0
cty_reg = pd.DataFrame(columns = ('STNAME', 'CTYNAME'))
for idx in range(census_df['CTYNAME'].count()):
    if((census_df.iloc[idx]['REGION'] == 1 or census_df.iloc[idx]['REGION'] == 2) and (census_df.iloc[idx]['POPESTIMATE2015'] > census_df.iloc[idx]['POPESTIMATE2014']) and census_df.loc[idx]['CTYNAME'].startswith('Washington')):
    #idxl.append(census_df.index[idx])
    #st.append(census_df.iloc[idx]['STNAME'])
    #cty.append(census_df.iloc[idx]['CTYNAME'])
    cty_reg.index[idx2] = census_df.index[idx]
    cty_reg.iloc[idxl2]['STNAME'] = census_df.iloc[idx]['STNAME']
    cty_reg.iloc[idxl2]['CTYNAME'] = census_df.iloc[idx]['CTYNAME']
    idx2 = idx2 + 1
cty_reg

CENSUS TABLE PIC:

SAMPLE TABLE:

   REGION  STNAME        CTYNAME
0       2  "Wisconsin"   "Washington County"
1       2  "Alabama"     "Washington County"
2       1  "Texas"       "Atauga County"
3       0  "California"  "Washington County"

SAMPLE OUTPUT:

  STNAME      CTYNAME
0 Wisconsin  Washington County
1 Alabama    Washington County

I am sorry for the less-knowledge about the US-states and counties, I just randomly put the state names and counties in the sample table, just to show you what do I want to get out of that. Thanks for the help in advanced.

Can you give us a small sample dataframe for both your input and (desired) output? That will make it easier for people to answer your question. — ASGM
– ASGM, Commented Oct 23, 2018 at 0:42
Okay, great. I've improved the formatting so people can paste it into their code. Now, can you be clear about the logic you're using for picking these rows? Don't you need the population columns in your input? — ASGM
– ASGM, Commented Oct 23, 2018 at 1:16
That's right, I only need STNAME and CTYNAME columns in my output and the logic you explained in your answer is correct, that's what I was applying to get the desired DataFrame. Thanks for your answer as it has helped me a lot. I basically knew that Pandas makes these conditionals very easy, but at the time of writing code my mind was not focused on the easiness of Pandas. Thanks again. Can you read my comment on your answer and reply me there? — Khubaib Khawar
– Khubaib Khawar, Commented Oct 23, 2018 at 9:21
I need the population columns in my input, but in sample I just did not write because it would be a long table then. By the way I have already solved the question, thanks. — Khubaib Khawar
– Khubaib Khawar, Commented Oct 23, 2018 at 10:45

edesz · Accepted Answer · 2018-10-30 23:02:32Z

1

There are some missing columns in the source DF posted in the OP. However, reading the loop I don't think the loop is required at all. There are 3 filters required - for REGION, POPESTIMATE2015 and CTYNAME. If I have understood the logic in the OP, then this should be feasible without the loop

Option 1 - original answer

print df.loc[
            (df.REGION.isin([1,2])) & \
            (df.POPESTIMATE2015 > df.POPESTIMATE2014) & \
            (df.CTYNAME.str.startswith('Washington')), \
                          ['REGION', 'STNAME', 'CTYNAME']]

Option 2 - using and with pd.eval

q = pd.eval("(df.REGION.isin([1,2])) and \
            (df.POPESTIMATE2015 > df.POPESTIMATE2014) and \
            (df.CTYNAME.str.startswith('Washington'))", \
            engine='python')
print df.loc[q, ['REGION', 'STNAME', 'CTYNAME']]

Option 3 - using and with df.query

regions_list = [1,2]
dfq = df.query("(REGION==@regions_list) and \
              (POPESTIMATE2015 > POPESTIMATE2014) and \
              (CTYNAME.str.startswith('Washington'))", \
              engine='python')
print dfq[['REGION', 'STNAME', 'CTYNAME']]

edited Oct 30, 2018 at 23:02

answered Oct 23, 2018 at 1:25

edesz

12.5k24 gold badges87 silver badges130 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Khubaib Khawar Over a year ago

thanks a lot this really helped me to chose the only required Columns from the final Table. Can you tell me why are you using the 'fore-slash' ``\` after the 'AND' & symbol? Also can I use the words and and or instead of & and | symbols?

edesz Over a year ago

The \ was to extend the command across multiple lines. You should read about Boolean indexing - 1, 2, 3.

edesz Over a year ago

@KhubaibKhawar, I missed your question about using and and or. The 3 links I posted for Boolean indexing in Pandas are certainly valid, but if you strictly want to use and and or then you have to use .eval or .query. I have edited my answer to show these possible uses in the OP. Related 1 and 2.

ASGM · Accepted Answer · 2018-10-23 01:24:13Z

1

If I'm reading the logic in your code right, you want to select rows according to the following conditions:

REGION should be 1 or 2
POPESTIMATE2015 > POPESTIMATE2014
CTYNAME needs to start with "Washington"

In general, Pandas makes it easy to select rows based on conditions without having to iterate over the dataframe:

df = census_df[
        ((df.REGION == 1) | (df.REGION == 2)) & \
        (df.POPESTIMATE2015 > POPESTIMATE2014) & \
        (df.CTYNAME.str.startswith('Washington'))
    ]

answered Oct 23, 2018 at 1:24

ASGM

11.5k1 gold badge37 silver badges54 bronze badges

1 Comment

Khubaib Khawar Over a year ago

Thanks for the answer, it helped me to understand that Pandas makes it easier. By the way can you tell me why are you using the 'fore-slash' ``\` after the 'AND' & symbol? Also can I use the words and and or instead of & and | symbols?

Rocky Li · Accepted Answer · 2018-10-23 01:27:03Z

1

Assuming you're selecting some kind of rows that satisfy a criteria, let's just say that select(row) and this function returns True if selected or False if not. I'll not infer what it is because you specifically said it was not important

And then you wanted the STNAME and CTYNAME of that row.

So here's what you would do:

your_new_df = census_df[census_df.apply(select, axis=1)]\
.apply(lambda x: x[['STNAME', 'CTYNAME']], axis=1)

This is the one liner that will get you what you wanted provided you wrote the select function that will pick the rows.

answered Oct 23, 2018 at 1:27

Rocky Li

5,9862 gold badges21 silver badges36 bronze badges

Collectives™ on Stack Overflow

How to add entries in Pandas DataFrame?

3 Answers 3

3 Comments

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

3 Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related