4

I have 4 different csv files.

csv1:

ID      Fruit
1001    Apple
1002    Banana
1003    Kiwi

csv2:

ID      Color
1001    Green
1005    Red
1006    Orange
1007    Yellow

csv3:

ID      Size
1001    Large
1008    Small
1009    Medium
1010    Large

csv4:

ID      Price
1002    20
1009    40
1010    30
1011    50

And this is a master csv file that I want to make:

Number  ID      Fruit   Color   Size    Price

1       1001    Apple   Green   Large   
2       1002    Banana                  20
3       1003    Kiwi            
4       1005            Red     
5       1006            Orange      
6       1007            Yellow      
7       1008            Small   
8       1009            Medium          40
9       1010            Large           30
10      1011                            50

I think using pandas will be easier to make it, but I have no idea on Python.

As each csv file has different columns, how can I choose the column and paste all of them together on master csv file? If there is no information, I want to make it NULL or N/A value.

You can see the each of csv files and the master: Click here for image

I spent 6 hours already but I have no idea how to do this.

Thank you in advance.

2
  • If I use pandas is it ok? Commented Nov 13, 2019 at 19:58
  • @AndreyBerenda actually I can use anything Commented Nov 13, 2019 at 19:59

4 Answers 4

1

reduce + combine_first

The key is to set 'ID' as the index that way we get the proper alignment across both axes. I've assumed all DataFrames are in memory, but if not you can read them into a list, or do the reading in the reduce step.

from functools import reduce

my_dfs = [df.set_index('ID') for df in [df1, df2, df3, df4]]
#my_dfs = [pd.read_csv(file).set_index('ID') for file in your_list_of_files]

reduce(lambda l,r: l.combine_first(r), my_dfs)

       Color   Fruit  Price    Size
ID                                 
1001   Green   Apple    NaN   Large
1002     NaN  Banana   20.0     NaN
1003     NaN    Kiwi    NaN     NaN
1005     Red     NaN    NaN     NaN
1006  Orange     NaN    NaN     NaN
1007  Yellow     NaN    NaN     NaN
1008     NaN     NaN    NaN   Small
1009     NaN     NaN   40.0  Medium
1010     NaN     NaN   30.0   Large
1011     NaN     NaN   50.0     NaN
Sign up to request clarification or add additional context in comments.

3 Comments

Hello, I am using this solution for my problem which is the same, but I am getting the Killed error after runing reduce... How can I solve this?
@user3224522 well probably you're combining too many Frames so you're using too much memory or too much computing time. If it's a memory issue, which is the most likely, basically you need more (buy some, or use , or need to process in chunks.
Thanks a lot for your comment, I solved the problem like this (stackoverflow.com/questions/66367399/…) if you want to have a look and give me some suggestion.
0

Something like this should work:

import pandas as pd

list_of_csv_filenames = ['csv1.csv', 'csv2.csv', 'csv3.csv', 'csv4.csv']
all_dfs = []
for i in range(1, 5):
    temp = pd.read_csv(list_of_csv_filesnames[i-1])
    temp['Number'] = i
    all_dfs.append(temp)
full_df = pd.concat(all_dfs)
full_df.to_csv('output_filename.csv', index=False)

Comments

0

You can use merge:

import pandas as pd

df1 = pd.read_csv('1.csv')
df2 = pd.read_csv('2.csv')
df3 = pd.read_csv('3.csv')
df4 = pd.read_csv('4.csv')
df = df1.merge(df2).merge(df3).merge(df4)
df.to_csv('result.csv')

1 Comment

how='outer' needs to be added to each merge, which works in this case because "each csv file has different columns"
0
Concat can be used to append data to dataframe.   
            
                file_path1 = r"path1.csv"
                df1 = pd.read_csv(file_path1)
                file_path2 = r"path2.csv"
                df2 = pd.read_csv(file_path2)
                file_path3 = r"path3.csv"
                df3 = pd.read_csv(file_path3)
                frames = [df1,df2,df3]
                df = pd.concat(frames)
reset index after concat
               df.reset_index(inplace=True)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.