How to read and combine .csv files with similar names from a folder using pandas

Question

I have the following files in the folder C:\Downloads:

Mango001-003.csv
Mango004-006.csv
Mango007-100.csv
Applefruit.csv
Banana001-003.csv
Banana004-006.csv

How can I import the files for each fruit separately and then join the files for the same fruit into a single file?

The expected result is one output for Mango, one for Apple and one for Banana.

import os
import re
data_files = os.listdir(r'C:\Downloads')
def load_files(filenames):
    # Pre-compile regex for code readability
    regex = re.compile(r'Mango.*?.csv')
    
    # Map filenames to match objects, filter out not matching names
    matches = [m for m in map(regex.match, filenames) if m is not None]
    
    li = []
    for match in matches:
                
        df = pd.read_csv(match, index_col=None, header=0, dtype=object)
        li.append(df)
        
    #Concatenating the data
    frame = pd.concat(li, axis=0, ignore_index=True)
    return (frame)
    
df  = load_files(data_files)
print(df.shape)
df.head(2)

I am getting errors. Also, it should not be this complex, so I must be doing something wrong.

Vidya Ganesh · Accepted Answer · 2021-08-04 20:04:23Z

1

I think the easiest way to do this is to use glob.glob to get a list of all files that start with a particular fruit name (here I used mango) and concatenate them all together using pd.concat.

import glob
import os
import pandas as pd

data_files = r"path\to\folder\containing\csv"
# Read every CSV whose name starts with "mango" and stack the rows into one frame
df_mango = pd.concat(map(pd.read_csv, glob.glob(os.path.join(data_files, 'mango*.csv'))), ignore_index=True)
df_mango.to_csv('mango.csv')

Here is the example I tried:

mango0110.csv
   A  B  C
0  1  2  3
mango01220.csv
   A  B  C
0  4  5  6
To get:
   A  B  C
0  1  2  3
1  4  5  6
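
The regex-based function from the question can also be made to work. The following is only a sketch, assuming the errors come from three things visible in that code: pandas is never imported, a match object (not a path) is passed to pd.read_csv, and os.listdir returns bare file names without the folder. The `folder` and `prefix` parameters are additions for illustration and are not in the original function.

```python
import os
import re

import pandas as pd


def load_files(folder, prefix):
    """Read every '<prefix>*.csv' file in `folder` into one DataFrame."""
    # Escape the prefix and the dot so both are matched literally
    regex = re.compile(re.escape(prefix) + r".*\.csv$")
    frames = []
    for name in sorted(os.listdir(folder)):
        if regex.match(name):
            # os.listdir returns bare names, so rebuild the full path
            path = os.path.join(folder, name)
            frames.append(pd.read_csv(path, index_col=None, header=0, dtype=object))
    # pd.concat raises ValueError if no file matched the prefix
    return pd.concat(frames, axis=0, ignore_index=True)
```

It would be called as load_files(r'C:\Downloads', 'Mango'). Because dtype=object is kept from the question, all values come back as strings.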

edited Aug 4, 2021 at 20:04

answered Aug 4, 2021 at 19:44

Vidya Ganesh


6 Comments

Dr.Chuck Over a year ago

It does the job, but I am unable to pass a separator, as in df = pd.read_csv(filename, sep=",")

Vidya Ganesh Over a year ago

Why do you need the separator? I'm assuming all the file names start with the name of a fruit, and glob.glob(os.path.join(data_files,'mango*.csv')) gets the files that start with mango, after which they are all concatenated at once.

Vidya Ganesh Over a year ago

The * matches anything that follows mango, as shown in my example (mango0110.csv etc.).

Dr.Chuck Over a year ago

Assume the values in my file are separated by "~!" and I want to import the datasets.

Vidya Ganesh Over a year ago

Wow, that's interesting. I didn't really consider anything other than comma-separated values (because I found only csv in your example). Let me go try this out for such cases :)
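
One possible way to pass a separator while still using map is functools.partial, which fixes the extra keyword arguments of pd.read_csv up front. This is a minimal sketch, not something tested in the thread; the function name combine_with_separator and its parameters are hypothetical. pandas treats a separator longer than one character as a regular expression and needs engine="python" for it.

```python
import glob
import os
from functools import partial

import pandas as pd


def combine_with_separator(folder, prefix, sep):
    """Concatenate all '<prefix>*.csv' files in `folder`, splitting fields on `sep`."""
    # Multi-character separators are interpreted as regular expressions,
    # which only the Python parsing engine supports
    reader = partial(pd.read_csv, sep=sep, engine="python")
    files = sorted(glob.glob(os.path.join(folder, prefix + "*.csv")))
    return pd.concat(map(reader, files), ignore_index=True)
```

For the "~!" case this would be called as combine_with_separator(data_files, 'mango', '~!'). A separator containing regex metacharacters (for example "||") would need escaping with re.escape first.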

MDR · Accepted Answer · 2021-08-04 19:42:50Z

Perhaps not the greatest way to do it but, for the file names given...

Try:

import pandas as pd
import glob
import re

path = r'./files' # use your path
all_files = glob.glob(path + "/*.csv")

fruits = []

# for all files in the folder get the fruit name
# this could be where things go wrong if the regex does not
# account for all filename types.  Pattern may need tweaking
# example https://regex101.com/r/E69LWa/1
for file in all_files:
    # normalise Windows backslashes so the pattern below also matches there
    cleanFile = file.replace('\\', '/').replace('fruit', '')
    match = re.match(r'^.*/([A-Za-z]+)',cleanFile)
    fruits.append(match.group(1))

# There will be one output for Mango, one for Apple & one for Banana hence three...
dfs_man = []
dfs_ban = []
dfs_app = []

# for all files create a df and append to the correct list holding other dfs of the same fruit
for i, file in enumerate(all_files):
    df = pd.read_csv(file)
    if fruits[i] == 'Mango':
        dfs_man.append(df)
    elif fruits[i] == 'Banana':
        dfs_ban.append(df)
    elif fruits[i] == 'Apple':
        dfs_app.append(df)

# concatenate if more than one df in list, else just get the df out of list
if len(dfs_man) > 1:
    df_mango = pd.concat(dfs_man, ignore_index=True)
elif len(dfs_man) == 1:
    df_mango = dfs_man[0]
if len(dfs_ban) > 1:
    df_banana = pd.concat(dfs_ban, ignore_index=True)
elif len(dfs_ban) == 1:
    df_banana = dfs_ban[0]
if len(dfs_app) > 1:
    df_apple = pd.concat(dfs_app, ignore_index=True)
elif len(dfs_app) == 1:
    df_apple = dfs_app[0]
    
print(df_mango.shape, df_banana.shape, df_apple.shape)
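
The same idea can be written without one hard-coded list per fruit by grouping the frames in a dictionary keyed on the leading letters of each file name. This is a sketch under the same assumption as above, namely that every file name starts with the fruit name and that 'fruit' can simply be stripped (so Applefruit.csv becomes Apple); the function name combine_by_prefix is hypothetical.

```python
import glob
import os
import re
from collections import defaultdict

import pandas as pd


def combine_by_prefix(folder):
    """Return {fruit: DataFrame}, with one combined frame per file-name prefix."""
    groups = defaultdict(list)
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # basename drops the directory part regardless of the path separator
        name = os.path.basename(path).replace("fruit", "")
        match = re.match(r"[A-Za-z]+", name)
        if match is None:
            continue  # skip names that do not start with a letter
        groups[match.group(0)].append(pd.read_csv(path))
    # pd.concat also accepts a single-element list, so no length check is needed
    return {fruit: pd.concat(frames, ignore_index=True)
            for fruit, frames in groups.items()}
```

Each combined frame can then be looked up by name, for example combine_by_prefix('./files')['Mango'], and a fruit with no files is simply absent from the dictionary rather than causing a NameError.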

Dr.Chuck · Accepted Answer · 2021-08-04 20:29:44Z

0

Thank you @Vidya Ganesh

import glob
import os
import pandas as pd

data_files = r'C:\Downloads'
list_file_names = ['Mango', 'Apple', 'Banana']
for name in list_file_names:
    # Combine every CSV whose file name starts with this fruit
    files = glob.glob(os.path.join(data_files, name + '*.csv'))
    df = pd.concat(map(pd.read_csv, files), ignore_index=True)
    df = df.loc[:1000, :]  # keep rows 0 through 1000 (.loc slicing is end-inclusive)
    print(name)
    print(df.shape)
    df.to_csv(name + ".csv")

answered Aug 4, 2021 at 20:29

Dr.Chuck
