
My DataFrame has an index, SubjectID, and each SubjectID has its own directory. In each subject directory is a .csv file with info that I want to put into my DataFrame. Using my SubjectID index, I want to read the header of the .csv file for every subject and put it into a new column in my DataFrame.

Each subject directory has the same path except for the individual subject number.

I have found ways to read multiple .csv files from a single target directory into a pandas DataFrame, but not from multiple directories. Here is some code I have for importing multiple .csv files from a target directory:

import os
import glob
import pandas as pd

subject_path = '/home/mydirectory/SubjectID/'
filelist = []
os.chdir(subject_path)
for files in glob.glob("*.csv"):
    filelist.append(files)

# read each csv file into a single dataframe and add a filename reference column
df = pd.DataFrame()
columns = range(1, 100)
for c, f in enumerate(filelist):
    key = "file%i" % c
    frame = pd.read_csv(subject_path + f, skiprows=1, index_col=0, names=columns)
    frame['key'] = key
    df = df.append(frame, ignore_index=True)

I want to do something similar but iteratively go into the different Subject directories instead of having a single target directory.
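For illustration, one way to cover every subject directory with a single pattern is to put a wildcard where the subject number goes. A minimal, self-contained sketch, using a temporary demo tree as a stand-in for the real /home/mydirectory layout (the folder and file names here are hypothetical):

```python
import glob
import os
import tempfile

# Build a small demo tree standing in for /home/mydirectory:
# one folder per SubjectID, each holding a .csv file
parent = tempfile.mkdtemp()
for sid in ("Subject01", "Subject02"):
    os.makedirs(os.path.join(parent, sid))
    with open(os.path.join(parent, sid, "info.csv"), "w") as fh:
        fh.write("header_info\n1,2,3\n")

# One glob pattern covers every subject directory:
# the * wildcard stands in for the individual subject number
filelist = sorted(glob.glob(os.path.join(parent, "*", "*.csv")))
print(filelist)  # full paths, one per subject
```

Because glob returns full paths here, there is no need for os.chdir; each entry in filelist can be passed straight to pd.read_csv.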

Edit: I think I want to do this using os, not pandas. Is there a way to use a loop to search through multiple directories with os?

  • The above code is what I have tried for importing .csv from a single directory, and the problem is that I am not sure how to adapt this to import files from multiple directories. Commented Oct 3, 2016 at 18:53
  • Maybe use a loop and search more than one subject path? Commented Oct 3, 2016 at 19:03
  • Would I want to do this using os? It doesn't look like this can be accomplished in pandas Commented Oct 3, 2016 at 19:16
  • Yes, basically just repeat lines 3-5 for every subject directory. (Although you should probably store in filelist the full path rather than just filename.) Commented Oct 3, 2016 at 19:19

2 Answers


Consider the recursive os.walk() to traverse all directories and files top-down (the default) or bottom-up. Additionally, you can use a regex to check names and filter specifically for .csv files.

The code below will import ALL csv files in any child/grandchild folder under the target root /home/mydirectory. So, be sure to check whether non-subject csv files exist, or adjust the re.match() pattern accordingly:

import os
import re
import pandas as pd

# current directory (place this script in /home/mydirectory)
cd = os.path.dirname(os.path.abspath(__file__))

i = 0
columns = range(1, 100)
dfList = []

for root, dirs, files in os.walk(cd):
    for fname in files:
        if re.match(r"^.*\.csv$", fname):  # escape the dot so it matches a literal '.'
            frame = pd.read_csv(os.path.join(root, fname), skiprows=1,
                                index_col=0, names=columns)
            frame['key'] = "file{}".format(i)
            dfList.append(frame)
            i += 1

# concatenate once at the end rather than appending inside the loop
df = pd.concat(dfList)


Assuming your subject folders are in mydirectory, you can create a list of all folders in that directory and then add each folder's .csv files to your filelist.

import os

parent_dir = '/home/mydirectory'
subject_dirs = [os.path.join(parent_dir, d) for d in os.listdir(parent_dir)
                if os.path.isdir(os.path.join(parent_dir, d))]

filelist = []
for d in subject_dirs:  # avoid shadowing the built-in name 'dir'
    csv_files = [os.path.join(d, f) for f in os.listdir(d)
                 if os.path.isfile(os.path.join(d, f)) and f.endswith('.csv')]
    filelist.extend(csv_files)

# Do what you did with the dataframe from here
...
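To complete the picture, the read loop from the question can then run over filelist unchanged, except that the entries are already full paths. A minimal sketch, assuming each .csv has one header line to skip (the temporary files here are hypothetical stand-ins for the real subject files):

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the filelist built above: two small csv files
tmp = tempfile.mkdtemp()
filelist = []
for i in (1, 2):
    path = os.path.join(tmp, "subject{}.csv".format(i))
    with open(path, "w") as fh:
        fh.write("header line to skip\n0,10,20\n1,30,40\n")
    filelist.append(path)

frames = []
for c, f in enumerate(filelist):
    # entries are full paths, so no directory prefix is needed
    frame = pd.read_csv(f, skiprows=1, index_col=0, header=None)
    frame["key"] = "file{}".format(c)  # filename reference column
    frames.append(frame)

# concat once at the end instead of repeated DataFrame.append
df = pd.concat(frames, ignore_index=True)
print(df.shape)
```

Collecting the frames in a list and calling pd.concat once is also faster than appending inside the loop, since each append copies the whole accumulated DataFrame.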
