
My DataFrame has an index, SubjectID, and each SubjectID has its own directory. In each subject directory is a .csv file with info that I want to put into my DataFrame. Using my SubjectID index, I want to read the header of the .csv file for every subject and put it into a new column in my DataFrame.

Each subject directory has the same path except for the individual subject number.

I have found ways to read multiple .csv files from a single target directory into a pandas DataFrame, but not from multiple directories. Here is some code I have for importing multiple .csv files from a target directory:

import os
import glob
import pandas as pd

subject_path = '/home/mydirectory/SubjectID/'
filelist = []
os.chdir(subject_path)
for files in glob.glob("*.csv"):
    filelist.append(files)

# read each csv file into a single dataframe and add a filename reference column
df = pd.DataFrame()
columns = range(1, 100)
for c, f in enumerate(filelist):
    key = "file%i" % c
    frame = pd.read_csv(subject_path + f, skiprows=1, index_col=0, names=columns)
    frame['key'] = key
    df = df.append(frame, ignore_index=True)

I want to do something similar but iteratively go into the different Subject directories instead of having a single target directory.
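For illustration, one way to cover every subject directory with a single pattern is to put a wildcard where the subject number goes. A minimal, self-contained sketch, using a temporary demo tree as a stand-in for the real /home/mydirectory layout (the folder and file names here are hypothetical):

```python
import glob
import os
import tempfile

# Build a small demo tree standing in for /home/mydirectory:
# one folder per SubjectID, each holding a .csv file
parent = tempfile.mkdtemp()
for sid in ("Subject01", "Subject02"):
    os.makedirs(os.path.join(parent, sid))
    with open(os.path.join(parent, sid, "info.csv"), "w") as fh:
        fh.write("header_info\n1,2,3\n")

# One glob pattern covers every subject directory:
# the * wildcard stands in for the individual subject number
filelist = sorted(glob.glob(os.path.join(parent, "*", "*.csv")))
print(filelist)  # full paths, one per subject
```

Because glob returns full paths here, there is no need for os.chdir; each entry in filelist can be passed straight to pd.read_csv.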

Edit: I think I want to do this using os, not pandas. Is there a way to use a loop to search through multiple directories with os?

  • The above code is what I have tried for importing .csv from a single directory, and the problem is that I am not sure how to adapt this to import files from multiple directories. Commented Oct 3, 2016 at 18:53
  • Maybe use a loop and search more than one subject path? Commented Oct 3, 2016 at 19:03
  • Would I want to do this using os? It doesn't look like this can be accomplished in pandas Commented Oct 3, 2016 at 19:16
  • Yes, basically just repeat lines 3-5 for every subject directory. (Although you should probably store in filelist the full path rather than just filename.) Commented Oct 3, 2016 at 19:19

2 Answers


Consider the recursive os.walk() to traverse all directories and files top-down (the default) or bottom-up. Additionally, you can use a regex to check names and filter specifically for .csv files.

The code below will import ALL csv files in any child/grandchild folder under the target root /home/mydirectory. So, be sure to check whether non-subject csv files exist, or adjust the re.match() pattern accordingly:

import os
import re
import pandas as pd

# current directory (place this script in /home/mydirectory)
cd = os.path.dirname(os.path.abspath(__file__))

i = 0
columns = range(1, 100)
dfList = []

for root, dirs, files in os.walk(cd):
    for fname in files:
        if re.match(r"^.*\.csv$", fname):  # escape the dot so it matches a literal '.'
            frame = pd.read_csv(os.path.join(root, fname), skiprows=1,
                                index_col=0, names=columns)
            frame['key'] = "file{}".format(i)
            dfList.append(frame)
            i += 1

# concatenate once at the end rather than appending inside the loop
df = pd.concat(dfList)


Assuming your subject folders are in mydirectory, you can create a list of all folders in that directory and then add each folder's .csv files to your filelist.

import os

parent_dir = '/home/mydirectory'
subject_dirs = [os.path.join(parent_dir, d) for d in os.listdir(parent_dir)
                if os.path.isdir(os.path.join(parent_dir, d))]

filelist = []
for d in subject_dirs:  # avoid shadowing the built-in name 'dir'
    csv_files = [os.path.join(d, f) for f in os.listdir(d)
                 if os.path.isfile(os.path.join(d, f)) and f.endswith('.csv')]
    filelist.extend(csv_files)

# Do what you did with the dataframe from here
...
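To complete the picture, the read loop from the question can then run over filelist unchanged, except that the entries are already full paths. A minimal sketch, assuming each .csv has one header line to skip (the temporary files here are hypothetical stand-ins for the real subject files):

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the filelist built above: two small csv files
tmp = tempfile.mkdtemp()
filelist = []
for i in (1, 2):
    path = os.path.join(tmp, "subject{}.csv".format(i))
    with open(path, "w") as fh:
        fh.write("header line to skip\n0,10,20\n1,30,40\n")
    filelist.append(path)

frames = []
for c, f in enumerate(filelist):
    # entries are full paths, so no directory prefix is needed
    frame = pd.read_csv(f, skiprows=1, index_col=0, header=None)
    frame["key"] = "file{}".format(c)  # filename reference column
    frames.append(frame)

# concat once at the end instead of repeated DataFrame.append
df = pd.concat(frames, ignore_index=True)
print(df.shape)
```

Collecting the frames in a list and calling pd.concat once is also faster than appending inside the loop, since each append copies the whole accumulated DataFrame.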
