I have about 5600 directories structured as follows:

[Image: structure of dirs]

I need to merge all A files into one file, all B files into another file, and so on.

How can I do this?

  • So you need all the "root directory/dir1" CSV files in a single DataFrame? Also, do they all have exactly the same structure? Commented Jul 27, 2022 at 15:24
  • I need to merge all the A files into one file, all the B files into one file, and so on. Basically, I need to output 7 files: A, B, C, D, E, F, G, where each X file contains the union of the X files in the 5600 dirs. All the files have the same header. Commented Jul 27, 2022 at 15:30

1 Answer

If I understand correctly, this should work for your case. I tested it with a RootDir containing two subdirectories, Dir1 and Dir2, each holding two files, A.csv and B.csv. Change the value of rootdir to match your use case:

import os
import pandas as pd

rootdir = 'RootDir'  # Change to your root directory
# Collect the path of every .csv file below rootdir
files = [os.path.join(dp, f)
         for dp, dn, filenames in os.walk(rootdir)
         for f in filenames
         if os.path.splitext(f)[1] == '.csv']

# Group the DataFrames by file name without the extension (A, B, ...).
# splitext drops only the extension, so names ending in c, s or v stay intact.
frames = {}
for file in files:
    key = os.path.splitext(os.path.basename(file))[0]
    frames.setdefault(key, []).append(pd.read_csv(file))
# Concatenate each group once instead of growing a DataFrame inside the loop
df_dict = {key: pd.concat(dfs) for key, dfs in frames.items()}

The output is a dictionary of DataFrames, df_dict, keyed by file name without the extension (A and B in this example).

Use df_dict['A'] to access DataFrame A, and so on.
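Since the question asks for merged files on disk rather than DataFrames in memory, here is a minimal sketch of an alternative that streams each CSV into its merged output using only the standard library. The function name merge_csv_by_name and the outdir argument are hypothetical, not part of the code above. It assumes that all files with the same name share one header (as stated in the comments on the question) and that outdir is not rootdir itself.

```python
import csv
import os


def merge_csv_by_name(rootdir, outdir):
    """Merge same-named CSV files found under rootdir into one file per name in outdir.

    Assumes every file with a given name has the same header row.
    Returns a dict mapping each output file name to the number of data rows written.
    """
    os.makedirs(outdir, exist_ok=True)
    outdir_abs = os.path.abspath(outdir)
    writers = {}  # file name -> (open output handle, csv writer)
    counts = {}   # file name -> number of data rows written
    try:
        for dirpath, dirnames, filenames in os.walk(rootdir):
            # Visit subdirectories in sorted order and never descend into outdir
            dirnames[:] = sorted(
                d for d in dirnames
                if os.path.abspath(os.path.join(dirpath, d)) != outdir_abs
            )
            for name in sorted(filenames):
                if os.path.splitext(name)[1] != '.csv':
                    continue
                with open(os.path.join(dirpath, name), newline='') as src:
                    reader = csv.reader(src)
                    header = next(reader, None)
                    if header is None:
                        continue  # skip completely empty files
                    if name not in writers:
                        dst = open(os.path.join(outdir, name), 'w', newline='')
                        writer = csv.writer(dst)
                        writer.writerow(header)  # the header is written only once
                        writers[name] = (dst, writer)
                        counts[name] = 0
                    writer = writers[name][1]
                    for row in reader:
                        writer.writerow(row)
                        counts[name] += 1
    finally:
        # Close every output file, even if reading one of the inputs failed
        for dst, _ in writers.values():
            dst.close()
    return counts
```

For example, merge_csv_by_name('RootDir', 'merged') would create merged/A.csv, merged/B.csv, and so on, without holding more than one row in memory at a time. If you prefer to stay with the dictionary of DataFrames, each one can instead be saved with df_dict[key].to_csv(key + '.csv', index=False).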
