I am trying to concatenate about 30 CSV files into a single file. The CSV files are located in different folders.

However, I have encountered an error while appending all files together: OSError: Initializing from file failed

Here is my code:

import pandas as pd
import glob
import os
 
path = 'xxx'
target_folders=['Apples', 'Oranges', 'Bananas','Raspberry','Strawberry', 'Blackberry','Gooseberry','Liche']
output ='yyy'
path_list = []
for idx in target_folders:
    lst_of_files = glob.glob(path + idx +'\\*.csv')
    latest_files = max(lst_of_files, key=os.path.getmtime)
    path_list.append(latest_files)
    df_list = [] 
    for file in path_list: 
        df = pd.read_csv(file) 
        df_list.append(df) 
    final_df = df.append(df for df in df_list) 
    combined_csv = pd.concat([pd.read_csv(f) for f in latest_files])

    combined_csv.to_csv(output + "combined_csv.csv", index=False)

    OSError                                   Traceback (most recent call last)
    <ipython-input-126-677d09511b64> in <module>
  1 df_list = []
  2 for file in latest_files:
  ----> 3     df = pd.read_csv(file)
  4     df_list.append(df)
  5 final_df = df.append(df for df in df_list)

    OSError: Initializing from file failed


    
  • Whoa. 1. What OS are you running? 2. What is the path being passed to read_csv? 3. What version of Python are you running? 4. What version of Pandas are you using? Commented Dec 16, 2021 at 22:10

3 Answers

1

This solution should work like a charm for you:

import pandas as pd
import pathlib

data_dir = '/Users/thomasbryan/projetos/blocklist/files/'
out_dir = '.'

list_files = []
for filename in pathlib.Path(data_dir).glob('**/*.csv'):
    list_files.append(filename)

df = pd.concat(map(pd.read_csv, list_files), ignore_index=True)
df.to_csv(pathlib.Path(out_dir) / 'combined_csv.csv', index=False)

0

Try to simplify your code:

import pandas as pd
import pathlib

data_dir = 'xxx'
out_dir = 'yyy'

data = []
for filename in pathlib.Path(data_dir).glob('**/*.csv'):
    df = pd.read_csv(filename)
    data.append(df)

# Concatenate the collected frames (the list, not the last frame read)
df = pd.concat(data, ignore_index=True)
df.to_csv(pathlib.Path(out_dir) / 'combined_csv.csv', index=False)
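
Note that the glob above picks up every CSV under data_dir, whereas the question only wants the newest file from each folder. In the question, max(...) returns a single path string, so a later "for f in latest_files" walks over the characters of that string rather than over files, which may be what triggers the error. A minimal sketch of the selection step, assuming the folders sit directly under one base directory (the function name newest_csv_per_folder is made up for illustration):

```python
from pathlib import Path


def newest_csv_per_folder(base_dir, folder_names):
    """Return the most recently modified CSV from each folder that has one."""
    newest = []
    for name in folder_names:
        csv_files = list((Path(base_dir) / name).glob("*.csv"))
        if not csv_files:
            continue  # max() raises ValueError on an empty sequence
        # Keep one Path per folder, chosen by modification time
        newest.append(max(csv_files, key=lambda p: p.stat().st_mtime))
    return newest
```

The returned list can then be read and combined in one step, for example with pd.concat(map(pd.read_csv, paths), ignore_index=True).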


0

Without seeing your CSV files it's hard to be sure, but I've come across this problem before with unusually formatted CSVs. The CSV parser may be having difficulty determining the structure of the files (separators, etc.).

Try df = pd.read_csv(file, engine = 'python')

From the docs: "The C engine is faster while the python engine is currently more feature-complete."

Try passing the engine = 'python' argument on reading a single CSV file and see if you get a successful read. That way you can narrow down the problem to either file reads or traversing the files.
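
To check the "traversing the files" half first, it may help to look at what is actually being handed to read_csv. This is only a rough sketch using the standard library; the function name find_suspect_paths and the three checks are my own assumptions about what could go wrong, not something taken from the pandas docs:

```python
from pathlib import Path


def find_suspect_paths(paths):
    """Return (path, reason) pairs for entries that are not non-empty regular files."""
    suspects = []
    for raw in paths:
        p = Path(raw)
        if not p.exists():
            suspects.append((raw, "does not exist"))
        elif p.is_dir():
            suspects.append((raw, "is a directory"))
        elif p.stat().st_size == 0:
            suspects.append((raw, "is empty"))
    return suspects
```

If this returns an empty list, the paths themselves look fine and the engine='python' test on a single file is the next thing to try. If a single string was passed by mistake instead of a list, the result will most likely be a run of one-character entries.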

2 Comments

The files have just 2 columns: the first is named ['Fruit'] and the second ['Harvest']. The headers of all files are the same across folders, but the files are all in different locations.
Hi Camilla, I've edited my answer - try the above. It's difficult to narrow down the problem without seeing your files but from your error, it's most likely due to CSV parsing.
