
A sample dataset is structured as follows

  • Home_HeatSensor_AA.CSV
  • Office_HeatSensor_BB.CSV
  • Ship_ElevationSensor_XXYY.CSV

AA.CSV has the following columns, with a sample row

   Time  AA  AB  BB  Site  Type
0  1:00   5   4   5  Home  Heat

BB.CSV is formatted similarly

   Time  AA  AB  BB    Site  Type
0  1:00   6   2   4  Office  Heat

However, XXYY.CSV has a much different format

   Time     XX       XY     YY  Site       Type
0  1:00  1.332  12.1123  4.212  Ship  Elevation

I need to join these three CSV files into a master CSV file formatted as follows

   Time AA AB BB     XX       XY     YY    Site       Type
0  1:00  5  4  5                           Home       Heat
0  1:00  6  2  4                         Office       Heat
0  1:00           1.332  12.1123  4.212    Ship  Elevation

I've tried mucking about with pandas a bit, but the results have been mixed. The code below joins the data but changes the column order of Time, Site, and Type. Ideally these three would stay fixed, with Time first and Site and Type as the last two columns.

import pandas as pd

li = []  # collects one DataFrame per input file
for filename in filepaths:  # filepaths: list of CSV paths
    df = pd.read_csv(filename, index_col=None, header=0, parse_dates=True)
    li.append(df)

1 Answer

pd.concat

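import pandas as pd

def read_csv(fn):
    # skipinitialspace drops blanks that follow each delimiter
    return pd.read_csv(fn, skipinitialspace=True)

files = ['Home_HeatSensor_AA.CSV', 'Office_HeatSensor_BB.CSV', 'Ship_ElevationSensor_XXYY.CSV']
cols = ['Time', 'AA', 'AB', 'BB', 'XX', 'XY', 'YY', 'Site', 'Type']

# Stack the files row-wise, then select the columns in the desired order
pd.concat(map(read_csv, files), sort=False)[cols].to_csv('MASTER.CSV', index=False)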

Then confirm

cat MASTER.CSV

Time,AA,AB,BB,XX,XY,YY,Site,Type
1:00,5.0,4.0,5.0,,,,Home,Heat
1:00,6.0,2.0,4.0,,,,Office,Heat
1:00,,,,1.3319999999999999,12.1123,4.212,Ship,Elevation
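
The whole numbers appear as 5.0 rather than 5 because concatenation fills the missing cells with NaN, which forces those columns to float. If the integer form matters, one possible approach is sketched below. It assumes pandas 1.0 or later, where DataFrame.convert_dtypes is available, and it uses in-memory stand-ins for two of the sample files rather than the real ones.

```python
import io

import pandas as pd

# In-memory stand-ins for two of the sample files (assumed contents).
heat = pd.read_csv(io.StringIO(
    "Time,AA,AB,BB,Site,Type\n1:00,5,4,5,Home,Heat\n"))
ship = pd.read_csv(io.StringIO(
    "Time,XX,XY,YY,Site,Type\n1:00,1.332,12.1123,4.212,Ship,Elevation\n"))

combined = pd.concat([heat, ship], sort=False, ignore_index=True)

# After concat, AA/AB/BB are float64 because of the NaN fill.
# convert_dtypes switches to nullable dtypes, so floats that hold
# only whole numbers become Int64 and missing cells become <NA>.
nullable = combined.convert_dtypes()

csv_text = nullable.to_csv(index=False)
print(csv_text)
```

Missing cells are still written as empty fields, so the layout of the CSV is unchanged apart from the number formatting.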

If you won't know the column names in advance:

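import pandas as pd

def read_csv(fn):
    return pd.read_csv(fn, skipinitialspace=True)

files = ['Home_HeatSensor_AA.CSV', 'Office_HeatSensor_BB.CSV', 'Ship_ElevationSensor_XXYY.CSV']

# No column selection here, so pandas decides the column order
pd.concat(map(read_csv, files), sort=False).to_csv('MASTER.CSV', index=False)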
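
Without a column list, nothing pins Time to the front or Site and Type to the back. A minimal sketch of one way to do that while leaving the sensor columns open-ended follows. The helper name order_columns and its first/last parameters are hypothetical, and the data is an in-memory stand-in for the three sample files.

```python
import io

import pandas as pd


def order_columns(df, first=("Time",), last=("Site", "Type")):
    """Return df with `first` columns leading and `last` columns trailing.

    Columns named in neither tuple keep the order pandas gave them.
    Names that are absent from df are skipped instead of raising.
    """
    lead = [c for c in first if c in df.columns]
    tail = [c for c in last if c in df.columns]
    middle = [c for c in df.columns if c not in lead and c not in tail]
    return df[lead + middle + tail]


# In-memory stand-ins for the three sample files (assumed contents).
samples = [
    "Time,AA,AB,BB,Site,Type\n1:00,5,4,5,Home,Heat\n",
    "Time,AA,AB,BB,Site,Type\n1:00,6,2,4,Office,Heat\n",
    "Time,XX,XY,YY,Site,Type\n1:00,1.332,12.1123,4.212,Ship,Elevation\n",
]
frames = [pd.read_csv(io.StringIO(text)) for text in samples]

master = order_columns(pd.concat(frames, sort=False, ignore_index=True))
print(list(master.columns))
```

A file with a new set of sensor columns would land in the middle block, so only the three fixed names need to be known ahead of time.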

2 Comments

It's a good answer, but I won't know the column names in advance. Users could upload a new file with a new set of column names, and the code needs to account for that.
Without the column names, pandas places the columns in some order. I used the column names to present the result in the order you specified. If you won't know the column names, leave that selection out.
