22551

Pandas, concatenate certain columns if other columns are empty

I've got a CSV file that is supposed to look like this:

ID, years_active, issues ------------------------------- 'Truck1', 8, 'In dire need of a paintjob' 'Car 5', 3, 'To small for large groups'

However, the CSV is somewhat malformed and currently looks like this.

ID, years_active, issues ------------------------ 'Truck1', 8, 'In dire need' '','', 'of a' '','', 'paintjob' 'Car 5', 3, 'To small for' '', '', 'large groups'

Now, I am able to identify faulty rows by the lack of an 'ID' and 'years_active' value and would like to append the value of 'issues of that row to the last preceding row that had 'ID' and 'years_active' values.

I am not very experienced with pandas, but came up with the following code:

for index, row in df.iterrows(): if row['years_active'] == None: df.loc[index-1]['issues'] += row['issues']

Yet - the IF condition fails to trigger. Is the thing I am trying to do possible? And if so, does anyone have an idea what I am doing wrong?

Answer1:

Given your sample input:

df = pd.DataFrame({ 'ID': ['Truck1', '', '', 'Car 5', ''], 'years_active': [8, '', '', 3, ''], 'issues': ['In dire need', 'of a', 'paintjob', 'To small for', 'large groups'] })

You can use:

new_df = df.groupby(df.ID.replace('', method='ffill')).agg({'years_active': 'first', 'issues': ' '.join})

Which'll give you:

years_active issues ID Car 5 3 To small for large groups Truck1 8 In dire need of a paintjob

So what we're doing here is forward filling the non-blank IDs into subsequent blank IDs and using those to group the related rows. We then aggregate to take the first occurrence of the years_active and join together the issues columns in the order they appear to create a single result.

Answer2:

Following uses a for loop to find and add strings (dataframe from JonClements' answer):

df = pd.DataFrame({ 'ID': ['Truck1', '', '', 'Car 5', ''], 'years_active': [8, '', '', 3, ''], 'issues': ['In dire need', 'of a', 'paintjob', 'To small for', 'large groups'] }) ss = ""; ii = 0; ilist = [0] for i in range(len(df.index)): if i>0 and df.ID[i] != "": df.issues[ii] = ss ss = df.issues[i] ii = i ilist.append(ii) else: ss += ' '+df.issues[i] df.issues[ii] = ss df = df.iloc[ilist] print(df)

Output:

ID issues years_active 0 Truck1 In dire need of a paintjob 8 3 Car 5 To small for large groups 3

Answer3:

It might be worth mentioning in the context of this question that there is an often overlooked way of processing awkward input by using the StringIO library.

The essential point is that read_csv can read from a StringIO 'file'.

In this case, I arrange to discard single quotes and multiple commas that would confuse read_csv, and I append the second and subsequent lines of input to the first line, to form complete, conventional csv lines form read_csv.

Here is what read_csv receives.

ID years_active issues 0 Truck1 8 In dire need of a paintjob 1 Car 5 3 To small for large groups

The code is ugly but easy to follow.

import pandas as pd from io import StringIO for_pd = StringIO() with open('jasper.txt') as jasper: print (jasper.readline(), file=for_pd) line = jasper.readline() complete_record = '' for line in jasper: line = ''.join(line.rstrip().replace(', ', ',').replace("'", '')) if line.startswith(','): complete_record += line.replace(',,', ',').replace(',', ' ') else: if complete_record: print (complete_record, file=for_pd) complete_record = line if complete_record: print (complete_record, file=for_pd) for_pd.seek(0) df = pd.read_csv(for_pd) print (df)

Recommend

  • Error in gapfilling by row in Pandas
  • filling NAN and converting to int pandas
  • Pandas: groupby forward fill with datetime index
  • How to sort Pandas DataFrame both by MultiIndex and by value?
  • Multiple aggregation in group by in Pandas Dataframe
  • Change axis for pandas replace ffill
  • pandas count over multiple columns
  • groupby in pandas with different functions for different columns
  • How to concat Pandas dataframe columns
  • Changing media screen makes div overlay
  • How to create two column output from a single column
  • Symfony2 - Doctrine - no changeset in post update
  • Display validation errors inside
  • Rest Services conventions
  • Group list of tuples by item
  • How to calculate the camera position from Vuforia GL matrix?
  • Default parameter as generic type
  • SAXReader not re-ecape characters
  • Android application: how to use the camera and grab the image bytes?
  • custom UITableViewCell with image for highlighting
  • Meteor: Do Something On Email Verification Confirmation
  • Checking free space on FTP server
  • How to clear text inside text field when radio button is select
  • Scrapy recursive link crawler
  • Optimizing database types to compact database (SQLite)
  • How to check if every primary key value is being referenced as foreign key in another table
  • Where to put my custom functions in Wordpress?
  • How to handle AllServersUnavailable Exception
  • Trying to switch camera back to front but getting exception
  • How to get next/previous record number?
  • R: gsub and capture
  • RestKit - RKRequestDelegate does not exist
  • AT Commands to Send SMS not working in Windows 8.1
  • Python: how to group similar lists together in a list of lists?
  • Buffer size for converting unsigned long to string
  • WPF Applying a trigger on binding failure
  • Free memory of cv::Mat loaded using FileStorage API
  • Angular 2 constructor injection vs direct access
  • Programmatically clearing map cache
  • reshape alternating columns in less time and using less memory