
Merge Rows by ID and Date

Question:

I am a newbie at R and I have been searching for a way to solve the following problem.

I have a df that looks like:

id------------Date ------------OB1------ OB2----- OB3<br />
1 ------- 2017-01-01 --------- 1 --------- 0--------- 0<br />
2 ------- 2006-01-05 --------- 1 --------- 0--------- 0<br />
2 ------- 2007-04-19 --------- 0 --------- 1--------- 0<br />
3 ------- <strong>2015-02-23</strong> --------- 0 --------- 0--------- 1<br />
3 ------- <strong>2015-02-23</strong> --------- 1 --------- 0--------- 0

What I have to achieve is shown here:

id------------Date ------------OB1------ OB2----- OB3<br />
1 ------- 2017-01-01 --------- 1 --------- 0--------- 0<br />
2 ------- 2006-01-05 --------- 1 --------- 0--------- 0<br />
2 ------- 2007-04-19 --------- 0 --------- 1--------- 0<br />
3 ------- <strong>2015-02-23</strong> --------- <strong>1</strong> --------- 0--------- <strong>1</strong>

That is, rows should be combined by id and date.

If OB3 has the value '1' on a date and OB1 has the value '1' on the same date (for the same ID), the result must be a single row for that date with the value '1' for OB1 and the value '1' for OB3.

I have been trying to apply some of the solutions explained here: <a href="https://stackoverflow.com/questions/31324894/merge-rows-having-same-values-in-multiple-columns" rel="nofollow">Merge rows having same values in multiple columns</a>

But they didn't work.

EDIT: OB1, OB2, OB3 are boolean values. Thanks for your help!

EDIT 2: aggregate(. ~ ID + Date, df, any) works!

<hr />

Sample data

<strong>Input Data</strong>

<pre class="lang-r prettyprint-override">structure(list(ID = c(-1L, 1L, 1L),
               Date = c("2008-01-15", "2011-01-21", "2011-01-21"),
               `OBS1` = c(0, 0, 0), `OBS2` = c(0, 0, 0), `OBS3` = c(0, 0, 0),
               `OBS4` = c(0, 0, 0), `OBS5` = c(0, 0, 0), `OBS6` = c(0, 1, 0)),
          .Names = c("ID", "Date", "OBS1", "OBS2", "OBS3", "OBS4", "OBS5", "OBS6"),
          row.names = c(NA, 3L), class = "data.frame")
</pre>

<strong>Output Data</strong>

<pre class="lang-r prettyprint-override">structure(list(ID = c(-1L, 1L),
               Date = c("2008-01-15", "2011-01-21"),
               `OBS1` = c(FALSE, FALSE), `OBS2` = c(FALSE, FALSE), `OBS3` = c(FALSE, FALSE),
               `OBS4` = c(FALSE, FALSE), `OBS5` = c(FALSE, FALSE), `OBS6` = c(FALSE, TRUE)),
          .Names = c("ID", "Date", "OBS1", "OBS2", "OBS3", "OBS4", "OBS5", "OBS6"),
          row.names = c(NA, -2L), class = "data.frame")
</pre>

Answer1:

The question has already been answered using base R's aggregate() function.
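For reference, here is a minimal sketch of that base R approach applied to the input data posted in the question. The data frame is rebuilt with data.frame() rather than structure(), and the helper function(x) any(x != 0) is an assumption of this sketch: it stands in for a bare any so that the numeric 0/1 columns are compared explicitly instead of being coerced to logical.

```r
# Input data from the question, rebuilt with data.frame()
df <- data.frame(
  ID   = c(-1L, 1L, 1L),
  Date = c("2008-01-15", "2011-01-21", "2011-01-21"),
  OBS1 = c(0, 0, 0),
  OBS2 = c(0, 0, 0),
  OBS3 = c(0, 0, 0),
  OBS4 = c(0, 0, 0),
  OBS5 = c(0, 0, 0),
  OBS6 = c(0, 1, 0)
)

# One output row per ID/Date pair; a column is TRUE if any row in the
# group has a non-zero value in that column
res <- aggregate(. ~ ID + Date, data = df, FUN = function(x) any(x != 0))
res
```

The two rows for ID 1 on 2011-01-21 should collapse into a single row with OBS6 set to TRUE, which is the shape of the output data shown in the question.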

However, I felt challenged to turn the sample dataset as printed in the question into a reproducible example (<em>before</em> the OP edited the question to include the results of dput()).

In addition, the OP has mentioned having a <em>"very large df"</em>, so it might be worthwhile to try a data.table approach.

<h3>Convert sample data into a dataframe</h3>
<pre class="lang-r prettyprint-override">library(magrittr)
library(data.table)

# sample data as printed in the question, one row per line
df <- readr::read_file(
"id------------Date ------------OB1------ OB2----- OB3
1 ------- 2017-01-01 --------- 1 --------- 0--------- 0
2 ------- 2006-01-05 --------- 1 --------- 0--------- 0
2 ------- 2007-04-19 --------- 0 --------- 1--------- 0
3 ------- 2015-02-23 --------- 0 --------- 0--------- 1
3 ------- 2015-02-23 --------- 1 --------- 0--------- 0"
) %>%
  # replace every run of two or more hyphens with a single space
  stringr::str_replace_all("[-]{2,}", " ") %>%
  fread()
df
</pre>
<blockquote> <pre class="lang-r prettyprint-override">   id       Date   OB1   OB2   OB3
1:  1 2017-01-01  TRUE FALSE FALSE
2:  2 2006-01-05  TRUE FALSE FALSE
3:  2 2007-04-19 FALSE  TRUE FALSE
4:  3 2015-02-23 FALSE FALSE  TRUE
5:  3 2015-02-23  TRUE FALSE FALSE
</pre> </blockquote>

Note that fread() has automatically recognised the boolean columns.
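Whether 0/1 columns are read as logical may depend on the data.table version in use, so it can be worth checking the column classes before aggregating. The following is a minimal sketch, assuming the columns arrived as integers; the data.table is built by hand here as a hypothetical stand-in for the fread() result, and ob_cols is a name introduced only for this illustration.

```r
library(data.table)

# Hypothetical stand-in for the fread() result, with 0/1 stored as integers
df <- data.table(
  id   = c(1L, 2L, 2L, 3L, 3L),
  Date = c("2017-01-01", "2006-01-05", "2007-04-19", "2015-02-23", "2015-02-23"),
  OB1  = c(1L, 1L, 0L, 0L, 1L),
  OB2  = c(0L, 0L, 1L, 0L, 0L),
  OB3  = c(0L, 0L, 0L, 1L, 0L)
)

# Inspect the column classes
sapply(df, class)

# Convert the observation columns to logical by reference
ob_cols <- c("OB1", "OB2", "OB3")
df[, (ob_cols) := lapply(.SD, as.logical), .SDcols = ob_cols]
```

After the conversion, the observation columns are logical and the id and Date columns are left untouched.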

<h3>Aggregate</h3>
<pre class="lang-r prettyprint-override">library(data.table)
# for each id/Date group, a column is TRUE if any of its rows is TRUE
setDT(df)[, lapply(.SD, any), by = .(id, Date)]
</pre>
<blockquote> <pre class="lang-r prettyprint-override">   id       Date   OB1   OB2   OB3
1:  1 2017-01-01  TRUE FALSE FALSE
2:  2 2006-01-05  TRUE FALSE FALSE
3:  2 2007-04-19 FALSE  TRUE FALSE
4:  3 2015-02-23  TRUE FALSE  TRUE
</pre> </blockquote>

In case the OP expects integer values 0 and 1 instead of logical values, these can be created in one go:

<pre class="lang-r prettyprint-override"># same aggregation, with each logical result converted to 0/1
setDT(df)[, lapply(.SD, function(x) as.integer(any(x))), by = .(id, Date)]
</pre>
<blockquote> <pre class="lang-r prettyprint-override">   id       Date OB1 OB2 OB3
1:  1 2017-01-01   1   0   0
2:  2 2006-01-05   1   0   0
3:  2 2007-04-19   0   1   0
4:  3 2015-02-23   1   0   1
</pre> </blockquote>
