
Expanding a Frequency Table Where the Variable Names are the Values

Question:

I am working with a dataframe where each observation is linked to a specific ID, and I have a set of variables that define the "values" as if I had a factor variable. However, the value in the "cell" is the frequency. Here is a simplified version:

    ID 1 2 3
    A  2 3 2
    B  1 4 1

I would like to get two vectors that expand the frequencies so that I can calculate an interpolated median for each ID. That is, I'd like something of the form:

    A B
    1 1
    1 2
    2 2
    2 2
    2 2
    3 3
    3

The psych package has a function interp.median that could then take each vector and return the interpolated median for each ID that I would like to include as a new variable in the original dataframe. I checked out the vcdExtra package which could maybe do this with its expand.dft function, but I'm not sure exactly how it would work.

Any help would be greatly appreciated!

EDIT: To refine a bit more, interp.median would work best if the final result was a data frame, with NAs padded at the end. That is, something of the form:

    A B
    1 1
    1 2
    2 2
    2 2
    2 2
    3 3
    3 NA

Answer1:

If dat is the dataset:

    lst <- by(dat[,-1], dat[,1], function(x) rep(seq_along(x), x))
    lst
    #dat[, 1]: A
    #[1] 1 1 2 2 2 3 3
    #------------------------------------------------------------
    #dat[, 1]: B
    #[1] 1 2 2 2 2 3

    indx <- max(sapply(lst,length))
    dat2 <- do.call(data.frame,lapply(lst, function(x) c(x,rep(NA,indx-length(x)))))
    dat2
    #  A  B
    #1 1  1
    #2 1  2
    #3 2  2
    #4 2  2
    #5 2  2
    #6 3  3
    #7 3 NA

Or

    lst2 <- lapply(split(dat[,-1], dat$ID), function(x) rep(seq_along(unlist(x)), unlist(x)))
    do.call(data.frame,lapply(lst2, function(x) c(x,rep(NA,indx-length(x)))))

Data:

    dat <- structure(list(ID = c("A", "B"), `1` = c(2L, 1L), `2` = 3:4,
        `3` = c(2L, 1L)), .Names = c("ID", "1", "2", "3"),
        class = "data.frame", row.names = c(NA, -2L))
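As a variation on the same idea, here is a minimal base R sketch that repeats the column names themselves (converted to numbers) rather than the column positions, and pads with the `length<-` replacement function. The object names `freq`, `vals`, `n_max` and `dat2` are only illustrative, and the sketch assumes every column name after ID is numeric.

```r
# Sample data: ID column plus frequency columns named "1", "2", "3".
dat <- data.frame(ID = c("A", "B"), `1` = c(2L, 1L), `2` = 3:4, `3` = c(2L, 1L),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Frequencies as a matrix; the column names are the values to repeat.
freq <- as.matrix(dat[, -1])
vals <- as.numeric(colnames(freq))

# Expand each row: repeat every value by its frequency in that row.
lst <- lapply(seq_len(nrow(freq)), function(i) rep(vals, times = freq[i, ]))
names(lst) <- dat$ID

# Pad every vector with NA up to the longest length, then bind as columns.
n_max <- max(lengths(lst))
dat2 <- as.data.frame(lapply(lst, `length<-`, n_max))
```

Using the names instead of `seq_along()` matters only when the values are not simply 1, 2, 3, ...; for the sample data both give the same result.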

Answer2:

Here is one way:

    # your data
    df <- data.frame(ID=c(1,2,3), A=c(2,3,2), B=c(1,4,1))

    # function to repeat each ID a given number of times,
    # as specified in 'colname' of df
    rep_id <- function(colname) {
      unname(unlist(apply(df[, c('ID',colname)], 1, function(x) rep(x[1], x[2]))))
    }

    # apply this function to all columns (except the first, which is ID)
    sapply(names(df)[-1], rep_id)

Yields:

    $A
    [1] 1 1 2 2 2 3 3

    $B
    [1] 1 2 2 2 2 3
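For this layout (one row per value, one column per ID), the row-wise apply() can probably be avoided, because rep() already accepts a vector of counts. A minimal sketch, where `expanded` is just an illustrative name:

```r
# Data laid out as in the answer above: one row per value, one column per ID.
df <- data.frame(ID = c(1, 2, 3), A = c(2, 3, 2), B = c(1, 4, 1))

# For each frequency column, repeat the ID values by that column's counts.
expanded <- lapply(df[-1], function(n) rep(df$ID, times = n))
```

The result is a named list with one vector per column, the same shape that sapply() returns above when the vectors have different lengths.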

Answer3:

Sample data:

    df <- read.table(text="ID 1 2 3
    A 2 3 2
    B 1 4 1", header=TRUE, check.names=FALSE)

Use apply:

    (newlist <- apply(df[2:4], 1, function(x) rep(names(x), x)))
    #[[1]]
    #[1] "1" "1" "2" "2" "2" "3" "3"
    #
    #[[2]]
    #[1] "1" "2" "2" "2" "2" "3"

    names(newlist) <- df$ID
    #$A
    #[1] "1" "1" "2" "2" "2" "3" "3"
    #
    #$B
    #[1] "1" "2" "2" "2" "2" "3"

This outputs characters, but you could output numbers like this:

    newlist <- apply(df[2:4], 1, function(x) rep(as.numeric(names(x)), x))
    names(newlist) <- df$ID

Edit:

To address OP's new request that the vectors be put in a data.frame and padded with NAs, call this after running either of the options above (the result is a matrix; wrap it in as.data.frame() if you need a data frame):

    newlist <- sapply(newlist, function(x) x[1:max(sapply(newlist, length))])
    #     A  B
    #[1,] 1  1
    #[2,] 1  2
    #[3,] 2  2
    #[4,] 2  2
    #[5,] 2  2
    #[6,] 3  3
    #[7,] 3 NA
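If the end goal is only the interpolated median per ID as a new column, the padding step may not be needed at all. A rough sketch that works from the unpadded list, assuming the psych package is installed; `vals`, `expanded` and the column name `interp_median` are illustrative choices, not part of the original answers:

```r
df <- read.table(text = "ID 1 2 3
A 2 3 2
B 1 4 1", header = TRUE, check.names = FALSE)

# The column names (other than ID) are the values to repeat.
vals <- as.numeric(names(df)[-1])

# One expanded vector per row, named by ID.
expanded <- lapply(seq_len(nrow(df)), function(i) {
  rep(vals, times = unlist(df[i, -1]))
})
names(expanded) <- as.character(df$ID)

# Assumption: psych is available; the step is skipped otherwise.
if (requireNamespace("psych", quietly = TRUE)) {
  df$interp_median <- vapply(expanded, function(x) psych::interp.median(x),
                             numeric(1))
}
```

Since `expanded` has one element per row of `df`, in the same order, the medians line up with the original rows.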
