
How can I calculate an inner product with an arbitrary number of columns using ddply?

Question:

For each row of a data frame, I want to compute the inner product of the first D columns with a given array, W. I am trying the following:

W <- c(1, 2, 3)
ddply(df, .(id), transform, inner_product = c(col1, col2, col3) %*% W)

This works, but in general I may have an arbitrary number of columns. Can I generalize the expression above to handle that case?

Update:

This is an updated example as asked for in the comments:

library(plyr)
library(kernlab)
data(spam)
W = array()
W[1:3] = seq(1,3)
spamdf = head(spam)
spamdf$id = seq(1,nrow(spamdf))
df_out = ddply(spamdf, .(id), transform, inner_product=c(make, address, all) %*% W)

> W
[1] 1 2 3
> spamdf[1,]
  make address  all num3d  our over remove internet order mail receive will
1    0    0.64 0.64     0 0.32    0      0        0     0    0       0 0.64
  people report addresses free business email  you credit your font num000
1      0      0         0 0.32        0  1.29 1.93      0 0.96    0      0
  money hp hpl george num650 lab labs telnet num857 data num415 num85
1     0  0   0      0      0   0    0      0      0    0      0     0
  technology num1999 parts pm direct cs meeting original project re edu table
1          0       0     0  0      0  0       0        0       0  0   0     0
  conference charSemicolon charRoundbracket charSquarebracket charExclamation
1          0             0                0                 0           0.778
  charDollar charHash capitalAve capitalLong capitalTotal type id
1          0        0      3.756          61          278 spam  1
> df_out[1,]
  make address  all num3d  our over remove internet order mail receive will
1    0    0.64 0.64     0 0.32    0      0        0     0    0       0 0.64
  people report addresses free business email  you credit your font num000
1      0      0         0 0.32        0  1.29 1.93      0 0.96    0      0
  money hp hpl george num650 lab labs telnet num857 data num415 num85
1     0  0   0      0      0   0    0      0      0    0      0     0
  technology num1999 parts pm direct cs meeting original project re edu table
1          0       0     0  0      0  0       0        0       0  0   0     0
  conference charSemicolon charRoundbracket charSquarebracket charExclamation
1          0             0                0                 0           0.778
  charDollar charHash capitalAve capitalLong capitalTotal type id inner_product
1          0        0      3.756          61          278 spam  1           3.2

The above example computes the inner product of the first three columns of the spam data set (available in the kernlab package) with the array W = (1,2,3). Here I have explicitly specified the first three columns as c(make, address, all). Thus df_out[1,"inner_product"] = 3.2.
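As a quick sanity check on the 3.2 figure, the same arithmetic can be reproduced by hand from the three values printed for the first row. This is only a minimal sketch; `x` and `ip` are illustrative names, not part of the original code.

```r
# Values of make, address, all from the first row of spamdf (see the output above)
x <- c(0, 0.64, 0.64)
W <- c(1, 2, 3)

# Inner product: 0*1 + 0.64*2 + 0.64*3
ip <- drop(x %*% W)  # drop() turns the 1 x 1 matrix into a plain number
print(ip)
```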

Instead, I want to compute the inner product over all the columns without having to list each one. Is the conversion to a matrix and back to a data frame an expensive operation?

Answer1:

A strategy along the lines of the following should work:

  • Convert each chunk to a matrix
  • Perform a matrix multiplication
  • Convert the results to a data.frame

The code:

library(plyr)

set.seed(1)
df <- data.frame(
  id = sample(1:5, 20, replace = TRUE),
  col1 = runif(20),
  col2 = runif(20),
  col3 = runif(20),
  col4 = runif(20)
)
W <- c(1, 2, 3, 4)

# For each id chunk: drop the id column, multiply by W, return a data.frame
ddply(df, .(id), function(x) as.data.frame(as.matrix(x[, -1]) %*% W))

The results:

   id       V1
1   1 4.924994
2   1 5.076043
3   2 7.053864
4   2 5.237132
5   2 6.307620
6   2 3.413056
7   2 5.182214
8   2 7.623164
9   3 5.194714
10  3 6.733229
11  4 4.122548
12  4 3.569013
13  4 4.978939
14  4 5.513444
15  4 5.840900
16  4 6.526522
17  5 3.530220
18  5 3.549646
19  5 4.340173
20  5 3.955517
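If the weights apply to the first D columns, as in the question, the same matrix-multiplication idea can be sketched by selecting columns by position. The example below uses base R only and a made-up data frame; `toy` and `D` are illustrative names that do not appear in the original posts. It converts the selected columns to a matrix once for the whole data frame, which may be cheaper than converting once per chunk, though that has not been measured here.

```r
# Toy data frame (invented for illustration): three numeric columns plus an id
toy <- data.frame(
  a  = c(1, 0, 2),
  b  = c(0, 1, 2),
  c  = c(1, 1, 0),
  id = 1:3
)

W <- c(1, 2, 3)
D <- length(W)  # number of leading columns to use

# Select the first D columns by position, convert once to a matrix,
# and multiply; drop() turns the n x 1 result into a plain vector
toy$inner_product <- drop(as.matrix(toy[, seq_len(D)]) %*% W)
print(toy)
```

Because the columns are picked with `seq_len(D)`, changing the length of `W` is enough to change how many columns take part in the product.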

Answer2:

If you want to append a column of cross-products, you could do this (assuming W has the right number of elements to match the non-"id" columns):

df2 <- cbind(df, as.matrix(df[, -grep("id", names(df))]) %*% W )

It does not appear that the .(id) grouping serves any useful purpose here, since you are not computing a sum of cross-products within each id. If you were, you would not be using transform but some aggregating function instead.
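To illustrate the distinction, here is a minimal base-R sketch with made-up data. The per-row inner product needs no grouping at all, whereas summing those products within each id is a genuine aggregation. Using `tapply` for the aggregation is an assumption chosen for this example, standing in for an aggregating ddply call; it is not taken from the answer.

```r
# Made-up data: two rows per id
df <- data.frame(
  id   = c(1, 1, 2, 2),
  col1 = c(1, 2, 3, 4),
  col2 = c(0, 1, 0, 1)
)
W <- c(1, 2)

# Per-row inner product: grouping by id is irrelevant here
per_row <- drop(as.matrix(df[, names(df) != "id"]) %*% W)

# Sum of the per-row products within each id: this is an aggregation
per_id <- tapply(per_row, df$id, sum)

print(per_row)
print(per_id)
```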
