sparseMatrix with numerical and categorical data

Question:

I am trying to create a sparse matrix containing both numerical and categorical data, to be used as input to cv.glmnet. When only numerical data is involved, I can create a sparseMatrix with the following syntax:

sparseMatrix(i=c(1,3,5,2), j=c(1,1,1,2), x=c(1,2,4,3), dims=c(5,2))

For categorical variables, the following approach seems to work:

sparse.model.matrix(~-1+automobile, data.frame(automobile=c("sedan","suv","minivan","truck","sedan")))

My VERY sparse instance has 1,000,000 observations and 10,000 variables, and I do not have enough memory to create the full (dense) matrix first. The only way I can think of to create a sparseMatrix is to handle the categorical variables manually, by creating the columns myself and converting the data to (i,j,x) format. I am hoping that somebody can suggest a better approach.

Answer1:

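This may or may not work, but you could try creating the model matrix for each variable separately and then binding them together column-wise. (Older versions of Matrix used cBind for this; it has since been deprecated, and the ordinary cbind works on sparse matrices in R 3.2.0 and later.)

do.call(cbind, lapply(names(df), function(x) sparse.model.matrix(~., df[x])[, -1, drop=FALSE]))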

Note that you probably want to create the intercept column and then remove it, rather than specifying -1 in the formula as you've done above. With -1, the first factor in the formula keeps all of its levels while every other factor loses one level, so the result depends on the ordering of the variables.
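To make the point about -1 concrete, here is a small sketch comparing the two encodings. The data frame and its column names (automobile, colour) are made up for illustration; only sparse.model.matrix and cbind come from the discussion above.

```r
library(Matrix)

# Toy data; column names and values are hypothetical.
df <- data.frame(
  automobile = factor(c("sedan", "suv", "minivan", "truck", "sedan")),
  colour     = factor(c("red", "blue", "red", "green", "blue"))
)

# With -1 in the formula, the first factor keeps all of its levels,
# while the second factor loses its reference level.
m_noint <- sparse.model.matrix(~ -1 + automobile + colour, df)
colnames(m_noint)

# Building each variable with an intercept and then dropping that column
# treats every factor the same way: each one loses its reference level.
blocks <- lapply(names(df), function(x) {
  sparse.model.matrix(~ ., df[x])[, -1, drop = FALSE]
})
m_sep <- do.call(cbind, blocks)
colnames(m_sep)
```

In the first matrix automobile contributes four columns and colour two; in the second, each factor contributes one column fewer than it has levels, regardless of the order of the variables.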

Answer2:

Sparse matrices support the same kind of indexed assignment as dense matrices: a two-column matrix of (row, column) positions can be passed as the single argument to "[":

require(Matrix)
M <- Matrix(0, 10, 10)
dfrm <- data.frame(rows=sample(1:10,5), cols=sample(1:10,5), vals=rnorm(5))
dfrm
#---------
  rows cols       vals
1    3    9 -0.1419332
2    4    3  1.4806194
3    6    7 -0.5653500
4    5    1 -1.0127539
5    1    2 -0.5047298
#--------
M[ with( dfrm, cbind(rows,cols) ) ] <- dfrm$vals
M
#---------------
10 x 10 sparse Matrix of class "dgCMatrix"
 [1,]         . -0.5047298        . . . .        . .          . .
 [2,]         .          .        . . . .        . .          . .
 [3,]         .          .        . . . .        . . -0.1419332 .
 [4,]         .          . 1.480619 . . .        . .          . .
 [5,] -1.012754          .        . . . .        . .          . .
 [6,]         .          .        . . . . -0.56535 .          . .
 [7,]         .          .        . . . .        . .          . .
 [8,]         .          .        . . . .        . .          . .
 [9,]         .          .        . . . .        . .          . .
[10,]         .          .        . . . .        . .          . .
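The same (row, column, value) idea can be applied to the original problem by passing the index vectors straight to sparseMatrix, so that no dense matrix is ever built. The following is only a sketch under assumptions: the vectors automobile and mileage are invented toy inputs, the factor is assumed to contain no missing values, and every level gets its own column (drop one column if you want a reference level).

```r
library(Matrix)

# Toy inputs; names and values are hypothetical.
automobile <- factor(c("sedan", "suv", "minivan", "truck", "sedan"))
mileage    <- c(0, 12.5, 0, 3.2, 0)   # mostly-zero numeric column

n <- length(automobile)

# Categorical block: row k gets a single 1 in the column of its level.
cat_block <- sparseMatrix(
  i = seq_len(n),
  j = as.integer(automobile),
  x = rep(1, n),
  dims = c(n, nlevels(automobile)),
  dimnames = list(NULL, levels(automobile))
)

# Numeric block: store only the nonzero entries.
nz <- which(mileage != 0)
num_block <- sparseMatrix(
  i = nz,
  j = rep(1L, length(nz)),
  x = mileage[nz],
  dims = c(n, 1),
  dimnames = list(NULL, "mileage")
)

# Column-bind the blocks into one sparse design matrix.
X <- cbind(num_block, cat_block)
X
```

For many variables, the per-column blocks could be collected in a list and combined with do.call(cbind, ...), so memory use stays proportional to the number of nonzero entries.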
