
Sparse (dgCMatrix) matrix row-normalization in R

Question:

I have a large sparse matrix, call it P:

> str(P)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:7868093] 4221 6098 8780 10313 11102 14243 20570 22145 24468 24977 ...
  ..@ p       : int [1:7357] 0 0 269 388 692 2434 3662 4179 4205 4256 ...
  ..@ Dim     : int [1:2] 1303967 7356
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ x       : num [1:7868093] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ factors : list()

I'd like to row-normalize P (say, with the L2 norm). Taking advantage of vector recycling, the straightforward approach would be something like:

> row_normalized_P <- P / sqrt(rowSums(P^2))

But this causes a memory allocation error: it appears the rowSums result is being recycled into a dense matrix with the same dimensions as P. Given that P is known to be sparse (or at the very least is stored in sparse format), does anyone know of a non-iterative way to compute row_normalized_P? The result should be exactly as sparse as P itself, and I'd like to avoid ever allocating a dense matrix during the normalization.
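Written out, L2 row normalization divides each entry by the Euclidean norm of its row, which is the same as left-multiplying P by a diagonal matrix of inverse row norms (assuming no row of P is entirely zero). Because a diagonal factor only rescales rows, every zero entry of P stays zero, so the result has exactly P's sparsity pattern.

```latex
\tilde{P}_{ij} = \frac{P_{ij}}{\lVert P_{i\cdot} \rVert_2},
\qquad
\lVert P_{i\cdot} \rVert_2 = \sqrt{\textstyle\sum_{k} P_{ik}^{2}},
\qquad
\tilde{P} = D^{-1} P,
\quad
D = \operatorname{diag}\bigl(\lVert P_{1\cdot} \rVert_2, \dots, \lVert P_{n\cdot} \rVert_2\bigr)
```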

The only semi-efficient workaround I've found is to loop over P in blocks of rows, coercing each block into a dense sub-matrix and normalizing it. I'd like to remove that looping logic from my codebase if I can, and I'm wondering whether the Matrix package has a built-in (that I'm just not aware of) for this kind of computation.
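For comparison, the block-wise workaround described above might look roughly like the sketch below. `normalize_rows_blockwise()` and its `block_size` default are hypothetical names chosen for illustration, not code from the question; the sketch only assumes the Matrix package.

```r
library(Matrix)

# Hypothetical helper sketching the block-wise workaround: densify a block of
# rows, normalize it with ordinary vector recycling, then convert it back.
normalize_rows_blockwise <- function(P, block_size = 1000L) {
  n <- nrow(P)
  blocks <- lapply(seq(1L, n, by = block_size), function(start) {
    rows <- start:min(start + block_size - 1L, n)
    B <- as.matrix(P[rows, , drop = FALSE])  # dense length(rows) x ncol(P) chunk
    norms <- sqrt(rowSums(B^2))
    norms[norms == 0] <- 1                   # leave all-zero rows as they are
    Matrix(B / norms, sparse = TRUE)         # norms recycles down each column, i.e. per row
  })
  # Stack the sparse blocks back into one matrix.
  if (length(blocks) == 1L) blocks[[1L]] else do.call(rbind, blocks)
}

# Usage on hypothetical toy data, with a small block size to force several blocks.
set.seed(2)
Q <- rsparsematrix(nrow = 25, ncol = 6, density = 0.3)
Q_normalized <- normalize_rows_blockwise(Q, block_size = 7L)
```

Each block allocates a dense matrix of about `8 * block_size * ncol(P)` bytes (roughly 59 MB for 1000 rows of a 7356-column P), so memory stays bounded at the cost of an explicit loop.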

Cheers and thanks for any help!

-murat

Answer1:

I figured out a nice solution (as usual, about 15 minutes after posting :-/ )...

> row_normalized_P <- Matrix::Diagonal(x = 1 / sqrt(Matrix::rowSums(P^2))) %*% P
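A small sanity check of this answer is sketched here on hypothetical toy data from `rsparsematrix()`. It also shows an alternative that is not part of the answer: in a dgCMatrix, `P@i` stores the 0-based row index of each value in `P@x`, so every stored value can be divided by its own row's norm directly. One deviation from the answer, as a precaution: inverse norms of all-zero rows are set to 0 rather than `1/0 = Inf`.

```r
library(Matrix)

# Hypothetical toy data standing in for the real P (8 x 5, about 30% non-zero).
set.seed(1)
P <- rsparsematrix(nrow = 8, ncol = 5, density = 0.3)

# L2 norm of every row; P^2 stays sparse and rowSums() returns a plain vector.
row_norms <- sqrt(Matrix::rowSums(P^2))

# Precaution (not in the original answer): map all-zero rows to 0 instead of Inf.
inv_norms <- ifelse(row_norms > 0, 1 / row_norms, 0)

# Approach from the answer: left-multiply by a diagonal matrix of inverse norms.
row_normalized_P <- Matrix::Diagonal(x = inv_norms) %*% P

# Alternative: scale the stored values directly. P@i holds the 0-based row
# index of each entry of P@x, so each value is divided by its own row's norm.
P_slot <- P
P_slot@x <- P_slot@x / row_norms[P_slot@i + 1L]

# Both results should agree, and non-empty rows should now have unit L2 norm.
print(all.equal(as.matrix(row_normalized_P), as.matrix(P_slot)))
print(sqrt(Matrix::rowSums(P_slot^2)))
```

The slot-based version touches only the stored non-zeros, so its cost scales with `length(P@x)` rather than `nrow(P) * ncol(P)`.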
