
Cython function with variable-sized matrix input

Question:

I am trying to convert part of a pure Python function to Cython to improve the compute time. I would like to write a Cython function just for the loop that is taking up the time (as IPython's lprun kindly told me). However, this function takes in variably sized matrices, and I can't see how to bring that across easily to statically typed Cython.

    for index1 in range(0, num_products):
        for index2 in range(0, num_products):
            cond_prob = (data[index1] * data[index2]).sum() / max(col_sums[index1], col_sums[index2])
            prox[index1][index2] = cond_prob

The issue is that num_products changes from year to year, so the size of the matrix (data) is variable.

What is the best strategy here?

<ol><li>Should I write two C functions: one to create a matrix of a given dimension using malloc, and one to do the loops over the created matrix?</li> <li>Is there some fancy Cython/NumPy wizardry to help in this scenario? Can I write a C function that takes a variably sized NumPy array in memory and pass it the size?</li> </ol>

Answer1:

Cython code is (strategically) statically typed, but that doesn't mean that arrays must have a fixed size. Passing a multidimensional array to a function can be a little awkward in plain C, but in Cython a typed memoryview such as double[:,:] accepts a 2-D array of any shape, so you should be able to do something like the following:

Note I took the function and variable names from your <a href="https://stackoverflow.com/q/22853837/2379410" rel="nofollow">follow-up question.</a>

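    import numpy as np
    cimport numpy as np
    cimport cython

    @cython.boundscheck(False)
    @cython.cdivision(True)
    def cooccurance_probability_cy(double[:, :] X):
        cdef int P, N, i, j, k
        cdef double item
        P = X.shape[0]  # number of rows (products)
        N = X.shape[1]  # number of columns; may differ from P
        cdef double[:] CS = np.sum(X, axis=1)
        # np.float was removed in NumPy 1.24; use np.float64 instead
        cdef double[:, :] D = np.empty((P, P), dtype=np.float64)
        for i in range(P):
            for j in range(P):
                item = 0
                # dot product of rows i and j, taken over the columns
                for k in range(N):
                    item += X[i, k] * X[j, k]
                D[i, j] = item / max(CS[i], CS[j])
        # convert the memoryview back to a NumPy array for the caller
        return np.asarray(D)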

On the other hand, plain NumPy should also be quite fast for this problem if you use the right functions and some broadcasting. In fact, since the cost of the calculation is dominated by the matrix multiplication, I found the following to be much faster than the Cython code above (np.inner uses a highly optimized BLAS routine):

    def new(X):
        CS = np.sum(X, axis=1, keepdims=True)
        D = np.inner(X, X) / np.maximum(CS, CS.T)
        return D
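One way to check that the vectorized version matches the original double loop is to run both on a small random matrix. This is only a sketch: it assumes col_sums holds the sum of each row of data (as the np.sum(X, axis=1) call above does), and the names loop_version and vectorized_version are made up for illustration.

```python
import numpy as np

def loop_version(data):
    # Reference: the double loop from the question.
    # Assumption: col_sums is the sum of each row of `data`.
    num_products = data.shape[0]
    col_sums = data.sum(axis=1)
    prox = np.empty((num_products, num_products))
    for i in range(num_products):
        for j in range(num_products):
            prox[i, j] = (data[i] * data[j]).sum() / max(col_sums[i], col_sums[j])
    return prox

def vectorized_version(data):
    # Same computation using np.inner and broadcasting.
    cs = data.sum(axis=1, keepdims=True)   # shape (P, 1)
    return np.inner(data, data) / np.maximum(cs, cs.T)

rng = np.random.default_rng(0)
# Made-up test data: 6 products, 10 binary observations each.
data = rng.integers(0, 2, size=(6, 10)).astype(float)
data[:, 0] = 1.0  # guarantee that no row sum is zero

loop_result = loop_version(data)
vec_result = vectorized_version(data)
print(np.allclose(loop_result, vec_result))
```

Because the shape is read from the input array at call time, the same function works unchanged when num_products differs between years.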

Answer2:

Have you tried getting rid of the for loops by using NumPy?

For the first part of your equation (the numerator), you could for example try:

(data[ np.newaxis,:] * data[:,np.newaxis]).sum(2)

If memory is an issue, you can also use the np.einsum() function, which avoids building the large intermediate array. For the second part (the denominator), one could probably also cook up a NumPy expression (a bit more difficult), if you have not already tried that.
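A minimal sketch of both parts, with made-up array sizes. It assumes col_sums is the sum of each row of data; the 'ik,jk->ij' subscripts and the use of np.maximum.outer for the denominator are one possible choice, not necessarily what was intended above.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((5, 8))  # hypothetical: 5 products, 8 observations

# Broadcasting: builds a (5, 5, 8) temporary array before summing.
numer_broadcast = (data[np.newaxis, :] * data[:, np.newaxis]).sum(2)

# einsum: the same pairwise dot products, without the 3-D temporary.
numer_einsum = np.einsum('ik,jk->ij', data, data)

# Second part: pairwise maximum of the row sums, shape (5, 5).
col_sums = data.sum(axis=1)
denom = np.maximum.outer(col_sums, col_sums)

prox = numer_einsum / denom
print(np.allclose(numer_broadcast, numer_einsum))
```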
