Processing upper triangular elements only with NumPy einsum


I'm using numpy einsum to calculate the dot products of an array of column vectors pts, of shape (3,N), with itself, resulting on a matrix dotps, of shape (N,N), with all the dot products. This is the code I use:

dotps = np.einsum('ij,ik->jk', pts, pts)

This works, but I only need the values above the main diagonal. ie. the upper triangular part of the result without the diagonal. Is it possible to compute only these values with einsum? or in any other way that is faster than using einsum to compute the whole matrix?

My pts array can be quite large so if I could calculate only the values I need that would double my computation speed.


You can slice relevant columns and then use <a href="http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.einsum.html" rel="nofollow">np.einsum</a> -

R,C = np.triu_indices(N,1) out = np.einsum('ij,ij->j',pts[:,R],pts[:,C])

Sample run -

In [109]: N = 5 ...: pts = np.random.rand(3,N) ...: dotps = np.einsum('ij,ik->jk', pts, pts) ...: In [110]: dotps Out[110]: array([[ 0.26529103, 0.30626052, 0.18373867, 0.13602931, 0.51162729], [ 0.30626052, 0.56132272, 0.5938057 , 0.28750708, 0.9876753 ], [ 0.18373867, 0.5938057 , 0.84699103, 0.35788749, 1.04483158], [ 0.13602931, 0.28750708, 0.35788749, 0.18274288, 0.4612556 ], [ 0.51162729, 0.9876753 , 1.04483158, 0.4612556 , 1.82723949]]) In [111]: R,C = np.triu_indices(N,1) ...: out = np.einsum('ij,ij->j',pts[:,R],pts[:,C]) ...: In [112]: out Out[112]: array([ 0.30626052, 0.18373867, 0.13602931, 0.51162729, 0.5938057 , 0.28750708, 0.9876753 , 0.35788749, 1.04483158, 0.4612556 ]) <hr />

Optimizing further -

Let's time our approach and see if there's any scope for improvement performance-wise.

In [126]: N = 5000 In [127]: pts = np.random.rand(3,N) In [128]: %timeit np.triu_indices(N,1) 1 loops, best of 3: 413 ms per loop In [129]: R,C = np.triu_indices(N,1) In [130]: %timeit np.einsum('ij,ij->j',pts[:,R],pts[:,C]) 1 loops, best of 3: 1.47 s per loop

Staying within the memory constraints, it doesn't look like we can do much about optimizing np.einsum. So, let's shift the focus to np.triu_indices.

For N = 4, we have :

In [131]: N = 4 In [132]: np.triu_indices(N,1) Out[132]: (array([0, 0, 0, 1, 1, 2]), array([1, 2, 3, 2, 3, 3]))

It seems to be creating a regular pattern, sort of like a shifting one though. This could be written with a cumulative sum that has shifts at those 3 and 5 positions. Thinking generically, we would end up coding it something like this -

def triu_indices_cumsum(N): # Length of R and C index arrays L = (N*(N-1))/2 # Positions along the R and C arrays that indicate # shifting to the next row of the full array shifts_idx = np.arange(2,N)[::-1].cumsum() # Initialize "shift" arrays for finally leading to R and C shifts1_arr = np.zeros(L,dtype=int) shifts2_arr = np.ones(L,dtype=int) # At shift positions along the shifts array set appropriate values, # such that when cumulative summed would lead to desired R and C arrays. shifts1_arr[shifts_idx] = 1 shifts2_arr[shifts_idx] = -np.arange(N-2)[::-1] # Finall cumsum to give R, C R_arr = shifts1_arr.cumsum() C_arr = shifts2_arr.cumsum() return R_arr, C_arr

Let's time it for various N's!

In [133]: N = 100 In [134]: %timeit np.triu_indices(N,1) 10000 loops, best of 3: 122 µs per loop In [135]: %timeit triu_indices_cumsum(N) 10000 loops, best of 3: 61.7 µs per loop In [136]: N = 1000 In [137]: %timeit np.triu_indices(N,1) 100 loops, best of 3: 17 ms per loop In [138]: %timeit triu_indices_cumsum(N) 100 loops, best of 3: 16.3 ms per loop

Thus, it looks like for decent N's, the customized cumsum based triu_indices might be worth a look!


  • encode repeated letters in a string with number
  • ARM + gcc: don't use one big .rodata section
  • Fossil add for external directory
  • Javascript strange generator yield sub function behavior
  • Android. CalendarView…show only one month calendar at a time
  • C# Linq finding duplicate row
  • What is the most pythonic way to conditionally return a function
  • Why the interface IOrderedEnumerable isn't covariant in T?
  • Modifying look/behavior of the new Popup control (ChildWindow) in Silverlight 3
  • How to get rid of zeros in the section numbering in LATEX report document class?
  • How to manually build Expression which will return always true?
  • How to programmatically set custom attributes of custom components?
  • How can I implement a JavaScript that submit this form according to the clicket button? [duplicate]
  • Removing duplicate rows by adding column value
  • Image path not convert to url in swift3
  • Different number of posts on first page in Wordpress loop
  • comparing and replacing the elements of a nested list
  • multiple versions of Mongo
  • How to show live chat dialog box on click of button? [closed]
  • Pandas better way to get rows that has all columns null except one
  • compareTo(K) in java.lang.Comparable cannot be applied to (java.lang.Comparable)
  • Talend: How to import this csv file in SQL?
  • ZPK Encryption ISO format 9594-1 Format 0
  • Filter divs/table rows based on checkbox criteria -Javascript/JQuery
  • How to write a CSV-parser using XSLT 1.0?
  • What is the benefit of using the super global `$_SERVER['PHP_SELF']` in PHP?
  • How to get column name of particular value in sql server 2008
  • How do I redirect an user back to the page they were trying to access once they log in? (Django)
  • how to change button text after succes in ajax
  • getting the values of checkboxes in a checkboxlist control
  • Check 'Manager can update membership list' in AD
  • Using loops in Jasmine (with injected service)
  • Bundling python(“.py”)files along with java class files for a web application
  • Multiple table join MySQL multiple foreign keys
  • What are advantages/disadvantages of using Selenium for Java vs .NET applications?
  • Add font awesome icon to custom add to cart button in Woocommerce 3
  • how to read to huge file into buffer
  • How to handle div that is created dynamically in a table
  • Google App Engine backend servlet not responding
  • Cross compile glibc for arm, got undefined reference to some unwind functions