I'm using numpy einsum to calculate the dot products of an array of column vectors pts, of shape (3,N), with itself, resulting in a matrix dotps, of shape (N,N), that holds all the dot products. This is the code I use:
dotps = np.einsum('ij,ik->jk', pts, pts)
This works, but I only need the values above the main diagonal, i.e. the upper triangular part of the result without the diagonal. Is it possible to compute only these values with einsum? Or in any other way that is faster than using einsum to compute the whole matrix?
My pts array can be quite large, so if I could calculate only the values I need, that would roughly double my computation speed.
Answer1:
You can slice the relevant columns and then use np.einsum (http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.einsum.html) -
    R,C = np.triu_indices(N,1)
    out = np.einsum('ij,ij->j',pts[:,R],pts[:,C])
Sample run -
    In : N = 5
    ...: pts = np.random.rand(3,N)
    ...: dotps = np.einsum('ij,ik->jk', pts, pts)
    ...:
    In : dotps
    Out:
    array([[ 0.26529103, 0.30626052, 0.18373867, 0.13602931, 0.51162729],
           [ 0.30626052, 0.56132272, 0.5938057 , 0.28750708, 0.9876753 ],
           [ 0.18373867, 0.5938057 , 0.84699103, 0.35788749, 1.04483158],
           [ 0.13602931, 0.28750708, 0.35788749, 0.18274288, 0.4612556 ],
           [ 0.51162729, 0.9876753 , 1.04483158, 0.4612556 , 1.82723949]])
    In : R,C = np.triu_indices(N,1)
    ...: out = np.einsum('ij,ij->j',pts[:,R],pts[:,C])
    ...:
    In : out
    Out:
    array([ 0.30626052, 0.18373867, 0.13602931, 0.51162729, 0.5938057 ,
            0.28750708, 0.9876753 , 0.35788749, 1.04483158, 0.4612556 ])
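As a quick sanity check on the sample run, the pairwise result can be compared against the strict upper triangle of the full matrix. This is only a minimal sketch: the seed and the size N = 6 are arbitrary choices made for reproducibility, not part of the original post.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed, for reproducibility only
N = 6                           # arbitrary small size for the check
pts = rng.random((3, N))

# Full (N, N) matrix of dot products, as in the question.
dotps = np.einsum('ij,ik->jk', pts, pts)

# Only the column pairs (R[k], C[k]) with R[k] < C[k], as in the answer.
R, C = np.triu_indices(N, 1)
out = np.einsum('ij,ij->j', pts[:, R], pts[:, C])

# One value per pair, in the same row-by-row order as the upper triangle.
assert out.shape == (N * (N - 1) // 2,)
assert np.allclose(out, dotps[R, C])
```

Note that pts[:,R] and pts[:,C] are copies of shape (3, N*(N-1)/2), so this approach trades the (N,N) output for two temporary arrays of that size.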
Optimizing further -
Let's time our approach and see if there's any scope for improvement performance-wise.
    In : N = 5000
    In : pts = np.random.rand(3,N)
    In : %timeit np.triu_indices(N,1)
    1 loops, best of 3: 413 ms per loop
    In : R,C = np.triu_indices(N,1)
    In : %timeit np.einsum('ij,ij->j',pts[:,R],pts[:,C])
    1 loops, best of 3: 1.47 s per loop
Staying within the memory constraints, it doesn't look like we can do much about optimizing np.einsum. So, let's shift the focus to np.triu_indices. For N = 4, we have:
    In : N = 4
    In : np.triu_indices(N,1)
    Out: (array([0, 0, 0, 1, 1, 2]), array([1, 2, 3, 2, 3, 3]))
It seems to create a regular pattern, a shifting one: the row index R steps up by one at positions 3 and 5 (counting from 0), and the column index C restarts at R + 1 at those same positions. This could be written with a cumulative sum that has shifts at those positions. Thinking generically, we would end up coding it something like this -
    import numpy as np

    def triu_indices_cumsum(N):
        # Length of R and C index arrays (integer division, so that it
        # stays an int under Python 3)
        L = (N*(N-1))//2

        # Positions along the R and C arrays that indicate
        # shifting to the next row of the full array
        shifts_idx = np.arange(2,N)[::-1].cumsum()

        # Initialize "shift" arrays for finally leading to R and C
        shifts1_arr = np.zeros(L,dtype=int)
        shifts2_arr = np.ones(L,dtype=int)

        # At shift positions along the shifts array set appropriate values,
        # such that when cumulative summed would lead to desired R and C arrays.
        shifts1_arr[shifts_idx] = 1
        shifts2_arr[shifts_idx] = -np.arange(N-2)[::-1]

        # Finally cumsum to give R, C
        R_arr = shifts1_arr.cumsum()
        C_arr = shifts2_arr.cumsum()
        return R_arr, C_arr
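To make the shifting pattern concrete, here is a small sketch that prints the intermediate shift positions for N = 4 and then checks the cumsum-based indices against np.triu_indices for a range of sizes. The function is repeated with integer division so that the block runs on its own under Python 3; the range of N values tested is an arbitrary choice.

```python
import numpy as np

def triu_indices_cumsum(N):
    # Same logic as the function in the answer, repeated so this block
    # is self-contained.
    L = (N * (N - 1)) // 2
    shifts_idx = np.arange(2, N)[::-1].cumsum()
    shifts1_arr = np.zeros(L, dtype=int)
    shifts2_arr = np.ones(L, dtype=int)
    shifts1_arr[shifts_idx] = 1
    shifts2_arr[shifts_idx] = -np.arange(N - 2)[::-1]
    return shifts1_arr.cumsum(), shifts2_arr.cumsum()

# Worked case N = 4: the row index changes at positions 3 and 5.
shifts_idx = np.arange(2, 4)[::-1].cumsum()
print(shifts_idx)   # [3 5]

R, C = triu_indices_cumsum(4)
print(R)            # [0 0 0 1 1 2]
print(C)            # [1 2 3 2 3 3]

# Compare against NumPy's own result for a range of sizes.
for N in range(2, 50):
    R_ref, C_ref = np.triu_indices(N, 1)
    R_new, C_new = triu_indices_cumsum(N)
    assert np.array_equal(R_new, R_ref)
    assert np.array_equal(C_new, C_ref)
```

The returned pair can be passed straight to the einsum call in place of np.triu_indices(N,1), e.g. R,C = triu_indices_cumsum(N) followed by np.einsum('ij,ij->j',pts[:,R],pts[:,C]).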
Let's time it for various N's -
    In : N = 100
    In : %timeit np.triu_indices(N,1)
    10000 loops, best of 3: 122 µs per loop
    In : %timeit triu_indices_cumsum(N)
    10000 loops, best of 3: 61.7 µs per loop
    In : N = 1000
    In : %timeit np.triu_indices(N,1)
    100 loops, best of 3: 17 ms per loop
    In : %timeit triu_indices_cumsum(N)
    100 loops, best of 3: 16.3 ms per loop
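Thus, it looks like for moderate N's, the customized cumsum based triu_indices might be worth a look! In these timings it is about twice as fast at N = 100, while at N = 1000 the two are nearly even.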