
Question:
I am trying to learn to efficiently implement various neural nets in python and am currently trying to implement this model
<img src="https://i.stack.imgur.com/WIaIy.png" alt="model equation" />.
However, I am having trouble using numpy operations to implement the summation.
I have been following <a href="https://github.com/shashankg7/multimodal-neural-language-models/blob/master/mnlm_code/lm/lbl.py#L119L140" rel="nofollow">this existing implementation</a> and am trying to simplify it, but it's not entirely clear to me what all of the array operations are achieving. My interpretation is that the C's are multiplied through each of the columns of R and summed. However, my einsum attempt, np.einsum('ijk,km->ij', C, R), doesn't produce the required result.
I would appreciate some pointers towards simplifying this implementation. My attempts with np.einsum haven't gotten me anywhere so far.
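For concreteness, here is a toy reproduction of that attempt. The shapes are made up, but they mirror the code below, where R is indexed as (K, vocab) and each C[i] is a (K, K) matrix:
import numpy as np
K, context, vocab = 3, 4, 10
C = np.ones((context, K, K))   # one (K, K) matrix per context position
R = np.ones((K, vocab))        # one K-dimensional feature column per word
out = np.einsum('ijk,km->ij', C, R)
print(out.shape)               # (4, 3) -- one row per context position, not per training example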
Code to simplify (explained in image/first link):
batchsize = X.shape[0]
R = self.R
C = self.C
bw = self.bw
# Obtain word features
tmp = R.as_numpy_array()[:,X.flatten()].flatten(order='F')
tmp = tmp.reshape((batchsize, self.K * self.context))
words = np.zeros((batchsize, self.K, self.context))
for i in range(batchsize):
    words[i,:,:] = tmp[i,:].reshape((self.K, self.context), order='F')
words = gpu.garray(words)
# Compute the hidden layer (predicted next word representation)
acts = gpu.zeros((batchsize, self.K))
for i in range(self.context):
    acts = acts + gpu.dot(words[:,:,i], C[i,:,:])
Answer 1:
Creating a small words array:
In [565]: words = np.zeros((2,3,4))
In [566]: tmp = np.arange(2*3*4).reshape((2,3*4))
In [567]: for i in range(2):
...: words[i,:,:] = tmp[i,:].reshape((3,4),order='F')
...:
In [568]: tmp
Out[568]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])
In [569]: words
Out[569]:
array([[[ 0., 3., 6., 9.],
[ 1., 4., 7., 10.],
[ 2., 5., 8., 11.]],
[[ 12., 15., 18., 21.],
[ 13., 16., 19., 22.],
[ 14., 17., 20., 23.]]])
I'm pretty sure this can be done without the loop (shown at the end of this answer). Now the dot loop over the context dimension, with a toy C:
In [577]: C = np.ones((4,3,3))
In [578]: acts = np.zeros((2,3))
In [579]: for i in range(4):
...: acts += np.dot(words[:,:,i], C[i,:,:])
...:
In [580]: acts
Out[580]:
array([[ 66., 66., 66.],
[ 210., 210., 210.]])
This dot loop can be expressed in einsum as:
In [581]: np.einsum('ijk,kjm->im', words, C)
Out[581]:
array([[ 66., 66., 66.],
[ 210., 210., 210.]])
This is summing on j and k. In the loop version the sum on j was done in the dot, and the sum on k was done in the loop. But for very large arrays, and with gpu speedup, the loop version might be faster. If the problem space gets too big, einsum can be slow and even hit memory errors (though the newest version has some optimization options).
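A minimal sketch of that option, assuming NumPy 1.12 or later where einsum accepts an optimize keyword:
# Same contraction as above; optimize=True lets einsum pick an intermediate
# contraction order, which can reduce time and memory on large problems.
acts2 = np.einsum('ijk,kjm->im', words, C, optimize=True)
# np.einsum_path reports the chosen order without performing the contraction.
path, report = np.einsum_path('ijk,kjm->im', words, C, optimize='optimal')
print(report)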
words can be created from tmp without a loop:
In [585]: tmp.reshape(2,3,4, order='F')
Out[585]:
array([[[ 0, 3, 6, 9],
[ 1, 4, 7, 10],
[ 2, 5, 8, 11]],
[[12, 15, 18, 21],
[13, 16, 19, 22],
[14, 17, 20, 23]]])
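Putting the two pieces together, a loop-free version of the whole computation might look like this (a sketch using the same toy shapes as above; in the real code tmp, C, batchsize, K and context come from the model):
import numpy as np
batchsize, K, context = 2, 3, 4
tmp = np.arange(batchsize * K * context).reshape((batchsize, K * context))
C = np.ones((context, K, K))
# The Fortran-order reshape replaces the per-row loop that built words,
# and a single einsum replaces the dot loop over the context dimension.
words = tmp.reshape(batchsize, K, context, order='F')   # (batchsize, K, context)
acts = np.einsum('ijk,kjm->im', words, C)               # (batchsize, K)
print(acts)   # matches the loop result: [[66. 66. 66.], [210. 210. 210.]]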