Vectorise a function taking advantage of concurrency


For a simple neural network I want to apply a function to all the values of a gonum VecDense.

Gonum has an Apply method for Dense matrices, but not for vectors, so I am doing this by hand:

```go
func sigmoid(z float64) float64 {
	return 1.0 / (1.0 + math.Exp(-z))
}

func vSigmoid(zs *mat.VecDense) {
	for i := 0; i < zs.Len(); i++ {
		zs.SetVec(i, sigmoid(zs.AtVec(i)))
	}
}
```

This seems to be an obvious target for concurrent execution, so I tried:

```go
var wg sync.WaitGroup

func sigmoid(z float64) float64 {
	wg.Done()
	return 1.0 / (1.0 + math.Exp(-z))
}

func vSigmoid(zs *mat.VecDense) {
	for i := 0; i < zs.Len(); i++ {
		wg.Add(1)
		go zs.SetVec(i, sigmoid(zs.AtVec(i)))
	}
	wg.Wait()
}
```

This doesn't work, perhaps unsurprisingly: sigmoid() doesn't end with wg.Done(), because the return statement (which does all the work) comes after it.

My question is: How can I use concurrency to apply a function to each element of a gonum vector?


First, note that this attempt to do the computation concurrently assumes that the SetVec() and AtVec() methods are safe for concurrent use with distinct indices. If that is not the case, the attempted solution is inherently unsafe and may result in data races and undefined behavior.

<hr />

wg.Done() should be called to signal that the "worker" goroutine has finished its work, but <em>only</em> once the goroutine has actually finished it.

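In your case the function run in the worker goroutine is not sigmoid() but zs.SetVec(). The arguments of a go statement are evaluated in the calling goroutine, so sigmoid(zs.AtVec(i)) runs (and calls wg.Done()) in the loop itself, before the new goroutine even starts; the sigmoid computations are not concurrent at all. So you should call wg.Done() once zs.SetVec() has returned, not sooner.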
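The evaluation order of a go statement can be checked directly. The following is a minimal sketch (the function name argEvaluatedBeforeGo and the closure arg are illustrative stand-ins for sigmoid(zs.AtVec(i)), not code from the question): the argument is computed before the go statement returns, so only the already-computed value is handed to the new goroutine.

```go
package main

import "fmt"

// argEvaluatedBeforeGo reports whether the argument of a go statement has
// already been evaluated when the go statement returns to the caller.
func argEvaluatedBeforeGo() bool {
	evaluated := false

	// arg plays the role of sigmoid(zs.AtVec(i)) in the question.
	arg := func() float64 {
		evaluated = true
		return 0.5
	}

	done := make(chan struct{})

	// The function value and its parameters are evaluated in the calling
	// goroutine, so arg() runs here, before the new goroutine starts.
	go func(v float64) {
		_ = v // the new goroutine only receives the precomputed value
		close(done)
	}(arg())

	// Safe to read without synchronization: only this goroutine wrote it.
	result := evaluated
	<-done
	return result
}

func main() {
	fmt.Println("argument evaluated before go returned:", argEvaluatedBeforeGo())
}
```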

One way would be to add wg.Done() at the end of the SetVec() method (or a defer wg.Done() at its beginning), but introducing this dependency isn't feasible: SetVec() is part of gonum, and it should not know about wait groups or goroutines anyway, as that would seriously limit its usability.

The easiest and cleanest way in this case is to launch an anonymous function (a function literal) as the worker goroutine, which calls zs.SetVec() and then calls wg.Done() once zs.SetVec() has returned.

Something like this:

```go
for i := 0; i < zs.Len(); i++ {
	wg.Add(1)
	go func() {
		zs.SetVec(i, sigmoid(zs.AtVec(i)))
		wg.Done()
	}()
}
wg.Wait()
```

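But before Go 1.22 this alone <strong>won't</strong> work: the function literal (closure) refers to the loop variable i, which is shared by all iterations and modified concurrently by the loop, so each function literal should work with its own copy. (Since Go 1.22 each iteration gets its own i, but passing it explicitly works with every version.) E.g.: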

```go
for i := 0; i < zs.Len(); i++ {
	wg.Add(1)
	// Pass i as an argument so each goroutine gets its own copy.
	go func(i int) {
		zs.SetVec(i, sigmoid(zs.AtVec(i)))
		wg.Done()
	}(i)
}
wg.Wait()
```

Also note that goroutines, although lightweight, do have overhead. If the work each one does is "small", the overhead may outweigh the performance gain of utilizing multiple cores/threads, and overall you might not gain anything by executing such small tasks concurrently (hell, you may even do worse than without goroutines). Measure.
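One rough way to act on "Measure" is to time the serial loop against one goroutine per element on the same input. This is a sketch under assumptions: it works on a plain []float64 rather than a gonum vector, the input size of 100,000 is arbitrary, and a single time.Since measurement is noisy; for trustworthy numbers, write Benchmark functions and run go test -bench. No particular result is implied; the timings depend on the machine.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

func sigmoid(z float64) float64 { return 1.0 / (1.0 + math.Exp(-z)) }

// serial applies sigmoid in place on a single goroutine.
func serial(xs []float64) {
	for i, x := range xs {
		xs[i] = sigmoid(x)
	}
}

// perElement starts one goroutine per element, as in the answer.
func perElement(xs []float64) {
	var wg sync.WaitGroup
	for i := range xs {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			xs[i] = sigmoid(xs[i])
		}(i)
	}
	wg.Wait()
}

// timeIt runs f on a fresh copy of input and returns the result and elapsed time.
func timeIt(f func([]float64), input []float64) ([]float64, time.Duration) {
	xs := append([]float64(nil), input...)
	start := time.Now()
	f(xs)
	return xs, time.Since(start)
}

// equal reports whether two slices hold exactly the same values.
func equal(a, b []float64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	input := make([]float64, 100000)
	for i := range input {
		input[i] = float64(i%200)/10 - 10 // values in [-10, 9.9]
	}
	a, tSerial := timeIt(serial, input)
	b, tPer := timeIt(perElement, input)
	fmt.Println("results equal:", equal(a, b))
	fmt.Println("serial:", tSerial, "goroutine per element:", tPer)
}
```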

Also, since you are using goroutines to do minimal work, you may improve performance by not "throwing away" goroutines once they're done with their "tiny" work, but "reusing" them instead. See the related question: <a href="https://stackoverflow.com/questions/38170852/is-this-an-idiomatic-worker-thread-pool-in-go/38172204#38172204" rel="nofollow">Is this an idiomatic worker thread pool in Go?</a>
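A simple alternative to both one goroutine per element and a channel-fed pool is static chunking: start about one goroutine per CPU (runtime.NumCPU()) and give each a contiguous part of the data, so the goroutine overhead is paid a handful of times instead of once per element. This is a swapped-in technique, not the pool from the linked answer; applyChunked is a hypothetical helper name, and the sketch works on a plain []float64 rather than a *mat.VecDense.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"sync"
)

func sigmoid(z float64) float64 { return 1.0 / (1.0 + math.Exp(-z)) }

// applyChunked applies f in place to xs using at most workers goroutines,
// each handling one contiguous chunk of the slice.
func applyChunked(xs []float64, f func(float64) float64, workers int) {
	if workers < 1 {
		workers = 1
	}
	n := len(xs)
	chunk := (n + workers - 1) / workers // ceiling division; 0 only if n == 0
	var wg sync.WaitGroup
	for start := 0; start < n; start += chunk {
		end := start + chunk
		if end > n {
			end = n
		}
		wg.Add(1)
		// Each goroutine gets its own disjoint subslice, so no element is shared.
		go func(part []float64) {
			defer wg.Done()
			for i, x := range part {
				part[i] = f(x)
			}
		}(xs[start:end])
	}
	wg.Wait()
}

func main() {
	xs := []float64{-2, -1, 0, 1, 2}
	applyChunked(xs, sigmoid, runtime.NumCPU())
	fmt.Println(xs)
}
```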
