68451

Sieve of Eratosthenes Speed Comparison: Python vs Julia

Question:

So I have a little sieve of Eratosthenes function written in both Python and Julia, and I'm comparing run times.

Here is the Python code:

import time def get_primes(n): numbers = set(range(n, 1, -1)) primes = [] while numbers: p = numbers.pop() primes.append(p) numbers.difference_update(set(range(p*2, n+1, p))) return primes start = time.time() get_primes(10000) print time.time() - start

And here is the Julia code:

function get_primes(n) numbers = [2:n] primes = Int[] while numbers != [] p = numbers[1] push!(primes, p) numbers = setdiff(numbers, [p*i for i=1:int(n/p)]) end return primes end @time get_primes(10000);

The Python code runs in about .005 seconds, while the Julia code takes about .5 seconds, so that means Python runs about 100x times faster. There's probably a perfectly logical reason for this, but I really have no idea what I'm doing.

Answer1:

The main difference is that in Python you're allocating a single set, number, and then modifying it in place, while in Julia, you're allocating a new array on every iteration of the loop. Another difference is that you're using a set in Python and an array in Julia (what Python calls a "list"). Let's change the Julia code to eliminate these two differences:

function get_primes(n) numbers = IntSet(2:n) primes = Int[] while !isempty(numbers) p = shift!(numbers) push!(primes, p) setdiff!(numbers, p:p:n) end return primes end

With this change, on my system, the Julia code is 10x faster than the Python code (0.0005 vs. 0.0048 seconds). Note that I also used a range as the second argument of the setdiff! – it's both terser and faster (1.5x) than constructing an array with a comprehension.

A somewhat more idiomatic style of writing this in Julia would be to use an array of booleans, turning them on and off:

function eratosthenes(n) primes = fill(true,n) primes[1] = false for p = 2:n primes[p] || continue for i = 2:div(n,p) primes[p*i] = false end end find(primes) end

The find at the end returns the indices where a vector is non-zero (i.e. true). On my machine, this is 5x faster (0.0001 seconds) than the other Julia version and 45x faster than the Python version.

Answer2:

<h3>Trying to use setdiff! instead of setdiff</h3>

With this code

function get_primes(n) numbers::Set{Int} = Set(2:n) primes::Array{Int64,1} = [] while !isempty(numbers) p = minimum(numbers) push!(primes,p); setdiff!(numbers,Set(p:p:n)) end return primes end

I got

julia> @time get_primes(10000); elapsed time: 0.029556727 seconds (1656448 bytes allocated)

The other (original) version is indeed bad because it spends most of the time in setdiff re-hashing after the calculation -- my timings for the unaltered version were similar to OP's. So it looks like <strong>setdiff!</strong> is perhaps 100x faster than <strong>setdiff</strong>, but accepts only <strong><em>Sets</em></strong> not <strong><em>Arrays</em></strong>.

This is still 6x slower than python but 13x faster than when using <strong>setdiff</strong>. However, if there were some way to maintain an ordered set and always take the first element, then it would likely be much faster because almost 90% of the time (209/235) is being spent finding the minimum of a set.

235 task.jl; anonymous; line: 96 235 REPL.jl; eval_user_input; line: 54 235 profile.jl; anonymous; line: 14 209 /Users/jeffw/src/errata/julia/sive.jl; get_primes; line: 5 2 reduce.jl; mapfoldl; line: 75 2 dict.jl; skip_deleted; line: 669 207 reduce.jl; mapfoldl; line: 81 1 reduce.jl; mapfoldl_impl; line: 54 1 dict.jl; skip_deleted; line: 670 199 reduce.jl; mapfoldl_impl; line: 57 10 dict.jl; skip_deleted; line: 668 132 dict.jl; skip_deleted; line: 669 12 dict.jl; skip_deleted; line: 670 27 dict.jl; skip_deleted; line: 672 7 reduce.jl; mapfoldl_impl; line: 58 1 /Users/jeffw/src/errata/julia/sive.jl; get_primes; line: 6 25 /Users/jeffw/src/errata/julia/sive.jl; get_primes; line: 7 14 set.jl; setdiff!; line: 24 1 dict.jl; skip_deleted; line: 669 <h3>Update changing to using IntSet and shift!</h3> function get_primes(n) numbers::IntSet = IntSet(2:n) primes::Array{Int64,1} = [] while !isempty(numbers) p = shift!(numbers) push!(primes,p); setdiff!(numbers,Set(p:p:n)) end return primes end julia> @time get_primes(10000); elapsed time: 0.003691856 seconds (1463152 bytes allocated)

Answer3:

I tested many approaches and I have found this one as the fastest in 8 different tests.

# Julia 0.4.0 [Execution time = 95us (After warm up!!)] function get_primes(n::Int64) numbers = fill(true,n) numbers[1] = false for i = 2:isqrt(n) if numbers[i] for j = i:div(n,i) numbers[i*j] = false end end end primes = find(numbers) # t=3-5us return primes end @time primes = get_primes(10000) println(get_primes(100))

First Julia code on this page calculated n=10000 in about <strong>1'000'000us</strong> on my PC, while this one took about <strong>95us</strong> which is 10'000x faster. <br />

# Python 3.4 [Execution time = 5ms (Every time)] def get_primes(n): m = n+1 numbers = [True for i in range(m)] for i in range(2, int(math.sqrt(n))): if numbers[i]: for j in range(i*i, m, i): numbers[j] = False primes = [] for i in range(2, m): if numbers[i]: primes.append(i) return primes start = time.time() primes = get_primes(10000) print(time.time() - start) print(get_primes(100))

Python test

# Python n=100e6 [Execution time 52.906s] start = time.time() get_primes(int(100e6)) print(time.time() - start)

Julia test

# Julia n=100e6 [Execution time 3.694s] @time get_primes(convert(Int64,100e6))

The difference is now less. <strong>Julia vs Python is about 12x faster.</strong>

Recommend

  • Form not working in Internet Explorer
  • Could not load file or assembly 'CefSharp.Wpf for both x64 and x86; only one works
  • Script to export layer coordinates to excel
  • C++ To call member function in for_each for items in the member container
  • How do you reference a field in the Embedded Code of an SSRS report
  • Flutter ThemeData is not working for the Text
  • Remove all specific value from array
  • Android getting path to image from Thumbnail?
  • How select elements between interval
  • Running the Python Code on Hadoop Failed
  • How to use aggregrate in mongodb to $match _id
  • qtextedit change font
  • SQL Server - Find records that have appeared 3 times in last 30 days
  • controlling length of the line of best fit in R
  • How to use symmetricDS in Embedded mode
  • tensorflow-lite - using tflite Interpreter to get an image in the output
  • saving and retrieving key-value pairs in an sqlite database android
  • How do I remove \\r\\n from string in C#?
  • Is Win2D yet available in C++/WinRt?
  • spark streaming write data to Hbase with python blocked on saveAsNewAPIHadoopDataset
  • how to set value in array form with angular
  • How to parse Response xml in JMeter and send the result as dynamic parameters to another http reques
  • WP Job Manager: Creating a custom job search form
  • What happens when one of the Kafka replicas is down
  • How can I extend a Retrofit 2.0 Call?
  • How do decorators in es6 work?
  • Cannot get Django 1.7 Migrations to detect proper changes to my DB.
  • git clone, upload-pack out of memory
  • Ruby on Rails: Get mediaplayer information (iTunes, TRAKTOR, Cog; current song + playlist)
  • Possible to set default CloudKit container not based on application name?
  • concise way of flattening multiindex columns
  • gnuplot - How to make zmin equal to zmax keeeping autoscale on z axis
  • Python 3x- Compression Makes File Bigger :(
  • Change cell value based on cell color in google spreadsheet
  • Bad automatic Triangulation with Mayavi for coloring a surface known only by its corner