4525

Collect Lowest Numbers Algorithm

Question:

I'm looking for an algorithm (or PHP code, I suppose) to end up with the 10 lowest numbers from a group of numbers. I was thinking of making a ten item array, checking to see if the current number is lower than one of the numbers in the array, and if so, finding the highest number in the array and replacing it with the current number.

However, I'm planning on finding the lowest 10 numbers from thousands, and was thinking there might be a faster way to do it. I plan on implementing this in PHP, so any native PHP functions are usable.

Answer1:

What you're looking for is called a <strong>selection algorithm</strong>. The Wikipedia page on the subject has a few subsections in the <a href="http://en.wikipedia.org/wiki/Selection_algorithm#Selecting_k_smallest_or_largest_elements" rel="nofollow">selecting <em>k</em> smallest or largest elements</a> section. When the list is large enough, you can <em>beat</em> the time required for the naive "sort the whole list and choose the first 10" algorithm.

Answer2:

Sort the array and use the ten first/last entries.

Honestly: sorting an array with a thousand entries costs less time than it takes you to blink.

Answer3:

Naive approach is to just sort the input. It's likely fast enough, so just try it and profile it before doing anything more complicated.

Potentially faster approach: Linearly search the input, but keep the output array sorted to make it easier to determine if the next input belongs in the array or not. Pseudocode:

output[0-9] = input[0-9]; sort(output); for i=10..n-1 if input[i] < output[9] insert(input[i])

where insert(x) will find the right spot (binary search) and do the appropriate shifting.

But seriously, just try the naive approach first.

Answer4:

Where are you getting this group of numbers?

If your List of numbers is already in an array you could simply do a <a href="http://us.php.net/manual/en/function.sort.php" rel="nofollow">sort()</a>, and then a <a href="http://us.php.net/manual/en/function.array-slice.php" rel="nofollow">array_slice()</a> to get the first 10.

Answer5:

I doesn't matter much for a small array, but as it gets larger a fast and easy way to increase processing speed is to take advantage of array key indexing, which for 1 mill. rows will use about 40% of the time. Example:

// sorting array values $numbers = array(); for($i = 0; $i < 1000000; ++$i) { $numbers[$i] = rand(1, 999999); } $start = microtime(true); sort($numbers); $res = array_slice($numbers, 0, 10, true); echo microtime(true) - $start . "\n"; // 2.6612658500671 print_r($res); unset($numbers, $res, $start); // sorting array keys $numbers = array(); for($i = 0; $i < 1000000; ++$i) { $numbers[rand(1, 999999)] = $i; } $start = microtime(true); ksort($numbers); $res = array_keys(array_slice($numbers, 0, 10, true)); echo microtime(true) - $start . "\n"; // 0.9651210308075 print_r($res);

But if the array data is from a database the fastest is probably to just sort it there:

SELECT number_column FROM table_with_numbers ORDER BY number_column LIMIT 10

Answer6:

Create a sorted set (TreeSet in Java, I don't know about PHP), and add the first 10 numbers. Now iterate over the rest of the numbers Iterate over all your numbers, add the new one, then remove the biggest number from the set.

This algorithm is O(n) if n >> 10.

Answer7:

I would use a <a href="http://en.wikipedia.org/wiki/Heap_%28data_structure%29" rel="nofollow">heap</a> with 10 elements and the highest number at the root of the tree. Then start at the beginning of the list of numbers:

<ul><li>If the heap has less than 10 elements: add the number to the list</li> <li>Otherwise, if the number is smaller than the highest number in the heap, remove the highest number in the heap, and then add the current number to the list</li> <li>Otherwise, ignore it.</li> </ul>

You will end up with the 10 lowest numbers in the heap. If you are using an array as the heap data structure, then you can just use the array directly.

(alternatively: you can slice out the first 10 elements, and heapify them instead of using the first step above, which will be slightly faster).

However, as other people have noted, for 1000 elements, just sort the list and take the first 10 elements.

Recommend

  • Is there better way than a Dijkstra algorithm for finding fastest path that do not exceed specified
  • Formatting numbers from 1000 to 10.00
  • Using the Menu, Back, Home buttons with AIR for Android, Flash CS6
  • Use of Java generics could be hanging the compiler
  • When is it better to use polling instead of the channel api?
  • SciKit One-class SVM classifier training time increases exponentially with size of training data
  • Proper handling of chuncked Http Response within Socket
  • Using bootstrap-datetime picker with Vuejs 2
  • How to (re)name an empty column header in a pandas dataframe without exporting to csv
  • looking for a slight variant of GROUP BY
  • VBA: Extract Top 'x' Entries from each category
  • How to display callstack line numbers when my program is broken in Rust?
  • Should I optimize around reads or CPU time in Google App Engine
  • SSRS 2008 - Sorting within a group
  • Spring Web Flow exception handling
  • Why not Factory pattern for sorting? [closed]
  • How to use ResourceDictionary in Windows Phone class library project
  • How to select table rows/complete table?
  • Wait for .each() .getJSON request to finish before executing a callback
  • SQL query to group by maximal sets of a column having inner consecutive distances below a threshold
  • How can I get the choice “H2” back in the H2 consol?
  • SyntaxError: (irb):26: both block arg and actual block given
  • Azure webjobs output logs indexing taking very long
  • How do I shift the decimal place in Python?
  • Can't remove headers after they are sent
  • How do I display a dialog that asks the user multi-choice questıon using tkInter?
  • ADO and msqli connections very slow
  • Marklogic : Query response time is very high
  • How can I sort a a table with VBA with given text condition?
  • How to use remove-erase idiom for removing empty vectors in a vector?
  • Is it possible to access block's scope in method?
  • Django: Count of Group Elements
  • Scrapy recursive link crawler
  • How to rebase a series of branches?
  • Xamarin Forms - UWP Fonts
  • Turn off referential integrity in Derby? is it possible?
  • Add sale price programmatically to product variations
  • Does armcc optimizes non-volatile variables with -O0?
  • Unable to use reactive element in my shiny app
  • How do I use LINQ to get all the Items that have a particular SubItem?