4872

Create ranking for vector of double

I have a vector with doubles which I want to rank (actually it's a vector with objects with a double member called costs). If there are only unique values or I ignore the nonunique values then there is no problem. However, I want to use the average rank for nonunique values. Furthermore, I have found some question at SO about ranks, however they ignore the non-unique values.

Example, say we have (1, 5, 4, 5, 5) then the corresponding ranks should be (1, 4, 2, 4, 4). When we ignore the non-unique values the ranks are (1, 3, 2, 4, 5).

When ignoring the nonunique values I used the following:

void Population::create_ranks_costs(vector<Solution> &pop) { size_t const n = pop.size(); // Create an index vector vector<size_t> index(n); iota(begin(index), end(index), 0); sort(begin(index), end(index), [&pop] (size_t idx, size_t idy) { return pop[idx].costs() < pop[idy].costs(); }); // Store the result in the corresponding solutions for (size_t idx = 0; idx < n; ++idx) pop[index[idx]].set_rank_costs(idx + 1); }

Does anyone know how to take the non-unique values into account? I prefer using std::algorithm since IMO this lead to clean code.

Answer1:

One way to do so would be using a multimap.

    <li>

    Place the items in a multimap mapping your objects to size_ts (the intial values are unimportant). You can do this with one line (use the ctor that takes iterators).

    </li> <li>

    Loop (either plainly or using whatever from algorithm) and assign 0, 1, ... as the values.

    </li> <li>

    Loop over the distinct keys. For each distinct key, call equal_range for the key, and set its values to the average (again, you can use stuff from algorithm for this).

    </li> </ul>

    The overall complexity should be Theta(n log(n)), where n is the length of the vector.

    Answer2:

    Here is a routine for vectors as the title of the question suggests:

    template<typename Vector> std::vector<double> rank(const Vector& v) { std::vector<std::size_t> w(v.size()); std::iota(begin(w), end(w), 0); std::sort(begin(w), end(w), [&v](std::size_t i, std::size_t j) { return v[i] < v[j]; }); std::vector<double> r(w.size()); for (std::size_t n, i = 0; i < w.size(); i += n) { n = 1; while (i + n < w.size() && v[w[i]] == v[w[i+n]]) ++n; for (std::size_t k = 0; k < n; ++k) { r[w[i+k]] = i + (n + 1) / 2.0; // average rank of n tied values // r[w[i+k]] = i + 1; // min // r[w[i+k]] = i + n; // max // r[w[i+k]] = i + k + 1; // random order } } return r; }

    A working example see on IDEone.

    For ranks with tied (equal) values there are varying conventions (min, max, averaged rank, or random order). Choose one of these in the innermost for loop (averaged rank is common in statistics, min rank in sports).

    Please take into account, that averaged ranks can be non-integral (n+0.5). I don't know, if rounding down to integral rank n is a problem for your application.

    The algorithm easily could be generalized for user-defined orderings like pop[i].costs(), with std::less<> as default.

    Answer3:

    Something along these lines:

    size_t run_start = 0; double run_cost = pop[index[0]].costs(); for (size_t idx = 1; idx <= n; ++idx) { double new_cost = idx < n ? pop[index[idx]].costs() : 0; if (idx == n || new_cost != run_cost) { double avg_rank = (run_start + 1 + idx) / 2.0; for (size_t j = run_start; j < idx; ++j) { pop[index[j]].set_rank_costs(avg_rank); } run_start = idx; run_cost = new_cost; } }

    Basically, you iterate over the sorted sequence and identify runs of equal values (possibly runs of length 1). For each such run, you calculate its average rank, and set it for all elements in the run.

Recommend

  • accessing eigenvalues in RSSA package in R
  • java convert to int
  • picking PDF files using Intent in android
  • How to add a column to a DataFrame based on a multi-index map
  • select option by text using jquery
  • How to detect when a view inside HorizontalScrollView touches another view?
  • How to replace numbers with order in (python) list
  • How to rename file with a sequence that restarts if certain matches exist
  • Formula to remove entire words that start with certain characters
  • How to call firefox addon function with onclick in html
  • How do I create closures for model getter-setter in angular?
  • Shortest route between multiple points in mapkit for iphone app
  • uninitialized AudioTrack exception when I try to generate tone on 22nd time
  • Android RecyclerView Blank Space
  • MATLAB quick find of a vector in matrix
  • maven-shade-plugin reports: Error creating shaded jar: …target/classes (Is a directory)
  • opengl: adding higher resolution mipmaps to a texture
  • Weird session behaviour in codeigniter
  • Variant from android-autofittextview library : scaling makes strange behaviour
  • twisted.internet.error.ConnectError when run scrapy spider
  • Using multiple input pipelines in TensorFlow
  • Is there a equivalent to JSON.Net in Java? [duplicate]
  • Getting IIS6 to play nice with WordPress Pretty Permalinks
  • end daemon processes with multiprocessing module
  • Generating anchors with PyYAML.dump()?
  • Regex for nested values
  • How gzip file gets stored in HDFS
  • runtime-check whether an instance (Base*) override a parent function (Base::f())
  • in batch how do i use taskkill properly
  • Local Development, Apache vs Developer - file permissions
  • Configure Spring's MappingJacksonHttpMessageConverter
  • Building Qt project for C++11 standard
  • C++ friend class std::vector
  • Error in installing package: fatal error: stdlib.h: no such file or directory
  • How to autopopulate a field in SugarCRM form
  • Android app gives error “BatteryStatsImpl: reading network stats”
  • Time complexity of a program which involves multiple variables
  • print() is showing quotation marks in results
  • one Local Olampyad Questions on Informatic in 2011
  • Java Scanner input dilemma. Automatically inputs without allowing user to type