# Create ranking for vector of doubles

I have a vector of doubles that I want to rank (actually it's a vector of objects with a double member called `costs`). If there are only unique values, or if I ignore the non-unique values, there is no problem. However, I want to use the average rank for non-unique values. I have found some questions on SO about ranks, but they ignore the non-unique values.

For example, given (1, 5, 4, 5, 5), the corresponding ranks should be (1, 4, 2, 4, 4). When we ignore the non-unique values, the ranks are (1, 3, 2, 4, 5).

To rank while ignoring the non-unique values, I used the following:

```cpp
void Population::create_ranks_costs(vector<Solution> &pop)
{
    size_t const n = pop.size();

    // Create an index vector
    vector<size_t> index(n);
    iota(begin(index), end(index), 0);

    sort(begin(index), end(index),
         [&pop] (size_t idx, size_t idy) {
             return pop[idx].costs() < pop[idy].costs();
         });

    // Store the result in the corresponding solutions
    for (size_t idx = 0; idx < n; ++idx)
        pop[index[idx]].set_rank_costs(idx + 1);
}
```

Does anyone know how to take the non-unique values into account? I prefer using algorithms from `<algorithm>`, since IMO this leads to clean code.

One way to do this is to use a `multimap`:

1. Place the items in a `multimap` that maps your objects (keyed by cost) to rank values; the initial values are unimportant. If you already have a range of key/value pairs, this takes one line using the constructor that takes an iterator range. Use `double` rather than `size_t` as the mapped type if averages such as 2.5 must not be truncated.
2. Loop over the multimap (either with a plain loop or with something from `<algorithm>`) and assign 0, 1, ... as the values. Because a multimap keeps its keys sorted, these are the sorted positions; add 1 for 1-based ranks.
3. Loop over the distinct keys. For each distinct key, call `equal_range` for that key and set its values to their average (again, `<algorithm>` can help here).

The overall complexity should be Theta(n log(n)), where n is the length of the vector.
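A minimal sketch of the three steps above, assuming the costs have already been copied into a `std::vector<double>` and that 1-based ranks are wanted. The function name `rank_with_multimap` is hypothetical. It stores `double` as the mapped type so that averages like 2.5 are kept exactly, and it fills the multimap with a loop rather than the iterator-range constructor, since a plain `vector<double>` does not provide key/value pairs.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: average ranks (1-based) computed via std::multimap.
std::vector<double> rank_with_multimap(const std::vector<double>& v)
{
    // Step 1: key = value, mapped = rank placeholder (initial value unimportant)
    std::multimap<double, double> m;
    for (double x : v) m.emplace(x, 0.0);

    // Step 2: keys are visited in sorted order, so assign positions 1, 2, ...
    double pos = 1.0;
    for (auto& kv : m) kv.second = pos++;

    // Step 3: for each distinct key, replace its positions by their average
    for (auto it = m.begin(); it != m.end(); ) {
        auto range = m.equal_range(it->first);
        double sum = 0.0;
        std::size_t count = 0;
        for (auto j = range.first; j != range.second; ++j) { sum += j->second; ++count; }
        for (auto j = range.first; j != range.second; ++j) j->second = sum / count;
        it = range.second; // jump to the next distinct key
    }

    // Map the averaged ranks back to the original positions
    std::vector<double> r;
    r.reserve(v.size());
    for (double x : v) r.push_back(m.find(x)->second);
    return r;
}
```

With the question's input, `rank_with_multimap({1, 5, 4, 5, 5})` gives `{1, 4, 2, 4, 4}`. Like the question's code, ties are detected by exact `double` equality.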

Here is a routine for vectors, as the question's title suggests:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

template<typename Vector>
std::vector<double> rank(const Vector& v)
{
    // w holds the indices of v, sorted by value
    std::vector<std::size_t> w(v.size());
    std::iota(begin(w), end(w), 0);
    std::sort(begin(w), end(w),
        [&v](std::size_t i, std::size_t j) { return v[i] < v[j]; });

    std::vector<double> r(w.size());
    for (std::size_t n = 0, i = 0; i < w.size(); i += n)
    {
        // n = length of the run of tied values starting at sorted position i
        n = 1;
        while (i + n < w.size() && v[w[i]] == v[w[i+n]]) ++n;
        for (std::size_t k = 0; k < n; ++k)
        {
            r[w[i+k]] = i + (n + 1) / 2.0; // average rank of n tied values
        //  r[w[i+k]] = i + 1;             // min
        //  r[w[i+k]] = i + n;             // max
        //  r[w[i+k]] = i + k + 1;         // random order
        }
    }
    return r;
}
```

A working example is available on Ideone.

For ranks with tied (equal) values there are varying conventions (min, max, averaged rank, or random order). Choose one of these in the innermost for loop (averaged rank is common in statistics, min rank in sports).
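To compare the conventions side by side, here is a sketch that turns the choice into a runtime parameter instead of commented-out lines. The enum `Tie` and the function `rank_ties` are hypothetical names; the run detection is the same as in `rank` above.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical enum naming the tie conventions from the commented lines in rank()
enum class Tie { Average, Min, Max, Sequential };

std::vector<double> rank_ties(const std::vector<double>& v, Tie tie)
{
    std::vector<std::size_t> w(v.size());
    std::iota(w.begin(), w.end(), 0);
    std::sort(w.begin(), w.end(),
              [&v](std::size_t i, std::size_t j) { return v[i] < v[j]; });

    std::vector<double> r(v.size());
    for (std::size_t i = 0, n = 0; i < w.size(); i += n) {
        n = 1; // length of the run of ties starting at sorted position i
        while (i + n < w.size() && v[w[i]] == v[w[i + n]]) ++n;
        for (std::size_t k = 0; k < n; ++k) {
            switch (tie) {
            case Tie::Average:    r[w[i + k]] = i + (n + 1) / 2.0; break;
            case Tie::Min:        r[w[i + k]] = i + 1.0;           break;
            case Tie::Max:        r[w[i + k]] = i + double(n);     break;
            case Tie::Sequential: r[w[i + k]] = i + k + 1.0;       break;
            }
        }
    }
    return r;
}
```

For the question's input (1, 5, 4, 5, 5), `Tie::Average` gives (1, 4, 2, 4, 4), `Tie::Min` gives (1, 3, 2, 3, 3) and `Tie::Max` gives (1, 5, 2, 5, 5). With `Tie::Sequential` the three 5s receive 3, 4 and 5 in an unspecified order, because `std::sort` is not stable; `std::stable_sort` would keep ties in input order.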

Note that averaged ranks can be non-integral (`n + 0.5`). I don't know whether rounding down to the integral rank `n` is a problem for your application.

The algorithm could easily be generalized to user-defined orderings such as `pop[i].costs()`, with `std::less<>` as the default.
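A sketch of that generalization, assuming C++14 (for `std::less<>`). The names `rank_by` and its `key` projection parameter are hypothetical. Because the index vector is sorted with `comp`, two neighbouring elements tie exactly when `comp(a, b)` is false.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Sketch: average ranks for any random-access container, ordered by key(x)
// and compared with comp (std::less<> by default).
template <typename Container, typename Key, typename Compare = std::less<>>
std::vector<double> rank_by(const Container& c, Key key, Compare comp = Compare{})
{
    std::vector<std::size_t> w(c.size());
    std::iota(w.begin(), w.end(), 0);
    std::sort(w.begin(), w.end(), [&](std::size_t i, std::size_t j) {
        return comp(key(c[i]), key(c[j]));
    });

    std::vector<double> r(w.size());
    for (std::size_t i = 0, n = 0; i < w.size(); i += n) {
        n = 1;
        // Extend the run while the next key is equivalent under comp
        while (i + n < w.size() && !comp(key(c[w[i]]), key(c[w[i + n]]))) ++n;
        for (std::size_t k = 0; k < n; ++k) r[w[i + k]] = i + (n + 1) / 2.0;
    }
    return r;
}
```

For the question's case this could be called as `rank_by(pop, [](const Solution& s) { return s.costs(); })`, assuming `costs()` is a `const` member function, and the results written back with `set_rank_costs`. Passing `std::greater<>{}` as the third argument ranks in descending order.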

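Alternatively, keep the question's code and replace its final loop with a single pass over the sorted `index` that gives each run of equal costs its average rank:

```cpp
// Replaces the final loop of create_ranks_costs; index is already sorted by cost.
if (n > 0) {
    size_t run_start = 0;                    // first sorted position of the current run
    double run_cost = pop[index[0]].costs();
    for (size_t idx = 1; idx <= n; ++idx) {
        double new_cost = idx < n ? pop[index[idx]].costs() : 0;
        // A run ends at the end of the vector or when the cost changes
        if (idx == n || new_cost != run_cost) {
            // Positions run_start..idx-1 have ranks run_start+1..idx; take their mean
            double avg_rank = (run_start + 1 + idx) / 2.0;
            for (size_t j = run_start; j < idx; ++j) {
                pop[index[j]].set_rank_costs(avg_rank); // needs to accept a double
            }
            run_start = idx;
            run_cost = new_cost;
        }
    }
}
```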
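Since the question prefers `<algorithm>`, here is a variant of the same run-based idea in which `std::upper_bound` finds the end of each run of equal costs in the already-sorted index vector. The `Solution` class is a hypothetical minimal stand-in for the asker's class, `create_ranks_costs` is written as a free function rather than a `Population` member, and `set_rank_costs` is assumed to accept a `double` so that averages such as 2.5 are stored exactly.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the question's Solution class
class Solution {
public:
    explicit Solution(double c) : costs_(c) {}
    double costs() const { return costs_; }
    double rank_costs() const { return rank_; }
    void set_rank_costs(double r) { rank_ = r; }
private:
    double costs_;
    double rank_ = 0.0;
};

// Assign average (1-based) ranks by cost; std::upper_bound locates each run of ties
void create_ranks_costs(std::vector<Solution>& pop)
{
    std::vector<std::size_t> index(pop.size());
    std::iota(index.begin(), index.end(), 0);
    auto by_cost = [&pop](std::size_t a, std::size_t b) {
        return pop[a].costs() < pop[b].costs();
    };
    std::sort(index.begin(), index.end(), by_cost);

    for (auto first = index.begin(); first != index.end(); ) {
        // One past the last index whose cost equals the cost at *first
        // (valid because index is sorted with the same comparator)
        std::size_t head = *first;
        auto last = std::upper_bound(first, index.end(), head, by_cost);
        // Sorted positions p .. q-1 get ranks p+1 .. q; their mean is (p + 1 + q) / 2
        double p = static_cast<double>(first - index.begin());
        double q = static_cast<double>(last - index.begin());
        double avg_rank = (p + 1.0 + q) / 2.0;
        std::for_each(first, last,
                      [&](std::size_t i) { pop[i].set_rank_costs(avg_rank); });
        first = last;
    }
}
```

Using `std::upper_bound` costs O(log n) per run instead of a linear scan, so the overall complexity stays O(n log n), dominated by the sort.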