I have a table that contains a list of records. Each iteration, a random set of these must be picked, starting from a specific offset. Each row has a chance to be picked (so e.g. new or not often picked rows are picked more).
However, <em>something</em> doesn't work, causing rows to be returned that do not satisfy a condition using an aliased
I'm attempting to use the following query:
select id, probability, rand() rolledChance from records where id > :offset having rolledChance < probability;
:offset is a prepared statement parameter, and is the last scanned id in the last iteration for this user.
On a table created like this (which is the relevant subset of the table):
CREATE TABLE records (id INT, probability FLOAT);
Where probability is a value between 0 and 1 on the table
records. However, this returns rows where the condition does not satisfy. I checked this with the following query:
select *, x.rolledChance < x.probability shouldPick from (select id, probability, rand() rolledChance from records having rolledChance < probability ) x;
A few rows returned are:
id probability rolledChance shouldPick 12 0.546358 0.015139976530466207 1 26 0.877424 0.9730734508233829 0 46 0.954425 0.35213605347288407 1
When I repurpose the second query as follows, it works as expected, and only returns rows where
rolledChance is actually lower than
select *, x.rolledChance < x.probability shouldPick from (select id, probability, rand() rolledChance from records) x where rolledChance < probability;
So what am I missing? Are the
rolledChance used differently than I thought in the comparison? Is the
rand() evaluated every time the alias is used in the same query?
mysql Ver 15.1 Distrib 10.0.28-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2, running on Debian Jessie.
I think the problem is that HAVING is applied after GROUP BY, but still before the SELECT phase. I realise it's confusing because the HAVING clause references a column from the SELECT statement, but I think it basically just executes whatever is in the SELECT statement twice - once for the having, and then again for the SELECT.
Eg, see <a href="https://stackoverflow.com/a/14123694/749702" rel="nofollow">this answer</a>.
Note, it's especially confusing because if you refer to a column name that doesn't appear in the SELECT statement in a HAVING clause it'll throw an error.
Eg, <a href="http://sqlfiddle.com/#!9/6cc60/6" rel="nofollow">this fiddle</a>
But as per that fiddle above, it'll still let you actually filter based on the result of a function that doesn't appear in the output. Long story short, the HAVING clause is still doing what you want, but you can't both filter on a random value and display it at the same time using that approach. If you need to do that, you need to use a subquery to fix the value first, then the outer query can filter and display on it.
Also, to make it clear, it's probably worth just using RAND() in the having clause, not the SQL part. Though I get that this question is asking <em>why</em> it's doing this rather than trying to solve the problem specifically.