Using `rand()` with `having`

Question:

I have a table that contains a list of records. Each iteration, a random set of these must be picked, starting from a specific offset. Each row has a chance to be picked (so e.g. new or not often picked rows are picked more).

However, <em>something</em> doesn't work, causing rows to be returned that do not satisfy a condition using an aliased rand().

I'm attempting to use the following query:

select id, probability, rand() rolledChance from records where id > :offset having rolledChance < probability;

Where :offset is a prepared statement parameter, and is the last scanned id in the last iteration for this user.

On a table created like this (which is the relevant subset of the table):

CREATE TABLE records (id INT, probability FLOAT);

Here probability is a value between 0 and 1. However, the query returns rows for which the condition is not satisfied. I checked this with the following query:

select *, x.rolledChance < x.probability shouldPick from (select id, probability, rand() rolledChance from records having rolledChance < probability ) x;

A few rows returned are:

id  probability  rolledChance          shouldPick
12  0.546358     0.015139976530466207  1
26  0.877424     0.9730734508233829    0
46  0.954425     0.35213605347288407   1

When I rewrite the second query as follows, moving the condition to the outer query, it works as expected and only returns rows where rolledChance is actually lower than probability:

select *, x.rolledChance < x.probability shouldPick from (select id, probability, rand() rolledChance from records) x where rolledChance < probability;

So what am I missing? Are the probability and rolledChance used differently than I thought in the comparison? Is the rand() evaluated every time the alias is used in the same query?

Version output: mysql Ver 15.1 Distrib 10.0.28-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2, running on Debian Jessie.

Answer1:

I think the problem is that HAVING is applied after GROUP BY, but still before the SELECT phase. I realise that's confusing, because the HAVING clause references an alias from the SELECT list, but I think the server basically evaluates the aliased expression twice: once for the HAVING filter, and then again for the SELECT output. With rand() that means two independent rolls, so the value you see is not the value that was tested.

Eg, see <a href="https://stackoverflow.com/a/14123694/749702" rel="nofollow">this answer</a>.
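If that explanation is right, the query from the question behaves roughly as if the alias in HAVING had been replaced by its expression. The following is only a conceptual sketch of that reading, not something taken from the server's documentation or query plan:

```sql
-- Conceptual rewrite only: how the server appears to treat the alias.
SELECT id,
       probability,
       RAND() AS rolledChance      -- roll #2: the value that is displayed
FROM records
WHERE id > :offset
HAVING RAND() < probability;       -- roll #1: the value that decides the filter
```

Read this way, a row such as id 26 in the question can pass the filter with one roll and then display a different, larger roll.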

Note, it's especially confusing because if a HAVING clause refers to a column name that doesn't appear in the SELECT list, it'll throw an error.

Eg, <a href="http://sqlfiddle.com/#!9/6cc60/6" rel="nofollow">this fiddle</a>

But as that fiddle shows, it'll still let you filter on the result of a function that doesn't appear in the output. Long story short, the HAVING clause is still doing what you want: rows are filtered on a random roll against probability. What you can't do with that approach is filter on a random value and display that same value. If you need both, use a subquery to fix the value first; the outer query can then filter on it and display it.
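A minimal sketch of the subquery approach, combining the working query from the question with its :offset parameter. The derived-table alias x comes from the question; that this form keeps a single roll per row is what the asker observed on MariaDB 10.0.28, and it has not been checked here against other versions or optimizer settings:

```sql
-- The inner query rolls once per row and fixes the value;
-- the outer query filters on that stored value and displays it.
SELECT x.id,
       x.probability,
       x.rolledChance
FROM (
    SELECT id,
           probability,
           RAND() AS rolledChance
    FROM records
    WHERE id > :offset
) AS x
WHERE x.rolledChance < x.probability;
```

Every returned row should then show a rolledChance below its probability, which can be verified with the shouldPick check from the question.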

Also, to make it clear, it's probably worth just using RAND() in the filter condition and leaving it out of the SELECT list. Though I get that this question is asking <em>why</em> it's doing this rather than trying to solve the problem specifically.
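If the rolled value does not need to be shown, a sketch of that simpler form could look like this. It uses WHERE instead of HAVING, which is a swap on my part since no aggregation is involved; for a single-table query, MySQL documents RAND() in WHERE as being evaluated for each row:

```sql
-- One roll per row, used only for filtering; the roll is not returned.
SELECT id,
       probability
FROM records
WHERE id > :offset
  AND RAND() < probability;
```

The trade-off is that the roll can no longer be inspected afterwards, so this fits the production query better than the debugging query.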
