
Counting all rows with specific columns and grouping by week

I've been trying for some time to create a query that counts the rows in a table that have a certain id in one column, and then groups those counts into weekly values based on a UNIX timestamp column. I have a medium-sized dataset with 37 million rows, and have been trying to run the following kind of query:

SELECT DATE(timestamp), COUNT(*) FROM `table` WHERE ( date(timestamp) between "YYYY-MM-DD" and "YYYY-MM-DD" and column_group_id=X ) group by week(date(startdate))

However, I'm getting weird results: the query doesn't group the counts correctly and shows values that are too large in the resulting count column. (I verified the errors by querying very small, specific datasets.)

If I group by date(startdate) instead, the row counts match on a per-day basis, but I'd like to combine these daily row counts into weekly counts. How can this be done? The data is needed in this format:

2006-01-01 | 5
2006-01-08 | 10

so that the day timestamp is the first column and the second is the number of rows per week.

Answer1:

Your query is non-deterministic, so it is not surprising that you are getting unexpected results. By this I mean you could run this query on the same data 5 times and get 5 different result sets. This is because you are selecting DATE(timestamp) but grouping by WEEK(DATE(startdate)); for each startdate week the query therefore returns the timestamp of whichever row it happens to come across first, in ANY order.

Consider the following 2 rows (with timestamp in date format for ease of reading):

TimeStamp   StartDate
20120601    20120601
20120701    20120601

Your query groups by WEEK(StartDate), which is 22 for both rows (with MySQL's default week mode). Since both rows evaluate to the same value, you would expect your results to have 1 row with a count of 2.

HOWEVER, DATE(Timestamp) is also in the select list, and because it is neither aggregated nor part of the GROUP BY, the query has no rule for choosing which Timestamp to return: '20120601' or '20120701'. (Adding an ORDER BY would not change this, since sorting happens after the value for each group has been picked.) So even on this small result set you could get:

TimeStamp   COUNT
20120601    2

or just as easily:

TimeStamp   COUNT
20120701    2

If you add more data to the dataset as so:

TimeStamp   StartDate
20120601    20120601
20120701    20120601
20120701    20120701

You could get

TimeStamp   COUNT
20120601    2
20120701    1

or

TimeStamp   COUNT
20120701    2
20120701    1

You can see how with 37,000,000 rows you will soon get results that you do not expect and cannot predict!
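To make the ambiguity concrete, here is a small Python sketch (not SQL) that replays the three example rows. It groups them by the week of StartDate and lists every Timestamp that lands in each group. Note the assumptions: the ISO (year, week) pair is only a stand-in for MySQL's WEEK(), and the function name `timestamps_per_week` is made up for this illustration.

```python
from collections import defaultdict
from datetime import date

# The three example rows above as (timestamp, startdate) pairs.
rows = [
    (date(2012, 6, 1), date(2012, 6, 1)),
    (date(2012, 7, 1), date(2012, 6, 1)),
    (date(2012, 7, 1), date(2012, 7, 1)),
]


def timestamps_per_week(rows):
    """Collect every timestamp that falls into each startdate week."""
    groups = defaultdict(list)
    for ts, start in rows:
        # Assumption: ISO (year, week) stands in for MySQL's WEEK().
        iso = start.isocalendar()
        groups[(iso[0], iso[1])].append(ts)
    return dict(groups)


for week, stamps in sorted(timestamps_per_week(rows).items()):
    # The count per group is well defined; which timestamp to show is not.
    candidates = sorted(set(stamps))
    print(week, "count:", len(stamps), "candidates:", [str(c) for c in candidates])
```

The first group has a count of 2 but two different candidate timestamps, which is exactly the choice the database is left to make on its own.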

EDIT

Since it looks like you are trying to get the week start in your results while grouping by week, you could use the following to get the week start (replacing CURRENT_TIMESTAMP with whichever column you want):

SELECT DATE(DATE_ADD(CURRENT_TIMESTAMP, INTERVAL 1 - DAYOFWEEK(CURRENT_TIMESTAMP) DAY)) AS WeekStart

You can then group by this date too to get weekly results and avoid the trouble of having things in your select list that aren't in your group by.
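As a rough illustration of the week-start idea, here is a minimal Python sketch rather than the MySQL query itself. It mirrors the `1 - DAYOFWEEK(...)` offset (MySQL's DAYOFWEEK returns 1 for Sunday through 7 for Saturday) and counts rows per week start. The names `week_start` and `count_per_week` and the sample dates are hypothetical, chosen only for this example.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Sunday that starts the week containing d."""
    # MySQL DAYOFWEEK: 1 = Sunday ... 7 = Saturday.
    # Python isoweekday: 1 = Monday ... 7 = Sunday, so remap Sunday to 1.
    day_of_week = d.isoweekday() % 7 + 1
    return d + timedelta(days=1 - day_of_week)


def count_per_week(dates):
    """Count rows per week start, sorted by week start."""
    counts = Counter(week_start(d) for d in dates)
    return sorted(counts.items())


# Hypothetical sample rows (one date per row).
rows = [
    date(2006, 1, 1),   # Sunday
    date(2006, 1, 3),
    date(2006, 1, 7),   # Saturday, same week as Jan 1
    date(2006, 1, 8),   # Sunday, start of the next week
    date(2006, 1, 10),
]

for start, n in count_per_week(rows):
    print(f"{start} | {n}")
# prints:
# 2006-01-01 | 3
# 2006-01-08 | 2
```

Because the grouping key is a full date rather than a bare week number, the same week number in two different years ends up in two separate groups. In SQL, the equivalent would be to use the same week-start expression in both the select list and the GROUP BY.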

Answer2:

Try this

SELECT DATE(timestamp), COUNT(week(date(startdate))) FROM `table` WHERE ( date(timestamp) between "YYYY-MM-DD" and "YYYY-MM-DD" and column_group_id=X ) group by week(date(startdate))
