
SQL grouping similar values together

Question:

I am having trouble grouping data. I have used the GROUP BY clause before, but this time I want to group rows with similar values together. I have some jobs with associated sequence numbers. If I can group together the jobs that have more or less the same sequence numbers, I can run fewer jobs.

The data I have is like this.

JobID  Sequence
A01    8
A01    6
A01    10
A02    5
A02    10
A02    4
A02    2
A03    8
A03    3
A03    6
A03    10
A04    5
A04    4
A04    2
A04    9
A04    10

From the above data it can be seen that A02 and A04 have more in common and A01 and A03 have more in common.

What query would group similar data together as shown below, with A01 and A03 together and A02 and A04 together? The order of the values in the Sequence column is immaterial.

JobID  Sequence
A01    8
A01    6
A01    10
A03    8
A03    3
A03    6
A03    10
A02    5
A02    10
A02    4
A02    2
A04    5
A04    4
A04    2
A04    9
A04    10

Thanks for the time

david

PS - More explanation.

In the above list:

JobID A01 contains sequence (8, 6, 10)
JobID A02 contains sequence (5, 10, 4, 2)
JobID A03 contains sequence (8, 3, 6, 10)
JobID A04 contains sequence (5, 4, 2, 9, 10)

So Job A01 and Job A03 have similar sequence numbers, and Job A02 and Job A04 have similar sequence numbers. I want to group them based on similar sequence numbers. There are many other jobs whose sequences might match some other job's sequence numbers. I just included 4 jobs to keep the list small.

Answer1:

This is a more complex problem than I feel like thinking all the way through right now, but I'll give you an idea to start with and maybe someone else can help you complete it...

Join the table to itself like so:

-- A.Sequence is left out of the SELECT list because it is not in the GROUP BY
SELECT A.JobID, COUNT(*)
FROM TheTable A
JOIN TheTable B
    ON A.JobID <> B.JobID AND A.Sequence = B.Sequence
GROUP BY A.JobID

I haven't tested that so there could be typos, but you get the idea hopefully. Notice you're joining where the job is not the same, but the sequence is.
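The self-join idea can be tried end to end with the question's sample rows. This is a minimal sketch, not the answerer's tested code: it assumes SQLite (through Python's standard `sqlite3` module) in place of SQL Server, and the function name `shared_counts` is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (JobID, Sequence).
ROWS = [
    ("A01", 8), ("A01", 6), ("A01", 10),
    ("A02", 5), ("A02", 10), ("A02", 4), ("A02", 2),
    ("A03", 8), ("A03", 3), ("A03", 6), ("A03", 10),
    ("A04", 5), ("A04", 4), ("A04", 2), ("A04", 9), ("A04", 10),
]


def shared_counts(rows):
    """Return {JobID: number of rows in other jobs with a matching Sequence}."""
    con = sqlite3.connect(":memory:")  # assumption: SQLite, not SQL Server
    con.execute("CREATE TABLE TheTable (JobID TEXT, Sequence INTEGER)")
    con.executemany("INSERT INTO TheTable VALUES (?, ?)", rows)
    query = """
        SELECT A.JobID, COUNT(*) AS Matches
        FROM TheTable A
        JOIN TheTable B
          ON A.JobID <> B.JobID AND A.Sequence = B.Sequence
        GROUP BY A.JobID
        ORDER BY A.JobID
    """
    result = dict(con.execute(query).fetchall())
    con.close()
    return result


if __name__ == "__main__":
    print(shared_counts(ROWS))
```

Note that the count is a per-job total across all other jobs, so it says how much a job overlaps with the rest but not which job it overlaps with.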

Answer2:

Just inferring from other answers... something that may help.

This shows how similar every pair of job IDs is:

http://sqlfiddle.com/#!3/c28be/9

CREATE TABLE Data (Job nvarchar(10), seq int);

INSERT INTO Data
SELECT 'A01', 8 UNION ALL
SELECT 'A01', 6 UNION ALL
SELECT 'A01', 10 UNION ALL
SELECT 'A02', 5 UNION ALL
SELECT 'A02', 10 UNION ALL
SELECT 'A02', 4 UNION ALL
SELECT 'A02', 2 UNION ALL
SELECT 'A03', 8 UNION ALL
SELECT 'A03', 3 UNION ALL
SELECT 'A03', 6 UNION ALL
SELECT 'A03', 10 UNION ALL
SELECT 'A04', 5 UNION ALL
SELECT 'A04', 4 UNION ALL
SELECT 'A04', 2 UNION ALL
SELECT 'A04', 9 UNION ALL
SELECT 'A04', 10;

-- One row per pair of jobs (j1 < j2) with the number of sequences they share
SELECT d1.job AS j1, d2.job AS j2, COUNT(*) AS cnt
FROM Data d1
INNER JOIN Data d2
    ON (d1.seq = d2.seq AND d1.job < d2.job)
GROUP BY d1.job, d2.job;
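The pairwise counts can be turned into a "closest partner" per job. The sketch below is one possible way to do it, under assumptions that are not in the answer: it uses SQLite through Python's `sqlite3` rather than SQL Server, it scores each pair with Jaccard similarity (shared values divided by distinct values across both jobs), and `best_partners` is a hypothetical helper name.

```python
import sqlite3

# Sample rows from the question: (job, seq).
ROWS = [
    ("A01", 8), ("A01", 6), ("A01", 10),
    ("A02", 5), ("A02", 10), ("A02", 4), ("A02", 2),
    ("A03", 8), ("A03", 3), ("A03", 6), ("A03", 10),
    ("A04", 5), ("A04", 4), ("A04", 2), ("A04", 9), ("A04", 10),
]


def best_partners(rows):
    """Return {job: most similar other job} using Jaccard similarity."""
    con = sqlite3.connect(":memory:")  # assumption: SQLite, not SQL Server
    con.execute("CREATE TABLE Data (job TEXT, seq INTEGER)")
    con.executemany("INSERT INTO Data VALUES (?, ?)", rows)

    # Shared-value count for every ordered pair of distinct jobs.
    pairs = con.execute("""
        WITH u AS (SELECT DISTINCT job, seq FROM Data)
        SELECT d1.job, d2.job, COUNT(*) AS cnt
        FROM u d1
        JOIN u d2 ON d1.seq = d2.seq AND d1.job <> d2.job
        GROUP BY d1.job, d2.job
        ORDER BY d1.job, d2.job
    """).fetchall()

    # Number of distinct sequence values per job.
    sizes = dict(con.execute(
        "SELECT job, COUNT(DISTINCT seq) FROM Data GROUP BY job"
    ).fetchall())
    con.close()

    best = {}  # job -> (score, partner)
    for j1, j2, cnt in pairs:
        score = cnt / (sizes[j1] + sizes[j2] - cnt)  # |A ∩ B| / |A ∪ B|
        # Strict ">" keeps the first partner seen when scores tie.
        if j1 not in best or score > best[j1][0]:
            best[j1] = (score, j2)
    return {job: partner for job, (score, partner) in best.items()}


if __name__ == "__main__":
    print(best_partners(ROWS))
```

Dividing by the union size keeps a job with many sequence values from looking similar to everything just because it is large. Jobs that share no value with any other job do not appear in the result.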

Answer3:

Building on Brandon Moore's answer:

Data setup:

DECLARE @Data TABLE (JobId nvarchar(10), Sequence int)

INSERT INTO @Data (JobId, Sequence)
SELECT 'A01', 8 UNION ALL
SELECT 'A01', 6 UNION ALL
SELECT 'A01', 10 UNION ALL
SELECT 'A02', 5 UNION ALL
SELECT 'A02', 10 UNION ALL
SELECT 'A02', 4 UNION ALL
SELECT 'A02', 2 UNION ALL
SELECT 'A03', 8 UNION ALL
SELECT 'A03', 3 UNION ALL
SELECT 'A03', 6 UNION ALL
SELECT 'A03', 10 UNION ALL
SELECT 'A04', 5 UNION ALL
SELECT 'A04', 4 UNION ALL
SELECT 'A04', 2 UNION ALL
SELECT 'A04', 9 UNION ALL
SELECT 'A04', 10 UNION ALL
SELECT 'A05', 100

Find the total number of sequences each JobID has in common with other jobs, order the jobs by that total from least to most, then output all the data for each JobID in that order:

;WITH cte AS (
    -- COUNT(B.JobID) skips the NULL rows the LEFT JOIN produces for unmatched
    -- sequences, so a job that shares nothing (A05) gets a total of 0
    SELECT A.JobID, A.Sequence, COUNT(B.JobID) AS [SequencesInCommon]
    FROM @Data A
    LEFT OUTER JOIN @Data B
        ON A.JobID <> B.JobID AND A.Sequence = B.Sequence
    GROUP BY A.JobID, A.Sequence
),
cte2 AS (
    SELECT JobID, SUM(SequencesInCommon) AS Total
    FROM cte
    GROUP BY JobID
)
SELECT d.JobId, d.Sequence
FROM cte2 c
INNER JOIN @Data d ON c.JobID = d.JobID
ORDER BY c.Total ASC, c.JobID ASC

Gives:

JobId      Sequence
---------- -----------
A05        100
A01        8
A01        6
A01        10
A03        8
A03        3
A03        6
A03        10
A02        5
A02        10
A02        4
A02        2
A04        5
A04        4
A04        2
A04        9
A04        10

(17 row(s) affected)

Should do it :)
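If explicit group labels are wanted rather than an ordering, one possible approach is to link any two jobs whose sequence sets are similar enough and treat each connected set of jobs as a group. The sketch below is an illustration under assumptions of my own, not part of the answer above: it is written in plain Python rather than SQL, uses Jaccard similarity with an arbitrary threshold of 0.5, and `group_jobs` is a hypothetical name.

```python
from itertools import combinations

# Sample rows from the answer's data setup: (JobId, Sequence).
ROWS = [
    ("A01", 8), ("A01", 6), ("A01", 10),
    ("A02", 5), ("A02", 10), ("A02", 4), ("A02", 2),
    ("A03", 8), ("A03", 3), ("A03", 6), ("A03", 10),
    ("A04", 5), ("A04", 4), ("A04", 2), ("A04", 9), ("A04", 10),
    ("A05", 100),
]


def group_jobs(rows, threshold=0.5):
    """Group jobs whose Sequence sets have Jaccard similarity >= threshold."""
    seqs = {}
    for job, seq in rows:
        seqs.setdefault(job, set()).add(seq)

    # Union-find parent map: every job starts as its own group.
    parent = {job: job for job in seqs}

    def find(job):
        while parent[job] != job:
            parent[job] = parent[parent[job]]  # path halving
            job = parent[job]
        return job

    # Merge the groups of every pair that is similar enough.
    for a, b in combinations(sorted(seqs), 2):
        jaccard = len(seqs[a] & seqs[b]) / len(seqs[a] | seqs[b])
        if jaccard >= threshold:
            parent[find(b)] = find(a)

    groups = {}
    for job in sorted(seqs):
        groups.setdefault(find(job), []).append(job)
    return sorted(groups.values())


if __name__ == "__main__":
    for members in group_jobs(ROWS):
        print(members)
```

The threshold is the tuning knob: a lower value merges more jobs into fewer groups, and because linking is transitive, two jobs can end up in the same group through a shared neighbour even if they are not directly similar.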
