
Sql Select Then Group by and Then Select and Group by Again

By: Aubrey Love   |   Updated: 2021-09-09



Problem

So, you have a basic understanding of the GROUP BY clause in SQL Server, but you still feel like there is more to this elementary clause than you have been taught. Well, you would be right, at least for some people. A majority of us learned the basics, but no one ever went any further with explaining the full spectrum of this clause, what it is capable of, and how to take advantage of its diversity. I've seen plenty of training videos and attended a few classes that attempted to teach the basics of SQL Server / T-SQL, and so far, none of them have delved deep into the full abilities of the GROUP BY clause.

Solution

In this tutorial we will cover the basics of the GROUP BY clause, and then we will delve further in and try to expand on all, or at least most, of the abilities of the clause and how you can take advantage of its options. Kind of a GROUP BY on steroids.

We will start with the basics, adding in some features like CUBE, ROLLUP and GROUPING SETS, but we will also be discussing other benefits as well as the limitations of the GROUP BY clause.

Getting AdventureWorks Sample Database for Testing

For simplicity's sake and keeping with a standard test database, we will be working with the AdventureWorks2014 database. If you already have this sample database installed, don't worry, we will not be changing any of the tables or data. We will, however, create some new tables along with a new schema to work with. Afterwards, we will simply drop the tables as well as any schemas we create. (If you so choose.)

If you do not have the AdventureWorks2014 database installed already, you can get a backup (BAK) version for free at this link: AdventureWorks sample databases

Once it's downloaded, just follow the basic steps to restore from a ".BAK" file in SQL Server Management Studio.

If you don't want to mess with sifting through the above referenced webpage to find the right database, you can click this link to start a direct download from Microsoft's GitHub repository.

Now, let's dive right in and learn about the GROUP BY clause from the ground up.

GROUP BY Statement Basics

In the code block below, you will find the basic syntax of a simple SELECT statement with a GROUP BY clause.

    SELECT columnA, columnB
    FROM tableName
    GROUP BY columnA, columnB;
    GO

At its core, the GROUP BY clause defines a group for each distinct combination of values in a grouped element.

In simpler terms, the GROUP BY clause combines rows into groups based on matching data in specified columns of a table. One row will be returned for each group.

For example, if you have a column named "Title" in your table and it has three values (manager, programmer, and clerk), but the table has 20 rows, there will be duplicate entries of the three values in the "Title" column even though you have unique persons assigned to each row in the "Name" column. The GROUP BY clause will break all 20 rows into three groups and return only three rows of data, one for each group.
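To make that concrete, here is a minimal sketch of the same idea, scaled down to six rows so it fits on the screen. The @Employee table variable and the names in it are made up for illustration; they are not part of AdventureWorks.

```sql
-- Hypothetical sample data: six people sharing three distinct titles.
DECLARE @Employee TABLE (
    Name  VARCHAR(20) NOT NULL,
    Title VARCHAR(20) NOT NULL
);

INSERT INTO @Employee (Name, Title)
VALUES ('Ann',   'Manager'),
       ('Bill',  'Programmer'),
       ('Cara',  'Clerk'),
       ('Dev',   'Programmer'),
       ('Elena', 'Clerk'),
       ('Femi',  'Programmer');

-- Six input rows collapse into three output rows, one per distinct Title.
SELECT Title
FROM @Employee
GROUP BY Title;
```

Because a table variable only lives for the duration of the batch, this can be run in any database without leaving anything behind.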

Important points for the GROUP BY SQL Statement:

  • The GROUP BY statement can only be used in a SQL SELECT statement.
  • The GROUP BY statement must be after the WHERE clause. (If one exists.)
  • The GROUP BY statement must be before the ORDER BY clause. (If one exists.)
  • To filter the GROUP BY results, you must use the HAVING clause after the GROUP BY.
  • The GROUP BY statement is often used in conjunction with an aggregate function such as COUNT, MIN, MAX, AVG, or SUM.
  • Every column listed in the SELECT list that is not wrapped in an aggregate function must also appear in the GROUP BY statement.
  • Except for TEXT, NTEXT, and IMAGE, any column can be called in the GROUP BY statement.
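The ordering rules in the list above are easier to remember when you see all four clauses in one query. The sketch below assumes a made-up @Orders table variable (the table, its columns and its values are hypothetical), and the comments number the clauses in the order SQL Server logically applies them.

```sql
-- Hypothetical sample data for illustration only.
DECLARE @Orders TABLE (
    Region VARCHAR(10) NOT NULL,
    Amount INT         NOT NULL
);

INSERT INTO @Orders (Region, Amount)
VALUES ('East', 50), ('East', 150),
       ('West', 20), ('West', 30),
       ('North', 500);

SELECT Region,                       -- grouped column, so it is listed in GROUP BY
       SUM(Amount) AS TotalAmount    -- aggregated column, so it is not
FROM @Orders
WHERE Amount >= 30                   -- 1. filter individual rows
GROUP BY Region                      -- 2. form the groups
HAVING SUM(Amount) > 100             -- 3. filter whole groups
ORDER BY TotalAmount DESC;           -- 4. sort what is left

-- Returns two rows: North (500), then East (200).
-- West is removed by HAVING because only its 30 survives the WHERE filter.
```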

In 2017, Microsoft stated that the data types "TEXT", "NTEXT", and "IMAGE" would be deprecated in future versions of SQL Server. However, they are still applicable in SQL Server 2022 with SSMS version 18, although you still cannot use them in a GROUP BY clause. Of course, there are exceptions to every rule, I suppose. You can read more about this in the section titled "A Work-Around" at the end of this SQL tutorial.

It is important to note that using a GROUP BY clause is ineffective if there are no duplicates in the column you are grouping by. When using the AdventureWorks2014 database and referencing the Person.Person table, if you GROUP BY the "BusinessEntityID" column, it will return all 19,972 rows with a count of 1 on each row. A better example would be to group by the "Title" column of that table. The SELECT statement below will return the six unique title types as well as a count of how many times each one is found in the "Title" column of the table. This is the core basics of using a GROUP BY clause.

    USE AdventureWorks2014;
    GO

    SELECT Title, COUNT(*) AS 'Count'
    FROM Person.Person
    WHERE Title IS NOT NULL
    GROUP BY Title;
    GO

Results: [query results screenshot]

An ORDER BY clause was not used in this sample and, as you can see, there is no order to the result set. If you need to use an ORDER BY clause, it must follow the GROUP BY clause. The other item you may notice in the above query is that we used a WHERE filter to cull out any rows where the Title is NULL. This is certainly optional. If you want to include the rows that are NULL, simply remove the WHERE clause from the query.

The following results are returned when we allow NULL values by removing the WHERE clause from the code block above.

[query results screenshot]

Although the GROUP BY clause is most commonly used with the COUNT, AVG, MIN, MAX, and SUM functions to return numerical data, for the purpose of charts among other reasons, it can also be used to categorize names, places, regions, etc. without returning, or relying on, a numeric value. In the sample below, we will return a list of the "CountryRegionName" column and the "StateProvinceName" column from the "Sales.vSalesPerson" view in the AdventureWorks2014 sample database. In the first SELECT statement, we will not do a GROUP BY, but instead we will simply use the ORDER BY clause to make our results more readable, sorted as either ASC (default) or DESC.

In the second SELECT statement, we will GROUP BY the "CountryRegionName" followed by the "StateProvinceName" columns. The first SELECT statement will return all 17 rows in the view. However, the second SELECT statement will only return 14 rows. Since "Washington, United States" is listed four times in the view, the GROUP BY clause will "group" those four entries into one entry for Washington.

    USE AdventureWorks2014;
    GO

    SELECT StateProvinceName, CountryRegionName
    FROM Sales.vSalesPerson
    ORDER BY CountryRegionName, StateProvinceName;
    GO

    SELECT StateProvinceName, CountryRegionName
    FROM Sales.vSalesPerson
    GROUP BY CountryRegionName, StateProvinceName;
    GO

Results: [query results screenshot]

It's not often you will need to return results like the sample above; most of the time you will be working with an aggregate function. But it's nice to know that you can do this, as well as how to do it, should the need arise.

Now, moving forward to some more common methods of using the GROUP BY clause.

Aggregates with the SQL GROUP BY Clause

T-SQL (Transact-SQL) offers a number of aggregate functions; this article looks at nine of them, all of which can be used with the GROUP BY clause. The five most common aggregate functions that will be discussed in this article are COUNT(), AVG(), MIN(), MAX(), and SUM(). The four remaining aggregate functions, STDEV(), STDEVP(), VAR(), and VARP(), are specifically related to financial and statistical calculations.

The STDEV() and STDEVP() functions calculate the sample standard deviation and the population standard deviation respectively. The VAR() and VARP() functions calculate the sample variance and the population variance respectively. An easy way to remember what these four do is to remember that the DEV-named functions provide deviation statistics, while the VAR-named functions provide the variance statistics. You can read more about these four aggregate functions on Microsoft Docs.
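As a quick illustration of the sample versus population distinction, here is a small sketch. The @Scores table variable and its eight values are made up for this example; they were picked so that the arithmetic is easy to check by hand.

```sql
-- Hypothetical sample data: eight scores whose mean is 5.
DECLARE @Scores TABLE (Score FLOAT NOT NULL);

INSERT INTO @Scores (Score)
VALUES (2), (4), (4), (4), (5), (5), (7), (9);

-- The squared deviations from the mean add up to 32.
SELECT VARP(Score)   AS PopulationVariance,  -- 32 / 8 = 4
       STDEVP(Score) AS PopulationStdDev,    -- SQRT(4) = 2
       VAR(Score)    AS SampleVariance,      -- 32 / 7, roughly 4.57
       STDEV(Score)  AS SampleStdDev         -- SQRT(32 / 7), roughly 2.14
FROM @Scores;
```

The "P" versions divide by the number of rows, while the sample versions divide by one less than the number of rows, which is why the sample figures come out slightly larger.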

COUNT()

In its simplest form, the COUNT() function can be used in one of two ways. Within the parentheses you can call the column name that you want to count by, or you can use an * (asterisk) between the parentheses.

  1. Using the * (asterisk) will count and return all rows even if they contain a NULL value.
  2. Specifying the column name will not count or return any rows that have a NULL value in that column.

So, it really depends on whether or not you need the data from the associated columns/rows where the focused column contains a NULL value.
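The difference between the two forms is easiest to see side by side. The sketch below uses a made-up @Person table variable (not the real Person.Person table) with two NULL titles, and adds COUNT(DISTINCT ...) as a third variation you may find handy.

```sql
-- Hypothetical sample data: five people, two of them without a title.
DECLARE @Person TABLE (
    PersonID INT        NOT NULL,
    Title    VARCHAR(8) NULL
);

INSERT INTO @Person (PersonID, Title)
VALUES (1, 'Mr.'), (2, 'Ms.'), (3, NULL), (4, 'Ms.'), (5, NULL);

SELECT COUNT(*)              AS AllRows,         -- 5: every row is counted
       COUNT(Title)          AS NonNullTitles,   -- 3: the two NULLs are skipped
       COUNT(DISTINCT Title) AS DistinctTitles   -- 2: 'Mr.' and 'Ms.'
FROM @Person;
```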

Now, let's use the COUNT() aggregate in the following query. Using the "Sales.vSalesPerson" view in the AdventureWorks2014 sample database, we will count how many times each country or region appears in that view.

    USE AdventureWorks2014;
    GO

    SELECT CountryRegionName, COUNT(*) AS 'Count'
    FROM Sales.vSalesPerson
    GROUP BY CountryRegionName;
    GO

Results: [query results screenshot]

As you can see, the query returned a count of 11 for the United States, 2 for Canada, and 1 for each of the remaining countries. These represent how many times, or in how many rows, these places are found in the "Sales.vSalesPerson" view.

In this context, the GROUP BY works similarly to the DISTINCT clause by returning only one entry per country/region. However, unlike the DISTINCT clause, when we added the COUNT() function, the results displayed how many times each country/region is found in the view.

AVG()

The AVG() function sums all the non-null values in a set, then divides that number by the number of non-null values in that set to return the average value as the result. Unlike the COUNT() function, the AVG() function will not accept the wildcard * (asterisk) as a value within the parentheses. You must specify which column you want to return an averaged value on. Since the AVG() function is adding and dividing (doing arithmetic on the column values), the column must contain numeric values. For example, you cannot return an average on a column that contains character (CHAR, VARCHAR, NVARCHAR) data types.
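Since the handling of NULL values is the part of AVG() that trips people up, here is a small sketch of it. The @Bonus table variable and its three rows are assumptions made for illustration, not data from Sales.SalesPerson.

```sql
-- Hypothetical sample data: three salespeople, one with no bonus recorded.
DECLARE @Bonus TABLE (
    SalesPersonID INT   NOT NULL,
    Bonus         MONEY NULL
);

INSERT INTO @Bonus (SalesPersonID, Bonus)
VALUES (1, 100), (2, 300), (3, NULL);

SELECT AVG(Bonus)                AS AvgIgnoringNulls,    -- (100 + 300) / 2 = 200
       SUM(Bonus) / COUNT(Bonus) AS SameThingByHand,     -- also 200
       AVG(ISNULL(Bonus, 0))     AS AvgTreatingNullAs0   -- (100 + 300 + 0) / 3, about 133.33
FROM @Bonus;
```

Whether a missing bonus should be ignored or counted as zero is a business decision; the point is only that AVG() on its own picks the first option for you.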

In the sample below, we are returning the average sales bonus value for each territory in the Sales.SalesPerson table from the AdventureWorks2014 database.

    SELECT TerritoryID
         , AVG(Bonus) AS 'Avg Bonus'
    FROM Sales.SalesPerson
    WHERE TerritoryID IS NOT NULL
    GROUP BY TerritoryID;
    GO

Results: [query results screenshot]

We added a WHERE clause to cull out the rows where the TerritoryID is NULL. This was just for clarity's sake. GROUP BY places all NULL values of the grouping column into a single group of their own, so this WHERE clause only removes that one group and is purely cosmetic in nature. Had we left out the WHERE clause, the returned values would remain the same for all rows, except for the additional row representing the NULL values. The sample below shows the results without the WHERE clause.

    SELECT TerritoryID
         , AVG(Bonus) AS 'Avg Bonus'
    FROM Sales.SalesPerson
    GROUP BY TerritoryID;
    GO

Results: [query results screenshot]

For the three following aggregates, use this link as a starting point: Max, Min, and Avg SQL Server Functions

MIN()

The MIN() function (as its name implies) returns the smallest value in the column specified. MIN() is not restricted to numeric values only, as some people believe; it can also be used to return the lowest values of CHAR(), VARCHAR(), NVARCHAR(), UNIQUEIDENTIFIER, or datetime data types as well. However, it cannot be used with the BIT data type.

With the character data types CHAR(), VARCHAR(), and NVARCHAR(), the MIN() function sorts the string values alphabetically and returns the first (lowest) value in the alphabetized list.

Using the same Sales.SalesPerson table as we did in the AVG() function example above, we will return the minimum value from the "Bonus" column instead of the average value.

    USE AdventureWorks2014;
    GO

    SELECT TerritoryID
         , MIN(Bonus) AS 'MinBonus'
    FROM Sales.SalesPerson
    WHERE TerritoryID IS NOT NULL
    GROUP BY TerritoryID;
    GO

Results: [query results screenshot]

MAX()

In contrast to the MIN() function, the MAX() function returns the largest value of the specified column. It does this by utilizing a collating sequence, allowing it to work as efficiently on character columns and datetime columns as it does on numeric columns. Keeping consistency, we again will be working with the Sales.SalesPerson table and return the maximum, or highest amount, paid in a bonus for each territory.

    USE AdventureWorks2014;
    GO

    SELECT TerritoryID
         , MAX(Bonus) AS 'MAX Bonus'
    FROM Sales.SalesPerson
    WHERE TerritoryID IS NOT NULL
    GROUP BY TerritoryID;
    GO

Results: [query results screenshot]
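Both bonus samples use a numeric column, so here is a short sketch of MIN() and MAX() on character and date columns instead. The @Hire table variable and its rows are made up for illustration, and the results noted in the comments assume a typical alphabetical collation.

```sql
-- Hypothetical sample data: three employees with a last name and a hire date.
DECLARE @Hire TABLE (
    LastName VARCHAR(20) NOT NULL,
    HireDate DATE        NOT NULL
);

INSERT INTO @Hire (LastName, HireDate)
VALUES ('Nguyen', '2012-05-30'),
       ('Abbas',  '2011-01-15'),
       ('Zhang',  '2013-03-01');

SELECT MIN(LastName) AS FirstByCollation,  -- 'Abbas'
       MAX(LastName) AS LastByCollation,   -- 'Zhang'
       MIN(HireDate) AS EarliestHire,      -- 2011-01-15
       MAX(HireDate) AS LatestHire         -- 2013-03-01
FROM @Hire;
```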

SUM()

The SUM() function returns the total value of all non-null values in a specified column. Since this is a mathematical process, it cannot be used on string values such as the CHAR, VARCHAR, and NVARCHAR data types. When used with a GROUP BY clause, the SUM() function will return the total for each group in the specified table.

Using the Sales.SalesPerson table in the AdventureWorks2014 database, we will return the sum (total) of all bonuses paid out to each territory found in the GROUP BY clause.

    USE AdventureWorks2014;
    GO

    SELECT TerritoryID
         , SUM(Bonus) AS 'SUM Bonus'
    FROM Sales.SalesPerson
    WHERE TerritoryID IS NOT NULL
    GROUP BY TerritoryID;
    GO

Results: [query results screenshot]

For the three following GROUP BY extensions, use this link as a starting point: Group By in SQL Server with CUBE, ROLLUP and GROUPING SETS Examples.

GROUP BY ROLLUP

ROLLUP is an extension of the GROUP BY clause that creates a group for each of the column expressions. Additionally, it "rolls up" those results into subtotals followed by a grand total. Under the hood, the ROLLUP function moves from right to left, decreasing the number of column expressions that it creates groups and aggregations on. Since the column order affects the ROLLUP output, it can also affect the number of rows returned in the result set.
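To show how much the column order matters, the sketch below runs the same ROLLUP twice with the columns swapped. The @Sales table variable is a stand-in created just for this example, and the row counts in the comments follow from its five rows.

```sql
-- Hypothetical sample data: five regions across two countries.
DECLARE @Sales TABLE (
    Country     VARCHAR(30) NOT NULL,
    RegionState VARCHAR(30) NOT NULL,
    Sales       INT         NOT NULL
);

INSERT INTO @Sales (Country, RegionState, Sales)
VALUES ('United States', 'Washington', 100),
       ('United States', 'Maine',      200),
       ('United States', 'Oregon',     300),
       ('Canada',        'Alberta',    100),
       ('Canada',        'Ontario',    200);

-- Detail rows, one subtotal per Country, one grand total: 5 + 2 + 1 = 8 rows.
SELECT Country, RegionState, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY ROLLUP(Country, RegionState);

-- Detail rows, one subtotal per RegionState, one grand total: 5 + 5 + 1 = 11 rows.
SELECT Country, RegionState, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY ROLLUP(RegionState, Country);
```

As a rule of thumb, list the columns from the broadest category to the narrowest, so that the subtotals land on the level you actually want to report.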

The GROUP BY ROLLUP can be written in one of two ways. You can declare the ROLLUP extension before you call the column names or after. Both will return the same results. This is another one of those "personal preference" options of writing your SQL code, although Microsoft documents the WITH ROLLUP form as a non-ISO syntax kept for backward compatibility, so the ROLLUP() form is the safer habit.

Option 1: (Calling the ROLLUP extension before the column names)

    GROUP BY ROLLUP(Country, RegionState);

Option 2: (Calling the ROLLUP extension after the column names)

    GROUP BY Country, RegionState WITH ROLLUP;

Notice the parentheses surrounding the column names in option 1 that are not present in option 2. Option 1 must have the parentheses; option 2 must NOT have them.

Moving on. For this sample, we are going to create a new table in the AdventureWorks2014 database under the default "dbo" schema.

    USE AdventureWorks2014;
    GO

    CREATE TABLE salesTest(
        Country VARCHAR(30),
        RegionState VARCHAR(30),
        Sales INT
    );

    INSERT INTO salesTest VALUES('United States', 'Washington', 100);
    INSERT INTO salesTest VALUES('United States', 'Maine', 200);
    INSERT INTO salesTest VALUES('United States', 'Oregon', 300);
    INSERT INTO salesTest VALUES('Canada', 'Alberta', 100);
    INSERT INTO salesTest VALUES('Canada', 'Ontario', 200);
    GO

In the next block of code, we will generate a "rollup" of all the "summed" values from Canada, a rollup of all the "summed" values from the United States, and finally a "total summed" value on the two countries listed.

    SELECT Country
         , RegionState
         , SUM(Sales) AS 'Total Sales'
    FROM salesTest
    GROUP BY ROLLUP(Country, RegionState);
    GO

Results: [query results screenshot]

From the result set above, we see that line 3 is the total of lines 1 and 2 (the two regions from Canada), line 7 is the total of lines 4 to 6 (the three states from the United States) and line 8 is the grand (rollup) total of lines 3 and 7.

You can replace the NULL values in the result set with descriptive values by altering the SQL code block with the ISNULL function as shown in the sample below.

    SELECT ISNULL(Country, 'Rollup') AS 'Country'
         , ISNULL(RegionState, 'Total') AS 'RegionState'
         , SUM(Sales) AS 'Total Sales'
    FROM salesTest
    GROUP BY ROLLUP(Country, RegionState);
    GO

Results: [query results screenshot]

Again, line 3 is the rollup total for Canada, line 7 is the rollup total for the United States and line 8 is the "Grand" rollup total for lines 3 and 7.
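One caveat with the ISNULL approach: it cannot tell a NULL that ROLLUP generated apart from a NULL that was already in your data. T-SQL's GROUPING() function can, because it returns 1 only for the rows ROLLUP added. The sketch below is one way to use it; the @Sales table variable, its extra row with a genuinely missing region, and the label text are all assumptions made for illustration.

```sql
-- Hypothetical sample data, including one row whose region really is unknown.
DECLARE @Sales TABLE (
    Country     VARCHAR(30) NULL,
    RegionState VARCHAR(30) NULL,
    Sales       INT         NOT NULL
);

INSERT INTO @Sales (Country, RegionState, Sales)
VALUES ('Canada', 'Alberta', 100),
       ('Canada', 'Ontario', 200),
       ('Canada', NULL,       50);   -- a real NULL, not a subtotal

SELECT CASE WHEN GROUPING(Country) = 1
            THEN 'All countries' ELSE Country END      AS Country,
       CASE WHEN GROUPING(RegionState) = 1
            THEN 'Total' ELSE RegionState END          AS RegionState,
       SUM(Sales)                                      AS TotalSales
FROM @Sales
GROUP BY ROLLUP(Country, RegionState);

-- The row with the real NULL region keeps its NULL and shows 50,
-- while the Canada subtotal row is labeled 'Total' and shows 350.
```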

GROUP BY CUBE

Another extension, or sub-clause, of the GROUP BY clause is CUBE. CUBE generates multiple grouping sets on your specified columns and aggregates them. In short, it creates unique groups for all possible combinations of the columns you specify. For example, if you use GROUP BY CUBE on (column1, column2) of your table, SQL returns groups for all unique values of (column1, column2), (NULL, column2), (column1, NULL) and (NULL, NULL).

Perhaps the best way to understand this is to see it in action. Here we will continue using the table we created in the previous section, "GROUP BY ROLLUP".

    USE AdventureWorks2014;
    GO

    SELECT Country
         , RegionState
         , SUM(Sales) AS 'Total Sales'
    FROM salesTest
    GROUP BY CUBE(Country, RegionState);
    GO

Results: [query results screenshot]

As you can see in the result set above, the query has returned all groups with unique values of (column1, column2), (NULL, column2), (column1, NULL) and (NULL, NULL). The (NULL, NULL) row on line 11 represents the grand total of all the cubed rollup values, much like it did in the GROUP BY ROLLUP section above.
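If the four combinations are hard to picture, it may help to spell them out by hand. The sketch below writes the same CUBE as an explicit GROUPING SETS list; it uses a made-up @Sales table variable with the same five rows as salesTest, so it can be run on its own.

```sql
-- Hypothetical stand-in for the salesTest table.
DECLARE @Sales TABLE (
    Country     VARCHAR(30) NOT NULL,
    RegionState VARCHAR(30) NOT NULL,
    Sales       INT         NOT NULL
);

INSERT INTO @Sales (Country, RegionState, Sales)
VALUES ('United States', 'Washington', 100),
       ('United States', 'Maine',      200),
       ('United States', 'Oregon',     300),
       ('Canada',        'Alberta',    100),
       ('Canada',        'Ontario',    200);

-- Equivalent to GROUP BY CUBE(Country, RegionState).
SELECT Country, RegionState, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY GROUPING SETS (
    (Country, RegionState),   -- 5 detail rows
    (Country),                -- 2 per-country subtotals
    (RegionState),            -- 5 per-region subtotals
    ()                        -- 1 grand total
);
-- 5 + 2 + 5 + 1 = 13 rows in total.
```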

GROUP BY GROUPING SETS()

The GROUPING SETS option gives you the ability to combine multiple GROUP BY clauses into one GROUP BY clause. The GROUP BY GROUPING SETS() clause produces the same results as a UNION ALL that is applied to the specified groups. In other words, if I used a UNION ALL to combine two groupings into one, it would look something like the code block below.

    USE AdventureWorks2014;
    GO

    SELECT Country
         , RegionState
         , SUM(Sales) AS TotalSales
    FROM salesTest
    GROUP BY ROLLUP(Country, RegionState)
    UNION ALL
    SELECT Country
         , RegionState
         , SUM(Sales) AS TotalSales
    FROM salesTest
    GROUP BY CUBE(Country, RegionState);
    GO

Results: [query results screenshot]

A way to condense that UNION ALL code would be to use the GROUPING SETS() sub-clause as in the sample below.

    SELECT Country
         , RegionState
         , SUM(Sales) AS TotalSales
    FROM salesTest
    GROUP BY GROUPING SETS
    (
        ROLLUP (Country, RegionState),
        CUBE (Country, RegionState)
    );
    GO

Results: [query results screenshot]

Don't be surprised if your results do not return in the same order each time; the ORDER BY clause will help with that.

As you can see, we are returning the same results as the UNION ALL, but with a bit less code.

But Wait, There's More

I told you this was going to be GROUP BY on steroids. Here are some additional tricks for using the GROUP BY clause that those books and free videos wouldn't tell you about.

GROUP BY with Multiple Tables
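GROUPING SETS does not have to wrap ROLLUP or CUBE; you can also list only the groupings you care about. The sketch below compares two plain GROUP BY queries glued together with UNION ALL against a single GROUPING SETS clause. The @Sales table variable is a made-up stand-in for salesTest, and the row order of the two result sets may differ.

```sql
-- Hypothetical stand-in for the salesTest table.
DECLARE @Sales TABLE (
    Country     VARCHAR(30) NOT NULL,
    RegionState VARCHAR(30) NOT NULL,
    Sales       INT         NOT NULL
);

INSERT INTO @Sales (Country, RegionState, Sales)
VALUES ('United States', 'Washington', 100),
       ('United States', 'Maine',      200),
       ('United States', 'Oregon',     300),
       ('Canada',        'Alberta',    100),
       ('Canada',        'Ontario',    200);

-- Two separate groupings combined by hand.
SELECT Country, CAST(NULL AS VARCHAR(30)) AS RegionState, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY Country
UNION ALL
SELECT CAST(NULL AS VARCHAR(30)), RegionState, SUM(Sales)
FROM @Sales
GROUP BY RegionState;

-- The same seven rows from a single pass over the table.
SELECT Country, RegionState, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY GROUPING SETS ((Country), (RegionState));
```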

Like most things in SQL/T-SQL, you can always pull your data from multiple tables. Performing this task while including a GROUP BY clause is no different than any other SELECT statement with a GROUP BY clause. The fact that you're pulling the data from two or more tables has no bearing on how this works. In the sample below, we will be working in the AdventureWorks2014 database once more as we join the "Person.Address" table with the "Person.BusinessEntityAddress" table. I have also restricted the sample code to return only the top 10 results for clarity's sake in the result set.

    USE AdventureWorks2014;
    GO

    SELECT TOP(10)
           a.City
         , COUNT(b.AddressID) AS EmployeeCount
    FROM Person.Address AS a
    INNER JOIN Person.BusinessEntityAddress AS b
        ON a.AddressID = b.AddressID
    GROUP BY a.City;
    GO

Results: [query results screenshot]

GROUP BY with an Expression

Using a GROUP BY clause on a SELECT statement that contains an expression or built-in function reference requires that you include the same expression in both the SELECT list and the GROUP BY clause. In the sample below, we will use the DATEPART function to return only the year from the "ModifiedDate" column of the "Sales.SalesTerritory" table along with the average year-to-date sales ("SalesYTD") from the same table.

    SELECT DATEPART(yyyy, ModifiedDate) AS 'Year'
         , CAST(AVG(ROUND(SalesYTD, 2, 1)) AS numeric(9,2)) AS 'Avg Sales'
    FROM Sales.SalesTerritory
    GROUP BY DATEPART(yyyy, ModifiedDate);
    GO

You may have noticed that I added a little extra code to this one on the second line. We are applying the ROUND() function to the "SalesYTD" column, and casting the average to numeric(9,2), to return the results in a dollar format with two decimal places. Without the ROUND() function, our output would have been either 5275120.9953 (if we had used the "money" data type) or 5275121.00 (if we had used the "numeric" data type).

Results: [query results screenshot]
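The expression has to be repeated because GROUP BY is evaluated before the SELECT list, so a column alias such as 'Year' does not exist yet when the grouping happens. If the repetition bothers you, one option is to name the expression once with CROSS APPLY. The sketch below shows both styles; the @Orders table variable, its rows, and the OrderYear alias are made up for illustration.

```sql
-- Hypothetical sample data: four orders across two years.
DECLARE @Orders TABLE (
    OrderDate DATE  NOT NULL,
    TotalDue  MONEY NOT NULL
);

INSERT INTO @Orders (OrderDate, TotalDue)
VALUES ('2013-02-10', 100), ('2013-11-05', 250),
       ('2014-01-20', 400), ('2014-06-30', 600);

-- Style 1: repeat the expression in SELECT and GROUP BY.
SELECT YEAR(OrderDate) AS OrderYear, SUM(TotalDue) AS TotalDue
FROM @Orders
GROUP BY YEAR(OrderDate);

-- Style 2: compute the expression once, then group by its name.
SELECT y.OrderYear, SUM(o.TotalDue) AS TotalDue
FROM @Orders AS o
CROSS APPLY (VALUES (YEAR(o.OrderDate))) AS y(OrderYear)
GROUP BY y.OrderYear;

-- Both return 350 for 2013 and 1000 for 2014.
```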

GROUP BY with a HAVING clause

Adding a HAVING clause after your GROUP BY clause requires that you include any special conditions in both clauses. If the SELECT statement contains an expression, then it follows suit that the GROUP BY clause must contain the matching expression, and any non-aggregated expression used in the HAVING clause must match it as well. It is similar in nature to the "GROUP BY with an Expression" sample from above. In the next sample code block, we are (still using the AdventureWorks2014 database) now referencing the "Sales.SalesOrderHeader" table to return the total (sum) of the "TotalDue" column, but only for a particular year. That year will be referenced within the HAVING clause.

    SELECT DATEPART(yyyy, OrderDate) AS 'Year'
         , CAST(SUM(ROUND(TotalDue, 2, 1)) AS numeric(12,2)) AS 'Total Due'
    FROM Sales.SalesOrderHeader
    GROUP BY DATEPART(yyyy, OrderDate)
    HAVING DATEPART(yyyy, OrderDate) = 2014;
    GO

Results: [query results screenshot]
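HAVING really earns its keep when the condition involves an aggregate, since a WHERE clause cannot reference SUM() or COUNT(). The sketch below contrasts the two filters on a made-up @Orders table variable (the table, its rows, and the 500 threshold are assumptions for illustration).

```sql
-- Hypothetical sample data: four orders across two years.
DECLARE @Orders TABLE (
    OrderDate DATE  NOT NULL,
    TotalDue  MONEY NOT NULL
);

INSERT INTO @Orders (OrderDate, TotalDue)
VALUES ('2013-02-10', 100), ('2013-11-05', 250),
       ('2014-01-20', 400), ('2014-06-30', 600);

-- Filtering on a plain column: WHERE removes rows before they are grouped.
SELECT YEAR(OrderDate) AS OrderYear, SUM(TotalDue) AS TotalDue
FROM @Orders
WHERE OrderDate >= '20140101' AND OrderDate < '20150101'
GROUP BY YEAR(OrderDate);            -- one row: 2014, 1000

-- Filtering on an aggregate: only HAVING can do this.
SELECT YEAR(OrderDate) AS OrderYear, SUM(TotalDue) AS TotalDue
FROM @Orders
GROUP BY YEAR(OrderDate)
HAVING SUM(TotalDue) > 500;          -- one row: 2014, 1000 (2013 totals only 350)
```

When a condition can be written either way, as with the year filter, putting it in WHERE lets SQL Server discard rows before it does any grouping work.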

Limitations when using GROUP BY

As you would expect, there are a few limitations when using the GROUP BY clause in your SELECT statement. Below is a list of the main limitations that you will need to be familiar with.

For GROUP BY clauses that contain ROLLUP, CUBE or GROUPING SETS:

  • The maximum number of expressions is 32.
  • The maximum number of groups is 4096.

For GROUP BY clauses that do not contain ROLLUP, CUBE or GROUPING SETS:

  • The number of GROUP BY items is limited by the GROUP BY column sizes, the aggregated columns, and the aggregate values involved in the query.

A Work-Around for Text, NText and Image Data Types

If, for some reason, you are still using the "TEXT", "NTEXT", and/or "IMAGE" data types in your database, it is highly recommended that you change them to an appropriate "current" data type. If your situation prevents you from updating these antiquated data types but you still need to GROUP BY using one or more of them, there is a work-around for that.

For this sample, we must create a new table, since the AdventureWorks database samples do not come preloaded with any tables that contain "TEXT", "NTEXT", or "IMAGE" data type columns.

    USE AdventureWorks2014;
    GO

    CREATE TABLE textTest1(
        colID INT IDENTITY NOT NULL
      , fName VARCHAR(20)
      , Title TEXT
    );
    GO

    INSERT INTO textTest1(fName, Title)
    VALUES ('Bob', 'Programmer')
         , ('John', 'Manager')
         , ('Sarah', 'Clerk')
         , ('Melissa', 'Programmer')
         , ('Jeff', 'Manager')
         , ('Sam', 'Programmer')
         , ('Eliot', 'Programmer');
    GO

Now that we have a table that contains a "TEXT" data type, let's try to run a standard SELECT with COUNT() query and a GROUP BY clause.

    SELECT Title, COUNT(*) AS 'Count'
    FROM textTest1
    GROUP BY Title;
    GO

This will produce a "Level 16, State 2" error.

    Msg 306, Level 16, State 2, Line 20
    The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator.

To work around this issue, we can use the CAST function to convert the TEXT data type to a VARCHAR data type and get the desired results returned without an error.

    SELECT CAST(Title AS varchar) AS Title, COUNT(*) AS 'Count'
    FROM textTest1
    GROUP BY CAST(Title AS varchar);
    GO

Note: you must use CAST() in both the SELECT statement as well as the GROUP BY clause. As mentioned earlier, in the "GROUP BY with an Expression" section, the GROUP BY expression must be written exactly as it is in the SELECT statement.

Results: [query results screenshot]

Here, we have returned the appropriate count for each of the three unique values in the Title column.

Just for future reference, if you are still using the TEXT, NTEXT, and IMAGE data types, you can use the CAST function to convert them to VARCHAR, NVARCHAR, and VARBINARY respectively.

A best practice would be to create a view from the above SELECT statement to save time and provide a more efficient way of grouping on the table(s) that have these deprecated data types.
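Here is a rough sketch of what such a view could look like. It assumes the textTest1 table from this section exists in the dbo schema; the view name vTextTest1 and the VARCHAR(50) length are my own choices for illustration. The explicit length is deliberate, since CAST to varchar without a length defaults to 30 characters and would silently truncate longer titles.

```sql
USE AdventureWorks2014;
GO

-- vTextTest1 is a hypothetical name; CREATE VIEW must be the first statement in its batch.
CREATE VIEW dbo.vTextTest1
AS
SELECT colID,
       fName,
       CAST(Title AS VARCHAR(50)) AS Title   -- expose the TEXT column as VARCHAR
FROM dbo.textTest1;
GO

-- Grouping on the view no longer needs CAST() in the query itself.
SELECT Title, COUNT(*) AS [Count]
FROM dbo.vTextTest1
GROUP BY Title;
GO
```

With the conversion tucked away in the view, every query that groups on Title can be written as if the column had been a VARCHAR all along.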

Summary

The primary function of the GROUP BY clause is to divide the rows within a table into groups. Consider that a table is in itself a group; the GROUP BY clause just breaks that large group into smaller groups, like mini tables. From there you can manipulate the data within those mini tables (groups) in just about any way you can imagine.

Contrary to what most books and classes teach you, there are more aggregate functions than the usual five, and all nine covered in this article can be used with a GROUP BY clause in your code. As we have seen in the samples above, you can have a GROUP BY clause without an aggregate function as well. As we demonstrated earlier in this article, the GROUP BY clause can group string values too, so it doesn't always have to be a numeric or date value.

To sum it all up, there is a lot more to the GROUP BY clause than you would normally learn in an introductory SQL course.

Next Steps
  • Group By in SQL Server with CUBE, ROLLUP and GROUPING SETS Examples
  • Group By in SQL Server
  • Aggregate Functions
  • CUBE and ROLLUP in SQL Server
  • HAVING clause in SQL Server


About the author

Aubrey Love has been a Database Administrator for about 8 years and is currently working as a Microsoft SQL Server Business Intelligence specialist.


Source: https://www.mssqltips.com/sqlservertip/6955/learning-sql-group-by-clause/
