Aggregate functions
Overview
COUNT
, SUM
, AVG
, MIN
, and MAX
are aggregate functions that can be used with or without the GROUP BY
and HAVING
clauses. When used without GROUP BY
or HAVING
, these functions perform the action that they are named after. For example, let’s look at our database from the joins page:
id | description | customer_id | value |
---|---|---|---|
1 |
Water bottle |
1 |
15.00 |
2 |
Key chain |
1 |
7.50 |
3 |
Computer |
3 |
2000.00 |
4 |
Thumb drive |
3 |
25.00 |
5 |
Notebook |
4 |
9.00 |
6 |
Shampoo |
5.00 |
|
7 |
Paper |
4.00 |
id | first_name | last_name | |
---|---|---|---|
1 |
Natalie |
Wright |
|
2 |
Ana |
Sousa |
|
3 |
Ben |
Schwartz |
|
4 |
Chen |
Xi |
|
5 |
Frank |
Zhang |
|
6 |
Tianchi |
Liu |
|
7 |
Jake |
Jons |
To count the number of rows in a table, for example, the orders
table, you could run the following query.
SELECT COUNT(*) FROM orders;
COUNT(*) |
---|
7 |
As you can see, by default the column header for the resulting row is COUNT(*)
. To rename the column, you can create an alias.
SELECT COUNT(*) AS 'number of results' FROM orders;
number of results |
---|
7 |
COUNT
only returns the number of rows returned by the query. If the query only returns 4 rows, 4 will be the result of COUNT
.
SELECT COUNT(*) AS 'number of results' FROM orders WHERE value > 10;
number of results |
---|
3 |
In COUNT(*)
, the *
is a wildcard that returns all columns from the selected table. In our examples with the orders
table, this is the same as running the following query:
SELECT id, description, customer_id, value FROM orders;
You can, of course, selected a specific column, however, if there are any NULL
values for the selected column, they will not be counted.
SELECT COUNT(customer_id) FROM orders;
COUNT(customer_id) |
---|
5 |
The rest of the functions work the same way. For example, SUM
returns the sum of all of the values in the selected rows.
SELECT SUM(value) AS 'our sum' FROM orders WHERE value > 10;
our sum |
---|
2040.0 |
MIN
, MAX
, and AVG
return the minimum, maximum, and average, respectively.
SELECT MIN(value) AS minimum, MAX(value) AS maximum, AVG(value) AS avg FROM orders WHERE value > 10;
minimum | maximum | avg |
---|---|---|
15.0 |
2000.0 |
680.0 |
GROUP BY
The capabilities of the aggregate functions are not realized until used in conjunction with the GROUP BY
clause. The GROUP BY
clause goes after the WHERE
clause (if WHERE
clause is present). GROUP BY
forces the SQL engine to calculate the aggregate function after the data has been grouped by one or more columns. For example, both of the following queries work. The first calculates the sum of the value
of all orders by the category
of the order. The second does the same thing, but excludes all rows where value
⇐ 10 before performing the operations.
SELECT category, SUM(value) FROM orders GROUP BY category;
category|SUM(value) |
---|
beauty products|5.0 |
electronics|2025.0 |
misc|22.5 |
office|13.0 |
SELECT category, SUM(value) FROM orders WHERE value > 10 GROUP BY category;
category | SUM(value) |
---|---|
electronics |
2025.0 |
misc |
15.0 |
This is powerful when you want to calculate values for many categories. For example, if we wanted to count the number of orders for each category
, we could do so with the following query.
SELECT category, COUNT(*) FROM orders GROUP BY category;
category | COUNT(*) |
---|---|
beauty products |
1 |
electronics |
2 |
misc |
2 |
office |
2 |
Or, you could get all of the statistics when grouped by category
.
SELECT category, COUNT(*), SUM(value), MIN(value), MAX(value), AVG(value) FROM orders GROUP BY category;
category | COUNT(*) | SUM(value) | MIN(value) | MAX(value) | AVG(value) |
---|---|---|---|---|---|
beauty products |
1 |
5.0 |
5.0 |
5.0 |
5.0 |
electronics |
2 |
2025.0 |
25.0 |
2000.0 |
1012.5 |
misc |
2 |
22.5 |
7.5 |
15.0 |
11.25 |
office |
2 |
13.0 |
4.0 |
9.0 |
6.5 |
HAVING
HAVING
adds one more tool to the tool chest. HAVING
is similar to the WHERE
clause but instead of being applied to every row before grouping, it is applied on the aggregated data. A single query can have both a WHERE
clause and a HAVING
clause. The WHERE
clause will filter the data before aggregation, and the HAVING
clause will filter the data after aggregation. The following is an example that tries to demonstrate this slightly subtle difference.
In this example, the SUM
of the value
is calculated for each category
, and afterwards, only results are included where the SUM
of the value
is greater than 10.
SELECT
category,
SUM(value)
FROM
orders
GROUP BY
category
HAVING
value > 10;
The following are also equivalent:
SELECT
category,
SUM(value)
FROM
orders
GROUP BY
category
HAVING
SUM(value) > 10;
SELECT
category,
SUM(value) AS 's_value'
FROM
orders
GROUP BY
category
HAVING
s_value > 10;
category | sum(value) |
---|---|
electronics |
2025.0 |
misc |
22.5 |
In this example, all rows where value ⇐ 10
are excluded, then the remaining rows are grouped by category
, and the SUM
of value
is calculated for the groups.
SELECT
category,
SUM(value)
FROM
orders
WHERE
value > 10
GROUP BY
category;
category | sum(value) |
---|---|
electronics |
2025.0 |
misc |
15.0 |
Examples
All of the following examples are taken from the chinook
database found at /anvil/projects/tdm/data/chinook/Chinook_Sqlite.sqlite
Find the average length (in seconds) of the tracks with genre Pop
.
Click to see solution
SELECT
AVG(Milliseconds / 1000.0) AS avg
FROM
Track
WHERE
genreId = 9;
avg |
---|
229.0341 |
Figure out what the longest track (in seconds) is for the "Bossa Nova" genre.
Click to see solution
SELECT
GenreId
FROM
Genre
WHERE
Name = 'Bossa Nova';
GenreId |
---|
11 |
SELECT
*,
MAX(Milliseconds / 1000.0) AS Seconds
FROM
Track
WHERE
genreId = 11;
TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice | Seconds |
---|---|---|---|---|---|---|---|---|---|
646 |
Samba Da Bênção |
52 |
1 |
11 |
NA |
409965 |
13490008 |
0.99 |
409.965 |
Get the average price per hour for Bossa Nova music (genreId
of 11).
Click to see solution
SELECT
AVG(UnitPrice / (Milliseconds / 1000.0 / 3600)) AS 'Price per Hour'
FROM
Track
WHERE
genreId = 11;
Price per Hour |
---|
17.7131539609759 |
Get the average time (in seconds) for tracks by genre.
Click to see solution
SELECT
genreId,
AVG(Milliseconds / 1000.0) AS 'Average seconds per track'
FROM
Track
GROUP BY
genreId;
GenreId | Average seconds per track |
---|---|
1 |
283.9100 |
2 |
291.7554 |
3 |
309.7494 |
4 |
234.3538 |
5 |
134.6435 |
6 |
270.3598 |
7 |
232.8593 |
8 |
247.1778 |
9 |
229.0341 |
10 |
244.3709 |
You can take this 1 step further by getting the genre names. This requires using an inner join, however.
SELECT
g.Name,
track_time. 'Average seconds per track'
FROM
Genre AS g
INNER JOIN (
SELECT
genreId,
AVG(Milliseconds / 1000.0) AS 'Average seconds per track'
FROM
Track
GROUP BY
genreId) AS track_time ON g.GenreId = track_time.GenreId
ORDER BY
track_time. 'Average seconds per track' DESC;
Name | Average seconds per track |
---|---|
Sci Fi & Fantasy |
2911.7830 |
Science Fiction |
2625.5491 |
Drama |
2575.2838 |
TV Shows |
2145.0410 |
Comedy |
1585.2637 |
Metal |
309.7494 |
Electronica/Dance |
302.9858 |
Heavy Metal |
297.4529 |
Classical |
293.8676 |
Jazz |
291.7554 |
Figure out the average price per track for each genre.
Click to see solution
SELECT
genreId,
AVG(UnitPrice) AS 'Average price per track'
FROM
Track
GROUP BY
genreId;
GenreId | Average price per track |
---|---|
1 |
0.99 |
2 |
0.99 |
3 |
0.99 |
4 |
0.99 |
5 |
0.99 |
6 |
0.99 |
7 |
0.99 |
8 |
0.99 |
9 |
0.99 |
10 |
0.99 |
What is the average number of tracks per album?
Click to see solution
SELECT
AVG(trackCount)
FROM (
SELECT
COUNT(*) AS trackCount
FROM
Track
GROUP BY
albumId) AS track_count;
AVG(trackCount) |
---|
10.0951 |
Figure out the average number of tracks per album genre.
Click to see solution
SELECT
genreId,
AVG(trackCount)
FROM (
SELECT
genreId,
COUNT(*) AS trackCount
FROM
Track
GROUP BY
albumId) AS track_count
GROUP BY
genreId;
genreId | AVG(trackCount) |
---|---|
1 |
11.41379 |
2 |
10.00000 |
3 |
10.90625 |
4 |
14.43478 |
5 |
12.00000 |
6 |
13.85714 |
7 |
14.81579 |
8 |
15.00000 |
9 |
16.00000 |
10 |
10.75000 |
Of course, we can add the names of genres in, using joins.
SELECT
Name,
avg_track_count. 'Average Track Count'
FROM
Genre AS g
INNER JOIN (
SELECT
genreId,
AVG(trackCount) AS 'Average Track Count'
FROM (
SELECT
genreId,
COUNT(*) AS trackCount
FROM
Track
GROUP BY
albumId) AS track_count
GROUP BY
genreId) AS avg_track_count ON g.GenreId = avg_track_count.genreId;
Name | Average Track Count |
---|---|
Rock |
11.41379 |
Jazz |
10.00000 |
Metal |
10.90625 |
Alternative & Punk |
14.43478 |
Rock And Roll |
12.00000 |
Blues |
13.85714 |
Latin |
14.81579 |
Reggae |
15.00000 |
Pop |
16.00000 |
Soundtrack |
10.75000 |