Aggregate operations
Standard aggregate operations
A SELECT
expression in the SQL grammar can contain one
or more aggregation functions. Aggregate functions are specified
using the following grammar:
aggregateCall:
agg '(' [ ALL | DISTINCT ] value [, value ]* ')'
[ WITHIN DISTINCT '(' expression [, expression ]* ')' ]
[ FILTER '(' WHERE condition ')' ]
| agg '(' '*' ')' [ FILTER (WHERE condition) ]
where agg
is one of the operators in the following table.
If FILTER
is present, the aggregate function only considers rows for
which condition evaluates to TRUE
.
If DISTINCT
is present, duplicate argument values are eliminated
before being passed to the aggregate function.
If WITHIN DISTINCT
is present, argument values are made distinct
within each value of specified keys before being passed to the
aggregate function.
Important: the aggregate result type is the same as the type of the value aggregated. Since an aggregate combines multiple values, this can cause overflows in the computation, which can cause runtime exceptions. We recommend to use explicit casts in SQL programs converting the aggregated values to a data type wide enough to store all intermediate aggregation results. Example:
Instead of SELECT SUM(col)
, you should write SELECT SUM(CAST col AS DECIMAL(10, 4))
if you expect 10-digit results to be possible.
Aggregate | Description |
---|---|
ARRAY_AGG([ ALL | DISTINCT ] value [ RESPECT NULLS | IGNORE NULLS ] ) | Gathers all values in an array. The order of the values in the array is unspecified (but it is deterministic). |
AVG( [ ALL | DISTINCT ] numeric) | Returns the average (arithmetic mean) of numeric across all input values |
ARG_MAX(value, compared) | Returns a value for one of the rows containing the maximum value of compared in the group.
The rule for selecting the value is not specified if there are multiple rows with the same maximum value. |
ARG_MIN(value, compared) | Returns a value for one of the rows containing the minimum value of compared in the group.
This rule for selecting the value is not specified if there are multiple rows with the same minimum value. |
BIT_AND( [ ALL | DISTINCT ] value) | Returns the bitwise AND of all non-null input values, or null if none; integer and binary types are supported |
BIT_OR( [ ALL | DISTINCT ] value) | Returns the bitwise OR of all non-null input values, or null if none; integer and binary types are supported |
BIT_XOR( [ ALL | DISTINCT ] value) | Returns the bitwise XOR of all non-null input values, or null if none; integer and binary types are supported |
COUNT(*) | Returns the number of input rows |
COUNT( [ ALL | DISTINCT ] value [, value ]*) | Returns the number of input rows for which value is not null. If the argument contains multiple expressions, it counts only expressions where all fields are non-null. |
EVERY(condition) | Returns TRUE if all of the values of condition are TRUE |
LOGICAL_OR or BOOL_OR | Same as SOME |
LOGICAL_AND or BOOL_AND | Same as EVERY |
MAX( [ ALL | DISTINCT ] value) | Returns the maximum value of value across all input values |
MIN( [ ALL | DISTINCT ] value) | Returns the minimum value of value across all input values |
SOME(condition) | Returns TRUE if one or more of the values of condition is TRUE |
SUM( [ ALL | DISTINCT ] numeric) | Returns the sum of numeric across all input values |
STDDEV( [ ALL | DISTINCT ] value) | Synonym for STDDEV_SAMP |
STDDEV_POP( [ ALL | DISTINCT ] value) | Returns the population standard deviation of numeric across all input values |
STDDEV_SAMP( [ ALL | DISTINCT ] value) | Returns the sample standard deviation of numeric across all input values |
Comparisons like MAX
, MIN
, ARG_MIN
, and ARG_MAX
are defined
for all data types, and they use the standard comparison
operations.
If FILTER
is specified, then only the input rows for which the
filter_clause evaluates to true are fed to the aggregate function;
other rows are discarded. For example:
SELECT
count(*) AS unfiltered,
count(*) FILTER (WHERE i < 5) AS filtered
FROM TABLE
In addition, the following two constructors act as aggregates:
Constructor | Description | Example |
---|---|---|
ARRAY(sub-query) | Creates an array from the result of a sub-query. If the subquery returns a tuple, the array will be an array of tuples. | SELECT ARRAY(SELECT empno FROM emp) or SELECT ARRAY(SELECT empno, dept FROM emp) |
MAP(sub-query) | Creates a map from the result of a sub-query that returns two columns. If multiple entries have the same key, the largest value wins. | SELECT MAP(SELECT empno, deptno FROM emp) |
Window aggregate functions
A SELECT
expression in the SQL grammar can also
contain a window aggregate function.
The following window aggregate functions are supported:
Aggregate | Description |
---|---|
AVG(numeric) | Returns the average (arithmetic mean) of numeric across all values in window |
COUNT( value [, value ]*) | Returns the number of rows in window for which value is not null |
COUNT(*) | Returns the number of rows in window |
DENSE_RANK() | Returns the rank of the current row without gaps. DENSE_RANK is currently only supported if
the window is used to compute a TopK aggregate. |
LAG( expression, [offset, [ default ] ]) | Returns expression evaluated at the row that is offset rows before the current row
within the partition; if there is no such row, instead returns default.
Both offset and default are evaluated with respect to the current row.
If omitted, offset defaults to 1 and default to NULL . |
LEAD( expression, [offset, [ default ] ]) | Returns expression evaluated at the row that is offset rows after the current row
within the partition; if there is no such row, instead returns default.
Both offset and default are evaluated with respect to the current row.
If omitted, offset defaults to 1 and default to NULL . |
MAX (expression) | Returns the maximum value of expression across all values in window |
MIN (expression) | Returns the minimum value of expression across all values in window |
RANK() | Returns the rank of the current row with gaps. DENSE_RANK is currently only supported if
the window is used to compute a TopK aggregate. |
ROW_NUMBER() | Returns the number of the current row within its partition, counting from 1.
ROW_NUMBER is currently only supported if the window is used to compute a TopK aggregate. |
SUM (numeric) | Returns the sum of numeric across all values in window |
Currently, the window aggregate functions RANK
, DENSE_RANK
and
ROW_NUMBER
are only supported if the compiler detects that they are
being used to implement a TopK pattern. This pattern is expressed in
SQL with the following structure:
SELECT * FROM (
SELECT empno,
row_number() OVER (ORDER BY empno) rn
FROM empsalary) emp
WHERE rn < 3
Pivots
The SQL PIVOT
operation can be used to turn rows into columns. It
usually replaces a GROUP-BY
operation when the group keys are known
in advance. Instead of producing one row for each group, PIVOT
can
produce one column for each group.
Syntax
PIVOT ( { aggregate_expression [ AS aggregate_expression_alias ] } [ , ... ]
FOR column_with_data IN ( column_list ) )
Parameters
-
aggregate_expression Specifies an aggregate expression (
SUM
,COUNT(DISTINCT )
, etc.). -
aggregate_expression_alias Specifies a column name for the aggregate expression.
-
column_with_data A column that produces all the values that will become new column names.
-
column_list Columns that show the pivoted data.
Example
CREATE TABLE FURNITURE (
type VARCHAR,
year INTEGER,
count INTEGER
);
INSERT INTO FURNITURE VALUES
('chair', 2020, 4),
('table', 2021, 3),
('chair', 2021, 4),
('desk', 2023, 1),
('table', 2023, 2);
SELECT year, type, SUM(count) FROM FURNITURE GROUP BY year,type;
year | type | sum
-------------------
2020 | chair | 4
2021 | table | 3
2021 | chair | 4
2023 | desk | 1
2023 | table | 2
(5 rows)
SELECT * FROM FURNITURE
PIVOT (
SUM(count) AS ct
FOR type IN ('desk' AS desks, 'table' AS tables, 'chair' as chairs)
);
year | desks | tables | chairs
------------------------------
2020 | | | 4
2021 | | 3 | 4
2023 | 1 | 2 |
(3 rows)
Notice how the same information is presented in a tabular form where
we have a column for each type of object. PIVOTs require all the
possible "type"s to be specified when the query is written. Notice
that if we add an additional type, the GROUP BY
query will produce a
correct result, while the PIVOT
query will produce the same result.
INSERT INTO FURNITURE VALUES ('bed', 2020, 5);
SELECT year, type, SUM(count) FROM FURNITURE GROUP BY year,type;
year | type | sum
-------------------
2020 | chair | 4
2020 | bed | 5
2021 | table | 3
2021 | chair | 4
2023 | desk | 1
2023 | table | 2
(6 rows)
SELECT * FROM FURNITURE
PIVOT (
SUM(count) AS ct
FOR type IN ('desk' AS desks, 'table' AS tables, 'chair' as chairs)
);
year | desks | tables | chairs
------------------------------
2020 | | | 4
2021 | | 3 | 4
2023 | 1 | 2 |
(3 rows)
Grouped auxiliary functions
Grouped auxiliary functions allow you to access properties of a window defined by a grouped window function.
Operator syntax | Description |
---|---|
TUMBLE_END(expression, interval [, time ]) | Returns the value of expression at the end of the window defined by a TUMBLE function call |
TUMBLE_START(expression, interval [, time ]) | Returns the value of expression at the beginning of the window defined by a TUMBLE function call |