SQL Grammar
This is a short formal description of the grammar supported in a BNF form.
- Constructs enclosed between
[]
are optional. *
denotes zero or many repetitions.- Uppercase words (
FUNCTION
) and single-quoted text (')'
) indicate grammar terminals. - Parentheses
()
are used for grouping productions together. - The vertical bar
|
indicates choice between two constructs.
SQL reserved keywords cannot be used as table and view names. In
addition, the following keywords are reserved: USER
, NOW
. All
identifiers starting with "Feldera" are also reserved (in all case
combinations).
statementList:
statement [ ';' statement ]* [ ';' ]
statement
: createTableStatement
| declareRecursiveViewStatement
| createViewStatement
| createFunctionStatement
| createTypeStatement
| latenessStatement
columnDecl
: column generalType
Creating user-defined types
generalType
: type [NOT NULL]
createTypeStatement
: CREATE TYPE name AS '(' typedef ')'
typedef
: generalType
| name generalType [, name type ]*
Creating tables
createTableStatement
: CREATE TABLE name
'(' tableElement [, tableElement ]* ')'
[ 'WITH' keyValueList ]
tableElement
: columnName generalType ( columnConstraint )*
| columnName
| tableConstraint
columnConstraint
: PRIMARY KEY
| FOREIGN KEY REFERENCES identifier '(' identifier ')'
| LATENESS expression
| WATERMARK expression
| DEFAULT expression
tableConstraint
: [ CONSTRAINT name ]
{
CHECK '(' expression ')'
| PRIMARY KEY parensColumnList
}
| FOREIGN KEY parensColumnList REFERENCES identifier parensColumnList
parensColumnList
: '(' columnName [, columnName ]* ')'
keyValueList
: '(' keyValue ( ',' keyValue )* ')'
keyValue
: stringLiteral '=' stringLiteral
Columns that are part of a PRIMARY KEY
cannot have nullable types.
Columns that are part of a FOREIGN KEY
must refer to PRIMARY KEY
columns in some table. Their types must must match (but FOREIGN KEY
columns may be nullable).
CREATE TABLE
is used to declare tables. Tables correspond to input
data sources. A table declaration must list the table columns and
their types. Here is an example:
CREATE TABLE empsalary (
depname varchar not null PRIMARY KEY,
empno bigint FOREIGN KEY REFERENCES employee(empid),
salary int,
enroll_date date
);
A table declaration can have an optional WITH
clause which is used
to specify properties of the connector that provides the source data.
The properties are specified as key-value pairs, each written as a
string. Here is an example:
CREATE TABLE empsalary (
depname varchar,
empno bigint,
salary int,
enroll_date date
) WITH (
'source' = 'kafka',
'url' = 'localhost:8080'
);
Table properties that impact the program semantics
Materialized tables
Unlike a database, Feldera does not normally maintain the contents of
tables; it will only store as much data as necessary to compute future
outputs. By specifying the property 'materialized' = 'true'
a user
instructs Feldera to also maintain the complete contents of the table.
Such materialized tables can be browsed and queried at runtime.
See Materialized Tables and Views for more details.
Append-only tables
The append_only
Boolean property describes tables that only support
insertions. Updates or deletes are not allowed in such tables.
See Streaming SQL Extensions, append-only tables
Size hints
The property expected_size
can be used to pass information to the
SQL compiler about the expected size of a table in steady state
operation. The value of this property should be an integer value.
LATENESS
latenessStatement
: LATENESS view '.' column expression
See Streaming SQL Extensions, LATENESS
WATERMARKS
See Streaming SQL Extensions, WATERMARKS
Creating user-defined functions.
CREATE FUNCTION
is used to declare user-defined functions.
createFunctionStatement
: CREATE FUNCTION name '(' [ columnDecl [, columnDecl ]* ] ')' RETURNS generalType
[ AS expression ]
Creating views
CREATE VIEW
is used to declare a view. The optional LOCAL
keyword can be used to indicate that the declared view is not exposed
to the outside world as an output of the computation. This is useful
for modularizing the SQL code, by declaring intermediate views that
are used in the implementation of other views.
The MATERIALIZED
keyword instructs Feldera to maintain a full copy
of the view's output in addition to producing the
stream of changes.
Such materialized views can be browsed and queried at runtime.
See Materialized Tables and Views for more details.
DECLARE RECURSIVE VIEW
is used to declare a view that can afterwards
be used in a recursive SQL query. The syntax of this statement is
reminiscent of a table declaration, without constraints. Recursive
queries are documented in this section.
declareRecursiveViewStatement:
: DECLARE RECURSIVE VIEW name
'(' columnDecl [, columnDecl ]* ')'
createViewStatement
: CREATE [ LOCAL | MATERIALIZED ] VIEW name
[ '(' columnName [, columnName ]* ')' ]
[ 'WITH' keyValueList ]
AS query
query
: values
| WITH withItem [ , withItem ]* query
| {
select
| selectWithoutFrom
| query UNION [ ALL | DISTINCT ] query
| query EXCEPT [ ALL | DISTINCT ] query
| query MINUS [ ALL | DISTINCT ] query
| query INTERSECT [ ALL | DISTINCT ] query
}
[ ORDER BY orderItem [, orderItem ]* ]
[ LIMIT { count | ALL } ]
withItem
: name
[ '(' column [, column ]* ')' ]
AS '(' query ')'
values
: { VALUES | VALUE } expression [, expression ]*
select
: SELECT [ ALL | DISTINCT ]
{ * | projectItem [, projectItem ]* }
FROM tableExpression
[ WHERE booleanExpression ]
[ GROUP BY [ ALL | DISTINCT ] { groupItem [, groupItem ]* } ]
[ HAVING booleanExpression ]
tablePrimary
: tableName '(' TABLE tableName ')'
| tablePrimary '(' columnDecl [, columnDecl ]* ')'
| [ LATERAL ] '(' query ')'
| UNNEST '(' expression ')' [ WITH ORDINALITY ]
| TABLE '(' functionName '(' expression [, expression ]* ')' ')'
groupItem:
expression
| '(' ')'
| '(' expression [, expression ]* ')'
| CUBE '(' expression [, expression ]* ')'
| ROLLUP '(' expression [, expression ]* ')'
selectWithoutFrom
: SELECT [ ALL | DISTINCT ]
{ * | projectItem [, projectItem ]* }
orderItem
: expression [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ]
projectItem
: expression [ [ AS ] columnAlias ]
| tableAlias . *
tableExpression
: tableReference [, tableReference ]*
| tableExpression [ NATURAL ] [ { LEFT | RIGHT | FULL } [ OUTER | ASOF ] ] JOIN tableExpression [ joinCondition ]
| tableExpression CROSS JOIN tableExpression
| tableExpression [ CROSS | OUTER ] APPLY tableExpression
joinCondition
: ON booleanExpression
| USING '(' column [, column ]* ')'
tableReference
: tablePrimary [ pivot ] [ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ]
pivot
: PIVOT '('
pivotAgg [, pivotAgg ]*
FOR pivotList
IN '(' pivotExpr [, pivotExpr ]* ')'
')'
pivotAgg
: agg '(' [ ALL | DISTINCT ] value [, value ]* ')'
[ [ AS ] alias ]
pivotList
: columnOrList
pivotExpr
: exprOrList [ [ AS ] alias ]
columnOrList
: column
| '(' column [, column ]* ')'
exprOrList
: expr
| '(' expr [, expr ]* ')'
Beware that in the SQL UNION
, EXCEPT
, and INTERSECT
statements
the column names are not used to reorder columns.
In orderItem
, if expression is a positive integer n, it denotes the
nth item in the SELECT
clause.
Aggregate queries
An aggregate query is a query that contains a GROUP BY
or a HAVING
clause, or aggregate functions in the SELECT
clause. In the
SELECT
, HAVING
and ORDER
BY clauses of an aggregate query, all
expressions must be constant within the current group (that is,
grouping constants as defined by the GROUP BY
clause, or constants),
or aggregate functions, or a combination of constants and aggregate
functions. Aggregate and grouping functions may only appear in an
aggregate query, and only in a SELECT
, HAVING
or ORDER BY
clause. Aggregate functions are described in this
section.
Sub-queries
A scalar sub-query is a sub-query used as an expression. If the
sub-query returns no rows, the value is NULL
; if it returns more
than one row, it is an error.
IN
, EXISTS
, and scalar sub-queries can occur in any place where an
expression can occur (such as the SELECT
clause, WHERE
clause,
ON
clause of a JOIN
, or as an argument to an aggregate function).
An IN
, EXISTS
, or scalar sub-query may be correlated; that is,
it may refer to tables in the FROM
clause of an enclosing query.
GROUP BY DISTINCT
removes duplicate grouping sets (for example,
GROUP BY DISTINCT GROUPING SETS ((a), (a, b), (a))
is equivalent to
GROUP BY GROUPING SETS ((a), (a, b))
); GROUP BY ALL
is equivalent
to GROUP BY
.
MINUS
is equivalent to EXCEPT
.
Grouping functions
Function | Description |
---|---|
GROUPING( expression, [ expression ]) | Returns a bit vector of the grouping expressions |
GROUPING_ID( expression, [ expression ]) | Synonym for GROUPING |
Example using GROUPING
:
select deptno, job, count(*) as c, grouping(deptno) as d,
grouping(job) j, grouping(deptno, job) as x
from emp
group by cube(deptno, job);
+--------+-----------+----+---+---+---+
| DEPTNO | JOB | C | D | J | X |
+--------+-----------+----+---+---+---+
| 10 | CLERK | 1 | 0 | 0 | 0 |
| 10 | MANAGER | 1 | 0 | 0 | 0 |
| 10 | PRESIDENT | 1 | 0 | 0 | 0 |
| 10 | | 3 | 0 | 1 | 1 |
| 20 | ANALYST | 2 | 0 | 0 | 0 |
| 20 | CLERK | 2 | 0 | 0 | 0 |
| 20 | MANAGER | 1 | 0 | 0 | 0 |
| 20 | | 5 | 0 | 1 | 1 |
| 30 | CLERK | 1 | 0 | 0 | 0 |
| 30 | MANAGER | 1 | 0 | 0 | 0 |
| 30 | SALESMAN | 4 | 0 | 0 | 0 |
| 30 | | 6 | 0 | 1 | 1 |
| | ANALYST | 2 | 1 | 0 | 2 |
| | CLERK | 4 | 1 | 0 | 2 |
| | MANAGER | 3 | 1 | 0 | 2 |
| | PRESIDENT | 1 | 1 | 0 | 2 |
| | SALESMAN | 4 | 1 | 0 | 2 |
| | | 14 | 1 | 1 | 3 |
+--------+-----------+----+---+---+---+
Window aggregates
One type of expression that can appear in a SELECT
statement is a
window aggregate. The grammar for window aggregates is:
windowedAggregateCall
: agg '(' [ ALL | DISTINCT ] value [, value ]* ')'
[ RESPECT NULLS | IGNORE NULLS ]
OVER windowSpec
| agg '(' '*' ')'
OVER windowSpec
windowSpec
: '('
[ windowName ]
PARTITION BY expression [, expression ]*
[ ORDER BY orderItem [, orderItem ]* ]
[
RANGE BETWEEN windowRange AND windowRange
]
')'
windowRange
: CURRENT
| ( UNBOUNDED | expression ) ( PRECEDING | FOLLOWING )
Where agg
is a window aggregate function as described in the section
on aggregation.
Currently we require window ranges to have constant values. This
precludes ranges such as INTERVAL 1 YEAR
, which have variable sizes.
The window bounds must be non-negative constant values.
Table functions
Table functions are invoked using the syntax TABLE(function(arguments))
.
See Table functions
ASOF
joins
An ASOF JOIN
operation combines rows from two tables based on
timestamp values. For each row in the left table, the join
finds at most a single row in the right table that has the "closest"
timestamp value. The matched row on the right side is the closest
match whose timestamp column is compared using the comparison operator in
the MATCH_CONDITION
clause. Currently, only >=
is supported.
The comparison is performed using SQL semantics, which returns false
when comparing NULL
values with any other values. Thus a NULL
timestamp in the left table will not match any timestamps in the right table.
Currently, only the LEFT
form of the ASOF JOIN
is supported. In this case,
when there is no match for a row in the left table, the columns from
the right table are null-padded.
SELECT *
FROM left_table LEFT ASOF JOIN right_table
MATCH_CONDITION ( left_table.timecol >= right_table.timecol )
ON left_table.col = right_table.col