Dynamically-typed values and JSON support
The VARIANT
type
Values of VARIANT
type are dynamically-typed.
Any such value holds at runtime two pieces of information:
- the data type
- the data value
Values of VARIANT
type can be created by casting any other value to
a VARIANT
, e.g., SELECT CAST(x AS VARIANT)
. Conversely, values of
type VARIANT
can be cast to any other data type, e.g., SELECT CAST(v AS INT)
.
A cast of a value of type VARIANT
to target type T
will
compare the runtime type with T
. If the types are identical or there
is a natural conversion from the runtime type to T
, the original value
is returned. Otherwise the CAST
returns NULL
.
A value of type VARIANT
that stores a MAP
can be converted to a
user-defined type. Each name of a field of the user-defined type is
used as an index into the map. If a field is missing in the map and
the corresponding field of the struct gets the NULL
value. Fields
in the map that do not correspond to the struct field names are
ignored. The user-defined structure field names are case-sensitive,
according to the compiler flag --unquotedCasing
.
Remember that the DECIMAL
type specified without precision is the
same as DECIMAL(0)
, with no digits after the decimal point. When
you cast a VARIANT
value to DECIMAL
you should specify a precision
and scale large enough for the values that you expect in the data.
Values of type ARRAY
, MAP
and user-defined types can be cast to
VARIANT
.
There exists a special value of VARIANT
type called null
. This
value is different from the SQL NULL
value. It is used to implement
the JSON null
value. An important difference is that two VARIANT
null
values are equal, whereas NULL
in SQL is not equal to anything.
Converting a user-defined type to a VARIANT
produces a VARIANT
storing a value of type MAP<VARCHAR, VARIANT>
, where each field of
the map corresponds to a field of the user-defined structure.
VARIANT
values also offer the following operations:
- indexing using array indexing notation
variant[index]
. If theVARIANT
is obtained from anARRAY
value, the indexing operation returns aVARIANT
whose value is the element at the specified index. Otherwise, this operation returnsNULL
- indexing using map element access notation
variant[key]
, wherekey
can have any legalMAP
key type. If theVARIANT
is obtained from aMAP
value that has en element with this key, aVARIANT
value holding the associated value in theMAP
is returned. OtherwiseNULL
is returned. If theVARIANT
is obtained from user-defined structure which has a field with the namekey
, this operation returns aVARIANT
value holding the corresponding field value. OtherwiseNULL
is returned. - field access using the dot notation:
variant.field
. This operation is interpreted as equivalent tovariant['field']
. Note, however, that the field notation is subject to the capitalization rules of the SQL dialect, so for correct operation the field may need to be quoted:variant."field"
Functions that operate on VARIANT
values
Function | Description |
---|---|
VARIANTNULL() | Can be used to create an instance of the VARIANT null value. |
TYPEOF(variant) | Argument must be a VARIANT value. Returns a string describing the runtime type of the value |
PARSE_JSON(string) | Parses a string that represents a JSON value, returns a VARIANT object, or NULL if parsing fails (more details below) |
TO_JSON(variant) | Argument must be a VARIANT value. Returns a string that represents the serialization of a VARIANT value. If the value cannot be represented as JSON, the result is NULL (more details below) |
PARSE_JSON
PARSE_JSON
converts a JSON value as follows:
- JSON
null
is converted to aVARIANT
null
value (not a SQLNULL
!); see above the description ofVARIANT
null
- JSON Booleans are converted to
BOOLEAN
values (wrapped inVARIANT
values) - JSON numbers are converted to
DECIMAL
values (wrapped inVARIANT
values) - JSON strings are converted to
VARCHAR
values (wrapped inVARIANT
values) - JSON arrays are converted to
VARIANT ARRAY
values (wrapped inVARIANT
values). Each array element is aVARIANT
- JSON objects are converted to
MAP<VARIANT, VARIANT>
values (wrapped inVARIANT
values). Each key and each value is aVARIANT
For example, PARSE_JSON("{"a": 1, "b": [2, 3.3, null]}")
generates the same SQL value that would be generated by the following code:
SELECT CAST(
MAP[
CAST('a' AS VARIANT), CAST(1.0 AS VARIANT),
CAST('b' AS VARIANT), CAST(ARRAY[
CAST(2.0 AS VARIANT),
CAST(3.3 AS VARIANT),
VARIANTNULL()
] AS VARIANT)
] AS VARIANT)
TO_JSON
TO_JSON
converts a VARIANT
value to a VARCHAR
:
- the
VARIANT
null
value is converted to the stringnull
- a
VARIANT
wrapping a Boolean value is converted to the respective Boolean stringtrue
orfalse
- a
VARIANT
wrapping any numeric value (DECIMAL
,TINYINT
,SMALLINT
,INTEGER
,BIGINT
,REAL
,DOUBLE
,DECIMAL
) is converted to the string representation of the value as produced using aCAST(value AS VARCHAR)
- a
VARIANT
wrapping aVARCHAR
value is converted to a string with double quotes, and with escape sequences for special characters (e.g., quotes), as mandated by the JSON grammar - a
VARIANT
wrapping anARRAY
with elements of any type is converted to a JSON array, and the elements are recursively converted - a
VARIANT
wrapping aMAP
whose keys have any SQLCHAR
type, orVARIANT
values wrappingCHAR
values will generate a JSON object, by recursively converting each key-value pair. - a
VARIANT
wrapping aDATE
,TIME
, orDATETIME
value will be serialized as a JSON string
Processing JSON data using VARIANT
The VARIANT
type enables efficient JSON processing in SQL. In this sense it is similar to
the JSONB
type in Postgres and other databases. There are two ways to convert
JSON data to and from VARIANT
:
- Use
PARSE_JSON
andTO_JSON
functions to convert strings toVARIANT
and back. - Automatically, when ingesting data to or outputting data from columns of type
VARIANT
.
The following example demonstrates the first approach. Here, input events
contain a field called json
of type string, which carries JSON-encoded data.
We ingest this field as a string and use PARSE_JSON
to convert it to a
VARIANT
and store the result in an intermediate view.
CREATE TABLE json (id INT, json VARCHAR);
CREATE VIEW parsed_json AS SELECT id, PARSE_JSON(json) AS json FROM json;
Input events can use any supported data format. For instance, when
ingesting a JSON stream, a valid input record could look like this
(note the use of escaping in the json
field):
{"id": 123, "json": "{\"foo\": \"bar\"}"}
The second approach parses JSON into VARIANT
directly during ingestion, eliminating
the need for calling PARSE_JSON
explicitly:
CREATE TABLE json (id INT, json VARIANT);
Note that this program has a subtly different semantics from the previous one
depending on the input format used. For most input formats, e.g.,
Avro, Parquet, or CSV,
it is equivalent, i.e., it converts an input field of type string into a VARIANT
.
However, when the input stream carries JSON data using raw
or insert/delete encoding, the json
field can contain
an arbitrary JSON value, which gets parsed into VARIANT
:
{"id": 123, "json": {"name": "John Doe", "scores": [8, 10]}}
This is useful for processing semi-structured data, i.e., data whose schema is only
partially fixed or is too complex to represent in SQL.
In this case, the schema contains an integer field id
and a field called json
, whose
schema is not specified. The VARIANT
type allows us to parse this field and
manipulate its contents. For instance, the following query
extracts name
and scores
fields, interpets the latter as an array of numbers
and computes the average of the first two entries in the array:
CREATE TABLE json (id INT, json VARIANT);
CREATE VIEW average AS SELECT
CAST(json['name'] AS VARCHAR) as name,
((CAST(json['scores'][1] AS DECIMAL(8, 2)) + CAST(json['scores'][2] AS DECIMAL(8, 2))) / 2) as average
FROM json;
Note how object fields are accessed using
map indexing operators ['scores']
, ['name']
, and how array
elements are accessed using indexing with numeric values [1]
.
Recall that array indexes in SQL start from 1!
Finally, notice how the DECIMAL
values that are retrieved need to
specify the precision and scale: CAST(... AS DECIMAL(8, 2))
. Using
CAST(... AS DECIMAL)
would lose all digits after the decimal point.
Examples
Here are some simple SQL query examples using VARIANT
and JSON
values and the expected output values. (Note that these examples
cannot be executed directly, since they involve no views.)
SELECT CAST(1 AS VARIANT)
1
SELECT TYPEOF(CAST(1 AS VARIANT))
INTEGER
SELECT CAST(CAST(1 AS TINYINT) AS VARIANT)
1
-- The runtime knows that this is a TINYINT
SELECT TYPEOF(CAST(CAST(1 AS TINYINT) AS VARIANT))
TINYINT
-- Converting something to VARIANT and back works
SELECT CAST(CAST(1 AS VARIANT) AS INT)
1
-- Conversions between numeric types are allowed
SELECT CAST(CAST(1 AS VARIANT) AS TINYINT)
1
-- Some VARIANT objects when output receive double quotes
select CAST('string' as VARIANT)
"string"
-- CHAR(3) values are represented as VARCHAR in variants
SELECT CAST(CAST('abc' AS VARIANT) AS VARCHAR)
abc
-- VARCHAR and CHAR(N) have the same underlying runtime type
SELECT CAST(CAST('abc' AS VARIANT) AS CHAR(3))
abc
-- The value representing a VARIANT null value (think of a JSON null)
SELECT VARIANTNULL()
null
-- VARIANT null is not the same as SQL NULL
SELECT VARIANTNULL() IS NULL
false
-- Two VARIANT nulls are equal, unlike SQL NULL
SELECT VARIANTNULL() = VARIANTNULL()
true
SELECT TYPEOF(VARIANTNULL())
VARIANT
-- Variants delegate equality to the underlying values
SELECT CAST(1 AS VARIANT) = CAST(1 AS VARIANT)
true
-- To be equal two variants must have the same value and the same runtime type
SELECT CAST(1 AS VARIANT) = CAST(CAST(1 AS TINYINT) AS VARIANT)
false
-- An array of variant values can have values with any underlying type
SELECT ARRAY[CAST(1 AS VARIANT), CAST('abc' AS VARIANT)]
[1, "abc"]
-- A map with VARCHAR keys and VARIANT values
SELECT MAP['a', CAST(1 AS VARIANT), 'b', CAST('abc' AS VARIANT), 'c', CAST(ARRAY[1,2,3] AS VARIANT)]
{a=1, b="abc", c=[1, 2, 3]}
-- Variant values allow access by index, but return null if they are not arrays
SELECT (CAST(1 AS VARIANT))[1]
null
SELECT CAST(ARRAY[1,2,3] AS VARIANT)[1]
1
-- Accessing items in a VARIANT array returns VARIANT values,
-- even if the array itself does not contain VARIANT values
-- (Otherwise TYPEOF would not compile)
SELECT TYPEOF(CAST(ARRAY[1,2,3] AS VARIANT)[1])
INTEGER
SELECT CAST(x'0102' AS VARIANT)
x'0102;
SELECT CAST(CAST(x'0102' AS VARBINARY) AS VARIANT)
x'0102;
SELECT CAST(TIME '10:01:01' AS VARIANT)
10:01:01
-- One can access fields by name in a VARIANT, even if the
-- variant does not have named fields
SELECT CAST(ARRAY[1,2,3] AS VARIANT)['name']
null
-- One can access fields by name in a VARIANT, even if the
-- variant does not have named fields
SELECT CAST(ARRAY[1,2,3] AS VARIANT)."name"
null
-- One can access fields by index in a VARIANT
SELECT CAST(Map[1,'a',2,'b',3,'c'] AS VARIANT)[1]
"a"
SELECT TYPEOF(CAST(Map[1,'a',2,'b',3,'c'] AS VARIANT)[1])
VARCHAR
-- Note that field name is quoted to match the case of the key
SELECT CAST(Map['a',1,'b',2,'c',3] AS VARIANT)."a"
1
-- Unquoted field may not match, depending on the 'unquotedCasing' compiler flag
SELECT CAST(Map['A',1,'b',2,'c',3] AS VARIANT).A
NULL
-- The safest way is to index with a string
SELECT CAST(Map['a',1,'b',2,'c',3] AS VARIANT)['a']
1
-- Maps can have variant keys too
-- (but you have to index with a variant).
SELECT (Map[CAST('a' AS VARIANT), 1, CAST(1 AS VARIANT), 2])[CAST(1 AS VARIANT)]
2
-- Navigating a JSON-like object
SELECT CAST(MAP['a', CAST(1 AS VARIANT), 'b', CAST('abc' AS VARIANT), 'c', CAST(ARRAY[1,2,3] AS VARIANT)]
['c'][1] AS INTEGER)
1
SELECT PARSE_JSON(1)
1
SELECT PARSE_JSON('1')
1
-- Numeric values in JSON are stored as DECIMAL values
SELECT TYPEOF(PARSE_JSON('1'))
DECIMAL
SELECT PARSE_JSON('\"a\"')
"a"
SELECT PARSE_JSON('false')
false
-- A VARIANT null
SELECT PARSE_JSON('null')
null
SELECT TYPEOF(PARSE_JSON('null'))
VARIANT
-- a SQL null
SELECT PARSE_JSON(null)
NULL
SELECT PARSE_JSON('[1,2,3]')
[1, 2, 3]
SELECT PARSE_JSON('{\"a\": 1, \"b\": 2}')
{"a"=1,"b"=2}
-- all the following are strings
SELECT TO_JSON(PARSE_JSON(1))
1
SELECT TO_JSON(null)
NULL
SELECT TO_JSON(PARSE_JSON('1'))
1
SELECT TO_JSON(PARSE_JSON('\"a\"'))
a
SELECT TO_JSON(PARSE_JSON('false'))
false
SELECT TO_JSON(PARSE_JSON('null'))
NULL
SELECT TO_JSON(PARSE_JSON(null))
null
SELECT TO_JSON(PARSE_JSON('[1,2,3]'))
[1,2,3]
SELECT TO_JSON(PARSE_JSON('{ \"a\": 1, \"b\": 2 }'))
{\"a\":1,\"b\":2}
SELECT PARSE_JSON('{ \"a\": 1, \"b\": 2 }') = PARSE_JSON('{\"b\":2,\"a\":1}')
true
-- dates are emitted as strings
SELECT TO_JSON(CAST(DATE '2020-01-01' AS VARIANT))
"2020-01-01"
-- timestamps are emitted as strings
SELECT TO_JSON(CAST(TIMESTAMP '2020-01-01 10:00:00' AS VARIANT))
"2020-01-01 10:00:00"
-- values with user-defined types can be converted to JSON
CREATE TYPE S AS (i INT, s VARCHAR, a INT ARRAY);
SELECT TO_JSON(CAST(s(2, 'a', ARRAY[1, 2, 3]) AS VARIANT));
{"a":[1,2,3],"i":2,"s":"a"}
-- The result of JSON parsing can be converted to user-defined types
SELECT CAST(PARSE_JSON('{"i": 2, "s": "a", "a": [1, 2, 3]}') AS S);
{a=[1,2,3], i=2, s="a"}
-- This works even for nested types, such as user-defined types that
-- contain arrays of user-defined types
CREATE TYPE t AS (sa S ARRAY);
SELECT TO_JSON(CAST(t(ARRAY[s(2, 'a', ARRAY[1, NULL, 3]), s(3, 'b', array())]) AS VARIANT));
{"SA":[{"a":[1,null,3],"i":2,"s":"a"},{"a":[],"i":3,"s":"b"}]}
SELECT CAST(CAST(MAP['i', 0] AS VARIANT) AS S)
-- produces a structure S(I=0, A=NULL, S=NULL); missing fields are set to 'NULL'
SELECT CAST(CAST(MAP['i', 's'] AS VARIANT) AS S)
-- produces a structure S(I=NULL, A=NULL, S=NULL), since the field 'I' has the wrong type
SELECT CAST(CAST(MAP['I', 's'] AS VARIANT) AS S)
-- produces a structure S(I=NULL, A=NULL, S=NULL), since the field 'i' is uppercase
-- yet unquoted field names are converted to lowercase
SELECT CAST(CAST(MAP['i', 0, 'X', 2] AS VARIANT) AS S)
-- produces a structure S(I=NULL, A=NULL, S=NULL), since the extra field 'X' in the map is ignored
SELECT CAST(PARSE_JSON('{"sa": [{"i": 2, "s": "a", "a": [1, 2, 3]}]}') AS T)
-- produces a structure T(sa=[i=2, s="a", "a"={1,2,3}])