User-Defined Functions
The SQL statement CREATE FUNCTION
can be used to declare new
functions. Functions can be implemented either in SQL or in Rust.
User-defined functions written in SQL
The following example shows a user-defined function:
CREATE FUNCTION contains_number(str VARCHAR NOT NULL, value INTEGER)
RETURNS BOOLEAN NOT NULL
AS (str LIKE ('%' || CAST(value AS VARCHAR) || '%'));
CREATE VIEW V0 AS SELECT contains_number(CAST('YES: 10 NO:5' AS VARCHAR), 5)
User-defined functions written in Rust
Rust UDF support is currently experimental and may undergo significant changes, including non-backward-compatible modifications, in future releases of Feldera.
-
Feldera's incremental query engine assumes that all computations are deterministic. Using a non-deterministic UDF is likely to result in incorrect outputs. The SQL compiler cannot verify that Rust functions meet this requirement. Therefore, it is the responsibility of the UDF developer to ensure that their UDFs are deterministic, i.e., the function's return value must depend solely on its arguments and not on clocks, random number generators, external data sources, or other sources of non-determinism.
We are considering adding support for non-deterministic UDFs in the future. If your use case requires a non-deterministic UDF, please share your feedback by commenting on the associated GitHub issue
-
UDFs are compiled into native binary code and executed directly within the address space of the pipeline. Therefore, only trusted code should be included in UDFs. UDFs should not contain panics or invoke undefined behaviors. These can lead to crashes, memory corruption, or unpredictable behavior within the pipeline. UDFs should handle all runtime errors gracefully, by returning a
Result::Err()
(see below). This ensures that errors are propagated in a controlled manner and can be handled by the calling code.
Feldera supports UDFs implemented in Rust. To define a UDF, users declare the function in SQL without providing a body. The corresponding Rust implementation is supplied in a separate Rust file, which contains the logic for all UDFs declared within the SQL program. A complete set of files that defines the pipeline in this case is:
File | Description |
---|---|
program.sql | The SQL code, including table, view, and user-defined function declarations. |
stubs.rs | Auto-generated by the SQL compiler; contains Rust stubs for UDFs (see below). |
udf.rs | User-provided implementation of UDFs in Rust. |
udf.toml | User-provided list of external crates required by the code in udf.rs . This list is appended to the [dependencies] section of the Cargo.toml file. |
We recommend using the Web Console to develop and debug Rust UDFs, as described below. The Web Console code editor has a separate tab for each of these files, and supports an interactive development workflow. Once ready, the creation and deployment of the pipeline can be automated with scripts that make use of the Feldera CLI, the Python API, or the REST API. These scripts and the SQL/Rust code can then be maintained with the usual practices such as version control.
Step 1: Declare a function in the SQL program using CREATE FUNCTION
Declare a SQL function without a body:
-- UDF which encodes text in base64.
CREATE FUNCTION base64(s VARBINARY) RETURNS VARCHAR;
Step 2: Locate the auto-generated Rust prototype
When the SQL compiler encounters such a function, it generates a Rust stub for it in the stubs.rs
file. The stub is an implementation of the UDF in Rust that simply invokes another function with the
same name and signature exported from the udf.rs
module:
/* stubs.rs */
...
pub fn base64(s: Option<ByteArray>) -> Result<Option<String>, Box<dyn std::error::Error>> {
udf::base64(
s)
}
The udf::base64
function does not exist yet, so you will see a Rust compiler error similar to this:
error[E0425]: cannot find function `base64` in module `udf`
--> project01926e00-f404-7813-b65c-d66ac9959998/src/stubs.rs:11:10
|
11 | udf::base64(
| ^^^^^^ not found in `udf`
Step 3: Implement the UDF in Rust
Copy the declaration from stubs.rs
to udf.rs
and replace its body with the actual implementation:
/* udf.rs */
use feldera_sqllib::*;
use base64::prelude::*;
pub fn base64(s: Option<ByteArray>) -> Result<Option<String>, Box<dyn std::error::Error>> {
Ok(s.map(|v| BASE64_STANDARD.encode(v.as_slice())))
}
The use feldera_sqllib::*
directive imports the definitions of the
Rust types that the compiler uses to implement some of the SQL
datatypes. The next section explains what these types are.
If your UDF uses external crates, list these external dependencies in udf.toml
.
The contents of this file is appended to the [dependencies]
section in the generated
Cargo.toml
file.
base64 = "0.22.1"
Step 4: Deploy the UDF
Using Feldera CLI:
fda create udf_fda_test --udf-rs udf.rs --udf-toml udf.toml program.sql
Using Feldera Python SDK
Click to see how to create a pipeline with a Rust UDF using the Feldera Python SDK
from feldera import FelderaClient, PipelineBuilder
# SQL program.
sql = """
-- UDF which encodes text in base64.
CREATE FUNCTION base64(s VARBINARY) RETURNS VARCHAR;
CREATE TABLE binary_t(
b VARBINARY
);
CREATE MATERIALIZED VIEW base64_v AS
SELECT
base64(b) as text
FROM
binary_t;
"""
# UDF implementation in Rust.
udf_rust = """
use feldera_sqllib::*;
use base64::prelude::*;
pub fn base64(s: Option<ByteArray>) -> Result<Option<String>, Box<dyn std::error::Error>> {
Ok(s.map(|v| BASE64_STANDARD.encode(v.as_slice())))
}"""
# External Rust dependencies.
udf_toml = """
base64 = "0.22.1"
"""
# Create a pipeline using the above SQL, Rust, and TOML code.
feldera = FelderaClient("http://localhost:8080")
pipeline = PipelineBuilder(
feldera, name="udf_test", sql=sql, udf_rust=udf_rust, udf_toml = udf_toml).create_or_replace()
pipeline.start()
next(pipeline.query("insert into binary_t values(X'0123456789ABCDEF')"))
output = pipeline.query("select * from base64_v")
assert list(output) == [{'text': 'ASNFZ4mrze8='}]
pipeline.shutdown()
pipeline.delete()
Using the REST API
Click to see how to create a pipeline with a Rust UDF using the Feldera REST API
#!/bin/bash
echo "
-- UDF which encodes text in base64.
CREATE FUNCTION base64(s VARBINARY) RETURNS VARCHAR;
CREATE TABLE binary_t(
b VARBINARY
);
CREATE MATERIALIZED VIEW base64_v AS
SELECT
base64(b) as text
FROM
binary_t;
" > program.sql
echo "
use feldera_sqllib::*;
use base64::prelude::*;
pub fn base64(s: Option<ByteArray>) -> Result<Option<String>, Box<dyn std::error::Error>> {
Ok(s.map(|v| BASE64_STANDARD.encode(v.as_slice())))
}" > udf.rs
echo "
base64 = \"0.22.1\"
" > udf.toml
curl -i -X PUT http://localhost:8080/v0/pipelines/udf_api_test \
-H 'Content-Type: application/json' \
-d "$(jq -Rsn \
--rawfile sql program.sql \
--rawfile udf_rust udf.rs \
--rawfile udf_toml udf.toml \
'{
name: "udf_api_test",
description: "Create a UDF using REST API",
program_code: $sql,
runtime_config: {},
program_config: {},
udf_rust: $udf_rust,
udf_toml: $udf_toml
}')"
SQL type representation in Rust
The type representing SQL data structures may change in the future.
The following table shows the Rust representation of standard SQL data
types. A nullable SQL type is represented by the corresponding rust
Option<>
type. Notice that some of these types are not standard
Rust types, but are defined in the
feldera-sqllib
crate, which is part of the Feldera SQL runtime.
SQL | Rust |
---|---|
BOOLEAN | bool |
TINYINT | i8 |
SMALLINT | i16 |
INT | i32 |
BIGINT | i64 |
DECIMAL(p, s) | rust_decimal::Decimal |
REAL | feldera_sqllib::F32 |
DOUBLE | feldera_sqllib::F64 |
CHAR , CHAR(n) | String |
VARCHAR , VARCHAR(n) | String |
BINARY , BINARY(n) , VARBINARY , VARBINARY(n) | feldera_sqllib::ByteArray |
NULL | () |
INTERVAL | feldera_sqllib::ShortInterval , feldera_sqllib::LongInterval |
TIME | feldera_sqllib::Time |
TIMESTAMP | feldera_sqllib::Timestamp |
DATE | feldera_sqllib::Date |
T ARRAY | Vec<T> |
MAP<K, V> | BTreeMap<K, V> |
UUID | feldera_sqllib::Uuid |
VARIANT | feldera_sqllib::Variant |
Multiple SQL types may be represented by the same Rust type. For
example, CHAR
, CHAR(n)
, VARCHAR(n)
, and VARCHAR
are all
represented by the standard Rust String
type.
The SQL family of INTERVAL
types translates to one of two Rust
types: ShortInterval
(representing intervals from days to seconds),
and LongInterval
(representing intervals from years to months).
(Our dialect of SQL does not allow mixing the two kinds of intervals
in a single expression.)
Return types
In the Rust implementation the function always has to return the type
Result<T, Box<dyn std::error::Error>>
, where T
is the Rust
equivalent of the expected return type of the SQL function. The Rust
function should return an Err
only when the function fails at
runtime with a fatal condition, e.g., array index out of
bounds, arithmetic overflows, etc.
Developing complex UDFs
While many useful UDFs can be implemented with just a few lines of Rust,
some may require more complex code that is easier to develop
using a full-featured Rust IDE. To support the development of such
UDFs, we made the feldera-sqllib
crate available on
crates.io.
In order to implement a complex Rust UDF (or a library of UDFs) using
a Rust IDE:
- Create a new Rust crate to serve as the container for your UDFs.
- Add
feldera-sqllib
as a dependency toCargo.toml
(use the crate version that matches the version of Feldera you are working with). - Implement and test your UDFs within this crate.
- Copy the final Rust code and dependencies to the Feldera Web Console.
If your UDFs require a larger Rust project with multiple modules, we
recommend encapsulating the majority of the UDF logic in a crate.
This crate can be hosted on crates.io or GitHub.
Import this crate to your Feldera pipeline via the udf.toml
file,
including only wrapper functions that call the API of this crate in
udf.rs
.
Limitations
-
Currently only limited implicit casts are considered for the function arguments and function result in the SQL program. For example, a call such as
CONTAINS_NUMBER('2010-10-20', '5')
will fail at SQL compilation time because the first argument has typeCHAR(8)
instead ofVARCHAR
, and the second argument has typeCHAR(1)
instead ofINTEGER
. This can be avoided by calling the function using an explicit cast:CONTAINS_NUMBER(CAST('2010-10-20' AS VARCHAR), CAST('5' AS INTEGER))
. -
User-defined functions cannot have names identical to standard SQL library function names.
-
Polymorphic functions are not supported. For example, in SQL the addition operation operates on any numeric types; such an operation cannot be implemented as a single user-defined function.