Skip to main content

User-Defined Functions

The SQL statement CREATE FUNCTION can be used to declare new functions. Functions can be implemented either in SQL or in Rust.

User-defined functions written in SQL

The following example shows a user-defined function:

CREATE FUNCTION contains_number(str VARCHAR NOT NULL, value INTEGER)
RETURNS BOOLEAN NOT NULL
AS (str LIKE ('%' || CAST(value AS VARCHAR) || '%'));

CREATE VIEW V0 AS SELECT contains_number(CAST('YES: 10 NO:5' AS VARCHAR), 5)

User-defined functions written in Rust

Experimental feature

Rust UDF support is currently experimental and may undergo significant changes, including non-backward-compatible modifications, in future releases of Feldera.

danger
  • Feldera's incremental query engine assumes that all computations are deterministic. Using a non-deterministic UDF is likely to result in incorrect outputs. The SQL compiler cannot verify that Rust functions meet this requirement. Therefore, it is the responsibility of the UDF developer to ensure that their UDFs are deterministic, i.e., the function's return value must depend solely on its arguments and not on clocks, random number generators, external data sources, or other sources of non-determinism.

    We are considering adding support for non-deterministic UDFs in the future. If your use case requires a non-deterministic UDF, please share your feedback by commenting on the associated GitHub issue

  • UDFs are compiled into native binary code and executed directly within the address space of the pipeline. Therefore, only trusted code should be included in UDFs. UDFs should not contain panics or invoke undefined behaviors. These can lead to crashes, memory corruption, or unpredictable behavior within the pipeline. UDFs should handle all runtime errors gracefully, by returning a Result::Err() (see below). This ensures that errors are propagated in a controlled manner and can be handled by the calling code.

Feldera supports UDFs implemented in Rust. To define a UDF, users declare the function in SQL without providing a body. The corresponding Rust implementation is supplied in a separate Rust file, which contains the logic for all UDFs declared within the SQL program. A complete set of files that defines the pipeline in this case is:

FileDescription
program.sqlThe SQL code, including table, view, and user-defined function declarations.
stubs.rsAuto-generated by the SQL compiler; contains Rust stubs for UDFs (see below).
udf.rsUser-provided implementation of UDFs in Rust.
udf.tomlUser-provided list of external crates required by the code in udf.rs. This list is appended to the [dependencies] section of the Cargo.toml file.

We recommend using the Web Console to develop and debug Rust UDFs, as described below. The Web Console code editor has a separate tab for each of these files, and supports an interactive development workflow. Once ready, the creation and deployment of the pipeline can be automated with scripts that make use of the Feldera CLI, the Python API, or the REST API. These scripts and the SQL/Rust code can then be maintained with the usual practices such as version control.

Step 1: Declare a function in the SQL program using CREATE FUNCTION

Declare a SQL function without a body:

-- UDF which encodes text in base64.
CREATE FUNCTION base64(s VARBINARY) RETURNS VARCHAR;

Step 2: Locate the auto-generated Rust prototype

When the SQL compiler encounters such a function, it generates a Rust stub for it in the stubs.rs file. The stub is an implementation of the UDF in Rust that simply invokes another function with the same name and signature exported from the udf.rs module:

/* stubs.rs */
...

pub fn base64(s: Option<ByteArray>) -> Result<Option<String>, Box<dyn std::error::Error>> {
udf::base64(
s)
}

The udf::base64 function does not exist yet, so you will see a Rust compiler error similar to this:

 error[E0425]: cannot find function `base64` in module `udf`
--> project01926e00-f404-7813-b65c-d66ac9959998/src/stubs.rs:11:10
|
11 | udf::base64(
| ^^^^^^ not found in `udf`

Step 3: Implement the UDF in Rust

Copy the declaration from stubs.rs to udf.rs and replace its body with the actual implementation:

/* udf.rs */
use feldera_sqllib::*;
use base64::prelude::*;

pub fn base64(s: Option<ByteArray>) -> Result<Option<String>, Box<dyn std::error::Error>> {
Ok(s.map(|v| BASE64_STANDARD.encode(v.as_slice())))
}

The use feldera_sqllib::* directive imports the definitions of the Rust types that the compiler uses to implement some of the SQL datatypes. The next section explains what these types are.

If your UDF uses external crates, list these external dependencies in udf.toml. The contents of this file is appended to the [dependencies] section in the generated Cargo.toml file.

base64 = "0.22.1"

Step 4: Deploy the UDF

Using Feldera CLI:

fda create udf_fda_test --udf-rs udf.rs --udf-toml udf.toml program.sql

Using Feldera Python SDK

Click to see how to create a pipeline with a Rust UDF using the Feldera Python SDK
from feldera import FelderaClient, PipelineBuilder

# SQL program.
sql = """
-- UDF which encodes text in base64.
CREATE FUNCTION base64(s VARBINARY) RETURNS VARCHAR;

CREATE TABLE binary_t(
b VARBINARY
);

CREATE MATERIALIZED VIEW base64_v AS
SELECT
base64(b) as text
FROM
binary_t;
"""

# UDF implementation in Rust.
udf_rust = """
use feldera_sqllib::*;
use base64::prelude::*;

pub fn base64(s: Option<ByteArray>) -> Result<Option<String>, Box<dyn std::error::Error>> {
Ok(s.map(|v| BASE64_STANDARD.encode(v.as_slice())))
}"""

# External Rust dependencies.
udf_toml = """
base64 = "0.22.1"
"""

# Create a pipeline using the above SQL, Rust, and TOML code.
feldera = FelderaClient("http://localhost:8080")
pipeline = PipelineBuilder(
feldera, name="udf_test", sql=sql, udf_rust=udf_rust, udf_toml = udf_toml).create_or_replace()

pipeline.start()

next(pipeline.query("insert into binary_t values(X'0123456789ABCDEF')"))
output = pipeline.query("select * from base64_v")

assert list(output) == [{'text': 'ASNFZ4mrze8='}]

pipeline.shutdown()
pipeline.delete()

Using the REST API

Click to see how to create a pipeline with a Rust UDF using the Feldera REST API
#!/bin/bash

echo "
-- UDF which encodes text in base64.
CREATE FUNCTION base64(s VARBINARY) RETURNS VARCHAR;

CREATE TABLE binary_t(
b VARBINARY
);

CREATE MATERIALIZED VIEW base64_v AS
SELECT
base64(b) as text
FROM
binary_t;
" > program.sql

echo "
use feldera_sqllib::*;
use base64::prelude::*;

pub fn base64(s: Option<ByteArray>) -> Result<Option<String>, Box<dyn std::error::Error>> {
Ok(s.map(|v| BASE64_STANDARD.encode(v.as_slice())))
}" > udf.rs

echo "
base64 = \"0.22.1\"
" > udf.toml


curl -i -X PUT http://localhost:8080/v0/pipelines/udf_api_test \
-H 'Content-Type: application/json' \
-d "$(jq -Rsn \
--rawfile sql program.sql \
--rawfile udf_rust udf.rs \
--rawfile udf_toml udf.toml \
'{
name: "udf_api_test",
description: "Create a UDF using REST API",
program_code: $sql,
runtime_config: {},
program_config: {},
udf_rust: $udf_rust,
udf_toml: $udf_toml
}')"

SQL type representation in Rust

Experimental feature

The type representing SQL data structures may change in the future.

The following table shows the Rust representation of standard SQL data types. A nullable SQL type is represented by the corresponding rust Option<> type. Notice that some of these types are not standard Rust types, but are defined in the feldera-sqllib crate, which is part of the Feldera SQL runtime.

SQLRust
BOOLEANbool
TINYINTi8
SMALLINTi16
INTi32
BIGINTi64
DECIMAL(p, s)rust_decimal::Decimal
REALfeldera_sqllib::F32
DOUBLEfeldera_sqllib::F64
CHAR, CHAR(n)String
VARCHAR, VARCHAR(n)String
BINARY, BINARY(n), VARBINARY, VARBINARY(n)feldera_sqllib::ByteArray
NULL()
INTERVALfeldera_sqllib::ShortInterval, feldera_sqllib::LongInterval
TIMEfeldera_sqllib::Time
TIMESTAMPfeldera_sqllib::Timestamp
DATEfeldera_sqllib::Date
T ARRAYVec<T>
MAP<K, V>BTreeMap<K, V>
UUIDfeldera_sqllib::Uuid
VARIANTfeldera_sqllib::Variant

Multiple SQL types may be represented by the same Rust type. For example, CHAR, CHAR(n), VARCHAR(n), and VARCHAR are all represented by the standard Rust String type.

The SQL family of INTERVAL types translates to one of two Rust types: ShortInterval (representing intervals from days to seconds), and LongInterval (representing intervals from years to months). (Our dialect of SQL does not allow mixing the two kinds of intervals in a single expression.)

Return types

In the Rust implementation the function always has to return the type Result<T, Box<dyn std::error::Error>>, where T is the Rust equivalent of the expected return type of the SQL function. The Rust function should return an Err only when the function fails at runtime with a fatal condition, e.g., array index out of bounds, arithmetic overflows, etc.

Developing complex UDFs

While many useful UDFs can be implemented with just a few lines of Rust, some may require more complex code that is easier to develop using a full-featured Rust IDE. To support the development of such UDFs, we made the feldera-sqllib crate available on crates.io. In order to implement a complex Rust UDF (or a library of UDFs) using a Rust IDE:

  • Create a new Rust crate to serve as the container for your UDFs.
  • Add feldera-sqllib as a dependency to Cargo.toml (use the crate version that matches the version of Feldera you are working with).
  • Implement and test your UDFs within this crate.
  • Copy the final Rust code and dependencies to the Feldera Web Console.

If your UDFs require a larger Rust project with multiple modules, we recommend encapsulating the majority of the UDF logic in a crate. This crate can be hosted on crates.io or GitHub. Import this crate to your Feldera pipeline via the udf.toml file, including only wrapper functions that call the API of this crate in udf.rs.

Limitations

  • Currently only limited implicit casts are considered for the function arguments and function result in the SQL program. For example, a call such as CONTAINS_NUMBER('2010-10-20', '5') will fail at SQL compilation time because the first argument has type CHAR(8) instead of VARCHAR, and the second argument has type CHAR(1) instead of INTEGER. This can be avoided by calling the function using an explicit cast: CONTAINS_NUMBER(CAST('2010-10-20' AS VARCHAR), CAST('5' AS INTEGER)).

  • User-defined functions cannot have names identical to standard SQL library function names.

  • Polymorphic functions are not supported. For example, in SQL the addition operation operates on any numeric types; such an operation cannot be implemented as a single user-defined function.