
Pipeline Settings

Pipelines come with a set of configuration settings that toggle features, tune performance, and help with operations. On the Enterprise Edition, you will likely need to configure the resources section to tune the CPU, memory, and storage used by the pipeline according to your infrastructure needs. Beyond that, users rarely need to deviate from the supplied defaults.

Editing configuration

You can edit all pipeline settings when the pipeline is Shutdown, and only a limited subset when it is Suspended.

Press the gear button in the top right corner of the code editor to access the dialog where you can edit the runtime and program configuration JSON.

(Screenshot: configuring a pipeline in the web console.)
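The snippet below is a sketch of the kind of JSON this dialog edits. It builds a partial runtime configuration and a program configuration from setting names and defaults documented on this page, then prints both as JSON. It assumes each setting is a top-level key of the runtime configuration object. It does not call any Feldera client API.

```python
import json

# Partial runtime configuration built from defaults documented on this page.
# Assumption: each setting is a top-level key of the runtime configuration JSON.
runtime_config = {
    "workers": 8,
    "cpu_profiler": True,
    "tracing": False,
    "clock_resolution_usecs": 1_000_000,  # 1 second
    "min_batch_size_records": 0,
    "max_buffering_delay_usecs": 0,
    "checkpoint_during_suspend": True,
    "pin_cpus": [],
}

# Program configuration: explicitly request the "optimized" profile and
# allow reuse of a previously compiled binary.
program_config = {"profile": "optimized", "cache": True}

print(json.dumps(runtime_config, indent=2))
print(json.dumps(program_config, indent=2))
```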

Runtime configuration

Important: Size the resource limits (memory and storage), the number of worker threads, and the storage backend appropriately so that the pipeline makes good use of the available cluster resources.

checkpoint_during_suspend
boolean
Default: true
  • If true, the suspend operation will first atomically checkpoint the pipeline before deprovisioning the compute resources. When resuming, the pipeline will start from this checkpoint.
  • If false, then the pipeline will be suspended without creating an additional checkpoint. When resuming, it will pick up the latest checkpoint made by the periodic checkpointer or by invoking the /checkpoint API.
clock_resolution_usecs
integer or null <int64> >= 0
Default: 1000000

Real-time clock resolution in microseconds.

This parameter controls the execution of queries that use the NOW() function. The output of such queries depends on the real-time clock and can change over time without any external input. The pipeline updates the clock value and triggers incremental recomputation at most once every clock_resolution_usecs microseconds.

It is set to 1 second (1,000,000 microseconds) by default.

Set to null to disable periodic clock updates.
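This sketch converts a desired clock resolution in seconds into the integer microsecond value this setting expects. Passing None produces JSON null, which disables periodic clock updates. The helper clock_setting is hypothetical and exists only for illustration.

```python
import json

def clock_setting(resolution_secs):
    """Hypothetical helper: build a clock_resolution_usecs fragment.

    resolution_secs=None disables periodic clock updates (JSON null).
    """
    if resolution_secs is None:
        return {"clock_resolution_usecs": None}
    if resolution_secs < 0:
        raise ValueError("clock resolution must be >= 0")
    # The setting is an integer number of microseconds.
    return {"clock_resolution_usecs": round(resolution_secs * 1_000_000)}

print(json.dumps(clock_setting(0.1)))   # update NOW() at most every 100 ms
print(json.dumps(clock_setting(None)))  # disable periodic clock updates
```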

cpu_profiler
boolean
Default: true

Enable the CPU profiler.

object
Default: {"model":"none","checkpoint_interval_secs":60}

Fault-tolerance configuration.

The default FtConfig (FtConfig::default) disables fault tolerance. This is the configuration you get if RuntimeConfig omits the fault-tolerance configuration entirely.

The default value of FtConfig::model, by contrast, enables fault tolerance (Some(FtModel::default())). This is the configuration you get if RuntimeConfig includes a fault-tolerance configuration but does not specify a particular model.

init_containers
any or null

Specification of additional (sidecar) containers.

max_buffering_delay_usecs
integer <int64> >= 0
Default: 0

Maximum delay, in microseconds, that the controller waits for min_batch_size_records records to be buffered. Defaults to 0.

max_parallel_connector_init
integer or null <int64> >= 0
Default: null

The maximum number of connectors initialized in parallel during pipeline startup.

At startup, the pipeline must initialize all of its input and output connectors. Depending on the number and types of connectors, this can take a long time. To accelerate the process, multiple connectors are initialized concurrently. This option controls the maximum number of connectors that can be initialized in parallel.

The default is 10, which applies when this setting is null.
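The sketch below illustrates the general technique this limit describes: a semaphore caps how many connector initializations run at the same time. It is not Feldera's implementation. The connectors are simulated, and init_connector is a hypothetical stand-in that sleeps instead of doing real setup.

```python
import asyncio

async def init_connector(name, sem, state):
    # Hypothetical connector initialization; sleep stands in for real setup work.
    async with sem:
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)
        state["running"] -= 1
    return name

async def init_all(names, max_parallel=10):
    # At most max_parallel initializations hold the semaphore at once.
    sem = asyncio.Semaphore(max_parallel)
    state = {"running": 0, "peak": 0}
    done = await asyncio.gather(*(init_connector(n, sem, state) for n in names))
    return done, state["peak"]

names = [f"connector_{i}" for i in range(25)]
done, peak = asyncio.run(init_all(names, max_parallel=10))
print(len(done), "connectors initialized; peak concurrency:", peak)
```

Raising the limit lets more connectors start concurrently, which can shorten startup when many connectors each spend time waiting on external systems.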

min_batch_size_records
integer <int64> >= 0
Default: 0

Minimum input batch size.

The controller delays pushing input records to the circuit until at least min_batch_size_records records have been received (in total across all endpoints) or max_buffering_delay_usecs microseconds have passed since at least one input record was buffered. Defaults to 0.
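The batching rule above can be modeled as a single predicate: push once enough records are buffered, or once the oldest buffered record has waited long enough, whichever happens first. This is a simplified model of the described behavior, not the controller's code. The function name should_push is hypothetical.

```python
def should_push(buffered_records, usecs_since_first_buffered,
                min_batch_size_records=0, max_buffering_delay_usecs=0):
    """Simplified model of the input batching rule (not the controller's code).

    buffered_records: records buffered so far, summed across all endpoints.
    usecs_since_first_buffered: time since the first buffered record arrived.
    """
    if buffered_records == 0:
        return False  # nothing to push yet
    return (buffered_records >= min_batch_size_records
            or usecs_since_first_buffered >= max_buffering_delay_usecs)

# With the defaults (0 and 0), any buffered input is pushed immediately.
print(should_push(1, 0))
# Wait for 1000 records or 50 ms, whichever comes first.
print(should_push(200, 10_000, 1000, 50_000))   # too few records, delay not reached
print(should_push(200, 60_000, 1000, 50_000))   # delay exceeded
print(should_push(1500, 10_000, 1000, 50_000))  # batch size reached
```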

pin_cpus
Array of integers[ items >= 0 ]
Default: []

An optional list of CPU numbers to which the pipeline may pin its worker threads. Specify at least twice as many CPU numbers as workers. CPUs are generally numbered starting from 0. The pipeline might not be able to honor CPU pinning requests.

CPU pinning can make pipelines run faster and perform more consistently, as long as different pipelines running on the same machine are pinned to different CPUs.
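The helper below is a sketch of the rule on this page. It gives each pipeline a contiguous block of 2 × workers CPU numbers and places pipelines that share a machine on disjoint ranges. The helper pin_cpus_for and the 32-CPU machine are assumptions made for illustration.

```python
import json

def pin_cpus_for(workers, first_cpu=0):
    """Hypothetical helper: a contiguous block of 2 * workers CPU numbers.

    The settings reference asks for at least twice as many CPUs as workers.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return list(range(first_cpu, first_cpu + 2 * workers))

# Two pipelines on an assumed 32-CPU machine, pinned to disjoint CPU ranges.
pipeline_a = {"workers": 8, "pin_cpus": pin_cpus_for(8, first_cpu=0)}
pipeline_b = {"workers": 8, "pin_cpus": pin_cpus_for(8, first_cpu=16)}
assert not set(pipeline_a["pin_cpus"]) & set(pipeline_b["pin_cpus"])

print(json.dumps(pipeline_a))
print(json.dumps(pipeline_b))
```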

provisioning_timeout_secs
integer or null <int64> >= 0
Default: null

Timeout, in seconds, for the Provisioning phase of the pipeline. Setting this value overrides the runner's default.

object
Default: {"cpu_cores_min":null,"cpu_cores_max":null,"memory_mb_min":null,"memory_mb_max":null,"storage_mb_max":null,"storage_class":null}

Resource configuration (the resources section): minimum and maximum CPU cores and memory (MB), maximum storage (MB), and the storage class used by the pipeline.

object or null
Default: {"backend":{"name":"default"},"min_storage_bytes":null,"min_step_storage_bytes":null,"compression":"default","cache_mib":null}

Storage configuration for a pipeline.

tracing
boolean
Default: false

Enable pipeline tracing.

tracing_endpoint_jaeger
string
Default: "127.0.0.1:6831"

Jaeger tracing endpoint to send tracing information to.
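This runtime configuration fragment turns on tracing and points it at a non-default Jaeger endpoint. The host name jaeger-agent.example is a placeholder. The sketch also checks that the endpoint follows the host:port shape of the default value.

```python
import json

# Enable tracing and send tracing information to a Jaeger endpoint.
# "jaeger-agent.example" is a placeholder host; the default is "127.0.0.1:6831".
tracing_fragment = {
    "tracing": True,
    "tracing_endpoint_jaeger": "jaeger-agent.example:6831",
}

# The endpoint is written as host:port, like the default value.
host, port = tracing_fragment["tracing_endpoint_jaeger"].rsplit(":", 1)
assert host and port.isdigit()

print(json.dumps(tracing_fragment))
```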

workers
integer <int32> >= 0
Default: 8

Number of DBSP worker threads.

Each DBSP "foreground" worker thread is paired with a "background" thread for LSM merging, making the total number of threads twice the specified number.

The typical sweet spot for the number of workers is between 4 and 16. Each worker increases overall memory consumption for data structures used during a step.
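As a quick illustration of the arithmetic above, this sketch computes the total thread count for a given workers value. It also flags values outside the typical 4–16 range. Both helper names are hypothetical.

```python
def total_threads(workers):
    # Each foreground worker thread is paired with one background thread
    # for LSM merging, so the pipeline runs twice as many threads.
    return 2 * workers

def workers_note(workers):
    """Hypothetical sanity check reflecting the 4-16 guidance above."""
    if 4 <= workers <= 16:
        return None
    return f"{workers} workers is outside the typical 4-16 range"

for workers in (2, 8, 32):
    print(workers, "workers ->", total_threads(workers), "threads;",
          workers_note(workers) or "within the typical range")
```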

Program configuration

The "optimized" compilation profile (default) should be used when running production pipelines where performance is important.

cache
boolean
Default: true

If true (the default) and a prior compilation with the same checksum already exists, its output (the binary) is reused. Set to false to always trigger a new compilation, which may take longer and can overwrite an existing binary.

profile
string or null
Default: null
Enum: "dev" "unoptimized" "optimized"

The compilation profile, passed to the Rust compiler via cargo build --profile <profile>. Among other things, the profile affects compilation speed (how long until the program is ready to run) and runtime speed (performance while running).
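The sketch below builds a program configuration fragment. It accepts only the documented profile values and shows how cache=False forces a fresh compilation. The helper make_program_config is hypothetical.

```python
import json

PROFILES = ("dev", "unoptimized", "optimized")

def make_program_config(profile="optimized", cache=True):
    """Hypothetical helper: build a program configuration fragment.

    profile: one of the documented profiles, or None to leave it unset.
    cache: False forces a new compilation even if a cached binary exists.
    """
    if profile is not None and profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    return {"profile": profile, "cache": cache}

# Production pipelines: optimized profile, reuse a cached binary if available.
print(json.dumps(make_program_config()))
# Force a fresh build with the "dev" profile.
print(json.dumps(make_program_config("dev", cache=False)))
```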