Pipelines come with a set of configuration settings to toggle features, tune performance, and help with operations. If you're on the Enterprise Edition, you will likely need to configure the resources section to tune the CPU, memory, and storage resources used by the Pipeline depending on your infrastructure needs. Other than that, users rarely need to deviate from the supplied defaults.
Size the resource limits (memory and storage), the number of worker threads, and the storage backend so that the pipeline makes good use of the available cluster resources.
checkpoint_during_suspend
boolean
Default: true
If true, the suspend operation will first atomically checkpoint the pipeline before
deprovisioning the compute resources. When resuming, the pipeline will start from this
checkpoint.
If false, then the pipeline will be suspended without creating an additional checkpoint.
When resuming, it will pick up the latest checkpoint made by the periodic checkpointer or
by invoking the /checkpoint API.
clock_resolution_usecs
integer or null <int64> >= 0
Default: 1000000
Real-time clock resolution in microseconds.
This parameter controls the execution of queries that use the NOW() function. The output of such
queries depends on the real-time clock and can change over time without any external
inputs. The pipeline updates the clock value and triggers incremental recomputation
at most once every clock_resolution_usecs microseconds.
It is set to 1 second (1,000,000 microseconds) by default.
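The rate limit described above can be pictured with a small Python sketch. It is only an illustrative model, not the pipeline's implementation: the function name clock_updates and the idea of sampling the clock at discrete times are assumptions made for the example.

```python
def clock_updates(sample_times_usecs, clock_resolution_usecs=1_000_000):
    """Return the sample times at which a new clock value would be published.

    Illustrative model only: a new value is published when at least
    clock_resolution_usecs microseconds have elapsed since the last one.
    """
    updates = []
    last = None
    for t in sample_times_usecs:
        if last is None or t - last >= clock_resolution_usecs:
            updates.append(t)
            last = t
    return updates


# With the default 1-second resolution, samples taken less than one second
# after the previous update do not trigger a recomputation.
samples = [0, 400_000, 1_000_000, 1_500_000, 2_100_000]
published = clock_updates(samples)
```

In this model, lowering clock_resolution_usecs makes NOW()-dependent results fresher at the cost of more frequent recomputation.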
The default FtConfig (via FtConfig::default) disables fault tolerance,
which is the configuration that one gets if RuntimeConfig omits fault
tolerance configuration.
The default value for FtConfig::model enables fault tolerance, as
Some(FtModel::default()). This is the configuration that one gets if
RuntimeConfig includes a fault tolerance configuration but does not
specify a particular model.
init_containers
any or null
Specification of additional (sidecar) containers.
max_buffering_delay_usecs
integer <int64> >= 0
Default: 0
Maximum delay, in microseconds, to wait for min_batch_size_records records
to be buffered by the controller. Defaults to 0.
max_parallel_connector_init
integer or null <int64> >= 0
Default: null
The maximum number of connectors initialized in parallel during pipeline
startup.
At startup, the pipeline must initialize all of its input and output connectors.
Depending on the number and types of connectors, this can take a long time.
To accelerate the process, multiple connectors are initialized concurrently.
This option controls the maximum number of connectors that can be initialized
in parallel.
If unset (null), the default of 10 is used.
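As a rough sketch of what a cap on parallel initialization means, the following Python example runs a stand-in initialization function over a list of connectors with a bounded thread pool. The names init_connectors and init_one are hypothetical, and the sleep call stands in for real connector setup; the pipeline's actual startup code is not shown in this reference.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def init_connectors(names, max_parallel=10):
    """Initialize connectors with at most max_parallel running at once.

    Returns the initialized names and the peak concurrency observed.
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def init_one(name):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # stand-in for real connector setup work
        with lock:
            state["active"] -= 1
        return name

    # The pool size is the cap on concurrent initializations.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        initialized = list(pool.map(init_one, names))
    return initialized, state["peak"]


connector_names = [f"connector_{i}" for i in range(25)]
done, peak = init_connectors(connector_names, max_parallel=10)
```

A larger cap shortens startup when there are many slow connectors, while a smaller cap limits the load placed on external systems during startup.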
min_batch_size_records
integer <int64> >= 0
Default: 0
Minimal input batch size.
The controller delays pushing input records to the circuit until at
least min_batch_size_records records have been received (total
across all endpoints) or max_buffering_delay_usecs microseconds
have passed since at least one input record has been buffered.
Defaults to 0.
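The interaction between min_batch_size_records and max_buffering_delay_usecs can be summarized as a single decision rule. The Python sketch below is a simplified model of that rule as described here; the function should_flush and its first two parameters are hypothetical names, not part of the controller's API.

```python
def should_flush(buffered_records, oldest_buffered_age_usecs,
                 min_batch_size_records=0, max_buffering_delay_usecs=0):
    """Decide whether buffered input should be pushed to the circuit.

    buffered_records: total records buffered across all endpoints.
    oldest_buffered_age_usecs: microseconds since the first record
        of the current batch was buffered.
    """
    if buffered_records == 0:
        return False  # nothing to push
    enough_records = buffered_records >= min_batch_size_records
    waited_long_enough = oldest_buffered_age_usecs >= max_buffering_delay_usecs
    return enough_records or waited_long_enough


# Defaults (0 and 0): any buffered record is pushed immediately.
immediate = should_flush(1, 0)
# Batch of 100 requested, 5 ms maximum delay:
too_early = should_flush(50, 1_000, 100, 5_000)
full_batch = should_flush(100, 1_000, 100, 5_000)
timed_out = should_flush(50, 5_000, 100, 5_000)
```

Raising both values trades latency for larger batches: records wait longer, but each step processes more input at once.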
pin_cpus
Array of integers[ items >= 0 ]
Default: []
Optionally, a list of CPU numbers for CPUs to which the pipeline may pin
its worker threads. Specify at least twice as many CPU numbers as
workers. CPUs are generally numbered starting from 0. The pipeline
might not be able to honor CPU pinning requests.
CPU pinning can make pipelines run faster and perform more consistently,
as long as different pipelines running on the same machine are pinned to
different CPUs.
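A configuration can be checked against the guidance above before it is submitted. The Python sketch below is a hypothetical helper (check_pin_cpus and shared_cpus are not part of any pipeline API) that applies the two rules stated here: at least twice as many CPU numbers as workers, and no CPUs shared between pipelines on the same machine.

```python
def check_pin_cpus(pin_cpus, workers):
    """Return a list of problems with a pin_cpus setting (empty if none)."""
    problems = []
    if not pin_cpus:
        return problems  # empty list: no pinning requested
    if any(cpu < 0 for cpu in pin_cpus):
        problems.append("CPU numbers must be >= 0")
    if len(pin_cpus) < 2 * workers:
        problems.append(
            f"need at least {2 * workers} CPU numbers for {workers} workers, "
            f"got {len(pin_cpus)}"
        )
    return problems


def shared_cpus(pin_cpus_a, pin_cpus_b):
    """CPUs that two pipelines on the same machine would both pin to."""
    return sorted(set(pin_cpus_a) & set(pin_cpus_b))


ok = check_pin_cpus(list(range(16)), workers=8)
too_few = check_pin_cpus([0, 1, 2, 3], workers=8)
overlap = shared_cpus([0, 1, 2, 3], [2, 3, 4, 5])
```

Even when these checks pass, the pipeline might not be able to honor the pinning request.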
provisioning_timeout_secs
integer or null <int64> >= 0
Default: null
Timeout in seconds for the Provisioning phase of the pipeline.
Setting this value will override the default of the runner.
Jaeger tracing endpoint to send tracing information to.
workers
integer <int32> >= 0
Default: 8
Number of DBSP worker threads.
Each DBSP "foreground" worker thread is paired with a "background"
thread for LSM merging, making the total number of threads twice the
specified number.
The typical sweet spot for the number of workers is between 4 and 16.
Each worker increases overall memory consumption for data structures
used during a step.
The "optimized" compilation profile (default) should be used when running production pipelines where performance is important.
cache
boolean
Default: true
If true (default) and a prior compilation with the same checksum
already exists, its output (i.e., the binary) is reused.
Set to false to always trigger a new compilation, which might take longer
and can also overwrite an existing binary.
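The caching behavior can be modeled as a lookup keyed by checksum. The Python sketch below is a toy model under stated assumptions: the class BinaryCache is hypothetical, the checksum is taken as a SHA-256 hash of the program source purely for illustration, and "compiling" just produces a labeled byte string.

```python
import hashlib


class BinaryCache:
    """Toy model of reusing compilation output keyed by checksum."""

    def __init__(self):
        self._binaries = {}
        self.compilations = 0

    def get_binary(self, program_source, cache=True):
        # Assumption: the checksum is a hash of the program source.
        checksum = hashlib.sha256(program_source.encode()).hexdigest()
        if cache and checksum in self._binaries:
            return self._binaries[checksum]  # reuse the prior output
        self.compilations += 1
        binary = f"binary-{checksum[:8]}-build{self.compilations}".encode()
        self._binaries[checksum] = binary  # may overwrite an existing binary
        return binary


store = BinaryCache()
first = store.get_binary("SELECT 1")
second = store.get_binary("SELECT 1")               # cache hit
forced = store.get_binary("SELECT 1", cache=False)  # recompiles and overwrites
after_forced = store.get_binary("SELECT 1")         # hit on the new binary
```

The last call shows the overwrite effect: once a forced compilation has run, later cached lookups return the newer binary.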
profile
string or null
Default: null
Enum:"dev""unoptimized""optimized"
Enumeration of the compilation profiles that can be passed to the Rust compiler
as an argument via cargo build --profile <>. Among other things, a compilation
profile affects compilation speed (how long until the program is ready to run)
and runtime speed (performance while running).
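To make the mapping from the profile setting to the compiler invocation concrete, the Python sketch below builds the argument list for cargo build --profile. The helper cargo_build_args is hypothetical, and treating null as "optimized" is an assumption based on the statement above that "optimized" is the default profile; the actual fallback is decided by the compiler service.

```python
ALLOWED_PROFILES = ("dev", "unoptimized", "optimized")


def cargo_build_args(profile=None, fallback="optimized"):
    """Build the cargo argument list for a given compilation profile."""
    # Assumption: a null profile falls back to "optimized".
    chosen = fallback if profile is None else profile
    if chosen not in ALLOWED_PROFILES:
        raise ValueError(f"unknown compilation profile: {chosen!r}")
    return ["cargo", "build", "--profile", chosen]


default_args = cargo_build_args()
dev_args = cargo_build_args("dev")
```

The argument list is only constructed here, not executed; running it would require a Cargo project that defines these profiles.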