Pipeline Metrics
This reference lists all of the metrics that Feldera exports through its `/metrics` endpoint in Prometheus exposition format. It is automatically generated from the documentation embedded in the Prometheus output.
All of the metrics exported by a particular Feldera pipeline are labeled with the pipeline's UUID as `pipeline`. Some metrics have additional labels, as documented below.
See Monitoring and Profiling for a guide to setting up Prometheus and Grafana with Feldera. The Feldera template dashboard is a sample Grafana dashboard for Feldera.
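As a rough illustration of consuming this output without a Prometheus server, the sketch below parses exposition-format text into `(name, labels, value)` tuples using only the Python standard library. The sample text, the UUID, and the values are made up for illustration, and `parse_samples` is a hypothetical helper, not part of Feldera. The parser handles only the simple sample lines shown here (no timestamps or exemplars).

```python
import re

# Hypothetical scrape output; the UUID and the values are invented.
SAMPLE = '''\
# HELP dbsp_steps_total Total number of DBSP steps executed.
# TYPE dbsp_steps_total counter
dbsp_steps_total{pipeline="0191d3a8-0000-7000-8000-000000000001"} 42
# TYPE input_connector_records_total counter
input_connector_records_total{pipeline="0191d3a8-0000-7000-8000-000000000001",endpoint="unnamed-1"} 1000
'''

# A metric name, an optional {label="value",...} block, then the sample value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no samples
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_samples(SAMPLE):
        print(name, labels.get("pipeline"), labels.get("endpoint"), value)
```

In practice the text would come from an HTTP GET of the pipeline's `/metrics` endpoint. The exact URL depends on the deployment, so it is not shown here.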
Process Metrics
These metrics report statistics for a running Feldera pipeline process. When a pipeline process is killed and restarts from a checkpoint, the new process's metrics are for it alone, not cumulative with any previous instantiations.
These metrics are intended to match the standard Prometheus definitions.
Name | Type | Description |
---|---|---|
process_cpu_seconds_total | counter | Total user and system CPU time spent in seconds. |
process_max_fds | gauge | Maximum number of open file descriptors. |
process_open_fds | gauge | Number of open file descriptors. |
process_resident_memory_bytes | gauge | Resident set size in bytes. |
process_start_time_seconds | counter | Start time of the process in seconds since the Unix epoch. |
process_threads | gauge | Number of OS threads in the process. |
process_virtual_memory_bytes | gauge | Virtual memory size in bytes. |
process_virtual_memory_max_bytes | gauge | Maximum amount of virtual memory available in bytes. |
Feldera metrics
These metrics report statistics for Feldera operations.
Name | Type | Description |
---|---|---|
feldera_checkpoint_latency_seconds | histogram | Latency of overall checkpoint operations in seconds. |
feldera_checkpoint_records_processed_total | counter | Total number of records that had been processed when the most recent checkpoint successfully committed. |
feldera_checkpoint_written_bytes | histogram | Amount of data written to storage during checkpoints, in bytes. |
DBSP metrics
These metrics report statistics for DBSP, the low-level mechanism on which Feldera is built.
Name | Type | Description |
---|---|---|
compaction_stall_duration_seconds | counter | Time in seconds a worker was stalled waiting for more merges to complete. |
dbsp_operator_checkpoint_latency_seconds | histogram | Latency of individual operator checkpoint operations in seconds. (Because checkpoints run in parallel across workers, these will not add to feldera_checkpoint_latency_seconds.) |
dbsp_runtime_elapsed_seconds | counter | Time elapsed while the pipeline is executing a step, multiplied by the number of foreground and background threads, in seconds. |
dbsp_step_latency_seconds | histogram | Latency of DBSP steps over the last 60 seconds or 1000 steps, whichever is less, in seconds. |
dbsp_steps_total | counter | Total number of DBSP steps executed. |
Record Processing
These metrics report overall counts of records as they pass through the pipeline. They accumulate across checkpoint and resume.
Name | Type | Description |
---|---|---|
output_buffered_batches | gauge | Number of batches of records currently buffered by the output connector. |
records_input_buffered | gauge | Total amount of data currently buffered by all endpoints, in records. |
records_input_buffered_bytes | gauge | Total amount of data currently buffered by all endpoints, in bytes. |
records_input_bytes_total | counter | Total amount of data received from all connectors, in bytes. |
records_input_total | counter | Total amount of data received from all connectors, in records. |
records_late_total | counter | Number of records dropped due to LATENESS annotations. |
records_processed_bytes_total | counter | Total amount of input processed by the pipeline, in bytes. |
records_processed_total | counter | Total amount of input processed by the pipeline, in records. |
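
Because these counters accumulate across checkpoint and resume, throughput can be estimated from two scrapes of a counter such as records_processed_total. The following is a minimal sketch: `rate_per_second` is a hypothetical helper, and the timestamps are assumed to be recorded by whatever client performs the scrapes.

```python
def rate_per_second(earlier, later):
    """Average per-second increase of a counter between two scrapes.

    Each argument is a (unix_timestamp_seconds, counter_value) pair.
    Returns None if the counter went backwards, in which case the two
    samples are not comparable.
    """
    (t0, v0), (t1, v1) = earlier, later
    if t1 <= t0:
        raise ValueError("the later scrape must have a larger timestamp")
    if v1 < v0:
        return None
    return (v1 - v0) / (t1 - t0)


if __name__ == "__main__":
    # Invented values: 1,000 records at t=0 s and 6,000 records at t=10 s.
    print(rate_per_second((0.0, 1000.0), (10.0, 6000.0)))  # 500.0 records/s
```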
Storage Performance
These metrics report the performance of storage, which allows Feldera to work with data larger than memory.
Name | Type | Description |
---|---|---|
files_created_total | counter | Total number of files created. |
files_deleted_total | counter | Total number of files deleted. |
storage_byte_seconds_total | counter | Storage usage integrated over time during this run of the pipeline, in bytes × seconds. |
storage_read_block_bytes | histogram | Sizes in bytes of blocks read from storage. |
storage_read_latency_seconds | histogram | Read latency for storage blocks in seconds. |
storage_sync_latency_seconds | histogram | Sync latency in seconds. |
storage_usage_bytes | gauge | The number of bytes of storage currently in use. |
storage_write_block_bytes | histogram | Sizes in bytes of blocks written to storage. |
storage_write_latency_seconds | histogram | Write latency for storage blocks in seconds. |
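
Since storage_byte_seconds_total integrates storage usage over time, dividing its increase between two scrapes by the elapsed time gives the time-averaged storage usage over that interval. The sketch below shows the arithmetic. `average_storage_bytes` is a hypothetical helper and the numbers are invented.

```python
def average_storage_bytes(byte_seconds_0, byte_seconds_1, t0, t1):
    """Time-averaged storage usage in bytes between two scrapes.

    byte_seconds_0 and byte_seconds_1 are values of
    storage_byte_seconds_total scraped at times t0 and t1 (in seconds).
    Returns None if the counter decreased, which suggests the two
    scrapes belong to different runs of the pipeline.
    """
    if t1 <= t0:
        raise ValueError("t1 must be later than t0")
    if byte_seconds_1 < byte_seconds_0:
        return None
    return (byte_seconds_1 - byte_seconds_0) / (t1 - t0)


if __name__ == "__main__":
    # 6e9 byte-seconds accumulated over 60 seconds averages to 1e8 bytes.
    print(average_storage_bytes(3.0e9, 9.0e9, 100.0, 160.0))
```

The instantaneous value is available directly from the storage_usage_bytes gauge. The averaged figure is mainly useful when scrapes are infrequent.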
Pipeline Status
These metrics report the status of the pipeline.
Name | Type | Description |
---|---|---|
pipeline_complete | counter | Transitions from 0 to 1 when the pipeline completes. |
pipeline_start_time_seconds | counter | Start time of the pipeline in seconds since the Unix epoch. This will be earlier than process_start_time_seconds if the pipeline resumed from a checkpoint. This will be zero if the pipeline resumed from a checkpoint produced by a pipeline too old to record its start time. |
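
The relationship between pipeline_start_time_seconds and process_start_time_seconds can be turned into a small check for whether the current process resumed from a checkpoint. This is only a sketch: `classify_start` is a hypothetical helper, and it assumes that a pipeline start time not earlier than the process start time means a fresh start, which the description above implies but does not state outright.

```python
def classify_start(pipeline_start, process_start):
    """Classify a pipeline run from its two start-time metrics.

    Both arguments are seconds since the Unix epoch, as reported by
    pipeline_start_time_seconds and process_start_time_seconds.
    """
    if pipeline_start == 0:
        # Resumed from a checkpoint too old to record the start time.
        return "resumed, original start time unknown"
    if pipeline_start < process_start:
        return "resumed from checkpoint"
    # Assumption: anything else is a pipeline that started fresh.
    return "fresh start"


if __name__ == "__main__":
    print(classify_start(1_700_000_000, 1_700_003_600))  # resumed from checkpoint
    print(classify_start(0, 1_700_003_600))  # resumed, original start time unknown
```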
Input Connectors
These metrics are reported per input connector, labeled with `endpoint` set to the name of the input connector, which is either the name assigned in the SQL program or automatically generated as `unnamed-<number>`, where `<number>` counts starting from 1 for the first connector for a given table.
These metrics accumulate across checkpoint and resume.
Byte counters are approximate in two cases. First, for some input connectors, such as those that read columnar formats, bytes are difficult to attribute accurately to records, so Feldera approximates. Second, Feldera approximates the attribution of bytes to records when it processes only some of the records in a batch in a DBSP step. This second approximation is corrected when the remainder of the batch is processed in a subsequent step, so it is invisible to users unless a pause or checkpoint happens mid-batch.
Name | Type | Description |
---|---|---|
input_connector_buffered_records | gauge | Amount of data currently buffered by an input connector, in records. |
input_connector_buffered_records_bytes | gauge | Amount of data currently buffered by an input connector, in bytes. |
input_connector_bytes_total | counter | Total number of bytes received by an input connector. |
input_connector_errors_parse_total | counter | Total number of errors encountered parsing records received by the input connector. |
input_connector_errors_transport_total | counter | Total number of errors encountered by the input connector at the transport layer. |
input_connector_records_total | counter | Total number of records received by an input connector. |
input_connector_processing_latency_seconds | histogram | Time elapsed (seconds) between when the connector receives new data and when the pipeline processes this data and computes output updates. The histogram is maintained over at most the last 600 seconds or at most 10,000 samples. This latency includes: (1) parsing the input data, (2) buffering delay while data waits in the connector queue, and (3) processing time in the query engine to evaluate changes and compute outputs. |
input_connector_completion_latency_seconds | histogram | Time elapsed (seconds) between when the connector receives new data and when the pipeline processes this data, computes output updates, and sends these updates to all output connectors. The histogram is maintained over at most the last 600 seconds or at most 10,000 samples. This latency includes input_connector_processing_latency_seconds plus the time required for all output connectors to write output updates to their data sinks. |
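
The latency histograms above can be summarized as quantiles. The sketch below estimates a quantile from cumulative bucket counts by linear interpolation within a bucket, similar in spirit to PromQL's `histogram_quantile`. It assumes the histogram is exposed as standard cumulative buckets (upper bound plus cumulative count, ending with a `+Inf` bucket). The bucket boundaries and counts are invented, and `estimate_quantile` is a hypothetical helper.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, with the last upper bound being +Inf.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        raise ValueError("at least one bucket is required")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies in the +Inf bucket; the best available
                # answer is the highest finite upper bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Interpolate linearly between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Invented buckets: 50 samples <= 0.1 s, 90 <= 0.5 s, 100 <= 1.0 s.
    buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
    print(estimate_quantile(0.5, buckets))
    print(estimate_quantile(0.95, buckets))
```

With these invented buckets, the median estimate falls at the upper bound of the first bucket and the 95th percentile estimate falls about halfway through the 0.5 to 1.0 second bucket. The accuracy of any such estimate is limited by the bucket widths.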
Output Connectors
These metrics are reported per output connector, labeled with `endpoint` set to the name of the output connector, which is either the name assigned in the SQL program or automatically generated as `unnamed-<number>`, where `<number>` counts starting from 1 for the first connector for a given view.
These metrics accumulate across checkpoint and resume.
Name | Type | Description |
---|---|---|
output_connector_buffered_records | gauge | Number of records currently buffered by the output connector. |
output_connector_bytes_total | counter | Total number of bytes of records sent by the output connector. |
output_connector_errors_encode_total | counter | Total number of errors encountered encoding records to send. |
output_connector_errors_transport_total | counter | Total number of errors encountered at the transport layer sending records. |
output_connector_records_total | counter | Total number of records sent by the output connector. |
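
Since every output connector metric carries an `endpoint` label, error counters can be combined per connector. The sketch below sums the encode and transport error counters for each endpoint. It assumes samples have already been parsed into `(name, labels, value)` tuples. `errors_by_endpoint` is a hypothetical helper, and the endpoint names and values are invented.

```python
from collections import defaultdict

ERROR_METRICS = {
    "output_connector_errors_encode_total",
    "output_connector_errors_transport_total",
}


def errors_by_endpoint(samples):
    """Sum encode and transport errors per output connector endpoint.

    samples is an iterable of (metric_name, labels_dict, value) tuples.
    """
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name in ERROR_METRICS:
            totals[labels.get("endpoint", "")] += value
    return dict(totals)


if __name__ == "__main__":
    samples = [
        ("output_connector_errors_encode_total", {"endpoint": "unnamed-1"}, 2.0),
        ("output_connector_errors_transport_total", {"endpoint": "unnamed-1"}, 3.0),
        ("output_connector_errors_transport_total", {"endpoint": "kafka_out"}, 1.0),
        ("output_connector_records_total", {"endpoint": "unnamed-1"}, 500.0),
    ]
    print(errors_by_endpoint(samples))
```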