Troubleshooting
This guide covers issues Feldera Enterprise users and operators might run into in production, and steps to remedy them.
Common Error Messages
Delta Lake Connection Errors
Error: Table metadata is invalid: Number of checkpoint files '0' is not equal to number of checkpoint metadata parts 'None'
Solution: This usually happens when the Delta table uses features that delta-rs does not support, such as liquid clustering or deletion vectors. Check the table properties and set the checkpoint policy to "classic":
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.checkpointPolicy' = 'classic'
);
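To make the "check the table properties" step concrete, here is a minimal sketch that scans a dictionary of table properties (for example, copied from the output of SHOW TBLPROPERTIES) for settings commonly associated with this error. The property names and values are assumptions based on Delta Lake's documented table properties, and the helper name find_risky_properties is hypothetical. It does not detect liquid clustering, which is not exposed as a simple property.

```python
# Property names and values are assumptions taken from Delta Lake's
# documented table properties; verify them against your Delta version.
RISKY_PROPERTIES = {
    "checkpointPolicy": "v2",
    "enableDeletionVectors": "true",
}

PREFIX = "delta."


def find_risky_properties(properties: dict) -> list:
    """Return the keys whose value matches a setting listed in RISKY_PROPERTIES."""
    flagged = []
    for key, value in properties.items():
        # Accept keys with or without the "delta." prefix.
        short_key = key[len(PREFIX):] if key.startswith(PREFIX) else key
        risky_value = RISKY_PROPERTIES.get(short_key)
        if risky_value is not None and str(value).lower() == risky_value:
            flagged.append(key)
    return flagged
```

An empty result does not prove the table is readable by delta-rs; it only means none of the listed properties is set to a flagged value.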
Out-of-Memory Errors
Error: The pipeline container has restarted. This was likely caused by an Out-Of-Memory (OOM) crash.
Feldera runs each pipeline in a separate container with configurable memory limits. Here are some knobs to control memory usage:
- Adjust the pipeline’s memory reservation and limit:
  "resources": {
    "memory_mb_min": 32000,
    "memory_mb_max": 32000
  }
- Limit the number of records buffered by the connector using the max_queued_records setting:
  "max_queued_records": 100000
- Ensure that storage is enabled (it's on by default):
  "storage": {
    "backend": {
      "name": "default"
    },
    "min_storage_bytes": null,
    "compression": "default",
    "cache_mib": null
  }
- Optimize your SQL queries to avoid expensive cross-products. Use functions like NOW() sparingly on large relations.
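As a rough illustration of how these settings relate, the sketch below builds the "resources" fragment with an equal reservation and limit, and estimates how much memory one connector's input queue could hold. The function names and the avg_record_bytes parameter are hypothetical; the real in-memory footprint of a record depends on its schema and encoding, so treat the estimate as an order-of-magnitude guide only.

```python
import json


def memory_settings(memory_mb: int) -> dict:
    """Build a "resources" fragment with an equal memory reservation and limit."""
    if memory_mb <= 0:
        raise ValueError("memory_mb must be positive")
    return {"resources": {"memory_mb_min": memory_mb, "memory_mb_max": memory_mb}}


def estimated_queue_mib(max_queued_records: int, avg_record_bytes: int) -> float:
    """Rough estimate, in MiB, of the memory held by one connector's queue.

    avg_record_bytes is an assumed average record size, not a measured value.
    """
    return max_queued_records * avg_record_bytes / (1024 * 1024)


# Print the fragment for a 32000 MB reservation and limit.
print(json.dumps(memory_settings(32000), indent=2))
# Estimate the queue size for 100000 records of about 1 KiB each.
print(estimated_queue_mib(100_000, 1024))
```

With max_queued_records set to 100000 and records of about 1 KiB, a single connector's queue comes to roughly 98 MiB under these assumptions. A pipeline with many connectors or much larger records may need a lower setting.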
Kubernetes evictions
Error: the pipeline becomes UNAVAILABLE with no errors in the logs.
Solution: configure resource reservations and limits for the Pipeline.
Kubernetes may evict Pipeline pods under node resource pressure. To confirm, run:
kubectl describe pod pipeline-<pipeline-id>-0
and look for
Status: Failed
Reason: Evicted
You can also view the eviction event in your cluster monitoring stack (e.g. Datadog).
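The same check can be scripted. The sketch below inspects a pod object in the JSON form that kubectl get pod <name> -o json produces, and reports whether it was evicted. It relies on the standard Kubernetes pod status fields phase and reason; the function names are hypothetical.

```python
import json


def is_evicted(pod: dict) -> bool:
    """Return True if the pod status shows phase "Failed" with reason "Evicted"."""
    status = pod.get("status", {})
    return status.get("phase") == "Failed" and status.get("reason") == "Evicted"


def is_evicted_json(text: str) -> bool:
    """Same check, starting from the raw JSON text printed by kubectl."""
    return is_evicted(json.loads(text))
```

For example, save the output of kubectl get pod pipeline-<pipeline-id>-0 -o json to a file and pass its contents to is_evicted_json. Kubernetes may garbage-collect evicted pods over time, so a negative result long after the incident is not conclusive.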
Evictions typically happen only when running Feldera in shared Kubernetes clusters. The pods to evict are determined by Kubernetes Quality-of-Service classes.
By default, Feldera Pipelines do not reserve any CPU or memory resources, which puts them in the BestEffort QoS class and makes them the first candidates for eviction. To raise their priority:
- Burstable class: reserve a minimum amount of memory and CPU:
  "resources": {
    "cpu_cores_min": 16,
    "memory_mb_min": 32000
  }
- Guaranteed class: set minimum and maximum resources to the same value, for both memory and CPU:
  "resources": {
    "cpu_cores_min": 16,
    "cpu_cores_max": 16,
    "memory_mb_min": 32000,
    "memory_mb_max": 32000
  }
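The mapping from a "resources" block to a QoS class can be sketched as a small function. This is a simplified model, assuming that the *_min settings become Kubernetes resource requests and the *_max settings become limits on a single-container pod; how Feldera actually builds the pod spec is not confirmed here. The function name qos_class is hypothetical.

```python
def qos_class(resources: dict) -> str:
    """Classify a "resources" block using the Kubernetes QoS rules.

    Assumes *_min maps to requests and *_max maps to limits.
    """
    cpu_req = resources.get("cpu_cores_min")
    cpu_lim = resources.get("cpu_cores_max")
    mem_req = resources.get("memory_mb_min")
    mem_lim = resources.get("memory_mb_max")

    # No requests and no limits at all: BestEffort.
    if all(v is None for v in (cpu_req, cpu_lim, mem_req, mem_lim)):
        return "BestEffort"

    # Kubernetes defaults a missing request to the limit when a limit is set.
    if cpu_req is None:
        cpu_req = cpu_lim
    if mem_req is None:
        mem_req = mem_lim

    # Requests equal to limits for both CPU and memory: Guaranteed.
    if (cpu_lim is not None and mem_lim is not None
            and cpu_req == cpu_lim and mem_req == mem_lim):
        return "Guaranteed"

    # Anything else that reserves or limits a resource: Burstable.
    return "Burstable"
```

Applied to the two configurations above, the function returns "Burstable" for the first and "Guaranteed" for the second, while an empty block yields "BestEffort". To confirm the class Kubernetes actually assigned, check the QoS Class field in the kubectl describe pod output.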
Rust Compilation Errors
Error: No space left on device during Rust compilation
Solution: Ensure the compiler-server has sufficient disk space (20 GiB by default, configured via the compilerPvcStorageSize value in the Helm chart).
Diagnosing Performance Issues
When investigating pipeline performance, Feldera support will typically request:
- Performance Tab: screenshots of the Performance tab in the UI to see memory usage, record counts, and processing times
- Pipeline Logs Tab: for warnings and errors
- Circuit Profile: from the circuit profile API
- Heap Profile: from the heap usage API