# Troubleshooting
This guide covers issues Feldera Enterprise users and operators might run into in production, and steps to remedy them.
## Diagnosing Performance Issues
When investigating pipeline performance, Feldera support will typically request a support bundle. The bundle can be downloaded from your installation with one of the following methods (a scripted REST example follows the list):

- The `fda` CLI's `support-bundle` command:

  ```
  fda support-bundle affected-pipeline-name
  ```

- The `support_bundle` function in the Python SDK.
- The web console, which has a button to download the bundle for a pipeline.
- The `support_bundle` endpoint in the REST API.
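If you prefer to script the download against the REST API, a minimal `curl` sketch is shown below. The exact path is an assumption based on the endpoint name above — consult the REST API reference for your Feldera version:

```bash
# Hypothetical endpoint path; verify against your API reference.
# $FELDERA_HOST is your Feldera API base URL, e.g. http://localhost:8080.
curl -fsSL "$FELDERA_HOST/v0/pipelines/affected-pipeline-name/support_bundle" \
  -o support-bundle.zip
```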
The support bundle has the following content:

- **Pipeline Logs**: warnings and errors from the logs endpoint.
- **Pipeline Configuration**: the pipeline configuration, including the SQL code and connector settings.
- **Pipeline Metrics**: from the pipeline metrics endpoint.
- **Endpoint Stats**: from the stats endpoint.
- **Circuit Profile**: from the circuit profile endpoint.
- **Heap Profile**: from the heap usage endpoint.
## Common Error Messages
### Delta Lake Connection Errors
**Error:** `Table metadata is invalid: Number of checkpoint files '0' is not equal to number of checkpoint metadata parts 'None'`

**Solution:** This usually happens when the Delta table uses features that are unsupported by delta-rs, such as liquid clustering or deletion vectors. Check the table properties and set the checkpoint policy to `classic`:
```sql
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.checkpointPolicy' = 'classic'
);
```
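To check which features and policies are currently set on the table, you can list its properties from the engine that manages the table (e.g., a Spark or Databricks session); `SHOW TBLPROPERTIES` is standard Spark SQL:

```sql
-- Look for 'delta.checkpointPolicy' and feature flags such as
-- 'delta.enableDeletionVectors' in the output.
SHOW TBLPROPERTIES my_table;
```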
### Out-of-Memory Errors
**Error:** `The pipeline container has restarted. This was likely caused by an Out-Of-Memory (OOM) crash.`

Feldera runs each pipeline in a separate container with configurable memory limits. Here are some knobs to control memory usage (a combined configuration sketch follows the list):
- Adjust the pipeline's memory reservation and limit:

  ```json
  "resources": {
    "memory_mb_min": 32000,
    "memory_mb_max": 32000
  }
  ```

- Throttle the number of records buffered by a connector using the `max_queued_records` setting:

  ```json
  "max_queued_records": 100000
  ```

- Ensure that storage is enabled (it's on by default):

  ```json
  "storage": {
    "backend": {
      "name": "default"
    },
    "min_storage_bytes": null,
    "compression": "default",
    "cache_mib": null
  }
  ```

- Optimize your SQL queries to avoid expensive cross products. Use functions like `NOW()` sparingly on large relations.
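Putting the runtime-level knobs together, a pipeline's resource and storage configuration might look like the sketch below. The individual fields are taken from the snippets above; the surrounding structure is an assumption, so check the pipeline configuration reference for your version. Note that `max_queued_records` is a per-connector setting and is configured on the connector, not here:

```json
{
  "resources": {
    "memory_mb_min": 32000,
    "memory_mb_max": 32000
  },
  "storage": {
    "backend": { "name": "default" },
    "min_storage_bytes": null,
    "compression": "default",
    "cache_mib": null
  }
}
```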
### Out-of-Storage Errors
**Error:** The pipeline logs contain messages like:

```
DBSP error: runtime error: One or more worker threads terminated unexpectedly
worker thread 0 panicked
panic message: called `Result::unwrap()` on an `Err` value: StdIo(StorageFull)
```
**Solution:** Increase the pipeline's storage capacity.

In the Enterprise edition, Feldera runs each pipeline in a separate pod and, by default, attaches PVC volumes for storage. The default volume size is 30 GB; if your pipelines are encountering `StorageFull` errors, you should explicitly request larger volumes for each pipeline:

```json
"resources": {
  "storage_mb_max": 128000
}
```
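Before resizing, it can help to see how full the current volume actually is. A minimal sketch using standard `kubectl` commands (the namespace and pod name are examples — substitute your own):

```bash
# List pipeline PVCs and their requested sizes.
kubectl get pvc -n feldera

# Check filesystem usage inside the pipeline pod.
kubectl exec -n feldera pipeline-<pipeline-id>-0 -- df -h
```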
### Kubernetes Evictions
**Error:** The pipeline becomes `UNAVAILABLE` with no errors in the logs.

**Solution:** Configure resource reservations and limits for the pipeline.

Kubernetes may evict pipeline pods under node resource pressure. To confirm, run:

```bash
kubectl describe pod pipeline-<pipeline-id>-0
```

and look for:

```
Status:  Failed
Reason:  Evicted
```

You can also view the eviction event in your cluster monitoring stack (e.g., Datadog).
Evictions typically happen only when running Feldera in shared Kubernetes clusters. The pods to evict are chosen based on Kubernetes Quality-of-Service (QoS) classes. By default, Feldera pipelines do not reserve any CPU or memory resources, which puts them in the `BestEffort` QoS class, making them prime eviction candidates. To raise their priority (you can verify the resulting class as shown after the list):
- `Burstable` class: reserve a minimum amount of memory and CPU:

  ```json
  "resources": {
    "cpu_cores_min": 16,
    "memory_mb_min": 32000
  }
  ```

- `Guaranteed` class: set minimum and maximum resources to the same values for both memory and CPU:

  ```json
  "resources": {
    "cpu_cores_min": 16,
    "cpu_cores_max": 16,
    "memory_mb_min": 32000,
    "memory_mb_max": 32000
  }
  ```
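After updating the resources, you can confirm which QoS class Kubernetes assigned by reading it off the pod status (namespace and pod name are examples):

```bash
# Prints BestEffort, Burstable, or Guaranteed.
kubectl get pod -n feldera pipeline-<pipeline-id>-0 \
  -o jsonpath='{.status.qosClass}'
```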
### Rust Compilation Errors
**Error:** `No space left on device` during Rust compilation.

**Solution:** Ensure the compiler server has sufficient disk space (20 GiB by default, configured via the `compilerPvcStorageSize` value in the Helm chart).
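For example, to grow the compiler volume to 50 GiB you might set the value during a Helm upgrade. The release name and chart reference below are placeholders for your installation, and note that enlarging an existing PVC requires a StorageClass that supports volume expansion:

```bash
# Placeholder release name and chart reference; adjust to your installation.
helm upgrade feldera feldera/feldera \
  -n feldera \
  --reuse-values \
  --set compilerPvcStorageSize=50Gi
```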
## Uncommon Problems
### Lost or Accidentally Deleted the Feldera Control-Plane PostgreSQL Database
Feldera tracks pipeline state inside a PostgreSQL database. If the state in this database is ever lost or otherwise cannot be recovered, and the Feldera instance had running pipelines at the time, manual intervention may be necessary to clean up the leftover pods. It is not possible to reinstantiate these leftover (orphaned) pipelines, so the Kubernetes objects backing them should be removed manually.
- Identify any stale pipelines in the Feldera namespace (e.g., by running `kubectl get pods -n $NS`).
- For each stale pipeline, delete the Kubernetes objects that Feldera created for it: StatefulSet, Service, ConfigMap, Pod, and PVC.

Here is an example script that cleans up a stale pipeline. Note: it is generally not enough to delete just the pod, since the existing StatefulSet will re-create it.
```bash
NAMESPACE=feldera-ns
POD=pipeline-019a7c1d-6a0c-7923-afd7-0125fe589356-0
NAME=pipeline-019a7c1d-6a0c-7923-afd7-0125fe589356
PVC=pipeline-019a7c1d-6a0c-7923-afd7-0125fe589356-storage-pipeline-019a7c1d-6a0c-7923-afd7-0125fe589356-0

# Ensure you have the permissions to perform the delete operations
kubectl auth can-i delete sts -n $NAMESPACE
kubectl auth can-i delete service -n $NAMESPACE
kubectl auth can-i delete configmap -n $NAMESPACE
kubectl auth can-i delete pod -n $NAMESPACE
kubectl auth can-i delete pvc -n $NAMESPACE

# Delete the k8s objects manually
kubectl delete sts -n $NAMESPACE $NAME
kubectl delete service -n $NAMESPACE $NAME
kubectl delete configmap -n $NAMESPACE $NAME
kubectl delete pod -n $NAMESPACE $POD
kubectl delete pvc -n $NAMESPACE $PVC
```
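Once the script has run, you can confirm that no objects referencing the pipeline remain (a simple grep-based check; adapt it to your naming):

```bash
# Should print nothing once the cleanup has succeeded.
kubectl get sts,svc,configmap,pod,pvc -n $NAMESPACE | grep $NAME
```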