Troubleshooting

This guide covers issues Feldera Enterprise users and operators might run into in production, and steps to remedy them.

Diagnosing Performance Issues

When investigating pipeline performance, Feldera support will typically request a support bundle. You can generate one for the affected pipeline with the fda CLI:

fda support-bundle affected-pipeline-name

The support bundle contains the following:

  1. Pipeline Logs: warnings and errors from the logs endpoint.

  2. Pipeline Configuration: the SQL code and connector settings.

  3. Pipeline Metrics: from the pipeline metrics endpoint.

  4. Endpoint Stats: from the stats endpoint.

  5. Circuit Profile: from the circuit profile endpoint.

  6. Heap Profile: from the heap usage endpoint.

Common Error Messages

Delta Lake Connection Errors

Error: Table metadata is invalid: Number of checkpoint files '0' is not equal to number of checkpoint metadata parts 'None'

Solution: This usually happens when the Delta table uses features unsupported by delta-rs, such as liquid clustering or deletion vectors. Check the table properties and set the checkpoint policy to "classic":

ALTER TABLE my_table SET TBLPROPERTIES (
  'checkpointPolicy' = 'classic'
);
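Before altering the table, it can help to confirm which features it has enabled. Assuming you manage the table from Spark SQL (other Delta clients have equivalents; the table name is illustrative), its properties can be listed with:

```sql
-- List the table's Delta properties
SHOW TBLPROPERTIES my_table;
```

Look for properties such as delta.enableDeletionVectors, which indicate table features that delta-rs may not support.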

Out-of-Memory Errors

Error: The pipeline container has restarted. This was likely caused by an Out-Of-Memory (OOM) crash.

Feldera runs each pipeline in a separate container with configurable memory limits. Here are some knobs to control memory usage:

  1. Adjust the pipeline’s memory reservation and limit:

    "resources": {
      "memory_mb_min": 32000,
      "memory_mb_max": 32000
    }
  2. Throttle the number of records buffered by the connector using the max_queued_records setting:

    "max_queued_records": 100000
  3. Ensure that storage is enabled (it's on by default):

    "storage": {
      "backend": {
        "name": "default"
      },
      "min_storage_bytes": null,
      "compression": "default",
      "cache_mib": null
    }
  4. Optimize your SQL queries to avoid expensive cross-products. Use functions like NOW() sparingly on large relations.
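On Kubernetes (Enterprise), an OOM kill can also be confirmed from the pipeline pod's last container state. The snippet below is a sketch that runs jq over a hand-written minimal pod status; against a live cluster, feed the output of `kubectl get pod pipeline-<pipeline-id>-0 -o json` into the same filter instead:

```shell
# Hand-written minimal pod status standing in for real cluster output.
cat > pod-status.json <<'EOF'
{
  "status": {
    "containerStatuses": [
      { "lastState": { "terminated": { "reason": "OOMKilled", "exitCode": 137 } } }
    ]
  }
}
EOF

# Print the reason for the last container termination; "OOMKilled"
# confirms the kernel OOM killer ended the pipeline process.
jq -r '.status.containerStatuses[0].lastState.terminated.reason' pod-status.json
```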

Out-of-storage Errors

Error: The pipeline logs contain messages like:

DBSP error: runtime error: One or more worker threads terminated unexpectedly
worker thread 0 panicked
panic message: called `Result::unwrap()` on an `Err` value: StdIo(StorageFull)
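To confirm this failure mode quickly, search the pipeline logs (from the support bundle or your log aggregator) for the storage-full panic. The sketch below uses the sample lines above as a stand-in log file:

```shell
# Sample log lines (copied from the error above) stand in for a real log file.
cat > pipeline.log <<'EOF'
DBSP error: runtime error: One or more worker threads terminated unexpectedly
worker thread 0 panicked
panic message: called `Result::unwrap()` on an `Err` value: StdIo(StorageFull)
EOF

# Count storage-full panics in the log.
grep -c 'StorageFull' pipeline.log
# → 1
```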

Solution: Increase pipeline storage capacity

In the Enterprise edition, Feldera runs each pipeline in a separate pod and, by default, attaches PVC volumes for storage. The default volume size is 30 GB, and if your pipelines are encountering StorageFull errors, you should explicitly request larger volumes for each pipeline:

"resources": {
  "storage_mb_max": 128000
}

Kubernetes evictions

Error: the pipeline becomes UNAVAILABLE with no errors in the logs.

Solution: configure resource reservations and limits for the Pipeline.

Kubernetes may evict Pipeline pods under node resource pressure. To confirm, run:

kubectl describe pod pipeline-<pipeline-id>-0

and look for

Status: Failed
Reason: Evicted

You can also view the eviction event in your cluster monitoring stack (e.g. Datadog).

Evictions typically happen only when running Feldera in shared Kubernetes clusters. The pods to evict are determined by Kubernetes Quality-of-Service classes.

By default, Feldera pipelines do not reserve any CPU or memory resources, which places them in the BestEffort QoS class and makes them the first candidates for eviction. To raise their priority:

  1. Burstable class: reserve a minimum amount of memory and CPU:

    "resources": {
      "cpu_cores_min": 16,
      "memory_mb_min": 32000
    }
  2. Guaranteed class: set minimum and maximum resources to the same value, for memory and CPU:

    "resources": {
      "cpu_cores_min": 16,
      "cpu_cores_max": 16,
      "memory_mb_min": 32000,
      "memory_mb_max": 32000
    }
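After applying either setting and restarting the pipeline, you can verify which QoS class Kubernetes assigned. The sketch below runs jq over a hand-written minimal pod status; with a live cluster, pipe `kubectl get pod pipeline-<pipeline-id>-0 -o json` into the same filter:

```shell
# Hand-written minimal pod status; real input comes from `kubectl get pod -o json`.
cat > qos-check.json <<'EOF'
{ "status": { "qosClass": "Guaranteed" } }
EOF

# BestEffort pods are evicted first; expect "Burstable" or "Guaranteed" here.
jq -r '.status.qosClass' qos-check.json
```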

Rust Compilation Errors

Error: No space left on device during Rust compilation

Solution: Ensure the compiler-server has sufficient disk space (20 GiB by default, configured via the compilerPvcStorageSize value in the Helm chart).

Uncommon Problems

Lost or accidentally deleted the Feldera Control-Plane PostgreSQL database

Feldera tracks pipeline state in a PostgreSQL database. If this state is ever lost or otherwise cannot be recovered while the Feldera instance had running pipelines, manual intervention may be necessary to clean up the leftover pods. These orphaned pipelines cannot be reinstantiated, so the Kubernetes objects backing them should be removed manually.

  1. Identify any stale pipelines in the Feldera namespace (e.g., by running kubectl get pods -n $NS).
  2. For each stale pipeline, delete the Kubernetes objects that Feldera created for it: StatefulSet, Service, ConfigMap, Pod, and PVC.

Here is an example script that cleans up a stale pipeline. Note: deleting just the pod is generally not enough, since the existing StatefulSet will re-create it.

NAMESPACE=feldera-ns
POD=pipeline-019a7c1d-6a0c-7923-afd7-0125fe589356-0
NAME=pipeline-019a7c1d-6a0c-7923-afd7-0125fe589356
PVC=pipeline-019a7c1d-6a0c-7923-afd7-0125fe589356-storage-pipeline-019a7c1d-6a0c-7923-afd7-0125fe589356-0

# Ensure you have the permissions to perform the delete operations
kubectl auth can-i delete sts -n $NAMESPACE
kubectl auth can-i delete service -n $NAMESPACE
kubectl auth can-i delete configmap -n $NAMESPACE
kubectl auth can-i delete pod -n $NAMESPACE
kubectl auth can-i delete pvc -n $NAMESPACE

# Delete the k8s objects manually
kubectl delete sts -n $NAMESPACE $NAME
kubectl delete service -n $NAMESPACE $NAME
kubectl delete configmap -n $NAMESPACE $NAME
kubectl delete pod -n $NAMESPACE $POD
kubectl delete pvc -n $NAMESPACE $PVC
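The POD and PVC values above follow a fixed naming pattern, so they can be derived from the pipeline's StatefulSet name rather than typed by hand. A small sketch, based on the names shown in the example above:

```shell
# Derive the pod and PVC names from the pipeline's StatefulSet name,
# following the naming pattern visible in the example above.
NAME=pipeline-019a7c1d-6a0c-7923-afd7-0125fe589356
POD="${NAME}-0"
PVC="${NAME}-storage-${NAME}-0"
echo "$POD"   # pipeline-019a7c1d-6a0c-7923-afd7-0125fe589356-0
echo "$PVC"
```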