S3 input connector
This page describes configuration options specific to the S3 input connector. See the top-level connector documentation for general information about configuring input and output connectors.
The S3 input connector loads data from an S3 bucket into a Feldera table.
It can be configured to load a single object or multiple objects selected by a
common S3 prefix. By setting the `endpoint_url` property, you can also read from
non-AWS services that offer S3-compatible APIs.
When accessing an S3 bucket that stores data in the Delta Lake or Iceberg format, consider using the Delta Lake connector or the Iceberg connector instead.
The S3 input connector supports fault tolerance.
Configuration options
Property | Type | Default | Description |
---|---|---|---|
`aws_access_key_id` | string | | AWS Access Key ID. This property must be specified unless `no_sign_request` is set to `true`. |
`aws_secret_access_key` | string | | AWS Secret Access Key. This property must be specified unless `no_sign_request` is set to `true`. |
`no_sign_request` | bool | `false` | Do not sign requests. This is equivalent to the `--no-sign-request` flag in the AWS CLI. |
`key` | string | | Read a single object specified by a key. Either this property or the `prefix` property must be set. |
`prefix` | string | | Read all objects whose keys match a prefix. Set to an empty string to read all objects in the bucket. Either this property or the `key` property must be set. |
`region`* | string | | AWS region. |
`bucket_name`* | string | | S3 bucket name. |
`endpoint_url` | string | | The endpoint URL used to communicate with this service. Explicitly set it to connect to non-AWS services. For example, use `https://storage.googleapis.com` to interact with Google Cloud Storage. |
`max_concurrent_fetches` | integer | `8` | Number of S3 objects fetched in parallel. Increasing this value can improve throughput through greater concurrency, but higher concurrency may lead to timeouts or increased memory usage due to in-memory buffering. Recommended range: 1–10. |
*Fields marked with an asterisk are required.
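The examples later on this page all read a single object via `key`. For illustration, here is a sketch of a connector that instead ingests every object under a common prefix; the bucket and prefix names are hypothetical, and `max_concurrent_fetches` is raised above its default to speed up ingestion of many objects:

```sql
CREATE TABLE vendor (
  id BIGINT NOT NULL PRIMARY KEY,
  name VARCHAR,
  address VARCHAR
) WITH ('connectors' = '[{
  "transport": {
    "name": "s3_input",
    "config": {
      "prefix": "vendors/2024/",
      "no_sign_request": true,
      "bucket_name": "example-bucket",
      "region": "us-west-1",
      "max_concurrent_fetches": 10
    }
  },
  "format": { "name": "json" }
}]');
```

Setting `prefix` to an empty string would ingest every object in the bucket.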
Support for AWS IAM roles for service accounts (IRSA)
To use AWS IAM roles for service accounts (IRSA), omit the `aws_access_key_id`
and `aws_secret_access_key` fields. Feldera will pick up the credentials from
the environment variables.
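As a sketch, an IRSA-based configuration is simply an access-key configuration with both credential fields left out (the bucket name here is hypothetical):

```sql
CREATE TABLE vendor (
  id BIGINT NOT NULL PRIMARY KEY,
  name VARCHAR,
  address VARCHAR
) WITH ('connectors' = '[{
  "transport": {
    "name": "s3_input",
    "config": {
      "key": "vendor.json",
      "bucket_name": "example-bucket",
      "region": "us-west-1"
    }
  },
  "format": { "name": "json" }
}]');
```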
Examples
Populate a table from a JSON file in a public S3 bucket:
```sql
CREATE TABLE vendor (
  id BIGINT NOT NULL PRIMARY KEY,
  name VARCHAR,
  address VARCHAR
) WITH ('connectors' = '[{
  "transport": {
    "name": "s3_input",
    "config": {
      "key": "vendor.json",
      "no_sign_request": true,
      "bucket_name": "feldera-basics-tutorial",
      "region": "us-west-1"
    }
  },
  "format": { "name": "json" }
}]');
```
Populate a table from a JSON file, using access key-based authentication:
```sql
CREATE TABLE vendor (
  id BIGINT NOT NULL PRIMARY KEY,
  name VARCHAR,
  address VARCHAR
) WITH ('connectors' = '[{
  "transport": {
    "name": "s3_input",
    "config": {
      "key": "vendor.json",
      "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
      "aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
      "bucket_name": "feldera-basics-tutorial",
      "region": "us-west-1"
    }
  },
  "format": { "name": "json" }
}]');
```
To connect to Google Cloud Storage, explicitly set the `endpoint_url`. You also
need to configure an HMAC key to obtain an Access ID and Secret, and grant the
corresponding Principal access to the bucket.
```sql
CREATE TABLE vendor (
  id BIGINT NOT NULL PRIMARY KEY,
  name VARCHAR,
  address VARCHAR
) WITH ('connectors' = '[{
  "transport": {
    "name": "s3_input",
    "config": {
      "key": "vendor.json",
      "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
      "aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
      "bucket_name": "my-bucket",
      "region": "us-west1",
      "endpoint_url": "https://storage.googleapis.com"
    }
  },
  "format": { "name": "json" }
}]');
```
Refer to the secret references guide to externalize AWS access keys via Kubernetes.