S3 input connector

note

This page describes configuration options specific to the S3 input connector. See top-level connector documentation for general information about configuring input and output connectors.

The S3 input connector is used to load data from an S3 bucket to a Feldera table. It can be configured to load a single object or multiple objects selected based on a common S3 prefix. By setting the endpoint_url, you can also read from non-AWS services that offer S3 compatible APIs.

tip

When accessing an S3 bucket that stores data in the Delta Lake or Iceberg format, consider using the Delta Lake connector or the Iceberg connector instead.

The S3 input connector supports fault tolerance.

Configuration options

| Property | Type | Default | Description |
|----------|------|---------|-------------|
| `aws_access_key_id` | string | | AWS Access Key ID. This property must be specified unless `no_sign_request` is set to `true`. |
| `aws_secret_access_key` | string | | AWS Secret Access Key. This property must be specified unless `no_sign_request` is set to `true`. |
| `no_sign_request` | bool | `false` | Do not sign requests. This is equivalent to the `--no-sign-request` flag in the AWS CLI. |
| `key` | string | | Read a single object specified by a key. Either this property or the `prefix` property must be set. |
| `prefix` | string | | Read all objects whose keys match a prefix. Set to an empty string to read all objects in the bucket. Either this property or the `key` property must be set. |
| `region`* | string | | AWS region. |
| `bucket_name`* | string | | S3 bucket name. |
| `endpoint_url` | string | | The endpoint URL used to communicate with this service. Explicitly set it to connect to non-AWS services. For example, use `https://storage.googleapis.com` to interact with Google Cloud Storage. |
| `max_concurrent_fetches` | integer | 8 | Controls the number of S3 objects fetched in parallel. Increasing this value can improve throughput by enabling greater concurrency, but higher concurrency may lead to timeouts or increased memory usage due to in-memory buffering. Recommended range: 1–10. |

*Fields marked with an asterisk are required.
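As a sketch of how these options combine, the following connector reads every object under a common prefix with increased fetch parallelism. The bucket name and prefix here are hypothetical placeholders, not values from this guide:

```sql
CREATE TABLE vendor (
    id BIGINT NOT NULL PRIMARY KEY,
    name VARCHAR,
    address VARCHAR
) WITH ('connectors' = '[{
    "transport": {
        "name": "s3_input",
        "config": {
            "prefix": "vendors/2024/",
            "no_sign_request": true,
            "bucket_name": "example-bucket",
            "region": "us-west-1",
            "max_concurrent_fetches": 10
        }
    },
    "format": { "name": "json" }
}]');
```

Because `prefix` and `key` are mutually exclusive, only one of them appears in the config.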

Support for AWS IAM roles for service accounts (IRSA)

To use AWS IAM roles for service accounts (IRSA), omit the aws_access_key_id and aws_secret_access_key fields. Feldera will then pick up credentials from the environment variables.
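A minimal sketch of an IRSA-based connector simply leaves out both credential fields; the bucket name below is a hypothetical placeholder:

```sql
CREATE TABLE vendor (
    id BIGINT NOT NULL PRIMARY KEY,
    name VARCHAR,
    address VARCHAR
) WITH ('connectors' = '[{
    "transport": {
        "name": "s3_input",
        "config": {
            "key": "vendor.json",
            "bucket_name": "example-bucket",
            "region": "us-west-1"
        }
    },
    "format": { "name": "json" }
}]');
```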

Examples

Populate a table from a JSON file in a public S3 bucket:

CREATE TABLE vendor (
id BIGINT NOT NULL PRIMARY KEY,
name VARCHAR,
address VARCHAR
) WITH ('connectors' = '[{
"transport": {
"name": "s3_input",
"config": {
"key": "vendor.json",
"no_sign_request": true,
"bucket_name": "feldera-basics-tutorial",
"region": "us-west-1"
}
},
"format": { "name": "json" }
}]');

Populate a table from a JSON file, using access key-based authentication:

CREATE TABLE vendor (
id BIGINT NOT NULL PRIMARY KEY,
name VARCHAR,
address VARCHAR
) WITH ('connectors' = '[{
"transport": {
"name": "s3_input",
"config": {
"key": "vendor.json",
"aws_access_key_id": "YOUR_ACCESS_KEY_ID",
"aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
"bucket_name": "feldera-basics-tutorial",
"region": "us-west-1"
}
},
"format": { "name": "json" }
}]');

To connect to Google Cloud Storage, explicitly set the endpoint_url. You also need to create an HMAC key to obtain an access ID and secret, and grant the corresponding principal access to the bucket.

CREATE TABLE vendor (
id BIGINT NOT NULL PRIMARY KEY,
name VARCHAR,
address VARCHAR
) WITH ('connectors' = '[{
"transport": {
"name": "s3_input",
"config": {
"key": "vendor.json",
"aws_access_key_id": "YOUR_ACCESS_KEY_ID",
"aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
"bucket_name": "my-bucket",
"region": "us-west1",
"endpoint_url": "https://storage.googleapis.com"
}
},
"format": { "name": "json" }
}]');
tip

Refer to the secret references guide to externalize AWS access keys via Kubernetes.

Additional resources

For more information, see: