
AWS S3 input connector

note

This page describes configuration options specific to the S3 input connector. See top-level connector documentation for general information about configuring input and output connectors.

The AWS S3 input connector loads data from an S3 bucket into a Feldera table. It can be configured to load a single object, or multiple objects whose keys share a common prefix.

tip

When accessing an S3 bucket that stores data in the Delta Lake format, consider using the Delta Lake connector instead.

The S3 input connector supports fault tolerance.

Configuration options

| Property | Type | Default | Description |
|---|---|---|---|
| aws_access_key_id | string | | AWS access key ID. This property must be specified unless no_sign_request is set to true. |
| aws_secret_access_key | string | | AWS secret access key. This property must be specified unless no_sign_request is set to true. |
| no_sign_request | bool | false | Do not sign requests. This is equivalent to the --no-sign-request flag in the AWS CLI. |
| key | string | | Read a single object specified by a key. Either this property or the prefix property must be set. |
| prefix | string | | Read all objects whose keys match a prefix. Set to an empty string to read all objects in the bucket. Either this property or the key property must be set. |
| region* | string | | AWS region. |
| bucket_name* | string | | S3 bucket name. |
| streaming | bool | false | Determines how the connector ingests an individual S3 object. When true, the connector pushes the object to the pipeline chunk-by-chunk, so that the pipeline can parse and process initial chunks of the object before the entire object has been retrieved. This mode is suitable for streaming formats such as newline-delimited JSON. When false, the connector buffers the entire object in memory and pushes it to the pipeline as a single chunk. This mode is appropriate for formats like Parquet that cannot be streamed. |

*Fields marked with an asterisk are required.
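
As an illustration of how the prefix and streaming options might be combined, the following is a minimal sketch that reads every object under a shared prefix and pushes each one to the pipeline chunk-by-chunk. It assumes a public bucket whose objects contain newline-delimited JSON records matching the table schema. The table name, columns, bucket name, prefix, and region are hypothetical placeholders, not values taken from this page.

```sql
-- Hypothetical table: the name and columns are placeholders for illustration.
-- The bucket name, prefix, and region in the connector config are also assumed.
CREATE TABLE events (
    id BIGINT NOT NULL PRIMARY KEY,
    payload VARCHAR
) WITH ('connectors' = '[{
    "transport": {
        "name": "s3_input",
        "config": {
            "prefix": "events/",
            "streaming": true,
            "no_sign_request": true,
            "bucket_name": "example-bucket",
            "region": "us-east-1"
        }
    },
    "format": { "name": "json" }
}]');
```

Because prefix is set, key is omitted: only one of the two should be configured. Setting prefix to an empty string would read every object in the bucket instead.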

Examples

Populate a table from a JSON file in a public S3 bucket:

CREATE TABLE vendor (
    id BIGINT NOT NULL PRIMARY KEY,
    name VARCHAR,
    address VARCHAR
) WITH ('connectors' = '[{
    "transport": {
        "name": "s3_input",
        "config": {
            "key": "vendor.json",
            "no_sign_request": true,
            "bucket_name": "feldera-basics-tutorial",
            "region": "us-west-1"
        }
    },
    "format": { "name": "json" }
}]');

Populate a table from a JSON file, using access key-based authentication:

CREATE TABLE vendor (
    id BIGINT NOT NULL PRIMARY KEY,
    name VARCHAR,
    address VARCHAR
) WITH ('connectors' = '[{
    "transport": {
        "name": "s3_input",
        "config": {
            "key": "vendor.json",
            "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
            "aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
            "bucket_name": "feldera-basics-tutorial",
            "region": "us-west-1"
        }
    },
    "format": { "name": "json" }
}]');

tip

Refer to the secret management guide to externalize AWS access keys via Kubernetes.
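
For formats that cannot be streamed, the connector should buffer each object in full. The sketch below loads a single Parquet object with streaming left at false (the default, written out here for clarity). It assumes the Parquet format is selected with "name": "parquet" in the format section; the bucket name, object key, and region are hypothetical placeholders.

```sql
-- Hypothetical example: the bucket, key, and region are placeholders, and the
-- "parquet" format name is an assumption to verify against the format docs.
CREATE TABLE vendor (
    id BIGINT NOT NULL PRIMARY KEY,
    name VARCHAR,
    address VARCHAR
) WITH ('connectors' = '[{
    "transport": {
        "name": "s3_input",
        "config": {
            "key": "vendor.parquet",
            "streaming": false,
            "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
            "aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
            "bucket_name": "example-bucket",
            "region": "us-east-1"
        }
    },
    "format": { "name": "parquet" }
}]');
```

With streaming set to false the whole object is held in memory before it is pushed to the pipeline, so object size is worth keeping in mind when choosing this mode.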

Additional resources

For more information, see: