Skip to main content

AWS S3 input connector

note

This page describes configuration options specific to the S3 input connector. See top-level connector documentation for general information about configuring input and output connectors.

The AWS S3 input connector is used to load data from an S3 bucket to a Feldera table. It can be configured to load a single object or multiple objects selected based on a common S3 prefix.

tip

When accessing an S3 bucket that stores data in the Delta Lake format, consider using the Delta Lake connector instead.

Configuration options

PropertyTypeDefaultDescription
aws_access_key_idstringAWS Access Key id. This property must be specified unless no_sign_request is set to true.
aws_secret_access_keystringSecret Access Key. This property must be specified unless no_sign_request is set to true.
no_sign_requestboolfalseDo not sign requests. This is equivalent to the --no-sign-request flag in the AWS CLI.
keystringRead a single object specified by a key. Either this property or the prefix property must be set.
prefixstringRead all objects whose keys match a prefix. Set to an empty string to read all objects in the bucket. Either this property or the key property must be set.
region*stringAWS region.
bucket_name*stringS3 bucket name.
streamingboolfalseDetermines how the connector ingests an individual S3 object. When true, the connector pushes the object to the pipeline chunk-by-chunk, so that the pipeline can parse and process initial chunks of the object before the entire object has been retrieved. This mode is suitable for streaming formats such as newline-delimited JSON. When false, the connector buffers the entire object in memory and pushes it to the pipeline as a single chunk. Appropriate for formats like Parquet that cannot be streamed.

*Fields marked with an asterisk are required.

Examples

Populate a table from a JSON file in a public S3 bucket:

CREATE TABLE vendor (
id BIGINT NOT NULL PRIMARY KEY,
name VARCHAR,
address VARCHAR
) WITH ('connectors' = '[{
"transport": {
"name": "s3_input",
"config": {
"key": "vendor.json",
"no_sign_request": true,
"bucket_name": "feldera-basics-tutorial",
"region": "us-west-1"
}
},
"format": { "name": "json" }
}]');

Populate a table from a JSON file, using access key-based authentication:

CREATE TABLE vendor (
id BIGINT NOT NULL PRIMARY KEY,
name VARCHAR,
address VARCHAR
) WITH ('connectors' = '[{
"transport": {
"name": "s3_input",
"config": {
"key": "vendor.json",
"aws_access_key_id": "YOUR_ACCESS_KEY_ID",
"aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
"bucket_name": "feldera-basics-tutorial",
"region": "us-west-1"
}
},
"format": { "name": "json" }
}]');
tip

Refer to the secret management guide to externalize AWS access keys via Kubernetes.

Additional resources

For more information, see: