AWS S3 input connector
This page describes configuration options specific to the S3 input connector. See the top-level connector documentation for general information about configuring input and output connectors.
The AWS S3 input connector loads data from an S3 bucket into a Feldera table. It can be configured to read a single object or all objects whose keys share a common S3 prefix.
When accessing an S3 bucket that stores data in the Delta Lake format, consider using the Delta Lake connector instead.
The S3 input connector supports fault tolerance.
Configuration options
| Property | Type | Default | Description |
|---|---|---|---|
| `aws_access_key_id` | string | | AWS access key ID. This property must be specified unless `no_sign_request` is set to `true`. |
| `aws_secret_access_key` | string | | AWS secret access key. This property must be specified unless `no_sign_request` is set to `true`. |
| `no_sign_request` | bool | `false` | Do not sign requests. This is equivalent to the `--no-sign-request` flag in the AWS CLI. |
| `key` | string | | Read a single object specified by a key. Either this property or `prefix` must be set. |
| `prefix` | string | | Read all objects whose keys match a prefix. Set to an empty string to read all objects in the bucket. Either this property or `key` must be set. |
| `region`* | string | | AWS region. |
| `bucket_name`* | string | | S3 bucket name. |
| `streaming` | bool | `false` | Determines how the connector ingests an individual S3 object. When `true`, the connector pushes the object to the pipeline chunk-by-chunk, so that the pipeline can parse and process the initial chunks of the object before the entire object has been retrieved. This mode is suitable for streaming formats such as newline-delimited JSON (see the last example below). When `false`, the connector buffers the entire object in memory and pushes it to the pipeline as a single chunk. This mode is appropriate for formats such as Parquet that cannot be streamed. |

*Fields marked with an asterisk are required.
Examples
Populate a table from a JSON file in a public S3 bucket:
```sql
CREATE TABLE vendor (
    id BIGINT NOT NULL PRIMARY KEY,
    name VARCHAR,
    address VARCHAR
) WITH ('connectors' = '[{
    "transport": {
        "name": "s3_input",
        "config": {
            "key": "vendor.json",
            "no_sign_request": true,
            "bucket_name": "feldera-basics-tutorial",
            "region": "us-west-1"
        }
    },
    "format": { "name": "json" }
}]');
```
Populate a table from a JSON file, using access key-based authentication:
```sql
CREATE TABLE vendor (
    id BIGINT NOT NULL PRIMARY KEY,
    name VARCHAR,
    address VARCHAR
) WITH ('connectors' = '[{
    "transport": {
        "name": "s3_input",
        "config": {
            "key": "vendor.json",
            "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
            "aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
            "bucket_name": "feldera-basics-tutorial",
            "region": "us-west-1"
        }
    },
    "format": { "name": "json" }
}]');
```
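Read all objects under a common key prefix instead of a single object. The configuration below is a sketch: the bucket name and prefix are illustrative placeholders. Assuming the objects contain newline-delimited JSON, `streaming` is enabled so that each object is parsed chunk-by-chunk as it is retrieved:

```sql
CREATE TABLE vendor (
    id BIGINT NOT NULL PRIMARY KEY,
    name VARCHAR,
    address VARCHAR
) WITH ('connectors' = '[{
    "transport": {
        "name": "s3_input",
        "config": {
            "prefix": "vendors/",
            "no_sign_request": true,
            "bucket_name": "example-bucket",
            "region": "us-west-1",
            "streaming": true
        }
    },
    "format": { "name": "json" }
}]');
```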
Rather than hard-coding credentials in SQL, refer to the secret management guide to externalize AWS access keys via Kubernetes.
Additional resources
For more information, see: