AWS S3 input connector
This page describes configuration options specific to the S3 input connector. See the top-level connector documentation for general information about configuring input and output connectors.
The AWS S3 input connector loads data from an S3 bucket into a Feldera table. It can be configured to load a single object or multiple objects that share a common S3 key prefix.
When accessing an S3 bucket that stores data in the Delta Lake format, consider using the Delta Lake connector instead.
Configuration options
Property | Type | Default | Description |
---|---|---|---|
aws_access_key_id | string | | AWS access key ID. This property must be specified unless no_sign_request is set to true. |
aws_secret_access_key | string | | AWS secret access key. This property must be specified unless no_sign_request is set to true. |
no_sign_request | bool | false | Do not sign requests. This is equivalent to the --no-sign-request flag in the AWS CLI. |
key | string | | Read a single object specified by a key. Either this property or the prefix property must be set. |
prefix | string | | Read all objects whose keys match a prefix. Set to an empty string to read all objects in the bucket. Either this property or the key property must be set. |
region * | string | | AWS region. |
bucket_name * | string | | S3 bucket name. |
streaming | bool | false | Determines how the connector ingests an individual S3 object. When true, the connector pushes the object to the pipeline chunk-by-chunk, so that the pipeline can parse and process initial chunks of the object before the entire object has been retrieved. This mode is suitable for streaming formats such as newline-delimited JSON. When false, the connector buffers the entire object in memory and pushes it to the pipeline as a single chunk. Appropriate for formats like Parquet that cannot be streamed. |
*Fields marked with an asterisk are required.
Examples
Populate a table from a JSON file in a public S3 bucket:
CREATE TABLE vendor (
    id BIGINT NOT NULL PRIMARY KEY,
    name VARCHAR,
    address VARCHAR
) WITH ('connectors' = '[{
    "transport": {
        "name": "s3_input",
        "config": {
            "key": "vendor.json",
            "no_sign_request": true,
            "bucket_name": "feldera-basics-tutorial",
            "region": "us-west-1"
        }
    },
    "format": { "name": "json" }
}]');
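Populate a table from every object under a key prefix, streaming each object chunk by chunk. This is a minimal sketch of how the prefix and streaming options might be combined, not a tested configuration: the bucket name my-example-bucket, the prefix vendors/, and the region are hypothetical placeholders, the bucket is assumed to allow unsigned (public) reads, and the objects are assumed to hold JSON that can be parsed incrementally, such as newline-delimited JSON.

```sql
-- Sketch only: bucket name, prefix, and region below are hypothetical placeholders.
-- "prefix" replaces "key", so every object whose key starts with "vendors/" is read.
-- "streaming": true asks the connector to push each object to the pipeline in chunks.
CREATE TABLE vendor (
    id BIGINT NOT NULL PRIMARY KEY,
    name VARCHAR,
    address VARCHAR
) WITH ('connectors' = '[{
    "transport": {
        "name": "s3_input",
        "config": {
            "prefix": "vendors/",
            "streaming": true,
            "no_sign_request": true,
            "bucket_name": "my-example-bucket",
            "region": "us-west-1"
        }
    },
    "format": { "name": "json" }
}]');
```

Setting the prefix to an empty string instead would select all objects in the bucket, as described in the options table.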
Populate a table from a JSON file, using access key-based authentication:
CREATE TABLE vendor (
    id BIGINT NOT NULL PRIMARY KEY,
    name VARCHAR,
    address VARCHAR
) WITH ('connectors' = '[{
    "transport": {
        "name": "s3_input",
        "config": {
            "key": "vendor.json",
            "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
            "aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
            "bucket_name": "feldera-basics-tutorial",
            "region": "us-west-1"
        }
    },
    "format": { "name": "json" }
}]');
Refer to the secret management guide to externalize AWS access keys via Kubernetes.
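For a format that cannot be streamed, such as Parquet, a configuration along the following lines keeps streaming at false so that the connector buffers the whole object before handing it to the parser. Treat this as a sketch under assumptions: the object key vendor.parquet is a hypothetical placeholder, and the format name parquet is an assumption about how the Parquet parser is selected, so check the format documentation before relying on it.

```sql
-- Sketch only: the object key is hypothetical, and the "parquet" format name is an assumption.
-- "streaming": false is the default; it is spelled out here to make the buffering mode explicit.
CREATE TABLE vendor (
    id BIGINT NOT NULL PRIMARY KEY,
    name VARCHAR,
    address VARCHAR
) WITH ('connectors' = '[{
    "transport": {
        "name": "s3_input",
        "config": {
            "key": "vendor.parquet",
            "streaming": false,
            "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
            "aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
            "bucket_name": "feldera-basics-tutorial",
            "region": "us-west-1"
        }
    },
    "format": { "name": "parquet" }
}]');
```

Because the entire object is held in memory in this mode, very large objects may need to be split into smaller ones before ingestion.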
Additional resources
For more information, see: