Postgres Loader Configuration Reference
This is a complete list of the options that can be configured in the Postgres Loader's HOCON config file. The example configs on GitHub show how to prepare an input file.
input.type | Required. Can be "Kinesis", "PubSub" or "Local". Configures where input events will be read from. |
input.streamName | Required when input.type is Kinesis. Name of the Kinesis stream to read from. |
input.region | Required when input.type is Kinesis. AWS region in which the Kinesis stream resides. |
input.initialPosition | Optional. Used when input.type is Kinesis. Use "TRIM_HORIZON" (the default) to start streaming at the oldest untrimmed record in the shard. Use "LATEST" to start streaming just after the most recent record in the shard. |
input.retrievalMode.type | Optional. Used when input.type is Kinesis. Sets how records are retrieved from the stream. Can be "FanOut" (the default) or "Polling". |
input.retrievalMode.maxRecords | Optional. Used when input.retrievalMode.type is "Polling". Configures how many records are fetched in each poll of the Kinesis stream. Default 10000. |
input.projectId | Required when input.type is PubSub. The name of your GCP project. |
input.subscriptionId | Required when input.type is PubSub. ID of the PubSub subscription to read events from. |
input.path | Required when input.type is Local. Path of the event source, which can be a directory or a file. If it is a directory, all files under it are read recursively. The path can be absolute or relative to the executable. |
output.good.host | Required. Hostname of the Postgres database. |
output.good.port | Optional. Port number of the Postgres database. Default 5432. |
output.good.database | Required. Name of the Postgres database. |
output.good.username | Required. Postgres role name to use when connecting to the database. |
output.good.password | Required. Password for the Postgres role. |
output.good.schema | Required. The Postgres schema in which to create tables and write events. |
output.good.sslMode | Optional. Configures how the client and server agree on SSL protection. Default "REQUIRE". |
output.bad.type | Optional. Can be "Kinesis", "PubSub", "Local" or "Noop". Configures where bad rows will be sent. Default is "Noop", which means bad rows are discarded. |
output.bad.streamName | Required when output.bad.type is Kinesis. Name of the Kinesis stream to write to. |
output.bad.region | Required when output.bad.type is Kinesis. AWS region in which the Kinesis stream resides. |
output.bad.projectId | Required when output.bad.type is PubSub. The name of your GCP project. |
output.bad.topicId | Required when output.bad.type is PubSub. ID of the PubSub topic to write bad rows to. |
output.bad.path | Required when output.bad.type is Local. Path of the file to which bad rows are written. |
purpose | Optional. Set this to "ENRICHED_EVENTS" (the default) when reading the stream of enriched events in TSV format. Set this to "JSON" when reading a stream of self-describing JSON, e.g. Snowplow bad rows. |
monitoring.metrics.cloudWatch | Optional boolean, with default true. Used for Kinesis input only. Set this to false to disable sending metrics to CloudWatch. |
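
As a rough illustration of how the options above might fit together, the following sketch reads enriched events from Kinesis, writes them to Postgres, and sends bad rows to a second Kinesis stream. It assumes that each dotted option name maps onto a nested HOCON object. Only the key names come from the table above; every value (stream names, region, host, credentials, schema) is a placeholder. The loader may also expect keys that are not listed in this reference, so treat the example configs on GitHub as the authoritative starting point.

```hocon
{
  input {
    type = "Kinesis"
    streamName = "enriched-events"     # placeholder stream name
    region = "eu-central-1"            # placeholder AWS region
    initialPosition = "TRIM_HORIZON"   # the documented default
    retrievalMode {
      type = "Polling"
      maxRecords = 10000               # the documented default
    }
  }

  output {
    good {
      host = "localhost"               # placeholder hostname
      port = 5432                      # the documented default
      database = "snowplow"            # placeholder database name
      username = "loader"              # placeholder role name
      password = "changeme"            # placeholder, replace before use
      schema = "atomic"                # placeholder schema name
      sslMode = "REQUIRE"              # the documented default
    }
    bad {
      type = "Kinesis"
      streamName = "bad-rows"          # placeholder stream name
      region = "eu-central-1"          # placeholder AWS region
    }
  }

  purpose = "ENRICHED_EVENTS"          # the documented default

  monitoring {
    metrics {
      cloudWatch = false               # stop sending metrics to CloudWatch
    }
  }
}
```

For PubSub input, the `input` block would instead carry `type = "PubSub"` together with `projectId` and `subscriptionId`, and for local input it would carry `type = "Local"` together with `path`.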
Advanced options
We believe these advanced options are set to sensible defaults, and you should rarely, if ever, need to change them.
backoffPolicy.minBackoff | If the producer (PubSub or Kinesis) fails to send an item, it retries. This field configures the backoff time before the first retry. Each subsequent retry doubles the backoff time of the previous one. |
backoffPolicy.maxBackoff | Maximum backoff time between retries. Once this value is reached, the backoff time no longer increases. |
input.checkpointSettings.maxBatchSize | Used when input.type is Kinesis. Determines the max number of records to aggregate before checkpointing the records. Default is 1000. |
input.checkpointSettings.maxBatchWait | Used when input.type is Kinesis. Determines the max amount of time to wait before checkpointing the records. Default is 10 seconds. |
input.checkpointSettings.maxConcurrent | Used when input.type is PubSub. Maximum number of concurrent evaluations for the checkpointer. |
output.good.maxConnections | Maximum number of connections the database connection pool is allowed to reach. Default 10. |
output.good.threadPoolSize | Size of the thread pool for blocking database operations. Defaults to the value of output.good.maxConnections. |
output.bad.delayThreshold | Delay threshold to use for batching. After this amount of time has elapsed (counting from the first element added), the elements are wrapped up in a batch and sent. Default 200 milliseconds. |
output.bad.maxBatchSize | A batch of messages is emitted when the number of events in the batch reaches this size. Default 500. |
output.bad.maxBatchBytes | A batch of messages is emitted when the size of the batch reaches this size. Default 5 MB. |
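
Should you need to override any of the advanced options, a fragment along the following lines could be merged into the config file. This is a sketch under several assumptions: dotted names are taken to map onto nested HOCON objects, `backoffPolicy` is placed at the top level only because the table lists it without a prefix, durations are written in standard HOCON duration syntax, and `maxBatchBytes` is assumed to be a plain byte count. The backoff values are purely illustrative, since this reference does not state their defaults.

```hocon
{
  backoffPolicy {
    # Illustrative values, not documented defaults. With these settings the
    # waits between retries would be 100 ms, 200 ms, 400 ms and so on,
    # doubling each time until they are capped at 10 seconds.
    minBackoff = 100 milliseconds
    maxBackoff = 10 seconds
  }

  input {
    # Kinesis checkpointing: checkpoint after 1000 records or 10 seconds.
    checkpointSettings {
      maxBatchSize = 1000              # the documented default
      maxBatchWait = 10 seconds        # the documented default
    }
  }

  output {
    good {
      maxConnections = 10              # the documented default
      threadPoolSize = 10              # defaults to maxConnections
    }
    bad {
      delayThreshold = 200 milliseconds  # the documented default
      maxBatchSize = 500                 # the documented default
      maxBatchBytes = 5000000            # assumed byte count for the 5 MB default
    }
  }
}
```

For PubSub input, `input.checkpointSettings` would instead hold `maxConcurrent`; no default for it is given in this reference.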