How schemas translate to database types
- Redshift and Postgres
- Databricks
- BigQuery
- Snowflake
- Elastic
The row order in this table is important. Type lookup stops after the first match is found scanning from top to bottom (with the two exceptions of "null" and "required" — the first two rows in the table).
Json Schema | Redshift/Postgres Type |
--- | --- |
… | … |

Fields that are not listed as "required" in the schema are nullable.

- If the content is longer than 4096 characters, it is truncated when inserted into Redshift.
- Content is stringified and quoted. If the content is longer than 65535 characters, it is truncated when inserted into Redshift.
- If nothing above matches, a catch-all applies: values are quoted as in JSON, and content longer than 4096 characters is truncated when inserted into Redshift.
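The first-match lookup described above can be sketched in Python. The rules and the catch-all type below are illustrative placeholders, not the actual Redshift mapping:

```python
# Illustrative sketch of first-match type lookup: rules are scanned top to
# bottom and the first predicate that matches the schema wins. These two
# rules and the catch-all are placeholders, not the real Redshift mapping.
RULES = [
    (lambda s: s.get("type") == "boolean", lambda s: "BOOLEAN"),
    (lambda s: s.get("type") == "string" and "maxLength" in s,
     lambda s: f"VARCHAR({s['maxLength']})"),
]

def lookup(schema: dict) -> str:
    # Scan top to bottom; the first matching rule wins.
    for predicate, to_type in RULES:
        if predicate(schema):
            return to_type(schema)
    # If nothing matches above, this is a catch-all.
    return "VARCHAR(4096)"
```

For instance, `lookup({"type": "string", "maxLength": 64})` yields `VARCHAR(64)`, while a schema matching no rule falls through to the catch-all.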
All fields in Databricks are nullable. Having "null" in the "type" or "enum" does not affect the warehouse type, and is ignored for the purposes of type casting as per the table below.
The row order in this table is important. Type lookup stops after the first match is found scanning from top to bottom (with the single exception of "null" — the first row in the table).
Json Schema | Databricks Type |
--- | --- |
… | … |

- Values will be quoted as in JSON.
- If nothing above matches, a catch-all applies: values will be quoted as in JSON.
The row order in this table is important. Type lookup stops after the first match is found scanning from top to bottom (with the single exception of "null", the first row in the table).
Json Schema | BigQuery Type |
--- | --- |
… | … |

- Objects can be nullable. Nested fields can also be nullable (same rules as for everything else).
- Arrays can be nullable. Nested fields can also be nullable (same rules as for everything else).
- Values will be quoted as in JSON.
- If nothing above matches, a catch-all applies: values will be quoted as in JSON.
All types are JSON.
When loading enriched events, the resulting JSONs are like the Snowplow Canonical Event model with the following changes.
Boolean fields reformatted
All boolean fields like br_features_java are either "0" or "1" in the canonical event model. The JSON converts these values to false and true.
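This conversion can be sketched as follows; `convert_booleans` and the field list are hypothetical names for illustration:

```python
def convert_booleans(event: dict, boolean_fields: list) -> dict:
    """Convert canonical "0"/"1" strings to JSON false/true for the given
    boolean fields (e.g. br_features_java)."""
    out = dict(event)
    for field in boolean_fields:
        if out.get(field) in ("0", "1"):
            out[field] = out[field] == "1"
    return out
```

For example, `convert_booleans({"br_features_java": "1"}, ["br_features_java"])` produces `{"br_features_java": True}`.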
New geo_location field
The geo_latitude and geo_longitude fields are combined into a single geo_location field of Elasticsearch's "geo_point" type.
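A sketch of the combination, assuming the "lat,lon" string form that Elasticsearch accepts for geo_point values; `add_geo_location` is a hypothetical helper:

```python
def add_geo_location(event: dict) -> dict:
    """Combine geo_latitude and geo_longitude into a single geo_location
    string ("lat,lon"), one of the formats Elasticsearch accepts for the
    geo_point type."""
    out = dict(event)
    lat, lon = out.get("geo_latitude"), out.get("geo_longitude")
    if lat is not None and lon is not None:
        out["geo_location"] = f"{lat},{lon}"
    return out
```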
Unstructured events
Unstructured events are expanded into full JSONs. For example, the event
{
  "schema": "iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1",
  "data": {
    "targetUrl": "http://snowplowanalytics.com/analytics/index.html",
    "elementId": "action",
    "elementClasses": [],
    "elementTarget": ""
  }
}
would be converted to the field
{
  "unstruct_com_snowplowanalytics_snowplow_link_click_1": {
    "targetUrl": "http://snowplowanalytics.com/analytics/index.html",
    "elementId": "action",
    "elementClasses": [],
    "elementTarget": ""
  }
}
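The field name can be derived from the Iglu schema URI: vendor dots become underscores and only the major schema version is kept. `unstruct_field_name` is a hypothetical helper, and the snake_casing of camelCase names is inferred from the examples in this document:

```python
import re

def unstruct_field_name(schema_uri: str) -> str:
    """Derive the field name for an unstructured event from its Iglu schema
    URI: "unstruct_" prefix, vendor dots to underscores, camelCase name to
    snake_case, major schema version only."""
    vendor, name, _fmt, version = schema_uri.split(":", 1)[1].split("/")
    snake = re.sub(r"([A-Z])", r"_\1", name).lower()
    major = version.split("-")[0]
    return f"unstruct_{vendor.replace('.', '_')}_{snake}_{major}"
```

Applied to the link_click schema above, this yields `unstruct_com_snowplowanalytics_snowplow_link_click_1`.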
Custom contexts
Each custom context in an array is similarly expanded to a JSON with its own field. For example, the array
[
  {
    "schema": "iglu:com.acme/contextOne/jsonschema/1-0-0",
    "data": {
      "key": "value"
    }
  },
  {
    "schema": "iglu:com.acme/contextTwo/jsonschema/3-0-0",
    "data": {
      "name": "second"
    }
  }
]
would be converted to
{
  "contexts_com_acme_context_one_1": {
    "key": "value"
  },
  "contexts_com_acme_context_two_3": {
    "name": "second"
  }
}
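The expansion above can be sketched as follows; `expand_contexts` is a hypothetical helper, and the camelCase-to-snake_case rule is inferred from the contextOne example:

```python
import re

def expand_contexts(contexts: list) -> dict:
    """Expand an array of custom contexts into one field per context:
    "contexts_" prefix, vendor dots to underscores, camelCase name to
    snake_case, major schema version only."""
    out = {}
    for ctx in contexts:
        vendor, name, _fmt, version = ctx["schema"].split(":", 1)[1].split("/")
        snake = re.sub(r"([A-Z])", r"_\1", name).lower()
        field = f"contexts_{vendor.replace('.', '_')}_{snake}_{version.split('-')[0]}"
        out[field] = ctx["data"]
    return out
```

Feeding it the two-element array above produces the contexts_com_acme_context_one_1 and contexts_com_acme_context_two_3 fields shown.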