Configuration - Parsers

parsers

Type: Array of Object. Each item in this array must match one of the following definitions.

In RODB, a parser is an object that converts a string value into another data type. The available kinds of parsers are described below.

RODB never tries to guess any data type or magically convert any data. This means for example that every value extracted from a CSV file is a string by default, even if the column only contains digits. To get a number, you would need to apply an integer or float parser to the column.

The parsers are used by the input (data) and output (user inputs) layers.

Example:

parsers:
  - name: booleanYesNo
    type: boolean
    falseValues: ["No", "NO", "no"]
    trueValues: ["Yes", "YES", "yes"]
  - name: integerWithCommas
    type: integer
    ignoreCharacters: ","
  - name: shiftJisString
    type: string
    convertFromCharset: "Shift_JIS"

Integer

parsers[type = “integer”]

Type: Object

Parses a string value to a 64-bit integer.

Default instance:

A default instance of this parser is already automatically created as such:

name: integer
type: integer
ignoreCharacters: ""

Examples:

name: formattedInteger
type: integer
ignoreCharacters: ", "

name: currencyInteger
type: integer
ignoreCharacters: "€$¥"

Properties:

name

parsers[type = “integer”].name

Type: String

The name of this parser, which any other component will use to refer to it.

type

parsers[type = “integer”].type

Must have the value: "integer"

ignoreCharacters (optional)

parsers[type = “integer”].ignoreCharacters

Type: String

Default value: ""

This is a list of Unicode characters, as a string. All the characters in this string will be stripped from the value before parsing it to an integer. This is useful, for example, to parse formatted numbers such as "1,234" to an integer.
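The stripping step described above can be sketched in Go as follows. This is a minimal illustration, not RODB's actual code: the helper name parseInteger is hypothetical, and it assumes the cleaned value is parsed as a base-10 signed 64-bit integer.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseInteger removes every character listed in ignoreCharacters from value,
// then parses the remainder as a base-10, 64-bit signed integer.
// Hypothetical helper for illustration; not RODB's actual implementation.
func parseInteger(value, ignoreCharacters string) (int64, error) {
	cleaned := strings.Map(func(r rune) rune {
		if strings.ContainsRune(ignoreCharacters, r) {
			return -1 // a negative rune tells strings.Map to drop the character
		}
		return r
	}, value)
	return strconv.ParseInt(cleaned, 10, 64)
}

func main() {
	n, err := parseInteger("1,234,567", ", ")
	fmt.Println(n, err) // 1234567 <nil>

	n, err = parseInteger("€42", "€$¥")
	fmt.Println(n, err) // 42 <nil>

	_, err = parseInteger("12.5", "")
	fmt.Println(err != nil) // true: "12.5" is not an integer
}
```

Because the characters are matched rune by rune, multi-byte characters such as "€" are handled the same way as ASCII ones.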

Float

parsers[type = “float”]

Type: Object

Parses a string value to a 64-bit floating-point number.

Default instance:

A default instance of this parser is already automatically created as such:

name: float
type: float
decimalSeparator: "."
ignoreCharacters: ""

Examples:

name: formattedFloat
type: float
decimalSeparator: "."
ignoreCharacters: ","

name: currencyFloat
type: float
decimalSeparator: "."
ignoreCharacters: "€$¥"

Properties:

name

parsers[type = “float”].name

Type: String

The name of this parser, which any other component will use to refer to it.

type

parsers[type = “float”].type

Must have the value: "float"

decimalSeparator

parsers[type = “float”].decimalSeparator

Type: String

This is the character sequence that is used in the value to separate the integer part from the fractional part.

ignoreCharacters (optional)

parsers[type = “float”].ignoreCharacters

Type: String

Default value: ""

This is a list of Unicode characters, as a string. All the characters in this string will be stripped from the value before parsing it to a float. This is useful, for example, to parse formatted numbers to a float.
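As a rough Go sketch of how decimalSeparator and ignoreCharacters could combine: the helper name parseFloat is hypothetical, and the order of operations (strip ignored characters first, then normalize the separator to ".") is an assumption made for this example, not something the RODB documentation states.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFloat is a hypothetical helper, not RODB's actual implementation.
// Assumption: ignored characters are stripped first, then the configured
// decimal separator is replaced by "." before parsing a 64-bit float.
func parseFloat(value, decimalSeparator, ignoreCharacters string) (float64, error) {
	cleaned := strings.Map(func(r rune) rune {
		if strings.ContainsRune(ignoreCharacters, r) {
			return -1 // drop the ignored character
		}
		return r
	}, value)
	if decimalSeparator != "" && decimalSeparator != "." {
		cleaned = strings.ReplaceAll(cleaned, decimalSeparator, ".")
	}
	return strconv.ParseFloat(cleaned, 64)
}

func main() {
	f, err := parseFloat("1,234.50", ".", ",")
	fmt.Println(f, err) // 1234.5 <nil>

	// European-style formatting: "." groups thousands, "," marks decimals.
	f, err = parseFloat("1.234,56", ",", ".")
	fmt.Println(f, err) // 1234.56 <nil>
}
```

With this ordering, a value such as "1.234,56" works when "." is ignored and "," is the separator; the reverse order would misread the grouping dots as decimal points.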

Boolean

parsers[type = “boolean”]

Type: Object

Converts a string value to a boolean, according to the configured values. RODB will output an error if it encounters a value that is listed in neither trueValues nor falseValues.
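The lookup described above can be sketched in Go like this. The helper name parseBoolean is hypothetical, and exact, case-sensitive matching is an assumption (it is consistent with the examples listing "No", "NO" and "no" separately, but it is not confirmed by the documentation).

```go
package main

import (
	"fmt"
)

// parseBoolean is a hypothetical helper, not RODB's actual implementation.
// Assumption: values are compared exactly (case-sensitive, no trimming).
func parseBoolean(value string, trueValues, falseValues []string) (bool, error) {
	for _, candidate := range trueValues {
		if value == candidate {
			return true, nil
		}
	}
	for _, candidate := range falseValues {
		if value == candidate {
			return false, nil
		}
	}
	// The value is in neither list: report an error instead of guessing.
	return false, fmt.Errorf("value %q is neither a true value nor a false value", value)
}

func main() {
	trueValues := []string{"Yes", "YES", "yes"}
	falseValues := []string{"No", "NO", "no"}

	b, err := parseBoolean("YES", trueValues, falseValues)
	fmt.Println(b, err) // true <nil>

	b, err = parseBoolean("no", trueValues, falseValues)
	fmt.Println(b, err) // false <nil>

	_, err = parseBoolean("maybe", trueValues, falseValues)
	fmt.Println(err != nil) // true
}
```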

Default instance:

A default instance of this parser is already automatically created as such:

name: boolean
type: boolean
trueValues:  ["true", "1", "TRUE"]
falseValues: ["false", "0", "FALSE"]

Examples:

name: booleanYesNo
type: boolean
falseValues: ["No", "NO", "no"]
trueValues: ["Yes", "YES", "yes"]

name: booleanZeroOne
type: boolean
trueValues: ["1"]
falseValues: ["0"]

Properties:

name

parsers[type = “boolean”].name

Type: String

The name of this parser, which any other component will use to refer to it.

type

parsers[type = “boolean”].type

Must have the value: "boolean"

trueValues

parsers[type = “boolean”].trueValues

Type: Array of String. Empty array not allowed.

This is a list of values that will be converted to ‘true’.

Items of the array

parsers[type = “boolean”].trueValues[]

Type: String

falseValues

parsers[type = “boolean”].falseValues

Type: Array of String. Empty array not allowed.

This is a list of values that will be converted to ‘false’.

Items of the array

parsers[type = “boolean”].falseValues[]

Type: String

String

parsers[type = “string”]

Type: Object

This parser applies transformations to a string.

It is currently only useful if you need to read files with a non-Unicode encoding. Otherwise, while it is possible to declare your own instance, it would be equivalent to the default one.

Default instance:

A default instance of this parser is already automatically created as such:

name: string
type: string
convertFromCharset: ""

Example:

name: shiftJisString
type: string
convertFromCharset: "Shift_JIS"

Properties:

name

parsers[type = “string”].name

Type: String

The name of this parser, which any other component will use to refer to it.

type

parsers[type = “string”].type

Must have the value: "string"

convertFromCharset (optional)

parsers[type = “string”].convertFromCharset

Type: String

Default value: ""

Internally, RODB only handles UTF-8 strings. If the given value has a different encoding, RODB will convert it automatically from the encoding set in this value.

When this value is an empty string, no conversion is performed.

This setting must match one of the encodings listed in the IANA character sets index (either the MIME or Name columns).

Split

parsers[type = “split”]

Type: Object

Splits a string into an array of values and applies another parser to each value.

This parser is not a primitive parser, and thus cannot be used as a parameter in an output object.

Examples:

name: splitStringsOnSlashes
type: split
delimiter: "/"
parser: string

name: extractOnlyIntegers
type: split
delimiter: "[^0-9]+"
delimiterIsRegexp: true
parser: integer

Properties:

name

parsers[type = “split”].name

Type: String

The name of this parser, which any other component will use to refer to it.

type

parsers[type = “split”].type

Must have the value: "split"

delimiter

parsers[type = “split”].delimiter

Type: String

If delimiterIsRegexp is false

This is the string on which the value will be split into an array. The implementation uses Go's strings.Split function, which is documented as follows:

Split slices s into all substrings separated by sep and returns a slice of the substrings between those separators.
If s does not contain sep and sep is not empty, Split returns a slice of length 1 whose only element is s.
If sep is empty, Split splits after each UTF-8 sequence. If both s and sep are empty, Split returns an empty slice.
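The edge cases quoted above are easy to check directly against strings.Split. In the short Go program below, the wrapper name splitLiteral is hypothetical and only exists to make the calls easy to read.

```go
package main

import (
	"fmt"
	"strings"
)

// splitLiteral is a thin, hypothetical wrapper around strings.Split,
// used here only to illustrate the literal-delimiter behavior.
func splitLiteral(value, delimiter string) []string {
	return strings.Split(value, delimiter)
}

func main() {
	fmt.Printf("%q\n", splitLiteral("a/b/c", "/")) // ["a" "b" "c"]
	fmt.Printf("%q\n", splitLiteral("abc", "/"))   // ["abc"]: delimiter not found
	fmt.Printf("%q\n", splitLiteral("a//b", "/"))  // ["a" "" "b"]: empty element kept
	fmt.Printf("%q\n", splitLiteral("añb", ""))    // ["a" "ñ" "b"]: one element per UTF-8 sequence
	fmt.Println(len(splitLiteral("", "")))         // 0
}
```

Note that consecutive delimiters produce empty strings in the result, which matters when the per-element parser does not accept an empty value.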

If delimiterIsRegexp is true

This is the regular expression that will be used to split the string. The syntax is RE2, as implemented by Go's native regexp package.

The specific algorithm used is Go's regexp.Split function, with n=-1, which returns all substrings.
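For reference, this is how regexp.Split behaves with n=-1 on a pattern like the one in the extractOnlyIntegers example. The helper name splitRegexp is hypothetical; how RODB reports an invalid pattern is not covered here.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitRegexp compiles pattern and splits value on every match,
// returning all substrings (n = -1). Hypothetical helper for illustration.
func splitRegexp(value, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return re.Split(value, -1), nil
}

func main() {
	parts, err := splitRegexp("12, 34; 56", "[^0-9]+")
	fmt.Printf("%q %v\n", parts, err) // ["12" "34" "56"] <nil>

	// A match at the very start produces a leading empty string.
	parts, err = splitRegexp("id=12", "[^0-9]+")
	fmt.Printf("%q %v\n", parts, err) // ["" "12"] <nil>

	_, err = splitRegexp("abc", "[")
	fmt.Println(err != nil) // true: "[" is not a valid pattern
}
```

The leading empty string in the second call is worth keeping in mind when the values start with a delimiter, since each element is then passed to the configured parser.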

delimiterIsRegexp (optional)

parsers[type = “split”].delimiterIsRegexp

Type: Boolean

Defines whether the delimiter should be interpreted as a regular expression or as a literal string. Please see the definition of the delimiter setting for more information.

parser

parsers[type = “split”].parser

Type: String

After splitting the value, another parser will be applied to each member of the resulting array. This is the name of this parser. If no specific change is required and an array of strings is expected, please use parser: string.
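The two-stage behavior (split first, then apply a parser to each member) might look like the following Go sketch. The function name splitAndParseIntegers is hypothetical, the per-element parser is fixed to a plain integer parse for brevity, and stopping at the first failing element is an assumption rather than documented RODB behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitAndParseIntegers splits value on a literal delimiter and parses each
// member as a 64-bit integer. Hypothetical helper, not RODB's actual code.
// Assumption: the first element that fails to parse aborts the whole parse.
func splitAndParseIntegers(value, delimiter string) ([]int64, error) {
	parts := strings.Split(value, delimiter)
	result := make([]int64, 0, len(parts))
	for i, part := range parts {
		n, err := strconv.ParseInt(part, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("element %d (%q): %w", i, part, err)
		}
		result = append(result, n)
	}
	return result, nil
}

func main() {
	values, err := splitAndParseIntegers("1/2/3", "/")
	fmt.Println(values, err) // [1 2 3] <nil>

	// The empty element between the two slashes is not a valid integer.
	_, err = splitAndParseIntegers("1//3", "/")
	fmt.Println(err != nil) // true
}
```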

JSON

parsers[type = “json”]

Type: Object

Parses a JSON value (given as a string) into whatever type it contains (object, array…). Not to be confused with the JSON input parser.

While it is possible to declare your own, it would be equivalent to the default one, because there are currently no available settings.

This parser is not a primitive parser, and thus cannot be used as a parameter in an output object.
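To illustrate what "any type it contains" means in practice, here is a minimal Go sketch using the standard encoding/json package. The helper name parseJSON is hypothetical, and whether RODB decodes values exactly this way is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseJSON decodes a JSON document held in a string into a generic Go value:
// objects become map[string]interface{}, arrays become []interface{},
// numbers become float64. Hypothetical helper, not RODB's actual code.
func parseJSON(value string) (interface{}, error) {
	var out interface{}
	if err := json.Unmarshal([]byte(value), &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	v, err := parseJSON(`{"tags": ["x", "y"], "count": 2}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	object := v.(map[string]interface{})
	fmt.Println(object["tags"], object["count"]) // [x y] 2

	_, err = parseJSON(`{oops`)
	fmt.Println(err != nil) // true: malformed JSON
}
```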

Default instance:

A default instance of this parser is already automatically created as such:

name: json
type: json

Example:

name: customJson
type: json

Properties:

name

parsers[type = “json”].name

Type: String

The name of this parser, which any other component will use to refer to it.

type

parsers[type = “json”].type

Must have the value: "json"