Configuration - Parsers
Type: Array of Object. Each item in this array must match one of the following definitions.
In RODB, a parser is an object that converts a string value into another data type. Several kinds of parsers are available; they are described below.
RODB never tries to guess any data type or magically convert any data.
This means for example that every value extracted from a CSV file is a string by default, even if the column only contains digits.
To get a number, you need to apply an integer or float parser to the column.
The parsers are used by the input (data) and output (user inputs) layers.
Example:
parsers:
  - name: booleanYesNo
    type: boolean
    falseValues: ["No", "NO", "no"]
    trueValues: ["Yes", "YES", "yes"]
  - name: integerWithCommas
    type: integer
    ignoreCharacters: ","
  - name: shiftJisString
    type: string
    convertFromCharset: "Shift_JIS"
Integer
Type: Object
Parses a string value to a 64-bit integer.
Default instance:
RODB automatically creates a default instance of this parser, defined as follows:
name: integer
type: integer
ignoreCharacters: ""
Examples:
name: formattedInteger
type: integer
ignoreCharacters: ", "

name: currencyInteger
type: integer
ignoreCharacters: "€$¥"
Properties:
name
Type: String
The name of this parser, which any other component will use to refer to it.
type
Must have the value: "integer"
ignoreCharacters (optional)
Type: String
Default value: ""
This is a list of Unicode characters, given as a string. Every character in this string is stripped from the value before it is parsed to an integer. This is useful, for example, to parse formatted numbers.
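To make the effect of ignoreCharacters concrete, here is a minimal Go sketch of the behaviour described above. It is not RODB's actual code: the helper name parseInteger is hypothetical, and base-10 parsing is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseInteger is a hypothetical helper, not RODB's actual code.
// It drops every character listed in ignoreCharacters, then parses
// the remainder as a base-10 signed 64-bit integer (assumed base).
func parseInteger(value, ignoreCharacters string) (int64, error) {
	cleaned := strings.Map(func(r rune) rune {
		if strings.ContainsRune(ignoreCharacters, r) {
			return -1 // a negative rune tells strings.Map to drop the character
		}
		return r
	}, value)
	return strconv.ParseInt(cleaned, 10, 64)
}

func main() {
	n, err := parseInteger("1,234,567", ",")
	fmt.Println(n, err) // 1234567 <nil>

	// A value that still contains non-digit characters is an error.
	_, err = parseInteger("12.5", ",")
	fmt.Println(err != nil) // true
}
```

With an empty ignoreCharacters, as in the default instance, nothing is stripped and the value must already be a plain integer.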
Float
Type: Object
Parses a string value to a 64-bit floating-point number.
Default instance:
RODB automatically creates a default instance of this parser, defined as follows:
name: float
type: float
decimalSeparator: "."
ignoreCharacters: ""
Examples:
name: formattedFloat
type: float
decimalSeparator: "."
ignoreCharacters: ","

name: currencyFloat
type: float
decimalSeparator: "."
ignoreCharacters: "€$¥"
Properties:
name
Type: String
The name of this parser, which any other component will use to refer to it.
type
Must have the value: "float"
decimalSeparator
Type: String
This is the character sequence that is used in the value to separate the integer part from the fractional part.
ignoreCharacters (optional)
Type: String
Default value: ""
This is a list of Unicode characters, given as a string. Every character in this string is stripped from the value before it is parsed to a float. This is useful, for example, to parse formatted numbers.
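The sketch below shows how decimalSeparator and ignoreCharacters could combine, for instance to read European-style numbers such as "1.234,56". It is not RODB's actual code: the helper name parseFloat is hypothetical, and the order of operations (strip first, then normalise the separator) is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFloat is a hypothetical helper, not RODB's actual code.
// Assumed order: strip ignoreCharacters first, then turn the first
// occurrence of decimalSeparator into ".", then parse as a float64.
func parseFloat(value, decimalSeparator, ignoreCharacters string) (float64, error) {
	cleaned := strings.Map(func(r rune) rune {
		if strings.ContainsRune(ignoreCharacters, r) {
			return -1 // drop ignored characters
		}
		return r
	}, value)
	if decimalSeparator != "" && decimalSeparator != "." {
		cleaned = strings.Replace(cleaned, decimalSeparator, ".", 1)
	}
	return strconv.ParseFloat(cleaned, 64)
}

func main() {
	f, err := parseFloat("1.234,56", ",", ".")
	fmt.Println(f, err) // 1234.56 <nil>

	f, err = parseFloat("$1,234.5", ".", "$,")
	fmt.Println(f, err) // 1234.5 <nil>
}
```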
Boolean
Type: Object
Converts a string value to a boolean, according to the configured values.
RODB will output an error if it encounters a value that is defined in neither trueValues nor falseValues.
Default instance:
RODB automatically creates a default instance of this parser, defined as follows:
name: boolean
type: boolean
trueValues: ["true", "1", "TRUE"]
falseValues: ["false", "0", "FALSE"]
Examples:
name: booleanYesNo
type: boolean
falseValues: ["No", "NO", "no"]
trueValues: ["Yes", "YES", "yes"]

name: booleanZeroOne
type: boolean
trueValues: ["1"]
falseValues: ["0"]
Properties:
name
Type: String
The name of this parser, which any other component will use to refer to it.
type
Must have the value: "boolean"
trueValues
Type: Array of String. Empty array not allowed.
This is a list of values that will be converted to ‘true’.
Items of the array
Type: String
falseValues
Type: Array of String. Empty array not allowed.
This is a list of values that will be converted to ‘false’.
Items of the array
Type: String
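The lookup described above can be sketched in Go as follows. It is not RODB's actual code: the helper name parseBoolean is hypothetical, and exact, case-sensitive matching is an assumption, suggested by the examples listing "Yes", "YES" and "yes" separately.

```go
package main

import "fmt"

// parseBoolean is a hypothetical helper, not RODB's actual code.
// It assumes exact, case-sensitive matching against both lists and
// returns an error for any value found in neither of them.
func parseBoolean(value string, trueValues, falseValues []string) (bool, error) {
	for _, v := range trueValues {
		if value == v {
			return true, nil
		}
	}
	for _, v := range falseValues {
		if value == v {
			return false, nil
		}
	}
	return false, fmt.Errorf("%q is neither a true value nor a false value", value)
}

func main() {
	trueValues := []string{"Yes", "YES", "yes"}
	falseValues := []string{"No", "NO", "no"}

	b, err := parseBoolean("YES", trueValues, falseValues)
	fmt.Println(b, err) // true <nil>

	_, err = parseBoolean("maybe", trueValues, falseValues)
	fmt.Println(err != nil) // true
}
```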
String
Type: Object
This parser applies transformations to a string.
It is currently only useful if you need to read files with a non-Unicode encoding. Otherwise, while it is possible to declare your own, it would be equivalent to the default one.
Default instance:
RODB automatically creates a default instance of this parser, defined as follows:
name: string
type: string
convertFromCharset: ""
Example:
name: shiftJisString
type: string
convertFromCharset: "Shift_JIS"
Properties:
name
Type: String
The name of this parser, which any other component will use to refer to it.
type
Must have the value: "string"
convertFromCharset (optional)
Type: String
Default value: ""
Internally, RODB only handles UTF-8 strings. If the given value has a different encoding, RODB automatically converts it from the encoding named in this property.
When this value is an empty string, no conversion is performed.
This setting must match one of the encodings listed in the IANA character sets index (either the MIME or Name columns).
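To illustrate what converting from another charset to UTF-8 involves, the Go sketch below handles ISO-8859-1 only, because that conversion needs nothing beyond the standard library: each byte maps to the Unicode code point of the same number. The helper name latin1ToUTF8 is hypothetical. Charsets such as Shift_JIS require dedicated decoding tables, and how RODB performs the conversion internally is not described here.

```go
package main

import "fmt"

// latin1ToUTF8 is a hypothetical helper, not RODB's actual code.
// In ISO-8859-1 every byte value equals its Unicode code point,
// so converting each byte to a rune yields the UTF-8 string.
func latin1ToUTF8(raw []byte) string {
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return string(runes)
}

func main() {
	raw := []byte{0x63, 0x61, 0x66, 0xE9} // "café" encoded as ISO-8859-1
	converted := latin1ToUTF8(raw)
	fmt.Println(converted)                // café
	fmt.Println(len(raw), len(converted)) // 4 5
}
```

The byte length grows from 4 to 5 because "é" takes two bytes in UTF-8.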
Split
Type: Object
Splits a string into an array of values and applies another parser to each value.
This parser is not a primitive parser, and thus cannot be used as a parameter in an output object.
Examples:
name: splitStringsOnSlashes
type: split
delimiter: "/"
parser: string

name: extractOnlyIntegers
type: split
delimiter: "[^0-9]+"
delimiterIsRegexp: true
parser: integer
Properties:
name
Type: String
The name of this parser, which any other component will use to refer to it.
type
Must have the value: "split"
delimiter
Type: String
If delimiterIsRegexp is false:
This is the string on which the value is split into an array. The implementation uses Go's strings.Split function, whose documentation reads:
Split slices s into all substrings separated by sep and returns a slice of the substrings between those separators.
If s does not contain sep and sep is not empty, Split returns a slice of length 1 whose only element is s.
If sep is empty, Split splits after each UTF-8 sequence. If both s and sep are empty, Split returns an empty slice.
If delimiterIsRegexp is true:
This is the regular expression used to split the string. The syntax is RE2, as implemented by Go's native regexp engine.
The specific algorithm used is Go's regexp.Split function, with n=-1.
delimiterIsRegexp (optional)
Type: Boolean
Defines whether the delimiter should be interpreted as a regular expression or as a literal string.
Please see the definition of the delimiter setting for more information.
parser
Type: String
After the value is split, another parser is applied to each member of the resulting array.
This is the name of that parser.
If no specific change is required and an array of strings is expected, please use parser: string.
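As a rough illustration of chaining a split with an element parser, the Go sketch below splits on a literal delimiter and parses each element as an integer. The helper name splitIntegers is hypothetical, and returning the first failure as an error is an assumption: how RODB reports an element that fails to parse is not specified here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitIntegers is a hypothetical helper, not RODB's actual code.
// It splits on a literal delimiter and parses every element as a
// 64-bit integer, stopping at the first element that fails (assumed).
func splitIntegers(value, delimiter string) ([]int64, error) {
	parts := strings.Split(value, delimiter)
	result := make([]int64, 0, len(parts))
	for _, part := range parts {
		n, err := strconv.ParseInt(part, 10, 64)
		if err != nil {
			return nil, fmt.Errorf("element %q: %w", part, err)
		}
		result = append(result, n)
	}
	return result, nil
}

func main() {
	fmt.Println(splitIntegers("1/2/3", "/")) // [1 2 3] <nil>

	// The empty element between the two slashes is not a valid integer.
	_, err := splitIntegers("1//3", "/")
	fmt.Println(err != nil) // true
}
```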
JSON
Type: Object
Parses a JSON value (given as a string) into whatever type it contains (object, array…). Not to be confused with the JSON input parser.
While it is possible to declare your own, it would be equivalent to the default one, because there are currently no available settings.
This parser is not a primitive parser, and thus cannot be used as a parameter in an output object.
Default instance:
RODB automatically creates a default instance of this parser, defined as follows:
name: json
type: json
Example:
name: customJson
type: json
Properties:
name
Type: String
The name of this parser, which any other component will use to refer to it.
type
Must have the value: "json"
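To show what "any type it contains" can mean in practice, the Go sketch below decodes a JSON string into a generic value with the standard encoding/json package. The helper name parseJSON is hypothetical, and the use of encoding/json is an assumption about the implementation. With this package, objects become maps, arrays become slices, and numbers become float64.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseJSON is a hypothetical helper, not RODB's actual code.
// It decodes a JSON string into a generic Go value.
func parseJSON(value string) (interface{}, error) {
	var result interface{}
	if err := json.Unmarshal([]byte(value), &result); err != nil {
		return nil, err
	}
	return result, nil
}

func main() {
	v, err := parseJSON(`{"tags": ["a", "b"], "count": 2}`)
	fmt.Println(v, err) // map[count:2 tags:[a b]] <nil>

	// Malformed JSON is reported as an error.
	_, err = parseJSON(`{"tags": [`)
	fmt.Println(err != nil) // true
}
```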