Class CommonSettings<F extends Format>

  • Type Parameters:
    F - the format supported by this settings class.
    All Implemented Interfaces:
    java.lang.Cloneable
    Direct Known Subclasses:
    CommonParserSettings, CommonWriterSettings

    public abstract class CommonSettings<F extends Format>
    extends java.lang.Object
    implements java.lang.Cloneable
    This is the parent class for all configuration classes used by parsers (AbstractParser) and writers (AbstractWriter).

    By default, all parsers and writers work with, at least, the following configuration options:

    • format (each file format provides its default): the input/output format of a given file
    • nullValue (defaults to null):

      when reading, if the parser does not read any character from the input, the nullValue is used instead of an empty string

      when writing, if the writer has a null object to write to the output, the nullValue is used instead of an empty string

    • maxCharsPerColumn (defaults to 4096): The maximum number of characters allowed for any given value being written/read.

      You need this to avoid OutOfMemoryErrors in case a file does not have a valid format. In such cases the parser might just keep reading from the input until its end or the memory is exhausted. This sets a limit which avoids unwanted JVM crashes.

    • maxColumns (defaults to 512): a hard limit on how many columns a record can have. You need this to avoid OutOfMemory errors in case of inputs that might be inconsistent with the format you are dealing with
    • skipEmptyLines (defaults to true):

      when reading, if the parser reads a line that is empty, it will be skipped.

      when writing, if the writer receives an empty or null row to write to the output, it will be ignored

    • ignoreTrailingWhitespaces (defaults to true): removes trailing whitespaces from values being read/written
    • ignoreLeadingWhitespaces (defaults to true): removes leading whitespaces from values being read/written
    • headers (defaults to null): the field names in the input/output, in the sequence they occur.

      when reading, the given header names will be used to refer to each column irrespective of whether or not the input contains a header row

      when writing, the given header names will be used to refer to each column and can be used for writing the header row

    • field selection (defaults to none): a selection of fields for reading and writing. Fields can be selected by their name or their position.

      when reading, the selected fields only will be parsed and the remaining fields will be discarded.

      when writing, the selected fields only will be written and the remaining fields will be discarded

    See Also:
    CommonParserSettings, CommonWriterSettings, CsvParserSettings, CsvWriterSettings, FixedWidthParserSettings, FixedWidthWriterSettings
    • Field Detail

      • format

        private F extends Format format
      • nullValue

        private java.lang.String nullValue
      • maxCharsPerColumn

        private int maxCharsPerColumn
      • maxColumns

        private int maxColumns
      • skipEmptyLines

        private boolean skipEmptyLines
      • ignoreTrailingWhitespaces

        private boolean ignoreTrailingWhitespaces
      • ignoreLeadingWhitespaces

        private boolean ignoreLeadingWhitespaces
      • autoConfigurationEnabled

        private boolean autoConfigurationEnabled
      • errorContentLength

        private int errorContentLength
      • skipBitsAsWhitespace

        private boolean skipBitsAsWhitespace
      • headers

        private java.lang.String[] headers
      • headerSourceClass

        java.lang.Class<?> headerSourceClass
    • Constructor Detail

      • CommonSettings

        public CommonSettings()
        Creates a new instance of this settings object using the default format specified by the concrete class that inherits from CommonSettings
    • Method Detail

      • getNullValue

        public java.lang.String getNullValue()
        Returns the String representation of a null value (defaults to null)

        When reading, if the parser does not read any character from the input, the nullValue is used instead of an empty string

        When writing, if the writer has a null object to write to the output, the nullValue is used instead of an empty string

        Returns:
        the String representation of a null value
      • setNullValue

        public void setNullValue​(java.lang.String emptyValue)
        Sets the String representation of a null value (defaults to null)

        When reading, if the parser does not read any character from the input, the nullValue is used instead of an empty string

        When writing, if the writer has a null object to write to the output, the nullValue is used instead of an empty string

        Parameters:
        emptyValue - the String representation of a null value
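The read/write substitution described above can be pictured with a small stand-alone sketch. It does not use the library; the class and method names (NullValueSketch, onRead, onWrite) are hypothetical and only model the documented behavior.

```java
// Illustrative model of the nullValue rule; not the library's implementation.
final class NullValueSketch {

    // Reading: a field from which no characters were read yields nullValue.
    static String onRead(String parsed, String nullValue) {
        return (parsed == null || parsed.isEmpty()) ? nullValue : parsed;
    }

    // Writing: a null object is written as nullValue.
    static String onWrite(Object value, String nullValue) {
        return value == null ? nullValue : String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(onRead("", "N/A"));    // prints "N/A"
        System.out.println(onWrite(null, "N/A")); // prints "N/A"
        System.out.println(onWrite(42, "N/A"));   // prints "42"
    }
}
```

With the default nullValue of null, both methods in this sketch would hand back null rather than an empty string.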
      • getMaxCharsPerColumn

        public int getMaxCharsPerColumn()
        The maximum number of characters allowed for any given value being written/read. Used to avoid OutOfMemoryErrors (defaults to 4096).

        If set to -1, then the internal array will expand automatically, up to the limit allowed by the JVM

        Returns:
        The maximum number of characters allowed for any given value being written/read
      • setMaxCharsPerColumn

        public void setMaxCharsPerColumn​(int maxCharsPerColumn)
        Defines the maximum number of characters allowed for any given value being written/read. Used to avoid OutOfMemoryErrors (defaults to 4096).

        To enable auto-expansion of the internal array, set this property to -1

        Parameters:
        maxCharsPerColumn - The maximum number of characters allowed for any given value being written/read
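As a rough illustration of why the limit exists, the sketch below models a character buffer that either rejects values longer than the limit or, when the limit is -1, grows on demand. BoundedAppenderSketch is a hypothetical class, and IllegalStateException is a placeholder for whatever error the real parser/writer reports.

```java
import java.util.Arrays;

// Illustrative model of a per-value character limit; not the library's code.
final class BoundedAppenderSketch {
    private char[] chars;
    private int length;
    private final boolean expandable;

    BoundedAppenderSketch(int maxCharsPerColumn) {
        // -1 means "no fixed limit": start small and grow when needed.
        this.expandable = maxCharsPerColumn == -1;
        this.chars = new char[expandable ? 16 : maxCharsPerColumn];
    }

    void append(char ch) {
        if (length == chars.length) {
            if (!expandable) {
                // Placeholder exception type, chosen for this sketch only.
                throw new IllegalStateException(
                        "Value exceeds the limit of " + chars.length + " characters");
            }
            chars = Arrays.copyOf(chars, chars.length * 2);
        }
        chars[length++] = ch;
    }

    String value() {
        return new String(chars, 0, length);
    }
}
```

A fixed limit makes a malformed input fail fast at a known size, while the expandable mode trades that safety for the ability to hold arbitrarily long values.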
      • getSkipEmptyLines

        public boolean getSkipEmptyLines()
        Returns whether or not empty lines should be ignored (defaults to true)

        when reading, if the parser reads a line that is empty, it will be skipped.

        when writing, if the writer receives an empty or null row to write to the output, it will be ignored

        Returns:
        true if empty lines are configured to be ignored, false otherwise
      • setSkipEmptyLines

        public void setSkipEmptyLines​(boolean skipEmptyLines)
        Defines whether or not empty lines should be ignored (defaults to true)

        when reading, if the parser reads a line that is empty, it will be skipped.

        when writing, if the writer receives an empty or null row to write to the output, it will be ignored

        Parameters:
        skipEmptyLines - true if empty lines should be ignored, false otherwise
      • getIgnoreTrailingWhitespaces

        public boolean getIgnoreTrailingWhitespaces()
        Returns whether or not trailing whitespaces from values being read/written should be skipped (defaults to true)
        Returns:
        true if trailing whitespaces from values being read/written should be skipped, false otherwise
      • setIgnoreTrailingWhitespaces

        public void setIgnoreTrailingWhitespaces​(boolean ignoreTrailingWhitespaces)
        Defines whether or not trailing whitespaces from values being read/written should be skipped (defaults to true)
        Parameters:
        ignoreTrailingWhitespaces - true if trailing whitespaces from values being read/written should be skipped, false otherwise
      • getIgnoreLeadingWhitespaces

        public boolean getIgnoreLeadingWhitespaces()
        Returns whether or not leading whitespaces from values being read/written should be skipped (defaults to true)
        Returns:
        true if leading whitespaces from values being read/written should be skipped, false otherwise
      • setIgnoreLeadingWhitespaces

        public void setIgnoreLeadingWhitespaces​(boolean ignoreLeadingWhitespaces)
        Defines whether or not leading whitespaces from values being read/written should be skipped (defaults to true)
        Parameters:
        ignoreLeadingWhitespaces - true if leading whitespaces from values being read/written should be skipped, false otherwise
      • setHeaders

        public void setHeaders​(java.lang.String... headers)
        Defines the field names in the input/output, in the sequence they occur (defaults to null).

        when reading, the given header names will be used to refer to each column irrespective of whether or not the input contains a header row

        when writing, the given header names will be used to refer to each column and can be used for writing the header row

        Parameters:
        headers - the field name sequence associated with each column in the input/output.
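To make "refer to each column by header name" concrete, here is a minimal stand-alone sketch that pairs a header sequence with the values of one row. HeaderLookupSketch and byHeader are hypothetical names; the library's own record and context types are not used here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: associates header names with the values of a row by position.
final class HeaderLookupSketch {

    static Map<String, String> byHeader(String[] headers, String[] row) {
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < headers.length; i++) {
            // A row shorter than the header sequence leaves the extra headers null.
            out.put(headers[i], i < row.length ? row[i] : null);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> record =
                byHeader(new String[]{"id", "name"}, new String[]{"7", "Ana"});
        System.out.println(record.get("name")); // prints "Ana"
    }
}
```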
      • setHeadersDerivedFromClass

        void setHeadersDerivedFromClass​(java.lang.Class<?> headerSourceClass,
                                        java.lang.String... headers)
        Defines the field names in the input/output derived from a given class with Parsed annotated attributes/methods.

        when reading, the given header names will be used to refer to each column irrespective of whether or not the input contains a header row

        when writing, the given header names will be used to refer to each column and can be used for writing the header row

        Parameters:
        headerSourceClass - the class from which the headers have been derived.
        headers - the field name sequence associated with each column in the input/output.
      • deriveHeadersFrom

        boolean deriveHeadersFrom​(java.lang.Class<?> beanClass)
        Indicates whether headers should be derived from a given class.
        Parameters:
        beanClass - the class to derive headers from
        Returns:
        true if the headers used for parsing/writing should be derived from the given class; otherwise false
      • getHeaders

        public java.lang.String[] getHeaders()
        Returns the field names in the input/output, in the sequence they occur (defaults to null).

        when reading, the given header names will be used to refer to each column irrespective of whether or not the input contains a header row

        when writing, the given header names will be used to refer to each column and can be used for writing the header row

        Returns:
        the field name sequence associated with each column in the input/output.
      • getMaxColumns

        public int getMaxColumns()
        Returns the hard limit of how many columns a record can have (defaults to 512). You need this to avoid OutOfMemory errors in case of inputs that might be inconsistent with the format you are dealing with.
        Returns:
        The maximum number of columns a record can have.
      • setMaxColumns

        public void setMaxColumns​(int maxColumns)
        Defines a hard limit of how many columns a record can have (defaults to 512). You need this to avoid OutOfMemory errors in case of inputs that might be inconsistent with the format you are dealing with.
        Parameters:
        maxColumns - The maximum number of columns a record can have.
      • getFormat

        public F getFormat()
        Returns the format of the file to be parsed/written (by default, the default format provided by the concrete settings class).
        Returns:
        The format of the file to be parsed/written
      • setFormat

        public void setFormat​(F format)
        Defines the format of the file to be parsed/written (by default, the default format provided by the concrete settings class is used).
        Parameters:
        format - The format of the file to be parsed/written
      • selectFields

        public FieldSet<java.lang.String> selectFields​(java.lang.String... fieldNames)
        Selects a sequence of fields for reading/writing by their names.

        When reading, only the values of the selected columns will be parsed, and the content of the other columns ignored. The resulting rows will be returned with the selected columns only, in the order specified. If you want to obtain the original row format, with all columns included and nulls in the fields that have not been selected, set CommonParserSettings.setColumnReorderingEnabled(boolean) with false.

        When writing, the sequence provided represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting fields "H3" and "H1" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"

        Parameters:
        fieldNames - The field names to read/write
        Returns:
        the (modifiable) set of selected fields
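The writing example above ("H1,H2,H3" headers, input "V_H3, V_H1", selection "H3" and "H1") can be traced with the following stand-alone sketch. It only models how selected values land in their header positions; FieldSelectionWriteSketch and arrange are hypothetical names, not part of the library.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model of name-based field selection when writing.
final class FieldSelectionWriteSketch {

    // Places each input value at the position of its selected header.
    static String[] arrange(String[] headers, String[] selected, String[] inputRow) {
        List<String> headerList = Arrays.asList(headers);
        String[] out = new String[headers.length];
        for (int i = 0; i < selected.length; i++) {
            int target = headerList.indexOf(selected[i]);
            if (target < 0) {
                throw new IllegalArgumentException("Unknown field: " + selected[i]);
            }
            out[target] = inputRow[i];
        }
        return out; // positions that were not selected stay null
    }

    public static void main(String[] args) {
        String[] row = arrange(
                new String[]{"H1", "H2", "H3"},
                new String[]{"H3", "H1"},
                new String[]{"V_H3", "V_H1"});
        System.out.println(String.join(",", row)); // prints "V_H1,null,V_H3"
    }
}
```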
      • excludeFields

        public FieldSet<java.lang.String> excludeFields​(java.lang.String... fieldNames)
        Selects fields which will not be read/written, by their names

        When reading, only the values of the selected columns will be parsed, and the content of the other columns ignored. The resulting rows will be returned with the selected columns only, in the order specified. If you want to obtain the original row format, with all columns included and nulls in the fields that have not been selected, set CommonParserSettings.setColumnReorderingEnabled(boolean) with false.

        When writing, the sequence of non-excluded fields represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting fields "H3" and "H1" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"

        Parameters:
        fieldNames - The field names to exclude from the parsing/writing process
        Returns:
        the (modifiable) set of ignored fields
      • selectIndexes

        public FieldSet<java.lang.Integer> selectIndexes​(java.lang.Integer... fieldIndexes)
        Selects a sequence of fields for reading/writing by their positions.

        When reading, only the values of the selected columns will be parsed, and the content of the other columns ignored. The resulting rows will be returned with the selected columns only, in the order specified. If you want to obtain the original row format, with all columns included and nulls in the fields that have not been selected, set CommonParserSettings.setColumnReorderingEnabled(boolean) with false.

        When writing, the sequence provided represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting indexes "2" and "0" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"

        Parameters:
        fieldIndexes - The indexes to read/write
        Returns:
        the (modifiable) set of selected fields
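For the reading side, the sketch below shows the two row shapes described above: selected columns only, in selection order, versus the original layout with nulls in unselected positions. It is a simplified model under the assumption that the whole row is already available as an array, whereas the real parser skips unselected content while reading. IndexSelectionReadSketch and its method are hypothetical names.

```java
// Illustrative model of index-based selection when reading.
final class IndexSelectionReadSketch {

    static String[] select(String[] parsedRow, int[] selectedIndexes,
                           boolean columnReorderingEnabled) {
        if (columnReorderingEnabled) {
            // Only the selected columns, in the order they were selected.
            String[] out = new String[selectedIndexes.length];
            for (int i = 0; i < selectedIndexes.length; i++) {
                out[i] = parsedRow[selectedIndexes[i]];
            }
            return out;
        }
        // Original row layout, with nulls where a column was not selected.
        String[] out = new String[parsedRow.length];
        for (int index : selectedIndexes) {
            out[index] = parsedRow[index];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] row = {"a", "b", "c"};
        System.out.println(String.join(",", select(row, new int[]{2, 0}, true)));  // prints "c,a"
        System.out.println(String.join(",", select(row, new int[]{2, 0}, false))); // prints "a,null,c"
    }
}
```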
      • excludeIndexes

        public FieldSet<java.lang.Integer> excludeIndexes​(java.lang.Integer... fieldIndexes)
        Selects columns which will not be read/written, by their positions

        When reading, only the values of the selected columns will be parsed, and the content of the other columns ignored. The resulting rows will be returned with the selected columns only, in the order specified. If you want to obtain the original row format, with all columns included and nulls in the fields that have not been selected, set CommonParserSettings.setColumnReorderingEnabled(boolean) with false.

        When writing, the sequence of non-excluded fields represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting fields by index, such as "2" and "0" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"

        Parameters:
        fieldIndexes - indexes of columns to exclude from the parsing/writing process
        Returns:
        the (modifiable) set of ignored fields
      • selectFields

        public FieldSet<java.lang.Enum> selectFields​(java.lang.Enum... columns)
        Selects a sequence of fields for reading/writing by their names

        When reading, only the values of the selected columns will be parsed, and the content of the other columns ignored. The resulting rows will be returned with the selected columns only, in the order specified. If you want to obtain the original row format, with all columns included and nulls in the fields that have not been selected, set CommonParserSettings.setColumnReorderingEnabled(boolean) with false.

        When writing, the sequence provided represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting fields "H3" and "H1" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"

        Parameters:
        columns - The columns to read/write
        Returns:
        the (modifiable) set of selected fields
      • excludeFields

        public FieldSet<java.lang.Enum> excludeFields​(java.lang.Enum... columns)
        Selects columns which will not be read/written, by their names

        When reading, only the values of the selected columns will be parsed, and the content of the other columns ignored. The resulting rows will be returned with the selected columns only, in the order specified. If you want to obtain the original row format, with all columns included and nulls in the fields that have not been selected, set CommonParserSettings.setColumnReorderingEnabled(boolean) with false.

        When writing, the sequence of non-excluded fields represents the expected format of the input rows. For example, headers can be "H1,H2,H3", but the input data is coming with values for two columns and in a different order, such as "V_H3, V_H1". Selecting fields "H3" and "H1" will allow the writer to write values in the expected locations. Using the given example, the output row will be generated as: "V_H1,null,V_H3"

        Parameters:
        columns - The columns to exclude from the parsing/writing process
        Returns:
        the (modifiable) set of ignored fields
      • setFieldSet

        private <T> FieldSet<T> setFieldSet​(FieldSet<T> fieldSet,
                                            T... values)
        Replaces the current field selection
        Parameters:
        fieldSet - the new set of selected fields
        values - the values to include in the selection
        Returns:
        the set of selected fields given as a parameter.
      • getFieldSet

        FieldSet<?> getFieldSet()
        Returns the set of selected fields, if any
        Returns:
        the set of selected fields. Null if no field was selected/excluded
      • getFieldSelector

        FieldSelector getFieldSelector()
        Returns the FieldSelector object, which handles selected fields.
        Returns:
        the FieldSelector object, which handles selected fields. Null if no field was selected/excluded
      • isAutoConfigurationEnabled

        public final boolean isAutoConfigurationEnabled()
        Indicates whether this settings object can automatically derive configuration options. This is used, for example, to define the headers when the user provides a BeanWriterProcessor where the bean class contains a Headers annotation, or to enable header extraction when the bean class of a BeanProcessor has attributes mapping to header names.

        Defaults to true

        Returns:
        true if the automatic configuration feature is enabled, false otherwise
      • setAutoConfigurationEnabled

        public final void setAutoConfigurationEnabled​(boolean autoConfigurationEnabled)
        Indicates whether this settings object can automatically derive configuration options. This is used, for example, to define the headers when the user provides a BeanWriterProcessor where the bean class contains a Headers annotation, or to enable header extraction when the bean class of a BeanProcessor has attributes mapping to header names.
        Parameters:
        autoConfigurationEnabled - a flag to turn the automatic configuration feature on/off.
      • getProcessorErrorHandler

        public <T extends Context> ProcessorErrorHandler<T> getProcessorErrorHandler()
        Returns the custom error handler to be used to capture and handle errors that might happen while processing records with a Processor or a RowWriterProcessor (i.e. non-fatal DataProcessingExceptions).

        The parsing/writing process won't stop (unless the error handler rethrows the DataProcessingException or manually stops the process).

        Type Parameters:
        T - the Context type provided by the parser implementation.
        Returns:
        the callback error handler with custom code to manage occurrences of DataProcessingException.
      • createDefaultFormat

        protected abstract F createDefaultFormat()
        Extending classes must implement this method to return the default format settings for their parser/writer
        Returns:
        Default format configuration for the given parser/writer settings.
      • autoConfigure

        final void autoConfigure()
      • trimValues

        public final void trimValues​(boolean trim)
        Configures the parser/writer to trim or keep leading and trailing whitespaces around values. This has the same effect as invoking both setIgnoreLeadingWhitespaces(boolean) and setIgnoreTrailingWhitespaces(boolean) with the same value.
        Parameters:
        trim - a flag indicating whether the parser/writer should remove whitespaces around values parsed/written.
      • getErrorContentLength

        public int getErrorContentLength()
        Returns the limit applied to the length of parsed/written content displayed in the exception message when an error occurs.

        If set to 0, then no exceptions will include the content being manipulated in their attributes, and the "<omitted>" string will appear in error messages as the parsed/written content.

        Defaults to -1 (no limit).
        Returns:
        the maximum length of contents displayed in exception messages in case of errors while parsing/writing.
      • setErrorContentLength

        public void setErrorContentLength​(int errorContentLength)
        Configures the parser/writer to limit the length of displayed contents being parsed/written in the exception message when an error occurs.

        If set to 0, then no exceptions will include the content being manipulated in their attributes, and the "<omitted>" string will appear in error messages as the parsed/written content.

        Defaults to -1 (no limit).
        Parameters:
        errorContentLength - maximum length of contents displayed in exception messages in case of errors while parsing/writing.
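The three cases described above (0, -1, and a positive limit) can be modeled as follows. This is a sketch, not the library's code: ErrorContentSketch and restrict are hypothetical names, and keeping the trailing characters behind a "..." prefix when truncating is an assumption made for illustration.

```java
// Illustrative model of how errorContentLength could shape error message content.
final class ErrorContentSketch {

    static String restrict(int errorContentLength, String content) {
        if (content == null) {
            return null;
        }
        if (errorContentLength == 0) {
            return "<omitted>"; // content never appears in the message
        }
        if (errorContentLength < 0 || content.length() <= errorContentLength) {
            return content; // -1 (no limit), or content already short enough
        }
        // Assumption of this sketch: keep the last errorContentLength characters.
        return "..." + content.substring(content.length() - errorContentLength);
    }

    public static void main(String[] args) {
        System.out.println(restrict(0, "secret"));  // prints "<omitted>"
        System.out.println(restrict(-1, "secret")); // prints "secret"
        System.out.println(restrict(3, "abcdef"));  // prints "...def"
    }
}
```

Setting the limit to 0 is the option to reach for when the processed content may be sensitive and should not leak into logs through exception messages.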
      • runAutomaticConfiguration

        void runAutomaticConfiguration()
      • getSkipBitsAsWhitespace

        public final boolean getSkipBitsAsWhitespace()
        Returns a flag indicating whether the parser/writer should skip bit values as whitespace. By default, the parser/writer removes control characters and treats as whitespace any character for which character <= ' ' evaluates to true. This includes bit values, i.e. 0 (the \0 character) and 1, which might be produced by database dumps. Disabling this flag prevents the parser/writer from discarding these characters when getIgnoreLeadingWhitespaces() or getIgnoreTrailingWhitespaces() evaluate to true.

        defaults to true

        Returns:
        a flag indicating whether bit values (0 or 1) should be considered whitespace.
      • setSkipBitsAsWhitespace

        public final void setSkipBitsAsWhitespace​(boolean skipBitsAsWhitespace)
        Configures the parser/writer to skip bit values as whitespace. By default, the parser/writer removes control characters and treats as whitespace any character for which character <= ' ' evaluates to true. This includes bit values, i.e. 0 (the \0 character) and 1, which might be produced by database dumps. Disabling this flag prevents the parser/writer from discarding these characters when getIgnoreLeadingWhitespaces() or getIgnoreTrailingWhitespaces() evaluate to true.

        defaults to true

        Parameters:
        skipBitsAsWhitespace - a flag indicating whether bit values (0 or 1) should be considered whitespace.
      • getWhitespaceRangeStart

        protected final int getWhitespaceRangeStart()
        Returns the starting decimal range for characters <= ' ' that should be skipped as whitespace, as determined by getSkipBitsAsWhitespace()
        Returns:
        the starting range after which characters will be considered whitespace
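One way to picture the whitespace range is the stand-alone sketch below. It assumes the range start is -1 when bit values count as whitespace and 1 when they do not; those two constants, and the names WhitespaceRangeSketch, whitespaceRangeStart, isWhitespace and trim, are assumptions of this example rather than documented values.

```java
// Illustrative model of trimming with a configurable whitespace range start.
final class WhitespaceRangeSketch {

    // Assumed values: -1 includes characters 0 and 1, 1 excludes them.
    static int whitespaceRangeStart(boolean skipBitsAsWhitespace) {
        return skipBitsAsWhitespace ? -1 : 1;
    }

    // A character is whitespace if it is <= ' ' and above the range start.
    static boolean isWhitespace(char ch, int rangeStart) {
        return ch <= ' ' && ch > rangeStart;
    }

    static String trim(String value, boolean skipBitsAsWhitespace) {
        int start = whitespaceRangeStart(skipBitsAsWhitespace);
        int from = 0;
        int to = value.length();
        while (from < to && isWhitespace(value.charAt(from), start)) {
            from++;
        }
        while (to > from && isWhitespace(value.charAt(to - 1), start)) {
            to--;
        }
        return value.substring(from, to);
    }

    public static void main(String[] args) {
        // With the flag on, \u0000 and \u0001 are trimmed like spaces.
        System.out.println(trim("\u0000 a \u0001", true).length());  // prints 1
        // With the flag off, the bit values survive and only the spaces go.
        System.out.println(trim(" \u0001a\u0000 ", false).length()); // prints 3
    }
}
```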
      • toString

        public final java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • addConfiguration

        protected void addConfiguration​(java.util.Map<java.lang.String,​java.lang.Object> out)
      • clone

        protected CommonSettings clone​(boolean clearInputSpecificSettings)
        Clones this configuration object to reuse user-provided settings. Properties that are specific to a given input (such as header names and selection of fields) can be reset to their defaults if the clearInputSpecificSettings flag is set to true
        Parameters:
        clearInputSpecificSettings - flag indicating whether to clear settings that are likely to be associated with a given input.
        Returns:
        a copy of the configurations applied to the current instance.
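The idea of copying a settings object while optionally dropping input-specific values can be sketched as below. CloneSketch is a hypothetical stand-in with only two properties, and treating headers as the sole input-specific setting is a simplification for this example.

```java
// Illustrative stand-in for a settings class with an optional "clear on clone" step.
final class CloneSketch implements Cloneable {
    String nullValue;  // general-purpose setting, always carried over
    String[] headers;  // input-specific setting, optionally cleared

    protected CloneSketch clone(boolean clearInputSpecificSettings) {
        try {
            CloneSketch out = (CloneSketch) super.clone();
            if (clearInputSpecificSettings) {
                out.clearInputSpecificSettings();
            }
            return out;
        } catch (CloneNotSupportedException e) {
            // Cannot happen here because this class implements Cloneable.
            throw new IllegalStateException(e);
        }
    }

    protected void clearInputSpecificSettings() {
        headers = null;
    }

    public static void main(String[] args) {
        CloneSketch settings = new CloneSketch();
        settings.nullValue = "N/A";
        settings.headers = new String[]{"id", "name"};

        CloneSketch reusable = settings.clone(true);
        System.out.println(reusable.nullValue);       // prints "N/A"
        System.out.println(reusable.headers == null); // prints "true"
    }
}
```

Clearing on clone is useful when one set of general preferences is reused across several inputs whose headers and field selections differ.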
      • clone

        protected CommonSettings clone()
        Clones this configuration object. Use the alternative clone(boolean) method to reset properties that are specific to a given input, such as header names and selection of fields.
        Overrides:
        clone in class java.lang.Object
        Returns:
        a copy of all configurations applied to the current instance.
      • clearInputSpecificSettings

        protected void clearInputSpecificSettings()
        Clears settings that are likely to be specific to a given input.