Package com.univocity.parsers.csv
Class CsvFormatDetector
- java.lang.Object
-
- com.univocity.parsers.csv.CsvFormatDetector
-
- All Implemented Interfaces:
InputAnalysisProcess
public abstract class CsvFormatDetector extends java.lang.Object implements InputAnalysisProcess
An InputAnalysisProcess to detect column delimiters, quotes and quote escapes in a CSV input.
-
-
Field Summary
Fields (Modifier and Type, Field)
private char[] allowedDelimiters
private char comment
private char[] delimiterPreference
private int MAX_ROW_SAMPLES
private char normalizedNewLine
private char suggestedDelimiter
private char suggestedQuote
private char suggestedQuoteEscape
private int whitespaceRangeStart
-
Constructor Summary
Constructors (Constructor, Description)
CsvFormatDetector(int maxRowSamples, CsvParserSettings settings, int whitespaceRangeStart)
Builds a new CsvFormatDetector
-
Method Summary
Methods (Modifier and Type, Method, Description)
protected abstract void apply(char delimiter, char quote, char quoteEscape)
Applies the discovered CSV format elements to the CsvParser
protected java.util.Map<java.lang.Character,java.lang.Integer> calculateTotals(java.util.List<java.util.Map<java.lang.Character,java.lang.Integer>> symbolsPerRow)
void execute(char[] characters, int length)
A sequence of characters of the input buffer to be analyzed.
protected char getChar(java.util.Map<java.lang.Character,java.lang.Integer> map, java.util.Map<java.lang.Character,java.lang.Integer> totals, char defaultChar, boolean min)
Returns the character with the highest or lowest associated number.
protected void increment(java.util.Map<java.lang.Character,java.lang.Integer> map, char symbol)
Increments the number associated with a character in a map by 1
protected void increment(java.util.Map<java.lang.Character,java.lang.Integer> map, char symbol, int incrementSize)
Increments the number associated with a character in a map
protected boolean isAllowedDelimiter(char ch)
protected boolean isSymbol(char ch)
protected char max(java.util.Map<java.lang.Character,java.lang.Integer> map, java.util.Map<java.lang.Character,java.lang.Integer> totals, char defaultChar)
Returns the character with the highest associated number.
protected char min(java.util.Map<java.lang.Character,java.lang.Integer> map, java.util.Map<java.lang.Character,java.lang.Integer> totals, char defaultChar)
Returns the character with the lowest associated number.
protected char pickDelimiter(java.util.Map<java.lang.Character,java.lang.Integer> sums, java.util.Map<java.lang.Character,java.lang.Integer> totals)
-
-
-
Field Detail
-
MAX_ROW_SAMPLES
private final int MAX_ROW_SAMPLES
-
comment
private final char comment
-
suggestedDelimiter
private final char suggestedDelimiter
-
normalizedNewLine
private final char normalizedNewLine
-
whitespaceRangeStart
private final int whitespaceRangeStart
-
allowedDelimiters
private char[] allowedDelimiters
-
delimiterPreference
private char[] delimiterPreference
-
suggestedQuote
private final char suggestedQuote
-
suggestedQuoteEscape
private final char suggestedQuoteEscape
-
-
Constructor Detail
-
CsvFormatDetector
public CsvFormatDetector(int maxRowSamples, CsvParserSettings settings, int whitespaceRangeStart)
Builds a new CsvFormatDetector
- Parameters:
maxRowSamples
- the number of row samples to collect before analyzing the statistics
settings
- the configuration provided by the user, with potential defaults to use in case the detection is unable to discover the proper column delimiter or quote character
whitespaceRangeStart
- starting range of characters considered to be whitespace.
-
-
Method Detail
-
calculateTotals
protected java.util.Map<java.lang.Character,java.lang.Integer> calculateTotals(java.util.List<java.util.Map<java.lang.Character,java.lang.Integer>> symbolsPerRow)
-
execute
public void execute(char[] characters, int length)
Description copied from interface: InputAnalysisProcess
A sequence of characters of the input buffer to be analyzed.
- Specified by:
execute in interface InputAnalysisProcess
- Parameters:
characters
- the input buffer
length
- the last character position loaded into the buffer.
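The page does not describe how the buffer is analyzed. As a rough illustration only, the sketch below scans a character buffer row by row and counts the symbols found in each row, producing the kind of per-row maps that calculateTotals accepts. The class name RowSymbolCountSketch, the helper countSymbolsPerRow, the definition of "symbol" used here, and the reading of length as an exclusive upper bound are all assumptions, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not part of univocity-parsers.
final class RowSymbolCountSketch {

    // Assumption: a "symbol" is a tab or any character that is neither
    // a letter, a digit, nor whitespace.
    static boolean isSymbol(char ch) {
        return ch == '\t' || (!Character.isLetterOrDigit(ch) && !Character.isWhitespace(ch));
    }

    // Counts symbol occurrences per row, reading only the first `length`
    // characters of the buffer (assumed here to be an exclusive bound).
    static List<Map<Character, Integer>> countSymbolsPerRow(char[] characters, int length) {
        List<Map<Character, Integer>> rows = new ArrayList<>();
        Map<Character, Integer> current = new HashMap<>();
        for (int i = 0; i < length; i++) {
            char ch = characters[i];
            if (ch == '\n') {
                // End of row: store its counts and start a new map.
                rows.add(current);
                current = new HashMap<>();
            } else if (isSymbol(ch)) {
                current.merge(ch, 1, Integer::sum);
            }
        }
        // Keep a trailing row without a line ending if it contained symbols.
        if (!current.isEmpty()) {
            rows.add(current);
        }
        return rows;
    }

    public static void main(String[] args) {
        char[] buffer = "a,b;c\n1,2\n".toCharArray();
        System.out.println(countSymbolsPerRow(buffer, buffer.length));
    }
}
```

Characters beyond the given length are ignored, which matters when the buffer is reused and only partially filled.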
-
pickDelimiter
protected char pickDelimiter(java.util.Map<java.lang.Character,java.lang.Integer> sums, java.util.Map<java.lang.Character,java.lang.Integer> totals)
-
increment
protected void increment(java.util.Map<java.lang.Character,java.lang.Integer> map, char symbol)
Increments the number associated with a character in a map by 1.
- Parameters:
map
- the map of characters and their numbers
symbol
- the character whose number should be incremented
-
increment
protected void increment(java.util.Map<java.lang.Character,java.lang.Integer> map, char symbol, int incrementSize)
Increments the number associated with a character in a map.
- Parameters:
map
- the map of characters and their numbers
symbol
- the character whose number should be incremented
incrementSize
- the size of the increment
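The two increment overloads describe a simple counting idiom. A minimal sketch of that idiom, assuming a missing character starts at 0 (the page does not say how absent keys are handled) and using the hypothetical class name IncrementSketch:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not the library's implementation.
final class IncrementSketch {

    // Adds incrementSize to the number stored for symbol.
    // Assumption: a symbol that is not yet in the map starts at 0.
    static void increment(Map<Character, Integer> map, char symbol, int incrementSize) {
        map.merge(symbol, incrementSize, Integer::sum);
    }

    // Convenience overload: increments the number by 1.
    static void increment(Map<Character, Integer> map, char symbol) {
        increment(map, symbol, 1);
    }

    public static void main(String[] args) {
        Map<Character, Integer> counts = new HashMap<>();
        increment(counts, ',');
        increment(counts, ',');
        increment(counts, ';', 3);
        System.out.println(counts.get(',') + " " + counts.get(';')); // prints "2 3"
    }
}
```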
-
min
protected char min(java.util.Map<java.lang.Character,java.lang.Integer> map, java.util.Map<java.lang.Character,java.lang.Integer> totals, char defaultChar)
Returns the character with the lowest associated number.
- Parameters:
map
- the map of characters and their numbers
defaultChar
- the default character to return in case the map is empty
- Returns:
- the character with the lowest number associated.
-
max
protected char max(java.util.Map<java.lang.Character,java.lang.Integer> map, java.util.Map<java.lang.Character,java.lang.Integer> totals, char defaultChar)
Returns the character with the highest associated number.
- Parameters:
map
- the map of characters and their numbers
defaultChar
- the default character to return in case the map is empty
- Returns:
- the character with the highest number associated.
-
getChar
protected char getChar(java.util.Map<java.lang.Character,java.lang.Integer> map, java.util.Map<java.lang.Character,java.lang.Integer> totals, char defaultChar, boolean min)
Returns the character with the highest or lowest associated number.
- Parameters:
map
- the map of characters and their numbers
defaultChar
- the default character to return in case the map is empty
min
- a flag indicating whether to return the character associated with the lowest number in the map. If false then the character associated with the highest number found will be returned.
- Returns:
- the character with the highest/lowest number associated.
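The selection described for min, max and getChar can be sketched as a single pass over the map. The version below is an illustration under assumptions: it omits the totals parameter, whose role is not documented on this page, and on ties it keeps the first entry encountered. The class name MinMaxSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the documented methods also take a `totals` map,
// which is left out here because this page does not describe its role.
final class MinMaxSketch {

    // Returns the character with the lowest (min == true) or highest
    // (min == false) number, or defaultChar if the map is empty.
    // Assumption: on ties, the first entry in iteration order wins.
    static char getChar(Map<Character, Integer> map, char defaultChar, boolean min) {
        char best = defaultChar;
        Integer bestCount = null;
        for (Map.Entry<Character, Integer> entry : map.entrySet()) {
            int count = entry.getValue();
            boolean better = bestCount == null
                    || (min ? count < bestCount : count > bestCount);
            if (better) {
                best = entry.getKey();
                bestCount = count;
            }
        }
        return best;
    }

    static char min(Map<Character, Integer> map, char defaultChar) {
        return getChar(map, defaultChar, true);
    }

    static char max(Map<Character, Integer> map, char defaultChar) {
        return getChar(map, defaultChar, false);
    }

    public static void main(String[] args) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        counts.put(',', 5);
        counts.put(';', 2);
        counts.put('|', 9);
        System.out.println(max(counts, '?')); // prints "|"
        System.out.println(min(counts, '?')); // prints ";"
    }
}
```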
-
isSymbol
protected boolean isSymbol(char ch)
-
isAllowedDelimiter
protected boolean isAllowedDelimiter(char ch)
-
apply
protected abstract void apply(char delimiter, char quote, char quoteEscape)
Applies the discovered CSV format elements to the CsvParser
- Parameters:
delimiter
- the discovered delimiter character
quote
- the discovered quote character
quoteEscape
- the discovered quote escape character.
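Because apply is abstract, a concrete subclass decides what to do with the detected characters. The sketch below shows that callback pattern with a self-contained stand-in class rather than the real CsvFormatDetector, so it runs without the library. The nested Detector class, its naive comma-versus-semicolon count, and the hard-coded quote characters are assumptions made for illustration and do not reflect the library's detection logic.

```java
// Hypothetical sketch of the execute/apply callback pattern.
final class ApplySketch {

    // Stand-in for an abstract detector: analyzes a buffer, then hands
    // the result to a subclass through apply(...).
    abstract static class Detector {

        public void execute(char[] characters, int length) {
            int commas = 0;
            int semicolons = 0;
            for (int i = 0; i < length; i++) {
                if (characters[i] == ',') {
                    commas++;
                } else if (characters[i] == ';') {
                    semicolons++;
                }
            }
            // Naive choice between two candidates; quote and quote escape
            // are placeholders, not detected values.
            char delimiter = semicolons > commas ? ';' : ',';
            apply(delimiter, '"', '"');
        }

        protected abstract void apply(char delimiter, char quote, char quoteEscape);
    }

    // Runs the stand-in detector and returns {delimiter, quote, quoteEscape}.
    static char[] detect(String input) {
        final char[] result = new char[3];
        Detector detector = new Detector() {
            @Override
            protected void apply(char delimiter, char quote, char quoteEscape) {
                result[0] = delimiter;
                result[1] = quote;
                result[2] = quoteEscape;
            }
        };
        char[] buffer = input.toCharArray();
        detector.execute(buffer, buffer.length);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(detect("a;b;c\n1;2;3\n")[0]); // prints ";"
    }
}
```

In the real class, an implementation of apply would typically copy the three characters into the format used by the parser instead of into an array.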
-
-