public final class ScannerImpl extends java.lang.Object implements Scanner
Scanner produces tokens of the following types:
STREAM-START
STREAM-END
DIRECTIVE(name, value)
DOCUMENT-START
DOCUMENT-END
BLOCK-SEQUENCE-START
BLOCK-MAPPING-START
BLOCK-END
FLOW-SEQUENCE-START
FLOW-MAPPING-START
FLOW-SEQUENCE-END
FLOW-MAPPING-END
BLOCK-ENTRY
FLOW-ENTRY
KEY
VALUE
ALIAS(value)
ANCHOR(value)
TAG(value)
SCALAR(value, plain, style)
Read comments in the Scanner code for more details.
Modifier and Type | Class and Description |
---|---|
private static class |
ScannerImpl.Chomping
Chomping the tail may have 3 values - yes, no, not defined.
|
Modifier and Type | Field and Description |
---|---|
private boolean |
allowSimpleKey
A simple key is a key that is not denoted by the '?' indicator.
|
private boolean |
done |
static java.util.Map<java.lang.Character,java.lang.Integer> |
ESCAPE_CODES
A mapping from a character to a number of bytes to read-ahead for that
escape sequence.
|
static java.util.Map<java.lang.Character,java.lang.String> |
ESCAPE_REPLACEMENTS
A mapping from an escaped character in the input stream to the character
that it should be replaced with.
|
private int |
flowLevel |
private int |
indent |
private ArrayStack<java.lang.Integer> |
indents |
private static java.util.regex.Pattern |
NOT_HEXA
A regular expression matching characters which are not in the hexadecimal
set (0-9, A-F, a-f).
|
private java.util.Map<java.lang.Integer,SimpleKey> |
possibleSimpleKeys |
private StreamReader |
reader |
private java.util.List<Token> |
tokens |
private int |
tokensTaken |
Constructor and Description |
---|
ScannerImpl(StreamReader reader) |
Modifier and Type | Method and Description |
---|---|
private boolean |
addIndent(int column)
Check if we need to increase indentation.
|
private boolean |
checkBlockEntry()
Returns true if the next thing on the reader is a block token.
|
private boolean |
checkDirective()
Returns true if the next thing on the reader is a directive, given that
the leading '%' has already been checked.
|
private boolean |
checkDocumentEnd()
Returns true if the next thing on the reader is a document-end ("...").
|
private boolean |
checkDocumentStart()
Returns true if the next thing on the reader is a document-start ("---").
|
private boolean |
checkKey()
Returns true if the next thing on the reader is a key token.
|
private boolean |
checkPlain()
Returns true if the next thing on the reader is a plain token.
|
boolean |
checkToken(Token.ID... choices)
Check whether the next token is one of the given types.
|
private boolean |
checkValue()
Returns true if the next thing on the reader is a value token.
|
private void |
fetchAlias()
Fetch an alias, which is a reference to an anchor.
|
private void |
fetchAnchor()
Fetch an anchor.
|
private void |
fetchBlockEntry()
Fetch an entry in the block style.
|
private void |
fetchBlockScalar(char style)
Fetch a block scalar (literal or folded).
|
private void |
fetchDirective()
Fetch a YAML directive.
|
private void |
fetchDocumentEnd()
Fetch a document-end token ("...").
|
private void |
fetchDocumentIndicator(boolean isDocumentStart)
Fetch a document indicator, either "---" for "document-start", or else
"..." for "document-end".
|
private void |
fetchDocumentStart()
Fetch a document-start token ("---").
|
private void |
fetchDouble()
Fetch a double-quoted (") scalar.
|
private void |
fetchFlowCollectionEnd(boolean isMappingEnd)
Fetch a flow-style collection end, which is either a sequence or a
mapping.
|
private void |
fetchFlowCollectionStart(boolean isMappingStart)
Fetch a flow-style collection start, which is either a sequence or a
mapping.
|
private void |
fetchFlowEntry()
Fetch an entry in the flow style.
|
private void |
fetchFlowMappingEnd() |
private void |
fetchFlowMappingStart() |
private void |
fetchFlowScalar(char style)
Fetch a flow scalar (single- or double-quoted).
|
private void |
fetchFlowSequenceEnd() |
private void |
fetchFlowSequenceStart() |
private void |
fetchFolded()
Fetch a folded scalar, denoted with a greater-than sign.
|
private void |
fetchKey()
Fetch a key in a block-style mapping.
|
private void |
fetchLiteral()
Fetch a literal scalar, denoted with a vertical-bar.
|
private void |
fetchMoreTokens()
Fetch one or more tokens from the StreamReader.
|
private void |
fetchPlain()
Fetch a plain scalar.
|
private void |
fetchSingle()
Fetch a single-quoted (') scalar.
|
private void |
fetchStreamEnd() |
private void |
fetchStreamStart()
We always add STREAM-START as the first token and STREAM-END as the last
token.
|
private void |
fetchTag()
Fetch a tag.
|
private void |
fetchValue()
Fetch a value in a block-style mapping.
|
Token |
getToken()
Return the next token, removing it from the queue.
|
private boolean |
needMoreTokens()
Returns true if more tokens should be scanned.
|
private int |
nextPossibleSimpleKey()
Return the number of the nearest possible simple key.
|
Token |
peekToken()
Return the next token, but do not delete it from the queue.
|
private void |
removePossibleSimpleKey()
Remove the saved possible key position at the current flow level.
|
private void |
savePossibleSimpleKey()
The next token may start a simple key.
|
private Token |
scanAnchor(boolean isAnchor)
The specification does not restrict characters for anchors and
aliases.
|
private Token |
scanBlockScalar(char style) |
private java.lang.Object[] |
scanBlockScalarBreaks(int indent) |
private java.lang.String |
scanBlockScalarIgnoredLine(Mark startMark)
Scan to the end of the line after a block scalar has been scanned; the
only things that are permitted at this time are comments and spaces.
|
private java.lang.Object[] |
scanBlockScalarIndentation()
Scans for the indentation of a block scalar implicitly.
|
private ScannerImpl.Chomping |
scanBlockScalarIndicators(Mark startMark)
Scan a block scalar indicator.
|
private Token |
scanDirective() |
private java.lang.String |
scanDirectiveIgnoredLine(Mark startMark) |
private java.lang.String |
scanDirectiveName(Mark startMark)
Scan a directive name.
|
private Token |
scanFlowScalar(char style)
Scan a flow-style scalar.
|
private java.lang.String |
scanFlowScalarBreaks(Mark startMark) |
private java.lang.String |
scanFlowScalarNonSpaces(boolean doubleQuoted,
Mark startMark)
Scan some number of flow-scalar non-space characters.
|
private java.lang.String |
scanFlowScalarSpaces(Mark startMark) |
private java.lang.String |
scanLineBreak()
Scan a line break, transforming:
|
private Token |
scanPlain()
Scan a plain scalar.
|
private java.lang.String |
scanPlainSpaces()
See the specification for details.
|
private Token |
scanTag()
Scan a Tag property.
|
private java.lang.String |
scanTagDirectiveHandle(Mark startMark)
Scan a %TAG directive's handle.
|
private java.lang.String |
scanTagDirectivePrefix(Mark startMark)
Scan a %TAG directive's prefix.
|
private java.util.List<java.lang.String> |
scanTagDirectiveValue(Mark startMark)
Read a %TAG directive value:
|
private java.lang.String |
scanTagHandle(java.lang.String name,
Mark startMark)
Scan a Tag handle.
|
private java.lang.String |
scanTagUri(java.lang.String name,
Mark startMark)
Scan a Tag URI.
|
private void |
scanToNextToken()
We ignore spaces, line breaks and comments.
|
private java.lang.String |
scanUriEscapes(java.lang.String name,
Mark startMark)
Scan a sequence of %-escaped URI escape codes and convert them into a
String representing the unescaped values.
|
private java.lang.Integer |
scanYamlDirectiveNumber(Mark startMark)
Read a %YAML directive number: this is either the major or the minor
part.
|
private java.util.List<java.lang.Integer> |
scanYamlDirectiveValue(Mark startMark) |
private void |
stalePossibleSimpleKeys()
Remove entries that are no longer possible simple keys.
|
private void |
unwindIndent(int col)
Handle implicitly ending multiple levels of block nodes by decreased
indentation.
|
private static final java.util.regex.Pattern NOT_HEXA
public static final java.util.Map<java.lang.Character,java.lang.String> ESCAPE_REPLACEMENTS
public static final java.util.Map<java.lang.Character,java.lang.Integer> ESCAPE_CODES
\xHH : escaped 8-bit Unicode character
\uHHHH : escaped 16-bit Unicode character
\UHHHHHHHH : escaped 32-bit Unicode character
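A minimal sketch of how a table like ESCAPE_CODES could drive decoding of the three escape forms listed here. The class name `EscapeCodeSketch`, the `HEX_DIGITS` map, and its values (2, 4, and 8 hexadecimal digits) are assumptions made for illustration; this is not the library's implementation.

```java
import java.util.Map;

final class EscapeCodeSketch {
    // Assumed table: escape character -> number of hexadecimal digits that follow it.
    static final Map<Character, Integer> HEX_DIGITS = Map.of('x', 2, 'u', 4, 'U', 8);

    /** Decodes one complete escape such as backslash-x-4-1 into the character it denotes. */
    static String decode(String escape) {
        if (escape.length() < 2 || escape.charAt(0) != '\\') {
            throw new IllegalArgumentException("not an escape: " + escape);
        }
        Integer digits = HEX_DIGITS.get(escape.charAt(1));
        if (digits == null || escape.length() != 2 + digits) {
            throw new IllegalArgumentException("unsupported escape: " + escape);
        }
        // Parse the hexadecimal payload and turn the code point into a String.
        int codePoint = Integer.parseInt(escape.substring(2), 16);
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(decode("\\x41"));
        System.out.println(decode("\\U0001F600").codePointAt(0));
    }
}
```

A 32-bit escape may decode to a supplementary code point, which is why the sketch goes through `Character.toChars` rather than a plain `char` cast.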
private final StreamReader reader
private boolean done
private int flowLevel
private java.util.List<Token> tokens
private int tokensTaken
private int indent
private ArrayStack<java.lang.Integer> indents
private boolean allowSimpleKey
A simple key is a key that is not denoted by the '?' indicator. Example of simple keys:
  ---
  block simple key: value
  ? not a simple key:
  : { flow simple key: value }
We emit the KEY token before all keys, so when we find a potential simple key, we try to locate the corresponding ':' indicator. Simple keys should be limited to a single line and 1024 characters.
Can a simple key start at the current position? A simple key may start:
- at the beginning of the line, not counting indentation spaces (in block context),
- after '{', '[', ',' (in the flow context),
- after '?', ':', '-' (in the block context).
In the block context, this flag also signifies if a block collection may start at the current position.
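The three start rules for simple keys can be written as a small predicate. This is only a sketch of the rules as stated in the description; the class `SimpleKeyStartSketch`, the method `maySimpleKeyStart`, and its parameters are hypothetical and do not appear in the scanner.

```java
final class SimpleKeyStartSketch {
    /** Marker meaning "no indicator precedes the current position". */
    static final char NONE = '\0';

    /**
     * Returns true if a simple key may start here, given the context,
     * whether we are at the start of a line (ignoring indentation),
     * and the indicator character that precedes the position, if any.
     */
    static boolean maySimpleKeyStart(boolean flowContext, boolean atLineStart, char previous) {
        if (flowContext) {
            // Flow context: only after '{', '[' or ','.
            return previous == '{' || previous == '[' || previous == ',';
        }
        // Block context: at the beginning of the line, or after '?', ':' or '-'.
        return atLineStart || previous == '?' || previous == ':' || previous == '-';
    }

    public static void main(String[] args) {
        System.out.println(maySimpleKeyStart(false, true, NONE));
        System.out.println(maySimpleKeyStart(true, false, '{'));
        System.out.println(maySimpleKeyStart(true, true, NONE));
    }
}
```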
private java.util.Map<java.lang.Integer,SimpleKey> possibleSimpleKeys
public ScannerImpl(StreamReader reader)
public boolean checkToken(Token.ID... choices)
Specified by: checkToken in interface Scanner
Parameters: choices - token IDs.
Returns: true if the next token can be assigned to a variable of at least one of the given types. Returns false if no more tokens are available.
public Token peekToken()
Specified by: peekToken in interface Scanner
Scanner.getToken()
public Token getToken()
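The difference between checking, peeking, and taking a token can be illustrated with a plain queue. The sketch below is a stand-in and not the scanner itself: `TokenQueueSketch` and its `Id` enum are hypothetical, and the tokens are preloaded instead of being scanned on demand.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class TokenQueueSketch {
    /** Hypothetical, reduced set of token IDs. */
    enum Id { STREAM_START, SCALAR, STREAM_END }

    private final Deque<Id> tokens = new ArrayDeque<>();
    private int tokensTaken = 0;

    TokenQueueSketch(List<Id> ids) {
        tokens.addAll(ids);
    }

    /** True if the next token is one of the given IDs; false if none is available. */
    boolean checkToken(Id... choices) {
        Id next = tokens.peekFirst();
        return next != null && Arrays.asList(choices).contains(next);
    }

    /** Returns the next token without removing it, or null if the queue is empty. */
    Id peekToken() {
        return tokens.peekFirst();
    }

    /** Returns the next token and removes it from the queue. */
    Id getToken() {
        Id next = tokens.pollFirst();
        if (next != null) {
            tokensTaken++;
        }
        return next;
    }

    int tokensTaken() {
        return tokensTaken;
    }

    public static void main(String[] args) {
        TokenQueueSketch queue = new TokenQueueSketch(List.of(Id.STREAM_START, Id.STREAM_END));
        System.out.println(queue.checkToken(Id.STREAM_START));
        System.out.println(queue.peekToken());
        System.out.println(queue.getToken());
        System.out.println(queue.peekToken());
    }
}
```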
private boolean needMoreTokens()
private void fetchMoreTokens()
private int nextPossibleSimpleKey()
private void stalePossibleSimpleKeys()
Remove entries that are no longer possible simple keys. According to the YAML specification, simple keys
- should be limited to a single line,
- should be no longer than 1024 characters.
Disabling this procedure will allow simple keys of any length and height (may cause problems if indentation is broken though).
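The two staleness conditions (different line, or more than 1024 characters away) can be sketched as a filter over the saved candidates. `StaleKeySketch` and `KeyPos` are hypothetical stand-ins for this example; the real `SimpleKey` class and any error handling the scanner performs are not modelled here.

```java
import java.util.HashMap;
import java.util.Map;

final class StaleKeySketch {
    /** Hypothetical saved key position: the line and character index where the key began. */
    static final class KeyPos {
        final int line;
        final int index;

        KeyPos(int line, int index) {
            this.line = line;
            this.index = index;
        }
    }

    static final int MAX_KEY_LENGTH = 1024;

    /** Drops every candidate that started on another line or too far back. */
    static void removeStale(Map<Integer, KeyPos> candidates, int currentLine, int currentIndex) {
        candidates.values().removeIf(key ->
                key.line != currentLine || currentIndex - key.index > MAX_KEY_LENGTH);
    }

    public static void main(String[] args) {
        Map<Integer, KeyPos> candidates = new HashMap<>();
        candidates.put(0, new KeyPos(3, 2000)); // same line, close enough: kept
        candidates.put(1, new KeyPos(2, 1990)); // earlier line: removed
        candidates.put(2, new KeyPos(3, 100));  // same line, too far back: removed
        removeStale(candidates, 3, 2010);
        System.out.println(candidates.keySet());
    }
}
```

The map key mirrors the idea of one candidate per flow level, which is an assumption drawn from the `possibleSimpleKeys` field type.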
private void savePossibleSimpleKey()
private void removePossibleSimpleKey()
private void unwindIndent(int col)
Handle implicitly ending multiple levels of block nodes by decreased indentation. Example:
1) book one:
2)   part one:
3)     chapter one
4)   part two:
5)     chapter one
6)     chapter two
7) book two:
In flow context, tokens should respect indentation. Actually the condition should be `self.indent >= column` according to the spec. But this condition will prohibit intuitively correct constructions such as
key : { }
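A minimal sketch of the indentation stack for the block context only, assuming that each open block level pushes the previous indent and that one BLOCK-END is emitted per level closed. The class `IndentSketch` and its string tokens are illustrative; flow-context handling is deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class IndentSketch {
    private int indent = -1;                       // -1 means no block level is open yet
    private final Deque<Integer> indents = new ArrayDeque<>();
    final List<String> tokens = new ArrayList<>();

    /** Opens a new block level if the column is deeper than the current indent. */
    boolean addIndent(int column) {
        if (indent < column) {
            indents.push(indent);
            indent = column;
            return true;
        }
        return false;
    }

    /** Closes every block level that is deeper than the given column. */
    void unwindIndent(int column) {
        while (indent > column) {
            indent = indents.pop();
            tokens.add("BLOCK-END");
        }
    }

    public static void main(String[] args) {
        IndentSketch scanner = new IndentSketch();
        scanner.addIndent(0);     // a mapping starting at column 0
        scanner.addIndent(2);     // a nested mapping at column 2
        scanner.unwindIndent(0);  // a key back at column 0 closes the nested mapping
        scanner.unwindIndent(-1); // end of stream closes everything
        System.out.println(scanner.tokens);
    }
}
```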
private boolean addIndent(int column)
private void fetchStreamStart()
private void fetchStreamEnd()
private void fetchDirective()
private void fetchDocumentStart()
private void fetchDocumentEnd()
private void fetchDocumentIndicator(boolean isDocumentStart)
private void fetchFlowSequenceStart()
private void fetchFlowMappingStart()
private void fetchFlowCollectionStart(boolean isMappingStart)
private void fetchFlowSequenceEnd()
private void fetchFlowMappingEnd()
private void fetchFlowCollectionEnd(boolean isMappingEnd)
private void fetchFlowEntry()
private void fetchAlias()
*(anchor name)
private void fetchAnchor()
&(anchor name)
private void fetchLiteral()
private void fetchFolded()
private void fetchBlockScalar(char style)
private void fetchSingle()
private void fetchDouble()
private void fetchFlowScalar(char style)
private void fetchPlain()
private boolean checkDirective()
private boolean checkDocumentStart()
private boolean checkDocumentEnd()
private boolean checkBlockEntry()
private boolean checkKey()
private boolean checkValue()
private boolean checkPlain()
private void scanToNextToken()
We ignore spaces, line breaks and comments. If we find a line break in the block context, we set the flag `allow_simple_key` on.
The byte order mark is stripped if it's the first character in the stream. We do not yet support BOM inside the stream as the specification requires. Any such mark will be considered as a part of the document.
TODO: We need to make tab handling rules more sane. A good rule is:
  Tabs cannot precede tokens BLOCK-SEQUENCE-START, BLOCK-MAPPING-START, BLOCK-END, KEY(block), VALUE(block), BLOCK-ENTRY
So the checking code is
  if <TAB>: self.allow_simple_keys = False
We also need to add the check for `allow_simple_keys == True` to `unwind_indent` before issuing BLOCK-END. Scanners for block, flow, and plain scalars need to be modified.
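The skipping behaviour can be sketched over a plain string. This example assumes an index-based cursor instead of a StreamReader, treats only '\n' and '\r' as line breaks, and does not track the simple-key flag; `SkipSketch` and `skipToNextToken` are hypothetical names.

```java
final class SkipSketch {
    private static final char BOM = (char) 0xFEFF;

    /** Returns the index of the first character that is not a space, comment, or line break. */
    static int skipToNextToken(String text, int pos) {
        // A byte order mark is stripped only at the very start of the stream.
        if (pos == 0 && !text.isEmpty() && text.charAt(0) == BOM) {
            pos = 1;
        }
        while (pos < text.length()) {
            char c = text.charAt(pos);
            if (c == ' ' || c == '\n' || c == '\r') {
                pos++;
            } else if (c == '#') {
                // A comment runs to the end of the line.
                while (pos < text.length() && text.charAt(pos) != '\n' && text.charAt(pos) != '\r') {
                    pos++;
                }
            } else {
                break;
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        String text = "  # note\n\n  key: v";
        System.out.println(text.substring(skipToNextToken(text, 0)));
    }
}
```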
private Token scanDirective()
private java.lang.String scanDirectiveName(Mark startMark)
private java.util.List<java.lang.Integer> scanYamlDirectiveValue(Mark startMark)
private java.lang.Integer scanYamlDirectiveNumber(Mark startMark)
private java.util.List<java.lang.String> scanTagDirectiveValue(Mark startMark)
Read a %TAG directive value:
s-ignored-space+ c-tag-handle s-ignored-space+ ns-tag-prefix s-l-comments
private java.lang.String scanTagDirectiveHandle(Mark startMark)
private java.lang.String scanTagDirectivePrefix(Mark startMark)
private java.lang.String scanDirectiveIgnoredLine(Mark startMark)
private Token scanAnchor(boolean isAnchor)
The specification does not restrict characters for anchors and aliases. This may lead to problems, for instance, the document:
  [ *alias, value ]
can be interpreted in two ways, as
  [ "value" ]
and
  [ *alias , "value" ]
Therefore we restrict aliases to numbers and ASCII letters.
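A short sketch of the restriction described here: the name after '&' or '*' is read only while the characters are ASCII letters or digits, so the comma ends the alias name. `AnchorNameSketch` and `scanName` are hypothetical, and the exact character set the library accepts may differ from this reading of the description.

```java
final class AnchorNameSketch {
    private static boolean isAsciiAlphanumeric(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    /** Reads the anchor or alias name that follows the indicator at indicatorPos. */
    static String scanName(String text, int indicatorPos) {
        char indicator = text.charAt(indicatorPos);
        if (indicator != '&' && indicator != '*') {
            throw new IllegalArgumentException("expected '&' or '*' but found " + indicator);
        }
        int end = indicatorPos + 1;
        while (end < text.length() && isAsciiAlphanumeric(text.charAt(end))) {
            end++;
        }
        if (end == indicatorPos + 1) {
            throw new IllegalArgumentException("expected an alphabetic or numeric character");
        }
        return text.substring(indicatorPos + 1, end);
    }

    public static void main(String[] args) {
        System.out.println(scanName("[ *alias, value ]", 2));
    }
}
```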
private Token scanTag()
Scan a Tag property. A Tag property may be specified in one of three ways: c-verbatim-tag, c-ns-shorthand-tag, or c-ns-non-specific-tag
c-verbatim-tag takes the form !<ns-uri-char+> and must be delivered verbatim (as-is) to the application. In particular, verbatim tags are not subject to tag resolution.
c-ns-shorthand-tag is a valid tag handle followed by a non-empty suffix. If the tag handle is a c-primary-tag-handle ('!') then the suffix must have all exclamation marks properly URI-escaped (%21); otherwise, the string will look like a named tag handle: !foo!bar would be interpreted as (handle="!foo!", suffix="bar").
c-ns-non-specific-tag is always a lone '!'; this is only useful for plain scalars, where its specification means that the scalar MUST be resolved to have type tag:yaml.org,2002:str.
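The three tag forms can be told apart with a few string checks. The sketch below follows the descriptions above and returns the kind, the handle, and the suffix (or URI); `TagFormSketch` and `classify` are hypothetical, and no URI-character validation is attempted.

```java
final class TagFormSketch {
    /** Returns {kind, handle, suffixOrUri} for a tag property that starts with '!'. */
    static String[] classify(String tag) {
        if (tag.isEmpty() || tag.charAt(0) != '!') {
            throw new IllegalArgumentException("a tag must start with '!': " + tag);
        }
        if (tag.equals("!")) {
            return new String[] {"non-specific", "!", ""};
        }
        if (tag.startsWith("!<")) {
            if (!tag.endsWith(">") || tag.length() <= 3) {
                throw new IllegalArgumentException("malformed verbatim tag: " + tag);
            }
            return new String[] {"verbatim", "", tag.substring(2, tag.length() - 1)};
        }
        // Shorthand: a second '!' closes a secondary or named handle, otherwise the handle is "!".
        int second = tag.indexOf('!', 1);
        String handle = second < 0 ? "!" : tag.substring(0, second + 1);
        String suffix = tag.substring(handle.length());
        if (suffix.isEmpty()) {
            throw new IllegalArgumentException("shorthand tag needs a non-empty suffix: " + tag);
        }
        return new String[] {"shorthand", handle, suffix};
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", classify("!foo!bar")));
        System.out.println(String.join(" | ", classify("!<tag:yaml.org,2002:str>")));
    }
}
```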
TODO SnakeYaml incorrectly ignores c-ns-non-specific-tag right now.
private Token scanBlockScalar(char style)
private ScannerImpl.Chomping scanBlockScalarIndicators(Mark startMark)
private java.lang.String scanBlockScalarIgnoredLine(Mark startMark)
private java.lang.Object[] scanBlockScalarIndentation()
private java.lang.Object[] scanBlockScalarBreaks(int indent)
private Token scanFlowScalar(char style)
See the specification for details. Note that we loosen indentation rules for quoted scalars. Quoted scalars don't need to adhere to indentation because " and ' clearly mark the beginning and the end of them. Therefore we are less restrictive than the specification requires. We only need to check that document separators are not included in scalars.
private java.lang.String scanFlowScalarNonSpaces(boolean doubleQuoted, Mark startMark)
private java.lang.String scanFlowScalarSpaces(Mark startMark)
private java.lang.String scanFlowScalarBreaks(Mark startMark)
private Token scanPlain()
See the specification for details. We add an additional restriction for the flow context: plain scalars in the flow context cannot contain ',', ':' and '?'. We also keep track of the `allow_simple_key` flag here. Indentation rules are loosened for the flow context.
private java.lang.String scanPlainSpaces()
private java.lang.String scanTagHandle(java.lang.String name, Mark startMark)
Scan a Tag handle. A Tag handle takes one of three forms:
"!" (c-primary-tag-handle)
"!!" (ns-secondary-tag-handle)
"!(name)!" (c-named-tag-handle)
Where (name) must be formatted as an ns-word-char.
private java.lang.String scanTagUri(java.lang.String name, Mark startMark)
Scan a Tag URI. This scanning is valid for both local and global tag directives, because both appear to be valid URIs as far as scanning is concerned. The difference may be distinguished later, in parsing. This method will scan for ns-uri-char*, which covers both cases.
This method performs no verification that the scanned URI conforms to any particular kind of URI specification.
private java.lang.String scanUriEscapes(java.lang.String name, Mark startMark)
Scan a sequence of %-escaped URI escape codes and convert them into a String representing the unescaped values.
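A minimal sketch of decoding a run of %-escapes, assuming the bytes are interpreted as UTF-8. `UriEscapeSketch` and `decodeEscapes` are hypothetical names. The sketch collects bytes in a growable buffer, which is a choice made for this example and not a statement about how the library buffers them.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class UriEscapeSketch {
    /** Decodes the leading run of %HH escapes in the input and returns it as text. */
    static String decodeEscapes(String input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int pos = 0;
        while (pos < input.length() && input.charAt(pos) == '%') {
            if (pos + 2 >= input.length()) {
                throw new IllegalArgumentException("truncated escape at index " + pos);
            }
            int high = Character.digit(input.charAt(pos + 1), 16);
            int low = Character.digit(input.charAt(pos + 2), 16);
            if (high < 0 || low < 0) {
                throw new IllegalArgumentException("expected two hexadecimal digits at index " + pos);
            }
            bytes.write((high << 4) | low);
            pos += 3;
        }
        // Assumption: the escaped bytes form a UTF-8 sequence.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeEscapes("%21"));
        System.out.println(decodeEscapes("%C3%A9").codePointAt(0));
    }
}
```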
FIXME This method fails for more than 256 bytes' worth of URI-encoded characters in a row. Is this possible? Is this a use-case?
private java.lang.String scanLineBreak()
Scan a line break, transforming:
'\r\n' : '\n'
'\r' : '\n'
'\n' : '\n'
'\x85' : '\n'
default : ''
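The mapping above can be sketched as two small helpers: one reports how many characters the line break occupies, the other returns its normalized form. `LineBreakSketch` and its method names are hypothetical, and the sketch works on a string index instead of a StreamReader.

```java
final class LineBreakSketch {
    private static final char NEL = (char) 0x85; // the next-line character, written '\x85' above

    /** Number of characters the line break at pos occupies, or 0 if there is none. */
    static int breakLength(String text, int pos) {
        if (pos >= text.length()) {
            return 0;
        }
        char c = text.charAt(pos);
        if (c == '\r') {
            // A carriage return followed by a line feed counts as one break.
            return (pos + 1 < text.length() && text.charAt(pos + 1) == '\n') ? 2 : 1;
        }
        return (c == '\n' || c == NEL) ? 1 : 0;
    }

    /** Returns "\n" for any recognized line break at pos, and the empty string otherwise. */
    static String normalizedBreak(String text, int pos) {
        return breakLength(text, pos) > 0 ? "\n" : "";
    }

    public static void main(String[] args) {
        System.out.println(breakLength("a\r\nb", 1));
        System.out.println(normalizedBreak("a\r\nb", 1).equals("\n"));
        System.out.println(normalizedBreak("ab", 1).isEmpty());
    }
}
```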