Class Lexer
Delimiter characters take parsing priority over other characters. String delimiter characters take precedence over regular delimiters, and raw String delimiter characters take precedence over regular String delimiters. Delimiter evaluation priority is: Comment Delimiter, then Delimiter. Identifier evaluation priority is: Keyword, then CaseInsensitiveKeyword, then Identifier.
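The identifier priority described above can be sketched in isolation. This is a minimal standalone illustration, not the library's code: the class IdentifierPrioritySketch, its TokenKind enum, and both keyword sets are hypothetical stand-ins for whatever the kernel actually stores.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the documented identifier priority:
// Keyword, then CaseInsensitiveKeyword, then Identifier.
class IdentifierPrioritySketch {
    enum TokenKind { KEYWORD, CASE_INSENSITIVE_KEYWORD, IDENTIFIER }

    // Assumed example data; "return" is deliberately in both sets
    // so that the priority order becomes visible.
    static final Set<String> KEYWORDS =
        new HashSet<>(Arrays.asList("while", "return"));
    static final Set<String> CASE_INSENSITIVE_KEYWORDS =
        new HashSet<>(Arrays.asList("return", "include"));

    static TokenKind classify(String lexeme) {
        // 1. An exact (case-sensitive) keyword match wins.
        if (KEYWORDS.contains(lexeme)) {
            return TokenKind.KEYWORD;
        }
        // 2. Otherwise try the case-insensitive keywords (stored in lower case).
        if (CASE_INSENSITIVE_KEYWORDS.contains(lexeme.toLowerCase(Locale.ROOT))) {
            return TokenKind.CASE_INSENSITIVE_KEYWORD;
        }
        // 3. Anything else is a plain identifier.
        return TokenKind.IDENTIFIER;
    }
}
```

In this sketch, "return" is classified as a keyword even though it also appears in the case-insensitive set, while "RETURN" falls through to the case-insensitive check.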
Other implementations of this class may manipulate the stream stack as well (such as ones that do in-language stream inclusion).
If the system property net.mtrop.doom.struct.Lexer.debug is set to true, the lexer writes debugging output to System.out.
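As a rough sketch, the property can be set from code with the standard System.setProperty call. The helper class below is hypothetical and only shows the property name in use. It assumes the property has to be set before the Lexer class reads it, possibly as early as class loading; this page does not say when that happens.

```java
// Sketch: turning on the documented debug property from code.
// Assumption: the property must be set before the Lexer class reads it;
// the exact moment it is read is not stated in this documentation.
class LexerDebugPropertySketch {
    static final String DEBUG_PROPERTY = "net.mtrop.doom.struct.Lexer.debug";

    static void enableLexerDebug() {
        System.setProperty(DEBUG_PROPERTY, "true");
    }

    // Boolean.getBoolean returns true only when the named system property
    // exists and equals "true" (ignoring case).
    static boolean isLexerDebugRequested() {
        return Boolean.getBoolean(DEBUG_PROPERTY);
    }
}
```

The same effect can usually be had from the command line by passing -Dnet.mtrop.doom.struct.Lexer.debug=true to the java launcher.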
Lexer functions are NOT thread-safe.
- Author:
- Matthew Tropiano
-
Nested Class Summary
Modifier and Type, Class: Description
- static class Lexer.Kernel: This is an info kernel that tells a Lexer how to interpret certain characters and identifiers.
- static class: Abstract parser class.
- static class: This holds a series of Reader streams such that the stream on top is the current active stream.
- static class Lexer.Token: Lexer token object.
-
Field Summary
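Modifier and Type, Field: Description
- static boolean DEBUG
- static final char END_OF_LEXER: Lexer end-of-lexer char, returned when there are no more characters to read.
- static final char END_OF_STREAM: Lexer end-of-stream char, returned at the end of the current stream.
- static final char NEWLINE: Lexer newline char.
-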
Constructor Summary
Constructor: Description
- Lexer(Lexer.Kernel kernel, Reader in): Creates a new lexer around a reader.
- Lexer(Lexer.Kernel kernel, String in): Creates a new lexer around a String, which will be wrapped in a StringReader.
- Lexer(Lexer.Kernel kernel, String name, Reader in): Creates a new lexer around a reader.
- Lexer(Lexer.Kernel kernel, String name, String in): Creates a new lexer around a String, which will be wrapped in a StringReader.
-
Method Summary
Modifier and Type, Method: Description
- protected void clearCurrentLexeme(): Clears the current token lexeme buffer.
- protected String getCurrentLexeme(): Gets the current token lexeme.
- int getCurrentLineNumber(): Gets the lexer's current stream's line number.
- getCurrentStream(): Gets the current stream.
- getCurrentStreamName()
- protected char getRawStringEnd(char c): Gets the character that ends a raw String, using the starting character.
- protected int getState()
- protected char getStringEnd(char c): Gets the character that ends a String, using the starting character.
- protected boolean isCommentEndDelimiterStart(char c): Checks if this is a (or the start of a) block-comment-ending delimiter character.
- protected boolean isDelimiterStart(char c): Checks if this is a (or the start of a) delimiter character.
- protected boolean isDigit(char c): Convenience method for Character.isDigit(char).
- protected boolean isExponent(char c): Checks if char is the exponent character in a number.
- protected boolean isExponentSign(char c): Checks if char is the exponent sign character in a number.
- protected boolean isHexDigit(char c): Returns true if this is a hex digit (0-9, A-F, a-f).
- protected boolean isLetter(char c): Convenience method for Character.isLetter(char).
- protected boolean isLexerEnd(char c): Checks if a char equals END_OF_LEXER.
- protected boolean isNewline(char c): Checks if a char equals NEWLINE.
- protected boolean isPoint(char c): Checks if a character is a decimal point (depends on locale/kernel).
- protected boolean isRawStringStart(char c): Checks if this is a character that starts a multiline String.
- protected boolean isSpace(char c): Checks if a char is a space.
- protected boolean isStreamEnd(char c): Checks if a char equals END_OF_STREAM.
- protected boolean isStringEnd(char c): Checks if this is a character that ends a String.
- protected boolean isStringEscape(char c): Checks if this is a character that is a String escape character.
- protected boolean isStringStart(char c): Checks if this is a character that starts a String.
- protected boolean isTab(char c): Checks if a char is a tab.
- protected boolean isUnderscore(char c): Convenience method for c == '_'.
- protected boolean isWhitespace(char c): Convenience method for Character.isWhitespace(char).
- protected boolean modifyType(Lexer.Token token): Called when the lexer wants to create a token, but the lexeme of the token may cause this token to be a different type.
- nextToken(): Gets the next token.
- void pushStream(String name, Reader in): Pushes a stream onto the encapsulated reader stack.
- protected char readChar(): Reads a character from the stream.
- protected void saveChar(char c): Saves a character for the next token.
- protected void setDelimBreak(char delimChar): Sets if we are in a delimiter break.
- protected void setMultilineStringStartAndEnd(char c): Sets the end character for a string.
- protected void setState(int state): Sets the current state.
- protected void setStringStartAndEnd(char c): Sets the end character for a string.
-
Field Details
-
DEBUG
public static boolean DEBUG
-
END_OF_LEXER
public static final char END_OF_LEXER
Lexer end-of-lexer char, returned when there are no more characters to read.
-
END_OF_STREAM
public static final char END_OF_STREAM
Lexer end-of-stream char, returned at the end of the current stream.
-
NEWLINE
public static final char NEWLINE
Lexer newline char.
-
-
Constructor Details
-
Lexer
Lexer(Lexer.Kernel kernel, String in)
Creates a new lexer around a String, which will be wrapped in a StringReader. This will also assign this lexer a default name.
- Parameters:
- kernel - the lexer kernel to use for defining how to parse the input text.
- in - the string to read from.
-
Lexer
Lexer(Lexer.Kernel kernel, String name, String in)
Creates a new lexer around a String, which will be wrapped in a StringReader.
- Parameters:
- kernel - the lexer kernel to use for defining how to parse the input text.
- name - the name of this lexer.
- in - the string to read from.
-
Lexer
Lexer(Lexer.Kernel kernel, Reader in)
Creates a new lexer around a reader. This will also assign this lexer a default name.
- Parameters:
- kernel - the kernel to use for this lexer.
- in - the reader to read from.
-
Lexer
Lexer(Lexer.Kernel kernel, String name, Reader in)
Creates a new lexer around a reader.
- Parameters:
- kernel - the kernel to use for this lexer.
- name - the name of this lexer.
- in - the reader to read from.
-
-
Method Details
-
getCurrentStreamName
- Returns:
- the lexer's current stream name.
-
getCurrentLineNumber
public int getCurrentLineNumber()
Gets the lexer's current stream's line number.
- Returns:
- the lexer's current stream's line number, or -1 if at Lexer end.
-
getCurrentStream
Gets the current stream.
- Returns:
- the current stream.
-
pushStream
public void pushStream(String name, Reader in)
Pushes a stream onto the encapsulated reader stack.
- Parameters:
- name - the name of the stream.
- in - the reader for the stream.
-
nextToken
Gets the next token. If there are no tokens left to read, this will return null. This method is NOT thread-safe!
- Returns:
- the next token, or null if no more tokens to read.
- Throws:
- IOException - if a token cannot be read by the underlying Reader.
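The read-until-null contract above suggests a simple consumption loop. The sketch below is a standalone illustration and does not use the library: WordSource is a hypothetical stand-in that splits on whitespace and, like the documented method, returns null when nothing is left and may throw IOException.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class NextTokenLoopSketch {
    // Hypothetical stand-in for a lexer: returns null when no tokens remain.
    static class WordSource {
        private final Reader in;

        WordSource(Reader in) {
            this.in = in;
        }

        String nextToken() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isWhitespace((char) c)) {
                    if (sb.length() > 0) {
                        break; // whitespace after content ends the token
                    }
                } else {
                    sb.append((char) c);
                }
            }
            return sb.length() > 0 ? sb.toString() : null;
        }
    }

    // The consumption loop: keep asking for tokens until null comes back.
    static List<String> readAll(WordSource source) throws IOException {
        List<String> tokens = new ArrayList<>();
        String token;
        while ((token = source.nextToken()) != null) {
            tokens.add(token);
        }
        return tokens;
    }
}
```

Because the method is not thread-safe, a loop like this is best kept on a single thread per lexer instance.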
-
modifyType
protected boolean modifyType(Lexer.Token token)
Called when the lexer wants to create a token, but the lexeme of the token may cause this token to be a different type.
By default, this handles space, tab, newline, delimiter, and identifier.
If this method is overridden, the override should have if (super.modifyType(token)) return true; right at the beginning.
- Parameters:
- token - the original token.
- Returns:
- true if the token's contents were changed, false if not.
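The override rule above can be illustrated with a small standalone sketch. None of these classes are the library's: Token, BaseLexer, and HexColorLexer are hypothetical, and the "types" are plain strings chosen only to show the super call coming first.

```java
class ModifyTypeSketch {
    // Hypothetical token: just a lexeme and a mutable type label.
    static class Token {
        final String lexeme;
        String type = "UNKNOWN";

        Token(String lexeme) {
            this.lexeme = lexeme;
        }
    }

    // Hypothetical base class standing in for the default handling.
    static class BaseLexer {
        protected boolean modifyType(Token token) {
            if (token.lexeme.equals(" ")) {
                token.type = "SPACE";
                return true;
            }
            return false;
        }
    }

    // Subclass that adds one extra rule after the default ones.
    static class HexColorLexer extends BaseLexer {
        @Override
        protected boolean modifyType(Token token) {
            // Let the default handling run first, as the documentation asks.
            if (super.modifyType(token)) return true;
            if (token.lexeme.startsWith("#")) {
                token.type = "COLOR";
                return true;
            }
            return false;
        }
    }
}
```

Calling super first means the subclass only sees tokens the default handling left unchanged.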
-
readChar
protected char readChar() throws IOException
Reads a character from the stream.
- Returns:
- the character read, or END_OF_LEXER if no more characters, or END_OF_STREAM if end of current stream.
- Throws:
- IOException - if a character cannot be read.
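One plausible reading of the two sentinels is sketched below with a plain stack of readers. This is an assumption-laden illustration, not the library's implementation: the class ReaderStackSketch is hypothetical, the sentinel values are invented for the example, and whether the real lexer pops an exhausted stream at exactly this point is not stated here.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.Deque;

class ReaderStackSketch {
    // Assumed sentinel values, for illustration only. The real constants are
    // Lexer.END_OF_LEXER and Lexer.END_OF_STREAM; their values are not given here.
    static final char END_OF_LEXER = '\uffff';
    static final char END_OF_STREAM = '\ufffe';

    // The reader on top of the stack is the active stream.
    private final Deque<Reader> stack = new ArrayDeque<>();

    void pushStream(Reader in) {
        stack.push(in);
    }

    char readChar() throws IOException {
        if (stack.isEmpty()) {
            return END_OF_LEXER; // nothing left in any stream
        }
        int c = stack.peek().read();
        if (c == -1) {
            // The active stream is exhausted: drop it and report its end.
            stack.pop().close();
            return END_OF_STREAM;
        }
        return (char) c;
    }
}
```

Pushing a second reader midway through the first models in-language inclusion: the included stream is read to its end, then reading resumes in the stream underneath.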
-
getState
protected int getState()
- Returns:
- the current state.
-
setState
protected void setState(int state)
Sets the current state.
- Parameters:
- state - the new state.
-
setDelimBreak
protected void setDelimBreak(char delimChar)
Sets if we are in a delimiter break.
- Parameters:
- delimChar - the delimiter character that starts the break.
-
saveChar
protected void saveChar(char c)
Saves a character for the next token.
- Parameters:
- c - the character to save into the current token.
-
setStringStartAndEnd
protected void setStringStartAndEnd(char c)
Sets the end character for a string.
- Parameters:
- c - the character to set.
-
setMultilineStringStartAndEnd
protected void setMultilineStringStartAndEnd(char c)
Sets the end character for a string.
- Parameters:
- c - the character to set.
-
getCurrentLexeme
protected String getCurrentLexeme()
Gets the current token lexeme.
- Returns:
- the current contents of the token lexeme builder buffer.
-
clearCurrentLexeme
protected void clearCurrentLexeme()
Clears the current token lexeme buffer.
-
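The lexeme methods (saveChar, getCurrentLexeme, clearCurrentLexeme) describe a character buffer that is filled, read, and reset per token. A minimal sketch of that pattern, assuming a StringBuilder as the backing buffer (the class LexemeBufferSketch is hypothetical and not the library's code):

```java
class LexemeBufferSketch {
    // Assumed backing store for the lexeme being built.
    private final StringBuilder lexeme = new StringBuilder();

    // Appends one character to the lexeme under construction.
    void saveChar(char c) {
        lexeme.append(c);
    }

    // Returns a snapshot of the buffer contents.
    String getCurrentLexeme() {
        return lexeme.toString();
    }

    // Empties the buffer so the next token starts fresh.
    void clearCurrentLexeme() {
        lexeme.setLength(0);
    }
}
```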
isUnderscore
protected boolean isUnderscore(char c)
Convenience method for c == '_'.
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
-
isLetter
protected boolean isLetter(char c)
Convenience method for Character.isLetter(char).
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
-
isDigit
protected boolean isDigit(char c)
Convenience method for Character.isDigit(char).
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
-
isHexDigit
protected boolean isHexDigit(char c)
Returns true if this is a hex digit (0-9, A-F, a-f).
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
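The documented range (0-9, A-F, a-f) can be checked with plain comparisons. This is a sketch of one way to write such a test, not necessarily how the library does it; the class name HexDigitSketch is hypothetical.

```java
class HexDigitSketch {
    // True only for the ASCII ranges 0-9, A-F and a-f.
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'F')
            || (c >= 'a' && c <= 'f');
    }
}
```

Explicit ranges are stricter than Character.digit(c, 16) != -1, which also accepts some non-ASCII digit characters.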
-
isWhitespace
protected boolean isWhitespace(char c)
Convenience method for Character.isWhitespace(char).
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
-
isPoint
protected boolean isPoint(char c)
Checks if a character is a decimal point (depends on locale/kernel).
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
-
isExponent
protected boolean isExponent(char c)
Checks if char is the exponent character in a number.
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
-
isExponentSign
protected boolean isExponentSign(char c)
Checks if char is the exponent sign character in a number.
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
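To show how predicates such as isDigit, isPoint, isExponent, and isExponentSign might cooperate, here is a standalone sketch of a numeric-prefix scanner. Everything in it is an assumption made for illustration: the class NumberScanSketch is hypothetical, '.' is taken as the decimal point, 'e'/'E' as the exponent character, and '+'/'-' as exponent signs. The library's actual number grammar may differ.

```java
class NumberScanSketch {
    static boolean isDigit(char c) { return Character.isDigit(c); }
    static boolean isPoint(char c) { return c == '.'; }                    // assumed
    static boolean isExponent(char c) { return c == 'e' || c == 'E'; }     // assumed
    static boolean isExponentSign(char c) { return c == '+' || c == '-'; } // assumed

    // Returns the length of the numeric prefix of s, or 0 if s does not
    // start with a digit.
    static int scanNumber(String s) {
        int n = s.length();
        int i = 0;
        // Integer part: one or more digits.
        while (i < n && isDigit(s.charAt(i))) i++;
        if (i == 0) return 0;
        // Optional fraction: a point followed by at least one digit.
        if (i + 1 < n && isPoint(s.charAt(i)) && isDigit(s.charAt(i + 1))) {
            i++;
            while (i < n && isDigit(s.charAt(i))) i++;
        }
        // Optional exponent: exponent char, optional sign, at least one digit.
        if (i < n && isExponent(s.charAt(i))) {
            int j = i + 1;
            if (j < n && isExponentSign(s.charAt(j))) j++;
            if (j < n && isDigit(s.charAt(j))) {
                while (j < n && isDigit(s.charAt(j))) j++;
                i = j; // accept the exponent only if it has digits
            }
        }
        return i;
    }
}
```

An incomplete exponent such as the trailing "e" in "1e" is left out of the number in this sketch, so the scanner reports a prefix length of 1.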
-
isSpace
protected boolean isSpace(char c)
Checks if a char is a space.
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
-
isTab
protected boolean isTab(char c)
Checks if a char is a tab.
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
-
isStringEscape
protected boolean isStringEscape(char c)
Checks if this is a character that is a String escape character.
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
-
isStringStart
protected boolean isStringStart(char c)
Checks if this is a character that starts a String.
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
-
isRawStringStart
protected boolean isRawStringStart(char c)
Checks if this is a character that starts a multiline String.
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
-
isStringEnd
protected boolean isStringEnd(char c)
Checks if this is a character that ends a String.
- Parameters:
- c - the character to test.
- Returns:
- true if so, false if not.
-
getStringEnd
protected char getStringEnd(char c)
Gets the character that ends a String, using the starting character.
- Parameters:
- c - the starting character.
- Returns:
- the corresponding end character, or the null character ('\0') if c does not start a string.
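The start-to-end lookup described for getStringEnd can be sketched as a map from starting character to ending character, with '\0' as the "no match" result. The class StringDelimiterSketch and its addStringDelimiter method are hypothetical names used only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class StringDelimiterSketch {
    // Maps each string-starting character to the character that ends it.
    private final Map<Character, Character> stringDelimiters = new HashMap<>();

    // Hypothetical registration method for the example.
    void addStringDelimiter(char start, char end) {
        stringDelimiters.put(start, end);
    }

    boolean isStringStart(char c) {
        return stringDelimiters.containsKey(c);
    }

    // Returns the matching end character, or '\0' if c starts no string.
    char getStringEnd(char c) {
        Character end = stringDelimiters.get(c);
        return end != null ? end : '\0';
    }
}
```

A map keeps start and end characters independent, so a string opened with one character can be closed by a different one.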
-
getRawStringEnd
protected char getRawStringEnd(char c)
Gets the character that ends a raw String, using the starting character.
- Parameters:
- c - the starting character.
- Returns:
- the corresponding end character, or the null character ('\0') if c does not start a multi-line string.
-
isDelimiterStart
protected boolean isDelimiterStart(char c)
Checks if this is a (or the start of a) delimiter character.
- Parameters:
- c - the character input.
- Returns:
- true if so, false if not.
-
isCommentEndDelimiterStart
protected boolean isCommentEndDelimiterStart(char c)
Checks if this is a (or the start of a) block-comment-ending delimiter character.
- Parameters:
- c - the character input.
- Returns:
- true if so, false if not.
-
isStreamEnd
protected boolean isStreamEnd(char c)
Checks if a char equals END_OF_STREAM.
- Parameters:
- c - the character input.
- Returns:
- true if so, false if not.
-
isLexerEnd
protected boolean isLexerEnd(char c)
Checks if a char equals END_OF_LEXER.
- Parameters:
- c - the character input.
- Returns:
- true if so, false if not.
-
isNewline
protected boolean isNewline(char c)
Checks if a char equals NEWLINE.
- Parameters:
- c - the character input.
- Returns:
- true if so, false if not.
-