Class Lexer
String delimiter characters take precedence over regular delimiters, and raw string delimiter characters take precedence over regular string delimiters. Delimiter characters take parsing priority over all other characters. Delimiters are evaluated in this order: comment delimiter, then delimiter. Identifiers are evaluated in this order: Keyword, CaseInsensitiveKeyword, Identifier.
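The identifier evaluation order described above can be pictured as a two-step lookup with a fallback. The following is only a minimal sketch under stated assumptions, not the library's implementation: the class IdentifierPriority, its maps, its method names, and the integer type codes are all hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Lexer API.
class IdentifierPriority {
    // Assumed type code for a plain identifier.
    static final int TYPE_IDENTIFIER = 0;

    private final Map<String, Integer> keywords = new HashMap<>();
    private final Map<String, Integer> caseInsensitiveKeywords = new HashMap<>();

    void addKeyword(String word, int type) {
        keywords.put(word, type);
    }

    void addCaseInsensitiveKeyword(String word, int type) {
        // Stored lower-cased so that lookups can ignore case.
        caseInsensitiveKeywords.put(word.toLowerCase(Locale.ROOT), type);
    }

    // Exact keyword first, then case-insensitive keyword, then plain identifier.
    int resolve(String lexeme) {
        Integer type = keywords.get(lexeme);
        if (type != null) return type;
        type = caseInsensitiveKeywords.get(lexeme.toLowerCase(Locale.ROOT));
        if (type != null) return type;
        return TYPE_IDENTIFIER;
    }
}
```

In this sketch, a lexeme that matches both an exact keyword and a case-insensitive keyword resolves to the exact keyword's type, which mirrors the stated priority.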
Other implementations of this class may manipulate the stream stack as well (such as ones that do in-language stream inclusion).
If the system property com.blackrook.base.Lexer.debug is set to true, this does debugging output to System.out.
Lexer functions are NOT thread-safe.
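The debug property named above is an ordinary Java system property, so it can be set with -Dcom.blackrook.base.Lexer.debug=true on the command line. A minimal sketch of reading such a property follows; the class LexerDebugFlag is hypothetical, and when or how often the real Lexer reads the property is an assumption not confirmed by this page.

```java
// Hypothetical sketch: shows one way a boolean system property can be read.
class LexerDebugFlag {
    // Property name taken from the class description.
    static final String PROPERTY = "com.blackrook.base.Lexer.debug";

    // Returns true only if the property is set to "true" (ignoring case).
    // An unset property yields false, because Boolean.parseBoolean(null) is false.
    static boolean isDebugEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY));
    }
}
```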
- Author:
- Matthew Tropiano
-
Nested Class Summary
Nested Classes
Modifier and Type, Class: Description
static class Lexer.Kernel: This is an info kernel that tells a Lexer how to interpret certain characters and identifiers.
static class: Abstract parser class.
static class: This holds a series of Reader streams such that the stream on top is the current active stream.
static class Lexer.Token: Lexer token object.
-
Field Summary
Fields
Modifier and Type, Field: Description
static boolean DEBUG
static final char END_OF_LEXER: Lexer end char (no more characters to read).
static final char END_OF_STREAM: Lexer end-of-stream char.
static final char NEWLINE: Lexer newline char.
-
Constructor Summary
Constructors
Constructor: Description
Lexer(Lexer.Kernel kernel): Creates a new lexer with no streams.
Lexer(Lexer.Kernel kernel, Reader in): Creates a new lexer around a reader.
Lexer(Lexer.Kernel kernel, String in): Creates a new lexer around a String, that will be wrapped into a StringReader.
Lexer(Lexer.Kernel kernel, String name, Reader in): Creates a new lexer around a reader.
Lexer(Lexer.Kernel kernel, String name, String in): Creates a new lexer around a String, that will be wrapped into a StringReader.
-
Method Summary
Modifier and Type, Method: Description
protected void clearCurrentLexeme(): Clears the current token lexeme buffer.
protected String getCurrentLexeme(): Gets the current token lexeme.
int getCurrentLineNumber(): Gets the lexer's current stream's line number.
getCurrentStream(): Gets the current stream.
protected char getRawStringEnd(char c): Gets the character that ends a raw String, using the starting character.
protected Character getRawStringStartAndEnd(char c): Gets the end character for a multi-line, "raw" string start character.
protected char getStringEnd(char c): Gets the character that ends a String, using the starting character.
protected Character getStringStartAndEnd(char c): Gets the end character for a string start character.
protected boolean isDelimiterStart(char c): Checks if this is a (or the start of a) delimiter character.
protected boolean isDigit(char c): Convenience method for Character.isDigit(char).
protected boolean isExponent(char c): Checks if char is the exponent character in a number.
protected boolean isExponentSign(char c): Checks if char is the exponent sign character in a number.
protected boolean isHexDigit(char c): Returns true if this is a hex digit (0-9, A-F, a-f).
protected boolean isLetter(char c): Convenience method for Character.isLetter(char).
protected boolean isLexerEnd(char c): Checks if a char equals END_OF_LEXER.
protected boolean isNewline(char c): Checks if a char equals NEWLINE.
protected boolean isPoint(char c): Checks if a character is a decimal point (depends on locale/kernel).
protected boolean isRawStringStart(char c): Checks if this is a character that starts a multiline String.
protected boolean isSpace(char c): Checks if a char is a space.
protected boolean isStreamEnd(char c): Checks if a char equals END_OF_STREAM.
protected boolean isStringEscape(char c): Checks if this is a character that is a String escape character.
protected boolean isStringStart(char c): Checks if this is a character that starts a String.
protected boolean isTab(char c): Checks if a char is a tab.
protected boolean isUnderscore(char c): Convenience method for c == '_'.
protected boolean isWhitespace(char c): Convenience method for Character.isWhitespace(char).
protected boolean modifyType(Lexer.Token token): Called when the lexer wants to create a token, but the lexeme of the token may cause this token to be a different type.
nextToken(): Gets the next token.
void pushStream(String name, Reader in): Pushes a stream onto the encapsulated reader stack.
protected char readChar(): Reads a character from the stream.
protected void saveChar(char c): Saves a character for the next token.
protected void setDelimBreak(char delimChar): Sets if we are in a delimiter break.
-
Field Details
-
DEBUG
public static boolean DEBUG
-
END_OF_LEXER
public static final char END_OF_LEXER
Lexer end char (no more characters to read).
-
END_OF_STREAM
public static final char END_OF_STREAM
Lexer end-of-stream char.
-
NEWLINE
public static final char NEWLINE
Lexer newline char.
-
-
Constructor Details
-
Lexer
Creates a new lexer with no streams. This will also assign this lexer a default name.
- Parameters:
kernel - the lexer kernel to use for defining how to parse the input text.
-
Lexer
Creates a new lexer around a String, that will be wrapped into a StringReader. This will also assign this lexer a default name.
- Parameters:
kernel - the lexer kernel to use for defining how to parse the input text.
in - the string to read from.
-
Lexer
Creates a new lexer around a String, that will be wrapped into a StringReader.
- Parameters:
kernel - the lexer kernel to use for defining how to parse the input text.
name - the name of this lexer.
in - the string to read from.
-
Lexer
Creates a new lexer around a reader. This will also assign this lexer a default name.
- Parameters:
kernel - the kernel to use for this lexer.
in - the reader to read from.
-
Lexer
Creates a new lexer around a reader.
- Parameters:
kernel - the kernel to use for this lexer.
name - the name of this lexer.
in - the reader to read from. If null, does not push a stream.
-
-
Method Details
-
getCurrentStreamName
- Returns:
- the lexer's current stream name.
-
getCurrentLineNumber
public int getCurrentLineNumber()
Gets the lexer's current stream's line number.
- Returns:
- the lexer's current stream's line number, or -1 if at Lexer end.
-
getCurrentStream
Gets the current stream.
- Returns:
- the current stream.
-
pushStream
-
nextToken
Gets the next token. If there are no tokens left to read, this will return null. This method is NOT thread-safe!
- Returns:
- the next token, or null if no more tokens to read.
- Throws:
IOException - if a token cannot be read by the underlying Reader.
-
modifyType
Called when the lexer wants to create a token, but the lexeme of the token may cause this token to be a different type.
By default, this handles space, tab, newline, delimiter, and identifier.
If this method is overridden, the override should have if (super.modifyType(token)) return true; right at the beginning.
- Parameters:
token - the original token.
- Returns:
- true if the token's contents were changed, false if not.
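The override convention described for modifyType can be illustrated with stand-in classes. This is a sketch under stated assumptions: SketchToken, SketchBaseLexer, SketchBooleanLexer, and the type codes are hypothetical and do not reproduce the real Lexer or Lexer.Token; only the pattern of calling the superclass first is taken from the documentation.

```java
// Hypothetical stand-in for a token; the real Lexer.Token is not reproduced here.
class SketchToken {
    static final int TYPE_IDENTIFIER = 0;
    static final int TYPE_KEYWORD = 1;
    static final int TYPE_BOOLEAN = 2;

    int type;
    final String lexeme;

    SketchToken(int type, String lexeme) {
        this.type = type;
        this.lexeme = lexeme;
    }
}

// Hypothetical base class whose default handling promotes "while" to a keyword.
class SketchBaseLexer {
    protected boolean modifyType(SketchToken token) {
        if (token.type == SketchToken.TYPE_IDENTIFIER && token.lexeme.equals("while")) {
            token.type = SketchToken.TYPE_KEYWORD;
            return true;
        }
        return false;
    }
}

// Hypothetical subclass that adds boolean literals on top of the default handling.
class SketchBooleanLexer extends SketchBaseLexer {
    @Override
    protected boolean modifyType(SketchToken token) {
        // Let the default handling run first, as the documentation advises.
        if (super.modifyType(token)) return true;

        // Only then apply the subclass-specific retyping.
        if (token.type == SketchToken.TYPE_IDENTIFIER
                && (token.lexeme.equals("true") || token.lexeme.equals("false"))) {
            token.type = SketchToken.TYPE_BOOLEAN;
            return true;
        }
        return false;
    }
}
```

Calling the superclass first means the default handling always keeps priority, and the subclass only sees tokens that the default handling left unchanged.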
-
readChar
Reads a character from the stream.
- Returns:
- the character read, or END_OF_LEXER if no more characters, or END_OF_STREAM if end of the current stream.
- Throws:
IOException - if a character cannot be read.
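The difference between the two end markers returned by readChar is easier to see with a small reader stack. The sketch below is an assumption-laden illustration, not the library's code: ReaderStackSketch is hypothetical, its sentinel values are placeholders rather than the real constant values, and popping an exhausted stream inside readChar is a guess about the behavior.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a stack of readers where the top one is the active stream.
class ReaderStackSketch {
    // Placeholder sentinels; the real END_OF_STREAM and END_OF_LEXER values are not shown on this page.
    static final char END_OF_STREAM = '\uFFFE';
    static final char END_OF_LEXER = '\uFFFF';

    private final Deque<Reader> stack = new ArrayDeque<>();

    void pushStream(Reader in) {
        stack.push(in);
    }

    // Reads from the stream on top of the stack.
    // Signals END_OF_STREAM when the top stream runs out, END_OF_LEXER when no streams remain.
    char readChar() throws IOException {
        if (stack.isEmpty()) return END_OF_LEXER;
        int c = stack.peek().read();
        if (c < 0) {
            // The top stream is exhausted: close it and resume the one beneath on the next call.
            stack.pop().close();
            return END_OF_STREAM;
        }
        return (char) c;
    }

    public static void main(String[] args) throws IOException {
        ReaderStackSketch s = new ReaderStackSketch();
        s.pushStream(new StringReader("ab"));
        s.pushStream(new StringReader("X")); // e.g. an included stream, read first
        char c;
        while ((c = s.readChar()) != END_OF_LEXER) {
            System.out.println(c == END_OF_STREAM ? "<end of stream>" : String.valueOf(c));
        }
    }
}
```

A stream pushed later is read first, which is what makes in-language stream inclusion possible: the including stream resumes once the included one ends.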
-
setDelimBreak
protected void setDelimBreak(char delimChar)
Sets if we are in a delimiter break.
- Parameters:
delimChar - the delimiter character that starts the break.
-
saveChar
protected void saveChar(char c)
Saves a character for the next token.
- Parameters:
c - the character to save into the current token.
-
getStringStartAndEnd
Gets the end character for a string start character.
- Parameters:
c - the start delimiter character.
- Returns:
- the corresponding end, or null if no character.
-
getRawStringStartAndEnd
Gets the end character for a multi-line, "raw" string start character.
- Parameters:
c - the start delimiter character.
- Returns:
- the corresponding end, or null if no character.
-
getCurrentLexeme
Gets the current token lexeme.
- Returns:
- the current contents of the token lexeme builder buffer.
-
clearCurrentLexeme
protected void clearCurrentLexeme()
Clears the current token lexeme buffer.
-
isUnderscore
protected boolean isUnderscore(char c)
Convenience method for c == '_'.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isLetter
protected boolean isLetter(char c)
Convenience method for Character.isLetter(char).
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isDigit
protected boolean isDigit(char c)
Convenience method for Character.isDigit(char).
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isHexDigit
protected boolean isHexDigit(char c)
Returns true if this is a hex digit (0-9, A-F, a-f).
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
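The hex digit ranges listed above translate directly into three range checks. A minimal sketch, assuming plain ASCII ranges; the class HexDigitSketch is hypothetical and the library's actual implementation is not shown on this page.

```java
// Hypothetical sketch of a hex digit test over the ranges 0-9, A-F, a-f.
class HexDigitSketch {
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'F')
            || (c >= 'a' && c <= 'f');
    }
}
```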
-
isWhitespace
protected boolean isWhitespace(char c)
Convenience method for Character.isWhitespace(char).
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isPoint
protected boolean isPoint(char c)
Checks if a character is a decimal point (depends on locale/kernel).
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isExponent
protected boolean isExponent(char c)
Checks if char is the exponent character in a number.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isExponentSign
protected boolean isExponentSign(char c)
Checks if char is the exponent sign character in a number.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
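The point, exponent, and exponent sign checks are the kind of predicates a number scanner combines. The sketch below shows one way they might fit together; it is not the lexer's actual number-scanning logic. NumberScanSketch is hypothetical, and the characters it accepts ('.', 'e'/'E', '+'/'-') are assumptions, since the real values may come from the kernel.

```java
// Hypothetical sketch of scanning a decimal literal with point/exponent predicates.
class NumberScanSketch {
    // Assumed character classes; the real lexer may take these from its kernel.
    static boolean isPoint(char c) { return c == '.'; }
    static boolean isExponent(char c) { return c == 'e' || c == 'E'; }
    static boolean isExponentSign(char c) { return c == '+' || c == '-'; }

    // Returns the length of the longest numeric prefix of the input, or 0 if there is none.
    static int scanNumber(String s) {
        int n = s.length();
        int i = 0;
        // Integer part: the number must start with a digit in this sketch.
        while (i < n && Character.isDigit(s.charAt(i))) i++;
        if (i == 0) return 0;
        // Optional fraction: a point followed by at least one digit.
        if (i + 1 < n && isPoint(s.charAt(i)) && Character.isDigit(s.charAt(i + 1))) {
            i++;
            while (i < n && Character.isDigit(s.charAt(i))) i++;
        }
        // Optional exponent: marker, optional sign, then at least one digit.
        if (i < n && isExponent(s.charAt(i))) {
            int j = i + 1;
            if (j < n && isExponentSign(s.charAt(j))) j++;
            if (j < n && Character.isDigit(s.charAt(j))) {
                while (j < n && Character.isDigit(s.charAt(j))) j++;
                i = j; // accept the exponent only if it was complete
            }
        }
        return i;
    }
}
```

The exponent is consumed only when it is complete, so an input such as "12e" yields a two-character number and leaves the "e" for the next token.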
-
isSpace
protected boolean isSpace(char c)
Checks if a char is a space.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isTab
protected boolean isTab(char c)
Checks if a char is a tab.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isStringEscape
protected boolean isStringEscape(char c)
Checks if this is a character that is a String escape character.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isStringStart
protected boolean isStringStart(char c)
Checks if this is a character that starts a String.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
isRawStringStart
protected boolean isRawStringStart(char c)
Checks if this is a character that starts a multiline String.
- Parameters:
c - the character to test.
- Returns:
- true if so, false if not.
-
getStringEnd
protected char getStringEnd(char c)
Gets the character that ends a String, using the starting character.
- Parameters:
c - the starting character.
- Returns:
- the corresponding end character, or the null character ('\0') if this does not end a string.
-
getRawStringEnd
protected char getRawStringEnd(char c)
Gets the character that ends a raw String, using the starting character.
- Parameters:
c - the starting character.
- Returns:
- the corresponding end character, or the null character ('\0') if this does not end a multi-line string.
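The string delimiter methods come in two flavors: the Character-returning lookups yield null when there is no match, while the char-returning ones yield the null character ('\0'). A minimal sketch of that relationship, assuming a simple start-to-end map; the class StringDelimiterSketch and its addStringDelimiter method are hypothetical and are not part of the documented API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: maps each string start character to its end character.
class StringDelimiterSketch {
    private final Map<Character, Character> stringDelimiters = new HashMap<>();

    // Assumed registration method, for illustration only.
    void addStringDelimiter(char start, char end) {
        stringDelimiters.put(start, end);
    }

    boolean isStringStart(char c) {
        return stringDelimiters.containsKey(c);
    }

    // Boxed lookup: null means "no such start character".
    Character getStringStartAndEnd(char c) {
        return stringDelimiters.get(c);
    }

    // Primitive lookup: the null character stands in for "no such start character".
    char getStringEnd(char c) {
        Character end = stringDelimiters.get(c);
        return end != null ? end : '\0';
    }
}
```

A raw string table could follow the same shape with its own map, which is how different start and end characters can be paired.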
-
isDelimiterStart
protected boolean isDelimiterStart(char c)
Checks if this is a (or the start of a) delimiter character.
- Parameters:
c - the character input.
- Returns:
- true if so, false if not.
-
isStreamEnd
protected boolean isStreamEnd(char c)
Checks if a char equals END_OF_STREAM.
- Parameters:
c - the character input.
- Returns:
- true if so, false if not.
-
isLexerEnd
protected boolean isLexerEnd(char c)
Checks if a char equals END_OF_LEXER.
- Parameters:
c - the character input.
- Returns:
- true if so, false if not.
-
isNewline
protected boolean isNewline(char c)
Checks if a char equals NEWLINE.
- Parameters:
c - the character input.
- Returns:
- true if so, false if not.
-