Class Lexer

All Implemented Interfaces:
TokenSource
Direct Known Subclasses:
LexerInterpreter, XPathLexer

public abstract class Lexer extends Recognizer<Integer,LexerATNSimulator> implements TokenSource
A lexer is a recognizer that draws input symbols from a character stream. Lexer grammars result in a subclass of this object. A Lexer object uses simplified match() and error recovery mechanisms in the interest of speed.
  • Field Details

    • DEFAULT_MODE

      public static final int DEFAULT_MODE
    • MORE

      public static final int MORE
    • SKIP

      public static final int SKIP
    • DEFAULT_TOKEN_CHANNEL

      public static final int DEFAULT_TOKEN_CHANNEL
    • HIDDEN

      public static final int HIDDEN
    • MIN_CHAR_VALUE

      public static final int MIN_CHAR_VALUE
    • MAX_CHAR_VALUE

      public static final int MAX_CHAR_VALUE
    • _input

      public CharStream _input
    • _tokenFactorySourcePair

      protected Pair<TokenSource,CharStream> _tokenFactorySourcePair
    • _factory

      protected TokenFactory<?> _factory
      How to create token objects
    • _token

      public Token _token
      The goal of all lexer rules/methods is to create a token object. This is an instance variable because multiple rules may collaborate to create a single token. nextToken will return this object after matching lexer rule(s). If you subclass to allow multiple token emissions, then set this to the last token to be matched, or to something non-null, so that the automatic token emit mechanism will not emit another token.
    • _tokenStartCharIndex

      public int _tokenStartCharIndex
      What character index in the stream did the current token start at? Needed, for example, to get the text for current token. Set at the start of nextToken.
    • _tokenStartLine

      public int _tokenStartLine
      The line on which the first character of the token resides
    • _tokenStartCharPositionInLine

      public int _tokenStartCharPositionInLine
      The character position of first character within the line
    • _hitEOF

      public boolean _hitEOF
      Once we see EOF on the char stream, the next token will be EOF. If you have a rule DONE : EOF ; then you see DONE followed by EOF.
    • _channel

      public int _channel
      The channel number for the current token
    • _type

      public int _type
      The token type for the current token
    • _modeStack

      public final IntegerStack _modeStack
    • _mode

      public int _mode
    • _text

      public String _text
      You can set the text for the current token to override what is in the input char buffer. Use setText() or set this instance variable directly.
  • Constructor Details

    • Lexer

      public Lexer()
    • Lexer

      public Lexer(CharStream input)
  • Method Details

    • reset

      public void reset()
    • nextToken

      public Token nextToken()
      Return a token from this source; i.e., match a token on the char stream.
      Specified by:
      nextToken in interface TokenSource
    • skip

      public void skip()
      Instruct the lexer to skip creating a token for the current lexer rule and look for another token. nextToken() knows to keep looking when a lexer rule finishes with the token type set to SKIP. Recall that if token==null at the end of any token rule, the lexer creates one for you and emits it.
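      The keep-looking behaviour described above can be sketched with a small standalone model. This is a toy under stated assumptions, not the ANTLR implementation: the class SkipLoopSketch, its WORD and WS types, and the whitespace rule are all hypothetical, and it avoids the ANTLR runtime so that it runs on its own.

```java
/** Toy model of a skip-aware token loop; hypothetical, not ANTLR's code. */
class SkipLoopSketch {
    static final int SKIP = -3; // sentinel type meaning "do not emit a token"
    static final int WORD = 1;  // hypothetical token type
    static final int WS = 2;    // hypothetical token type

    private final String input;
    private int pos = 0;
    private int type;

    SkipLoopSketch(String input) {
        this.input = input;
    }

    /** Marks the current match as one that should not become a token. */
    void skip() {
        type = SKIP;
    }

    /** Matches one run of whitespace or of non-whitespace starting at pos. */
    private String matchRule() {
        int start = pos;
        boolean ws = Character.isWhitespace(input.charAt(pos));
        while (pos < input.length()
                && Character.isWhitespace(input.charAt(pos)) == ws) {
            pos++;
        }
        type = ws ? WS : WORD;
        if (ws) {
            skip(); // plays the role of a skip action attached to the rule
        }
        return input.substring(start, pos);
    }

    /** Returns the text of the next non-skipped token, or null at end of input. */
    String nextToken() {
        while (pos < input.length()) {
            String text = matchRule();
            if (type == SKIP) {
                continue; // rule finished with SKIP: keep looking
            }
            return text;
        }
        return null;
    }
}
```

      With the input "ab  cd ", successive calls to nextToken() in this model return "ab", then "cd", then null; the whitespace runs never surface as tokens.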
    • more

      public void more()
    • mode

      public void mode(int m)
    • pushMode

      public void pushMode(int m)
    • popMode

      public int popMode()
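      The three mode methods carry no description on this page. The sketch below shows one plausible reading, assuming the stack discipline suggested by the _modeStack and _mode fields: pushMode saves the current mode before switching, and popMode restores the saved mode and returns it. ModeStackSketch, STRING_MODE, and currentMode() are hypothetical names, and the model does not use the ANTLR runtime.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EmptyStackException;

/** Toy model of lexer mode switching; hypothetical, not ANTLR's code. */
class ModeStackSketch {
    static final int DEFAULT_MODE = 0;
    static final int STRING_MODE = 1; // hypothetical extra mode

    private final Deque<Integer> modeStack = new ArrayDeque<>();
    private int mode = DEFAULT_MODE;

    /** Switches mode without remembering the previous one. */
    void mode(int m) {
        mode = m;
    }

    /** Remembers the current mode, then switches to m. */
    void pushMode(int m) {
        modeStack.push(mode);
        mode(m);
    }

    /** Restores the most recently saved mode and returns it. */
    int popMode() {
        if (modeStack.isEmpty()) {
            throw new EmptyStackException();
        }
        mode(modeStack.pop());
        return mode;
    }

    int currentMode() {
        return mode;
    }
}
```

      In this model, a rule that opens a string literal would call pushMode(STRING_MODE) and the rule that closes it would call popMode(), returning the lexer to whichever mode was active before.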
    • setTokenFactory

      public void setTokenFactory(TokenFactory<?> factory)
      Description copied from interface: TokenSource
      Set the TokenFactory this token source should use for creating Token objects from the input.
      Specified by:
      setTokenFactory in interface TokenSource
      Specified by:
      setTokenFactory in class Recognizer<Integer,LexerATNSimulator>
      Parameters:
      factory - The TokenFactory to use for creating tokens.
    • getTokenFactory

      public TokenFactory<? extends Token> getTokenFactory()
      Description copied from interface: TokenSource
      Gets the TokenFactory this token source is currently using for creating Token objects from the input.
      Specified by:
      getTokenFactory in interface TokenSource
      Specified by:
      getTokenFactory in class Recognizer<Integer,LexerATNSimulator>
      Returns:
      The TokenFactory currently used by this token source.
    • setInputStream

      public void setInputStream(IntStream input)
      Set the char stream and reset the lexer
      Specified by:
      setInputStream in class Recognizer<Integer,LexerATNSimulator>
    • getSourceName

      public String getSourceName()
      Description copied from interface: TokenSource
      Gets the name of the underlying input source. This method returns a non-null, non-empty string. If such a name is not known, this method returns IntStream.UNKNOWN_SOURCE_NAME.
      Specified by:
      getSourceName in interface TokenSource
    • getInputStream

      public CharStream getInputStream()
      Description copied from interface: TokenSource
      Get the CharStream from which this token source is currently providing tokens.
      Specified by:
      getInputStream in interface TokenSource
      Specified by:
      getInputStream in class Recognizer<Integer,LexerATNSimulator>
      Returns:
      The CharStream associated with the current position in the input, or null if no input stream is available for the token source.
    • emit

      public void emit(Token token)
      By default, this method does not support multiple emits per nextToken invocation, for efficiency reasons. To support them, subclass and override this method, nextToken, and getToken so that tokens are pushed into a list and pulled from that list, rather than stored in a single variable as this implementation does.
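      The list-based approach mentioned above might look like the following standalone model. It is a sketch only: MultiEmitSketch, its string tokens, and the rule that splits a trailing ";" into its own token are hypothetical, and a real subclass would work with Token objects and the ANTLR runtime instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of multiple emits per rule match; hypothetical, not ANTLR's code. */
class MultiEmitSketch {
    private final Deque<String> pending = new ArrayDeque<>();
    private final String[] words;
    private int next = 0;

    MultiEmitSketch(String input) {
        String trimmed = input.trim();
        words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Queues a token instead of storing it in a single variable. */
    void emit(String token) {
        pending.addLast(token);
    }

    /** One "rule match": may emit one or two tokens. */
    private void matchRule() {
        String w = words[next++];
        if (w.length() > 1 && w.endsWith(";")) {
            emit(w.substring(0, w.length() - 1));
            emit(";");
        } else {
            emit(w);
        }
    }

    /** Drains queued tokens first; matches more input only when the queue is empty. */
    String nextToken() {
        while (pending.isEmpty() && next < words.length) {
            matchRule();
        }
        return pending.pollFirst(); // null once input and queue are exhausted
    }
}
```

      For the input "a b; c", this model returns "a", "b", ";", "c", and then null. The design point is that nextToken() consults the queue before matching again, so no emitted token is lost.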
    • emit

      public Token emit()
      The standard method called to automatically emit a token at the outermost lexical rule. The token object should point into the char buffer start..stop. If there is a text override in _text, use that to set the token's text. Override this method to emit custom Token objects or provide a new factory.
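      The relationship between the start..stop range and a text override can be sketched as follows. This is a toy model, not the ANTLR implementation: TextOverrideSketch and its fields are hypothetical, and stop is assumed to be the inclusive index of the token's last character.

```java
/** Toy model of a token-text override; hypothetical, not ANTLR's code. */
class TextOverrideSketch {
    private final String input;  // stands in for the char buffer
    private final int start;     // index of the token's first character
    private final int stop;      // index of the token's last character (inclusive)
    private String textOverride; // null means "no override"

    TextOverrideSketch(String input, int start, int stop) {
        this.input = input;
        this.start = start;
        this.stop = stop;
    }

    /** Replaces the token text entirely; any earlier override is lost. */
    void setText(String text) {
        textOverride = text;
    }

    /** Returns the override if one is set, otherwise the buffer slice start..stop. */
    String getText() {
        if (textOverride != null) {
            return textOverride;
        }
        return input.substring(start, stop + 1);
    }
}
```

      A typical use in this model is stripping the quotes from a string literal: the buffer slice still covers the quoted text, while getText() reports the override.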
    • emitEOF

      public Token emitEOF()
    • getLine

      public int getLine()
      Description copied from interface: TokenSource
      Get the line number for the current position in the input stream. The first line in the input is line 1.
      Specified by:
      getLine in interface TokenSource
      Returns:
      The line number for the current position in the input stream, or 0 if the current token source does not track line numbers.
    • getCharPositionInLine

      public int getCharPositionInLine()
      Description copied from interface: TokenSource
      Get the index into the current line for the current position in the input stream. The first character on a line has position 0.
      Specified by:
      getCharPositionInLine in interface TokenSource
      Returns:
      The character position within the current line for the current position in the input stream, or -1 if the current token source does not track character positions.
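      The numbering conventions used by getLine and getCharPositionInLine (lines start at 1, positions within a line start at 0) can be illustrated with a small standalone helper. LineColumnSketch and positionAt are hypothetical names, and the sketch assumes "\n" is the only line terminator.

```java
/** Toy line/column calculator; hypothetical, not ANTLR's code. */
class LineColumnSketch {
    /**
     * Returns {line, charPositionInLine} for the character at the given index.
     * Lines are 1-based; positions within a line are 0-based.
     */
    static int[] positionAt(String text, int index) {
        int line = 1;
        int col = 0;
        for (int i = 0; i < index; i++) {
            if (text.charAt(i) == '\n') {
                line++;  // a newline starts the next line
                col = 0; // whose first character has position 0
            } else {
                col++;
            }
        }
        return new int[] {line, col};
    }
}
```

      In this model, index 0 of any text maps to line 1, position 0, and the character 'd' in "ab\ncd" (index 4) maps to line 2, position 1.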
    • setLine

      public void setLine(int line)
    • setCharPositionInLine

      public void setCharPositionInLine(int charPositionInLine)
    • getCharIndex

      public int getCharIndex()
      What is the index of the current character of lookahead?
    • getText

      public String getText()
      Return the text matched so far for the current token or any text override.
    • setText

      public void setText(String text)
      Set the complete text of this token; it wipes any previous changes to the text.
    • getToken

      public Token getToken()
      Override if emitting multiple tokens.
    • setToken

      public void setToken(Token _token)
    • setType

      public void setType(int ttype)
    • getType

      public int getType()
    • setChannel

      public void setChannel(int channel)
    • getChannel

      public int getChannel()
    • getChannelNames

      public String[] getChannelNames()
    • getModeNames

      public String[] getModeNames()
    • getTokenNames

      @Deprecated public String[] getTokenNames()
      Deprecated.
      Used to print out token names like ID during debugging and error reporting. The generated parsers implement a method that overrides this to point to their String[] tokenNames.
      Specified by:
      getTokenNames in class Recognizer<Integer,LexerATNSimulator>
    • getAllTokens

      public List<? extends Token> getAllTokens()
      Return a list of all Token objects in the input char stream. Forces load of all tokens. Does not include the EOF token.
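      The drain-until-EOF behaviour can be sketched as a simple loop. This is a standalone model, not the ANTLR implementation: DrainTokensSketch is hypothetical, tokens are represented only by their integer types, and EOF is assumed to be the type -1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Toy model of collecting every token before EOF; hypothetical, not ANTLR's code. */
class DrainTokensSketch {
    static final int EOF = -1; // assumed end-of-input token type

    /** Pulls token types from the source until EOF; EOF itself is not included. */
    static List<Integer> getAllTokens(Supplier<Integer> nextToken) {
        List<Integer> tokens = new ArrayList<>();
        int t = nextToken.get();
        while (t != EOF) {
            tokens.add(t);
            t = nextToken.get();
        }
        return tokens;
    }
}
```

      Because the loop consumes the whole source, a caller in this model cannot ask for tokens one at a time afterwards without resetting the input first.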
    • recover

      public void recover(LexerNoViableAltException e)
    • notifyListeners

      public void notifyListeners(LexerNoViableAltException e)
    • getErrorDisplay

      public String getErrorDisplay(String s)
    • getErrorDisplay

      public String getErrorDisplay(int c)
    • getCharErrorDisplay

      public String getCharErrorDisplay(int c)
    • recover

      public void recover(RecognitionException re)
      Lexers can normally match any char in its vocabulary after matching a token, so do the easy thing and just kill a character and hope it all works out. You can instead use the rule invocation stack to do sophisticated error recovery if you are in a fragment rule.
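      The "kill a character" strategy can be sketched in a standalone tokenizer. This is a toy under stated assumptions, not the ANTLR implementation: RecoverSketch is hypothetical, the only valid tokens are runs of digits, and the error message format is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of drop-one-character recovery; hypothetical, not ANTLR's code. */
class RecoverSketch {
    /** Collects digit-run tokens; records an error and skips one char otherwise. */
    static List<String> tokenize(String input, List<String> errors) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isDigit(c)) {
                int start = pos;
                while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
                    pos++;
                }
                tokens.add(input.substring(start, pos));
            } else {
                errors.add("unexpected character '" + c + "' at index " + pos);
                pos++; // recover: drop one character and try again
            }
        }
        return tokens;
    }
}
```

      For the input "12#34", this model yields the tokens "12" and "34" and records a single error for '#'. Matching resumes immediately after the dropped character.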