Package net.fauxpark.stringes
Class Lexer<T>
java.lang.Object
  net.fauxpark.stringes.Lexer<T>
Type Parameters:
  T - The identifier type to use in tokens created from the context.
public class Lexer<T> extends Object
Represents a set of rules for creating tokens from a Stringe.
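Typical usage is to construct a Lexer, register a rule for each kind of token, and then call tokenize. The sketch below is illustrative only: it assumes a caller-defined enum M as the identifier type and that Token lives in the same package as Lexer. The per-method examples later on this page continue from this lexer and enum.

    import java.util.regex.Pattern;

    import net.fauxpark.stringes.Lexer;
    import net.fauxpark.stringes.Token;

    public class CalcLexerExample {
        // Caller-defined token identifiers; any type may be used as T.
        enum M { NUMBER, PLUS, MINUS, WHITESPACE, END }

        public static void main(String[] args) {
            Lexer<M> lexer = new Lexer<>();
            lexer.add("+", M.PLUS);                           // symbol rule
            lexer.add("-", M.MINUS);                          // symbol rule
            lexer.add(Pattern.compile("\\d+"), M.NUMBER);     // regex rule
            lexer.add(Pattern.compile("\\s+"), M.WHITESPACE); // regex rule
            lexer.ignore(M.WHITESPACE);                       // drop whitespace tokens
            lexer.addEndToken(M.END);                         // emit a token at end of input

            for (Token<M> token : lexer.tokenize("12 + 34 - 5")) {
                System.out.println(token);
            }
        }
    }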
-
Nested Class Summary
  (package private) class Lexer.RuleMatchValueGenerator<U>
    Generates token identifiers for a rule from either a constant value, or a generator function processing a MatchResult.
  static class Lexer.SymbolPriority
    Used to manipulate the order in which symbol (non-regex) rules are tested.
-
Constructor Summary
  Lexer()
    Constructs a new Lexer.
-
Method Summary
  void add(String[] symbols, T id)
    Defines a lexer rule that returns a token when any of the specified strings are found.
  void add(String[] symbols, T id, boolean ignoreCase)
    Defines a lexer rule that returns a token when any of the specified strings are found.
  void add(String[] symbols, T id, boolean ignoreCase, Lexer.SymbolPriority priority)
    Defines a lexer rule that returns a token when any of the specified strings are found.
  void add(String[] symbols, T id, Lexer.SymbolPriority priority)
    Defines a lexer rule that returns a token when any of the specified strings are found.
  void add(String symbol, T id)
    Defines a lexer rule that returns a token when the specified string is found.
  void add(String symbol, T id, boolean ignoreCase)
    Defines a lexer rule that returns a token when the specified string is found.
  void add(String symbol, T id, boolean ignoreCase, Lexer.SymbolPriority priority)
    Defines a lexer rule that returns a token when the specified string is found.
  void add(String symbol, T id, Lexer.SymbolPriority priority)
    Defines a lexer rule that returns a token when the specified string is found.
  void add(Function<StringeReader,Boolean> func, T id)
    Defines a lexer rule that returns a token when the specified function returns true.
  void add(Function<StringeReader,Boolean> func, T id, Lexer.SymbolPriority priority)
    Defines a lexer rule that returns a token when the specified function returns true.
  void add(Pattern regex, Function<MatchResult,T> generator)
    Defines a lexer rule that returns a token when the specified regular expression finds a match.
  void add(Pattern regex, Function<MatchResult,T> generator, Lexer.SymbolPriority priority)
    Defines a lexer rule that returns a token when the specified regular expression finds a match.
  void add(Pattern regex, T id)
    Defines a lexer rule that returns a token when the specified regular expression finds a match.
  void add(Pattern regex, T id, Lexer.SymbolPriority priority)
    Defines a lexer rule that returns a token when the specified regular expression finds a match.
  void addEndToken(T id)
    Defines a lexer rule that returns a token when the end of the input is reached.
  void addEndToken(T id, String symbol)
    Defines a lexer rule that returns a token when the end of the input is reached.
  void addUndefinedCaptureRule(T id, Function<Stringe,Stringe> func)
    Defines a lexer rule that captures unrecognized characters as a token.
  (package private) TwoTuple<String,T> getEndToken()
    Returns the rule for the end token symbol.
  (package private) List<ThreeTuple<Function<StringeReader,Boolean>,T,Lexer.SymbolPriority>> getFunctions()
    Returns the list of function rules.
  (package private) List<ThreeTuple<String,T,Boolean>> getHighSymbols()
    Returns the list of high priority symbol rules.
  HashSet<T> getIgnoreRules()
    Returns the ignore rules.
  (package private) List<ThreeTuple<String,T,Boolean>> getNormalSymbols()
    Returns the list of normal priority symbol rules.
  (package private) List<ThreeTuple<Pattern,Lexer.RuleMatchValueGenerator<T>,Lexer.SymbolPriority>> getRegexes()
    Returns the list of regex rules.
  String getSymbolForId(T id)
    Returns the symbol that represents the specified identifier.
  (package private) TwoTuple<Function<Stringe,Stringe>,T> getUndefinedCaptureRule()
    Returns the rule for undefined symbols.
  (package private) boolean hasPunctuation(char c)
    Determines if the lexer contains the specified punctuation mark.
  (package private) boolean hasPunctuation(int c)
    Determines if the lexer contains the specified punctuation mark.
  void ignore(T... ids)
    Adds the specified token identifiers to the ignore list.
  Iterable<Token<T>> tokenize(String str)
    Tokenizes the input string and enumerates the resulting tokens.
  Iterable<Token<T>> tokenize(Stringe stre)
    Tokenizes the input Stringe and enumerates the resulting tokens.
  <U> Iterable<U> tokenize(Stringe stre, BiFunction<Stringe,T,U> tokenEmitter)
    Tokenizes the input Stringe and enumerates the resulting tokens using the specified token emitter.
-
Method Detail
-
getIgnoreRules
public HashSet<T> getIgnoreRules()
Returns the ignore rules.
Returns:
  A list of token identifiers that should be ignored.
-
ignore
public void ignore(T... ids)
Adds the specified token identifiers to the ignore list.
Parameters:
  ids - The token identifiers to ignore.
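For example (continuing the sketch in the class description), a rule can still be defined for whitespace so it is consumed, while its tokens are excluded from the tokenize output:

    lexer.add(Pattern.compile("\\s+"), M.WHITESPACE);
    lexer.ignore(M.WHITESPACE); // whitespace is matched but no longer emitted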
-
getSymbolForId
public String getSymbolForId(T id)
Returns the symbol that represents the specified identifier. Returns an empty string if the identifier cannot be found.
Parameters:
  id - The identifier to get the symbol for.
Returns:
  A symbol representing the given token identifier.
-
addEndToken
public void addEndToken(T id) throws UnsupportedOperationException
Defines a lexer rule that returns a token when the end of the input is reached.
Parameters:
  id - The token identifier to associate with the rule.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens.
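For example (continuing the class-level sketch), an END identifier can mark the end of input so a downstream parser never runs off the token stream:

    lexer.addEndToken(M.END); // the final token produced by tokenize has id END

The two-argument overload below additionally assigns a printable symbol to the end token.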
-
addEndToken
public void addEndToken(T id, String symbol) throws IllegalArgumentException, UnsupportedOperationException
Defines a lexer rule that returns a token when the end of the input is reached.
Parameters:
  id - The token identifier to associate with the rule.
  symbol - The symbol to assign to the end token.
Throws:
  IllegalArgumentException - If the supplied identifier is null.
  UnsupportedOperationException - If called after the context is used to create tokens.
-
addUndefinedCaptureRule
public void addUndefinedCaptureRule(T id, Function<Stringe,Stringe> func) throws UnsupportedOperationException
Defines a lexer rule that captures unrecognized characters as a token.
Parameters:
  id - The token identifier to associate with the rule.
  func - A function that processes the captured Stringe.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens.
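For example (continuing the class-level sketch, and assuming the identifier enum also defines an UNKNOWN constant), unmatched characters can be surfaced as tokens instead of being lost; the function receives the captured Stringe and may simply return it unchanged:

    lexer.addUndefinedCaptureRule(M.UNKNOWN, stre -> stre);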
-
add
public void add(String symbol, T id) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when the specified string is found.
Parameters:
  symbol - The symbol to test for.
  id - The token identifier to associate with the symbol.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens, or a rule with the symbol already exists.
  IllegalArgumentException - If the symbol is null or empty.
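For example (continuing the class-level sketch, with a hypothetical EQUALS identifier):

    lexer.add("==", M.EQUALS); // emits an EQUALS token wherever "==" appears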
-
add
public void add(String symbol, T id, Lexer.SymbolPriority priority) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when the specified string is found.
Parameters:
  symbol - The symbol to test for.
  id - The token identifier to associate with the symbol.
  priority - Whether the symbol should be tested before any regex rules.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens, or a rule with the symbol already exists.
  IllegalArgumentException - If the symbol is null or empty.
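For example, a high-priority symbol is tested before any regex rules, so a pattern cannot swallow it first. The HIGH constant name below is an assumption; consult the Lexer.SymbolPriority enum for its actual values:

    // Try "--" before the regex rules run (constant name assumed, DECREMENT is hypothetical).
    lexer.add("--", M.DECREMENT, Lexer.SymbolPriority.HIGH);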
-
add
public void add(String[] symbols, T id) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when any of the specified strings are found.
Parameters:
  symbols - The symbols to test for.
  id - The token identifier to associate with the symbols.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens, or a rule with the symbol already exists.
  IllegalArgumentException - If the symbols array or any of its members are null or empty.
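For example, several spellings can map to a single identifier (continuing the class-level sketch, with a hypothetical BOOLEAN identifier):

    lexer.add(new String[] { "true", "false" }, M.BOOLEAN);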
-
add
public void add(String[] symbols, T id, Lexer.SymbolPriority priority) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when any of the specified strings are found.
Parameters:
  symbols - The symbols to test for.
  id - The token identifier to associate with the symbols.
  priority - Whether the symbol should be tested before any regex rules.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens, or a rule with the symbol already exists.
  IllegalArgumentException - If the symbols array or any of its members are null or empty.
-
add
public void add(String symbol, T id, boolean ignoreCase) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when the specified string is found.
Parameters:
  symbol - The symbol to test for.
  id - The token identifier to associate with the symbol.
  ignoreCase - Whether the rule should ignore capitalization.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens, or a rule with the symbol already exists.
  IllegalArgumentException - If the symbol is null or empty.
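For example, a case-insensitive keyword rule (continuing the class-level sketch, with a hypothetical AND identifier):

    lexer.add("and", M.AND, true); // matches "and", "AND", "And", ...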
-
add
public void add(String symbol, T id, boolean ignoreCase, Lexer.SymbolPriority priority) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when the specified string is found.
Parameters:
  symbol - The symbol to test for.
  id - The token identifier to associate with the symbol.
  ignoreCase - Whether the rule should ignore capitalization.
  priority - Whether the symbol should be tested before any regex rules.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens, or a rule with the symbol already exists.
  IllegalArgumentException - If the symbol is null or empty.
-
add
public void add(String[] symbols, T id, boolean ignoreCase) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when any of the specified strings are found.
Parameters:
  symbols - The symbols to test for.
  id - The token identifier to associate with the symbols.
  ignoreCase - Whether the rule should ignore capitalization.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens, or a rule with the symbol already exists.
  IllegalArgumentException - If the symbols array or any of its members are null or empty.
-
add
public void add(String[] symbols, T id, boolean ignoreCase, Lexer.SymbolPriority priority) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when any of the specified strings are found.
Parameters:
  symbols - The symbols to test for.
  id - The token identifier to associate with the symbols.
  ignoreCase - Whether the rule should ignore capitalization.
  priority - Whether the symbol should be tested before any regex rules.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens, or a rule with the symbol already exists.
  IllegalArgumentException - If the symbols array or any of its members are null or empty.
-
add
public void add(Pattern regex, T id) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when the specified regular expression finds a match.
Parameters:
  regex - The regex to test for.
  id - The token identifier to associate with the regex.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens, or a rule with the same pattern already exists.
  IllegalArgumentException - If the regex is null.
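For example, integer literals can be matched with a compiled Pattern (continuing the class-level sketch):

    lexer.add(Pattern.compile("\\d+"), M.NUMBER);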
-
add
public void add(Pattern regex, T id, Lexer.SymbolPriority priority) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when the specified regular expression finds a match.
Parameters:
  regex - The regex to test for.
  id - The token identifier to associate with the regex.
  priority - The priority of the rule. Higher values are checked first.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens, or a rule with the same pattern already exists.
  IllegalArgumentException - If the regex is null.
-
add
public void add(Pattern regex, Function<MatchResult,T> generator) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when the specified regular expression finds a match.
Parameters:
  regex - The regex to test for.
  generator - A function that generates a token identifier from the match.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens, or a rule with the same pattern already exists.
  IllegalArgumentException - If either the regex or generator are null.
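For example, one pattern can yield different identifiers depending on the matched text; the generator receives the java.util.regex.MatchResult (NUMBER and WORD are hypothetical identifiers from the class-level sketch):

    lexer.add(Pattern.compile("\\w+"), match ->
            match.group().chars().allMatch(Character::isDigit) ? M.NUMBER : M.WORD);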
-
add
public void add(Pattern regex, Function<MatchResult,T> generator, Lexer.SymbolPriority priority) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when the specified regular expression finds a match.
Parameters:
  regex - The regex to test for.
  generator - A function that generates a token identifier from the match.
  priority - The priority of the rule. Higher values are checked first.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens, or a rule with the same pattern already exists.
  IllegalArgumentException - If either the regex or generator are null.
-
add
public void add(Function<StringeReader,Boolean> func, T id) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when the specified function returns true.
Parameters:
  func - The function to read the token with.
  id - The token identifier to associate with the function.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens.
  IllegalArgumentException - If the function is null.
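For example, a rule can read directly from the StringeReader and report whether it consumed a token. StringeReader's methods are not documented on this page, so the eat call below is illustrative only:

    // Hypothetical: consume a single '#' and emit a HASH token when one is present.
    lexer.add(reader -> reader.eat('#'), M.HASH);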
-
add
public void add(Function<StringeReader,Boolean> func, T id, Lexer.SymbolPriority priority) throws UnsupportedOperationException, IllegalArgumentException
Defines a lexer rule that returns a token when the specified function returns true.
Parameters:
  func - The function to read the token with.
  id - The token identifier to associate with the function.
  priority - The priority of the rule. Higher values are checked first.
Throws:
  UnsupportedOperationException - If called after the context is used to create tokens.
  IllegalArgumentException - If the function is null.
-
tokenize
public Iterable<Token<T>> tokenize(String str)
Tokenizes the input string and enumerates the resulting tokens.
Parameters:
  str - The string to tokenize.
Returns:
  An Iterable containing the tokens.
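For example (continuing the class-level sketch), the returned Iterable can be consumed with a for-each loop:

    for (Token<M> token : lexer.tokenize("12 + 34")) {
        System.out.println(token);
    }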
-
tokenize
public Iterable<Token<T>> tokenize(Stringe stre)
Tokenizes the input Stringe and enumerates the resulting tokens.
Parameters:
  stre - The Stringe to tokenize.
Returns:
  An Iterable containing the tokens.
-
tokenize
public <U> Iterable<U> tokenize(Stringe stre, BiFunction<Stringe,T,U> tokenEmitter) throws IllegalArgumentException
Tokenizes the input Stringe and enumerates the resulting tokens using the specified token emitter.
Type Parameters:
  U - The type of token to be created.
Parameters:
  stre - The Stringe to tokenize.
  tokenEmitter - The function that will create the tokens.
Returns:
  An Iterable containing the tokens.
Throws:
  IllegalArgumentException - If the token emitter is null.
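For example, the emitter can map each match straight to a caller-defined value instead of a Token. The Stringe(String) constructor used below is an assumption; any existing Stringe instance works equally well:

    Iterable<String> parts = lexer.tokenize(new Stringe("12 + 34"),
            (stre, id) -> id + "=\"" + stre + "\"");
    for (String part : parts) {
        System.out.println(part);
    }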
-
hasPunctuation
boolean hasPunctuation(int c)
Determines if the lexer contains the specified punctuation mark.
Parameters:
  c - The character to look for.
Returns:
  True if the lexer's list of punctuation characters contains the given character.
-
hasPunctuation
boolean hasPunctuation(char c)
Determines if the lexer contains the specified punctuation mark.
Parameters:
  c - The character to look for.
Returns:
  True if the lexer's list of punctuation characters contains the given character.
-
getUndefinedCaptureRule
TwoTuple<Function<Stringe,Stringe>,T> getUndefinedCaptureRule()
Returns the rule for undefined symbols.
Returns:
  A TwoTuple representing any undefined symbols.
-
getEndToken
TwoTuple<String,T> getEndToken()
Returns the rule for the end token symbol.
Returns:
  A TwoTuple representing the end token.
-
getNormalSymbols
List<ThreeTuple<String,T,Boolean>> getNormalSymbols()
Returns the list of normal priority symbol rules.
Returns:
  A list of rules for normal priority symbols.
-
getHighSymbols
List<ThreeTuple<String,T,Boolean>> getHighSymbols()
Returns the list of high priority symbol rules.
Returns:
  A list of rules for high priority symbols.
-
getRegexes
List<ThreeTuple<Pattern,Lexer.RuleMatchValueGenerator<T>,Lexer.SymbolPriority>> getRegexes()
Returns the list of regex rules.
Returns:
  A list of regular expression rules.
-
getFunctions
List<ThreeTuple<Function<StringeReader,Boolean>,T,Lexer.SymbolPriority>> getFunctions()
Returns the list of function rules.
Returns:
  A list of function rules.
-