public abstract class StatefulTokenizer extends Object
Different sets of rules can be defined for named states with putRules(String, Rule[]), e.g. a separate state for string processing. Each rule in a set produces one token with a type name, and it can switch to another state or return to the previous state with the special state name "#pop". The output of tokenize(String) is a list of tokens of arbitrary type.

The list of produced tokens can be filtered: tokens of the same type can be joined by registering the type with addJoinedType(Object), and tokens can be omitted from the result for easier post-processing by registering their type with addIgnoredType(Object).

Modifier and Type | Class and Description |
---|---|
protected static class | StatefulTokenizer.Rule: A regular expression based rule for building a parsing grammar. |
static class | StatefulTokenizer.Token: A token that designates a certain section of a text input. |
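The description above implies a state machine: the tokenizer keeps a stack of state names that starts at the initial state, the first rule of the current state that matches at the current position emits a token, and that rule may push a new state or, via "#pop", return to the previous one. The standalone sketch below illustrates this idea only. It does not use the library's StatefulTokenizer, Rule, or Token classes (their constructors are not documented on this page); the class StateMachineSketch, its nested Rule and Token types, the "#initial" state name, the first-match policy, and throwing when no rule matches are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch of a state-based regex tokenizer; it does not
// use or reproduce the library's StatefulTokenizer, Rule, or Token classes.
public class StateMachineSketch {
    static final String INITIAL = "#initial"; // assumed name for the start state
    static final String POP = "#pop";         // special target: return to the previous state

    // One rule: a pattern, the type of the token it produces, and an optional state switch.
    static final class Rule {
        final Pattern pattern;
        final String type;
        final String next; // null means "stay in the current state"

        Rule(String regex, String type, String next) {
            this.pattern = Pattern.compile(regex);
            this.type = type;
            this.next = next;
        }
    }

    // A token covering the input range [start, end).
    static final class Token {
        final String type;
        final String text;
        final int start, end;

        Token(String type, String text, int start, int end) {
            this.type = type;
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    private final Map<String, List<Rule>> grammar = new HashMap<>();

    // Replaces the rule set of the given state.
    void putRules(String state, Rule... rules) {
        grammar.put(state, Arrays.asList(rules));
    }

    List<Token> tokenize(String data) {
        List<Token> tokens = new ArrayList<>();
        Deque<String> states = new ArrayDeque<>();
        states.push(INITIAL);
        int pos = 0;
        while (pos < data.length()) {
            Token token = null;
            for (Rule rule : grammar.getOrDefault(states.peek(), Collections.<Rule>emptyList())) {
                // Try to match the rule exactly at the current position.
                Matcher m = rule.pattern.matcher(data).region(pos, data.length());
                if (m.lookingAt() && m.end() > pos) { // first non-empty match wins
                    token = new Token(rule.type, m.group(), pos, m.end());
                    if (POP.equals(rule.next)) {
                        if (states.size() > 1) {
                            states.pop(); // never pop the initial state
                        }
                    } else if (rule.next != null) {
                        states.push(rule.next);
                    }
                    break;
                }
            }
            if (token == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            tokens.add(token);
            pos = token.end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        StateMachineSketch t = new StateMachineSketch();
        // Initial state: whitespace, words, and an opening quote that enters "string".
        t.putRules(INITIAL,
                new Rule("\\s+", "space", null),
                new Rule("\\w+", "word", null),
                new Rule("\"", "quote", "string"));
        // String state: plain text, backslash escapes, and a closing quote that pops back.
        t.putRules("string",
                new Rule("[^\"\\\\]+", "text", null),
                new Rule("\\\\.", "escape", null),
                new Rule("\"", "quote", POP));
        System.out.println(t.tokenize("say \"a\\\"b\" now"));
    }
}
```

The stack (rather than a single "current state" variable) is what makes "#pop" meaningful: a state such as "string" can be entered from several places and always returns to whichever state entered it.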
Modifier and Type | Field and Description |
---|---|
protected static String | INITIAL_STATE: The name of the initial state. |
Modifier | Constructor and Description |
---|---|
protected | StatefulTokenizer(): Initializes the internal data structures of a new instance. |
Modifier and Type | Method and Description |
---|---|
protected void | addIgnoredType(Object tokenType): Adds a token type to the set of tokens that should be ignored in the tokenizer output. |
protected void | addJoinedType(Object tokenType): Adds a token type to the set of tokens that should be joined in the tokenizer output. |
protected void | putRules(StatefulTokenizer.Rule... rules): Sets the rules for the initial state in the grammar. |
protected void | putRules(String name, StatefulTokenizer.Rule... rules): Sets the rules for the specified state in the grammar. |
List<StatefulTokenizer.Token> | tokenize(String data): Analyzes the specified input string using different sets of rules and returns a list of token objects describing the content structure. |
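Joined and ignored types act as a post-processing filter on the raw token list. The standalone sketch below assumes that "joined" means merging consecutive tokens of the same registered type into one token and that "ignored" means dropping tokens of a registered type entirely. TokenFilterSketch, its nested Token type, and its filter method are hypothetical; the library may apply these steps differently, for example in another order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of joined/ignored-type filtering; not the library's code.
public class TokenFilterSketch {
    static final class Token {
        final Object type; // arbitrary type, as with addJoinedType(Object)
        final String text;

        Token(Object type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    private final Set<Object> joinedTypes = new HashSet<>();
    private final Set<Object> ignoredTypes = new HashSet<>();

    void addJoinedType(Object tokenType) {
        joinedTypes.add(tokenType);
    }

    void addIgnoredType(Object tokenType) {
        ignoredTypes.add(tokenType);
    }

    List<Token> filter(List<Token> raw) {
        // Pass 1: merge runs of adjacent tokens that share a joined type.
        List<Token> joined = new ArrayList<>();
        for (Token t : raw) {
            int lastIndex = joined.size() - 1;
            if (lastIndex >= 0 && joinedTypes.contains(t.type)
                    && Objects.equals(joined.get(lastIndex).type, t.type)) {
                Token last = joined.get(lastIndex);
                joined.set(lastIndex, new Token(t.type, last.text + t.text));
            } else {
                joined.add(t);
            }
        }
        // Pass 2: drop every token whose type is registered as ignored.
        List<Token> result = new ArrayList<>();
        for (Token t : joined) {
            if (!ignoredTypes.contains(t.type)) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TokenFilterSketch f = new TokenFilterSketch();
        f.addJoinedType("text");
        f.addIgnoredType("space");
        List<Token> raw = Arrays.asList(
                new Token("word", "say"), new Token("space", " "),
                new Token("text", "h"), new Token("text", "i"),
                new Token("space", " "), new Token("word", "now"));
        System.out.println(f.filter(raw)); // [word(say), text(hi), word(now)]
    }
}
```

Because joining runs before ignoring in this sketch, two tokens of a joined type that are separated by an ignored token stay separate; performing the steps in the opposite order would merge them.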
protected static final String INITIAL_STATE

protected StatefulTokenizer()

protected void addJoinedType(Object tokenType)
Parameters:
tokenType - Type of the tokens that should be joined.

protected void addIgnoredType(Object tokenType)
Parameters:
tokenType - Type of the tokens that should be ignored.

protected void putRules(StatefulTokenizer.Rule... rules)
Parameters:
rules - A sequence or an array with rules to be added.

protected void putRules(String name, StatefulTokenizer.Rule... rules)
Parameters:
name - A unique name to identify the rule set.
rules - A sequence or an array with rules to be added.

public List<StatefulTokenizer.Token> tokenize(String data)
Parameters:
data - Input string.

Copyright © 2009-2013. All Rights Reserved.