dk.brics.automaton
public class RegExp extends Object
Regular expression extension to Automaton.
Regular expressions are built from the following abstract syntax:
| Nonterminal | | Production | Meaning | Flag |
|---|---|---|---|---|
| regexp | ::= | unionexp | | |
| | \| | | | |
| unionexp | ::= | interexp \| unionexp | (union) | |
| | \| | interexp | | |
| interexp | ::= | concatexp & interexp | (intersection) | [OPTIONAL] |
| | \| | concatexp | | |
| concatexp | ::= | repeatexp concatexp | (concatenation) | |
| | \| | repeatexp | | |
| repeatexp | ::= | repeatexp ? | (zero or one occurrence) | |
| | \| | repeatexp * | (zero or more occurrences) | |
| | \| | repeatexp + | (one or more occurrences) | |
| | \| | repeatexp {n} | (n occurrences) | |
| | \| | repeatexp {n,} | (n or more occurrences) | |
| | \| | repeatexp {n,m} | (n to m occurrences, including both) | |
| | \| | complexp | | |
| complexp | ::= | ~ complexp | (complement) | [OPTIONAL] |
| | \| | charclassexp | | |
| charclassexp | ::= | [ charclasses ] | (character class) | |
| | \| | [^ charclasses ] | (negated character class) | |
| | \| | simpleexp | | |
| charclasses | ::= | charclass charclasses | | |
| | \| | charclass | | |
| charclass | ::= | charexp - charexp | (character range, including end-points) | |
| | \| | charexp | | |
| simpleexp | ::= | charexp | | |
| | \| | . | (any single character) | |
| | \| | # | (the empty language) | [OPTIONAL] |
| | \| | @ | (any string) | [OPTIONAL] |
| | \| | " <Unicode string without double-quotes> " | (a string) | |
| | \| | ( ) | (the empty string) | |
| | \| | ( unionexp ) | (precedence override) | |
| | \| | < <identifier> > | (named automaton) | [OPTIONAL] |
| | \| | <n-m> | (numerical interval) | [OPTIONAL] |
| charexp | ::= | <Unicode character> | (a single non-reserved character) | |
| | \| | \ <Unicode character> | (a single character) | |
The productions marked [OPTIONAL] are only allowed
if specified by the syntax flags passed to the RegExp
constructor. The reserved characters used in the (enabled) syntax
must be escaped with backslash (\) or double-quotes
("..."). (In contrast to other regexp syntaxes,
this is required also in character classes.) Be aware that
dash (-) has a special meaning in charclass expressions.
An identifier is a string not containing right angle bracket
(>) or dash (-). Numerical intervals are
specified by non-negative decimal integers and include both end
points. If n and m have the same number of digits, the conforming
strings must have exactly that length (i.e. shorter numbers are
prefixed by 0's).
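
As a rough illustration of the escaping and interval rules above, the following sketch checks a few strings against small expressions. The class name `RegExpSyntaxSketch` and the helper `matches` are made up for this example, and it assumes that `Automaton.run(String)` tests whether the whole input string is accepted.

```java
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;

public class RegExpSyntaxSketch {

    /** Hypothetical helper: true if the whole input is accepted by the expression. */
    static boolean matches(String regexp, String input) {
        // The one-argument constructor enables all optional syntax (ALL).
        Automaton a = new RegExp(regexp).toAutomaton();
        return a.run(input);
    }

    public static void main(String[] args) {
        // '&' is reserved when intersection is enabled, so it is escaped
        // with a backslash or wrapped in double quotes.
        System.out.println(matches("a\\&b", "a&b"));    // true
        System.out.println(matches("\"a&b\"", "a&b"));  // true

        // Inside a character class an escaped dash is a literal '-',
        // so [a\-c] is the set {a, -, c} and not the range a..c.
        System.out.println(matches("[a\\-c]", "-"));    // true
        System.out.println(matches("[a\\-c]", "b"));    // false

        // Both bounds of <01-12> have two digits, so conforming strings
        // must have exactly two digits.
        System.out.println(matches("<01-12>", "07"));   // true
        System.out.println(matches("<01-12>", "7"));    // false
    }
}
```

Note that each backslash is doubled only because it appears inside a Java string literal; the expression itself contains a single backslash.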
| Field Summary | |
|---|---|
| static int | ALL: Syntax flag, enables all optional regexp syntax. |
| static int | ANYSTRING: Syntax flag, enables anystring (@). |
| static int | AUTOMATON: Syntax flag, enables named automata (<identifier>). |
| static int | COMPLEMENT: Syntax flag, enables complement (~). |
| static int | EMPTY: Syntax flag, enables empty language (#). |
| static int | INTERSECTION: Syntax flag, enables intersection (&). |
| static int | INTERVAL: Syntax flag, enables numerical intervals (<n-m>). |
| static int | NONE: Syntax flag, enables no optional regexp syntax. |
| Constructor Summary | |
|---|---|
| RegExp(String s): Constructs new RegExp from a string. | |
| RegExp(String s, int syntax_flags): Constructs new RegExp from a string. | |
| Method Summary | |
|---|---|
| Set<String> | getIdentifiers(): Returns set of automaton identifiers that occur in this regular expression. |
| boolean | setAllowMutate(boolean flag): Sets or resets allow mutate flag. |
| Automaton | toAutomaton(): Constructs new Automaton from this RegExp. |
| Automaton | toAutomaton(AutomatonProvider automaton_provider): Constructs new Automaton from this RegExp. |
| Automaton | toAutomaton(Map<String,Automaton> automata): Constructs new Automaton from this RegExp. |

RegExp(String s)
Constructs new RegExp from a string. Same as RegExp(s, ALL).
Parameters: s - regexp string
Throws: IllegalArgumentException - if an error occurred while parsing the regular expression

RegExp(String s, int syntax_flags)
Constructs new RegExp from a string.
Parameters: s - regexp string; syntax_flags - bitwise OR of the optional syntax constructs to be enabled
Throws: IllegalArgumentException - if an error occurred while parsing the regular expression
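
To make the effect of the syntax flags concrete, here is a minimal sketch that parses the same kind of expression under different flag sets. The class name `SyntaxFlagsSketch` and the helper `accepts` are hypothetical, and `Automaton.run(String)` is assumed to test whole-string acceptance.

```java
import dk.brics.automaton.RegExp;

public class SyntaxFlagsSketch {

    /** Hypothetical helper: parse with the given flags and test one input. */
    static boolean accepts(String regexp, int flags, String input) {
        return new RegExp(regexp, flags).toAutomaton().run(input);
    }

    public static void main(String[] args) {
        // With NONE, '&' is not reserved and stands for itself.
        System.out.println(accepts("a&b", RegExp.NONE, "a&b"));           // true

        // With INTERSECTION, the expression denotes strings made only of
        // characters common to both classes (b and c).
        int inter = RegExp.INTERSECTION;
        System.out.println(accepts("[a-c]+&[b-d]+", inter, "bcb"));       // true
        System.out.println(accepts("[a-c]+&[b-d]+", inter, "ab"));        // false

        // Several optional constructs are enabled by OR-ing their flags.
        int flags = RegExp.INTERSECTION | RegExp.COMPLEMENT;
        System.out.println(accepts("[a-z]+&~(foo)", flags, "bar"));       // true
        System.out.println(accepts("[a-z]+&~(foo)", flags, "foo"));       // false
    }
}
```

Enabling only the constructs an application needs keeps the set of reserved characters small, which reduces the amount of escaping required in the expressions.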

setAllowMutate(boolean flag)
Sets or resets allow mutate flag.
Parameters: flag - if true, the flag is set
Returns: previous value of the flag
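
Because the call returns the previous value, one cautious way to use it is a save-and-restore pattern, sketched below. The class name `AllowMutateSketch` and the helper `buildWithMutation` are invented for this example; what the flag changes internally is not described here, so the sketch only shows how to set it and put it back.

```java
import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;

public class AllowMutateSketch {

    /** Hypothetical helper: builds the automaton with the flag set, then restores it. */
    static Automaton buildWithMutation(String regexp) {
        RegExp r = new RegExp(regexp);
        boolean previous = r.setAllowMutate(true); // remember the old value
        try {
            return r.toAutomaton();
        } finally {
            r.setAllowMutate(previous);            // put the flag back as it was
        }
    }

    public static void main(String[] args) {
        Automaton a = buildWithMutation("ab*");
        System.out.println(a.run("abbb")); // true
        System.out.println(a.run("ba"));   // false
    }
}
```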

toAutomaton()
Constructs new Automaton from this RegExp. Same as toAutomaton(null) (empty automaton map).

toAutomaton(AutomatonProvider automaton_provider)
Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.
Parameters: automaton_provider - provider of automata for named identifiers
Throws: IllegalArgumentException - if this regular expression uses a named identifier that is not available from the automaton provider
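
The following sketch shows one way a provider might resolve a named identifier. It assumes that `AutomatonProvider` declares a single method `getAutomaton(String)` and that returning `null` marks a name as unavailable; the class name `ProviderSketch`, the identifier `digits`, and the helpers are hypothetical.

```java
import dk.brics.automaton.Automaton;
import dk.brics.automaton.AutomatonProvider;
import dk.brics.automaton.RegExp;

public class ProviderSketch {

    /** Hypothetical provider that knows a single identifier, "digits". */
    static AutomatonProvider digitsProvider() {
        Automaton digits = new RegExp("[0-9]+").toAutomaton();
        // Assumption: a null result means "no automaton with this name".
        return name -> "digits".equals(name) ? digits : null;
    }

    /** Hypothetical helper: resolve names through the provider and test one input. */
    static boolean matches(String regexp, String input) {
        return new RegExp(regexp).toAutomaton(digitsProvider()).run(input);
    }

    public static void main(String[] args) {
        // <digits> refers to the provided automaton; '.' is reserved and escaped.
        String decimal = "<digits>(\\.<digits>)?";
        System.out.println(matches(decimal, "3.14")); // true
        System.out.println(matches(decimal, "42"));   // true
        System.out.println(matches(decimal, "3."));   // false

        try {
            matches("<letters>", "abc");              // not known to the provider
        } catch (IllegalArgumentException e) {
            System.out.println("unresolved identifier: " + e.getMessage());
        }
    }
}
```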

toAutomaton(Map<String,Automaton> automata)
Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.
Parameters: automata - a map from automaton identifiers to automata (of type Automaton)
Throws: IllegalArgumentException - if this regular expression uses a named identifier that does not occur in the automaton map
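
A small sketch of the map-based variant, combined with getIdentifiers() to see which names an expression needs before building it. The class name `NamedAutomataSketch`, the identifier `vowel`, and the helper methods are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import dk.brics.automaton.Automaton;
import dk.brics.automaton.RegExp;

public class NamedAutomataSketch {

    /** Hypothetical map with one named automaton, "vowel". */
    static Map<String, Automaton> automata() {
        Map<String, Automaton> m = new HashMap<>();
        m.put("vowel", new RegExp("[aeiou]").toAutomaton());
        return m;
    }

    /** Names referenced by the expression, e.g. to validate the map up front. */
    static Set<String> identifiers(String regexp) {
        return new RegExp(regexp).getIdentifiers();
    }

    /** Hypothetical helper: resolve names through the map and test one input. */
    static boolean matches(String regexp, String input) {
        return new RegExp(regexp).toAutomaton(automata()).run(input);
    }

    public static void main(String[] args) {
        String regexp = "<vowel>+x"; // one or more vowels followed by 'x'
        System.out.println(identifiers(regexp).contains("vowel")); // true
        System.out.println(matches(regexp, "aeix"));               // true
        System.out.println(matches(regexp, "x"));                  // false
    }
}
```

Checking `getIdentifiers()` against the map's key set before calling `toAutomaton` is one way to report missing names without relying on the IllegalArgumentException.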