sunlabs.brazil.util
public class LexHTML extends LexML
This class differs slightly from LexML as follows: after certain tags,
like the <script> tag, the body that follows is
uninterpreted data and ends only at the next closing tag (in this case,
</script>), not just at the next
"<" or ">" character. This is one way that HTML is not fully
compliant with XML.
The default set of tags that have this special processing is
<script>, <style>, and
<xmp>. The user can change this by retrieving
the Vector of special tags via
getClosingTags, and modifying it as needed.
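The effect of the special-tag set can be illustrated with a small, self-contained sketch. It does not use or reproduce the Brazil implementation: the class `ClosingTagsSketch` and its helpers `defaultClosingTags`, `bodyAfterFirstTag`, and `indexOfIgnoreCase` are hypothetical names introduced only for this example, and the sketch assumes the opening tag carries no attributes. It shows how adding a tag name to the Vector changes where the body is considered to end.

```java
import java.util.Locale;
import java.util.Vector;

// Hypothetical illustration only; this is not the LexHTML source.
class ClosingTagsSketch {
    // Mirrors the documented default set of special tags.
    static Vector<String> defaultClosingTags() {
        Vector<String> tags = new Vector<>();
        tags.addElement("script");
        tags.addElement("style");
        tags.addElement("xmp");
        return tags;
    }

    // Returns the body that follows the opening tag at the start of html.
    // For a special tag the body runs to the next "</name" (any case);
    // for any other tag it stops at the next '<'.
    // Assumption: the opening tag has no attributes.
    static String bodyAfterFirstTag(String html, Vector<String> closingTags) {
        int gt = html.indexOf('>');
        if (!html.startsWith("<") || gt < 0) {
            throw new IllegalArgumentException("expected a complete opening tag");
        }
        String name = html.substring(1, gt).trim().toLowerCase(Locale.ROOT);
        int start = gt + 1;
        int end = closingTags.contains(name)
                ? indexOfIgnoreCase(html, "</" + name, start)
                : html.indexOf('<', start);
        if (end < 0) {
            end = html.length(); // no terminator: the body runs to the end
        }
        return html.substring(start, end);
    }

    // Case-insensitive indexOf; returns -1 when needle is absent.
    static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = from; i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Vector<String> tags = defaultClosingTags();
        String html = "<textarea>a<b</textarea>";
        System.out.println(bodyAfterFirstTag(html, tags)); // prints "a"
        tags.addElement("textarea");                       // now treated as special
        System.out.println(bodyAfterFirstTag(html, tags)); // prints "a<b"
    }
}
```

With the real class, the corresponding customization would presumably be made on the Vector returned by getClosingTags; whether its elements are stored as lower-case Strings is an assumption of this sketch.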
Version: 2.2
| Constructor Summary | |
|---|---|
| LexHTML(String str) | Creates a new HTML parser, which can be used to iterate over the tokens in the given string. |
| Method Summary | |
|---|---|
| Vector | getClosingTags(): Gets the set of HTML tags that have the special body-processing behavior mentioned above. |
| String | getTag(): Gets the tag name at the beginning of the current tag. |
| boolean | nextToken(): Advances to the next token, correctly handling HTML tags that have the special body-processing behavior mentioned above. |
| void | replace(String str): Changes the string that this LexHTML is parsing. |
Constructor Detail
LexHTML(String str)
Parameters: str - The HTML to parse.

Method Detail
getClosingTags()
Parameters: tags - The array of case-insensitive tag names that are only closed by seeing their "slashed" version.

getTag()
Returns: The lower-cased tag name, or null if the current token does not have a tag name.
See Also: LexML

nextToken()
This method returns the uninterpreted data making up the body of a special HTML tag as a token of type LexML.STRING, even if the body was actually a comment or another tag.
Returns: true if a token was found, false if there were no more tokens left.

replace(String str)
Parameters: str - The string that this LexHTML should now parse.
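
The nextToken behavior described above, where the body of a special tag comes back as a single string token even when it looks like a comment or another tag, can be sketched with a tiny stand-alone tokenizer. This is a hedged illustration rather than the LexHTML implementation: the class `SpecialBodyTokens`, the `TAG:` and `STRING:` prefixes, and the fixed tag list are all hypothetical, and comments and attribute quoting outside special bodies are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; this is not the LexHTML source.
class SpecialBodyTokens {
    // Assumed to match the documented default special tags.
    static final List<String> SPECIAL = Arrays.asList("script", "style", "xmp");

    // Splits html into "TAG:..." and "STRING:..." entries.
    static List<String> tokenize(String html) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            if (html.charAt(pos) != '<') {
                // Ordinary text runs up to the next '<'.
                int lt = html.indexOf('<', pos);
                int end = lt < 0 ? html.length() : lt;
                out.add("STRING:" + html.substring(pos, end));
                pos = end;
                continue;
            }
            int gt = html.indexOf('>', pos);
            int tagEnd = gt < 0 ? html.length() : gt;
            String inner = html.substring(pos + 1, tagEnd);
            out.add("TAG:" + inner);
            pos = gt < 0 ? html.length() : gt + 1;

            String name = tagName(inner);
            if (SPECIAL.contains(name)) {
                // The whole body is one uninterpreted string token.
                int bodyEnd = indexOfIgnoreCase(html, "</" + name, pos);
                if (bodyEnd > pos) {
                    out.add("STRING:" + html.substring(pos, bodyEnd));
                }
                pos = bodyEnd;
            }
        }
        return out;
    }

    // Lower-cased text up to the first whitespace inside the tag.
    static String tagName(String inner) {
        int i = 0;
        while (i < inner.length() && !Character.isWhitespace(inner.charAt(i))) {
            i++;
        }
        return inner.substring(0, i).toLowerCase(Locale.ROOT);
    }

    // Case-insensitive search; returns s.length() when needle is absent.
    static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = from; i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return s.length();
    }

    public static void main(String[] args) {
        for (String t : tokenize("<p>hi</p><script><!-- a<b --></script>end")) {
            System.out.println(t);
        }
    }
}
```

In this sketch the text `<!-- a<b -->` inside the script element is emitted as one `STRING:` entry instead of being split at its "<" and ">" characters, which is the distinction the method description draws.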