public interface HtmlParser extends Parser
Note: These are the exact methods exposed in the original C++ Parser. The names are simply modified to conform to Java.
| Modifier and Type | Interface and Description |
|---|---|
| static class | HtmlParser.ATTR_TYPE: Indicates the type of HTML attribute that the parser is currently in, or NONE if the parser is not currently in an attribute. |
| static class | HtmlParser.Mode: The Parser Mode requested for parsing a given template. |

| Modifier and Type | Field and Description |
|---|---|
| static ExternalState | STATE_ATTR |
| static ExternalState | STATE_COMMENT |
| static ExternalState | STATE_CSS_FILE |
| static ExternalState | STATE_JS_FILE |
| static ExternalState | STATE_TAG |
| static ExternalState | STATE_TEXT: All the states in which the parser can be. |
| static ExternalState | STATE_VALUE |

Fields inherited from interface Parser: STATE_ERROR

| Modifier and Type | Method and Description |
|---|---|
| String | getAttribute(): Returns the name of the HTML attribute the parser is currently processing. |
| HtmlParser.ATTR_TYPE | getAttributeType(): Returns the type of the attribute that the parser is in, or ATTR_TYPE.NONE if we are not parsing an attribute. |
| ExternalState | getJavascriptState(): Returns the state the Javascript parser is in. |
| String | getTag(): Returns the name of the HTML tag if the parser is currently within one. |
| String | getValue(): Returns the value of an HTML attribute if the parser is currently within one. |
| int | getValueIndex(): Returns the current position of the parser within the HTML attribute value, zero being the position of the first character in the value. |
| boolean | inAttribute(): Returns true if and only if the parser is currently within an attribute, be it within the attribute name or the attribute value. |
| boolean | inCss(): Returns true if and only if the parser is currently within a CSS context. |
| boolean | inJavascript(): Returns true if the parser is currently processing Javascript. |
| void | insertText(): A specialized directive to tell the parser there is some content that will be inserted here but that it will not get to parse. |
| boolean | isAttributeQuoted(): Returns true if and only if the parser is currently within an attribute value and that attribute value is quoted. |
| boolean | isJavascriptQuoted(): Returns true if the parser is currently processing a Javascript literal that is quoted. |
| boolean | isUrlStart(): Returns true if and only if the current position of the parser is at the start of a URL HTML attribute value. |
| void | resetMode(HtmlParser.Mode mode): Resets the state of the parser, allowing for reuse of the HtmlParser object. |

Methods inherited from interface Parser: getColumnNumber, getLineNumber, getState, parse, parse, reset, setColumnNumber, setLineNumber

static final ExternalState STATE_TEXT
All the states in which the parser can be.
STATE_TEXT: the parser is in HTML proper.
STATE_TAG: the parser is inside an HTML tag name.
STATE_COMMENT: the parser is inside an HTML comment.
STATE_ATTR: the parser is inside an HTML attribute name.
STATE_VALUE: the parser is inside an HTML attribute value.
STATE_JS_FILE: the parser is inside Javascript code.
STATE_CSS_FILE: the parser is inside CSS code.
All these states map exactly to those exposed in the C++ (original) version of the HtmlParser.
static final ExternalState STATE_TAG
static final ExternalState STATE_COMMENT
static final ExternalState STATE_ATTR
static final ExternalState STATE_VALUE
static final ExternalState STATE_JS_FILE
static final ExternalState STATE_CSS_FILE
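
A caller typically consults the states listed above to decide how to treat text it is about to emit. The following is only a minimal sketch under stated assumptions: the State enum is a hypothetical stand-in that mirrors the STATE_* names on this page (the real constants are ExternalState objects obtained from Parser.getState(), so they would be compared rather than switched on), and the escaping labels are invented for illustration.

```java
class StateDispatch {
    // Hypothetical stand-in that mirrors the STATE_* names documented above.
    // The real constants are ExternalState objects, not enum values.
    enum State { TEXT, TAG, COMMENT, ATTR, VALUE, JS_FILE, CSS_FILE, ERROR }

    // Maps a parser state to an illustrative escaping label.
    // The labels are invented for this sketch and are not library names.
    static String escapingFor(State state) {
        switch (state) {
            case TEXT:     return "html";
            case VALUE:    return "html-attribute";
            case JS_FILE:  return "javascript";
            case CSS_FILE: return "css";
            case TAG:
            case ATTR:
            case COMMENT:  return "refuse"; // this sketch declines to insert content here
            default:       return "error";  // ERROR, or anything unexpected
        }
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> " + escapingFor(s));
        }
    }
}
```

Refusing insertion inside tag names, attribute names and comments is a design choice of this sketch, not a behaviour documented for the parser.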
boolean inJavascript()
Returns true if the parser is currently processing Javascript. Such is the case if and only if the parser is processing an attribute that takes Javascript, a Javascript script block, or the parser is (re)set with HtmlParser.Mode.JS.
Returns: true if the parser is processing Javascript, false otherwise

boolean isJavascriptQuoted()
Returns true if the parser is currently processing a Javascript literal that is quoted. The caller will typically invoke this method after determining that the parser is processing Javascript. Knowing whether the element is quoted or not helps determine which escaping to apply to it when needed.
Returns: true if and only if the parser is inside a quoted Javascript literal

boolean inAttribute()
Returns true if and only if the parser is currently within an attribute, be it within the attribute name or the attribute value.
Returns: true if and only if inside an attribute

boolean inCss()
Returns true if and only if the parser is currently within a CSS context. A CSS context is one of the below:
Returns: true if and only if the parser is inside CSS

HtmlParser.ATTR_TYPE getAttributeType()
Returns the type of the attribute that the parser is in, or ATTR_TYPE.NONE if we are not parsing an attribute. The caller will typically invoke this method after determining that the parser is processing an attribute.
This is useful to determine which escaping to apply based on the type of value this attribute expects.
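
The pattern described here (confirm the parser is in an attribute, then pick an escaping by attribute type and quoting) might look roughly like the sketch below. It is not the library's code: AttributeContext is a hypothetical interface exposing only the three documented queries, AttrType is a stand-in for HtmlParser.ATTR_TYPE in which only NONE and URI are confirmed by this page (REGULAR, JS and STYLE are assumptions), and the returned labels are invented.

```java
class AttributeEscaping {
    // Stand-in for HtmlParser.ATTR_TYPE. NONE and URI appear on this page;
    // REGULAR, JS and STYLE are assumed here for illustration only.
    enum AttrType { NONE, REGULAR, URI, JS, STYLE }

    // Hypothetical view exposing only the documented queries used below.
    interface AttributeContext {
        boolean inAttribute();
        AttrType getAttributeType();
        boolean isAttributeQuoted();
    }

    // Builds a context with fixed answers, so the sketch runs without a parser.
    static AttributeContext fixed(final boolean inAttr, final AttrType type,
                                  final boolean quoted) {
        return new AttributeContext() {
            public boolean inAttribute() { return inAttr; }
            public AttrType getAttributeType() { return type; }
            public boolean isAttributeQuoted() { return quoted; }
        };
    }

    // Chooses an illustrative escaping label from the attribute context.
    static String choose(AttributeContext ctx) {
        if (!ctx.inAttribute()) {
            return "not-in-attribute";
        }
        String base;
        switch (ctx.getAttributeType()) {
            case URI:   base = "url"; break;
            case JS:    base = "javascript"; break;
            case STYLE: base = "css"; break;
            default:    base = "html-attribute"; break;
        }
        // Unquoted values usually need a stricter escaper than quoted ones.
        return ctx.isAttributeQuoted() ? base : base + "-unquoted";
    }

    public static void main(String[] args) {
        System.out.println(choose(fixed(true, AttrType.URI, true)));
        System.out.println(choose(fixed(true, AttrType.REGULAR, false)));
    }
}
```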
Returns: HtmlParser.ATTR_TYPE

boolean isAttributeQuoted()
Returns true if and only if the parser is currently within an attribute value and that attribute value is quoted.
Returns: true if and only if the attribute value is quoted

String getTag()
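Returns the name of the HTML tag if the parser is currently within one. Returns an empty String if the parser is not in a tag as determined by getCurrentExternalState.
Returns: the name of the HTML tag, or an empty String if we are not within an HTML tag

String getAttribute()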
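Returns the name of the HTML attribute the parser is currently processing. Returns an empty String if the parser is not in an attribute as determined by getCurrentExternalState.
Returns: the name of the HTML attribute, or an empty String if we are not within an HTML attribute

String getValue()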
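Returns the value of an HTML attribute if the parser is currently within one. Returns an empty String if the parser is not in an attribute value as determined by getCurrentExternalState.
Returns: the value, or an empty String if the parser is not in an HTML attribute value

int getValueIndex()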
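Returns the current position of the parser within the HTML attribute value, zero being the position of the first character in the value. See Parser.getState() for the current parser state.

boolean isUrlStart()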
Returns true if and only if the current position of the parser is at the start of a URL HTML attribute value. This is the case when the following three conditions are all met:
getAttributeType() returning HtmlParser.ATTR_TYPE.URI.
This method may be used by an HTML sanitizer or an auto-escape system to determine whether to validate the URL for well-formedness and to validate that the scheme of the URL (e.g. HTTP, HTTPS) is safe.
In particular, it is recommended to use this method instead of checking that getValueIndex() is 0, in order to support attribute types where the URL does not start at index zero, such as the content attribute of the meta HTML tag.
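
As an illustration of the scheme validation mentioned above, the sketch below shows what a sanitizer might do with the answer from isUrlStart(). It is a simplified assumption-laden example, not the library's behaviour: isSafeToInsert is a hypothetical helper, the atUrlStart flag stands for the value isUrlStart() returned at the insertion point, and the allow-list (http, https, or no scheme at all) is chosen only for demonstration.

```java
import java.util.Locale;

class UrlStartCheck {
    // Hypothetical helper: decides whether text may be inserted at the
    // current position. atUrlStart is the result of isUrlStart() there.
    static boolean isSafeToInsert(boolean atUrlStart, String text) {
        if (!atUrlStart) {
            // Not at the start of a URL: the scheme check does not apply
            // in this sketch (other escaping would still be needed).
            return true;
        }
        String lower = text.trim().toLowerCase(Locale.ROOT);
        int colon = lower.indexOf(':');
        int delim = firstDelimiter(lower);
        // No scheme if there is no colon, or the first colon only appears
        // after a path, query or fragment delimiter (a relative URL).
        if (colon < 0 || (delim >= 0 && delim < colon)) {
            return true;
        }
        String scheme = lower.substring(0, colon);
        return scheme.equals("http") || scheme.equals("https");
    }

    // Index of the first '/', '?' or '#', or -1 if none is present.
    private static int firstDelimiter(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(isSafeToInsert(true, "https://example.com/"));
        System.out.println(isSafeToInsert(true, "javascript:alert(1)"));
    }
}
```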
Returns: true if and only if the parser is at the start of the URL

void resetMode(HtmlParser.Mode mode)
Resets the state of the parser, allowing for reuse of the HtmlParser object.
See the HtmlParser.Mode enum for information on all the valid modes.
Parameters: mode - an enum representing the high-level state of the parser

void insertText()
throws ParseException
A specialized directive to tell the parser there is some content that will be inserted here but that it will not get to parse.
Returns false if and only if the parser encountered a fatal error which prevents it from continuing further parsing.
Note: The return value is different from the C++ Parser which always returns true but in my opinion makes more sense.
Throws: ParseException - if an unrecoverable error occurred during parsing

ExternalState getJavascriptState()
Returns the state the Javascript parser is in.
See JavascriptParser for more information on the valid external states. The caller will typically first determine that the parser is processing Javascript and then invoke this method to obtain more fine-grained state information.
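
The two-step pattern described above (first confirm the Javascript context, then ask a finer-grained question) can be sketched as follows. Because the JavascriptParser states are not listed on this page, the sketch uses isJavascriptQuoted() as the finer-grained query instead of getJavascriptState(). JsContext and escapeForJs are hypothetical names, and the escaping rules are a simplified assumption, not the escaper any particular auto-escape system uses.

```java
class JsEscaping {
    // Hypothetical view exposing only the two documented queries used below.
    interface JsContext {
        boolean inJavascript();
        boolean isJavascriptQuoted();
    }

    // Builds a context with fixed answers, so the sketch runs without a parser.
    static JsContext fixed(final boolean inJs, final boolean quoted) {
        return new JsContext() {
            public boolean inJavascript() { return inJs; }
            public boolean isJavascriptQuoted() { return quoted; }
        };
    }

    static String escapeForJs(JsContext ctx, String value) {
        if (!ctx.inJavascript()) {
            return value; // other contexts are out of scope for this sketch
        }
        if (ctx.isJavascriptQuoted()) {
            // Inside a quoted literal: escape characters that could end the
            // literal or the enclosing script block.
            StringBuilder sb = new StringBuilder();
            for (char c : value.toCharArray()) {
                switch (c) {
                    case '\\': sb.append("\\\\"); break;
                    case '\'': sb.append("\\x27"); break;
                    case '"':  sb.append("\\x22"); break;
                    case '<':  sb.append("\\x3c"); break;
                    case '>':  sb.append("\\x3e"); break;
                    case '\n': sb.append("\\n"); break;
                    case '\r': sb.append("\\r"); break;
                    default:   sb.append(c);
                }
            }
            return sb.toString();
        }
        // Outside quotes: accept only a plain number or boolean literal and
        // replace anything else with null.
        return value.matches("-?\\d+(\\.\\d+)?|true|false") ? value : "null";
    }

    public static void main(String[] args) {
        System.out.println(escapeForJs(fixed(true, true), "it's <b>"));
        System.out.println(escapeForJs(fixed(true, false), "alert(1)"));
    }
}
```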
Copyright © 2010-2012 Google. All Rights Reserved.