Package org.htmlparser.tags
This package contains implementations of tags that have functionality beyond the capability of a generic tag. For example, the <META> tag has methods to get the CONTENT and NAME attributes (although this could be done with generic attribute manipulation) and an implementation of doSemanticAction that alters the lexer's encoding.
The classes in this package have been added in an ad hoc fashion: the most useful ones have existed for a long time, while some obvious ones are rather new. Please feel free to add your own custom tags and register them with the PrototypicalNodeFactory; they will be treated like any other built-in tag. In fact, tags do not need to reside in this package.
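The registration idea can be illustrated with a small, self-contained model. This is only a sketch of the prototype pattern that the name PrototypicalNodeFactory suggests, not the library's code: SketchTag, BlinkTag, GenericTag and SketchNodeFactory are hypothetical stand-ins, and the assumption that a registered prototype is cloned for each occurrence is made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a tag; not the library's Tag interface. */
abstract class SketchTag implements Cloneable {
    /** Upper-case names this tag answers to, for example {"HTML"}. */
    abstract String[] getIds();

    @Override
    public SketchTag clone() {
        try {
            return (SketchTag) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: the class is Cloneable
        }
    }
}

/** A custom tag; it could live in any package. */
class BlinkTag extends SketchTag {
    private static final String[] IDS = {"BLINK"};

    @Override
    String[] getIds() {
        return IDS;
    }
}

/** Fallback used when no prototype is registered for a name. */
class GenericTag extends SketchTag {
    @Override
    String[] getIds() {
        return new String[0];
    }
}

/** Hypothetical registry that maps tag names to prototypes (assumed design). */
class SketchNodeFactory {
    private final Map<String, SketchTag> prototypes = new HashMap<>();

    /** Registers the prototype under every id it reports. */
    void registerTag(SketchTag prototype) {
        for (String id : prototype.getIds()) {
            prototypes.put(id.toUpperCase(Locale.ROOT), prototype);
        }
    }

    /** Returns a copy of the registered prototype, or a generic tag if none matches. */
    SketchTag createTag(String name) {
        SketchTag prototype = prototypes.get(name.toUpperCase(Locale.ROOT));
        return prototype == null ? new GenericTag() : prototype.clone();
    }
}
```

In this model, once `registerTag(new BlinkTag())` has been called, `createTag("blink")` yields a fresh BlinkTag while unregistered names fall back to GenericTag, which mirrors the claim that a registered custom tag is handled like a built-in one.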
Custom Tags
Creating custom tags is fairly straightforward. Simply copy one of the simpler tags you find in this package and alter it as follows.
If the tag can contain other nodes, e.g. a heading tag wrapping the text "My Heading", then it should derive from (i.e. be a subclass of) CompositeTag. In this way it will inherit the CompositeTagScanner, and nodes between the start and end tag will be gathered into the list of children. Most of the tags in this package derive from CompositeTag, and that is why the nodes returned from the Parser are nested.
If it is a simple tag, i.e. one that cannot contain other nodes, then it should derive from TagNode. See for example MetaTag or ImageTag.
To be registered with PrototypicalNodeFactory.registerTag(org.htmlparser.Tag), and especially if it is a composite tag, the tag needs to implement getIds, which returns the UPPERCASE list of names for the tag (usually only one), for example "HTML". If the tag can be smart enough to know what other tags can't be contained within it, it should also implement getEnders(), which returns the list of other tags that should cause this tag to close itself, and getEndTagEnders(), which returns the list of end tags, other than its own name, that should cause this tag to close itself. When these 'ender' lists cause a tag to end before seeing its own end tag, a virtual end tag is created and 'inserted' at the location where the end tag should have been. These end tags can be distinguished because their starting and ending locations are the same (i.e. they take up no character length in the HTML stream).
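The ender mechanism described above can be sketched as a small stand-alone model. SketchToken, SketchTagSpec and EnderSketch are hypothetical names, not library classes, and the choice to place the virtual end tag at the start offset of the token that triggered it is an assumption made for illustration.

```java
import java.util.List;

/** Hypothetical tag token with its character span in the input. */
final class SketchToken {
    final String name;    // upper-case tag name
    final boolean isEnd;  // true for an end tag
    final int start;      // offset of the first character
    final int end;        // offset just past the last character

    SketchToken(String name, boolean isEnd, int start, int end) {
        this.name = name;
        this.isEnd = isEnd;
        this.start = start;
        this.end = end;
    }

    /** A virtual end tag takes up no characters in the stream. */
    boolean isVirtual() {
        return isEnd && start == end;
    }
}

/** Hypothetical description of a composite tag: its id plus both ender lists. */
final class SketchTagSpec {
    final String id;
    final List<String> enders;        // start tags that close this tag
    final List<String> endTagEnders;  // end tags of other names that close this tag

    SketchTagSpec(String id, List<String> enders, List<String> endTagEnders) {
        this.id = id;
        this.enders = enders;
        this.endTagEnders = endTagEnders;
    }
}

final class EnderSketch {
    /**
     * Walks the tokens that follow an open tag and returns the end tag that
     * closes it: the real one if it appears first, otherwise a zero-length
     * virtual one created when an ender list matches.
     */
    static SketchToken findEnd(SketchTagSpec open, List<SketchToken> following) {
        for (SketchToken t : following) {
            if (t.isEnd && t.name.equals(open.id)) {
                return t; // the tag's own end tag
            }
            boolean fired = t.isEnd
                    ? open.endTagEnders.contains(t.name)
                    : open.enders.contains(t.name);
            if (fired) {
                // Assumed placement: immediately before the token that fired.
                return new SketchToken(open.id, true, t.start, t.start);
            }
        }
        return null; // this model does not handle running out of input
    }
}
```

With an LI spec whose enders are {"LI"} and whose end-tag enders are {"UL", "OL"}, a second LI start tag at offset 5 produces a virtual end tag spanning 5 to 5, whereas an explicit LI end tag is returned unchanged with its real, non-zero span.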
Classes
AppletTag represents an <Applet> tag.
BaseHrefTag represents a <Base> tag.
A Body Tag.
A bullet tag.
A bullet list tag.
The base class for tags that have an end tag.
A definition list tag (dl).
A definition list bullet tag (either DD or DT).
A div tag.
The HTML Document Declaration Tag can identify <!DOCTYPE> tags.
Represents a FORM tag.
Identifies a frame set tag.
Identifies a frame tag.
A heading (h1 - h6) tag.
A head tag.
A html tag.
Identifies an image tag.
An input tag in a form.
The JSP/ASP tags like <%...%> can be identified by this class.
A label tag.
Identifies a link tag.
A Meta Tag.
ObjectTag represents an <Object> tag.
An option tag within a form.
A paragraph (p) tag.
The XML processing instructions like <?xml ...
A script tag.
A select tag within a form.
A span tag.
A StyleTag represents a <style> tag.
A table column tag.
A table header tag.
A table row tag.
A table tag.
A text area tag within a form.
A title tag.