Package org.htmlparser.util
Class ParserUtils
java.lang.Object
org.htmlparser.util.ParserUtils
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
static Parser | createParserParsingAnInputString(String input) | Create a Parser object that takes a String as input (instead of a URL or a string representing the URL location).
static Node[] | findTypeInNode(Node node, Class type) | Search the given node and pick up any objects of the given type.
static String | removeChars(String s, char occur) |
static String | removeEscapeCharacters(String inputString) |
static String | removeTrailingBlanks(String text) |
static String[] | splitButChars(String input, String charsDoNotBeRemoved) | Split the input string, treating every character as a separator except the characters specified in the charsDoNotBeRemoved parameter.
static String[] | splitButDigits(String input, String charsDoNotBeRemoved) | Split the input string, treating every non-numeric character as a separator except the characters specified in the charsDoNotBeRemoved parameter.
static String[] | splitChars(String input, String charsToBeRemoved) | Split the input string, treating the characters specified in charsToBeRemoved as separators.
static String[] | splitSpaces(String input, String charsToBeRemoved) | Split the input string, treating all spaces, tab-like characters and the characters specified in charsToBeRemoved as separators.
static String[] | splitTags(String input, String[] tags) | Split the input string into a string array, using the tags as delimiters.
static String[] | splitTags(String input, String[] tags, boolean recursive, boolean insideTag) | Split the input string into a string array, using the tags as delimiters.
static String[] | splitTags(String input, Class nodeType) | Split the input string into a string array, using the tags as delimiters.
static String[] | splitTags(String input, Class nodeType, boolean recursive, boolean insideTag) | Split the input string into a string array, using the tags as delimiters.
static String[] | splitTags(String input, NodeFilter filter) | Split the input string into a string array, using the tags as delimiters.
static String[] | splitTags(String input, NodeFilter filter, boolean recursive, boolean insideTag) | Split the input string into a string array, using the tags as delimiters.
static String | trimAllTags(String input, boolean inside) | Trim the input string, removing all the tags it contains.
static String | trimButChars(String input, String charsDoNotBeRemoved) | Remove from the input string all characters except the characters specified in the charsDoNotBeRemoved parameter.
static String | trimButCharsBeginEnd(String input, String charsDoNotBeRemoved) | Remove from the beginning and the end of the input string all characters except the characters specified in the charsDoNotBeRemoved parameter.
static String | trimButDigits(String input, String charsDoNotBeRemoved) | Remove from the input string all non-numeric characters except the characters specified in the charsDoNotBeRemoved parameter.
static String | trimButDigitsBeginEnd(String input, String charsDoNotBeRemoved) | Remove from the beginning and the end of the input string all non-numeric characters except the characters specified in the charsDoNotBeRemoved parameter.
static String | trimChars(String input, String charsToBeRemoved) | Remove from the input string all the characters specified in charsToBeRemoved.
static String | trimCharsBeginEnd(String input, String charsToBeRemoved) | Remove from the beginning and the end of the input string all the characters specified in charsToBeRemoved.
static String | trimSpaces(String input, String charsToBeRemoved) | Remove from the input string all spaces and tab-like characters.
static String | trimSpacesBeginEnd(String input, String charsToBeRemoved) | Remove from the beginning and the end of the input string all spaces and tab-like characters.
static String | trimTags(String input, String[] tags) | Trim all tags in the input string and return the input without the tags and their content.
static String | trimTags(String input, String[] tags, boolean recursive, boolean insideTag) | Trim all tags in the input string and return the input without the tags and, optionally, their content.
static String | trimTags(String input, Class nodeType) | Trim all tags in the input string and return the input without the tags and their content.
static String | trimTags(String input, Class nodeType, boolean recursive, boolean insideTag) | Trim all tags in the input string and return the input without the tags and, optionally, their content.
static String | trimTags(String input, NodeFilter filter) | Trim all tags in the input string and return the input without the tags and their content.
static String | trimTags(String input, NodeFilter filter, boolean recursive, boolean insideTag) | Trim all tags in the input string and return the input without the tags and, optionally, their content.
-
Constructor Details
-
ParserUtils
public ParserUtils()
-
-
Method Details
-
removeChars
-
removeEscapeCharacters
-
removeTrailingBlanks
-
findTypeInNode
Search the given node and pick up any objects of the given type.
Parameters:
node - The node to search.
type - The class to search for.
Returns:
A node array with the matching nodes.
-
splitButDigits
Split the input string, treating every non-numeric character as a separator except the characters specified in the charsDoNotBeRemoved parameter.
For example, if you call splitButDigits("<DIV> +12.5, +3.4 </DIV>", "+."),
you obtain the string array {"+12.5", "+3.4"} as output (1, 2, 3, 4 and 5 are digits, and + and . are characters that must not be removed).
Parameters:
input - The input string.
charsDoNotBeRemoved - The characters that must not be removed.
Returns:
The resulting array of strings.
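The splitting rule described above can be illustrated with a small standalone sketch. It is not the ParserUtils source: the class name SplitButDigitsSketch is hypothetical, and dropping empty tokens is an assumption inferred from the documented example, which shows none.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the documented rule; not the org.htmlparser.util.ParserUtils source.
class SplitButDigitsSketch {

    // Collects maximal runs of digits and kept characters; every other character ends a token.
    static String[] splitButDigits(String input, String charsDoNotBeRemoved) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isDigit(c) || charsDoNotBeRemoved.indexOf(c) >= 0) {
                current.append(c);
            } else if (current.length() > 0) {
                // A separator closes the token collected so far (assumption: empty tokens are dropped).
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] parts = splitButDigits("<DIV> +12.5, +3.4 </DIV>", "+.");
        System.out.println(String.join("|", parts)); // prints +12.5|+3.4
    }
}
```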
-
trimButDigits
Remove from the input string all non-numeric characters except the characters specified in the charsDoNotBeRemoved parameter.
For example, if you call trimButDigits("<DIV> +12.5 </DIV>", "+."),
you obtain the string "+12.5" as output (1, 2 and 5 are digits, and + and . are characters that must not be removed).
For example, if you call trimButDigits("<DIV> +1 2 . 5 </DIV>", "+."),
you obtain the string "+12.5" as output (the spaces between 1 and 2, 2 and ., and . and 5 are removed).
Parameters:
input - The input string.
charsDoNotBeRemoved - The characters that must not be removed.
Returns:
The resulting string.
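As an illustration of the filtering rule, the following standalone sketch reproduces both documented examples. It is an assumption-based reimplementation under the hypothetical class name TrimButDigitsSketch, not the library code, and it uses Character.isDigit as its notion of "digit".

```java
// Standalone sketch of the documented rule; not the org.htmlparser.util.ParserUtils source.
class TrimButDigitsSketch {

    // Keeps digits and the explicitly allowed characters, dropping everything else.
    static String trimButDigits(String input, String charsDoNotBeRemoved) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isDigit(c) || charsDoNotBeRemoved.indexOf(c) >= 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(trimButDigits("<DIV> +12.5 </DIV>", "+."));    // prints +12.5
        System.out.println(trimButDigits("<DIV> +1 2 . 5 </DIV>", "+.")); // prints +12.5
    }
}
```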
-
trimButDigitsBeginEnd
Remove from the beginning and the end of the input string all non-numeric characters except the characters specified in the charsDoNotBeRemoved parameter.
Only characters at the beginning and at the end of the string are removed.
For example, if you call trimButDigitsBeginEnd("<DIV> +12.5 </DIV>", "+."),
you obtain the string "+12.5" as output (1, 2 and 5 are digits, and + and . are characters that must not be removed).
For example, if you call trimButDigitsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+."),
you obtain the string "+1 2 . 5" as output (the spaces inside the string are not removed).
Parameters:
input - The input string.
charsDoNotBeRemoved - The characters that must not be removed.
Returns:
The resulting string.
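The difference from trimButDigits is that only the two ends are trimmed. A minimal standalone sketch of that idea follows. The class and helper names are hypothetical, and the real method may treat edge cases such as an input with no kept characters differently.

```java
// Standalone sketch of the documented rule; not the org.htmlparser.util.ParserUtils source.
class TrimButDigitsBeginEndSketch {

    // A character survives trimming if it is a digit or explicitly allowed.
    static boolean isKept(char c, String charsDoNotBeRemoved) {
        return Character.isDigit(c) || charsDoNotBeRemoved.indexOf(c) >= 0;
    }

    // Moves both ends inward until each rests on a kept character; the middle is untouched.
    static String trimButDigitsBeginEnd(String input, String charsDoNotBeRemoved) {
        int start = 0;
        int end = input.length();
        while (start < end && !isKept(input.charAt(start), charsDoNotBeRemoved)) {
            start++;
        }
        while (end > start && !isKept(input.charAt(end - 1), charsDoNotBeRemoved)) {
            end--;
        }
        return input.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(trimButDigitsBeginEnd("<DIV> +12.5 </DIV>", "+."));    // prints +12.5
        System.out.println(trimButDigitsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+.")); // prints +1 2 . 5
    }
}
```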
-
splitSpaces
Split the input string, treating all spaces, tab-like characters and the characters specified in charsToBeRemoved as separators.
For example, if you call splitSpaces("<DIV> +12.5, +3.4 </DIV>", "<>DIV/,"),
you obtain the string array {"+12.5", "+3.4"} as output (space characters, <, >, D, I, V, / and the comma are characters that must be removed).
Parameters:
input - The input string.
charsToBeRemoved - The characters to be removed.
Returns:
The resulting array of strings.
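The behaviour in the example above resembles what java.util.StringTokenizer does with a delimiter set, so one way to sketch it is shown below. This is an illustration only: the class name is hypothetical, and taking " \t\n\r\f" as the set of "spaces and tab-like characters" is an assumption, not something this page specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Standalone sketch of the documented rule; not the org.htmlparser.util.ParserUtils source.
class SplitSpacesSketch {

    // Assumed whitespace set; the library's own definition may differ.
    private static final String WHITESPACE = " \t\n\r\f";

    // Every whitespace character and every character of charsToBeRemoved acts as a delimiter.
    static String[] splitSpaces(String input, String charsToBeRemoved) {
        StringTokenizer tokenizer = new StringTokenizer(input, WHITESPACE + charsToBeRemoved);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] parts = splitSpaces("<DIV> +12.5, +3.4 </DIV>", "<>DIV/,");
        System.out.println(String.join("|", parts)); // prints +12.5|+3.4
    }
}
```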
-
trimSpaces
Remove from the input string all spaces and tab-like characters, as well as the characters specified in charsToBeRemoved.
For example, if you call trimSpaces("<DIV> +12.5 </DIV>", "<>DIV/"),
you obtain the string "+12.5" as output (space characters, <, >, D, I, V and / are characters that must be removed).
For example, if you call trimSpaces("<DIV> Trim All Spaces Also The Ones Inside The String </DIV>", "<>DIV/"),
you obtain the string "TrimAllSpacesAlsoTheOnesInsideTheString" as output (all the spaces inside the string are removed).
Parameters:
input - The input string.
charsToBeRemoved - The characters to be removed.
Returns:
The resulting string.
-
trimSpacesBeginEnd
Remove from the beginning and the end of the input string all spaces and tab-like characters, as well as the characters specified in charsToBeRemoved.
Only characters at the beginning and at the end of the string are removed.
For example, if you call trimSpacesBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/"),
you obtain the string "+12.5" as output (space characters, <, >, D, I, V and / are characters that must be removed).
For example, if you call trimSpacesBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/"),
you obtain the string "Trim all spaces but not the ones inside the string" as output (all the spaces inside the string are preserved).
Parameters:
input - The input string.
charsToBeRemoved - The characters to be removed.
Returns:
The resulting string.
-
splitButChars
Split the input string, treating every character as a separator except the characters specified in the charsDoNotBeRemoved parameter.
For example, if you call splitButChars("<DIV> +12.5, +3.4 </DIV>", "+.1234567890"),
you obtain the string array {"+12.5", "+3.4"} as output (+, . and the digits 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 are characters that must not be removed).
Parameters:
input - The input string.
charsDoNotBeRemoved - The characters that must not be removed.
Returns:
The resulting array of strings.
-
trimButChars
Remove from the input string all characters except the characters specified in the charsDoNotBeRemoved parameter.
For example, if you call trimButChars("<DIV> +12.5 </DIV>", "+.1234567890"),
you obtain the string "+12.5" as output (+, . and the digits 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 are characters that must not be removed).
For example, if you call trimButChars("<DIV> +1 2 . 5 </DIV>", "+.1234567890"),
you obtain the string "+12.5" as output (the spaces between 1 and 2, 2 and ., and . and 5 are removed).
Parameters:
input - The input string.
charsDoNotBeRemoved - The characters that must not be removed.
Returns:
The resulting string.
-
trimButCharsBeginEnd
Remove from the beginning and the end of the input string all characters except the characters specified in the charsDoNotBeRemoved parameter.
Only characters at the beginning and at the end of the string are removed.
For example, if you call trimButCharsBeginEnd("<DIV> +12.5 </DIV>", "+.1234567890"),
you obtain the string "+12.5" as output (+, . and the digits 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 are characters that must not be removed).
For example, if you call trimButCharsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+.1234567890"),
you obtain the string "+1 2 . 5" as output (the spaces inside the string are not removed).
Parameters:
input - The input string.
charsDoNotBeRemoved - The characters that must not be removed.
Returns:
The resulting string.
-
splitChars
Split the input string, treating the characters specified in charsToBeRemoved as separators.
For example, if you call splitChars("<DIV> +12.5, +3.4 </DIV>", " <>DIV/,"),
you obtain the string array {"+12.5", "+3.4"} as output (space characters, <, >, D, I, V, / and the comma are characters that must be removed).
Parameters:
input - The input string.
charsToBeRemoved - The characters to be removed.
Returns:
The resulting array of strings.
-
trimChars
Remove from the input string all the characters specified in charsToBeRemoved.
For example, if you call trimChars("<DIV> +12.5 </DIV>", "<>DIV/ "),
you obtain the string "+12.5" as output (<, >, D, I, V, / and the space character are characters that must be removed).
For example, if you call trimChars("<DIV> Trim All Chars Also The Ones Inside The String </DIV>", "<>DIV/ "),
you obtain the string "TrimAllCharsAlsoTheOnesInsideTheString" as output (all the spaces inside the string are removed).
Parameters:
input - The input string.
charsToBeRemoved - The characters to be removed.
Returns:
The resulting string.
-
trimCharsBeginEnd
Remove from the beginning and the end of the input string all the characters specified in charsToBeRemoved.
Only characters at the beginning and at the end of the string are removed.
For example, if you call trimCharsBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/ "),
you obtain the string "+12.5" as output (' ' is a space character, and <, >, D, I, V and / are characters that must be removed).
For example, if you call trimCharsBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/ "),
you obtain the string "Trim all spaces but not the ones inside the string" as output (all the spaces inside the string are preserved).
Parameters:
input - The input string.
charsToBeRemoved - The characters to be removed.
Returns:
The resulting string.
-
splitTags
public static String[] splitTags(String input, String[] tags) throws ParserException, UnsupportedEncodingException
Split the input string into a string array, using the tags as delimiters.
Throws:
ParserException
UnsupportedEncodingException
-
splitTags
public static String[] splitTags(String input, String[] tags, boolean recursive, boolean insideTag) throws ParserException, UnsupportedEncodingException
Split the input string into a string array, using the tags as delimiters.
For example, if you call splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}),
you obtain the string array {"Begin ", " ALL OK"} as output (split on the <DIV> tags and their content, recursively).
For example, if you call splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, false),
you obtain the string array {"Begin ", "<DIV> +12.5 </DIV>", " ALL OK"} as output (split on the <DIV> tags but not their content, and not recursively).
For example, if you call splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, true, false),
you obtain the string array {"Begin ", " +12.5 ", " ALL OK"} as output (split on the <DIV> tags but not their content, recursively).
For example, if you call splitTags("Begin <DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, true),
you obtain the string array {"Begin ", " ALL OK"} as output (split on the <DIV> tags and their content).
Parameters:
input - The input string.
tags - The tags to be used as splitting delimiters.
recursive - Optional parameter (true if omitted). If true, delete all the tags recursively.
insideTag - Optional parameter (true if omitted). If true, also delete the content of the tags.
Returns:
The string array containing the strings delimited by the tags.
Throws:
ParserException
UnsupportedEncodingException
-
splitTags
public static String[] splitTags(String input, Class nodeType) throws ParserException, UnsupportedEncodingException
Split the input string into a string array, using the tags as delimiters.
Takes a Class as the input parameter instead of the tags[] string array.
Throws:
ParserException
UnsupportedEncodingException
-
splitTags
public static String[] splitTags(String input, Class nodeType, boolean recursive, boolean insideTag) throws ParserException, UnsupportedEncodingException
Split the input string into a string array, using the tags as delimiters.
Takes a Class as the input parameter instead of the tags[] string array.
Throws:
ParserException
UnsupportedEncodingException
-
splitTags
public static String[] splitTags(String input, NodeFilter filter) throws ParserException, UnsupportedEncodingException
Split the input string into a string array, using the tags as delimiters.
Takes a NodeFilter as the input parameter instead of the tags[] string array.
Throws:
ParserException
UnsupportedEncodingException
-
splitTags
public static String[] splitTags(String input, NodeFilter filter, boolean recursive, boolean insideTag) throws ParserException, UnsupportedEncodingException
Split the input string into a string array, using the tags as delimiters.
Takes a NodeFilter as the input parameter instead of the tags[] string array.
Throws:
ParserException
UnsupportedEncodingException
-
trimAllTags
Trim the input string, removing all the tags it contains.
The method removes every substring of the form "<XXX>", where XXX can be any string.
If you set the inside parameter to true, the method also deletes the YYY string in an input such as "<XXX>YYY<ZZZ>". Note that ZZZ is not necessarily the closing tag of XXX.
Parameters:
input - The input string.
inside - If true, forces the method to also delete what is inside the tags.
Returns:
The string without tags.
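The two modes can be illustrated with a standalone sketch. It is one possible reading of the description above, not the library code: the class name is hypothetical, and for inside = true it assumes that everything from the first '<' to the last '>' is dropped, which matches the "<XXX>YYY<ZZZ>" case but may differ from the real method on inputs with several separate tag groups.

```java
// Standalone sketch of one reading of the documented behaviour; not the ParserUtils source.
class TrimAllTagsSketch {

    static String trimAllTags(String input, boolean inside) {
        if (inside) {
            // Assumption: drop everything from the first '<' through the last '>'.
            int first = input.indexOf('<');
            int last = input.lastIndexOf('>');
            if (first == -1 || last < first) {
                return input; // no complete "<...>" span to remove
            }
            return input.substring(0, first) + input.substring(last + 1);
        }
        // inside == false: remove only the "<...>" substrings and keep the text between them.
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        for (char c : input.toCharArray()) {
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(trimAllTags("a<XXX>YYY<ZZZ>b", false)); // prints aYYYb
        System.out.println(trimAllTags("a<XXX>YYY<ZZZ>b", true));  // prints ab
    }
}
```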
-
trimTags
public static String trimTags(String input, String[] tags) throws ParserException, UnsupportedEncodingException
Trim all tags in the input string and return the input without the tags and their content.
Throws:
ParserException
UnsupportedEncodingException
-
trimTags
public static String trimTags(String input, String[] tags, boolean recursive, boolean insideTag) throws ParserException, UnsupportedEncodingException
Trim all tags in the input string and return the input without the tags and, optionally, their content.
For example, if you call trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}),
you obtain the string " ALL OK" as output (the <DIV> tags and their content are trimmed, recursively).
For example, if you call trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, false),
you obtain the string "<DIV> +12.5 </DIV> ALL OK" as output (the <DIV> tags are trimmed but not their content, and not recursively).
For example, if you call trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, true, false),
you obtain the string " +12.5 ALL OK" as output (the <DIV> tags are trimmed but not their content, recursively).
For example, if you call trimTags("<DIV><DIV> +12.5 </DIV></DIV> ALL OK", new String[] {"DIV"}, false, true),
you obtain the string " ALL OK" as output (the <DIV> tags and their content are trimmed).
Parameters:
input - The input string.
tags - The tags to be removed.
recursive - Optional parameter (true if omitted). If true, delete all the tags recursively.
insideTag - Optional parameter (true if omitted). If true, also delete the content of the tags.
Returns:
The string without tags.
Throws:
ParserException
UnsupportedEncodingException
-
trimTags
public static String trimTags(String input, Class nodeType) throws ParserException, UnsupportedEncodingException
Trim all tags in the input string and return the input without the tags and their content.
Takes a Class as the input parameter instead of the tags[] string array.
Throws:
ParserException
UnsupportedEncodingException
-
trimTags
public static String trimTags(String input, Class nodeType, boolean recursive, boolean insideTag) throws ParserException, UnsupportedEncodingException
Trim all tags in the input string and return the input without the tags and, optionally, their content.
Takes a Class as the input parameter instead of the tags[] string array.
Throws:
ParserException
UnsupportedEncodingException
-
trimTags
public static String trimTags(String input, NodeFilter filter) throws ParserException, UnsupportedEncodingException
Trim all tags in the input string and return the input without the tags and their content.
Takes a NodeFilter as the input parameter instead of the tags[] string array.
Throws:
ParserException
UnsupportedEncodingException
-
trimTags
public static String trimTags(String input, NodeFilter filter, boolean recursive, boolean insideTag) throws ParserException, UnsupportedEncodingException
Trim all tags in the input string and return the input without the tags and, optionally, their content.
Takes a NodeFilter as the input parameter instead of the tags[] string array.
Throws:
ParserException
UnsupportedEncodingException
-
createParserParsingAnInputString
public static Parser createParserParsingAnInputString(String input) throws ParserException, UnsupportedEncodingException
Create a Parser object that takes a String as input (instead of a URL or a string representing the URL location).
The string is parsed as if it were a file.
Parameters:
input - The input string.
Returns:
The Parser object with the string as its input stream.
Throws:
ParserException
UnsupportedEncodingException
-