java.lang.Object javatools.parsers.Char
public class Char
This class is part of the Java Tools (see http://mpii.de/yago-naga/javatools).
It is licensed under the Creative Commons Attribution License
(see http://creativecommons.org/licenses/by/3.0) by
the YAGO-NAGA team (see http://mpii.de/yago-naga).
This class provides static methods to decode, encode and normalize Strings.
Decoding converts the following codes to Java 16-bit characters (char): HTML ampersand codes, backslash codes, percentage codes and UTF8 codes.
Encoding is the inverse operation. It takes a Java 16-bit character (char) and outputs its encoding in HTML, as a backslash code, as a percentage code or in UTF8.
Normalization converts Unicode characters (Java 16-bit chars) to ASCII characters in the range 0x20-0x7F.
Decoding is done by methods that "eat" a code from the string.
They require as an additional parameter an integer array of length 1,
in which they store the length of the code that they chopped off.
Example:
    int[] eatLength=new int[1];
    char c=eatPercentage("%2Cblah blah",eatLength);
    --> c=',' eatLength[0]=3 // the code was 3 characters long
There is a static integer array Char.eatLength, which you can use for this purpose.
The methods store 0 in case the String does not start with the correct code.
They store -1 in case the String starts with a corrupted code.
Of course, you can use the eat... methods also to decode one single code.
There are methods decode... that decode the percentage code, the UTF8-codes, the backslash
codes or the Ampersand codes, respectively.
The method decode(String) decodes all codes of a String.
    decode("This String contains some codes: &amp; %2C \u0041");
    --> "This String contains some codes: & , A"
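The "eat" contract described above can be illustrated with a small standalone sketch. It is not the javatools source: the class name EatPercentageSketch is hypothetical, and returning the character 0 on failure is an assumption, since the text only specifies what is stored in the length array.

```java
// Standalone illustration of the "eat" contract; not the javatools implementation.
class EatPercentageSketch {

    /**
     * Tries to read one "%xx" code from the start of s.
     * Stores the consumed length in n[0]: 3 on success,
     * 0 if s does not start with '%', -1 if the code is corrupted.
     * Returning (char) 0 on failure is an assumption of this sketch.
     */
    static char eatPercentage(String s, int[] n) {
        n[0] = 0;
        if (s.isEmpty() || s.charAt(0) != '%') return 0;
        if (s.length() < 3) { n[0] = -1; return 0; }
        int hi = Character.digit(s.charAt(1), 16);
        int lo = Character.digit(s.charAt(2), 16);
        if (hi < 0 || lo < 0) { n[0] = -1; return 0; }
        n[0] = 3;
        return (char) (hi * 16 + lo);
    }

    /** Decodes every well-formed "%xx" code; all other characters are copied unchanged. */
    static String decodePercentage(String s) {
        StringBuilder out = new StringBuilder();
        int[] eatLength = new int[1];
        int i = 0;
        while (i < s.length()) {
            char c = eatPercentage(s.substring(i), eatLength);
            if (eatLength[0] > 0) {
                out.append(c);          // a code was eaten: emit the decoded char
                i += eatLength[0];      // and skip the whole code
            } else {
                out.append(s.charAt(i)); // no code or corrupted code: copy as is
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] eatLength = new int[1];
        char c = eatPercentage("%2Cblah blah", eatLength);
        System.out.println(c + " " + eatLength[0]);        // prints ", 3"
        System.out.println(decodePercentage("a%2Cb %zz")); // prints "a,b %zz"
    }
}
```

The length array lets one call report both the decoded character and how far the caller should advance, which is what the decoding loop relies on.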
Normalization is done by the method normalize(int c). It converts a Unicode
character (a 16-bit Java character char)
to a sequence of normal characters (i.e. characters in the range 0x20-0x7F).
The transliteration may consist of multiple chars (e.g. for umlauts) and also of no
chars at all (e.g. for Unicode Zero-Space-Characters).
Example:
    normalize('ä');
    --> "ae"
The method normalize(String) normalizes all characters in a String.
    normalize("This String contains the umlauts Ä, Ö and Ü");
    --> "This String contains the umlauts Ae, Oe and Ue"
If the method cannot find a normalization, it calls defaultNormalizer.apply(char c).
Decoding and normalizing can be combined by the method decodeAndNormalize(String s).
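The lookup-with-fallback behaviour of normalization can be sketched as follows. This is a minimal illustration, not the library code: the class NormalizeSketch, its tiny map and the use of java.util.function.IntFunction in place of Char.Char2StringFn are assumptions. Only the "[?]" default and the example outputs come from this page.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Standalone illustration of map lookup plus default normalizer; not the javatools source.
class NormalizeSketch {
    static final String UNKNOWN = "[?]";

    // A deliberately tiny map; the real normalizeMap is assumed to be much larger.
    static final Map<Character, String> NORMALIZE_MAP = new HashMap<>();
    static {
        NORMALIZE_MAP.put('\u00E4', "ae"); // a with diaeresis
        NORMALIZE_MAP.put('\u00F6', "oe"); // o with diaeresis
        NORMALIZE_MAP.put('\u00FC', "ue"); // u with diaeresis
        NORMALIZE_MAP.put('\u00C4', "Ae");
        NORMALIZE_MAP.put('\u00D6', "Oe");
        NORMALIZE_MAP.put('\u00DC', "Ue");
        NORMALIZE_MAP.put('\u200B', "");   // zero-width space normalizes to nothing
    }

    // Stand-in for defaultNormalizer; replaceable by the caller.
    static IntFunction<String> defaultNormalizer = c -> UNKNOWN;

    /** Normalizes one character to a String of characters in the range 0x20-0x7F. */
    static String normalize(int c) {
        if (c >= 0x20 && c <= 0x7F) return String.valueOf((char) c);
        String mapped = (c <= Character.MAX_VALUE) ? NORMALIZE_MAP.get((char) c) : null;
        return mapped != null ? mapped : defaultNormalizer.apply(c);
    }

    /** Normalizes every char of a String. */
    static String normalize(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) out.append(normalize(s.charAt(i)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize('\u00E4'));                          // prints "ae"
        System.out.println(normalize("\u00C4, \u00D6 and \u00DC"));       // prints "Ae, Oe and Ue"
        System.out.println(normalize('\u20AC'));                          // prints "[?]"
    }
}
```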
Encoding is done by methods called encode...(char). These methods take a character
and transform it to a UTF8 code, a percentage code, an ampersand code or a backslash code,
respectively. If the character is normal (i.e. in the range 0x20-0x7F), they simply return the input
character without any change.
Example:
    encodePercentage('Ä');
    --> "%C4"
There are also methods that work on entire Strings:
    encodePercentage("This String contains the umlauts Ä, Ö and Ü");
    --> "This String contains the umlauts %C4, %D6 and %DC"
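The encoding rule stated above (normal characters pass through, others become a code) can be sketched for the percentage case. This is an assumption-laden illustration: the class EncodePercentageSketch is hypothetical, and it only handles characters up to 0xFF, because this page does not say how the library encodes characters beyond that range. Following the stated rule literally, '%' itself is left unchanged here; the library may treat it differently.

```java
// Standalone illustration of the percentage encoding rule; not the javatools source.
class EncodePercentageSketch {

    /** Returns c unchanged if it is in 0x20-0x7F, otherwise "%XX" with two hex digits. */
    static String encodePercentage(char c) {
        if (c >= 0x20 && c <= 0x7F) return String.valueOf(c);
        if (c > 0xFF) {
            // Simplification of this sketch: wider characters are rejected.
            throw new IllegalArgumentException("sketch handles only characters up to 0xFF");
        }
        return String.format("%%%02X", (int) c);
    }

    /** Applies the char version to every char of the String. */
    static String encodePercentage(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) out.append(encodePercentage(s.charAt(i)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodePercentage('\u00C4'));                    // prints "%C4"
        System.out.println(encodePercentage("\u00C4, \u00D6 and \u00DC")); // prints "%C4, %D6 and %DC"
    }
}
```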
Last, this class provides the character categorization for URIs, as given in
http://tools.ietf.org/html/rfc3986 . It also provides a method to encode only those
characters that are not valid path component characters.
Example:
    isReserved(';');
    --> true
    encodeURIPathComponent("a: b")
    --> "a:%20b"
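The character classes of RFC 3986 are small enough to write out. The sketch below follows the grammar of the RFC (gen-delims, sub-delims, unreserved, pchar). The class name UriCharsSketch is hypothetical, and encoding non-ASCII characters through their UTF-8 bytes is an assumption of this sketch, not a documented behaviour of the class described here.

```java
import java.nio.charset.StandardCharsets;

// Standalone illustration of the RFC 3986 character classes; not the javatools source.
class UriCharsSketch {

    static boolean isGenDelim(char c) { return ":/?#[]@".indexOf(c) >= 0; }

    static boolean isSubDelim(char c) { return "!$&'()*+,;=".indexOf(c) >= 0; }

    static boolean isReserved(char c) { return isGenDelim(c) || isSubDelim(c); }

    static boolean isUnreserved(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || "-._~".indexOf(c) >= 0;
    }

    /** pchar without the pct-encoded alternative, which spans three chars. */
    static boolean isPchar(char c) {
        return isUnreserved(c) || isSubDelim(c) || c == ':' || c == '@';
    }

    /** Percent-encodes everything that is not a pchar (assumption: via UTF-8 bytes). */
    static String encodeURIPathComponent(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80 && isPchar((char) cp)) {
                out.append((char) cp);
                continue;
            }
            byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            for (byte b : bytes) out.append(String.format("%%%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(isReserved(';'));                // prints "true"
        System.out.println(isUnreserved(';'));              // prints "false"
        System.out.println(encodeURIPathComponent("a: b")); // prints "a:%20b"
    }
}
```

Note that a character such as a space is neither reserved nor unreserved, which is why isUnreserved is not the same as !isReserved.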
Nested Class Summary | | |
---|---|---|
static interface | Char.Char2StringFn | Defines just one function from an int to a String |
static interface | Char.Legal | Used for encoding selected characters |
Field Summary | | |
---|---|---|
static java.util.Map<java.lang.String,java.lang.Character> | ampersandMap | Maps HTML ampersand sequences to characters |
static java.util.Map<java.lang.Character,java.lang.String> | charToAmpersand | Maps a special character to an HTML ampersand sequence |
static java.util.Map<java.lang.Character,java.lang.String> | charToBackslash | Maps a special character to a backslash sequence |
static Char.Char2StringFn | defaultNormalizer | Called by normalize(int) in case the character cannot be normalized. |
static java.util.Map<java.lang.Character,java.lang.String> | normalizeMap | Maps characters to normalizations |
static java.lang.String | UNKNOWN | String returned by the default implementation of defaultNormalizer, "[?]" |
Constructor Summary | |
---|---|
Char() | |
Method Summary | | |
---|---|---|
static java.lang.String | capitalize(java.lang.String s) | Capitalizes words and lowercases the rest |
static java.lang.String | cutLast(java.lang.String s) | Returns the String without the last character |
static java.lang.StringBuilder | cutLast(java.lang.StringBuilder s) | Cuts the last character |
static java.lang.String | decode(java.lang.String s) | Replaces all codes in a String by the 16-bit Unicode characters |
static java.lang.String | decodeAmpersand_UNKNOWN(java.lang.String s) | Fabian: This method cannot decode numeric hexadecimal ampersand codes. |
static java.lang.String | decodeAmpersand(java.lang.String s) | Decodes all ampersand sequences in the string |
static java.lang.String | decodeAmpersand(java.lang.String s, PositionTracker posTracker) | |
static java.lang.String | decodeAndNormalize(java.lang.String s) | Decodes all codes in a String and normalizes all chars |
static java.lang.String | decodeBackslash(java.lang.String s) | Decodes all backslash characters in the string |
static java.lang.String | decodePercentage(java.lang.String s) | Decodes all percentage characters in the string |
static java.lang.String | decodeURIPathComponent(java.lang.String s) | Decodes a URI path component |
static java.lang.String | decodeUTF8(java.lang.String s) | Decodes all UTF8 characters in the string |
static char | eatAmpersand(java.lang.String a, int[] n) | Eats an HTML ampersand code from a String |
static char | eatBackslash(java.lang.String a, int[] n) | Eats a backslash sequence from a String |
static char | eatPercentage(java.lang.String a, int[] n) | Eats a String of the form "%xx" from a string, where xx is a hexadecimal code. |
static char | eatUtf8(java.lang.String a, int[] n) | Eats a UTF8 code from a String. |
static java.lang.String | encodeAmpersand(char c) | Encodes a character to an HTML ampersand code (if necessary) |
static java.lang.String | encodeAmpersand(java.lang.String c) | Replaces non-normal characters in a String by HTML ampersand codes |
static java.lang.String | encodeAmpersandToAlphanumeric(char c) | Encodes a character to an HTML ampersand code (if necessary) |
static java.lang.String | encodeAmpersandToAlphanumeric(java.lang.String c) | Replaces non-normal characters in a String by HTML ampersand codes |
static java.lang.String | encodeBackslash(char c) | Encodes a character to a backslash code (if necessary) |
static java.lang.String | encodeBackslash(java.lang.CharSequence s, Char.Legal legal) | Encodes with backslash all illegal characters |
static java.lang.String | encodeBackslash(java.lang.String c) | Replaces non-normal characters in a String by backslash codes |
static java.lang.String | encodeBackslashToAlphanumeric(char c) | Encodes a character to a backslash code (if not alphanumeric) |
static java.lang.String | encodeBackslashToAlphanumeric(java.lang.String c) | Replaces non-normal characters in a String by backslash codes (if not alphanumeric) |
static java.lang.String | encodeBackslashToASCII(char c) | Encodes a character to a backslash code (if not ASCII) |
static java.lang.String | encodeBackslashToASCII(java.lang.String c) | Replaces non-normal characters in a String by backslash codes (if not ASCII) |
static java.lang.String | encodeHex(java.lang.String s) | Replaces special characters in the string by hex codes (cannot be undone) |
static java.lang.String | encodePercentage(char c) | Encodes a character to a percentage code (if necessary). |
static java.lang.String | encodePercentage(java.lang.String c) | Replaces non-normal characters in a String by percentage codes. |
static java.lang.String | encodeURIPathComponent(char c) | Encodes a char to a percentage code, if it is not a path character in the sense of URIs |
static java.lang.String | encodeURIPathComponent(java.lang.String s) | Encodes the chars of a String to percentage codes, if they are not path characters in the sense of URIs |
static java.lang.String | encodeURIPathComponentXML(java.lang.String s) | Encodes the chars of a String to percentage codes, if they are not path characters in the sense of XML |
static java.lang.String | encodeUTF8(int c) | Encodes a character to UTF8 (if necessary) |
static java.lang.String | encodeUTF8(java.lang.String c) | Encodes a String to UTF8 |
static java.lang.String | encodeXmlAttribute(java.lang.String str) | Encodes a String with reserved XML characters into a valid XML string for attributes. |
static boolean | endsWith(java.lang.CharSequence s, java.lang.String end) | True if the CharSequence ends with the string |
static java.lang.String | hexAll(java.lang.String s) | Returns the chars of a String in hex |
static boolean | in(char c, char a, char b) | Tells whether a char is in a range |
static boolean | in(char c, java.lang.String s) | Tells whether a char is in a string |
static boolean | isAlphanumeric(char c) | Tells whether a char is alphanumeric in the sense of URIs |
static boolean | isEscaped(java.lang.String s) | Tells whether a string is escaped in the sense of URIs |
static boolean | isGenDelim(char c) | Tells whether a char is a general delimiter in the sense of URIs |
static boolean | isPchar(char c) | Tells whether a char is a valid path component in the sense of URIs |
static boolean | isReserved(char c) | Tells whether a char is reserved in the sense of URIs |
static boolean | isSubDelim(char c) | Tells whether a char is a sub-delimiter in the sense of URIs |
static boolean | isUnreserved(char c) | Tells whether a char is unreserved in the sense of URIs (not the same as !reserved) |
static char | last(java.lang.CharSequence s) | Returns the last character of a String or 0 |
static java.lang.String | lowCaseFirst(java.lang.String s) | Lowercases the first character in a String |
static void | main(java.lang.String[] argv) | Test routine |
static java.lang.String | normalize(int c) | Normalizes a character to a String of characters in the range 0x20-0x7F. |
static java.lang.String | normalize(java.lang.String s) | Normalizes all chars in a String to characters 0x20-0x7F |
static java.lang.String | toHTML(java.lang.String s) | Returns an HTML-String of the String |
static java.lang.CharSequence | truncate(java.lang.CharSequence s, int len) | Returns a string of the given length, fills with spaces if necessary |
static java.lang.String | upCaseFirst(java.lang.String s) | Uppercases the first character in a String |
static int | Utf8Length(char c) | Tells from the first UTF-8 code character how long the code is. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static Char.Char2StringFn defaultNormalizer
public static java.lang.String UNKNOWN
public static java.util.Map<java.lang.Character,java.lang.String> charToAmpersand
public static java.util.Map<java.lang.Character,java.lang.String> charToBackslash
public static java.util.Map<java.lang.String,java.lang.Character> ampersandMap
public static java.util.Map<java.lang.Character,java.lang.String> normalizeMap
Constructor Detail |
---|
public Char()
Method Detail |
---|
public static java.lang.String normalize(int c)
public static char eatPercentage(java.lang.String a, int[] n)
public static char eatAmpersand(java.lang.String a, int[] n)
public static int Utf8Length(char c)
public static char eatUtf8(java.lang.String a, int[] n)
public static java.lang.String decodeUTF8(java.lang.String s)
public static java.lang.String decodePercentage(java.lang.String s)
public static java.lang.String decodeAmpersand_UNKNOWN(java.lang.String s)
public static java.lang.String decodeAmpersand(java.lang.String s, PositionTracker posTracker)
public static java.lang.String decodeAmpersand(java.lang.String s)
public static java.lang.String decodeBackslash(java.lang.String s)
public static java.lang.String encodeBackslash(java.lang.CharSequence s, Char.Legal legal)
public static char eatBackslash(java.lang.String a, int[] n)
public static java.lang.String decode(java.lang.String s)
public static java.lang.String encodeUTF8(int c)
public static java.lang.String encodeBackslash(char c)
public static java.lang.String encodeBackslashToAlphanumeric(char c)
public static java.lang.String encodeBackslashToASCII(char c)
public static java.lang.String encodeAmpersand(char c)
public static java.lang.String encodeAmpersandToAlphanumeric(char c)
public static java.lang.String encodePercentage(char c)
public static java.lang.String encodeXmlAttribute(java.lang.String str)
public static boolean in(char c, char a, char b)
public static boolean in(char c, java.lang.String s)
public static boolean isAlphanumeric(char c)
public static boolean isReserved(char c)
public static boolean isUnreserved(char c)
public static boolean isEscaped(java.lang.String s)
public static boolean isSubDelim(char c)
public static boolean isGenDelim(char c)
public static boolean isPchar(char c)
public static java.lang.String encodeURIPathComponent(char c)
public static java.lang.String encodeURIPathComponent(java.lang.String s)
public static java.lang.String encodeURIPathComponentXML(java.lang.String s)
public static java.lang.String decodeURIPathComponent(java.lang.String s)
public static java.lang.String encodeUTF8(java.lang.String c)
public static java.lang.String encodeBackslash(java.lang.String c)
public static java.lang.String encodeBackslashToAlphanumeric(java.lang.String c)
public static java.lang.String encodeBackslashToASCII(java.lang.String c)
public static java.lang.String encodeAmpersand(java.lang.String c)
public static java.lang.String encodeAmpersandToAlphanumeric(java.lang.String c)
public static java.lang.String encodePercentage(java.lang.String c)
public static java.lang.String decodeAndNormalize(java.lang.String s)
public static java.lang.String normalize(java.lang.String s)
public static char last(java.lang.CharSequence s)
public static java.lang.String cutLast(java.lang.String s)
public static java.lang.StringBuilder cutLast(java.lang.StringBuilder s)
public static java.lang.String toHTML(java.lang.String s)
public static java.lang.String hexAll(java.lang.String s)
public static java.lang.String encodeHex(java.lang.String s)
public static java.lang.String upCaseFirst(java.lang.String s)
public static java.lang.String lowCaseFirst(java.lang.String s)
public static java.lang.CharSequence truncate(java.lang.CharSequence s, int len)
public static java.lang.String capitalize(java.lang.String s)
public static boolean endsWith(java.lang.CharSequence s, java.lang.String end)
public static void main(java.lang.String[] argv) throws java.lang.Exception