Class StringUtilities
- java.lang.Object
  - ghidra.util.StringUtilities
public class StringUtilities extends java.lang.Object
Class with static methods that deal with string manipulation.
Field Summary
- static java.util.regex.Pattern DOUBLE_QUOTED_STRING_PATTERN
- static java.lang.String LINE_SEPARATOR: The platform specific string that is the line separator.
- static int UNICODE_BE_BYTE_ORDER_MARK: Unicode Byte Order Marks (BOM) characters are special characters in the Unicode character space that signal endian-ness of the text.
- static int UNICODE_LE16_BYTE_ORDER_MARK
- static int UNICODE_LE32_BYTE_ORDER_MARK
- static int UNICODE_REPLACEMENT
Method Summary
- static java.lang.String characterToString(char c): Converts the character into a string.
- static boolean containsAll(java.lang.CharSequence toSearch, java.lang.CharSequence... searches): Returns true if all the given searches are contained in the given string.
- static boolean containsAllIgnoreCase(java.lang.CharSequence toSearch, java.lang.CharSequence... searches): Returns true if all the given searches are contained in the given string, ignoring case.
- static java.lang.String convertCodePointToEscapeSequence(int codePoint): Maps known control characters to corresponding escape sequences.
- static java.lang.String convertControlCharsToEscapeSequences(java.lang.String str): Replaces known control characters in a string with corresponding escape sequences.
- static java.lang.String convertEscapeSequences(java.lang.String str): Replaces escape sequences in a string with corresponding control characters.
- static java.lang.String convertTabsToSpaces(java.lang.String str, int tabSize): Converts tabs in the given string to spaces.
- static int countOccurrences(java.lang.String string, char occur): Returns a count of how many times the 'occur' char appears in the string.
- static boolean endsWithIgnoreCase(java.lang.String string, java.lang.String postfix): Returns true if the given string ends with postfix, ignoring case.
- static boolean endsWithWhiteSpace(java.lang.String string)
- static boolean equals(java.lang.String s1, java.lang.String s2, boolean caseSensitive)
- static java.lang.String extractFromDoubleQuotes(java.lang.String str): If the given string is enclosed in double quotes, extracts the inner text.
- static int findLastWordPosition(java.lang.String s): Finds the starting position of the last word in the given string.
- static java.lang.String findWord(java.lang.String s, int index): Finds the word at the given index in the given string.
- static java.lang.String findWord(java.lang.String s, int index, char[] charsToAllow): Finds the word at the given index in the given string, accepting the characters in charsToAllow as part of the word.
- static WordLocation findWordLocation(java.lang.String s, int index, char[] charsToAllow)
- static java.lang.String fixMultipleAsterisks(java.lang.String value): Looks for all occurrences of successive asterisks (i.e., "**") and replaces them with a single asterisk, which is an equivalent usage in Ghidra.
- static java.lang.String getLastWord(java.lang.String s, java.lang.String separator): Takes a path-like string and retrieves the last non-empty item.
- static java.lang.String indentLines(java.lang.String s, java.lang.String indent): Splits the given string into lines using "\n" and then indents each line with the given indent string.
- static int indexOfWord(java.lang.String text, java.lang.String searchWord): Returns the index of the first whole-word occurrence of the search word within the given text.
- static boolean isAllBlank(java.lang.CharSequence... sequences): Returns true if all the given sequences are either null or only whitespace.
- static boolean isAsciiChar(char c): Returns true if the given character is within the ASCII range.
- static boolean isAsciiChar(int codePoint): Returns true if the given code point is within the ASCII range.
- static boolean isControlCharacterOrBackslash(char c): Returns true if the given character is a special character.
- static boolean isControlCharacterOrBackslash(int codePoint): Returns true if the given code point (i.e., a full 32-bit Unicode character) is a special character.
- static boolean isDisplayable(int c): Returns true if the character is in the displayable character range.
- static boolean isDoubleQuoted(java.lang.String str): Determines if a string is enclosed in double quotes (ASCII 34, 0x22).
- static boolean isUnicodeReplacementCodePoint(int codePoint): Returns true if the specified code point is the 'replacement' code point 0xFFFD, which is used when decoding bytes into Unicode chars encounters a bad or invalid sequence that has no mapping.
- static boolean isValidCLanguageChar(char c): Returns true if the character is OK to be contained inside a C language string.
- static boolean isWholeWord(java.lang.String text, int startIndex, int length): Returns true if the substring within the text starting at startIndex and having the given length is a whole word.
- static boolean isWordChar(char c, char[] charsToAllow): Returns true if the character is a 'word char', loosely defined as normal ASCII content meant for consumption by a human, or one of charsToAllow.
- static java.lang.String mergeStrings(java.lang.String string1, java.lang.String string2): Merges two strings into one.
- static java.lang.String pad(java.lang.String source, char filler, int length): Pads the source string to the specified length, using the filler character as the pad.
- static java.lang.String reverse(java.lang.String s): Reverses the characters in the given string.
- static boolean startsWithIgnoreCase(java.lang.String string, java.lang.String prefix): Returns true if the given string starts with prefix, ignoring case.
- static java.lang.String toFixedSize(java.lang.String s, char pad, int size): Enforces the given length upon the given string by trimming and then padding as necessary.
- static java.lang.String[] toLines(java.lang.String str): Parses a string containing multiple lines into an array where each element in the array contains only a single line.
- static java.lang.String[] toLines(java.lang.String s, boolean preserveTokens): Parses a string containing multiple lines into an array where each element in the array contains only a single line.
- static java.lang.String toQuotedString(byte[] bytes): Generates a quoted string from US-ASCII character bytes, assuming 1-byte chars.
- static java.lang.String toQuotedString(byte[] bytes, int charSize): Generates a quoted string from US-ASCII characters, where each character is charSize bytes.
- static java.lang.String toString(int value): Converts an integer into a string.
- static java.lang.String toString(java.util.Collection<?> collection, java.lang.String separator): Turns the given data into an attractive string, with the separator of your choosing.
- static java.lang.String toStringWithIndent(java.lang.Object o)
- static java.lang.String trim(java.lang.String original, int max): Limits the given string to the given max number of characters.
- static java.lang.String trimMiddle(java.lang.String s, int max): Trims the given string to the max number of characters by removing content from the middle.
- static java.lang.String trimTrailingNulls(java.lang.String s)
Field Detail
-
DOUBLE_QUOTED_STRING_PATTERN
public static final java.util.regex.Pattern DOUBLE_QUOTED_STRING_PATTERN
-
LINE_SEPARATOR
public static final java.lang.String LINE_SEPARATOR
The platform specific string that is the line separator.
-
UNICODE_REPLACEMENT
public static final int UNICODE_REPLACEMENT
- See Also:
- Constant Field Values
-
UNICODE_BE_BYTE_ORDER_MARK
public static final int UNICODE_BE_BYTE_ORDER_MARK
Unicode Byte Order Mark (BOM) characters are special characters in the Unicode character space that signal the endianness of the text. The value for the big-endian version (0xFEFF) works for both 16-bit and 32-bit character values.
There are separate little-endian Byte Order Mark values for 16-bit and 32-bit characters because the 32-bit value is shifted left by 16 bits.
- See Also:
- Constant Field Values
-
UNICODE_LE16_BYTE_ORDER_MARK
public static final int UNICODE_LE16_BYTE_ORDER_MARK
- See Also:
- Constant Field Values
-
UNICODE_LE32_BYTE_ORDER_MARK
public static final int UNICODE_LE32_BYTE_ORDER_MARK
- See Also:
- Constant Field Values
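The relationship between the three marks can be illustrated with a small, self-contained sketch that classifies the first bytes of a buffer. It does not use Ghidra: BomSniffer and detect are hypothetical names, 0xFEFF is the big-endian value given above, and the little-endian values (0xFFFE, and 0xFFFE shifted left by 16 bits) are assumptions inferred from the description rather than confirmed constant values.

```java
import java.nio.ByteBuffer;

/** Hypothetical BOM classifier; the little-endian constants are assumptions. */
final class BomSniffer {
    // Documented above: the big-endian mark for both 16- and 32-bit text.
    static final int BE_BOM = 0xFEFF;
    // Assumed: the byte-swapped 16-bit mark, as read big-endian.
    static final int LE16_BOM = 0xFFFE;
    // Assumed: the 16-bit little-endian mark shifted left by 16 bits.
    static final int LE32_BOM = LE16_BOM << 16;

    static String detect(byte[] data) {
        // Check the 4-byte forms first: FF FE 00 00 also starts with the 16-bit LE mark.
        if (data.length >= 4) {
            int first32 = ByteBuffer.wrap(data, 0, 4).getInt(); // ByteBuffer reads big-endian by default
            if (first32 == BE_BOM) {
                return "UTF-32BE";
            }
            if (first32 == LE32_BOM) {
                return "UTF-32LE";
            }
        }
        if (data.length >= 2) {
            int first16 = ByteBuffer.wrap(data, 0, 2).getShort() & 0xFFFF;
            if (first16 == BE_BOM) {
                return "UTF-16BE";
            }
            if (first16 == LE16_BOM) {
                return "UTF-16LE";
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        System.out.println(detect(utf16le)); // UTF-16LE
    }
}
```

Because the 32-bit little-endian mark begins with the same two bytes as the 16-bit one, the 4-byte check has to run first; a UTF-16LE buffer whose first character is NUL would still be read as UTF-32LE, which is an inherent ambiguity of BOM sniffing.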
-
Method Detail
-
isControlCharacterOrBackslash
public static boolean isControlCharacterOrBackslash(char c)
Returns true if the given character is a special character, for example '\n' or '\\'. A value of 0 is not considered special for this purpose, because it is handled separately and has more varied use cases.
- Parameters:
c - the character
- Returns:
true if the given character is a special character
-
isControlCharacterOrBackslash
public static boolean isControlCharacterOrBackslash(int codePoint)
Returns true if the given codePoint (i.e., a full 32-bit Unicode character) is a special character, for example '\n' or '\\'. A value of 0 is not considered special for this purpose, because it is handled separately and has more varied use cases.
- Parameters:
codePoint - the code point (i.e., character); see String.codePointAt(int)
- Returns:
true if the given code point is a special character
-
isDoubleQuoted
public static boolean isDoubleQuoted(java.lang.String str)
Determines if a string is enclosed in double quotes (ASCII 34, 0x22).
- Parameters:
str - String to test for double-quote enclosure
- Returns:
true if the first and last characters are the double-quote character, false otherwise
-
extractFromDoubleQuotes
public static java.lang.String extractFromDoubleQuotes(java.lang.String str)
If the given string is enclosed in double quotes, extract the inner text. Otherwise, return the given string unmodified.
- Parameters:
str - String to match and extract from
- Returns:
The inner text of a double-quoted string, or the original string if not double-quoted.
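A minimal sketch of the quote handling described in the two entries above, written without Ghidra. QuoteSketch is a hypothetical name, and treating null or a lone double quote as not enclosed is an assumption; the actual methods may instead rely on DOUBLE_QUOTED_STRING_PATTERN, whose regex is not shown on this page.

```java
/** Hypothetical sketch of the double-quote helpers; not Ghidra's implementation. */
final class QuoteSketch {

    static boolean isDoubleQuoted(String str) {
        // Assumption: null and a lone '"' are not considered enclosed.
        return str != null
            && str.length() >= 2
            && str.charAt(0) == '"'
            && str.charAt(str.length() - 1) == '"';
    }

    static String extractFromDoubleQuotes(String str) {
        // Strip exactly one quote from each end; otherwise return the input unchanged.
        return isDoubleQuoted(str) ? str.substring(1, str.length() - 1) : str;
    }

    public static void main(String[] args) {
        System.out.println(extractFromDoubleQuotes("\"hello\"")); // hello
        System.out.println(extractFromDoubleQuotes("hello"));     // hello
    }
}
```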
-
isDisplayable
public static boolean isDisplayable(int c)
Returns true if the character is in the displayable character range.
- Parameters:
c - the character
- Returns:
true if the character is in the displayable character range
-
isAllBlank
public static boolean isAllBlank(java.lang.CharSequence... sequences)
Returns true if all the given sequences are either null or only whitespace.
- Parameters:
sequences - the sequences to check
- Returns:
true if all the given sequences are either null or only whitespace
- See Also:
StringUtils.isNoneBlank(CharSequence...), StringUtils.isNoneEmpty(CharSequence...), StringUtils.isAnyBlank(CharSequence...), StringUtils.isAnyEmpty(CharSequence...)
-
characterToString
public static java.lang.String characterToString(char c)
Converts the character into a string. If the character is special, it is rendered as its escape sequence; for example, given '\n' the output would be "\\n".
- Parameters:
c - the character to convert into a string
- Returns:
the converted character
-
countOccurrences
public static int countOccurrences(java.lang.String string, char occur)
Returns a count of how many times the 'occur' char appears in the string.
- Parameters:
string - the string to look inside
occur - the character to look for
- Returns:
a count of how many times the 'occur' char appears in the string
-
equals
public static boolean equals(java.lang.String s1, java.lang.String s2, boolean caseSensitive)
-
endsWithWhiteSpace
public static boolean endsWithWhiteSpace(java.lang.String string)
-
toQuotedString
public static java.lang.String toQuotedString(byte[] bytes)
Generates a quoted string from US-ASCII character bytes, assuming 1-byte chars. Special characters and non-printable characters will be escaped using C character escape conventions (e.g., \t, \n, \\uHHHH, etc.). If a character size other than 1 byte is required, use the alternate form of this method.
The result string will be single-quoted (i.e., "'") if the input byte array is 1 byte long; otherwise, the result will be double-quoted ('"').
- Parameters:
bytes - character string bytes
- Returns:
escaped string for display use
-
toQuotedString
public static java.lang.String toQuotedString(byte[] bytes, int charSize)
Generates a quoted string from US-ASCII characters, where each character is charSize bytes. Special characters and non-printable characters will be escaped using C character escape conventions (e.g., \t, \n, \\uHHHH, etc.).
The result string will be single-quoted (i.e., "'") if the input byte array is exactly one character long (i.e., charSize bytes); otherwise, the result will be double-quoted ('"').
- Parameters:
bytes - array of bytes
charSize - number of bytes per character (1, 2, or 4)
- Returns:
escaped string for display use
-
startsWithIgnoreCase
public static boolean startsWithIgnoreCase(java.lang.String string, java.lang.String prefix)
Returns true if the given string starts with prefix, ignoring case.
Note: This method is equivalent to calling:
string.regionMatches( true, 0, prefix, 0, prefix.length() );
- Parameters:
string - the string which may contain the prefix
prefix - the prefix to test against
- Returns:
true if the given string starts with prefix, ignoring case
-
endsWithIgnoreCase
public static boolean endsWithIgnoreCase(java.lang.String string, java.lang.String postfix)
Returns true if the given string ends with postfix, ignoring case.
Note: This method is equivalent to calling:
int startIndex = string.length() - postfix.length();
string.regionMatches( true, startIndex, postfix, 0, postfix.length() );
- Parameters:
string - the string which may end with postfix
postfix - the string for which to test existence
- Returns:
true if the given string ends with postfix, ignoring case
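The two regionMatches equivalences above can be exercised with a short sketch. AffixSketch is a hypothetical class whose bodies simply restate the documented calls; null handling is not covered because the documentation does not specify it.

```java
/** Hypothetical restatement of the documented regionMatches equivalences. */
final class AffixSketch {

    static boolean startsWithIgnoreCase(String string, String prefix) {
        return string.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    static boolean endsWithIgnoreCase(String string, String postfix) {
        int startIndex = string.length() - postfix.length();
        // regionMatches returns false for a negative offset, so a postfix
        // longer than the string fails without a separate length check.
        return string.regionMatches(true, startIndex, postfix, 0, postfix.length());
    }

    public static void main(String[] args) {
        System.out.println(startsWithIgnoreCase("GhidraScript", "ghidra")); // true
        System.out.println(endsWithIgnoreCase("MyFile.JAVA", ".java"));     // true
    }
}
```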
-
containsAll
public static boolean containsAll(java.lang.CharSequence toSearch, java.lang.CharSequence... searches)
Returns true if all the given searches are contained in the given string.
- Parameters:
toSearch - the string to search
searches - the strings to find
- Returns:
true if all the given searches are contained in the given string
-
containsAllIgnoreCase
public static boolean containsAllIgnoreCase(java.lang.CharSequence toSearch, java.lang.CharSequence... searches)
Returns true if all the given searches are contained in the given string, ignoring case.
- Parameters:
toSearch - the string to search
searches - the strings to find
- Returns:
true if all the given searches are contained in the given string, ignoring case
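A sketch of the two containment checks, assuming that every search must appear somewhere in toSearch (order and overlap do not matter) and that an empty searches array yields true. ContainsAllSketch is hypothetical, and lowercasing with Locale.ROOT is only an approximation of case-insensitive matching chosen for illustration.

```java
import java.util.Locale;

/** Hypothetical sketch of containsAll / containsAllIgnoreCase. */
final class ContainsAllSketch {

    static boolean containsAll(CharSequence toSearch, CharSequence... searches) {
        String haystack = toSearch.toString();
        for (CharSequence search : searches) {
            if (!haystack.contains(search)) {
                return false; // one missing search is enough to fail
            }
        }
        return true;
    }

    static boolean containsAllIgnoreCase(CharSequence toSearch, CharSequence... searches) {
        // Approximation: compare lowercased copies of both sides.
        String haystack = toSearch.toString().toLowerCase(Locale.ROOT);
        for (CharSequence search : searches) {
            if (!haystack.contains(search.toString().toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }
}
```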
-
indexOfWord
public static int indexOfWord(java.lang.String text, java.lang.String searchWord)
Returns the index of the first whole-word occurrence of the search word within the given text. An occurrence is a whole word if neither the character before it nor the character after it is a Java identifier part (see Character.isJavaIdentifierPart(char)).
- Parameters:
text - the text to be searched
searchWord - the word to search for
- Returns:
the index of the first whole-word occurrence of the search word within the given text, or -1 if not found
-
isWholeWord
public static boolean isWholeWord(java.lang.String text, int startIndex, int length)
Returns true if the substring within the text starting at startIndex and having the given length is a whole word. A substring is a whole word if neither the character before it nor the character after it is a Java identifier part (see Character.isJavaIdentifierPart(char)).
- Parameters:
text - the text containing the potential word
startIndex - the start index of the potential word within the text
length - the length of the potential word
- Returns:
true if the substring within the text starting at startIndex and having the given length is a whole word
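The whole-word rule shared by indexOfWord and isWholeWord can be sketched directly from the definition above, applying Character.isJavaIdentifierPart to the characters immediately before and after a candidate match. WholeWordSketch is a hypothetical name; returning -1 for an empty search word is an assumption added to avoid an endless loop.

```java
/** Hypothetical sketch of the whole-word rule described above. */
final class WholeWordSketch {

    static boolean isWholeWord(String text, int startIndex, int length) {
        int end = startIndex + length;
        // Neither neighbor may be a character that could continue a Java identifier.
        boolean startOk = startIndex == 0
            || !Character.isJavaIdentifierPart(text.charAt(startIndex - 1));
        boolean endOk = end >= text.length()
            || !Character.isJavaIdentifierPart(text.charAt(end));
        return startOk && endOk;
    }

    static int indexOfWord(String text, String searchWord) {
        if (searchWord.isEmpty()) {
            return -1; // assumption: an empty word never matches
        }
        int from = 0;
        while (true) {
            int idx = text.indexOf(searchWord, from);
            if (idx < 0) {
                return -1;
            }
            if (isWholeWord(text, idx, searchWord.length())) {
                return idx;
            }
            from = idx + 1; // keep looking past this partial match
        }
    }
}
```

For example, in "int counter = count;" the first raw match of "count" (inside "counter") is rejected because it is followed by 'e', so the result is 14.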
-
convertTabsToSpaces
public static java.lang.String convertTabsToSpaces(java.lang.String str, int tabSize)
Converts tabs in the given string to spaces.
- Parameters:
str - string containing tabs
tabSize - length of the tab
- Returns:
string that has spaces for tabs
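The documentation does not say whether a tab becomes a fixed number of spaces or advances to the next tab stop. The sketch below assumes tab-stop semantics (stops every tabSize columns, with the column resetting after each newline); TabSketch is a hypothetical name.

```java
/** Hypothetical tab expansion assuming tab stops every tabSize columns. */
final class TabSketch {

    static String convertTabsToSpaces(String str, int tabSize) {
        if (tabSize <= 0) {
            throw new IllegalArgumentException("tabSize must be positive"); // assumption
        }
        StringBuilder out = new StringBuilder();
        int column = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '\t') {
                // Advance to the next multiple of tabSize.
                int spaces = tabSize - (column % tabSize);
                out.append(" ".repeat(spaces));
                column += spaces;
            } else {
                out.append(c);
                column = (c == '\n') ? 0 : column + 1;
            }
        }
        return out.toString();
    }
}
```

With tabSize 4, "a\tbc\td" becomes "a   bc  d": the first tab supplies three spaces and the second supplies two, so that b and d both start on tab stops.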
-
toLines
public static java.lang.String[] toLines(java.lang.String str)
Parses a string containing multiple lines into an array where each element in the array contains only a single line. The "\n" character is used as the delimiter for lines.
This method creates an empty string entry in the result array for initial and trailing separator chars, as well as for consecutive separators.
- Parameters:
str - the string to parse
- Returns:
an array of lines; an empty array if the given value is null or empty
- See Also:
StringUtils.splitPreserveAllTokens(String, char)
-
toLines
public static java.lang.String[] toLines(java.lang.String s, boolean preserveTokens)
Parses a string containing multiple lines into an array where each element in the array contains only a single line. The "\n" character is used as the delimiter for lines.
- Parameters:
s - the string to parse
preserveTokens - true signals to treat consecutive newlines as multiple lines; false signals to treat consecutive newlines as a single line break
- Returns:
an array of lines; an empty array if the given value is null or empty
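Both toLines variants can be approximated with String.split. In this sketch (LinesSketch is hypothetical), preserveTokens = true mirrors the preserve-all-tokens behavior described for the one-argument form, and preserveTokens = false is assumed to produce no empty entries at all, which is one reading of treating consecutive newlines as a single line break.

```java
import java.util.Arrays;

/** Hypothetical sketch of toLines using String.split. */
final class LinesSketch {

    static String[] toLines(String s, boolean preserveTokens) {
        if (s == null || s.isEmpty()) {
            return new String[0]; // documented: null or empty gives an empty array
        }
        if (preserveTokens) {
            // A negative limit keeps leading, trailing, and consecutive empty entries.
            return s.split("\n", -1);
        }
        // Collapse runs of newlines, then drop any empty leading entry.
        return Arrays.stream(s.split("\n+"))
                     .filter(line -> !line.isEmpty())
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toLines("a\n\nb\n", true)));  // [a, , b, ]
        System.out.println(Arrays.toString(toLines("a\n\nb\n", false))); // [a, b]
    }
}
```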
-
toFixedSize
public static java.lang.String toFixedSize(java.lang.String s, char pad, int size)
Enforces the given length upon the given string by trimming and then padding as necessary.
- Parameters:
s - the String to fix
pad - the pad character to use if padding is required
size - the desired size of the string
- Returns:
the fixed string
-
pad
public static java.lang.String pad(java.lang.String source, char filler, int length)
Pads the source string to the specified length, using the filler character as the pad. If length is negative, the source string is left-justified and the filler is appended; if length is positive, the source string is right-justified.
- Parameters:
source - the original string to pad
filler - the character with which to pad
length - the target length of the padded string, whose sign selects the justification (0 results in no change)
- Returns:
the padded string
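A sketch of pad under two assumptions not spelled out above: that the absolute value of length is the total target length, and that a source already at least that long is returned unchanged rather than truncated. PadSketch is a hypothetical name.

```java
/** Hypothetical sketch of pad; length is treated as a signed target length. */
final class PadSketch {

    static String pad(String source, char filler, int length) {
        if (length == 0) {
            return source; // documented: 0 results in no change
        }
        boolean rightJustify = length > 0;
        int missing = Math.abs(length) - source.length();
        if (missing <= 0) {
            return source; // assumption: never truncate
        }
        String padding = String.valueOf(filler).repeat(missing);
        // Right-justify puts the padding in front; left-justify appends it.
        return rightJustify ? padding + source : source + padding;
    }

    public static void main(String[] args) {
        System.out.println(pad("42", '0', 5));         // 00042
        System.out.println("[" + pad("42", ' ', -5) + "]"); // [42   ]
    }
}
```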
-
indentLines
public static java.lang.String indentLines(java.lang.String s, java.lang.String indent)
Splits the given string into lines using "\n" and then indents each line with the given indent string. Finally, the updated lines are joined into a single string.
This is useful for constructing complicated toString() representations.
- Parameters:
s - the input string
indent - the indent string, added to the lines as needed
- Returns:
the output string
-
findWord
public static java.lang.String findWord(java.lang.String s, int index)
Finds the word at the given index in the given string. For example, given the string "The tree is green" and an index of 5, the result would be "tree".
- Parameters:
s - the string to search
index - the index into the string to "seed" the word
- Returns:
the word contained at the given index
-
findWord
public static java.lang.String findWord(java.lang.String s, int index, char[] charsToAllow)
Finds the word at the given index in the given string, accepting the characters in charsToAllow as part of the word. For example, given the string "The tree* is green", an index of 5, and '*' in charsToAllow, the result would be "tree*".
If the search yields only whitespace, then the empty string will be returned.
- Parameters:
s - the string to search
index - the index into the string to "seed" the word
charsToAllow - chars that normally would be considered invalid, e.g., '*', so that the word can be returned with those chars included
- Returns:
the word contained at the given index
-
findWordLocation
public static WordLocation findWordLocation(java.lang.String s, int index, char[] charsToAllow)
-
isWordChar
public static boolean isWordChar(char c, char[] charsToAllow)
Returns true if the given character is a 'word char', loosely defined as a character that we would expect to be normal ASCII content meant for consumption by a human. Characters in charsToAllow also pass the test.
- Parameters:
c - the char to check
charsToAllow - characters that will cause this method to return true
- Returns:
true if it is a 'word char'
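The word search in findWord can be sketched as expanding outward from the seed index while characters pass an isWordChar-style test. The word-character rule here (ASCII letters, digits, and underscore, plus charsToAllow) is an assumption standing in for the loose definition above, and FindWordSketch is a hypothetical name.

```java
/** Hypothetical sketch of findWord; the word-character rule is an assumption. */
final class FindWordSketch {

    static boolean isWordChar(char c, char[] charsToAllow) {
        if (c < 0x80 && (Character.isLetterOrDigit(c) || c == '_')) {
            return true;
        }
        for (char allowed : charsToAllow) {
            if (c == allowed) {
                return true;
            }
        }
        return false;
    }

    static String findWord(String s, int index, char[] charsToAllow) {
        if (index < 0 || index >= s.length() || !isWordChar(s.charAt(index), charsToAllow)) {
            return ""; // seeded on whitespace or punctuation: no word
        }
        int start = index;
        int end = index + 1;
        while (start > 0 && isWordChar(s.charAt(start - 1), charsToAllow)) {
            start--; // grow leftward
        }
        while (end < s.length() && isWordChar(s.charAt(end), charsToAllow)) {
            end++; // grow rightward
        }
        return s.substring(start, end);
    }
}
```

This reproduces the documented examples: index 5 of "The tree is green" yields "tree", and with '*' allowed, index 5 of "The tree* is green" yields "tree*".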
-
findLastWordPosition
public static int findLastWordPosition(java.lang.String s)
Finds the starting position of the last word in the given string.
- Parameters:
s - the string to search
- Returns:
the starting position of the last word, or -1 if not found
-
getLastWord
public static java.lang.String getLastWord(java.lang.String s, java.lang.String separator)
Takes a path-like string and retrieves the last non-empty item. Examples:
- StringUtilities.getLastWord("/This/is/my/last/word/", "/") returns word
- StringUtilities.getLastWord("This.is.my.last.word", ".") returns word
- StringUtilities.getLastWord("/This/is/my/last/word/MyFile.java", ".") returns java
- StringUtilities.getLastWord("/This/is/my/last/word/MyFile.java", "/") returns MyFile.java
- Parameters:
s - the string from which to get the last word
separator - the separator of words
- Returns:
the last word
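The getLastWord examples above can be reproduced by splitting on the literal separator and scanning backward for the last non-empty item. LastWordSketch is hypothetical, and returning the empty string when no item is found is an assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of getLastWord. */
final class LastWordSketch {

    static String getLastWord(String s, String separator) {
        // Pattern.quote makes separators such as "." match literally.
        String[] items = s.split(Pattern.quote(separator));
        for (int i = items.length - 1; i >= 0; i--) {
            if (!items[i].isEmpty()) {
                return items[i];
            }
        }
        return ""; // assumption: no non-empty item
    }
}
```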
-
toString
public static java.lang.String toString(int value)
Converts an integer into a string. For example, given an integer 0x41424344, the returned string would be "ABCD".
- Parameters:
value - the integer value
- Returns:
the converted string
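The 0x41424344 to "ABCD" example suggests that each byte of the integer, from most significant to least significant, becomes one character. This sketch assumes exactly that; how the real method treats zero or non-printable bytes is not documented, and IntToStringSketch is a hypothetical name.

```java
/** Hypothetical sketch: one char per byte, most significant byte first. */
final class IntToStringSketch {

    static String toString(int value) {
        StringBuilder sb = new StringBuilder(4);
        for (int shift = 24; shift >= 0; shift -= 8) {
            // Isolate one byte and widen it to a char without sign extension.
            sb.append((char) ((value >>> shift) & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toString(0x41424344)); // ABCD
    }
}
```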
-
toString
public static java.lang.String toString(java.util.Collection<?> collection, java.lang.String separator)
Turns the given data into an attractive string, with the separator of your choosing.
- Parameters:
collection - the data from which a string will be generated
separator - the string used to separate elements
- Returns:
a string representation of the given collection
-
toStringWithIndent
public static java.lang.String toStringWithIndent(java.lang.Object o)
-
reverse
public static java.lang.String reverse(java.lang.String s)
Reverses the characters in the given string.
- Parameters:
s - the string to reverse
- Returns:
the reversed string
-
mergeStrings
public static java.lang.String mergeStrings(java.lang.String string1, java.lang.String string2)
Merges two strings into one. If one string contains the other, then the larger is returned. If both strings are null, then null is returned. If both strings are empty, the empty string is returned. If the original two strings differ, this adds the second string to the first, separated by a newline.
- Parameters:
string1 - the first string
string2 - the second string
- Returns:
the merged string
-
trim
public static java.lang.String trim(java.lang.String original, int max)
Limits the given string to the given max number of characters. If the string is longer than the given max, it will be trimmed to fit that length after adding ellipses.
The given max value must be at least 4. This is to ensure that, at a minimum, we can display the "..." plus one character.
- Parameters:
original - The string to be limited
max - The maximum number of characters to display (including ellipses, if trimmed)
- Returns:
the trimmed string
- Throws:
java.lang.IllegalArgumentException - If the given max value is less than 4
-
trimTrailingNulls
public static java.lang.String trimTrailingNulls(java.lang.String s)
-
trimMiddle
public static java.lang.String trimMiddle(java.lang.String s, int max)
Trims the given string to the max number of characters. Ellipses will be added to signal that content was removed. Thus, the actual number of removed characters will be (s.length() - max) + "...".length(). If the string fits within max, then the string will be returned unchanged.
The given max value must be at least 5. This is to ensure that, at a minimum, we can display the "..." plus one character from the front and back of the string.
- Parameters:
s - the string to trim
max - the max number of characters to allow
- Returns:
the trimmed string
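A sketch of both trimming strategies, following the length rules stated in the trim and trimMiddle entries (the result never exceeds max, and "..." counts toward it). TrimSketch is hypothetical, and giving the odd leftover character to the front in trimMiddle is an arbitrary choice.

```java
/** Hypothetical sketch of trim and trimMiddle. */
final class TrimSketch {
    private static final String ELLIPSES = "...";

    static String trim(String original, int max) {
        if (max < 4) {
            throw new IllegalArgumentException("max must be at least 4");
        }
        if (original.length() <= max) {
            return original;
        }
        // Keep the front and spend the last three slots on the ellipses.
        return original.substring(0, max - ELLIPSES.length()) + ELLIPSES;
    }

    static String trimMiddle(String s, int max) {
        if (max < 5) {
            throw new IllegalArgumentException("max must be at least 5");
        }
        if (s.length() <= max) {
            return s;
        }
        int keep = max - ELLIPSES.length();
        int front = (keep + 1) / 2; // odd leftover goes to the front
        int back = keep - front;
        return s.substring(0, front) + ELLIPSES + s.substring(s.length() - back);
    }

    public static void main(String[] args) {
        System.out.println(trim("StringUtilities", 8));       // Strin...
        System.out.println(trimMiddle("StringUtilities", 9)); // Str...ies
    }
}
```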
-
fixMultipleAsterisks
public static java.lang.String fixMultipleAsterisks(java.lang.String value)
Looks for all occurrences of successive asterisks (i.e., "**") and replaces them with a single asterisk, which is an equivalent usage in Ghidra. This is necessary because some symbol names cause the pattern matching process to become unusable. An example string that causes this problem is "s_CLSID\{ADB880A6-D8FF-11CF-9377-00AA003B7A11}\InprocServer3_01001400".
- Parameters:
value - The string to be checked
- Returns:
The updated string
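Collapsing runs of asterisks is a single regular-expression replacement. This is a sketch under the assumption that any run of two or more asterisks becomes one asterisk; AsteriskSketch is a hypothetical name.

```java
/** Hypothetical sketch of fixMultipleAsterisks. */
final class AsteriskSketch {

    static String fixMultipleAsterisks(String value) {
        // The regex matches a run of two or more literal asterisks.
        return value.replaceAll("\\*{2,}", "*");
    }

    public static void main(String[] args) {
        System.out.println(fixMultipleAsterisks("a**b***c*")); // a*b*c*
    }
}
```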
-
isValidCLanguageChar
public static boolean isValidCLanguageChar(char c)
Returns true if the character is OK to be contained inside a C language string. That is, the string should not be tokenized on this char.
- Parameters:
c - the char
- Returns:
true if the char is allowed in a C string
-
isAsciiChar
public static boolean isAsciiChar(char c)
Returns true if the given character is within the ASCII range.
- Parameters:
c - the char to check
- Returns:
true if the given character is within the ASCII range
-
isAsciiChar
public static boolean isAsciiChar(int codePoint)
Returns true if the given code point is within the ASCII range.
- Parameters:
codePoint - the code point to check
- Returns:
true if the given code point is within the ASCII range
-
convertEscapeSequences
public static java.lang.String convertEscapeSequences(java.lang.String str)
Replaces escape sequences in a string with corresponding control characters. For example, a backslash character followed by an 'n' character would be replaced with a single line feed (0x0a) character. One use for this is to allow users to type strings in a text field and include control characters such as line feeds and tabs. The string that contains 'a', 'b', 'c', '\', 'n', 'd', '\', 'u', '0', '0', '0', '1', 'e' would become 'a', 'b', 'c', 0x0a, 'd', 0x01, 'e'.
- Parameters:
str - The string to convert escape sequences to control characters
- Returns:
a new string with escape sequences converted to control characters
- See Also:
convertControlCharsToEscapeSequences(String)
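Decoding escape sequences amounts to a left-to-right scan that recognizes a backslash followed by a known letter, or a backslash, 'u', and four hex digits. The set of recognized escapes below (n, t, r, backslash, and the four-hex-digit form) is an assumption, as is leaving unknown or malformed sequences untouched; UnescapeSketch is a hypothetical name.

```java
/** Hypothetical sketch of escape-sequence decoding; not Ghidra's implementation. */
final class UnescapeSketch {

    /** Parses four hex digits starting at from, or returns -1 if they are missing or invalid. */
    private static int parseHex4(String s, int from) {
        if (from + 4 > s.length()) {
            return -1;
        }
        int value = 0;
        for (int k = from; k < from + 4; k++) {
            int digit = Character.digit(s.charAt(k), 16);
            if (digit < 0) {
                return -1;
            }
            value = value * 16 + digit;
        }
        return value;
    }

    static String convertEscapeSequences(String str) {
        StringBuilder out = new StringBuilder(str.length());
        int i = 0;
        while (i < str.length()) {
            char c = str.charAt(i);
            if (c != '\\' || i + 1 == str.length()) {
                out.append(c); // ordinary character, or a trailing lone backslash
                i++;
                continue;
            }
            char next = str.charAt(i + 1);
            int consumed = 2;
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case '\\': out.append('\\'); break;
                case 'u': {
                    int value = parseHex4(str, i + 2);
                    if (value >= 0) {
                        out.append((char) value);
                        consumed = 6;
                    } else {
                        out.append(c); // malformed: keep the backslash literally
                        consumed = 1;
                    }
                    break;
                }
                default:
                    out.append(c); // unknown escape: keep the backslash literally
                    consumed = 1;
            }
            i += consumed;
        }
        return out.toString();
    }
}
```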
-
convertControlCharsToEscapeSequences
public static java.lang.String convertControlCharsToEscapeSequences(java.lang.String str)
Replaces known control characters in a string with corresponding escape sequences. For example, a line feed character would be converted to a backslash character followed by an 'n' character. One use for this is to display strings in a manner that makes embedded control characters easy to see. The string that contains 'a', 'b', 'c', 0x0a, 'd', 0x01, 'e' would become 'a', 'b', 'c', '\', 'n', 'd', 0x01, 'e'.
- Parameters:
str - The string to convert control characters to escape sequences
- Returns:
a new string with all the known control characters converted to escape sequences
-
convertCodePointToEscapeSequence
public static java.lang.String convertCodePointToEscapeSequence(int codePoint)
Maps known control characters to corresponding escape sequences. For example, a line feed character would be converted to a backslash '\\' character followed by an 'n' character. One use for this is to display strings in a manner that makes embedded control characters easy to see.
- Parameters:
codePoint - The character to convert to an escape sequence string
- Returns:
a new string containing the equivalent escape sequence, or the original character (as a string) if it is not in the control character mapping
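The escaping direction can be sketched by mapping each code point through a small table and passing everything else through unchanged, which matches the documented example where 0x01 is left as-is. EscapeSketch is hypothetical, and the exact set of mapped characters (including backslash) is an assumption.

```java
/** Hypothetical sketch of control-character escaping; the mapping table is assumed. */
final class EscapeSketch {

    static String convertCodePointToEscapeSequence(int codePoint) {
        switch (codePoint) {
            case '\n': return "\\n";
            case '\t': return "\\t";
            case '\r': return "\\r";
            case '\\': return "\\\\"; // assumption: backslash is escaped too
            default:   return new String(Character.toChars(codePoint));
        }
    }

    static String convertControlCharsToEscapeSequences(String str) {
        StringBuilder out = new StringBuilder(str.length());
        // Iterate by code point so supplementary characters pass through intact.
        str.codePoints().forEach(cp -> out.append(convertCodePointToEscapeSequence(cp)));
        return out.toString();
    }
}
```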
-
isUnicodeReplacementCodePoint
public static boolean isUnicodeReplacementCodePoint(int codePoint)
Returns true if the specified code point is the 'replacement' code point 0xFFFD, which is used when decoding bytes into Unicode chars encounters a bad or invalid sequence that has no mapping (e.g., decoding the byte 0x80 as US-ASCII).
- Parameters:
codePoint - the code point to test
- Returns:
true if the code point is 0xFFFD (i.e., the Unicode REPLACEMENT char)
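The replacement code point is easy to observe with the JDK alone: decoding a byte outside the ASCII range with StandardCharsets.US_ASCII through the String constructor substitutes 0xFFFD for it. ReplacementSketch is a hypothetical class that pairs that decoding step with an equality check like the one described above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demonstration of the Unicode replacement code point. */
final class ReplacementSketch {
    // 0xFFFD is the code point named in the description above.
    static final int REPLACEMENT_CODE_POINT = 0xFFFD;

    static boolean isUnicodeReplacementCodePoint(int codePoint) {
        return codePoint == REPLACEMENT_CODE_POINT;
    }

    /** Decodes the bytes as US-ASCII; invalid bytes become the replacement character. */
    static String decodeAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String decoded = decodeAscii(new byte[] {0x41, (byte) 0x80});
        System.out.println(isUnicodeReplacementCodePoint(decoded.codePointAt(1))); // true
    }
}
```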