Package ghidra.util

Class StringUtilities


  • public class StringUtilities
    extends java.lang.Object
    Class with static methods that deal with string manipulation.
    • Method Summary

      Modifier and Type Method Description
      static java.lang.String characterToString​(char c)
      Converts the character into a string.
      static boolean containsAll​(java.lang.CharSequence toSearch, java.lang.CharSequence... searches)
      Returns true if all the given searches are contained in the given string.
      static boolean containsAllIgnoreCase​(java.lang.CharSequence toSearch, java.lang.CharSequence... searches)
      Returns true if all the given searches are contained in the given string, ignoring case.
      static java.lang.String convertCodePointToEscapeSequence​(int codePoint)
      Maps known control characters to corresponding escape sequences.
      static java.lang.String convertControlCharsToEscapeSequences​(java.lang.String str)
      Replaces known control characters in a string with the corresponding escape sequences.
      static java.lang.String convertEscapeSequences​(java.lang.String str)
      Replaces escaped characters in a string with the corresponding control characters.
      static java.lang.String convertTabsToSpaces​(java.lang.String str, int tabSize)
      Convert tabs in the given string to spaces.
      static int countOccurrences​(java.lang.String string, char occur)
      Returns a count of how many times the 'occur' char appears in the string.
      static boolean endsWithIgnoreCase​(java.lang.String string, java.lang.String postfix)
      Returns true if the given string ends with postfix, ignoring case.
      static boolean endsWithWhiteSpace​(java.lang.String string)  
      static boolean equals​(java.lang.String s1, java.lang.String s2, boolean caseSensitive)  
      static java.lang.String extractFromDoubleQuotes​(java.lang.String str)
      If the given string is enclosed in double quotes, extract the inner text.
      static int findLastWordPosition​(java.lang.String s)
      Finds the starting position of the last word in the given string.
      static java.lang.String findWord​(java.lang.String s, int index)
      Finds the word at the given index in the given string.
      static java.lang.String findWord​(java.lang.String s, int index, char[] charsToAllow)
      Finds the word at the given index in the given string; if the word contains any of the given charsToAllow, then those characters are allowed in the word.
      static WordLocation findWordLocation​(java.lang.String s, int index, char[] charsToAllow)  
      static java.lang.String fixMultipleAsterisks​(java.lang.String value)
      This method looks for all occurrences of successive asterisks (i.e., "**") and replaces them with a single asterisk, which is an equivalent usage in Ghidra.
      static java.lang.String getLastWord​(java.lang.String s, java.lang.String separator)
      Takes a path-like string and retrieves the last non-empty item.
      static java.lang.String indentLines​(java.lang.String s, java.lang.String indent)
      Splits the given string into lines using \n and then pads each string with the given pad string.
      static int indexOfWord​(java.lang.String text, java.lang.String searchWord)
      Returns the index of the first whole word occurrence of the search word within the given text.
      static boolean isAllBlank​(java.lang.CharSequence... sequences)
      Returns true if all the given sequences are either null or only whitespace
      static boolean isAsciiChar​(char c)
      Returns true if the given character is within the ascii range.
      static boolean isAsciiChar​(int codePoint)
      Returns true if the given code point is within the ascii range.
      static boolean isControlCharacterOrBackslash​(char c)
      Returns true if the given character is a special character.
      static boolean isControlCharacterOrBackslash​(int codePoint)
      Returns true if the given codePoint (i.e., full Unicode 32-bit character) is a special character.
      static boolean isDisplayable​(int c)
      Returns true if the character is in displayable character range
      static boolean isDoubleQuoted​(java.lang.String str)
      Determines if a string is enclosed in double quotes (ASCII 34 (0x22))
      static boolean isUnicodeReplacementCodePoint​(int codePoint)
      Returns true if the specified code point is the 'replacement' code point 0xFFFD, which is used when decoding bytes into unicode chars and there was a bad or invalid sequence that does not have a mapping.
      static boolean isValidCLanguageChar​(char c)
      Returns true if the character is OK to be contained inside C language string.
      static boolean isWholeWord​(java.lang.String text, int startIndex, int length)
      Returns true if the substring within the text string starting at startIndex and having the given length is a whole word.
      static boolean isWordChar​(char c, char[] charsToAllow)
      Loosely defined as a character that we would expect to be normal ASCII content meant for consumption by a human.
      static java.lang.String mergeStrings​(java.lang.String string1, java.lang.String string2)
      Merge two strings into one.
      static java.lang.String pad​(java.lang.String source, char filler, int length)
      Pads the source string to the specified length, using the filler character as the pad.
      static java.lang.String reverse​(java.lang.String s)
      Reverse the characters in the given string
      static boolean startsWithIgnoreCase​(java.lang.String string, java.lang.String prefix)
      Returns true if the given string starts with prefix ignoring case.
      static java.lang.String toFixedSize​(java.lang.String s, char pad, int size)
      Enforces the given length upon the given string by trimming and then padding as necessary.
      static java.lang.String[] toLines​(java.lang.String str)
      Parses a string containing multiple lines into an array where each element in the array contains only a single line.
      static java.lang.String[] toLines​(java.lang.String s, boolean preserveTokens)
      Parses a string containing multiple lines into an array where each element in the array contains only a single line.
      static java.lang.String toQuotedString​(byte[] bytes)
      Generate a quoted string from US-ASCII character bytes assuming 1-byte chars.
      static java.lang.String toQuotedString​(byte[] bytes, int charSize)
      Generate a quoted string from US-ASCII characters, where each character is charSize bytes.
      static java.lang.String toString​(int value)
      Converts an integer into a string.
      static java.lang.String toString​(java.util.Collection<?> collection, java.lang.String separator)
      Turn the given data into an attractive string, with the separator of your choosing
      static java.lang.String toStringWithIndent​(java.lang.Object o)  
      static java.lang.String trim​(java.lang.String original, int max)
      Limits the given string to the given max number of characters.
      static java.lang.String trimMiddle​(java.lang.String s, int max)
      Trims the given string to the max number of characters.
      static java.lang.String trimTrailingNulls​(java.lang.String s)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • DOUBLE_QUOTED_STRING_PATTERN

        public static final java.util.regex.Pattern DOUBLE_QUOTED_STRING_PATTERN
      • LINE_SEPARATOR

        public static final java.lang.String LINE_SEPARATOR
        The platform specific string that is the line separator.
      • UNICODE_BE_BYTE_ORDER_MARK

        public static final int UNICODE_BE_BYTE_ORDER_MARK
        Unicode Byte Order Marks (BOM) characters are special characters in the Unicode character space that signal endian-ness of the text.

        The value for the BigEndian version (0xFEFF) works for both 16 and 32 bit character values.

        There are separate values for Little Endian Byte Order Marks for 16 and 32 bit characters because the 32 bit value is shifted left by 16 bits.

        See Also:
        Constant Field Values
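        The relationship between the three byte order marks can be illustrated with a small sketch. The numeric values below come from the Unicode BOM definition and are an assumption here, since this page only links to the constant values; the class and field names are hypothetical.

```java
// Hypothetical stand-in constants; the real values live behind "Constant Field Values".
class BomSketch {
    // Big-endian BOM: the same code unit value for 16-bit and 32-bit characters.
    static final int BE_BOM = 0xFEFF;
    // Little-endian 16-bit BOM: bytes FF FE read as one big-endian 16-bit value.
    static final int LE16_BOM = 0xFFFE;
    // Little-endian 32-bit BOM: bytes FF FE 00 00 read as one big-endian 32-bit value,
    // which is the 16-bit value shifted left by 16 bits.
    static final int LE32_BOM = LE16_BOM << 16;

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(BE_BOM));
        System.out.println(Integer.toHexString(LE16_BOM));
        System.out.println(Integer.toHexString(LE32_BOM));
    }
}
```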
      • UNICODE_LE16_BYTE_ORDER_MARK

        public static final int UNICODE_LE16_BYTE_ORDER_MARK
        See Also:
        Constant Field Values
      • UNICODE_LE32_BYTE_ORDER_MARK

        public static final int UNICODE_LE32_BYTE_ORDER_MARK
        See Also:
        Constant Field Values
    • Method Detail

      • isControlCharacterOrBackslash

        public static boolean isControlCharacterOrBackslash​(char c)
        Returns true if the given character is a special character. For example a '\n' or '\\'. A value of 0 is not considered special for this purpose as it is handled separately because it has more varied use cases.
        Parameters:
        c - the character
        Returns:
        true if the given character is a special character
      • isControlCharacterOrBackslash

        public static boolean isControlCharacterOrBackslash​(int codePoint)
        Returns true if the given codePoint (ie. full unicode 32bit character) is a special character. For example a '\n' or '\\'. A value of 0 is not considered special for this purpose as it is handled separately because it has more varied use cases.
        Parameters:
        codePoint - the codePoint (ie. character), see String.codePointAt(int)
        Returns:
        true if the given character is a special character
      • isDoubleQuoted

        public static boolean isDoubleQuoted​(java.lang.String str)
        Determines if a string is enclosed in double quotes (ASCII 34 (0x22))
        Parameters:
        str - String to test for double-quote enclosure
        Returns:
        True if the first and last characters are the double-quote character, false otherwise
      • extractFromDoubleQuotes

        public static java.lang.String extractFromDoubleQuotes​(java.lang.String str)
        If the given string is enclosed in double quotes, extract the inner text. Otherwise, return the given string unmodified.
        Parameters:
        str - String to match and extract from
        Returns:
        The inner text of a doubly-quoted string, or the original string if not double-quoted.
      • isDisplayable

        public static boolean isDisplayable​(int c)
        Returns true if the character is in displayable character range
        Parameters:
        c - the character
        Returns:
        true if the character is in displayable character range
      • isAllBlank

        public static boolean isAllBlank​(java.lang.CharSequence... sequences)
        Returns true if all the given sequences are either null or only whitespace
        Parameters:
        sequences - the sequences to check
        Returns:
        true if all the given sequences are either null or only whitespace.
        See Also:
        StringUtils.isNoneBlank(CharSequence...), StringUtils.isNoneEmpty(CharSequence...), StringUtils.isAnyBlank(CharSequence...), StringUtils.isAnyEmpty(CharSequence...)
      • characterToString

        public static java.lang.String characterToString​(char c)
        Converts the character into a string. If the character is special, it will actually render the character. For example, given '\n' the output would be "\\n".
        Parameters:
        c - the character to convert into a string
        Returns:
        the converted character
      • countOccurrences

        public static int countOccurrences​(java.lang.String string,
                                           char occur)
        Returns a count of how many times the 'occur' char appears in the string.
        Parameters:
        string - the string to look inside
        occur - the character to look for
        Returns:
        a count of how many times the 'occur' char appears in the string
      • equals

        public static boolean equals​(java.lang.String s1,
                                     java.lang.String s2,
                                     boolean caseSensitive)
      • endsWithWhiteSpace

        public static boolean endsWithWhiteSpace​(java.lang.String string)
      • toQuotedString

        public static java.lang.String toQuotedString​(byte[] bytes)
        Generate a quoted string from US-ASCII character bytes assuming 1-byte chars.

        Special characters and non-printable characters will be escaped using C character escape conventions (e.g., \t, \n, \\uHHHH, etc.). If a character size other than 1-byte is required the alternate form of this method should be used.

        The result string will be single quoted (ie. "'") if the input byte array is 1 byte long, otherwise the result will be double-quoted ('"').

        Parameters:
        bytes - character string bytes
        Returns:
        escaped string for display use
      • toQuotedString

        public static java.lang.String toQuotedString​(byte[] bytes,
                                                      int charSize)
        Generate a quoted string from US-ASCII characters, where each character is charSize bytes.

        Special characters and non-printable characters will be escaped using C character escape conventions (e.g., \t, \n, \\uHHHH, etc.).

        The result string will be single quoted (ie. "'") if the input byte array is 1 character long (ie. charSize), otherwise the result will be double-quoted ('"').

        Parameters:
        bytes - array of bytes
        charSize - number of bytes per character (1, 2, 4).
        Returns:
        escaped string for display use
      • startsWithIgnoreCase

        public static boolean startsWithIgnoreCase​(java.lang.String string,
                                                   java.lang.String prefix)
        Returns true if the given string starts with prefix ignoring case.

        Note: This method is equivalent to calling:

                string.regionMatches( true, 0, prefix, 0, prefix.length() );
         
        Parameters:
        string - the string which may contain the prefix
        prefix - the prefix to test against
        Returns:
        true if the given string starts with prefix ignoring case.
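        The equivalence noted above can be tried with the JDK alone. This is a minimal sketch, not the Ghidra implementation; the class name is hypothetical and null handling is deliberately left out because this page does not describe it.

```java
// Sketch of the documented equivalence using only java.lang.String.
class StartsWithSketch {
    static boolean startsWithIgnoreCase(String string, String prefix) {
        // regionMatches with ignoreCase=true compares the first prefix.length() chars;
        // it returns false when the prefix is longer than the string.
        return string.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        System.out.println(startsWithIgnoreCase("Ghidra", "GHI"));
    }
}
```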
      • endsWithIgnoreCase

        public static boolean endsWithIgnoreCase​(java.lang.String string,
                                                 java.lang.String postfix)
        Returns true if the given string ends with postfix, ignoring case.

        Note: This method is equivalent to calling:

                int startOffset = string.length() - postfix.length();
                string.regionMatches( true, startOffset, postfix, 0, postfix.length() );
         
        Parameters:
        string - the string which may end with postfix
        postfix - the string for which to test existence
        Returns:
        true if the given string ends with postfix, ignoring case.
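        A JDK-only sketch of the equivalent call described above follows. The class name is hypothetical, and the behavior for null arguments is not covered because this page does not specify it.

```java
// Sketch of the documented equivalence using only java.lang.String.
class EndsWithSketch {
    static boolean endsWithIgnoreCase(String string, String postfix) {
        // A negative offset (postfix longer than string) makes regionMatches return false.
        int startOffset = string.length() - postfix.length();
        return string.regionMatches(true, startOffset, postfix, 0, postfix.length());
    }

    public static void main(String[] args) {
        System.out.println(endsWithIgnoreCase("Main.JAVA", ".java"));
    }
}
```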
      • containsAll

        public static boolean containsAll​(java.lang.CharSequence toSearch,
                                          java.lang.CharSequence... searches)
        Returns true if all the given searches are contained in the given string.
        Parameters:
        toSearch - the string to search
        searches - the strings to find
        Returns:
        true if all the given searches are contained in the given string.
      • containsAllIgnoreCase

        public static boolean containsAllIgnoreCase​(java.lang.CharSequence toSearch,
                                                    java.lang.CharSequence... searches)
        Returns true if all the given searches are contained in the given string, ignoring case.
        Parameters:
        toSearch - the string to search
        searches - the strings to find
        Returns:
        true if all the given searches are contained in the given string, ignoring case.
      • indexOfWord

        public static int indexOfWord​(java.lang.String text,
                                      java.lang.String searchWord)
        Returns the index of the first whole word occurrence of the search word within the given text. An occurrence is a whole word when neither the character before it nor the character after it is a JavaIdentifierPart.
        Parameters:
        text - the text to be searched.
        searchWord - the word to search for.
        Returns:
        the index of the first whole word occurrence of the search word within the given text, or -1 if not found.
      • isWholeWord

        public static boolean isWholeWord​(java.lang.String text,
                                          int startIndex,
                                          int length)
        Returns true if the substring within the text string starting at startIndex and having the given length is a whole word. A substring is a whole word when neither the character before it nor the character after it is a JavaIdentifierPart.
        Parameters:
        text - the text containing the potential word.
        startIndex - the start index of the potential word within the text.
        length - the length of the potential word
        Returns:
        true if the substring within the text string starting at startIndex and having the given length is a whole word.
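        The whole-word rule used by isWholeWord and indexOfWord can be sketched with Character.isJavaIdentifierPart. This is an illustration under assumptions, not the Ghidra source: the class name is hypothetical, and treating the start and end of the text as word boundaries is an assumption.

```java
// Sketch of the whole-word test described above.
class WholeWordSketch {
    static boolean isWholeWord(String text, int startIndex, int length) {
        int end = startIndex + length;
        if (startIndex < 0 || end > text.length()) {
            return false; // assumption: out-of-range spans are never whole words
        }
        // Assumption: the edges of the text count as boundaries.
        boolean startOk = startIndex == 0
                || !Character.isJavaIdentifierPart(text.charAt(startIndex - 1));
        boolean endOk = end == text.length()
                || !Character.isJavaIdentifierPart(text.charAt(end));
        return startOk && endOk;
    }

    static int indexOfWord(String text, String searchWord) {
        // Walk every occurrence and return the first one that is a whole word.
        int index = text.indexOf(searchWord);
        while (index >= 0) {
            if (isWholeWord(text, index, searchWord.length())) {
                return index;
            }
            index = text.indexOf(searchWord, index + 1);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfWord("concat cat", "cat"));
    }
}
```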
      • convertTabsToSpaces

        public static java.lang.String convertTabsToSpaces​(java.lang.String str,
                                                           int tabSize)
        Convert tabs in the given string to spaces.
        Parameters:
        str - string containing tabs
        tabSize - length of the tab
        Returns:
        string that has spaces for tabs
      • toLines

        public static java.lang.String[] toLines​(java.lang.String str)
        Parses a string containing multiple lines into an array where each element in the array contains only a single line. The "\n" character is used as the delimiter for lines.

        This method creates an empty string entry in the result array for initial and trailing separator chars, as well as for consecutive separators.

        Parameters:
        str - the string to parse
        Returns:
        an array of lines; an empty array if the given value is null or empty
        See Also:
        StringUtils.splitPreserveAllTokens(String, char)
      • toLines

        public static java.lang.String[] toLines​(java.lang.String s,
                                                 boolean preserveTokens)
        Parses a string containing multiple lines into an array where each element in the array contains only a single line. The "\n" character is used as the delimiter for lines.
        Parameters:
        s - the string to parse
        preserveTokens - true signals to treat consecutive newlines as multiple lines; false signals to treat consecutive newlines as a single line break
        Returns:
        an array of lines; an empty array if the given value is null or empty
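        The difference between the two preserveTokens modes can be sketched with String.split. This is one reading of the description, not the Ghidra implementation: the class name is hypothetical, and dropping every empty token when preserveTokens is false (including leading and trailing ones) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of line splitting with and without preserved empty tokens.
class ToLinesSketch {
    static String[] toLines(String s, boolean preserveTokens) {
        if (s == null || s.isEmpty()) {
            return new String[0];
        }
        // A negative limit keeps leading, trailing, and consecutive empty entries.
        String[] all = s.split("\n", -1);
        if (preserveTokens) {
            return all;
        }
        // Assumption: without preserved tokens, empty entries are simply dropped.
        List<String> nonEmpty = new ArrayList<>();
        for (String line : all) {
            if (!line.isEmpty()) {
                nonEmpty.add(line);
            }
        }
        return nonEmpty.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(toLines("a\n\nb", true).length);
        System.out.println(toLines("a\n\nb", false).length);
    }
}
```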
      • toFixedSize

        public static java.lang.String toFixedSize​(java.lang.String s,
                                                   char pad,
                                                   int size)
        Enforces the given length upon the given string by trimming and then padding as necessary.
        Parameters:
        s - the String to fix
        pad - the pad character to use if padding is required
        size - the desired size of the string
        Returns:
        the fixed string
      • pad

        public static java.lang.String pad​(java.lang.String source,
                                           char filler,
                                           int length)
        Pads the source string to the specified length, using the filler character as the pad. If length is negative, left justifies the source string, appending the filler; if length is positive, right justifies the source string, prepending the filler.
        Parameters:
        source - the original string to pad.
        filler - the character with which to pad
        length - the length of padding to add (0 results in no changes)
        Returns:
        the padded string
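        The sign convention for length can be sketched as follows. This assumes that the absolute value of length is the total target width (as the first sentence of the description suggests) and that a source already at least that long is returned unchanged; the class name is hypothetical and this is not the Ghidra source.

```java
// Sketch of sign-controlled justification.
class PadSketch {
    static String pad(String source, char filler, int length) {
        if (length == 0) {
            return source; // documented: 0 results in no changes
        }
        boolean rightJustify = length > 0;
        int width = Math.abs(length);
        // Assumption: width is the total target length, not the amount of padding.
        StringBuilder fill = new StringBuilder();
        for (int i = source.length(); i < width; i++) {
            fill.append(filler);
        }
        return rightJustify ? fill + source : source + fill;
    }

    public static void main(String[] args) {
        System.out.println("[" + pad("7", '0', 3) + "]");
        System.out.println("[" + pad("7", ' ', -3) + "]");
    }
}
```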
      • indentLines

        public static java.lang.String indentLines​(java.lang.String s,
                                                   java.lang.String indent)
        Splits the given string into lines using \n and then pads each line with the given indent string. Finally, the updated lines are formed into a single string.

        This is useful for constructing complicated toString() representations.

        Parameters:
        s - the input string
        indent - the indent string; this will be appended as needed
        Returns:
        the output string
      • findWord

        public static java.lang.String findWord​(java.lang.String s,
                                                int index)
        Finds the word at the given index in the given string. For example, given the string "The tree is green" and an index of 5, the result would be "tree".
        Parameters:
        s - the string to search
        index - the index into the string to "seed" the word.
        Returns:
        String the word contained at the given index.
      • findWord

        public static java.lang.String findWord​(java.lang.String s,
                                                int index,
                                                char[] charsToAllow)
        Finds the word at the given index in the given string; if the word contains any of the given charsToAllow, then those characters are allowed in the word. For example, given the string "The tree* is green", an index of 5, and charsToAllow containing '*', the result would be "tree*".

        If the search yields only whitespace, then the empty string will be returned.

        Parameters:
        s - the string to search
        index - the index into the string to "seed" the word.
        charsToAllow - chars that normally would be considered invalid, e.g., '*', so that the word can be returned with those chars included
        Returns:
        String the word contained at the given index.
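        The two findWord examples above can be reproduced with a small sketch that grows a span outward from the seed index. The isWordChar rule used here (letter, digit, underscore, or an explicitly allowed char) is an assumption standing in for the loosely defined rule in this class, and the class name is hypothetical.

```java
// Sketch of seeding a word at an index and expanding in both directions.
class FindWordSketch {
    // Assumed word-char rule; the real definition is only loosely described.
    static boolean isWordChar(char c, char[] charsToAllow) {
        for (char allowed : charsToAllow) {
            if (c == allowed) {
                return true;
            }
        }
        return Character.isLetterOrDigit(c) || c == '_';
    }

    static String findWord(String s, int index, char[] charsToAllow) {
        if (s == null || index < 0 || index >= s.length()
                || !isWordChar(s.charAt(index), charsToAllow)) {
            return ""; // e.g. the seed index lands on whitespace
        }
        int start = index;
        while (start > 0 && isWordChar(s.charAt(start - 1), charsToAllow)) {
            start--;
        }
        int end = index + 1;
        while (end < s.length() && isWordChar(s.charAt(end), charsToAllow)) {
            end++;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(findWord("The tree* is green", 5, new char[] { '*' }));
    }
}
```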
      • findWordLocation

        public static WordLocation findWordLocation​(java.lang.String s,
                                                    int index,
                                                    char[] charsToAllow)
      • isWordChar

        public static boolean isWordChar​(char c,
                                         char[] charsToAllow)
        Loosely defined as a character that we would expect to be normal ASCII content meant for consumption by a human. Also, any of the provided charsToAllow will pass the test.
        Parameters:
        c - the char to check
        charsToAllow - characters that will cause this method to return true
        Returns:
        true if it is a 'word char'
      • findLastWordPosition

        public static int findLastWordPosition​(java.lang.String s)
        Finds the starting position of the last word in the given string.
        Parameters:
        s - the string to search
        Returns:
        int the starting position of the last word, -1 if not found
      • getLastWord

        public static java.lang.String getLastWord​(java.lang.String s,
                                                   java.lang.String separator)
        Takes a path-like string and retrieves the last non-empty item. Examples:
        • StringUtilities.getLastWord("/This/is/my/last/word/", "/") returns word
        • StringUtilities.getLastWord("This.is.my.last.word", ".") returns word
        • StringUtilities.getLastWord("/This/is/my/last/word/MyFile.java", ".") returns java
        • StringUtilities.getLastWord("/This/is/my/last/word/MyFile.java", "/") returns MyFile.java
        Parameters:
        s - the string from which to get the last word
        separator - the separator of words
        Returns:
        the last word
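        The examples above are consistent with splitting on the separator and scanning from the end for the first non-empty piece. The sketch below shows that approach; the class name is hypothetical, and returning an empty string when no non-empty item exists is an assumption.

```java
import java.util.regex.Pattern;

// Sketch of "last non-empty item of a separated string".
class LastWordSketch {
    static String getLastWord(String s, String separator) {
        // Pattern.quote keeps separators such as "." from being treated as regex.
        String[] parts = s.split(Pattern.quote(separator));
        for (int i = parts.length - 1; i >= 0; i--) {
            if (!parts[i].isEmpty()) {
                return parts[i];
            }
        }
        return ""; // assumption: nothing but separators yields an empty result
    }

    public static void main(String[] args) {
        System.out.println(getLastWord("/This/is/my/last/word/MyFile.java", "/"));
    }
}
```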
      • toString

        public static java.lang.String toString​(int value)
        Converts an integer into a string. For example, given an integer 0x41424344, the returned string would be "ABCD".
        Parameters:
        value - the integer value
        Returns:
        the converted string
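        The 0x41424344 example corresponds to reading the integer as four bytes, most significant first, and treating each byte as a character. A minimal sketch of that reading follows; the class and method names are hypothetical, and how the real method treats non-printable bytes is not stated on this page.

```java
// Sketch of turning the four bytes of an int into a four-character string.
class IntToStringSketch {
    static String toAsciiString(int value) {
        StringBuilder sb = new StringBuilder(4);
        // Most significant byte first, matching the documented example.
        for (int shift = 24; shift >= 0; shift -= 8) {
            sb.append((char) ((value >>> shift) & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAsciiString(0x41424344));
    }
}
```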
      • toString

        public static java.lang.String toString​(java.util.Collection<?> collection,
                                                java.lang.String separator)
        Turn the given data into an attractive string, with the separator of your choosing
        Parameters:
        collection - the data from which a string will be generated
        separator - the string used to separate elements
        Returns:
        a string representation of the given list
      • toStringWithIndent

        public static java.lang.String toStringWithIndent​(java.lang.Object o)
      • reverse

        public static java.lang.String reverse​(java.lang.String s)
        Reverse the characters in the given string
        Parameters:
        s - the string to reverse
        Returns:
        the reversed string
      • mergeStrings

        public static java.lang.String mergeStrings​(java.lang.String string1,
                                                    java.lang.String string2)
        Merge two strings into one. If one string contains the other, then the longer one is returned. If both strings are null then null is returned. If both strings are empty, the empty string is returned. If the original two strings differ, this adds the second string to the first separated by a newline.
        Parameters:
        string1 - the first string
        string2 - the second string
        Returns:
        the merged string
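        The merge rules listed above can be sketched as a short chain of checks. The case where exactly one argument is null is not described on this page; returning the other string in that case is an assumption, and the class name is hypothetical.

```java
// Sketch of the documented merge rules.
class MergeSketch {
    static String mergeStrings(String string1, String string2) {
        if (string1 == null) {
            return string2; // both null gives null; one null is an assumed case
        }
        if (string2 == null) {
            return string1;
        }
        if (string1.contains(string2)) {
            return string1; // also covers two empty strings
        }
        if (string2.contains(string1)) {
            return string2;
        }
        return string1 + "\n" + string2;
    }

    public static void main(String[] args) {
        System.out.println(mergeStrings("first", "second"));
    }
}
```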
      • trim

        public static java.lang.String trim​(java.lang.String original,
                                            int max)
        Limits the given string to the given max number of characters. If the string is longer than max, it will be trimmed so that it fits within that length once the ellipsis ("...") has been added.

        The given max value must be at least 4. This is to ensure that, at a minimum, we can display the "..." plus one character.

        Parameters:
        original - The string to be limited
        max - The maximum number of characters to display (including ellipses, if trimmed).
        Returns:
        the trimmed string
        Throws:
        java.lang.IllegalArgumentException - If the given max value is less than 4.
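        A minimal sketch of the trimming rule, following the stated minimum of 4 ("..." plus one character). Placing the ellipsis at the end of the string is an assumption, as is returning null input unchanged; the class name is hypothetical.

```java
// Sketch of end-trimming with an ellipsis that counts toward max.
class TrimSketch {
    static String trim(String original, int max) {
        if (max < 4) {
            throw new IllegalArgumentException("max must be at least 4");
        }
        if (original == null || original.length() <= max) {
            return original;
        }
        // Keep max - 3 characters so the result, ellipsis included, is exactly max long.
        return original.substring(0, max - 3) + "...";
    }

    public static void main(String[] args) {
        System.out.println(trim("abcdefghij", 6));
    }
}
```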
      • trimTrailingNulls

        public static java.lang.String trimTrailingNulls​(java.lang.String s)
      • trimMiddle

        public static java.lang.String trimMiddle​(java.lang.String s,
                                                  int max)
        Trims the given string to the max number of characters. Ellipses will be added to signal that content was removed. Thus, the actual number of removed characters will be (s.length() - max) plus the length of "...".

        If the string fits within the max, then the string will be returned.

        The given max value must be at least 5. This is to ensure that, at a minimum, we can display the "..." plus one character from the front and back of the string.

        Parameters:
        s - the string to trim
        max - the max number of characters to allow.
        Returns:
        the trimmed string
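        The arithmetic above can be checked with a sketch that keeps max - 3 characters, split between the front and the back of the string. How the real method divides an odd remainder, and whether it throws for max below 5, are not stated here; both choices below are assumptions and the class name is hypothetical.

```java
// Sketch of middle-trimming: keep both ends, replace the middle with "...".
class TrimMiddleSketch {
    static String trimMiddle(String s, int max) {
        if (max < 5) {
            throw new IllegalArgumentException("max must be at least 5"); // assumed behavior
        }
        if (s == null || s.length() <= max) {
            return s;
        }
        int keep = max - 3;          // characters that survive around the ellipsis
        int front = (keep + 1) / 2;  // assumption: the front gets the extra char
        int back = keep - front;
        return s.substring(0, front) + "..." + s.substring(s.length() - back);
    }

    public static void main(String[] args) {
        System.out.println(trimMiddle("abcdefghij", 7));
    }
}
```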
      • fixMultipleAsterisks

        public static java.lang.String fixMultipleAsterisks​(java.lang.String value)
        This method looks for all occurrences of successive asterisks (i.e., "**") and replaces them with a single asterisk, which is an equivalent usage in Ghidra. This is necessary due to some symbol names which cause the pattern matching process to become unusable. An example string that causes this problem is "s_CLSID\{ADB880A6-D8FF-11CF-9377-00AA003B7A11}\InprocServer3_01001400".
        Parameters:
        value - The string to be checked.
        Returns:
        The updated string.
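        Collapsing runs of asterisks can be expressed as a single regular-expression replacement. This is one possible way to get the described result, not necessarily how the method is written; the class name is hypothetical.

```java
// Sketch: collapse any run of two or more asterisks into one.
class AsteriskSketch {
    static String fixMultipleAsterisks(String value) {
        // The asterisk is escaped because it is a regex quantifier.
        return value.replaceAll("\\*{2,}", "*");
    }

    public static void main(String[] args) {
        System.out.println(fixMultipleAsterisks("a**b***c*d"));
    }
}
```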
      • isValidCLanguageChar

        public static boolean isValidCLanguageChar​(char c)
        Returns true if the character is OK to be contained inside C language string. That is, the string should not be tokenized on this char.
        Parameters:
        c - the char
        Returns:
        true if it is allowed in a C string
      • isAsciiChar

        public static boolean isAsciiChar​(char c)
        Returns true if the given character is within the ascii range.
        Parameters:
        c - the char to check
        Returns:
        true if the given character is within the ascii range.
      • isAsciiChar

        public static boolean isAsciiChar​(int codePoint)
        Returns true if the given code point is within the ascii range.
        Parameters:
        codePoint - the codePoint to check
        Returns:
        true if the given character is within the ascii range.
      • convertEscapeSequences

        public static java.lang.String convertEscapeSequences​(java.lang.String str)
        Replaces escaped characters in a string with the corresponding control characters. For example, a string containing a backslash character followed by an 'n' character would be replaced with a single line feed (0x0a) character. One use for this is to allow users to type strings in a text field and include control characters such as line feeds and tabs. The string that contains 'a','b','c', '\', 'n', 'd', '\', 'u', '0', '0', '0', '1', 'e' would become 'a','b','c',0x0a,'d', 0x01, 'e'.
        Parameters:
        str - The string to convert escape sequences to control characters.
        Returns:
        a new string with escape sequences converted to control characters.
        See Also:
        convertEscapeSequences(String string)
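        The documented example can be reproduced with a small scanner that recognizes a backslash followed by a known escape letter or by 'u' and four hex digits. The set of escapes handled here (n, t, r, backslash, and the four-digit Unicode form) is an assumption, since this page does not list the supported sequences; the class name is hypothetical.

```java
// Sketch of turning typed escape sequences into the characters they name.
class UnescapeSketch {
    static String convertEscapeSequences(String str) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < str.length()) {
            char c = str.charAt(i);
            if (c != '\\' || i + 1 >= str.length()) {
                out.append(c);
                i++;
                continue;
            }
            char next = str.charAt(i + 1);
            if (next == 'n') { out.append('\n'); i += 2; }
            else if (next == 't') { out.append('\t'); i += 2; }
            else if (next == 'r') { out.append('\r'); i += 2; }
            else if (next == '\\') { out.append('\\'); i += 2; }
            else if (next == 'u' && i + 6 <= str.length() && isHex(str, i + 2, i + 6)) {
                // Backslash, 'u', then exactly four hex digits.
                out.append((char) Integer.parseInt(str.substring(i + 2, i + 6), 16));
                i += 6;
            }
            else { out.append(c); i++; } // unknown escape: keep the backslash as-is
        }
        return out.toString();
    }

    static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(convertEscapeSequences("abc\\nd").length());
    }
}
```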
      • convertControlCharsToEscapeSequences

        public static java.lang.String convertControlCharsToEscapeSequences​(java.lang.String str)
        Replaces known control characters in a string with the corresponding escape sequences. For example, a string containing a line feed character would be converted to a backslash character followed by an 'n' character. One use for this is to display strings in a manner to easily see the embedded control characters. The string that contains 'a','b','c',0x0a,'d', 0x01, 'e' would become 'a','b','c', '\', 'n', 'd', 0x01, 'e'.
        Parameters:
        str - The string to convert control characters to escape sequences
        Returns:
        a new string with all the control characters converted to escape sequences.
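        The reverse direction can be sketched with a fixed mapping for a few control characters; anything outside the mapping, such as 0x01 in the example above, passes through unchanged. The exact set of "known" control characters is not listed on this page, so the three mappings below are an assumption, and the class name is hypothetical.

```java
// Sketch of making selected control characters visible as escape sequences.
class EscapeSketch {
    static String convertControlCharsToEscapeSequences(String str) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            switch (c) {
                case '\n': out.append("\\n"); break;
                case '\t': out.append("\\t"); break;
                case '\r': out.append("\\r"); break;
                default: out.append(c); // unmapped chars are copied through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convertControlCharsToEscapeSequences("line1\nline2"));
    }
}
```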
      • convertCodePointToEscapeSequence

        public static java.lang.String convertCodePointToEscapeSequence​(int codePoint)
        Maps known control characters to corresponding escape sequences. For example a line feed character would be converted to backslash '\\' character followed by an 'n' character. One use for this is to display strings in a manner to easily see the embedded control characters.
        Parameters:
        codePoint - The character to convert to escape sequence string
        Returns:
        a new string equivalent to the escape sequence, or the original character (as a string) if it is not in the control character mapping.
      • isUnicodeReplacementCodePoint

        public static boolean isUnicodeReplacementCodePoint​(int codePoint)
        Returns true if the specified code point is the 'replacement' code point 0xFFFD, which is used when decoding bytes into unicode chars and there was a bad or invalid sequence that does not have a mapping. (ie. decoding byte char 0x80 as US-ASCII)
        Parameters:
        codePoint - to test
        Returns:
        boolean true if the char is 0xFFFD (ie. UNICODE REPLACEMENT char)
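        The 0x80-as-US-ASCII case mentioned above can be observed directly with the JDK decoder. The sketch below decodes a single invalid byte and tests the resulting code point; the class and helper names are hypothetical and this is not the Ghidra implementation.

```java
import java.nio.charset.StandardCharsets;

// Sketch: an undecodable byte becomes the Unicode replacement code point.
class ReplacementSketch {
    static boolean isUnicodeReplacementCodePoint(int codePoint) {
        return codePoint == 0xFFFD;
    }

    // Hypothetical helper: decode bytes as US-ASCII and return the first code point.
    static int firstCodePoint(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII).codePointAt(0);
    }

    public static void main(String[] args) {
        int cp = firstCodePoint(new byte[] { (byte) 0x80 });
        System.out.println(isUnicodeReplacementCodePoint(cp));
    }
}
```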