String : Java Glossary


String
Strings are quite different from C++. They are immutable, i.e. you can’t change the characters in a String. To look at individual characters, you need to use charAt(). Strings in Java are sequences of 16-bit Unicode chars (UTF-16 code units). To edit strings, you need to use a StringBuffer object or a char[]. In Java version 1.5 or later you can use StringBuilder, which works exactly like StringBuffer but is faster because it is not thread-safe.
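A minimal sketch of both editing routes. The class name EditDemo and its two methods are illustrative only. In each case the String's chars are copied into something mutable, edited, and then turned back into a new String. The original String is never changed.

```java
/** Two ways to "edit" an immutable String: via StringBuilder and via char[]. */
class EditDemo {
    /** Capitalises the first char and appends '!', working on a mutable copy. */
    static String shout(String s) {
        if (s.isEmpty()) return "!";
        StringBuilder sb = new StringBuilder(s);                // mutable copy of the chars
        sb.setCharAt(0, Character.toUpperCase(sb.charAt(0)));   // change one char in place
        sb.append('!');                                         // grow at the end
        return sb.toString();                                   // freeze into a new String
    }

    /** Swaps the first and last chars using a char[]. */
    static String swapEnds(String s) {
        char[] chars = s.toCharArray();                         // toCharArray returns a copy
        if (chars.length > 1) {
            char first = chars[0];
            chars[0] = chars[chars.length - 1];
            chars[chars.length - 1] = first;
        }
        return new String(chars);                               // new String from the edited array
    }

    public static void main(String[] args) {
        System.out.println(shout("hello"));    // Hello!
        System.out.println(swapEnds("java"));  // aavj
    }
}
```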

You get the size of a String (its length in chars) with String.length(). Arrays use .length instead, and collections use .size().

For manipulating 8-bit characters, you want an array of bytes — byte[].
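Moving between a String and 8-bit bytes means choosing a character encoding. The sketch below assumes ISO-8859-1 (Latin-1), which maps each char from U+0000 to U+00FF to exactly one byte and encodes anything outside that range as '?'. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

/** Round-trips a String through an 8-bit byte[] using an explicit charset. */
class ByteDemo {
    /** One byte per char; chars outside Latin-1 become '?'. */
    static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Decodes 8-bit bytes back into a String. */
    static String fromLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] b = toLatin1("caf\u00e9");          // "café"
        b[0] = (byte) 'C';                          // unlike a String, a byte[] can be edited in place
        System.out.println(b.length);               // 4: one byte per char
        System.out.println(fromLatin1(b).length()); // 4
    }
}
```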

Empty Strings
Comparison
Searching
Case-Sensitivity
Creating Strings
Literals
toString
Replace
Trimming
Validating
Regex
Gotchas
Futures
Learning More
Links

Empty Strings

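There are three kinds of empty String: null, "" (zero length) and " " (blank, containing only whitespace). Test for null with s == null, for zero length with s.isEmpty() (equivalent to s.length() == 0), and for blank with s.trim().isEmpty(). Always check for null first, because calling any method on a null reference throws a NullPointerException.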
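Here is a minimal sketch of the three checks. The class and method names are illustrative and do not come from any library. isNullOrBlank uses trim(), which strips chars up to U+0020. On Java 11 or later, String.isBlank() is an alternative that uses Character.isWhitespace instead.

```java
/** Distinguishes the three flavours of "empty" String. */
class EmptyChecks {
    static boolean isNull(String s) {
        return s == null;
    }

    /** True only for ""; throws NullPointerException if s is null. */
    static boolean isZeroLength(String s) {
        return s.isEmpty();                      // same as s.length() == 0
    }

    /** True for null, "" or a String of only spaces/control chars (what trim() removes). */
    static boolean isNullOrBlank(String s) {
        return s == null || s.trim().isEmpty();  // null tested first, so trim() is safe
    }

    public static void main(String[] args) {
        String[] samples = { null, "", "   ", "abc" };
        for (String s : samples) {
            System.out.println(isNull(s) + " " + (s != null && isZeroLength(s)) + " " + isNullOrBlank(s));
        }
    }
}
```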

String Comparison

The form
if ( "abc".equals( s ) ) System.out.println( "matched" );
is safer than
if ( s.equals( "abc" ) ) System.out.println( "matched" );
because the first form won’t throw a NullPointerException if s is null. It simply treats the strings as not equal.
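This sketch shows the difference when s is null. It also shows java.util.Objects.equals (Java 7 and later), which is null-safe even when both sides are variables. The class and method names are illustrative.

```java
import java.util.Objects;

class NullSafeEquals {
    static boolean literalFirst(String s) {
        return "abc".equals(s);          // false when s is null; no exception
    }

    static boolean variableFirst(String s) {
        return s.equals("abc");          // NullPointerException when s is null
    }

    public static void main(String[] args) {
        String s = null;
        System.out.println(literalFirst(s));             // false
        System.out.println(Objects.equals(s, "abc"));    // false: null-safe on either side
        try {
            variableFirst(s);
        } catch (NullPointerException e) {
            System.out.println("s.equals threw NullPointerException");
        }
    }
}
```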

Unless Strings have been interned, with String.intern(), you cannot compare them for equality with ==. You have to use equals() instead.
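A small sketch contrasting == with equals. The class and method names are illustrative. String literals are interned by the compiler, so two identical literals are the same object. A String built at run time is a different object until you intern it.

```java
class EqualityDemo {
    /** Reference comparison: true only if both variables point at the same object. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    /** Content comparison: true if both hold the same chars (a must not be null). */
    static boolean sameChars(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "abc";
        String copy = new String("abc");                                // forces a distinct object
        String built = new StringBuilder("ab").append('c').toString(); // built at run time

        System.out.println(sameObject(literal, "abc"));          // true: literals are interned
        System.out.println(sameObject(literal, copy));           // false: different objects
        System.out.println(sameChars(literal, copy));            // true: same characters
        System.out.println(sameObject(literal, built.intern())); // true: intern() returns the master copy
    }
}
```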

The compiler will not warn you if you inadvertently use ==. Unfortunately, the bug may take a long time to surface. The compiler automatically interns String literals and compile-time constants, so == happens to work as long as you test with literals. Interning gets you a reference to the master copy of a String, which allows the duplicates to be garbage collected sooner. However, there are three disadvantages to interning:

  1. It takes extra time to look up the master string in a hash table.
  2. In some implementations, you can have a maximum of 64K interned Strings.
  3. In some implementations, interned Strings are never garbage collected, even when they are no longer used. The interning process itself acts as a packrat. The answer is to implement the table with weak references.

String comparison does not trim leading and trailing whitespace before comparing. If you want that effect, use .trim() on both Strings first.
If you want to compare for < or >, you cannot use the usual comparison operators. You have to use compareTo() or compareToIgnoreCase() instead.
String s = "apple";
String t = "orange";
if ( s.compareTo(t) < 0 )
   {
   System.out.println( "s < t" );
   }
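compareTo returns a negative number if s sorts before t, zero if they contain the same characters, and a positive number if s sorts after t. You can think of it roughly as treating the Strings as numbers and returning s - t.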
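This sketch turns only the sign of compareTo into a readable description. The helper name order is illustrative. When the Strings differ at some index, String.compareTo returns the difference of the chars at the first differing index, which is where the s - t analogy comes from.

```java
class CompareDemo {
    /** Describes how s orders relative to t, using only the sign of compareTo. */
    static String order(String s, String t) {
        int c = s.compareTo(t);
        if (c < 0) return "s < t";
        if (c > 0) return "s > t";
        return "s equals t";    // same characters, though not necessarily the same object
    }

    public static void main(String[] args) {
        System.out.println(order("apple", "orange"));     // s < t
        System.out.println(order("orange", "apple"));     // s > t
        System.out.println(order("apple", "apple"));      // s equals t
        System.out.println("apple".compareTo("orange"));  // -14, which is 'a' - 'o' (97 - 111)
    }
}
```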

Novices might be astonished by the following results:
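The following results of natural String ordering tend to surprise newcomers. compareTo orders by UTF-16 char value, not dictionary order. As a result, digits sort before upper case, upper case sorts before lower case, "10" sorts before "9", and a prefix sorts before any longer String that starts with it. The naturalSort helper below is hypothetical.

```java
import java.util.Arrays;

class SortSurprises {
    /** Returns a sorted copy in natural String order, i.e. the order compareTo defines. */
    static String[] naturalSort(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] sorted = naturalSort("banana", "Cherry", "apple", "10", "9", "app");
        System.out.println(Arrays.toString(sorted)); // [10, 9, Cherry, app, apple, banana]
    }
}
```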

When you write your own classes, the default Object.equals does not do a field by field comparison. You have to write your own version of equals to get that effect. The default version simply tests the equality of the two references — that they both point to the same object.
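A minimal sketch of a field-by-field equals for a hypothetical value class. hashCode is overridden as well, because objects that are equal must return equal hashCodes or hash-based collections will misbehave.

```java
import java.util.Objects;

/** Hypothetical value class: equals compares fields, not references. */
final class LabeledPoint {
    private final int x;
    private final String label;   // may be null

    LabeledPoint(int x, String label) {
        this.x = x;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;                      // same reference: trivially equal
        if (!(o instanceof LabeledPoint)) return false;  // wrong type, or null
        LabeledPoint other = (LabeledPoint) o;
        return x == other.x && Objects.equals(label, other.label); // field by field, null-safe
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, label);                   // consistent with equals
    }

    public static void main(String[] args) {
        LabeledPoint a = new LabeledPoint(1, "a");
        LabeledPoint b = new LabeledPoint(1, "a");
        System.out.println(a == b);       // false: two distinct objects
        System.out.println(a.equals(b));  // true: same field values
    }
}
```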

Case-Sensitive and Case-Insensitive Comparison
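equals and compareTo are case-sensitive. equalsIgnoreCase and compareToIgnoreCase are not. To sort without regard to case, you can pass the String.CASE_INSENSITIVE_ORDER comparator. This is a small sketch, and the helper name is illustrative.

```java
import java.util.Arrays;

class CaseDemo {
    /** Sorts a copy alphabetically, ignoring case. */
    static String[] caseInsensitiveSort(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println("Hello".equals("HELLO"));               // false
        System.out.println("Hello".equalsIgnoreCase("HELLO"));     // true
        System.out.println("Apple".compareToIgnoreCase("apple"));  // 0
        System.out.println(Arrays.toString(caseInsensitiveSort("banana", "Cherry", "apple")));
        // [apple, banana, Cherry]
    }
}
```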

Searching

Your basic tools are indexOf and lastIndexOf. Both have variants that take a fromIndex telling them where to start searching. The result is relative to the start of the entire String, not to fromIndex. The common18 package contains a StringSearch class that will search for many different strings. These searches are all case-sensitive. To get case-insensitive searches, convert both Strings to all upper case or all lower case first.

There are variants of the methods that search for a single char. These are faster than the equivalent methods that look for a 1-character String. It would be nice if the compiler were smart enough to optimise a 1-character String constant to a char as the parameter of indexOf. You can abbreviate x.indexOf( y ) >= 0 as x.contains( y ).
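A sketch that uses the fromIndex variant to find every occurrence, plus a case-insensitive contains. Both helpers are hypothetical. Lower-casing with Locale.ROOT avoids locale surprises such as the Turkish dotless i. An empty target is rejected, because indexOf("", n) keeps returning a hit and the loop would never end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SearchDemo {
    /** Every index at which target occurs in s, overlapping matches included. */
    static List<Integer> allIndexes(String s, String target) {
        if (target.isEmpty()) throw new IllegalArgumentException("target must not be empty");
        List<Integer> hits = new ArrayList<>();
        int i = s.indexOf(target);
        while (i >= 0) {
            hits.add(i);                    // relative to the start of s, not to fromIndex
            i = s.indexOf(target, i + 1);   // resume just past the previous hit
        }
        return hits;
    }

    /** Case-insensitive containment by lower-casing both sides. */
    static boolean containsIgnoreCase(String s, String target) {
        return s.toLowerCase(Locale.ROOT).contains(target.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(allIndexes("banana", "an"));             // [1, 3]
        System.out.println("banana".indexOf('n', 3));               // 4: char variant
        System.out.println("banana".lastIndexOf("a"));              // 5
        System.out.println(containsIgnoreCase("BaNaNa", "NANA"));   // true
    }
}
```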

Creating Strings

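Strings are immutable. Therefore they can be reused indefinitely and shared for many purposes. When you assign one String variable to another, no copy is made. Both variables refer to the same String object. Taking a substring does create a new String object. In JVMs before Java 7 update 6, that new String shared the original’s char[], so no characters were copied. In later versions, substring copies the characters it needs. New Strings are created when: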
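To see sharing versus copying in action, the following small sketch compares references with == (deliberately, here) and contents with equals. The class name is illustrative.

```java
class SharingDemo {
    public static void main(String[] args) {
        String s = "immutable";
        String t = s;                     // assignment copies the reference, not the characters
        String u = new String(s);         // explicitly makes a second String object
        String sub = s.substring(2);      // "mutable": a separate String object

        System.out.println(t == s);       // true: same object
        System.out.println(u == s);       // false: different objects...
        System.out.println(u.equals(s));  // true: ...with the same characters
        System.out.println(sub);          // mutable
    }
}
```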

toString

Every Object has a method called toString that makes some sort of attempt to convert the contents of the Object into human-readable form as a Unicode String for display. Normally, when you write a new class, you write your own corresponding toString method for it, even if just for debugging.

You use it like this: String toShow = myThing.toString();

The default Object.toString method is not very clever. It does not display the fields of your class with their names, as you might expect. If you want that, you must code it yourself. Instead, the default does something lame: it returns the class name followed by @ and the hashCode in hex, e.g. java.lang.Object@1b6d3586, which is only mildly interesting.
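The sketch below puts a hand-written toString for a hypothetical Temperature class beside the default one. The exact hex digits of the default output vary from run to run.

```java
/** Hypothetical class with a hand-written toString. */
class Temperature {
    private final String place;
    private final double celsius;

    Temperature(String place, double celsius) {
        this.place = place;
        this.celsius = celsius;
    }

    @Override
    public String toString() {
        // name each field so the output is useful in logs and debuggers
        return "Temperature{place=" + place + ", celsius=" + celsius + "}";
    }

    public static void main(String[] args) {
        System.out.println(new Object());                       // e.g. java.lang.Object@1b6d3586
        System.out.println(new Temperature("Victoria", 12.5));  // Temperature{place=Victoria, celsius=12.5}
    }
}
```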

toString has a magical property. It appears to get invoked automatically to convert to String without you having to mention toString.

  1. In one case, System.out.println (and brothers), it is not really magic. println pulls it off with a plethora of overloaded methods, one for each primitive type. Each converts its primitive parameter to a String and passes that on to the variant of println that handles Strings. But, you say (glad to see you are so attentive), primitives don’t have a toString method! That is true, but there are static conversion methods to get that effect, such as String.valueOf( double ). For any Object other than a String, println( Object ) calls String.valueOf( Object ), which yields "null" for a null reference and otherwise invokes the Object’s (usually overridden) toString method. The result is then printed.
  2. When you use concatenation, toString truly does get called for you magically. If either operand of + is a String, Java treats the + as concatenation. It converts the other operand to a String, calling its toString method if it is an Object (or using "null" for a null reference), and concatenates the results. Adding two Objects when neither is a String is a compile-time error. It even works when you add a String and a primitive: the primitive is converted to a String for you, transparently. Because + is evaluated left to right, this can lead to surprising results.
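The following sketch illustrates left-to-right evaluation of +. Until a String operand has been seen, + does arithmetic. After that, it concatenates. The class and method names are illustrative.

```java
class ConcatDemo {
    static String addThenConcat(int a, int b) {
        return a + b + "x";            // (a + b) is int addition, then the sum is concatenated
    }

    static String concatThenConcat(int a, int b) {
        return "x" + a + b;            // "x" + a is already a String, so b is concatenated too
    }

    public static void main(String[] args) {
        System.out.println(addThenConcat(1, 2));     // 3x
        System.out.println(concatThenConcat(1, 2));  // x12
        System.out.println("x" + (1 + 2));           // x3: parentheses force the addition first
        Object nothing = null;
        System.out.println("value=" + nothing);      // value=null, no NullPointerException
        System.out.println('a' + 'b');               // 195: no String operand, so chars add as ints
    }
}
```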

Replace

String.replace( char target, char replacement ) is considerably faster than String.replace( String target, String replacement ). Both replace all occurrences, so if you are replacing just a char, use single quotes. Unfortunately, sitting next to replaceAll, the name replace suggests it changes only one occurrence, yet String.replace( String target, String replacement ) replaces all matching substrings. replaceFirst replaces only the first one, but it wants a String-encoded regex Pattern. Whoever named these methods was a nitwit.

replaceAll( String regex, String replacement ) also replaces all instances. The difference is that replaceAll looks for a regex, not a simple String. Beware of using replaceAll( String regex, String replacement ) when you meant replace( String target, String replacement ). The second parameter is not just a simple String either: String.replaceAll behaves like Matcher.replaceAll. In the replacement, $ introduces a reference to a captured group in the search pattern and \ is the quote character. So in Java source, a literal \ must be coded as "\\\\" and a literal $ as "\\$".
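A sketch of the differences. In the first argument, . is a regex wildcard. In the replacement, $n refers to a captured group. java.util.regex.Matcher.quoteReplacement escapes $ and \ for you.

```java
import java.util.regex.Matcher;

class ReplaceDemo {
    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(s.replace(".", "-"));       // a-b-c: literal replacement of every "."
        System.out.println(s.replaceAll(".", "-"));    // -----: regex "." matches every char
        System.out.println(s.replaceAll("\\.", "-"));  // a-b-c: regex dot escaped

        // $2 and $1 refer to the second and first captured groups
        System.out.println("John Smith".replaceAll("(\\w+) (\\w+)", "$2, $1")); // Smith, John

        // A literal $ in the replacement must be escaped, or quoted for you
        System.out.println("cost: X".replaceAll("X", "\\$5"));                          // cost: $5
        System.out.println("cost: X".replaceAll("X", Matcher.quoteReplacement("$5")));  // cost: $5
    }
}
```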

replaceFirst( String regex, String replacement ) also takes a regex. There is no replaceFirst that takes only a simple String.
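If you want to replace only the first literal occurrence, here are two workarounds, sketched as hypothetical helpers. One quotes both arguments with Pattern.quote and Matcher.quoteReplacement so replaceFirst treats them literally. The other avoids regex entirely by splicing around indexOf.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplaceFirst {
    /** Regex route: quote both arguments so neither is interpreted specially. */
    static String viaRegex(String s, String target, String replacement) {
        return s.replaceFirst(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    /** Plain route: find the first occurrence with indexOf and splice around it. */
    static String viaIndexOf(String s, String target, String replacement) {
        int i = s.indexOf(target);
        if (i < 0) return s;                                    // nothing to replace
        return s.substring(0, i) + replacement + s.substring(i + target.length());
    }

    public static void main(String[] args) {
        System.out.println(viaRegex("a.b.c", ".", "$"));    // a$b.c
        System.out.println(viaIndexOf("a.b.c", ".", "$"));  // a$b.c
    }
}
```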

String.replace in the Javadoc is shown with CharSequence parameters. Don’t let this frighten you. String implements CharSequence, so replace works fine with String arguments. It also accepts other CharSequence implementations, such as a StringBuilder, as the target or replacement.

None of the String methods ever modify the String object. They create a new one that you have to save:

e.g.
// String gotcha
   String s = "apple";
   s.replace( 'a', 'b' );      // does nothing
   s = s.replace( 'a', 'b' );  // replaces all a's with b's

   String x = "  apple  ";
   x.trim();     // does nothing
   x = x.trim(); // chops blanks off head and tail

Validating

You can use com.mindprod.common18.ST.isLegal to ensure a String contains only the characters you consider legal. You can download it. It is pretty simple, using indexOf on the String of legal characters.
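Here is a sketch of the same indexOf idea. This is not the actual common18 code: the class name and signature are assumptions made for illustration.

```java
/** Hypothetical helper in the spirit of ST.isLegal, not the real common18 implementation. */
class LegalChars {
    /** True if every char of candidate appears somewhere in legalChars. */
    static boolean isLegal(String candidate, String legalChars) {
        for (int i = 0; i < candidate.length(); i++) {
            if (legalChars.indexOf(candidate.charAt(i)) < 0) {
                return false;          // found a char that is not in the legal set
            }
        }
        return true;                   // an empty candidate is trivially legal
    }

    public static void main(String[] args) {
        System.out.println(isLegal("2024", "0123456789"));  // true
        System.out.println(isLegal("20x4", "0123456789"));  // false
    }
}
```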

You can also use charAt to extract the characters one by one, then categorise them with the Character methods such as isDigit.
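This is a sketch of the charAt approach, and the class and method names are illustrative. Note that Character.isDigit and Character.isLetter accept any Unicode digit or letter, not just ASCII. For example, Arabic-Indic digits count as digits.

```java
class CharCategories {
    /** Counts digits, letters and whitespace, returned as { digits, letters, spaces }. */
    static int[] categorise(String s) {
        int digits = 0, letters = 0, spaces = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isDigit(c)) digits++;
            else if (Character.isLetter(c)) letters++;
            else if (Character.isWhitespace(c)) spaces++;
        }
        return new int[] { digits, letters, spaces };
    }

    /** True if s is non-empty and every char is a digit. */
    static boolean isAllDigits(String s) {
        if (s.isEmpty()) return false;
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(categorise("ab 12")));  // [2, 2, 1]
        System.out.println(isAllDigits("2024"));                             // true
    }
}
```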

Regex

String borrows some convenience regex methods, such as split, matches, replaceAll and replaceFirst. Normally you would use the more efficient java.util.regex methods where you precompile your Pattern and reuse it. The String versions are for one-shot use where efficiency is not a concern.

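In JDK versions up to Java 8, not only replaceAll but also replace( CharSequence, CharSequence ) was implemented inefficiently, compiling a regex Pattern every time it was invoked. Java 9 reimplemented replace without regex.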

So, if you are going to use replace or replaceAll many times, compile the Pattern once, separately, and reuse it.
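This sketch compiles the Pattern once, when the class loads, and reuses it for every call. Pattern objects are immutable and safe to share between threads. Matcher objects are not, so each call creates its own. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class PrecompiledDemo {
    // Compiled once; reused by every call to collapseSpaces
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Collapses each run of whitespace to a single space. */
    static String collapseSpaces(String s) {
        return SPACES.matcher(s).replaceAll(" ");   // fresh Matcher per call, shared Pattern
    }

    public static void main(String[] args) {
        String[] lines = { "a   b", "c \t d", "e" };
        for (String line : lines) {
            System.out.println(collapseSpaces(line));  // a b, then c d, then e
        }
    }
}
```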

Gotchas

Futures

I have three ideas to improve the efficiency of the way String is implemented:

  1. When a String is created the JVM (Java Virtual Machine) copies a char[] into a new char[] attached to the String. It does not simply put a reference to the char[] into the String because it is worried somebody might subsequently change the contents of the char[], thus violating the immutability contract of String. However, almost never does anyone change a char[] after feeding it to new String. I think the JVM could take advantage of that. If it knew for sure no one would change it, it could safely and rapidly just insert a reference. If it was not sure, it might just insert the reference anyway and put a lock on it, so that if anyone ever did try to change it, they would be blocked, the char[] could be copied and the write completed to the old char[] and the String could attach itself to the new copy of the char[].
  2. Interned Strings are dangerous. Both interned and non-interned Strings have the same type, namely String. You must manually manage the intern method to control precisely when the interning is done, and you must choose correctly between == and equals. If you get it wrong, there are no error messages, just puzzling results. So I suggest a separate Interned type for interned Strings. You would then use == for comparing both interned and non-interned Strings. They would be automatically interned as appropriate (managed in much the way hashCode is).
  3. There are two kinds of String:
    1. sequential: Strings you either treat as an atomic whole or process char by char starting from the beginning.
    2. random: Strings you operate on randomly with charAt or substring.
    I suggest sequential Strings be stored internally in UTF-8 to conserve RAM (Random Access Memory). I suggest random Strings be stored as 32-bit UTF-32 code points for ease of processing. The compiler would have to guess which type a given String should be. It might have to change its mind and convert the String part way through run time. It might even store some Strings in both forms. Readers and writers using UTF-8 could take advantage of the fact that no translation is necessary. You could have two separate types in the Java language, UString for sequential and String for random, but that would be too onerous for programmers.

Learning More

Oracle’s Javadoc on the String class
Oracle’s Javadoc on the StringBuffer class
Oracle’s Javadoc on the StringBuilder class
Oracle’s Javadoc on the StringJoiner class

This page is posted on the web at:

http://mindprod.com/jgloss/string.html

Canadian Mind Products
Contact Roedy. Please feel free to link to this page without explicit permission.
