Strings in Java are quite different from C++. They are
immutable, i.e. you can’t change the characters in a
String. To look at individual characters, you need to use charAt(). Strings in Java are 16-bit
Unicode. To edit strings, you need to use a StringBuffer
object or a char[]. In Java version 1.5 or later
you use StringBuilder, which works exactly like
StringBuffer, but it is faster and not thread-safe.
You get the size of a String (length in chars) with
String.length(), not the .length used for arrays or the .size() used for collections.
For manipulating 8-bit characters, you want an array of
bytes: byte[].
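A minimal sketch of these basics: reading with charAt, measuring with length(), and editing through a StringBuilder. The class and method names are made up for illustration.

```java
class StringBasics {
    /** Returns a copy of s with the char at index replaced; s itself is never modified. */
    static String withCharAt(String s, int index, char c) {
        StringBuilder sb = new StringBuilder(s); // mutable copy of the characters
        sb.setCharAt(index, c);
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "cat";
        System.out.println(s.charAt(0));            // first character
        System.out.println(s.length());             // length in chars
        System.out.println(withCharAt(s, 0, 'b'));  // edited copy
        System.out.println(s);                      // the original is unchanged
    }
}
```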
There are three types of empty string: null,
"" and " ". Each needs its own check.
Writing
if ( "abc".equals( s ) ) echo ( "matched" );
is better than
if ( s.equals( "abc" ) ) echo ( "matched" );
because the first form won’t raise an exception if s
is null. It will treat the strings as not equal.
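Here is one way the three flavours of empty string, and the null-safe comparison, might be coded. This is only a sketch. The class and method names are hypothetical, and "blank" here means whatever trim() strips, i.e. characters up to and including the space.

```java
class EmptyAndNullChecks {
    /** Classifies s as "null", "empty" (""), "blank" (only whitespace) or "text". */
    static String classify(String s) {
        if (s == null) return "null";
        if (s.isEmpty()) return "empty";          // length() == 0
        if (s.trim().isEmpty()) return "blank";   // nothing left after trimming
        return "text";
    }

    /** Null-safe: the literal is the receiver, so a null s simply fails to match. */
    static boolean matchesAbc(String s) {
        return "abc".equals(s);
    }

    public static void main(String[] args) {
        System.out.println(classify(null));
        System.out.println(classify(""));
        System.out.println(classify(" "));
        System.out.println(matchesAbc(null));
        String s = null;
        try {
            s.equals("abc");                      // receiver is null
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from s.equals");
        }
    }
}
```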
Unless Strings have been interned, with String.intern(), you cannot compare them for equality with
==. You have to use equals() instead.
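A small demonstration of the difference. new String is used here only to force a distinct object. The helper name sameObject is invented for the example.

```java
class InternDemo {
    /** True only when both references point to the very same object. */
    static boolean sameObject(String x, String y) {
        return x == y;
    }

    public static void main(String[] args) {
        String a = "abc";                 // literals are interned automatically
        String b = new String("abc");     // deliberately a separate object
        System.out.println(sameObject(a, b));           // false: different objects
        System.out.println(a.equals(b));                // true: same characters
        System.out.println(sameObject(a, b.intern()));  // true: intern returns the master copy
    }
}
```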
The compiler will not warn you if you inadvertently use ==. Unfortunately, the bug may take a long time to surface if your
compiler or virtual machine is doing transparent interning. Interning gets you a
reference to the master copy of a String. This allows the
duplicates to be garbage collected sooner. However, there are three disadvantages to
interning:
- It takes extra time to look up the master string in a Hashtable.
- In some implementations, you can have a maximum of 64K interned Strings.
- In some implementations, interned Strings are never
garbage collected, even when they are no longer used. The interning process itself
acts as a packratter. The answer is to implement them with weak references.
If you want to compare for < or >, you cannot use the usual comparison
operators. You have to use compareTo() or compareToIgnoreCase() instead.
String comparison does not logically trim leading and trailing whitespace
before comparing. If you want that effect, use .trim() first.
String s = "apple";
String t = "orange";
if ( s.compareTo( t ) < 0 )
    System.out.println( "s < t" );
compareTo will return:
- some positive number if string s lexically comes after t.
- 0 if s is the same as t.
- some negative number if s sorts earlier than t.
You can think of it roughly like treating the Strings as numbers and returning s - t.
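Only the sign of the result is meaningful, so code should test against 0 rather than a particular value. A sketch, with a hypothetical helper named relation:

```java
class CompareDemo {
    /** Describes where s sorts relative to t, using only the sign of compareTo. */
    static String relation(String s, String t) {
        int c = s.compareTo(t);
        if (c < 0) return "before";
        if (c > 0) return "after";
        return "same";
    }

    public static void main(String[] args) {
        System.out.println(relation("apple", "orange"));
        System.out.println(relation("orange", "apple"));
        System.out.println(relation("apple", "apple"));
    }
}
```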
Novices might be astonished by the following results:
- "abc".compareTo( "ABC" )
returns a positive number, i.e. "abc" > "ABC". compareTo is case-sensitive.
- "abc ".compareTo( "abc" )
returns a positive number, i.e. "abc " > "abc". Blanks are treated
like any other character.
- "".compareTo( null ) raises a java.lang.NullPointerException.
- "" is not the same thing as null. Most
String functions will be happy to handle "",
but very few will accept null.
- The comparison is done by straightforward Unicode numeric character by
character comparison. There is no adjustment for locale collating sequence.
When you write your own classes, the default Object.equals does not do a field by field
comparison. You have to write your own version of equals
to get that effect. The default version simply tests the equality of the two
references, i.e. that they both point to the same object.
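The surprises in the list above can be checked with a few lines. This is just a sketch and the class name is arbitrary.

```java
class CompareSurprises {
    /** Reports whether comparing against null blows up. */
    static boolean throwsOnNull() {
        try {
            "".compareTo(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("abc".compareTo("ABC") > 0);            // lower case sorts after upper case
        System.out.println("abc".compareToIgnoreCase("ABC") == 0); // equal once case is ignored
        System.out.println("abc ".compareTo("abc") > 0);           // the trailing blank counts
        System.out.println("abc ".trim().compareTo("abc") == 0);   // equal after trim
        System.out.println(throwsOnNull());
    }
}
```

If you need locale-aware ordering, java.text.Collator is the standard class for that job.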
Case-Sensitive and Case-Insensitive Comparison
Your basic tools are indexOf and lastIndexOf. They both have
variants with a base fromOffset where to start searching.
The result is relative to the start of the entire String, not the fromOffset. The common15 package
contains a StringSearch class that will search for many
different strings. These searches are all case-sensitive. To get case-insensitive
searches, convert both Strings to all upper case or all lower case first. You must
There are variants of the methods that search for a single character. These are
faster than the equivalent methods that look for a 1-character String. It would be
nice if the compiler were smart enough to optimise a 1-character String constant to a char as the
parameter of indexOf. You can abbreviate x.indexOf( y )
>= 0 as x.contains( y ).
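A sketch of the search methods in action. indexOfIgnoreCase is a hypothetical helper, not a JDK method. It reports an index into the lower-cased copy, which for a few unusual characters may differ in length from the original.

```java
import java.util.Locale;

class SearchDemo {
    /** Case-insensitive search by lower-casing both Strings first. */
    static int indexOfIgnoreCase(String haystack, String needle) {
        return haystack.toLowerCase(Locale.ROOT)
                       .indexOf(needle.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String s = "banana";
        System.out.println(s.indexOf("an"));       // first match
        System.out.println(s.indexOf("an", 2));    // start looking at offset 2
        System.out.println(s.lastIndexOf("an"));   // last match
        System.out.println(s.indexOf('n'));        // char variant
        System.out.println(s.contains("nan"));     // same as indexOf( "nan" ) >= 0
        System.out.println(indexOfIgnoreCase("Banana", "NAN"));
    }
}
```

Note that the result of indexOf( "an", 2 ) is still counted from the start of the whole String.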
Strings are immutable. Therefore they can be
reused indefinitely, and they can be shared for many purposes. When you assign one
String variable to another, no copy is made. In older JVMs, even when you take a substring
no new copy of the characters is made, though a new String descriptor is. Since Java 7 update 6,
substring copies the characters it needs. New Strings are created when:
- you concatenate.
- you read Strings from files.
- you foolishly use
new String( String ). There
is one situation where its use is legit. See substring for the explanation.
- you use new String( somethingElse ) ; for
- you use StringBuffer/StringBuilder toString/substring.
Every Object has a method
called toString that makes some sort of attempt to
convert the contents of the Object into human-readable
form as a Unicode String for display. Normally, when you
write a new class, you write your own corresponding toString method for it, even if just for debugging.
You use it like this: String toShow = myThing.toString();
The default Object.toString
method is not very clever. It does not display all the primitives in
your class with field names as you might expect. If you want that, you must code it
yourself. The default toString will instead do
something lame like dump the class name followed by @ and the hashCode in hex, which is only mildly interesting.
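A minimal sketch of a hand-written toString. Point is a made-up class for illustration.

```java
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    /** Shows the fields by name, which the default toString will not do. */
    @Override
    public String toString() {
        return "Point[x=" + x + ", y=" + y + "]";
    }
}

class ToStringDemo {
    public static void main(String[] args) {
        System.out.println(new Point(3, 4).toString());
        // The default version prints the class name, @ and a hex hashCode.
        System.out.println(new Object().toString());
    }
}
```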
toString has a magical property. It appears to get
invoked automatically to convert to String without you
having to mention toString.
- In one case, System.out.println (and brothers), it is not
really magic. println pulls it off with a plethora of
overloaded methods, one for each of the
primitive types. Each
overloaded method converts its primitive parameter to a String for you, and passes that on to the variant of println that can only handle Strings.
But, you say, (glad to see you are so attentive), primitives don’t have a
toString method! That is true, but there are
static conversion methods to get that effect, such as String.valueOf( double ). For any Object other than a
String, println invokes
the Object’s usually-overridden custom toString
method and passes the result on to the String-eating
version of println.
- When you use concatenation, toString truly does get
called for you magically, sometimes. If you use + and at least one operand is a String, Java presumes you are trying to concatenate and
transparently converts the other operand to a String, calling its toString method if it is an Object, then
concatenates the results giving a String. It even works
when you add a String and a primitive.
Concatenation will convert the primitive to a String for
you and concatenate the results, transparently. This can lead to surprising results.
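A few of those surprises, as a sketch. The + operator works left to right, and it only means concatenation once a String is involved. The method names are invented for the example.

```java
class ConcatSurprises {
    static String sumThenText() {
        return 1 + 2 + "3";     // 1 + 2 is added first, then "3" is appended
    }

    static String textThenSum() {
        return "1" + 2 + 3;     // a String comes first, so everything is appended
    }

    static int twoChars() {
        return 'a' + 'b';       // no String involved: plain integer addition
    }

    static String nullText() {
        String s = null;
        return s + "x";         // a null reference is converted to the text "null"
    }

    public static void main(String[] args) {
        System.out.println(sumThenText());
        System.out.println(textThenSum());
        System.out.println(twoChars());
        System.out.println(nullText());
    }
}
```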
String.replace( char target, char replacement ) is considerably faster
than String.replace(
String target, String replacement ). Both replace
all occurrences. So if you are replacing just a char, use single quotes. Unfortunately, String.replace( String target, String replacement ) is only available
in Java version 1.5 or later.
String.replaceAll( String regex, String replacement ) also replaces all instances. The difference is,
replaceAll looks for a regex String not a simple String. Beware of using
replaceAll( String regex, String replacement ) when you meant replace(
String target, String replacement ). The second parameter is
not just a simple String. String.replaceAll behaves like Matcher.replaceAll:
$ is a reference to a captured String in the search
pattern and \ is the regex quote character, meaning
literal \ must be coded as \\\\ and literal $ as \\$.
String.replaceFirst( String regex, String replacement ) also takes a regex. There is no replaceFirst that takes only a simple String.
String.replace in the Javadoc is shown with CharSequence parameters.
Don’t let this frighten you. String implements
CharSequence, so replace
works fine on Strings. replace
also accepts other CharSequences, such as StringBuilders, as arguments.
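The difference between replace and replaceAll is easiest to see with a target that happens to be a regex metacharacter. A sketch follows, with made-up method names. Matcher.quoteReplacement is the JDK helper that escapes $ and \ in a replacement for you.

```java
import java.util.regex.Matcher;

class ReplaceDemo {
    static String plainChar(String s) {
        return s.replace('.', ',');            // char version: literal dot
    }

    static String plainString(String s) {
        return s.replace(".", ",");            // String version: still a literal dot
    }

    static String regexByMistake(String s) {
        return s.replaceAll(".", ",");         // regex dot matches every character
    }

    static String dotToDollar(String s) {
        return s.replaceAll("\\.", "\\$");     // literal $ must be escaped
    }

    static String dotToBackslash(String s) {
        return s.replaceAll("\\.", "\\\\");    // literal \ must be escaped
    }

    static String dotToQuoted(String s, String replacement) {
        return s.replaceAll("\\.", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(plainChar("5.00"));
        System.out.println(plainString("5.00"));
        System.out.println(regexByMistake("5.00"));
        System.out.println(dotToDollar("a.b"));
        System.out.println(dotToBackslash("a.b"));
        System.out.println(dotToQuoted("a.b", "$1"));
    }
}
```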
You can use
com.mindprod.common17.ST.isLegal to ensure a String contains only
the characters you consider legal. You can download it. It is pretty simple, using indexOf on the legal String.
You can also use charAt to extract the characters one
by one, then categorise them with the Character methods
such as isDigit.
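Both approaches might look roughly like this. This is a hypothetical stand-in written for illustration, not the com.mindprod source, and its signature is an assumption.

```java
class LegalChars {
    /** True if every char of candidate appears somewhere in legalChars. */
    static boolean isLegal(String candidate, String legalChars) {
        for (int i = 0; i < candidate.length(); i++) {
            if (legalChars.indexOf(candidate.charAt(i)) < 0) {
                return false;   // found a char that is not on the legal list
            }
        }
        return true;
    }

    /** True if every char of candidate is a digit according to Character.isDigit. */
    static boolean allDigits(String candidate) {
        for (int i = 0; i < candidate.length(); i++) {
            if (!Character.isDigit(candidate.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLegal("cab", "abc"));
        System.out.println(isLegal("cabs", "abc"));
        System.out.println(allDigits("2024"));
        System.out.println(allDigits("20x4"));
    }
}
```

Note that both methods treat the empty String as legal, since it contains no offending characters.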
String borrows some convenience
regex methods, such as split, matches, replaceAll and replaceFirst. Normally you
would use the more efficient java.util.regex methods
where you precompile your Pattern and reuse it. The
String versions are for one-shot use where efficiency is
not a concern.
replaceAll is implemented in an inefficient way, compiling a regex
pattern every time it is invoked. In older JDKs (Java 8 and earlier), so is replace( CharSequence, CharSequence ).
So, if you are going to use replaceAll (or, on an older JDK, replace) more than once, you should use a separate regex compile
done only once.
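A sketch of compiling once and reusing the Pattern. The tab-expansion task and all the names are invented for the example. Pattern.LITERAL makes the pattern a plain string rather than a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrecompiledReplace {
    // Compiled once, when the class loads, then shared by every call.
    private static final Pattern TAB = Pattern.compile("\t", Pattern.LITERAL);

    // Quoted once so that any $ or \ in the replacement would be taken literally.
    private static final String FOUR_SPACES = Matcher.quoteReplacement("    ");

    /** Replaces every tab in line with four spaces. */
    static String expandTabs(String line) {
        return TAB.matcher(line).replaceAll(FOUR_SPACES);
    }

    public static void main(String[] args) {
        System.out.println(expandTabs("a\tb\tc"));
    }
}
```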
- x.replaceAll( a, b ) is
not the method to use to replace all instances of a plain String a in x with b.
Instead you use x.replace( a, b ). replaceAll is a
convenience regex method.
- x.replace( a, b ) does not modify x. It creates a new modified String. This is true of all String
methods. Strings are immutable. No method can modify the String.
- Consider lastIndexOf( s,
fromIndex ). fromIndex is the
offset near the end of the string where to start searching backwards for a match
earlier in the string. It is not the index of a substring to search, i.e. the place
where the reverse searching stops.
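The lastIndexOf point is easy to get backwards, so here is a small sketch. The wrapper method is only there to give the call a descriptive name.

```java
class LastIndexDemo {
    /** Searches backwards for target, considering only matches that start at or before fromIndex. */
    static int searchBackFrom(String s, String target, int fromIndex) {
        return s.lastIndexOf(target, fromIndex);
    }

    public static void main(String[] args) {
        String s = "banana";   // "an" starts at offsets 1 and 3
        System.out.println(searchBackFrom(s, "an", 5));  // finds the match at 3
        System.out.println(searchBackFrom(s, "an", 2));  // skips 3, finds the match at 1
        System.out.println(searchBackFrom(s, "an", 0));  // nothing starts at or before 0
    }
}
```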