Strings in Java are quite different from those in C++. They are immutable, i.e. you can’t change the characters in a String. To look at individual
characters, you need to use charAt(). Strings in Java are 16-bit Unicode. To edit strings, you need to use a StringBuffer
object or a char[]. In Java version 1.5 or later you use StringBuilder, which works exactly like StringBuffer, but it is faster
and not thread-safe.
You get the size of a String (length in chars) with String.length(), not the .length or
.size() used in other classes.
For manipulating 8-bit characters, you want an array of bytes —
byte[].
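The points above can be sketched as follows. The class and method names (StringBasics, withCharAt and so on) are made up for illustration, and ISO-8859-1 is just one assumed choice of 8-bit encoding.

```java
import java.nio.charset.StandardCharsets;

final class StringBasics {
    /** Returns a copy of s with the character at index replaced by c. */
    static String withCharAt(String s, int index, char c) {
        StringBuilder sb = new StringBuilder(s); // mutable copy of the characters
        sb.setCharAt(index, c);
        return sb.toString();                    // back to an immutable String
    }

    /** The same edit done through a char[] instead of a StringBuilder. */
    static String withCharAtViaArray(String s, int index, char c) {
        char[] chars = s.toCharArray();
        chars[index] = c;
        return new String(chars);
    }

    /** Encodes s as 8-bit ISO-8859-1 bytes; characters outside Latin-1 become '?'. */
    static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "cat";
        System.out.println(s.charAt(1));           // prints a
        System.out.println(s.length());            // prints 3
        System.out.println(withCharAt(s, 0, 'b')); // prints bat
        System.out.println(s);                     // prints cat, the original is untouched
    }
}
```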
Empty Strings
There are three types of empty string: null, "" and " ". Here is how to check for each
flavour:
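A minimal sketch of such checks. The helper names (isNullString, isZeroLength, isAllBlanks, isNullOrBlank) are hypothetical and not part of the String API.

```java
final class EmptyStringChecks {
    /** True only for a null reference. */
    static boolean isNullString(String s) {
        return s == null;
    }

    /** True only for the zero-length String "". */
    static boolean isZeroLength(String s) {
        return s != null && s.length() == 0;
    }

    /** True for a non-empty String made up entirely of blanks, e.g. " ". */
    static boolean isAllBlanks(String s) {
        return s != null && s.length() > 0 && s.trim().length() == 0;
    }

    /** True for any of the three flavours. */
    static boolean isNullOrBlank(String s) {
        return s == null || s.trim().length() == 0;
    }

    public static void main(String[] args) {
        System.out.println(isNullString(null)); // prints true
        System.out.println(isZeroLength(""));   // prints true
        System.out.println(isAllBlanks(" "));   // prints true
        System.out.println(isNullOrBlank("x")); // prints false
    }
}
```

Note that the null test has to come first in each method. Calling length() or trim() on a null reference would raise a NullPointerException.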
String Comparison
The form if ( "abc".equals( s ) ) echo( "matched" );
is safer than if ( s.equals( "abc" ) ) echo( "matched" );
because the first form won’t raise an exception if s is null. It will treat the
strings as not equal.
Unless Strings have been interned, with String.intern(), you
cannot compare them for equality with ==. You have to use equals() instead.
The compiler will not warn you if you inadvertently use ==. Unfortunately, the
bug may take a long time to surface if your compiler or virtual machine is doing transparent interning. Interning
gets you a reference to the master copy of a String. This allows the duplicates to be
garbage collected sooner. However, there are three disadvantages to interning:
- It takes extra time to look up the master string in a Hashtable.
- In some implementations, you can have a maximum of 64K interned Strings.
- In some implementations, interned Strings are never garbage collected, even when
they are no longer used. The interning process itself acts as a packratter. The remedy is for the implementation to hold
interned Strings with weak references.
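To see why == is unreliable, here is a small sketch. InternDemo and sameReference are illustrative names only.

```java
final class InternDemo {
    /** True only when both variables point to the very same String object. */
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "hello";          // literals are interned automatically
        String copy = new String("hello"); // a distinct object with the same characters

        System.out.println(sameReference(literal, copy));          // prints false
        System.out.println(literal.equals(copy));                  // prints true
        System.out.println(sameReference(literal, copy.intern())); // prints true
    }
}
```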
If you want to compare for < or >, you cannot use the usual comparison operators; you have to use
compareTo() or compareToIgnoreCase() instead.
String s = "apple";
String t = "orange";
if ( s.compareTo( t ) < 0 )
    {
    System.out.println( "s < t" );
    }
compareTo will return:
- some positive number if string s lexically comes after t.
- 0 if s is the same as t.
- some negative number if s sorts earlier than t.
You can think of it roughly like treating the Strings as numbers and returning s-t.
Novices might be astonished by the following results:
- "abc".compareTo( "ABC" ) returns a positive number, i.e. "abc" > "ABC". compareTo is case-sensitive.
- "abc ".compareTo( "abc" ) returns a positive number, i.e. "abc " > "abc". Blanks are treated like any
other character.
- "".compareTo( null ) raises a java.lang.NullPointerException.
- "" is not the same thing as null. Most String functions will be happy
to handle "", but very few will accept null.
- The comparison is done by straightforward Unicode numeric character by character comparison. There is no
adjustment for locale collating sequence.
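The surprises listed above can be checked with a short sketch. CompareToSurprises and its methods are hypothetical names. Only the sign of a compareTo result is meaningful, so the sketch reduces it with Integer.signum.

```java
final class CompareToSurprises {
    /** Reduces a compareTo result to -1, 0 or +1, since only the sign matters. */
    static int sign(String s, String t) {
        return Integer.signum(s.compareTo(t));
    }

    /** Reports whether comparing "" with null raises a NullPointerException. */
    static boolean emptyVersusNullThrows() {
        try {
            "".compareTo(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sign("abc", "ABC"));      // prints 1, lower case sorts after upper case
        System.out.println(sign("abc ", "abc"));     // prints 1, the trailing blank counts
        System.out.println(sign("apple", "orange")); // prints -1
        System.out.println(emptyVersusNullThrows()); // prints true
    }
}
```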
When you write your own classes, the default Object.equals does not
do a field by field comparison. You have to write your own version of equals to get
that effect. The default version simply tests the equality of the two references — that they both point to
the same object.
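As an illustration, here is one common way to write a field by field equals for a small class. Point2D is a made-up example class. Whenever you override equals you should override hashCode as well, so that equal objects hash alike.

```java
final class Point2D {
    private final int x;
    private final int y;

    Point2D(int x, int y) {
        this.x = x;
        this.y = y;
    }

    /** Field by field comparison, replacing the default reference comparison. */
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true; // same object
        }
        if (!(other instanceof Point2D)) {
            return false; // also covers null
        }
        Point2D p = (Point2D) other;
        return x == p.x && y == p.y;
    }

    /** Equal points must produce equal hash codes. */
    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}
```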
Case-Sensitive and Case-Insensitive Comparison
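A minimal sketch contrasting the case-sensitive String methods with their case-insensitive counterparts. CaseCompare and its method names are illustrative only.

```java
final class CaseCompare {
    /** Case-sensitive equality: "Apple" and "apple" differ. */
    static boolean sameExactly(String a, String b) {
        return a.equals(b);
    }

    /** Case-insensitive equality: "Apple" and "apple" match. */
    static boolean sameIgnoringCase(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        System.out.println(sameExactly("Apple", "apple"));              // prints false
        System.out.println(sameIgnoringCase("Apple", "apple"));         // prints true
        System.out.println("Apple".compareTo("apple") < 0);             // prints true, 'A' sorts before 'a'
        System.out.println("Apple".compareToIgnoreCase("apple") == 0);  // prints true
    }
}
```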
Searching
Your basic tools are indexOf and lastIndexOf. They both
have variants that take a fromOffset telling where to start searching. The result is relative to
the start of the entire String, not the fromOffset. The common15 package contains a StringSearch class that will search for many different strings. These searches are all
case-sensitive. To get case-insensitive searches, convert both Strings to all upper case or all lower case first.
You must be
There are variants of the methods that search for a single character. These are faster than the equivalent
methods that look for a 1-character String. It would be nice if the compiler were smart enough to optimise a
1-character String constant to a char as the parameter of
indexOf. You can abbreviate x.indexOf( y ) >= 0 as x.contains( y ).
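A short sketch of these searches. SearchDemo and indexOfIgnoreCase are hypothetical names, and the lower-casing trick is only adequate for simple text, since case conversion can change the length of a few unusual characters.

```java
import java.util.Locale;

final class SearchDemo {
    /** Case-insensitive search done by lower-casing both Strings first. */
    static int indexOfIgnoreCase(String haystack, String needle) {
        return haystack.toLowerCase(Locale.ROOT).indexOf(needle.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(s.indexOf("o"));      // prints 4
        System.out.println(s.indexOf("o", 5));   // prints 7, relative to the start of s, not to offset 5
        System.out.println(s.lastIndexOf('o'));  // prints 7, the single char variant
        System.out.println(s.indexOf('z'));      // prints -1, not found
        System.out.println(s.contains("wor"));   // prints true
        System.out.println(indexOfIgnoreCase("Hello World", "WORLD")); // prints 6
    }
}
```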
Creating Strings
Strings are immutable. Therefore they can be reused indefinitely, and they can be shared for many purposes. When
you assign one String variable to another, no copy is made. In older JVMs (before Java 7 update 6), even taking a substring did not copy the
characters, though a new String descriptor was created; newer JVMs copy the characters. New Strings are created when:
- you concatenate.
- you read Strings from files.
- you foolishly use new String( String ). There is one situation where its use is legit. See substring for the explanation.
- you use new String( somethingElse ) for conversion.
- you use StringBuffer/StringBuilder toString/substring.
toString
Every Object has a method called toString that makes some
sort of attempt to convert the contents of the Object into human-readable form as a
Unicode String for display. Normally, when you write a new class, you write your own
corresponding toString method for it, even if just for debugging.
You use it like this: String toShow = myThing.toString();
The default Object.toString
method is not very clever. It does not display all the primitives in your class with field names
as you might expect. If you want that, you must code it yourself. A default toString
will typically, instead, do something lame like dump the class name and the hashCode (often derived from the Object’s address), which is only mildly interesting.
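Here is a sketch of a hand-coded toString next to the default one. LabelledPoint is a made-up class, and the output format is just one reasonable choice.

```java
final class LabelledPoint {
    private final String label;
    private final int x;
    private final int y;

    LabelledPoint(String label, int x, int y) {
        this.label = label;
        this.x = x;
        this.y = y;
    }

    /** Displays every field with its name, which the default toString will not do. */
    @Override
    public String toString() {
        return "LabelledPoint[label=" + label + ", x=" + x + ", y=" + y + "]";
    }

    public static void main(String[] args) {
        // prints LabelledPoint[label=home, x=3, y=4]
        System.out.println(new LabelledPoint("home", 3, 4).toString());
        // the default prints the class name, an @ and the hashCode in hex
        System.out.println(new Object().toString());
    }
}
```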
toString has a magical property. It appears to get invoked automatically to
convert to String without you having to mention toString.
- In one case, System.out.println (and brothers), it is not really magic. println pulls it
off with a plethora of overloaded methods. println has many overloaded methods, one for each of the primitive types, and then each overloaded method converts its primitive parameter to a
String for you, and passes that on to the variant of println that can only handle Strings. But, you say, (glad to see you
are so attentive), primitives don’t have a toString method! That is true, but
there are static conversion methods
to get that effect, such as String.valueOf( double ). For any Object other than a String, println invokes the Object’s usually-overridden custom
toString method and passes the result on to the String-eating version of println.
- When you use concatenation, toString truly does get called for you magically,
sometimes. If ever you use + where at least one operand is a String, Java presumes you are trying
to concatenate, transparently calls the toString method of the other operand and
concatenates the results giving a String. It even works when you try to add a
String and a primitive. Concatenation will convert the primitive to a String for you and concatenate the results, transparently. This can lead to surprising results.
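A few of the surprises concatenation can spring, as a sketch. ConcatSurprises and describe are illustrative names. The expressions are evaluated left to right, so it matters where the first String appears.

```java
final class ConcatSurprises {
    /** Concatenation calls toString on o for you; a null reference becomes "null". */
    static String describe(Object o) {
        return "value: " + o;
    }

    public static void main(String[] args) {
        System.out.println("" + 1 + 2);             // prints 12, String concatenation from the start
        System.out.println(1 + 2 + "");             // prints 3, integer addition happens first
        System.out.println("total: " + (1 + 2));    // prints total: 3
        System.out.println('a' + 'b' + "");         // prints 195, chars are added as numbers
        System.out.println(describe(Integer.valueOf(42))); // prints value: 42
        System.out.println(describe(null));         // prints value: null
    }
}
```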
Replace
String.replace( char target, char replacement ) is considerably faster than String.replace( String target, String replacement ). Both replace all occurrences. So
if you are replacing just a char, use single quotes. Unfortunately, String.replace( String target, String replacement ) is only available
in Java version 1.5 or later.
replaceAll( String regex, String replacement ) also replaces all instances. The
difference is, replaceAll looks for a regex String not a
simple String. Beware of using replaceAll( String regex, String replacement ) when you meant replace( String
target, String replacement ). The
second parameter is not just a simple String. String.replaceAll behaves like Matcher.replaceAll. $ is a reference to a captured String in the search
pattern and \ is the regex quote character, meaning literal \ must be coded as \\\\ and literal $
as \\$.
replaceFirst( String regex, String replacement ) also takes a regex. There is no
replaceFirst that takes only a simple String.
String.replace
in the Javadoc is shown with CharSequence parameters. Don’t let this
frighten you. String implements CharSequence, so
replace works fine on Strings. replace works on some other things as well such as StringBuilders.
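The differences between the replace family can be sketched like this. ReplaceDemo and its method names are hypothetical. Pattern.quote and Matcher.quoteReplacement are the standard helpers for making the regex and the replacement literal, which is one way to get the effect of a replaceFirst that takes simple Strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceDemo {
    /** Literal replacement: only real dots are replaced. */
    static String literalDots(String s) {
        return s.replace(".", "-");
    }

    /** Regex replacement: "." matches any character, so everything is replaced. */
    static String regexDots(String s) {
        return s.replaceAll(".", "-");
    }

    /** Replaces only the first occurrence, treating both arguments as simple Strings. */
    static String replaceFirstLiteral(String s, String target, String replacement) {
        return s.replaceFirst(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(literalDots("a.b.c"));                   // prints a-b-c
        System.out.println(regexDots("a.b.c"));                     // prints -----
        System.out.println(replaceFirstLiteral("a.b.c", ".", "$")); // prints a$b.c
        System.out.println("cost: X".replaceAll("X", "\\$5"));      // prints cost: $5
    }
}
```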
Validating
You can use com.mindprod.common11.StringTools.isLegal to ensure a
String contains only the characters you consider legal. You can download it. It is pretty simple, using indexOf
on the legal String.
You can also use charAt to extract the characters one by one, then categorise them
with the Character methods such as isDigit.
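Both validation approaches can be sketched in a few lines. The helpers below (containsOnly and isAllDigits) are hypothetical illustrations of the two techniques. They are not the actual StringTools.isLegal code.

```java
final class ValidationSketch {
    /** True if every character of s appears somewhere in legalChars. */
    static boolean containsOnly(String s, String legalChars) {
        for (int i = 0; i < s.length(); i++) {
            if (legalChars.indexOf(s.charAt(i)) < 0) {
                return false; // found a character that is not in the legal set
            }
        }
        return true; // an empty String passes vacuously
    }

    /** True if every character of s is a digit according to Character.isDigit. */
    static boolean isAllDigits(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) {
                return false;
            }
        }
        return true; // an empty String passes vacuously
    }

    public static void main(String[] args) {
        System.out.println(containsOnly("2024-01-15", "0123456789-")); // prints true
        System.out.println(containsOnly("20x4", "0123456789"));        // prints false
        System.out.println(isAllDigits("12345"));                      // prints true
        System.out.println(isAllDigits("12a45"));                      // prints false
    }
}
```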
Regex
String borrows some convenience regex methods,
such as split, matches, replaceAll and replaceFirst. Normally you would use the more
efficient java.util.regex methods where you precompile your Pattern and reuse it. The String versions are for one-shot use where
efficiency is not a concern.
Not only replaceAll but also replace was implemented in an
inefficient way in older JDKs (through Java 8), compiling a regex pattern every time it was invoked. Later versions reimplemented replace without regex.
So, if you are going to use replace or replaceAll more
than once, you should use a separate regex compile done only once.
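A minimal sketch of compiling once and reusing the Pattern. PrecompiledReplace and its members are illustrative names. Pattern.LITERAL tells the compiler to treat the pattern text as a simple String rather than a regex.

```java
import java.util.regex.Pattern;

final class PrecompiledReplace {
    // Compiled once when the class loads, then reused on every call.
    private static final Pattern DOT = Pattern.compile(".", Pattern.LITERAL);
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    /** Literal replacement of every dot, without recompiling the pattern. */
    static String dotsToDashes(String s) {
        return DOT.matcher(s).replaceAll("-");
    }

    /** Regex replacement of every run of digits with a single #. */
    static String maskDigits(String s) {
        return DIGITS.matcher(s).replaceAll("#");
    }

    public static void main(String[] args) {
        System.out.println(dotsToDashes("a.b.c"));      // prints a-b-c
        System.out.println(maskDigits("tel 555 1234")); // prints tel # #
    }
}
```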
Gotchas
- String.replaceAll( a, b ) is
not the method to use to replace all instances of the simple String a with b. Instead you use String.replace( a, b ). replaceAll is
a convenience regex method.
- String.replace( a, b ) does not modify the String it is invoked on. It creates a new modified String. This is true of all
String methods. Strings are immutable. No method can
modify the original String.
- Consider lastIndexOf( s, fromIndex ). fromIndex is the offset near the end of the string where the
backwards search for a match earlier in the string starts. It is not the place where the reverse searching stops, i.e.
it does not mark the beginning of a substring to search.
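The immutability and lastIndexOf gotchas can be demonstrated with a short sketch. GotchaDemo and its method names are made up for illustration.

```java
final class GotchaDemo {
    /** Shows the classic mistake: the result of replace is thrown away. */
    static String forgottenAssignment() {
        String s = "a.b.c";
        s.replace(".", "-"); // creates a new String, which is discarded
        return s;            // still a.b.c
    }

    /** The correct form: keep the new String that replace returns. */
    static String keptResult() {
        String s = "a.b.c";
        s = s.replace(".", "-");
        return s;
    }

    /** Searches backwards, considering only matches that start at or before fromIndex. */
    static int lastIndexAtOrBefore(String s, String target, int fromIndex) {
        return s.lastIndexOf(target, fromIndex);
    }

    public static void main(String[] args) {
        System.out.println(forgottenAssignment());                      // prints a.b.c
        System.out.println(keptResult());                               // prints a-b-c
        System.out.println("hello world".lastIndexOf("o"));             // prints 7
        System.out.println(lastIndexAtOrBefore("hello world", "o", 5)); // prints 4
        System.out.println(lastIndexAtOrBefore("hello world", "o", 3)); // prints -1
    }
}
```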
Learning More
- Oracle’s Javadoc on the String class
- Oracle’s Javadoc on the StringBuffer class
- Oracle’s Javadoc on the StringBuilder class