Java Strings are quite different from C++ strings. They
are immutable, i.e. you cannot change the
characters in a String. To look at individual characters, you use charAt().
Strings in Java are 16-bit Unicode: each char is a UTF-16 code unit.
To edit strings, you use a StringBuffer object or a char[]. In JDK 1.5+
you use StringBuilder, which has the same API as StringBuffer,
but is faster because it is not synchronised, and hence not thread-safe.
You get the size of a String (length in chars) with String.length(),
not the .length field used by arrays or the .size() method
used by collections.
For manipulating 8-bit characters, you want an array of bytes: byte[].
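As a minimal sketch of the points above (the class and method names are made up for illustration), here is how charAt(), length(), StringBuilder and byte[] fit together:

```java
import java.nio.charset.StandardCharsets;

class StringBasicsDemo {
    // Returns an edited copy with the first char upper-cased.
    // The original String is never modified.
    static String capitalise(String s) {
        if (s.length() == 0) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s);
        sb.setCharAt(0, Character.toUpperCase(sb.charAt(0)));
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s.charAt(1));    // e
        System.out.println(s.length());     // 5, a method call, note the ()
        System.out.println(capitalise(s));  // Hello
        System.out.println(s);              // hello, the original is unchanged

        // 8-bit characters live in a byte[]; the encoding must be named.
        byte[] latin1 = "h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length);  // 5, one byte per char in Latin-1
    }
}
```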
Empty Strings
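There are three types of empty string: null (no String object at all), ""
(a String of length zero) and " " (a String containing only blanks).
Each flavour needs its own check: s == null, s.length() == 0 and
s.trim().length() == 0 respectively. Check for null first, since the
other two checks raise a NullPointerException on null.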
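The original listing is not reproduced here, so the following is only a sketch of one way to test for each flavour. The class name and the classify helper are hypothetical.

```java
class EmptyStringDemo {
    // Order matters: the null test must come first, because calling
    // length() or trim() on null throws a NullPointerException.
    static String classify(String s) {
        if (s == null) {
            return "null";
        }
        if (s.length() == 0) {
            return "empty";
        }
        if (s.trim().length() == 0) {
            return "blank";
        }
        return "text";
    }

    public static void main(String[] args) {
        System.out.println(classify(null));   // null
        System.out.println(classify(""));     // empty
        System.out.println(classify("   "));  // blank
        System.out.println(classify("abc"));  // text
    }
}
```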
String Comparison
The following form:
if ( "abc".equals( s ) ) System.out.println( "matched" );
is preferable to:
if ( s.equals( "abc" ) ) System.out.println( "matched" );
because the first form won’t raise a NullPointerException if s
is null. It will treat the strings as not equal.
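A small sketch of the difference, using hypothetical helper names, shows what happens when s is null:

```java
class NullSafeEqualsDemo {
    // Literal first: safe, because the literal is never null.
    static boolean literalFirst(String s) {
        return "abc".equals(s);
    }

    // Variable first: throws NullPointerException when s is null.
    static boolean variableFirst(String s) {
        return s.equals("abc");
    }

    public static void main(String[] args) {
        String s = null;
        System.out.println(literalFirst(s));  // false
        try {
            variableFirst(s);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");
        }
    }
}
```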
Unless Strings have been interned, with String.intern(),
you cannot compare them for equality with ==. You
have to use equals() instead.
The compiler will not warn you if you inadvertently use ==.
Unfortunately, the bug may take a long time to surface if your compiler or
virtual machine is doing transparent interning. Interning gets you a reference
to the master copy of a String. This allows the
duplicates to be garbage collected sooner. However, there are three
disadvantages to interning:
- It takes extra time to look up the master string in a Hashtable.
- In some implementations, you can have a maximum of 64K interned Strings.
- In some implementations, interned Strings are never
garbage collected, even when they are no longer used. The interning process
itself acts as a packratter, holding on to every String it has ever seen.
The answer is for the JVM to implement the intern pool with weak references.
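The following sketch illustrates why == is unreliable for Strings unless both sides are interned. The class and helper names are made up for the example; new String is used only to force a distinct object.

```java
class InternDemo {
    // True only when both references point at the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "hello";           // literals are interned automatically
        String copy = new String("hello");  // deliberately a separate object

        System.out.println(sameObject(literal, copy));           // false
        System.out.println(literal.equals(copy));                // true
        System.out.println(sameObject(literal, copy.intern()));  // true
    }
}
```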
If you want to compare for < or >, you cannot use the usual comparison
operators. You have to use compareTo() or compareToIgnoreCase()
instead.
String s = "apple";
String t = "orange";
if ( s.compareTo( t ) < 0 )
    {
    System.out.println( "s < t" );
    }
s.compareTo( t ) will return:
- some positive number if s sorts after t.
- 0 if s is the same as t.
- some negative number if s sorts earlier than t.
You can think of it roughly like treating the Strings as numbers and returning s-t.
Novices might be astonished by the following results:
- "abc".compareTo( "ABC" ) returns a positive number, i.e. "abc" > "ABC".
compareTo is case-sensitive, and lower case letters have higher Unicode
values than upper case ones.
- "abc ".compareTo( "abc" ) returns a positive number, i.e. "abc " > "abc".
Blanks are treated like any other character.
- "".compareTo( null ) raises a java.lang.NullPointerException.
- "" is not the same thing as null. Most
String functions will be happy to handle "",
but very few will accept null.
- The comparison is done by straightforward Unicode numeric character by character
comparison. There is no adjustment for locale collating sequence.
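The results listed above can be checked with a short sketch. The describe helper is hypothetical; it only reports the sign of compareTo.

```java
class CompareToDemo {
    // Reports only the sign of compareTo, which is all the contract promises.
    static String describe(String s, String t) {
        int c = s.compareTo(t);
        if (c < 0) {
            return "s < t";
        }
        if (c > 0) {
            return "s > t";
        }
        return "s == t";
    }

    public static void main(String[] args) {
        System.out.println(describe("apple", "orange"));  // s < t
        System.out.println(describe("abc", "ABC"));       // s > t
        System.out.println(describe("abc ", "abc"));      // s > t
        System.out.println(describe("abc", "abc"));       // s == t
        // Ignoring case makes "abc" and "ABC" compare equal.
        System.out.println("abc".compareToIgnoreCase("ABC"));  // 0
    }
}
```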
When you write your own classes, the default Object.equals
does not do a field by field comparison. You have to write your own
version of equals to get that effect. The default
version simply tests the equality of the two references — that they both
point to the same object.
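As an illustration, here is a minimal sketch of a field by field equals for a made-up GridPoint class. It also overrides hashCode, since objects that are equal must report the same hash code.

```java
class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Field by field comparison instead of the default reference test.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof GridPoint)) {
            return false;  // also covers other == null
        }
        GridPoint p = (GridPoint) other;
        return x == p.x && y == p.y;
    }

    // Equal objects must produce equal hash codes.
    @Override
    public int hashCode() {
        return 31 * x + y;
    }

    public static void main(String[] args) {
        System.out.println(new GridPoint(1, 2).equals(new GridPoint(1, 2)));  // true
        System.out.println(new GridPoint(1, 2).equals(new GridPoint(2, 1)));  // false
    }
}
```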
Case-Sensitive and Case-Insensitive Comparison
Searching
Your basic tools are indexOf and lastIndexOf.
Both have variants that take a fromIndex offset at which to
start searching. The result is relative to the start of the entire String, not
to fromIndex. The common15 package contains a StringSearch
class that will search for many different strings. These searches are all case-sensitive.
To get case-insensitive searches, convert both Strings to all upper case or all
lower case first. You must be careful here since case conversion can change String
lengths, so offsets found in the converted String may not line up with the original.
There are variants of the methods that search for a single character. These are
faster than the equivalent methods that look for a 1-character String. It would
be nice if the compiler were smart enough to optimise a 1-character String
constant to a char as the parameter of indexOf.
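A brief sketch of these searches follows. The indexOfIgnoreCase helper is hypothetical and takes the simple convert-then-search approach described above, so the index it returns refers to the converted String.

```java
import java.util.Locale;

class SearchDemo {
    // Case-insensitive search by lower-casing both sides first.
    // Caveat: the returned index refers to the lower-cased haystack,
    // which can differ in length from the original for some characters.
    static int indexOfIgnoreCase(String haystack, String needle) {
        return haystack.toLowerCase(Locale.ROOT)
                       .indexOf(needle.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String s = "the cat sat on the mat";
        System.out.println(s.indexOf("the"));      // 0
        System.out.println(s.indexOf("the", 1));   // 15, measured from the start of s
        System.out.println(s.lastIndexOf("the"));  // 15
        System.out.println(s.indexOf('m'));        // 19, the char variant

        System.out.println(indexOfIgnoreCase("The Cat", "CAT"));  // 4

        // Case conversion can change length: sharp s upper-cases to "SS".
        System.out.println("\u00df".toUpperCase(Locale.ROOT).length());  // 2
    }
}
```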
Creating Strings
Strings are immutable. Therefore they can be reused indefinitely, and they can
be shared for many purposes. When you assign one String variable to another, no
copy is made. In older JVMs (before Java 7 update 6), even taking a substring
did not copy the characters; only a new String descriptor sharing the original
char[] was created. Newer JVMs copy the characters. New Strings are created when:
- you concatenate.
- you read Strings from files.
- you foolishly use new String( String ). There is one situation where
its use is legit. See substring for the explanation.
- you use new String( somethingElse ) for conversion.
- you use StringBuffer/StringBuilder toString/substring.
toString
Every Object has a method called toString
that makes some sort of attempt to convert the contents of the Object
into human-readable form as a Unicode String for
display. Normally, when you write a new class, you write your own corresponding toString
method for it, even if just for debugging.
You use it like this: String toShow = myThing.toString();
The default Object.toString
method is not very clever. It does not display all the primitives in your
class with field names as you might expect. If you want that, you must code it
yourself. The default toString will, instead,
do something lame like display the class name followed by @ and the
hashCode in hex, which is only mildly interesting.
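A minimal sketch of a hand-written toString, using a made-up Temperature class, next to what the default gives you:

```java
class Temperature {
    private final double degrees;
    private final String scale;

    Temperature(double degrees, String scale) {
        this.degrees = degrees;
        this.scale = scale;
    }

    // Show the fields with their names, which the default does not do.
    @Override
    public String toString() {
        return "Temperature[degrees=" + degrees + ", scale=" + scale + "]";
    }

    public static void main(String[] args) {
        String toShow = new Temperature(21.5, "C").toString();
        System.out.println(toShow);  // Temperature[degrees=21.5, scale=C]

        // The default: class name, @, then the hash code in hex.
        System.out.println(new Object().toString());
    }
}
```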
toString has a magical property. It appears to get
invoked automatically to convert to String without
you having to mention toString.
- In one case, System.out.println
(and brothers), it is not really magic. println
pulls it off with a plethora of overloaded methods, one for each of the primitive
types. Each overloaded method converts its primitive parameter to a String
for you, and passes that on to the variant of println
that handles only Strings. But, you say (glad to
see you are so attentive), primitives don’t have a toString
method! That is true, but there are static conversion
methods to get that effect, such as String.valueOf( double ).
For any Object other than a String, println
invokes the Object’s usually-overridden custom toString
method and passes the result on to the String-eating
version of println.
- When you use concatenation, toString truly does get
called for you magically. If either operand of + is a String,
Java treats the + as concatenation and transparently converts the other
operand to a String: an Object via its toString method, a null reference
to "null", and a primitive via the same conversion as String.valueOf.
If neither operand is a String, there is no magic: + on two numbers is
ordinary arithmetic, and + on two Objects that are neither Strings nor
numeric wrappers will not compile. Because + is evaluated left to right,
mixing numbers and Strings in one expression can lead to surprising
results.
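To illustrate the kind of surprise meant here, this sketch (with hypothetical helper names) shows how left to right evaluation changes the result of mixing numbers and Strings:

```java
class ConcatDemo {
    // a + b is evaluated first as arithmetic, then the String is appended.
    static String numbersFirst(int a, int b, String s) {
        return a + b + s;
    }

    // s + a is evaluated first as concatenation, so b is appended as text too.
    static String stringFirst(String s, int a, int b) {
        return s + a + b;
    }

    public static void main(String[] args) {
        System.out.println(numbersFirst(1, 2, "3"));  // 33
        System.out.println(stringFirst("1", 2, 3));   // 123

        Object nothing = null;
        System.out.println("value: " + nothing);      // value: null
    }
}
```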
Replace
String.replace( char target, char replacement ) is considerably
faster than String.replace( String target, String replacement ). Both
replace all occurrences. So if you are replacing just a single char,
use single quotes. Unfortunately, String.replace( String target,
String replacement ) is only available in JDK 1.5+.
replaceAll( String regex, String replacement ) also
replaces all instances. The difference is that replaceAll
looks for a regex, not a simple String.
Beware of using replaceAll( String regex, String replacement )
when you meant replace( String target, String replacement ).
In replaceAll, the second parameter is not just a simple String either.
String.replaceAll behaves like Matcher.replaceAll: in the replacement,
$ is a reference to a captured group in the search pattern and \
is the quote character, meaning that in Java source a literal \
must be coded as \\\\ and a literal $ as \\$.
replaceFirst( String regex, String replacement ) also
takes a regex. There is no replaceFirst that takes
only a simple String.
String.replace is shown in the Javadoc with CharSequence
parameters. Don’t let this frighten you. String
implements CharSequence, so replace
works fine with String arguments. It also accepts
other CharSequences, such as StringBuilders, as arguments.
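The differences between the replace family are easiest to see side by side. This is a sketch only. The literalReplaceAll helper is hypothetical and shows one way to make replaceAll behave literally, using Pattern.quote and Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDemo {
    // Forces replaceAll to treat both arguments as plain text.
    static String literalReplaceAll(String s, String target, String replacement) {
        return s.replaceAll(Pattern.quote(target),
                            Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(s.replace('.', '-'));         // a-b-c
        System.out.println(s.replace(".", "-"));         // a-b-c
        System.out.println(s.replaceAll(".", "-"));      // -----  (. matches any char)
        System.out.println(s.replaceAll("\\.", "-"));    // a-b-c
        System.out.println(s.replaceFirst("\\.", "-"));  // a-b.c

        // A literal $ in the replacement must be escaped for replaceAll.
        System.out.println("cost: X".replaceAll("X", "\\$5"));        // cost: $5
        System.out.println(literalReplaceAll("cost: X", "X", "$5"));  // cost: $5
    }
}
```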
Validating
You can use com.mindprod.common11.StringTools.isLegal
to ensure a String contains only the characters you
consider legal. You can download it. It
is pretty simple, using indexOf on the legal String.
You can also use charAt to extract the characters
one by one, then categorise them with the Character
methods such as isDigit.
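The following is not the StringTools code itself, just a sketch of both approaches under the assumption that they work roughly as described. The class and method names here are stand-ins.

```java
class LegalCharsDemo {
    // Hypothetical stand-in for an isLegal-style check: every char of
    // candidate must appear somewhere in legalChars.
    static boolean isLegal(String candidate, String legalChars) {
        for (int i = 0; i < candidate.length(); i++) {
            if (legalChars.indexOf(candidate.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    // Categorise chars one by one with a Character method.
    // Note that an empty String passes, since it has no offending chars.
    static boolean isAllDigits(String candidate) {
        for (int i = 0; i < candidate.length(); i++) {
            if (!Character.isDigit(candidate.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLegal("cab", "abc"));   // true
        System.out.println(isLegal("cabs", "abc"));  // false
        System.out.println(isAllDigits("2024"));     // true
        System.out.println(isAllDigits("20x4"));     // false
    }
}
```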
Regex
String borrows some convenience regex
methods, such as split, matches,
replaceAll and replaceFirst.
Normally you would use the more efficient java.util.regex
methods where you precompile your Pattern and reuse
it. The String versions are for one-shot use where
efficiency is not a concern.
At least in older JDKs, not only replaceAll but also replace( String, String )
is implemented in an inefficient way, compiling a regex pattern every time it is
invoked.
So, if you are going to use replace or replaceAll
more than once, you should use a separate regex compile done only once.
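A minimal sketch of the compile-once approach, using a made-up whitespace-collapsing example:

```java
import java.util.regex.Pattern;

class PrecompiledPatternDemo {
    // Compiled once when the class loads, then reused on every call.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Equivalent to s.replaceAll("\\s+", " ") without recompiling each time.
    static String collapseWhitespace(String s) {
        return WHITESPACE_RUN.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(collapseWhitespace("a  b\t\tc"));  // a b c
    }
}
```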
Gotchas
- String.replaceAll( a, b ) is not the method to use to replace all instances
of the simple String a with b. Instead you use String.replace( a, b ),
which also replaces every occurrence. replaceAll is a convenience regex
method.
- s.replace( a, b ) does not modify s. It creates a new modified String.
This is true of all String methods. Strings
are immutable. No method can modify the original String.
- Consider lastIndexOf( s, fromIndex ).
fromIndex is the offset near the end of the string
where the backwards search for a match starts. It is not the place
where the reverse search stops, i.e. it does not mark the start of a
substring to search.
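The last two gotchas can be seen in a short sketch. The lastBefore helper is a hypothetical thin wrapper around lastIndexOf, added only to make the example easy to test.

```java
class GotchaDemo {
    // Searches backwards starting at fromIndex, towards the start of s.
    static int lastBefore(String s, String target, int fromIndex) {
        return s.lastIndexOf(target, fromIndex);
    }

    public static void main(String[] args) {
        String s = "hello";
        s.replace('l', 'L');     // result thrown away, s is untouched
        System.out.println(s);   // hello
        s = s.replace('l', 'L'); // keep the new String
        System.out.println(s);   // heLLo

        String t = "the cat sat on the mat";
        System.out.println(lastBefore(t, "the", 15));  // 15, match may start at fromIndex
        System.out.println(lastBefore(t, "the", 14));  // 0, the later match is skipped
    }
}
```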
Learning More
Sun’s Javadoc on the String class
Sun’s Javadoc on the StringBuffer class
Sun’s Javadoc on the StringBuilder class