substring : Java Glossary


substring
String.substring( int start, int end ) is Java’s way of making a copy of a piece of a String. Note it is spelled out substring, not the usual substr. Java is different from most other languages in that you specify the end point, not the length of the substring. The offsets are 0-based, i.e. the first character of the String is character 0. To further confuse you, end points one character past the last character of the substring you want, so the length of the result is end - start. It is perhaps best to think of it this way. Imagine little vertical bars separating the characters of the String, with a bar at the beginning and end of the String as well. You feed substring the start and end vertical bar numbers that enclose the substring you want.
0_1_2_3_4_5
|h|e|l|l|o|
0_1_2_3_4_5
"hello".substring( 1, 3 ).equals( "el" ) // true. Compare Strings with equals, not ==
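
Here is a small sketch of the vertical bar numbering in action. The class name SubstringBars and the helper between are made up for illustration; only String.substring itself comes from the JDK.

```java
public class SubstringBars {
    // Hypothetical helper: returns the characters between bar number start
    // and bar number end. The result is always end - start characters long.
    static String between(String s, int start, int end) {
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(between(s, 1, 3));            // el
        System.out.println(between(s, 0, s.length()));   // hello
        System.out.println(between(s, 2, 2).isEmpty());  // true, same bar twice
        System.out.println(s.substring(3));              // lo, one-argument form runs to the end
    }
}
```

An end less than start, or a bar number outside 0..length, throws StringIndexOutOfBoundsException.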

substring was clever. In JDK (Java Development Kit) 1.0 to 1.6 it did not make a deep copy of the substring the way most languages do. It just created a pointer into the original immutable String, i.e. it pointed to the value char[] of the base String and tracked the offset where the substring starts and the count of how long the substring is. This could be confusing if you were low-level debugging since you would see the whole String, not just the substring. There were reports of a bug in Microsoft’s implementation of substring. The downside of this cleverness was that a tiny substring of a giant base String could suppress garbage collection of that big String even if the whole String were no longer needed. (Actually it is the base String’s value char[] array that is held in RAM; the String object itself could be collected.)

Starting with JDK 1.7 (as of update 6), Oracle stopped using this dodge. Now substring makes an independent copy of the substring and does not pin the base String in RAM (Random Access Memory). This made most of my programs run more slowly from the overhead of copying and the overhead of the extra independent substring objects. It is a tradeoff. The JDK 1.6 way is better if you want to keep the base String around for more substrings. The JDK 1.7 way is better if you don’t need the base String any more and want it quickly garbage collected. Unfortunately there is no way to configure JDK 1.7+ to use the JDK 1.6 implementation. The new implementation simplifies all the native String methods since they need no longer concern themselves with the offset. The offset is an implied 0.

It is probably still a good idea to use indexOf( lookFor, offset ) rather than creating a substring first and using indexOf( lookFor ) on that.
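
A minimal sketch comparing the two approaches follows. The class name IndexOfFromOffset and both helper methods are invented for illustration. Note that the substring version returns an index relative to the substring, so you have to add the offset back to get a position in the original String.

```java
public class IndexOfFromOffset {
    // Preferred: search the original String starting at offset.
    // No temporary String is created.
    static int viaOffset(String text, String lookFor, int offset) {
        return text.indexOf(lookFor, offset);
    }

    // Alternative: copy the tail first, then search the copy.
    // The result is relative to the copy, so offset is added back.
    static int viaSubstring(String text, String lookFor, int offset) {
        int relative = text.substring(offset).indexOf(lookFor);
        return relative < 0 ? -1 : relative + offset;
    }

    public static void main(String[] args) {
        String text = "key=value;key=other";
        System.out.println(viaOffset(text, "key", 1));     // 10
        System.out.println(viaSubstring(text, "key", 1));  // 10
        System.out.println(viaOffset(text, "zzz", 1));     // -1
    }
}
```

Both give the same answer; the first simply avoids allocating and copying the tail of the String.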

If you are running JDK 1.6 or earlier and you know a tiny substring is holding a giant String in RAM that would otherwise be garbage collected, you can break the bond by using littleString = new String( littleString ), which will create a new smaller backing char[] with no ties to the original String.
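
The idiom can be sketched like this. The class name DetachSubstring and the helper detach are hypothetical. On JDK 1.7+ substring already copies, so the extra new String buys nothing there; the sketch only shows that the copy has the same contents but is a distinct object.

```java
import java.util.Arrays;

public class DetachSubstring {
    // Hypothetical helper: returns a fresh String with the same characters.
    // On JDK 1.6 and earlier this gave the result its own small char[].
    static String detach(String littleString) {
        return new String(littleString);
    }

    public static void main(String[] args) {
        // Build a large base String ending in a short word of interest.
        char[] filler = new char[1000000];
        Arrays.fill(filler, 'x');
        String big = new String(filler) + "needle";

        String little = big.substring(big.length() - 6);
        String detached = detach(little);

        System.out.println(detached);                 // needle
        System.out.println(detached.equals(little));  // true, same characters
        System.out.println(detached != little);       // true, separate object
    }
}
```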

If you are a curious sort and study the code for String.substring in the src.zip of JDK 1.6 or earlier, this sharing logic might not be apparent. The key is a non-public String constructor that takes parameters in the reverse of the usual order: String( int offset, int count, char value[] ).


This page is posted
on the web at:

http://mindprod.com/jgloss/substring.html

Canadian Mind Products
Contact Roedy. Please feel free to link to this page without explicit permission.
