image provider

LazyString


Disclaimer

This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, when all the facts you need are included, nothing extraneous is mentioned, the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems where it is up to you to fully the define the end point, or a series of ever more difficult versions of this project and research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it; or give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

Please do not email me about this project without reading the disclaimer above.

LazyString is like a cross between String and StringBuffer especially good for processing long String s such as in-RAM copies of files. It would be particularly efficient at doing a series of bulk search and replace operations.

Why lazy?

  1. It procrastinates the work of copying characters about. In the process, it saves considerable work copying and creating String and StringBuffer objects.
  2. LazyString does almost no work itself. It fobs the bullwork off on its older brothers String and StringBuffer .
How does it work? It maintains the String internally as a LinkedList of String fragments. Each fragment contains a reference to a String, a starting point in the String and the ending point in the String, i.e. the parameters you would pass to String.substring to access a piece of some bigger String..

Only when you are done and use the LazyString.toString method do you collect all the fragments together contiguously and convert it to a String. You implement that with a series of StringBuffer.append s and a StringBuffer.toString. I refer to this process as collapsing. LazyString is at liberty to at any time collapse the entire String or any part of it. It might do this when the ratio of number of fragments to the total size of the fragments is greater than some magic number where the optimal value of that number is found by experiment. (It would neat to have a general purpose HotSpot-style optimiser that would tweak such constants to suit the current data. It could then apply what it had learned from previous experiments to instantly set the tweaking constants for any given set of data.) You should make the magic number optionally configurable in the constructor because it depends on just which methods you use most. By setting the number to 0, you effectively make it act like a simple String. You might use -1 as a special code to make it behave like a simple StringBuffer.

The overhead of using LazyString does not pay back unless you are handling large String s, but it should be orders of magnitude faster then.

You want to implement as many of the methods of String and StringBuffer as you have energy for. Don’t make the LazyString class final unless you provide source. Others may want to add their own convenience methods. See the StringBuffer student project.

Comparison of LazyString String and StringBuffer
Method LazyString String StringBuffer Notes
append very fast very slow slow To append, i.e. concatenate, String allocates a new StringBuffer containing an array of characters and copies character by character to the tail end of the array. LazyString just adds a node to the end of the LinkedList .
substring fast very fast slow String does not copy any characters to implement substring. LazyString has to copy the list of nodes comprising the substring, but not the characters themselves.
indexOf(char) very slow slow slow LazyString fobs the work off on String.indexOf for each fragment.
indexOf(String) very slow slow slow LazyString fobs the work off on String.indexOf for each fragment. Dealing with finding strings that span fragments is difficult, but necessary. Don’t be tempted to implement this with String.toString then use String.indexOf or you defeat the point of LazyString. If you want to get clever, use Boyer Moore.
charAt(int) fast very fast very fast LazyString fobs the work off on String.charAt for each fragment. To find the right fragment, LazyString needs to construct an array of bands to track which part of the String is covered by each fragment and pass that to java.util.Arrays.binarySearch. You can procrastinate creating the array until it is needed. Alternatively you might use a TreeMap to track the fragments and find the one corresponding to a given offset.
concatenation
Rope class

This page is posted
on the web at:

http://mindprod.com/project/lazystring.html

Optional Replicator mirror
of mindprod.com
on local hard disk J:

J:\mindprod\project\lazystring.html
logo
Please the feedback from other visitors, or your own feedback about the site.
Contact Roedy. Please feel free to link to this page without explicit permission.

IP:[65.110.21.43]
Your face IP:[54.166.89.187]
You are visitor number