HTML Splitter & Boilerplate Refresher

This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned and the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully define the end point, or a series of ever more difficult versions of the project, and to research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

HTML Documents, such as this one, tend to grow fat and unwieldy. At some point they need to be split into smaller documents. Have a look at methods.html. It has been split up, with a menu put on the front to link to the various pieces. Breaking up even one big document manually is at least a day’s work, by the time you get all the minor HTML adjustments done and proofread.

I would like to automate, or at least semi-automate, the process. Here is how it would work. You take your monster document and insert magic tags in it showing how you want it split up. Then you run a utility that creates the new files.
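To make the splitting step concrete, here is a minimal Java sketch. The essay does not define the syntax of the magic tags, so the comment form `<!-- SPLIT piece.html -->` and the names `SplitSketch` and `split` are assumptions made up for illustration. The sketch only carves the text into named pieces in memory; writing each piece to disk and adding boilerplate are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitSketch {
    // Hypothetical marker syntax (an assumption): <!-- SPLIT piece.html -->
    private static final Pattern SPLIT_TAG =
            Pattern.compile("<!--\\s*SPLIT\\s+(\\S+)\\s*-->");

    /**
     * Maps each target file name to the text that follows its SPLIT marker,
     * up to the next marker or the end of the document.
     * Text before the first marker is dropped in this sketch.
     */
    static Map<String, String> split(String monster) {
        Map<String, String> pieces = new LinkedHashMap<>();
        Matcher m = SPLIT_TAG.matcher(monster);
        String currentName = null;
        int bodyStart = 0;
        while (m.find()) {
            if (currentName != null) {
                // Close off the previous piece at the start of this marker.
                pieces.put(currentName, monster.substring(bodyStart, m.start()));
            }
            currentName = m.group(1);
            bodyStart = m.end();
        }
        if (currentName != null) {
            pieces.put(currentName, monster.substring(bodyStart));
        }
        return pieces;
    }

    public static void main(String[] args) {
        String doc = "<!-- SPLIT a.html --><p>A</p><!-- SPLIT b.html --><p>B</p>";
        split(doc).forEach((name, body) -> System.out.println(name + " => " + body));
    }
}
```

A LinkedHashMap keeps the pieces in document order, which is the order a generated menu would probably want.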

There would be features to deal with boilerplate, such as standard headers and footers, and for customising those standard headers and footers.

The utility also generates a menu for you that can be used to jump to all the different documents created.
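Menu generation could be as simple as the following Java sketch. The essay does not say what the menu should look like, so the plain `<ul>` layout and the names `MenuSketch`, `buildMenu` and `escape` are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MenuSketch {
    /** Builds an unordered-list menu. Keys are file names, values are link titles. */
    static String buildMenu(Map<String, String> titlesByFile) {
        StringBuilder sb = new StringBuilder("<ul>\n");
        for (Map.Entry<String, String> e : titlesByFile.entrySet()) {
            sb.append("<li><a href=\"").append(e.getKey()).append("\">")
              .append(escape(e.getValue()))
              .append("</a></li>\n");
        }
        return sb.append("</ul>\n").toString();
    }

    /** Escapes the characters that would otherwise break the link text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> titles = new LinkedHashMap<>();
        titles.put("a.html", "Part One");
        titles.put("b.html", "Part Two");
        System.out.print(buildMenu(titles));
    }
}
```

The file names are written into the href as-is, so this sketch assumes they contain no characters that need URL encoding.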

SPLIT Tags

However, when you propagate boilerplate you want it slightly customised. You can arrange to get it customised by embedding magic comment tags that will be expanded when you run the REFRESH utility.

Customising your boilerplate: REFRESH Tags

Regenerating Boilerplate

You thus have a choice between two kinds of boilerplate: boilerplate with embedded REFRESH tags, which can be regenerated later, and boilerplate without such tags, which will not be regenerated, presumably because you intend to customise it by hand and don’t want your customisations overwritten.

that acts like a BEGIN REFRESH / END REFRESH pair. You would use it to get the boilerplate included in the first place, when you are not using SPLIT. It would be deleted after processing.

During SPLIT processing, all tags except SPLIT are ignored. They are just treated as ordinary text. They can be expanded/refreshed later with a REFRESH run. The SPLIT process leaves behind a line like this in each file which is useful in expansion of tags:

You can manually edit the INFO tag generated by SPLIT or insert it manually. This enables you to use the REFRESH utility without ever using the SPLIT utility.

THIS, MOTHER and ICON tags that are not in boilerplate text are expanded once and cannot be refreshed, since there is no tag left behind. If you want them refreshable, they must appear in boilerplate text, enclosed in REFRESH tags. In a pinch you could create a boilerplate file to include that had almost nothing in it but a THIS or MOTHER tag. It will be expanded in the context of the file where it finally appears.

Your head is probably hurting by now. The basic problem is how to run REFRESH multiple times. You want to get rid of what was included/expanded earlier, before you re-expand the tags. You need a way of identifying where earlier expansion material started and ended. You also must leave embedded notes around about how to re-expand. For your first cut, only worry about running REFRESH once and leave no trace of the tags. Once you have that working you may feel ready to tackle the problem of running REFRESH multiple times to freshen your boilerplate expansions.

Dealing With Broken Links

Manual Touchup

AutoGlue

I wrote a pair of split/glue utilities like this to help me manage Pascal source code on the PDP-11 many years ago. Except for fixing NAME links, this project is actually easier to code than to explain.

Implementation

REFRESH can be handled in a single pass. When you find an INFO tag, you remember the information for use in later tag expansion. You discard the INFO tag itself. When you find an INCLUDE tag, you copy in data from the FROM file and recursively process it for embedded tags, including more INCLUDEs. The only file you change is the one you are processing. You don’t refresh or expand the included boilerplate files themselves. You discard the INCLUDE tag itself. If you see a BEGIN REFRESH, you discard everything up to and including the corresponding END REFRESH. You may discard some nested BEGIN END pairs in the process! Then you treat it like an INCLUDE. Don’t automatically generate new BEGIN END REFRESH pairs. If they are wanted, they will be inside the included text. If you see a THIS, ICON or MOTHER, replace it from the information in the most recent INFO tag. Discard the tag itself. This way refreshable boilerplate can be composed of refreshable boilerplate. When you refresh your documents, the latest and greatest will be recursively refreshed.
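The single pass described above might look roughly like this in Java. It is only a sketch of the first cut, the one that leaves no trace of the tags. The tag syntax is not given in this essay, so the form `<!-- KEYWORD NAME=value -->` is an assumption, as are the names `RefreshSketch`, `refresh`, `skipPastMatchingEnd` and `fetch`. Boilerplate files are stood in for by an in-memory map, and there is no guard against a boilerplate file that includes itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RefreshSketch {
    // Hypothetical tag syntax (an assumption): <!-- KEYWORD NAME=value ... -->
    private static final Pattern TAG = Pattern.compile(
            "<!--\\s*(INFO|INCLUDE|BEGIN|END|THIS|MOTHER|ICON)\\b(.*?)-->", Pattern.DOTALL);
    private static final Pattern ATTR = Pattern.compile("(\\w+)=(\\S+)");

    /** Expands tags in text. info holds the values from the most recent INFO tag. */
    static String refresh(String text, Map<String, String> boilerplate, Map<String, String> info) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(text);
        int pos = 0;
        while (m.find(pos)) {
            out.append(text, pos, m.start()); // copy ordinary text through
            pos = m.end();                    // the tag itself is always discarded
            String kind = m.group(1);
            Map<String, String> attrs = parseAttrs(m.group(2));
            switch (kind) {
                case "INFO":
                    info.putAll(attrs);       // remember for later THIS/MOTHER/ICON
                    break;
                case "INCLUDE":
                    out.append(refresh(fetch(boilerplate, attrs.get("FROM")), boilerplate, info));
                    break;
                case "BEGIN":
                    // Throw away the old expansion, nested pairs included, then re-include.
                    pos = skipPastMatchingEnd(text, pos);
                    out.append(refresh(fetch(boilerplate, attrs.get("FROM")), boilerplate, info));
                    break;
                case "END":
                    break;                    // stray END REFRESH: just drop it
                default:                      // THIS, MOTHER or ICON
                    out.append(info.getOrDefault(kind, ""));
            }
        }
        out.append(text, pos, text.length());
        return out.toString();
    }

    static Map<String, String> parseAttrs(String s) {
        Map<String, String> attrs = new HashMap<>();
        Matcher m = ATTR.matcher(s);
        while (m.find()) {
            attrs.put(m.group(1), m.group(2));
        }
        return attrs;
    }

    /** Returns the index just past the END that balances an already-consumed BEGIN. */
    static int skipPastMatchingEnd(String text, int from) {
        Matcher m = TAG.matcher(text);
        int depth = 1;
        int pos = from;
        while (m.find(pos)) {
            pos = m.end();
            if (m.group(1).equals("BEGIN")) {
                depth++;
            } else if (m.group(1).equals("END") && --depth == 0) {
                return pos;
            }
        }
        throw new IllegalStateException("BEGIN REFRESH without a matching END REFRESH");
    }

    static String fetch(Map<String, String> boilerplate, String name) {
        if (name == null || !boilerplate.containsKey(name)) {
            throw new IllegalArgumentException("No boilerplate file: " + name);
        }
        return boilerplate.get(name);
    }

    public static void main(String[] args) {
        Map<String, String> boilerplate = new HashMap<>();
        boilerplate.put("header.html", "<h1><!-- THIS --></h1>");
        String page = "<!-- INFO THIS=page.html --><!-- INCLUDE FROM=header.html --><p>body</p>";
        System.out.println(refresh(page, boilerplate, new HashMap<>()));
    }
}
```

The info map is shared across the recursion, so a THIS or MOTHER inside included boilerplate is expanded in the context of the file being processed rather than the boilerplate file.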

As a first cut, you might use Funduc Search and Replace as your scanning and replacing engine. The keys are the *[] regular expression marker that will match anything, e.g. the old boilerplate sandwiched between two markers, and the binary replace mode that lets you insert arbitrary multiline text.
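If you would rather stay in Java than use an external tool, the same match-anything-between-two-markers idea can be sketched with java.util.regex. This swaps Java regular expressions in for Funduc's syntax. The marker strings in the example and the names `MarkerReplaceSketch` and `replaceBetween` are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkerReplaceSketch {
    /** Replaces whatever sits between each begin/end marker pair, keeping the markers. */
    static String replaceBetween(String doc, String begin, String end, String fresh) {
        // DOTALL lets "." cross line ends; ".*?" stops at the nearest end marker.
        Pattern p = Pattern.compile(
                Pattern.quote(begin) + ".*?" + Pattern.quote(end), Pattern.DOTALL);
        // quoteReplacement keeps any '$' or '\' in the new text literal.
        return p.matcher(doc).replaceAll(Matcher.quoteReplacement(begin + fresh + end));
    }

    public static void main(String[] args) {
        String doc = "top\n<!-- BEGIN -->old\nboilerplate<!-- END -->\nbottom";
        System.out.println(replaceBetween(doc, "<!-- BEGIN -->", "<!-- END -->", "new\ntext"));
    }
}
```

The reluctant quantifier does not understand nesting, so this approach only suits marker pairs that never contain other marker pairs.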

Special Purpose Solutions

Outstanding Questions

Implementations

I have also done some specific solutions to the splitter problem, to split up large glossaries, making each <dt> its own file.
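For illustration only, and not the utility mentioned above, a glossary splitter along these lines could be sketched in Java as follows. It assumes a flat definition list with explicit closing `</dt>` tags and no nested lists. The file-naming rule and the names `GlossarySplitSketch` and `splitEntries` are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GlossarySplitSketch {
    // One entry: a <dt>...</dt> plus everything up to the next <dt>, </dl> or end of input.
    private static final Pattern ENTRY = Pattern.compile(
            "<dt\\b[^>]*>(.*?)</dt>(.*?)(?=<dt\\b|</dl>|\\z)",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Maps a file name derived from each term to that entry's HTML. */
    static Map<String, String> splitEntries(String glossary) {
        Map<String, String> files = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(glossary);
        while (m.find()) {
            // Strip inline markup from the term, then turn it into a safe file name.
            String term = m.group(1).replaceAll("<[^>]+>", "").trim();
            String fileName = term.toLowerCase(Locale.ROOT)
                    .replaceAll("[^a-z0-9]+", "_") + ".html";
            files.put(fileName, m.group(0).trim());
        }
        return files;
    }

    public static void main(String[] args) {
        String glossary = "<dl><dt>Applet</dt><dd>small app</dd>"
                + "<dt>Byte Code</dt><dd>JVM instructions</dd></dl>";
        splitEntries(glossary).forEach((name, html) -> System.out.println(name + " => " + html));
    }
}
```

Two terms that reduce to the same file name would overwrite each other in this sketch, so a real version would need a collision rule.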

	This page is posted on the web at:	http://mindprod.com/project/htmlsplitter.html