A program that looks words up in a dictionary to check your spelling.
All prices are in
The JSpell
Java Spell Checker is a client/server spell checker where the dictionary
resides at the server. The JSpell product supports JSP, Servlets and standalone
Java applications.
for a single user license
for the platinum global site license.
Source Code
Spell Checker is specially designed for spell checking programs. It spell
checks string literals, code comments, variable names, class names, method names.
unlimited license.
WinterTree’s spell checker gradually loads the entire dictionary into RAM as it is
needed. 11 languages. Java interface.
to add spell checking to an Applet.
for source code.
for site licence.
Spelling, though now neglected by the education system, is more important than
ever. If you compose a web page, unless you spell the words correctly, including
proper names, they will not be properly indexed by the search engines. If you
compose programs with variable names incorrectly spelled, others will not be
able to remember them. If you post on the Internet, there is no secretary to
take dictation. Your spelling errors betray you as an ignoramus to all your
readers. They will dismiss your ideas before they even consider them.
A typo is a spelling mistake where it is clear you
know how to spell a word, but your fingers fumbled and produced something weird
when typing, e.g. that for than. Typos are not quite as damaging
to your reputation as spelling errors, but they have most of the same
consequences.
Most word processors, email programs and newsreaders come with a built-in spell
checker. You still have to use it. Spell checkers can’t catch errors such as using
your for you’re. You have to train yourself to catch those manually.
Under the Hood
Conceptually, a spell checker is very simple. It has a list of correctly spelled
words. It goes through your document word by word, looking up each one to see if
it is present in the list of correctly spelled words. The trick is to encode the
list in such a way that it takes up little space on disk and in RAM and the lookup
is very fast. The spell checker can use some of the following techniques.
- The spell checker knows the frequency of use of each word. It can thus arrange
them in layers with common words kept in a special high speed cache.
- The words can be stored in alphabetical order. This means the first few
characters of a word are usually the same as those of the word before it in the
list. Thus, there is no need to explicitly store them.
- The frequency of letters is known. Thus it is possible to use a Huffman encoding,
using shorter bit patterns for common letters and letter pairs.
- There is no need to store both a word and its plural if the plural follows one
of the standard patterns.
- The spell checker can cache words it has already checked in a document, or
earlier that day. Those words are more likely to recur.
- The main dictionary list is prepared and frozen well ahead of time. It does not
need the ability to add new words. You can use another, separate, smaller,
updatable exceptions dictionary for user-defined words. You know everything
there is to know about the master list. It will not change. It is completely
practical to have a computer spend hours and hours massaging and compressing the
list, looking for perfect hashes etc.
- Use of hashing. See Hashtable.
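Some of the techniques above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not how any of the products listed earlier work. The class names FrontCoding and TinySpellChecker are made up for this example. FrontCoding shows the alphabetical-order trick: each entry records how many leading characters it shares with the previous word, plus the remaining suffix. TinySpellChecker uses a HashSet (standing in for the hashing mentioned above) for a frozen master list, a separate updatable set of user words, and a deliberately crude plural rule that handles only a trailing s.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: stores each sorted word as "sharedPrefixLength:suffix". */
final class FrontCoding {
    static List<String> encode(List<String> sortedWords) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String word : sortedWords) {
            // Count the leading characters shared with the previous word.
            int shared = 0;
            int max = Math.min(prev.length(), word.length());
            while (shared < max && prev.charAt(shared) == word.charAt(shared)) {
                shared++;
            }
            out.add(shared + ":" + word.substring(shared));
            prev = word;
        }
        return out;
    }

    static List<String> decode(List<String> encoded) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String entry : encoded) {
            int colon = entry.indexOf(':');
            int shared = Integer.parseInt(entry.substring(0, colon));
            // Rebuild the word from the previous word's prefix plus the stored suffix.
            String word = prev.substring(0, shared) + entry.substring(colon + 1);
            out.add(word);
            prev = word;
        }
        return out;
    }
}

/** Hypothetical checker: frozen master list plus an updatable user list. */
final class TinySpellChecker {
    private final Set<String> master = new HashSet<>();    // never modified after construction
    private final Set<String> userWords = new HashSet<>(); // user-defined exceptions

    TinySpellChecker(List<String> masterWords) {
        for (String w : masterWords) {
            master.add(w.toLowerCase(Locale.ROOT));
        }
    }

    void addUserWord(String word) {
        userWords.add(word.toLowerCase(Locale.ROOT));
    }

    boolean isCorrect(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (master.contains(w) || userWords.contains(w)) {
            return true;
        }
        // Crude assumption: a regular plural is just the singular plus "s",
        // so the plural need not be stored. Real rules are more involved.
        return w.length() > 1 && w.endsWith("s")
                && master.contains(w.substring(0, w.length() - 1));
    }

    /** Returns the words of the text that are not recognised, in order. */
    List<String> misspelled(String text) {
        List<String> bad = new ArrayList<>();
        for (String token : text.split("[^\\p{L}']+")) {
            if (!token.isEmpty() && !isCorrect(token)) {
                bad.add(token);
            }
        }
        return bad;
    }
}
```

For example, FrontCoding.encode on the sorted words spell, spelled, speller, spelling produces 0:spell, 5:ed, 6:r, 5:ing. A real checker would pack the counts and suffixes into bytes rather than strings, and would layer the Huffman and frequency tricks on top.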
Rant
There are a number of problems with spell checkers.
- Every program uses a different spell checking engine. Not only do I have to
learn the quirks of multiple spell checkers, I must teach each one separately my
personal list of exception words that are legitimate, but are not in its
dictionaries.
- They pay no attention to context. They can’t catch my two most common errors:
confusing it ⇔ in ⇔ is and that ⇔ than. All the words are in the list of
legitimate words, so the spell checker does not notice if I accidentally
substitute them in creative ways. I make these errors commonly because the home
row on the DSK (Dvorak Simplified Keyboard) looks like this: AOEUI DHTNS, with T
next to N and N next to S. The spell checker needs to do a primitive grammar
analysis to see whether a correctly spelled word should ever occur surrounded by
the other words in the context.
- I once had a very trying customer back in the days when Canadian Mind Products
built and repaired custom computers. I laughed and laughed when I noticed the
spell checker had corrected the spelling of her name to Enema. Had I not
been quite so alert, the invoice could have created quite an uproar. She would
never have believed me that I did not insult her intentionally.
- Every time I spell check a web page, the spell checker makes me mark as OK the
same old exceptions time after time after time. These are document-specific
words. I don’t want to add them to the general dictionary.
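The context problem in the list above could be attacked, very primitively, by looking at pairs of adjacent words. The following is a minimal sketch, assuming a hand-built table of suspicious word pairs. The class ContextChecker and its rule table are hypothetical, and a usable checker would need real grammar analysis or word-pair statistics rather than a handful of manual rules.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical context checker: flags correctly spelled words in unlikely company. */
final class ContextChecker {
    // Key: "previousWord suspectWord"; value: the word that was probably intended.
    private final Map<String, String> suspectPairs = new HashMap<>();

    void addRule(String previous, String suspect, String suggestion) {
        suspectPairs.put(previous + " " + suspect, suggestion);
    }

    /** Returns one warning for each suspicious adjacent word pair in the text. */
    List<String> check(String text) {
        List<String> warnings = new ArrayList<>();
        String[] words = text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+");
        for (int i = 1; i < words.length; i++) {
            String pair = words[i - 1] + " " + words[i];
            String suggestion = suspectPairs.get(pair);
            if (suggestion != null) {
                warnings.add("\"" + pair + "\": did you mean \"" + suggestion + "\"?");
            }
        }
        return warnings;
    }
}
```

With the rule addRule("more", "that", "than"), the text "This is more that enough" yields one warning, while "more than enough" yields none.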
What do we need to rectify these problems? In descending order of importance:
- We need a universal interface for spell checker plugins, much like JCE
or JavaMail. You can buy a high performance one,
plug it in, and it works identically with all apps. We should start with Java,
and later try to extend it to all the apps on an OS.
- Spell checkers need to work anywhere and everywhere you edit text …
filling in forms, composing email, programming, browsing, chattering on Facebook…
all in exactly the same way.
- There need to be hierarchical exception lists of additional legitimate words.
Some words are universally ok, some ok just in the context of a certain document,
others only in the context of a sentence or even word instance.
- Hidden in the text needs to be embedded information about what checks have
already been done, on which parts of the document, and by whom. That way you don’t
have to keep rechecking the same stuff over and over every time you make a tiny
change to the document. It also can be used to ensure you never export anything
without first spell checking it.
- Spell checkers need to be transparently collaborative. You should be able to
automatically submit your document to several automated checkers and/or
professional human proofreaders, then automatically compare the results and deal
only with the discrepancies yourself. The various proofreading services (who might
just be friends you swap with to get fresher eyes) can work simultaneously,
and continuously as you edit your documents.
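The hierarchical exception lists asked for above might be modelled as a chain of scopes, each deferring to its parent. This is a minimal sketch under that assumption. The class ExceptionScope is hypothetical and does not correspond to any existing API.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical scope of allowed words: universal, document, sentence, and so on. */
final class ExceptionScope {
    private final ExceptionScope parent; // null for the outermost (universal) scope
    private final Set<String> words = new HashSet<>();

    ExceptionScope(ExceptionScope parent) {
        this.parent = parent;
    }

    void allow(String word) {
        words.add(word.toLowerCase(Locale.ROOT));
    }

    /** A word is allowed if this scope or any enclosing scope lists it. */
    boolean isAllowed(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (ExceptionScope s = this; s != null; s = s.parent) {
            if (s.words.contains(w)) {
                return true;
            }
        }
        return false;
    }
}
```

A word allowed in a document scope is accepted in every sentence scope created beneath it, but it never leaks up into the universal list, so document-specific words do not pollute the general dictionary.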