Internationaliser


This essay is about a suggested student project in Java programming. This essay gives an overview of how it might work. It does not describe an actual complete program. Unlike most of my student projects, I am in the process of implementing this one myself. I am developing a specification for it here. If you are interested in the final product when it is ready, please let me know.

Internationaliser Overview

The goal of this project is to create a multi-user tool for internationalising computer programs so that they can run in a variety of languages. It requires writing several types of code, so it might be a suitable team project.

How It Works

Programmers use ResourceBundles in their code. They code like this:
// using a key string to select the String in the appropriate language
okButton = new JButton( myResources.getString( "OkButtonLabel" ));
getString looks up the key OkButtonLabel to get the localised string for the button in the current locale/language. American English is considered a different language from Canadian English, and both are treated exactly like the foreign languages. A ResourceBundle handles the translations for one class, part of a class, or several classes, for one language. Java is clever enough to select the best-fit ResourceBundle for the given country and language locale.
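To make the best-fit search concrete, here is a minimal sketch that asks the standard ResourceBundle.Control for the order in which Java tries locales. The base name MyResources is hypothetical, and no bundle files are needed to run it. ResourceBundle.getBundle may additionally try the JVM default locale before falling back to the root bundle.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class BestFitDemo {
    /** Returns the locales Java tries for a request, most specific first. */
    public static List<Locale> searchOrder(Locale requested) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        // "MyResources" is a hypothetical base name; it is only passed through.
        return control.getCandidateLocales("MyResources", requested);
    }

    public static void main(String[] args) {
        // For Canadian French the candidates are fr_CA, then fr, then the root bundle.
        for (Locale candidate : searchOrder(Locale.CANADA_FRENCH)) {
            System.out.println(candidate.equals(Locale.ROOT) ? "(root bundle)" : candidate.toString());
        }
    }
}
```

A translation missing from the fr_CA bundle is therefore picked up from the fr bundle, and failing that from the root bundle.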

Professional translators, living anywhere on the planet, provide translations in various languages for the key.

An SQL database coordinates everything, allowing simultaneous access by programmers and translators. It also allows custom reporting, custom administration tasks, ad-hoc database correction, and additional fields added to the various tables for custom applications.

To be fully international, the database is stored in UTF-8 format. The source code is presumed to be stored in UTF-8. The bundle files are 8859_1 encoding with special \uxxxx encodings for Unicode, as per Oracle specifications.
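As an illustration of the \uxxxx convention, here is a minimal sketch of an escaper. The class and method names are hypothetical. It escapes every character outside printable ASCII, which is stricter than 8859_1 requires but still valid. It does not handle the other characters that are special in a properties file, such as the backslash, the separators = and :, or leading spaces. The standard java.util.Properties.store method does the complete job.

```java
class BundleEscaper {
    /** Rewrites text so that every character outside printable ASCII becomes a \\uXXXX escape. */
    public static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);                                   // printable ASCII passes through
            } else {
                sb.append(String.format("\\u%04X", (int) c));   // everything else is escaped
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00C0 is a capital A with a grave accent.
        System.out.println(escape("\u00C0 propos"));
    }
}
```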

Terminology

Bundle
A class or properties file that handles the translation to one locale — language/country/variant combination. The whole point of this project is to create these bundle properties files.
drill down
To navigate a tree from the root to the leaves, choosing ever finer levels of detail, e.g. first a task, then within that a project, then within that a resource bundle, then within that a locale, then within that a translation item.
locale
A triple of language, country and variant, using the ISO codes, where the country and variant can be left blank, e.g. sr_YU_CYR for Serbian as spoken in Yugoslavia, written in Cyrillic.
project tree
You can think of a project as a taxonomy tree with 4+ levels, a way for the administrator or a proofreader to get an overview of the work:
  1. project
  2. ResourceBundle
  3. translation item. The key itself can contain dots, which define further levels of hierarchy.
  4. locale (language/country/variant)
The user can navigate the tree, gradually opening up detail to find any translation item.
ResourceBundle
A set of classes/properties files that share the same base name and the same set of translation keys. Typically a ResourceBundle would handle all the language/country/variant translations for one package or one class.
task
A unit of work to be completed by a translator or a proofreader. Most commonly it would be all the translations for a given bundle. However, it could be a subset of the items in a bundle, or it could contain items from several different bundles. A translator may have several different outstanding tasks, and can choose which one to work on. There may be several people assigned to the same task. A translation item may appear in several different tasks. Tasks only apply to translators and proofreaders, not programmers. Programmer task scheduling needs something much more elaborate like Jira.
task tree
You can think of a task as a taxonomy tree with 5+ levels, a way to organise the work for a single translator:
  1. task
  2. project
  3. ResourceBundle
  4. locale/bundle (language/country/variant)
  5. translation item. The key itself can contain dots, which define further levels of hierarchy.

The translator can navigate the tree, gradually opening up detail to find any translation item. Note how the last two levels are reversed from the way the project tree shows them. In the real system, each project will have its own icon, and each locale will be represented by a flag or similar symbol.

translation item
A key and its translation into one particular language/locale, together with the associated comments.
translation key
Java translates from a short string to the appropriate locale text. The short string is the translation key, e.g. OkButtonLabel in the example above. The key can contain dots. These indicate levels of hierarchy.
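The locale triple and the dotted key hierarchy defined above can be sketched in a few lines. The class and method names are hypothetical. The sketch uses the three-argument Locale constructor, which recent JDKs mark as deprecated, because it accepts short variants such as CYR.

```java
import java.util.Locale;

class TerminologySketch {
    /** Parses codes such as "sr_YU_CYR", "fr_FR" or "de"; missing parts are left blank. */
    @SuppressWarnings("deprecation") // deprecated in recent JDKs, but accepts short variants like CYR
    public static Locale parseLocale(String code) {
        String[] parts = code.split("_", 3);
        String language = parts[0];
        String country = parts.length > 1 ? parts[1] : "";
        String variant = parts.length > 2 ? parts[2] : "";
        return new Locale(language, country, variant);
    }

    /** Splits a translation key on its dots to give the extra levels of hierarchy. */
    public static String[] keyLevels(String key) {
        return key.split("\\.");
    }

    public static void main(String[] args) {
        Locale serbian = parseLocale("sr_YU_CYR");
        System.out.println(serbian.getLanguage() + " / " + serbian.getCountry() + " / " + serbian.getVariant());
        System.out.println(String.join(" > ", keyLevels("tardiness.male")));
    }
}
```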

SQL Database Tables

These SQL tables have been defined for MySQL 5.0 for my own implementation:

People SQL table

Information about translators, proofreaders and programmers.

Roles SQL table

Information on capabilities of people as translators, proofreaders and programmers. This duplicates information in the people table, but it is required for Tomcat authentication.

Projects SQL Table

A product that requires internationalisation.

Project Locale SQL Table

The locales that could potentially get translations for this project.

Resource Bundles SQL Table

A group of translations to be done for all languages.

Bundle SQL Table

A group of translations to be done for a particular locale/language.

Context SQL Table

Screenshot and thumbnail of the translation in context.

Programmer Comment SQL Table

Comments by a programmer about a particular translation item. Applies to all locales.

Translator and Proofreader Comment SQL Table

Comments by a translator or proofreader about a particular translation item in particular locale/language.

Translation SQL Table

Individual items to be translated.

Task SQL Table

What blocks of work are there to be done?

Task Item SQL Table

What work is each task composed of?

Assignment SQL Table

Who is assigned which tasks?

Components

  1. The Manager handles the administrative records in the database, such as records for each translator, each project and each bundle.
  2. The Import (run as a command-line utility called the Parser) can extract the translation keys, programmer comments and associated ResourceBundle names from the Java source code. You specify the name of the project, the ResourceBundle and the directory tree to be scanned for source files. The extracted data goes straight into the database. Import can also import a bundle.properties file that contains translations done through some means other than this program.
  3. The Export can go through the database and create the ResourceBundles for a given project. The generated resource bundles are deposited into a file tree organised by project/package/ResourceBundle/locale. From there, programmers can copy them into their debugging or production trees. Export is a command-line utility where you specify either just the name of the project, to generate the whole project, or the project and a ResourceBundle, to regenerate just that one ResourceBundle.
  4. The Reporter gives status of a project so that you can see at a glance just what translating/proofreading work still needs to be done. It is a GUI that lets you see summary stats for either a particular project, a particular resource bundle, a particular bundle or particular task. The screen might look something like this:

    Reporter

    user: RG  bundle: com.mindprod.nova.HybridVehicles  language: sr  country: YU  variant:CYR

    Count  Stage  Meaning
    20     U      count of how many translation items in the group are still Untranslated.
    0      ?      count of how many translation items in the group are still unsure.
    321    T      count of how many translation items in the group are Translated, but not yet proofread.
    3      P      count of how many translation items in the group are Proofread individually but not yet in context.
    12     C      count of how many translation items in the group are Complete, proofread both individually and in context.
    359    any    Total translation items in Group.

  5. The Editor lets a translator see what work she has to do, then select a ResourceBundle and locale to work on. She will see the translation keys and the language she is translating into, along with the comments from programmers, translators and proofreaders about each line. If she wants to compare translations in two languages at once, she will run two copies of the editor, each working on a different language, tiled on screen to make them simultaneously visible. She can then edit the translations or add comments. In the background, as she is keying, the JWS (Java Web Start) application sends the data field by field over a socket to the server as HTTP (Hypertext Transfer Protocol) POST transactions, and the server then updates the database. There is a multithreaded queue mechanism so that, no matter how fast she types, she does not have to wait for the transmissions to and from the database to catch up. Every 60 seconds (globally configurable), her JWS program automatically queries the database for any recent changes made by others. These appear on her screen automatically. She can sort by the various columns to make it easy to find items yet to translate, questionable items, etc.

    The screen might look something like this:


    Translation Editor: Single Language View (multiple keys)

    user: RG  bundle: com.mindprod.nova.HybridVehicles  language: fr  country: FR  variant:__

    Stage  Key             Translation                             Comments                                  Prg  Trn  Prf  Changed
    ?      About           A propos de l’application               DRF: {Application} (item in Help menu)    DRF  RG   __   2006-01-12
                                                                   RAJ: à grande vittesse, s’il vous plait.
                                                                   We need this by Monday.
                                                                   RG: Is this too long?
    T      Apply           Appliquer                               DRF: button                               DRF  RG   __   2006-01-12
    P      Cancel          Annuler                                 FR: Yes, use infinitive, not imperative.  DRF  RG   FR   16:00
    U      Continue        __                                      DRF: button in Error alert box            DRF  __   __   2005-12-31
    ?      Hit F4 to stop  Appuyer sur la touche F4 pour arreter.  DRF: function key                         DRF  GRH  __   2005-12-01
                                                                   RG: hit or press and hold?

    Stage Code Legend
    Code  Meaning
    U     Untranslated
    ?     unsure of translation
    T     Translated
    P     Proofread
    C     Complete

    Sorting Code Legend (arrow icons in the column headers)
    One arrow marks an ascending sort on this column.
    The other arrow marks a descending sort on this column.
    For Comments, sort is by date/time of most recent comment.
    The red arrow indicates how things are sorted right now.
    The translator can enter data in the brown areas. The blue areas are read-only. Prg is the programmer. Trn is the translator. Prf is the proofreader. My goal is to use screen real estate efficiently. The date/time of the last update shows as a date for previous days, but as 24-hour local time for today’s changes.

    Translation Editor: Multi-Language View (single key)

    user: RG  bundle: com.mindprod.nova.HybridVehicles
    key: about

    Stage  Locale  Translation                     Comments                                  Prg  Trn  Prf  Changed
    T      de_DE   Anwendungsinfo                  DRF: button                               DRF  RG   __   2006-01-12
    ?      fr_FR   A propos de l’application       DRF: button                               DRF  RG   __   2006-01-12
                                                   RAJ: à grande vittesse, s’il vous plait.
                                                   We need this by Monday.
                                                   RG: Is this too long?
    P      it_IT   Informazioni sull’applicazione  DRF: button                               DRF  RG   FR   16:00
                                                   FR: Yes, use infinitive, not imperative.

  6. SQL database. It has quite light duty, so it could run on a development machine. It does not need a dedicated server or high performance. It needs access to the Servlet Womb. The presumption is that there is an administrator capable of installing MySQL and dealing with database backup and recovery. This tool is aimed at teams, not individuals, so this is a reasonable assumption.
  7. Translation Server. This will be a Servlet Womb, e.g. Tomcat. Adjustments need to be made to deal with installing and running under other Servlet Wombs. The presumption is that there is an administrator capable of installing the servlet womb and installing the Internationaliser jars into it.
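The Editor's background queue described in item 5 could be sketched as follows. This is only one possible shape, under stated assumptions: the class name SendQueue is hypothetical, each update is a plain string, and the actual HTTP POST is replaced by recording the update in a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SendQueue {
    // Unique sentinel object that tells the worker to stop; compared by identity.
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final List<String> sent = Collections.synchronizedList(new ArrayList<String>());
    private final Thread worker;

    public SendQueue() {
        worker = new Thread(() -> {
            try {
                while (true) {
                    String update = pending.take();   // blocks until there is work
                    if (update == STOP) {
                        break;
                    }
                    // A real client would POST the update to the server here.
                    sent.add(update);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
    }

    /** Called from the user-interface thread; returns at once, however slow the network is. */
    public void enqueue(String fieldUpdate) {
        pending.add(fieldUpdate);
    }

    /** Lets the worker drain the queue, stops it, and returns what was sent, in order. */
    public List<String> shutdown() {
        pending.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(sent);
    }
}
```

Because a single worker drains the queue, updates reach the server in the order the translator keyed them.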

Walk Through

The Administrator uses the Manager to set up the database records for the project, and the various bundles and locales it will use. He sets up the database records for the various translators who will be working. Finally, he assigns various bundles to various translators.

Now we run the Parser on the various source code bases to extract the translation keys and programmer notes.

The administrator runs the statistics Report on the project to make sure all the bundles have some translation keys.

The translators start translating into the various locales using the online/offline Editor.

The administrator can see how things are going by running the statistics Report for the project. He can see how many strings are yet to be translated for each bundle.
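The counts behind such a report could be computed along these lines. This is a minimal sketch using the stage codes from the Editor legend (U, ?, T, P, C). The class name and the idea of passing the stages as a string of codes are assumptions for illustration, standing in for a database query.

```java
import java.util.EnumMap;
import java.util.Map;

class StageTally {
    public enum Stage {
        UNTRANSLATED('U'), UNSURE('?'), TRANSLATED('T'), PROOFREAD('P'), COMPLETE('C');

        final char code;

        Stage(char code) {
            this.code = code;
        }

        static Stage fromCode(char code) {
            for (Stage s : values()) {
                if (s.code == code) {
                    return s;
                }
            }
            throw new IllegalArgumentException("Unknown stage code: " + code);
        }
    }

    /** Counts items per stage; each character of stageCodes is the stage of one translation item. */
    public static Map<Stage, Integer> tally(String stageCodes) {
        Map<Stage, Integer> counts = new EnumMap<>(Stage.class);
        for (Stage s : Stage.values()) {
            counts.put(s, 0);              // report zero counts too
        }
        for (char c : stageCodes.toCharArray()) {
            counts.merge(Stage.fromCode(c), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The five rows of the sample Single Language View: ?, T, P, U, ?
        System.out.println(tally("?TPU?"));
    }
}
```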

The administrator runs a Generate to create the bundles, possibly incomplete. Programmers propagate these *.properties files to the appropriate places for testing.

The administrator also runs an Inserter to insert the translators’ comments back into the Java source code.

The programmers add comments, and perhaps change translation key names, in order to make it clearer to the translators what is required. They do this with their ordinary programming tools.

The administrator runs the Parser again to extract the latest translation keys and programmer comments from the source.

All the while translators are continuing to polish their work.

Optionally, you may have some translators acting as proofreaders, proofing either the raw translations or checking them out in the context of the finished program, and updating the database with the latest status. Translators and proofreaders can leave notes for each other in a field attached to each translation.

The translators will see the new programmer comments and can modify translations.

This process of Parse, Edit, Report, Insert and Generate can happen over and over in any order as the translations are completed and polished. You can even run all these steps simultaneously.

Eventually the programmers insert the generated bundles into the final build.

The Manager

Only people with administrator capability can run the Manager.

Manager : People

You will see a grid of existing people, much like a spreadsheet. The top line is blank where you can add the information for a new person. See the people table above. You can edit the information in the grid to update any existing person. You cannot change the initials. To delete a person, you must confirm that all record of that person having translated various strings will be lost forever, even though the translations themselves will not be. All people in the system will always be visible or scrollable. The SQL database will start with one administrator pre-set up with ID ADM, so you don’t have a chicken-and-egg problem.

Manager : Projects

You will see a grid of existing projects, much like a spreadsheet. The top line is blank where you can add the information for a new project. See the Projects table above. You can edit the information in the grid to update any existing project. You cannot change the projectID. To delete a project, you must confirm that all associated translations will be lost forever. All projects in the system will always be visible or scrollable.

Manager : Project Locales

You configure, systemwide, the list of possible locales you might use in any project. Then you type in the project name and tick off the locales you want to use for this project.

Firewalls

Given that clients and servers must talk to each other, it is inevitable that firewalls will interfere to some extent. Translators using the client software may have little computer experience and will be incapable of configuring their firewalls. Central help will not be much use since everyone could have a different router and the central help people would not necessarily have the manuals. So the safest thing to do is go with HTTP protocol on port 80 using a traditional HTTP server with servlets. This won’t eliminate the problem, but there is little the program itself can do if firewalls block. To further avoid frightening firewalls, the messages back and forth will be UTF-8 text rather than binary.
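As a rough illustration of sending UTF-8 text rather than binary, here is a sketch that builds the body of an HTTP POST as an ordinary URL-encoded form. The class name and the field names key, locale and text are hypothetical. The Charset overload of URLEncoder.encode needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class PostBody {
    /** Encodes the fields as name=value pairs joined by &, with non-ASCII text sent as UTF-8 percent escapes. */
    public static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();   // keeps the fields in insertion order
        fields.put("key", "About");
        fields.put("locale", "fr_FR");
        fields.put("text", "\u00e0 v\u00e9lo");               // French text with two accented letters
        System.out.println(encode(fields));
    }
}
```

The result is plain ASCII, so it passes through anything that handles an ordinary web form submission.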

Variable Text

Sometimes you want to generate a sentence like this: Your son George was late 4 times this month. You need two sentences, the male and the female version. You might encode them like this:
key: tardiness.male translation: Your son {studentGivenName} was late {tardies} times this month.
key: tardiness.female translation: Your daughter {studentGivenName} was late {tardies} times this month.
The programmer could then generate the required sentence with:
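One way to do the substitution, as a minimal sketch: Java's own MessageFormat uses numbered placeholders such as {0}, so the named placeholders shown here need a small helper. The class VariableText and its fill method are hypothetical, and a Map stands in for the ResourceBundle lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VariableText {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^{}]+)\\}");

    /** Replaces each {name} in the template with its value; unknown names are left as they are. */
    public static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for myResources.getString( key ); a real program would read the bundle.
        Map<String, String> bundle = new HashMap<>();
        bundle.put("tardiness.male", "Your son {studentGivenName} was late {tardies} times this month.");
        bundle.put("tardiness.female", "Your daughter {studentGivenName} was late {tardies} times this month.");

        Map<String, String> values = new HashMap<>();
        values.put("studentGivenName", "George");
        values.put("tardies", "4");

        // The program picks the whole sentence by key, then fills in the variable parts.
        System.out.println(fill(bundle.get("tardiness.male"), values));
    }
}
```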

You can use this same technique to handle singular/plural. Don’t attempt to solve these sorts of problems by simply replacing pronouns such as his/her/they. In other languages, when you change the pronouns, other things in the sentence have to change as well for gender/number agreement.

The only impact this scheme has on the Internationaliser is to ensure that translations include all the {…} replacement parameters in the key string and no extras.
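That check could be sketched as follows: collect the set of {…} names in the reference string and in the translation, and require the two sets to be equal. The class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderCheck {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^{}]+)\\}");

    /** Returns the names of all {name} parameters found in the text. */
    public static Set<String> placeholders(String text) {
        Set<String> names = new TreeSet<>();
        Matcher m = PLACEHOLDER.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** True when the translation uses exactly the reference's parameters: none missing, no extras. */
    public static boolean sameParameters(String reference, String translation) {
        return placeholders(reference).equals(placeholders(translation));
    }

    public static void main(String[] args) {
        String reference = "Your son {studentGivenName} was late {tardies} times this month.";
        String good = "Votre fils {studentGivenName} est arriv\u00e9 en retard {tardies} fois ce mois-ci.";
        String bad = "Votre fils {studentGivenName} est arriv\u00e9 en retard ce mois-ci.";
        System.out.println(sameParameters(reference, good));
        System.out.println(sameParameters(reference, bad));
    }
}
```

Comparing sets means the translator is free to reorder the parameters, which many languages require.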

Icons

This program has optional small (probably 16x16) transparent *.png icons to mark almost everything. The program has built-in default icons, but all the rest of the icons are the responsibility of the user to set up. You assign icons by keying their name, with verification by seeing the icon. You don’t assign icons on a daily basis, only when you first set up the program, and to a small extent when you start a new project or hire new people. The icons just help you scan for information more quickly. Everything still works with just default icons.

Introducing New Icons

Only the administrator can introduce new icons into the system. The icons must comply with constraints on size. Further, administrators are the only people who can assign the default icons, or individual icons.

The first thing to understand is that the icons themselves do not exist in the database, only the names of the corresponding resources. There are two kinds of icon resource:

  1. Early Icons. The administrator makes these available simply by copying/uploading them to the /icon directory. Early icons become instantly available to all clients. Early icons, however, are slow since they are downloaded from the server each time they are needed.
  2. Permanent Icons. From time to time the administrator bundles the new early icons into the icon.jar resource file. Java Web Start notices that the icon.jar file has changed and will download it the next time each client starts the editor. Thereafter, the clients get the icons from the locally cached jar, which is much quicker than downloading them over the net. To update the icon.jar of permanent icons, the administrator must shut down the server temporarily, which also interrupts online access for the clients running the editors.
The client software first looks in the jar for an icon it wants. If it can’t find it there it asks the server. If it still can’t find the icon resource, it uses a default icon.
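That lookup order can be sketched as a simple chain. Here two sets of resource names stand in for the contents of the jar and of the server's /icon directory. The class name and the Source enum are hypothetical.

```java
import java.util.Set;

class IconLookup {
    public enum Source { JAR, SERVER, DEFAULT }

    /** Decides where an icon comes from: the cached jar first, then the server, then the default icon. */
    public static Source locate(String resourceName, Set<String> inJar, Set<String> onServer) {
        if (inJar.contains(resourceName)) {
            return Source.JAR;        // permanent icon, quick local access
        }
        if (onServer.contains(resourceName)) {
            return Source.SERVER;     // early icon, downloaded each time it is needed
        }
        return Source.DEFAULT;        // nothing found, fall back to the default icon
    }

    public static void main(String[] args) {
        Set<String> jar = Set.of("people/default.png", "locale/en_CA.png");
        Set<String> server = Set.of("people/female.png");
        System.out.println(locate("locale/en_CA.png", jar, server));
        System.out.println(locate("people/female.png", jar, server));
        System.out.println(locate("task/high.png", jar, server));
    }
}
```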

The other disadvantage of early icons is they are not accessible when a translator is using the editor offline.

Font resources work in a similar way, with Early fonts and Permanent fonts. There you have an additional option, Installed fonts, where you natively install the font manually with the OS (Operating System) control panel.

Icon Naming

A naming system helps keep track. Icons are named like this. Bold marks the parts of the name that are fixed, where you have no choice in the name.
Icon Naming Conventions
Icon Database Representation  Icon Resource Name                        Use
NULL                          people/default.png                        Default icon for a person. If there is no default icon defined, a built-in one is used.
female                        people/female.png                         An icon you might use for a female. There is no need to categorise by gender. It is just that people might like an icon that looks a bit like them.
blond                         people/blond.png                          An icon that might be suitable for a blond male.
NULL                          task/default.png                          The default icon for tasks. If there is no default icon defined, a built-in one is used.
high                          task/high.png                             An icon you might assign to high priority. Again, you can use any classification scheme you want. high has no special meaning.
NULL                          project/default.png                       Default icon for a project. If there is no default icon defined, a built-in one is used.
Symantec                      project/Symantec.png                      An icon you might use for the Symantec project. The name need not match the project name. Note, names are case-sensitive and are normally all lower case.
small                         project/small.png                         An icon you might use for small projects.
NULL                          ResourceBundle/default.png                The default icon for resource bundles. Likely you leave this out, and take the built-in default.
NULL                          locale/default.png                        The icon to use if there was no specific icon supplied for a locale.
en_CA                         locale/en_CA.png                          The icon for a locale, probably a flag.
NULL                          thumbnail/default.png                     The default icon for a thumbnail, e.g. a shrunken screenshot.
flowquery                     screenshot/greatbear/tides/flowquery.png  The image for a screenshot. Unlike the other icons, it has a structured name which includes the project short name and the resource bundle short name.
If you can’t be bothered with icons, just use an empty icons.jar file, or none at all.

Icons have tooltip hoverhelp. This means that when you hover your mouse over them without clicking, a box will pop up telling you the meaning of the icon, both in abbreviated and long form. When you move your mouse away, the help automatically disappears. You don’t have to dismiss the box. Like everything else, these explanatory texts can be internationalised.

Sophisticated administrators will likely maintain their library of icons using a version control system such as CVS or Subversion. This is independent of the Internationaliser application. It just uses the latest icons.jar, or more correctly, any icon resources on its classpath or jarpath.

Email

The key to dealing with email is to keep it simple. We want to avoid having to configure mailservers for every client, deal with firewalls, spam, and ISPs (Internet Service Providers) trying to block you from accessing mailservers other than theirs. We don’t want to reinvent Eudora or Outlook. What still needs to be nailed down is under just what conditions email alerts are generated. Possibilities include:

Billing

The point of billing is twofold:
  1. To track production to pay the translators and proofreaders.
  2. To track production to bill a possible customer for whom you are providing translation services.
The Internationaliser does not handle billing or payments per se, but it does provide information you might find useful in billing. You pretty well have to write a custom billing package, or do it manually, which is quite feasible if you have only a handful of translators. The Internationaliser calculates character and word counts on each translation and exports the raw data for you.

Fields in the database prevent you from paying for a translation more than once, even if it is modified after payment.

When you run the internationaliser billing export, you will get a display like this and a CSV (Comma-Separated Value) file to match that you can import into your own custom billing program.

Translators’ Recently Completed Work as of 2006-01-31
initials  person       task             locale  total translations  total words  total characters
DRF       Don Fockler  rolling thunder  en_CA   10                  40           300
                                        en_US   11                  43           310
                       the big grind    fr_FR   7                   38           321
The corresponding CSV file would look like this:
DRF,Don Fockler,rolling thunder,en_CA,10,40,300
DRF,Don Fockler,rolling thunder,en_US,11,43,310
DRF,Don Fockler,the big grind,fr_FR,7,38,321
Your custom billing program can take that CSV information and calculate the amount of money you owe each translator.
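A custom billing program might start out like this minimal sketch. It assumes payment at a flat rate per word, reads the word count from the sixth CSV column as laid out above, and assumes that no field contains an embedded comma. The class name and the rate are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BillingSketch {
    /** Totals the amount owed to each translator, in cents, keyed by initials. */
    public static Map<String, Long> centsOwed(List<String> csvLines, long centsPerWord) {
        Map<String, Long> owed = new LinkedHashMap<>();
        for (String line : csvLines) {
            // Columns: initials, person, task, locale, translations, words, characters
            String[] fields = line.split(",");
            String initials = fields[0].trim();
            long words = Long.parseLong(fields[5].trim());
            owed.merge(initials, words * centsPerWord, Long::sum);
        }
        return owed;
    }

    public static void main(String[] args) {
        List<String> export = Arrays.asList(
                "DRF,Don Fockler,rolling thunder,en_CA,10,40,300",
                "DRF,Don Fockler,rolling thunder,en_US,11,43,310",
                "DRF,Don Fockler,the big grind,fr_FR,7,38,321");
        // 40 + 43 + 38 = 121 words; at a hypothetical 10 cents a word that is 1210 cents.
        System.out.println(centsOwed(export, 10));
    }
}
```

Working in whole cents avoids the rounding surprises of floating-point money arithmetic.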

There are similar reports for each proofreader and each project (which can be used to bill customers.)

Recently Completed Work on Project Waverly as of 2006-01-31
locale  total translations  total words  total characters
en_CA   10                  40           300
en_US   11                  43           310
fr_FR   7                   38           321

As soon as the export file is created, the Internationaliser considers those translations paid/billed. It marks them as such in the database so that they will not be included in later reports, which would fool you into paying or billing twice.

Global Configuration Properties File

Configuration that applies to the entire internationaliser project goes in a file called internationaliser.properties which is a standard Java keyword=value properties file. It looks like this:
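Such a file can be read with the standard java.util.Properties class. In this minimal sketch the key name poll.interval.seconds is hypothetical. It stands for the globally configurable polling interval of the Editor, with the 60 seconds mentioned earlier as the fallback when the key is absent.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class ConfigSketch {
    /** Reads the polling interval from the text of a properties file, defaulting to 60 seconds. */
    public static int pollSeconds(String fileContents) {
        Properties props = new Properties();
        try {
            // A real program would load from a file stream rather than from a string.
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // "poll.interval.seconds" is a hypothetical key name used only for illustration.
        return Integer.parseInt(props.getProperty("poll.interval.seconds", "60").trim());
    }

    public static void main(String[] args) {
        System.out.println(pollSeconds("poll.interval.seconds=30\n"));
        System.out.println(pollSeconds(""));
    }
}
```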

System Requirements

The Internationaliser is a client-server application.

The client machines must be capable of running Java 1.5+ and have the JRE (Java Runtime Environment) installed and configured so that Java Web Start works smoothly. They need 256+ MB of RAM (Random Access Memory). They need a 1+GHz processor. They must have full Unicode font support, which rules out W95 and W98. They might be W2K, XP, W2003, Vista, W2008, W7-32, W7-64, W8-32 and W8-64, Linux, Macs, Solaris… They must have an Internet connection, preferably ADSL or cable, but dial-up will do. Direct dialup to the server will not suffice unless it looks like a PPP (Point-to-Point Protocol) Internet connection from the client end. The client machines must have modern email software and a modern browser installed, preferably ones that support UTF-8 encoding. The machines should be equipped with keyboards that can directly generate the keys for the languages to be translated. Translators may find it most convenient to have several keyboards, each specialised for a given language. You might use a reverse KVM switch so you can swap keyboards without shutting down, or unplugging and plugging in keyboards at the back of the machine. The Internationaliser provides no special means to generate characters that are awkward to key on a given keyboard.

The server must be capable of running Java version 1.5 or later, MySQL and Tomcat. The Internationaliser might typically run on a machine in the programmers’ office that is also used for other functions. The load the Internationaliser puts on the server is relatively light. The client programs shoulder most of the workload.

If, instead, you use an offsite server, you can’t just copy files to and from the Internationaliser’s directories. You must upload and download them with FTP (File Transfer Protocol), which is somewhat clumsy. You also have to Telnet in to do things like start up and shut down the server software. Diagnosing problems remotely is more difficult. If possible, I suggest using an on-site server. Migrate later to a high-bandwidth off-site server only if necessary.

What Is Not Included

To make it clear, this project does not do any of the following:

Website Translation

This is a future variation on the same theme that lets you manage the translation of web pages. The problem is not so much translating a page as retranslating it, doing only the parts that have changed. The idea is to extract each sentence as if it were a programmer key, and tag the original document with anchors (or comment markers) so that the Internationaliser parser can rapidly recognise the original sentences even if edited or reordered. This way it is easy for the Internationaliser to tell just what has changed, and how much it has changed. The translator then can focus on just the sentences that have changed, while still seeing them in the full context of the web page.
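As a rough sketch of the retranslation idea, the code below finds the sentences in a new version of a page that did not appear in the old version, regardless of order. It is a simplification under stated assumptions: it matches sentences by exact text instead of by anchors, so it cannot measure how much an edited sentence has changed, and it splits sentences naively on ., ! and ?. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ChangedSentences {
    /** Splits text into sentences at whitespace that follows ., ! or ? (a naive rule). */
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                result.add(s);
            }
        }
        return result;
    }

    /** Returns the sentences of newText that do not occur anywhere in oldText, in page order. */
    public static List<String> needingTranslation(String oldText, String newText) {
        Set<String> known = new HashSet<>(sentences(oldText));
        List<String> changed = new ArrayList<>();
        for (String s : sentences(newText)) {
            if (!known.contains(s)) {
                changed.add(s);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        String before = "Welcome to our site. Prices are low. Call us today.";
        String after = "Call us today. Welcome to our site. Prices are very low.";
        // Only the edited sentence is reported; the reordered ones are recognised.
        System.out.println(needingTranslation(before, after));
    }
}
```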

The tricky part is letting the translators for the most part ignore markup.

The other key to the solution is using HTML static macros or JSP to generate multi-lingual boilerplate so that it does not need to be translated individually on each page.

Possible Extras

Machine translation would give you a rough approximation to start. This would allow translators weak in English (or the base language) to work.

Display translation A while working on translation B rather than the programmer key.

Cloning would let you copy a translation to another language as the starting point, particularly useful for country or variants of a base language.

Fallback: if you don’t provide a translation for a country or variant, it takes the translation from a root translation.

The Inserter can insert translated strings for selected languages and translator comments back into the Java source to let the programmers better understand the code and proofread some of the strings.

Mini server. Instead of using a full blown servlet engine, use something stripped down that does not require system administration. It would use a simplified socket-based protocol that exchanged serialised objects with the clients. The advantage is non-technical people could set up and maintain the server on any PC with internet access. The disadvantage is, since it does not use HTTP protocol, clients might have trouble accessing it through firewalls. This is the main reason for using HTTP as the main approach. The second reason is to allow non-Java access as well, with pure browser HTTP for the basic editor.

Thin Client Version. Used where Java is not available. It could be used, for example, from a public terminal in an Internet cafe. The thin version of the translator client and server would work with a browser without Java installed. You would gather up a page full of entries to translate and then hit SUBMIT when you had translated them. The disadvantages of this approach are:

Phases

It is best to break a big project into phases, so that you can do redesign part way through based on some practical experience or experimentation rather than waiting until everything is complete and all interconnected making it harder to change anything.
  1. Editor with simple server. You exercise it with sample data manually entered into the database. The server collects and saves translations.
  2. Extract translations to be done from source code, and export bundles.
  3. Administrative functions
  4. The reporter
  5. Billing export. The program itself does not do billing. Write a simple custom reference billing program to show the general skeleton of how it works. Every customer must write their own. They can use this as a skeleton overriding the various methods.
  6. Email alerts
  7. Web based editor


This page is posted
on the web at:

http://mindprod.com/project/internationaliser.html

Contact Roedy. Please feel free to link to this page without explicit permission.