Bali is my hypothetical evolutionary replacement for Java, meant to highlight problems in Java that may eventually lead to changes in Java's own syntax.
Bali — Java with a Spoonful of Syntactic Sugar
Bali is a proposed superset of Java. I wrote this essay in the days of JDK (Java Development Kit) 1.0.
Some of the features I requested, such as enums, are now part of Java. In light of how Java has evolved, I would
revise the precise syntax of some of these suggestions to be more congruent with where Java is now. Bali takes
ideas from various other languages. The intent is to create a language for application programmers that is safer,
terser, and easier to read and maintain than standard Java. Terseness makes the language easier to read, write and
maintain. The ideas come from Abundance (see Byte Magazine, October 1986), Eiffel,
Pascal, Delphi, Smalltalk, PL/I, Algol-68, Forth and even COBOL.
Not all the ideas are compatible with each other. They are intended to stimulate discussion on language
evolution, not as a formal definition of a new language. I have ordered them so that the ones least likely to be
controversial are first.
Since this was written, Java has implemented some ideas similar to my recommendations, such as enums and a
foreach loop. When you read those sections, realise I am not advocating changing the details of the new Java
implementations.
I invite you to submit your own ideas for inclusion.
The Bali/Abundance Philosophy of Language Design
Simplicity is a wonderful thing, if not taken to extremes. Some Nazis decided that after they had conquered the
world, they would need a simplified German for the conquered masses to learn. A watermelon became a
large green egg-shaped water fruit.
When a language (computer or natural) lacks a specific way to express an idea, people will kludge a million
variants. An attempt at simplicity leads to a confusing Tower of Babel (e.g. kludged enums in Java).
Human languages have thousands of words and a rich syntax. Computer languages are ridiculously impoverished in
comparison. I suggest maintaining computer code would be much easier if we were much more liberal with syntactic
sugar, so there were terse standard ways of solving standard problems such as specifying the keys to a
sort, rather than writing pages of esoteric, eccentric bubblegum to compare each field in turn.
Further, languages should be pronounceable. To the human mind, unpronounced punctuation is just so much
annoying fluff. Note that in natural languages, people make far fewer mistakes in word choice than punctuation.
As we move away from the keyboard toward voice control (Dragon Naturally Speaking) this
becomes ever more important.
Computer code should be pronounceable, so that you can think about it aurally in the mind, and talk about
it with others over the phone. This means reducing the amount and pickiness of the punctuation in computer
languages. I waste an hour every day balancing () and {}. Much of this could be avoided by computer language
design that was closer to natural speech.
Recommended book: The Language Instinct: How the Mind Creates Language
by: Steven Pinker (born 1954-09-18)
publisher: Harper Perennial
published: 2000-11-07
paperback: 978-0-06-095833-6
hardcover: 978-0-688-12141-9
eBook: 978-0-06-203252-2
audio: 978-1-4558-3970-4
kindle: B0049B1VOU
This is one of the most fascinating books I ever read. It turns out that the brain has hardware for parsing language. Shortly after birth, the parsing engine gets configured to the particular language it hears. Humans without access to a common language will spontaneously develop a pidgin language and within a generation or two will develop a full creole with a complex grammar.
In his book, Pinker points out
that the human brain is hardwired with grammar processing wetware. I wonder if with our supposedly simple
artificial computer languages, we inadvertently bypass our hardwired syntax parsers. No wonder we make so many
programming mistakes. We should be aiming for languages more like natural spoken languages that don’t
depend so much on fussy punctuation. Natural spoken languages evolve with new vocabulary and short-forms
introduced on-the-fly.
Every time I bring up these ideas publicly, somebody will argue that I wish to destroy Java by gilding the
lily and needlessly complicating it. I laugh to myself at such pompous asses, much as I would
chuckle at a child who pontificated on the relative merits of sex versus masturbation when he had only tried one
(or neither). Simplicity is a romantic notion. Living without electricity is certainly simpler but it
is a heck of a lot less convenient, and a heck of a lot less productive. The code I advocate is orders of
magnitude simpler than the low level futzing that passes for programming today. I have tried both. I promise you,
the old way is horrible in comparison. You will never want to go back, other than perhaps to rescue others
from their ignorant misery.
You will see these thoughts influencing the way I am pushing Java to evolve.
Variable Sized ( ) Display
In Java you might write:
int a = ((b+c)/(e+f))*(g(i)+h);
That same piece of code displayed in Bali might look like this:
int a = ((b + c)/(e + f)) * (g(i) + h);
The red is just to highlight the outsized (), though colour coding matching () and {} is not such a bad idea. It
might even be optionally displayed like this:
        b + c
int a = ----- * (g(i) + h);
        e + f
Why?
- The variable size parentheses make it easier to visually balance. This is not a change to the language, but
how the language is presented. I find such concerns as whether we are talking about a change to the core
language, the standard libraries, or the way SCIDs (Source Code In Database tools) present code immaterial. What
counts is how the language presents itself to the average maintenance programmer. She sees it as a seamless
whole.
- The alternative of named expressions means the links between the pieces are abstract, not visual.
Why Not?
- When you get deep nesting you need extra space between lines.
- You should try hard to not use complex expressions needing this feature. Split them into subexpressions and
assign them to temporary variables or extract them to methods with meaningful names.
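The advice in the last point can be sketched in present-day Java. The method g and the parameter values are hypothetical stand-ins, chosen only so the example runs:

```java
class NamedSubexpressions {
    // Hypothetical stand-in for the g() used in the expression above.
    static int g(int i) {
        return i * 2;
    }

    // The original one-liner, parentheses and all.
    static int nested(int b, int c, int e, int f, int h, int i) {
        return ((b + c) / (e + f)) * (g(i) + h);
    }

    // The same computation split into named pieces.
    static int split(int b, int c, int e, int f, int h, int i) {
        int numerator = b + c;
        int denominator = e + f;
        int scale = g(i) + h;
        return (numerator / denominator) * scale;
    }
}
```

The split form needs no outsized parentheses, at the cost of making the links between the pieces verbal rather than visual.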
Assertions
Assertions should be brought over from Eiffel with as little modification as possible. Better minds than I could
concoct a more complete Java assertion syntax. Eiffel assertions are much more extensive than I have limned here.
public int corral ( int low, int value, int high )
{
    require low <= high;
    ensure low <= result && result <= high;
    if value <= low then return low; fi;
    if value >= high then return high; fi;
    return value;
} end corral
Why?
- Assertions formally document the preconditions on a method’s parameters. Whose job is it to handle
invalid data, the caller or the callee?
- Assertions formally document the postconditions on what the method does, and what changes it makes on
this.
- Assertions constrain the behaviour of overriding methods.
- When turned on during debugging, assertions help flush out bugs.
Why Not?
- Assertions sometimes fail. Then what? Eiffel has a whole recovery system quite different from try, throw,
catch, finally. It would not graft well onto Java.
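Java later gained an assert statement (in version 1.4), though it is far less extensive than Eiffel's contracts. A rough approximation of corral in plain Java might look like the sketch below. It assumes the require clause maps to an explicit exception and the ensure clause to an assert, which is checked only when the JVM is started with -ea:

```java
class Corral {
    /** Clamps value into the range low..high. */
    static int corral(int low, int value, int high) {
        // require: precondition, checked unconditionally
        if (low > high) {
            throw new IllegalArgumentException("require low <= high failed");
        }
        int result;
        if (value <= low) {
            result = low;
        } else if (value >= high) {
            result = high;
        } else {
            result = value;
        }
        // ensure: postcondition, checked only when assertions are enabled
        assert low <= result && result <= high : "ensure failed";
        return result;
    }
}
```

Unlike the Eiffel form, nothing here constrains overriding methods; the checks live in the method body, not in its signature.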
Canonical Counting For Loop
for i to n step 1 {… }
is equivalent to
for ( int i=0; i<n; i++ )
{
}
with the exception that it also works if n is Integer.MAX_VALUE, and the step clause is optional if the step is 1.
Why?
Java for loops as they stand have some problems:
- They don’t work properly when the limits involve Integer.MAX_VALUE or Integer.MIN_VALUE.
- The end condition traditionally is expressed with < n. Sometimes code sneakily reads <= n, which is
easy to misread.
- Innocent looking for loops that masquerade as a canonical for (int i=0; i<n; i++)
can actually do something quite different.
- They are needlessly verbose for the canonical case.
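The first problem is easy to reproduce. In this sketch (class and method names are mine), an inclusive loop whose upper limit is Integer.MAX_VALUE never sees its exit condition become false, because i++ wraps around to Integer.MIN_VALUE. A cap on the number of passes keeps the demonstration from running forever:

```java
class LoopOverflowDemo {
    /** Counts passes of an inclusive int loop, giving up after cap passes. */
    static long countInclusive(int from, int to, long cap) {
        long passes = 0;
        for (int i = from; i <= to; i++) {
            // guard: when to is Integer.MAX_VALUE, i wraps and i <= to stays true
            if (++passes >= cap) {
                break;
            }
        }
        return passes;
    }

    /** Same loop with a long counter, which cannot overflow at the int limits. */
    static long countInclusiveSafe(int from, int to) {
        long passes = 0;
        for (long i = from; i <= to; i++) {
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        System.out.println(countInclusive(max - 2, max, 10)); // hits the cap
        System.out.println(countInclusiveSafe(max - 2, max)); // three passes
    }
}
```

Widening the counter to long is the usual workaround; a canonical counting loop would let the compiler take care of it.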
Why Not?
- It might be confusing to have two different ways to do the standard for.
Relaxed Semicolon Rules
I would like to relax the rule on semicolons. If a compiler can detect a semicolon is missing surely
it can logically insert it. Similarly, if it can determine a semicolon is extraneous, it can
logically remove it. For example, a semicolon before a } would become optional. Similarly, the compiler would not
freak about an extra semicolon at the end of a for, e.g. this would be perfectly legal:
for ( int i = 0; i < n; i++; )
{
    a.doSomething();
    a.doSomethingElse()
}
A compiler might even forgive this with just a warning:
if ( i < 0 )
{
    a.doSomething()
    a.doSomethingElse()
}
A tidying program could insert and remove semicolons to some canonical standard before saving code in the
repository. The canonical form would likely have semicolons before } just as now, to make it faster to insert a
new line of code at the end of the block.
Why?
- Even experienced programmers waste hours every week just finding and fixing these pedantic little semicolon
errors. It is ridiculous to pay people $100 per hour to manually find and fix
semicolons, something a computer could easily do automatically.
- Code with fewer semicolons is actually easier to read.
Why Not?
- Programmers shouldn’t make mistakes. Experienced programmers don’t make those kinds of
mistakes.
- I almost never make semicolon mistakes. Therefore the feature is useless.
- It just encourages sloppy thinking.
- It will just start wars about the canonical form.
- Real men don’t need computers to help them get their semicolons correct.
- You make semicolon errors impossible by using a SCID. There is no need to change
the language definition.
- The compiler might think there is only a semicolon missing — but inserting it might just hide the real
problem. Of course the compiler could ask to insert the semicolon for you — in fact there are compilers doing
this, for example VisualAge for Java.
Optional Named End Statements
Java tends to overuse the { } characters. It is so easy to get them unbalanced. The compiler is no help on
finding the imbalance. Further, even when they are perfectly balanced, it is hard to match them up by eye,
especially when they are widely separated or when the source has not been through a tidier to align them.
To solve this problem, Bali takes a leaf from PL/I and Algol-68. I don’t dare yet propose a solution as
radical as Abundance uses.
In Java you might write:
public void aMethod ( int aParm )
{
    for ( int i = 0; i < aParm; i++ )
    {
        System.out.println( i );
    }
}
In Bali you may add optional end statements like this:
public void aMethod ( int aParm )
{
    for ( int i = 0; i < aParm; i++ )
    {
        System.out.println( i );
    } end for
} end aMethod
The keywords you may use after end include:
- the name of the current method
- the name of the current class
- the name of the current loop
- for, then, else, while, switch, case, class, method, init, try, catch, finally.
If you use end statements, the compiler checks to ensure the preceding } does indeed match that syntactic
element.
Why?
- the end statements act as documentation to make the program easier to read.
- they are guaranteed accurate, unlike similar comments.
- They could be automatically generated by SCID-like tools.
- They would be automatically renamed along with the method by tools like VAJ (VisualAge for Java) global method rename.
- they help the compiler generate more accurate error messages for unbalanced { }. The compiler can nail down
precisely where the mismatch occurred.
- they are shorter to write than the corresponding comments.
- they make it more likely that when you insert new code into an existing program you will put it in the
correct place, particularly adding a new method to the class just before the final } of the class.
- they are likely to flush out {} nesting errors that technically balance, but which don’t do what you
intend.
- The alternative is to break the code into separate named methods. Then the links between the pieces would
be symbolic/verbal, rather than visual. With a SCID (Source Code In Database) you can collapse blocks so that the stringiness of inline
code is not such a problem.
Why Not?
- end xxxx is quite verbose just to mark the ends of blocks. It might be wiser to use more compact
icons to mark begin and end.
- If you can’t easily match brackets by the first look, refactor your code so that you can. Use an
automatic code formatter to align them properly. If there is too much code between them, extract parts of it
into its own method.
- With a SCID, matching begin-end elements can be done visually. There is no need for names.
Explicit modifiers
Permit the use of the modifier instance to explicitly declare a variable or method as non-static. Permit
the use of the modifier package or friend to explicitly declare a variable or method as not
private, protected or public.
Why?
- In Java, you can declare a method or variable static, but there is no way to explicitly declare it not
static, i.e. instance. The lack of a declaration could be an oversight, or it could be deliberate. You
can’t tell. Similarly for friend /package scope.
- There is currently not even a vocabulary to talk about friend visibility.
- There is no target string to search in the source code for friend and instance methods.
- It makes people think about whether a method should be/is static or instance. So often programmers are
puzzled when they use instance methods as if they were static. The instance keyword would jog their
memories.
Why Not?
- There are already scads of modifiers. New modifiers would just clutter code. Explicit modifiers would only
make sense if you selected the modifiers with radio buttons in the source code editor.
Extended CASE labels on SWITCH: SELECT WHEN
In Bali case labels can have the following forms:
- integer constants
- ranges, e.g. 300..500
- strings
- variables (including objects compared with .equals). In regular Java it is not good enough to have
constants for your case labels known at load time, you need ones known at compile time.
- boolean expressions, e.g. isValid(x), or regular expressions that evaluate to a boolean.
To avoid confusion with the current SWITCH I suggest a new set of keywords:
select x
{
    when selected > 1000 :
        out( "huge" );
    when selected % 2 == 0 :
        System.out.println( "even" );
    when -100 .. -10, 0, +10 .. +100 :
        System.out.println( "boring" );
        System.out.println( "Note: no braces needed for each case" );
        System.out.println( "Note: no fallthrough" );
    other :
        System.out.println( "something else" );
}
Conceptually each when clause is executed in turn to see if it is true. However,
the code can usually be compiled much more efficiently than that with a combination of nested if, binary
search for range, jump tables, delegate arrays, hashtables and indexed lookup tables. The code is both easier to
read and faster than traditional switches. See my student
project on extended case where I go in more detail on implementation.
If you leave off the other clause, you will get a runtime exception, or perhaps a
compile time error if a value falls out the bottom unaccounted for. If you want nothing to happen, you must have
an explicit empty other clause. You might also implement decision
tables that generate similar logic.
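For comparison, here is roughly what the select example means when written as plain Java ifs tested in order. This is only a sketch of the semantics described above, not of how a Bali compiler would generate code; the class name and the choice to return a string rather than print are mine:

```java
class SelectSketch {
    /** Evaluates the when clauses in order and returns the first match. */
    static String classify(int selected) {
        if (selected > 1000) {
            return "huge";
        }
        if (selected % 2 == 0) {
            return "even";
        }
        // when -100 .. -10, 0, +10 .. +100 : note the tested variable is repeated five times
        if ((-100 <= selected && selected <= -10)
                || selected == 0
                || (10 <= selected && selected <= 100)) {
            return "boring";
        }
        // other :
        return "something else";
    }
}
```

Because the clauses are tested in order, 0 is reported as even and never reaches the range test, just as it would in the select form.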
Why?
- Maintenance is safer. The equivalent if statement would redundantly specify the switch variable.
That redundancy allows accidental divergence (what if one of the cases accidentally specifies a different switch
variable?), whereas select makes maintenance cheaper (the switch variable can be modified in a single location).
- It is easier to proofread code to see that all cases have been handled. By leaving out the other clause, you can get the compiler and runtime to help you ensure you have covered all
cases.
- It is easier to read code to associate the actions with the conditions.
- It is terser than the equivalent ifs or individual case labels.
- The other case can be used to ensure no cases were overlooked. Granted you can do this with else,
but it is much easier with nested ifs to have a leak — a case that is never handled.
- A compiler might do better at optimising because of the additional structure over the equivalent nested
ifs. For example, for large range bands that are contiguous, the code need test only one end of the
range per band, not both as would typically be done in nested if. The compiler might generate a binary
search to determine range band. That sort of code implemented in nested if would create unmaintainable
code.
- Ideally you want to avoid complicated case or nested if logic, but there are times when there appears to be
no other way to handle it. You need every tool possible to help tame the mess and make the code comprehensible.
There be dragons in such places even if they are rare.
Why Not?
- It is fairly easy for a case to accidentally fall between the cracks and be handled by other.
- Compilers don’t know enough to safely generate optimal code for the cases. Humans writing code know
more and can therefore generate faster code even if the code is harder to follow.
- It is cool to have ifs nested so deep that it doesn’t fit on your screen any more, even though you
indent only one character per if-statement. See How to Write Unmaintainable
Code.
- Proper use of design patterns should make most complicated nested if statements unnecessary.
C#-like JavaBean Properties
Bali properties could look like Eiffel, Delphi or C#. I think the best approach is C#.
In Java you write:
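A minimal sketch of the conventional JavaBean form, assuming the same names as the Bali version that follows (MyClass, mHeight and the 50 to 275 guard):

```java
class MyClass {
    private int mHeight;

    public int getHeight() {
        return mHeight;
    }

    public void setHeight(int value) {
        // silently ignore out-of-range values, as the Bali guard does
        if (50 < value && value < 275) {
            mHeight = value;
        }
    }
    // Client code must spell out both accessors:
    //     x.setHeight( x.getHeight() + 1 );
}
```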
In Bali you would write that as:
class MyClass
{
    published int height
    get
    {
        return mHeight;
    }
    set
    {
        if ( 50 < value && value < 275 ) mHeight = value;
    }
    private int mHeight;
}
A client could then write:
x.height++;
published is a sort of super public declaration that means this property should be visible in
the beanbox. To provide read/write access you write the get/set code. To deny
read/write access you leave out the corresponding get/set code. To the clients of the
property, the property looks like an ordinary public member variable.
get, set and value are reserved words. get and set introduce accessor/guard
routines for internal variables. If you leave out the get or set clause, you suppress the ability to read or
write the property.
Why?
- Properties let you convert a public variable into a pair of guard functions without having to change the
client code.
- Client get/sets written with properties are much easier to read and write. They look just like ordinary
variables.
- In situations where it is important to be able to tell if an identifier represents a variable or a function
a SCID could come to the rescue. It could colour code them, or even expand properties back to function
notation. You could also use a naming convention to help in distinguishing access to a private member variable
directly and via the property within the class. You would call the private member mSize and the property pSize.
All members begin with m and all properties with p. See coding
conventions.
- This syntax makes a clear distinction between external properties and internal variables needed to
implement them. There is no overloading of names with the ensuing confusion. There is no problem controlling
visibility independently.
Why Not?
- Property already has a meaning in Java — a list of
keyword/value pairs. Some other name such as attribute should be used for such a feature. Perhaps they
should be called pseudovariables.
- Clients should be aware whether they are using a function or a variable.
- Competent programmers never use public variables. They set them up as functions right off the bat even if
the routines are just dummies. Thus there is no need ever to flip client code when a public variable changes to
a function.
- Perhaps it might be better to simply allow functions without arguments to be written without trailing ().
That would allow you to convert a variable to an accessor function without changing any client code and without
coding any extra declarations.
- Properties are an attempt to invent PL/I-style pseudo functions, functions that can be used on the left
hand side of the = sign. Another way of looking at the problem, properties are an attempt to allow you to
override the effect of the = operator. Perhaps instead of properties, we should invent a more general mechanism
that allows overloaded functions that can be used on either side of the = sign, with multiple arguments. Then
we would have a way of dealing with tuples returned from a function. If a function can have multiple inputs,
why should it not have multiple outputs? If a function can accept several variables to access, surely a
function should be able to accept several variables to store. A more general mechanism like this would lead to
simpler syntax for accessing the Vector methods.
- If you are debugging and tracing code, you want to know if an identifier represents a variable or a
function.
- The existence of language support for get/set methods applying to private data may encourage treatment of
classes as structs with get/set methods for every physical data member. This is a common misconception even
without language support, especially among beginners; fostering it would be harmful.
The Grand Collection Unification
Containers of all types should have a uniform and simple syntax for manipulating them. The concept of container
should be expanded as it is in Abundance to include files, Hashtables, b-trees (both ram and disk based), maps,
sets, arrays, ArrayLists and any other Collection you could imagine. There are three basic operations: get, set
and iterate. I suggest the following uniform syntax to apply to all Map-like things, and set-like things where
applicable. I will use a Hashtable as an example:
Hashtable() animal;
animal[ "cow" ] = "moo";
animal[ "pig" ] = "oink";
String noise = animal[ "cow" ];
if ( noise == null ) noise = "don’t know";
for each key in animal
{
    System.out.println( key + " says " + animal[ key ] );
}
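For comparison, the same steps in present-day Java use method calls in place of the bracket syntax. This sketch uses java.util.LinkedHashMap only so the iteration order is predictable; the class and method names are mine:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AnimalNoises {
    static Map<String, String> build() {
        Map<String, String> animal = new LinkedHashMap<>();
        animal.put("cow", "moo");   // Bali: animal[ "cow" ] = "moo";
        animal.put("pig", "oink");  // Bali: animal[ "pig" ] = "oink";
        return animal;
    }

    static String noiseOf(Map<String, String> animal, String key) {
        String noise = animal.get(key);  // Bali: animal[ key ]
        if (noise == null) {
            noise = "don't know";
        }
        return noise;
    }

    public static void main(String[] args) {
        Map<String, String> animal = build();
        for (String key : animal.keySet()) {  // Bali: for each key in animal
            System.out.println(key + " says " + animal.get(key));
        }
    }
}
```

Swapping LinkedHashMap for another Map implementation changes only the line that constructs it, which is the property the unified syntax aims to extend to arrays and files as well.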
Why?
- It makes it easy to change your mind later how to implement a collection. All you need do is change the
declaration. All the rest of the code stays the same.
- The code for file handling is very much simpler.
- It makes the language easier to learn.
- The code is easier to proofread.
Why Not?
- This permits functions of a single variable on the left hand side of the equal sign. Surely you should be
able to have multiple selector arguments. You might as well generalize the notion of properties to allow them
to have 0 to n selectors, or generalise all methods to have both a right hand side producer implementation and
an optional left hand side consumer implementation. You might as well use the familiar () syntax, e.g.
map(latitude, longitude) = "Johnny’s Bar & Grill";
String restaurant = map(latitude, longitude, slop);
- All collections code looks alike. You can’t tell if file I/O or an array or an ArrayList is being
manipulated. It isn’t natural!
- Iterators need to be smarter than that. You need ways of selecting subsets.
Enumerations
Java currently has no enumerations. There is the Enumeration interface, but that is just an Iterator over a
collection, not a language feature. I mean enumerations in the sense of Pascal enum or Ada-95 enumeration type.
Java has only static final constants. There is no formal connection between groups of constants, or between the
groups of constants and the variables that use them. There is no formal mechanism to distinguish the two kinds of
enumeration constants:
- enumeration integer constants that represent one of a possible list of choices.
- mask constants that can be ORed together to represent combinations of choices.
I suggest Java should properly support enumerations. This proposal is influenced by Ada, Pascal and C’s
enum. The type declarations would look like this:
enum DayOfWeek { SUN, MON, TUE, WED, THU, FRI, SAT };
set DaysOfWeek { SUN, MON, TUE, WED, THU, FRI, SAT };
or in shorthand:
set DaysOfWeek of DayOfWeek;
You could then declare a variable as:
DayOfWeek weekday = DayOfWeek.SAT;
or
DaysOfWeek weekEnd = DaysOfWeek.SAT | DaysOfWeek.SUN;
Enums are much like classes or interfaces. They could be standalone or could be declared inside a class or
interface, and made private or public.
Inside the class you would reference the enum value by DaysOfWeek.WED (possibly
just WED if there were no ambiguity), outside by TheClass.DaysOfWeek.WED. The enum constants could be used in case labels.
To handle enums that are individual bits I suggest a syntax like this:
set PizzaToppings { MOZZARELLA, PEPPERONI, TOMATOES, GREENPEPPER,
PINEAPPLE, GOUDA, CRANBERRIES, ANCHOVIES };
You could then declare a variable as:
PizzaToppings pizzaOrder = PizzaToppings.TOMATOES | PizzaToppings.CRANBERRIES;
if (pizzaOrder & PizzaToppings.ANCHOVIES)…
enums and sets are types, and are type checked in parameter passing and assignment. Currently Java’s static
final ints have no type checking to make sure you don’t try to add a Monday topping to your pizza.
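Present-day Java offers something close to this with enum types and java.util.EnumSet. A short sketch, reusing the names above; the wrapper class and its methods are only for illustration:

```java
import java.util.EnumSet;

class PizzaDemo {
    enum DayOfWeek { SUN, MON, TUE, WED, THU, FRI, SAT }

    enum PizzaToppings {
        MOZZARELLA, PEPPERONI, TOMATOES, GREENPEPPER,
        PINEAPPLE, GOUDA, CRANBERRIES, ANCHOVIES
    }

    static EnumSet<PizzaToppings> order() {
        // roughly: PizzaToppings.TOMATOES | PizzaToppings.CRANBERRIES
        return EnumSet.of(PizzaToppings.TOMATOES, PizzaToppings.CRANBERRIES);
    }

    static boolean hasAnchovies(EnumSet<PizzaToppings> pizzaOrder) {
        // roughly: pizzaOrder & PizzaToppings.ANCHOVIES
        return pizzaOrder.contains(PizzaToppings.ANCHOVIES);
    }

    // order().add(DayOfWeek.MON) would be rejected at compile time:
    // there is no way to put a Monday topping on the pizza.
}
```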
You would have a cast of the form (DayOfWeek) or (PizzaToppings) that converted a plain int into a member of that enum. It would raise an exception
if it did not match one of the official values of the enum or set. You could also cast an enum to an int and a
set to a long. You could cast an enum to its related set but you could not cast a set back to an enum.
There is no formal super/subset relationship between different enums or sets.
There are a number of pseudo functions you can perform on enums. They have a syntax that simulates both static
and instance forms: e.g.
day = day.first ();
day = day.prev ();
day = day.next ();
day = day.last ();
day = DayOfWeek.first ();
day = DayOfWeek.prev (DayOfWeek.WED);
day = DayOfWeek.next (DayOfWeek.WED);
day = DayOfWeek.last ();
next and prev raise an InvalidEnumException if you run off
the end. Case labels may use enum constants, pseudofunctions of enum constants, and enum ranges, e.g.
switch ( day )
{
    case DayOfWeek.first() .. DayOfWeek.WED :
        dosomething();
        break;
    case DayOfWeek.next( DayOfWeek.WED ) :
        dosomethingelse();
        break;
    case DayOfWeek.last() :
        dosomethingelseagain();
        break;
    default :
        complain();
        break;
}
Enum variables have the following operators defined on them: + int, - int, enum - enum, <,
>, =, !=, []. Set variables have the following operators defined on them: |, &, ^,
~, =, !=.
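The static forms of the pseudo functions can be approximated on top of the enum type Java eventually adopted, using its values() and ordinal() methods. In this sketch InvalidEnumException is a hypothetical class named after the exception proposed above, and the helper methods are mine:

```java
class EnumStepping {
    enum DayOfWeek { SUN, MON, TUE, WED, THU, FRI, SAT }

    /** Hypothetical exception, named after the one proposed in the text. */
    static class InvalidEnumException extends RuntimeException {
        InvalidEnumException(String message) {
            super(message);
        }
    }

    static DayOfWeek first() {
        return DayOfWeek.values()[0];
    }

    static DayOfWeek last() {
        DayOfWeek[] all = DayOfWeek.values();
        return all[all.length - 1];
    }

    static DayOfWeek next(DayOfWeek day) {
        DayOfWeek[] all = DayOfWeek.values();
        if (day.ordinal() + 1 >= all.length) {
            throw new InvalidEnumException("no value after " + day);
        }
        return all[day.ordinal() + 1];
    }

    static DayOfWeek prev(DayOfWeek day) {
        if (day.ordinal() == 0) {
            throw new InvalidEnumException("no value before " + day);
        }
        return DayOfWeek.values()[day.ordinal() - 1];
    }
}
```

This works only because the ordinals are contiguous from 0, which is the same reason for letting the compiler assign the numbers.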
Implementation
Internally, for efficiency an enum is an int 0..63 and a set is a long. Since there is no inheritance, type
checking could be done completely at compile time. A standalone enum is implemented as an interface with a group
of static final ints. You might test the idea by implementing it with a preprocessor.
When you say PizzaToppings.ANCHOVIES the compiler automatically selects the index or
the bitmap representation depending on context. This will get rid of one of the main sources of error in handling
enumerations in Java, confusion over whether an enum constant is intended as an
index or a bit map.
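The two internal representations can be tried out by hand, much as a preprocessor might generate them. This sketch assumes the index is the position in the PizzaToppings declaration and the set form is a long with bit 1L << index set. The helper names are mine; in Bali the compiler would pick the form from context:

```java
class ToppingBits {
    // index form: contiguous ints starting at 0, in declaration order
    static final int TOMATOES = 2;
    static final int CRANBERRIES = 6;
    static final int ANCHOVIES = 7;

    /** Converts an index (0..63) to its single-bit set form. */
    static long toSet(int index) {
        if (index < 0 || index > 63) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        return 1L << index;
    }

    /** Tests whether the set contains the member with the given index. */
    static boolean contains(long set, int index) {
        return (set & toSet(index)) != 0;
    }

    public static void main(String[] args) {
        long pizzaOrder = toSet(TOMATOES) | toSet(CRANBERRIES);
        System.out.println(contains(pizzaOrder, ANCHOVIES));
    }
}
```

Written by hand like this, nothing stops a caller from passing an index where a bit mask is expected, which is exactly the confusion the proposal wants the compiler to rule out.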
I debate with myself back and forth if you should be allowed to assign explicit values to the enum constants
the way you can in C++. The disadvantage of doing that is you ruin the
next and prev functions. The advantage is it makes it easier
to interface with the outside world such as databases that use ints, not enums. At this point in time I feel it
best to let the compiler assign the numbers always contiguous, starting at 0. If you need breaks in the ordering,
write a method to produce it.
Languages like Pascal manage to use the same enumeration constants in both enum and set contexts. That trick
is probably beyond the ability of simple-minded Java compiler logic, though it would be preferable from the
maintenance programmer’s point of view.
I think by default you should be able to use your enums in either mode, either as indexes or as bits in a set.
The compiler should figure out which from the context and should generate both forms internally.
Just how long should an enum be: an 8-bit index with 256-bit sets, or a 16-bit index with 8K sets?
Have a good look at how enums work in other languages. They may have cleaner solutions than I have suggested
here.
Why?
- Enums have type checking. You can’t accidentally use the wrong enum constant, or use the enum form
where you needed the set form. Consider a class like GridBagLayout that has all manner of groups of enum
constants. Currently it is very easy to use one from the wrong group, e.g. LEFT and WEST.
- Various kludges using enumeration objects won’t work in case clauses, don’t have a simple
predictable external representation, and have the extra overhead of an object rather than a simple int.
- It ensures you don’t use a constant from the wrong enumeration group.
- It ensures you use an enumeration constant in the correct way, as a value or as a bit.
- It makes maintaining code much easier. It is quite difficult without enumerations to figure out which
constants are legitimate to use with which variables. My suggested scheme makes it completely clear, and
recruits the compiler’s help to ensure the rules are followed.
- The formal connection enables a debugger to display the enum value by name, rather than just by value.
- Having associated integers allows comparison for order, storage in a database, permanent values for use in
native classes, and enumeration over all possible values.
- Making the compiler assign the bit patterns for the enums without gaps means that next and prev are well defined and simple to implement. They can be
used in for loops to enumerate over all possibilities.
- You ideally want to view your code two different ways:
- method focus: How does each subclass implement this particular method? I want to see all the
implementations side by side.
- subclass focus: What are the particular behaviours of this subclass.
Enumerations work better for method focus. You have all the code for all subclasses right under your nose
combined in one method. Subclassing works better for subclass focus. With SCIDS, you could use subclassing
but still get the benefits of the old enumeration method focus. You could flip back and forth viewing and
coding either way.
Why Not?
- It may be too complicated to figure out the type rules. What sorts of arithmetic are you allowed on the
various enumeration values without casts to store them back in variables? You want to avoid casts that could
introduce errors. You probably want + int, - int for regular enumerations, | & ~ for set style
enumerations. How much run-time checking should be done to ensure enumeration constants are always valid?
- Templates are coming. There will be better ways to kludge enumerations as primitives that take only 20
pages of code. This will give Roedy a chance to become famous by writing an amanuensis to generate all the
bubblegum from a list of the enumeration names.
- Enums are not primitives. They are not classes. They are a whole new thing. Fitting an entire new class of
animal into a language takes some deep thinking. It has to get along with everything else. Ideally enums could
be defined in a way that did not require any changes to the class file format. They might only need exist at
compile time. At run time, they would be treated just like ints. Enums are related to subranges and units of
measure. Perhaps some more general solution should be sought that handles all three.
- Enumerations are mostly used in switch-statements and the like. They don’t make use of polymorphism.
Fetch
The fetch keyword just helps shorten a common idiom. I find myself typing stuff like
this
This pattern is so common it would be useful to have an abbreviation both for typing and proofreading. Maybe:
fetch motherObject.getLongComplicatedThingamagig();
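In plain Java the idiom presumably repeats one name three times, as in this sketch. The class names are hypothetical, invented to match the getter in the fetch line. Java 10 later added var, which removes one of the repetitions:

```java
class FetchSketch {
    // Hypothetical types, invented to match the getter name above.
    static class LongComplicatedThingamagig { }

    static class MotherObject {
        private final LongComplicatedThingamagig thing = new LongComplicatedThingamagig();

        LongComplicatedThingamagig getLongComplicatedThingamagig() {
            return thing;
        }
    }

    static boolean demo() {
        MotherObject motherObject = new MotherObject();

        // the long-hand idiom: type, variable and getter all repeat the name
        LongComplicatedThingamagig longComplicatedThingamagig =
                motherObject.getLongComplicatedThingamagig();

        // Java 10 and later: the declared type is inferred
        var sameThing = motherObject.getLongComplicatedThingamagig();

        return longComplicatedThingamagig == sameThing;
    }
}
```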
Why?
It is faster to type and easier to proofread. One letter out in those three repetitions, and the code has a
totally different meaning.
Why Not?
It legitimizes the goofy getXXX convention for what should be Delphi-like properties.
Augments
The augments keyword is much like the extends keyword.
- It stops you from deliberately or accidentally overriding any method in the base class.
- It lets you extend even final classes with additional convenience methods without creating wrapper
methods.
Why?
- Saves writing repetitive wrapper methods.
- Ensures methods are not accidentally overridden, especially in superclasses of the immediate base
class.
- Ensures methods are not accidentally overridden if the base class is later updated.
Why Not?
- Perhaps this should better be handled with explicit override, original, overridable keywords.
Via
The via keyword lets you implement an is-a interface via a has-a reference to some
class that implements it. It makes it much easier to write wrapper methods. You might use it like this:
public class DoIt
implements SomeInterface
via foo
{
private AClassThatImplementsSomeInterface foo;
}
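For comparison, here is roughly what you must write by hand in standard Java to get the same effect. This is only a sketch: Greeter, PoliteGreeter and the two methods are hypothetical stand-ins for SomeInterface and AClassThatImplementsSomeInterface. Every method of the interface needs its own forwarding method.

```java
// Hand-written delegation: the code a "via foo" clause would save you.
// All names are hypothetical, chosen only for illustration.
class ViaDemo {
    interface Greeter {
        String greet(String name);
        String farewell(String name);
    }

    static class PoliteGreeter implements Greeter {
        public String greet(String name) { return "Good day, " + name; }
        public String farewell(String name) { return "Goodbye, " + name; }
    }

    // is-a Greeter, implemented through a has-a reference.
    static class DoIt implements Greeter {
        private final Greeter foo = new PoliteGreeter();

        // One forwarding method per interface method. Each one must be
        // revisited whenever the interface changes.
        public String greet(String name) { return foo.greet(name); }
        public String farewell(String name) { return foo.farewell(name); }
    }

    public static void main(String[] args) {
        Greeter g = new DoIt();
        System.out.println(g.greet("Ann"));
        System.out.println(g.farewell("Ann"));
    }
}
```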
Why?
- Saves writing repetitive wrapper methods to implement an interface with a has-a reference.
- It reduces arthritis. You can change the interface without having to maintain gobs of code.
Why Not?
- It muddles is-a with has-a logic a bit more than the purists would like.
- It encourages sloppy design, as it tempts you to extend a class with functionality without thinking about the
actual responsibilities of the class.
Constructor Shortcut
In Java you would write:
BigDate d = new BigDate( 1997, 5, 6 );
In Eiffel you would write:
d : BigDate;
d.Create( 1997,05,06 );
In Bali you can use the usual Java syntax or this shortcut:
BigDate (1997, 5, 6) d;
Why?
- You don’t have to write the name of the class twice.
- You are less likely to forget to initialise.
- It is impossible to create an object of the wrong type.
Why Not?
- The new syntax masks the fact you are creating a new object.
- The new syntax looks too much like an ordinary method call.
Iterate Shortcut
In Java you might write:
for ( CitiesByState iter = new CitiesByState(); iter.hasMoreElements(); )
{
    City city = iter.nextElement();
    println( city.population );
}
In Bali, you could abbreviate this to:
for each city in CitiesByState
{
println (city.population);
}
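Java has since gained an enhanced for loop that covers much of this ground. Here is a minimal sketch, assuming a hypothetical CitiesByState that implements Iterable<City>. The City class and its sample data are invented for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IterateDemo {
    static class City {
        final String name;
        final int population;
        City(String name, int population) {
            this.name = name;
            this.population = population;
        }
    }

    // Hypothetical collection. Implementing Iterable is what lets the
    // enhanced for loop walk it.
    static class CitiesByState implements Iterable<City> {
        private final List<City> cities = Arrays.asList(
                new City("Alpha", 100), new City("Beta", 250));
        public Iterator<City> iterator() { return cities.iterator(); }
    }

    static int totalPopulation(CitiesByState cities) {
        int total = 0;
        for (City city : cities) { // no explicit iterator variable, no cast
            total += city.population;
        }
        return total;
    }

    public static void main(String[] args) {
        for (City city : new CitiesByState()) {
            System.out.println(city.population);
        }
    }
}
```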
Why?
- it is a lot easier to read and proofread.
- there is less likelihood of error typing it.
- it opens the door for abbreviations to other for idioms.
Why Not?
- The Java scheme gives you extra typing practice and makes your keystroke per day output higher.
Null Method Shortcut
In Java you might write:
if ( person != null ) person.requestPassport();
In Bali, you could abbreviate this to:
person..requestPassport();
or
if ( a != null && a.b != null && a.b.c != null ) d = a.b.c.doit();
else d = null;
would become
d = a..b..c..doit();
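Standard Java has no such operator, but java.util.Optional can approximate the chained form. Below is a sketch. The classes A, B and C are hypothetical stand-ins for the links in the chain, and chain is an invented helper name.

```java
import java.util.Optional;

class NullChainDemo {
    static class C { String doit() { return "done"; } }
    static class B { final C c; B(C c) { this.c = c; } }
    static class A { final B b; A(B b) { this.b = b; } }

    // Rough equivalent of d = a..b..c..doit():
    // the result is null as soon as any link in the chain is null.
    static String chain(A a) {
        return Optional.ofNullable(a)
                .map(x -> x.b)   // a null result empties the Optional
                .map(x -> x.c)
                .map(C::doit)
                .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(chain(new A(new B(new C())))); // complete chain
        System.out.println(chain(new A(null)));           // broken in the middle
        System.out.println(chain(null));                  // broken at the start
    }
}
```

It works, but it is considerably wordier than the two dots proposed above.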
Why?
- There is no possibility of a minor mismatch of the name tested in the if and the variable used to access
the method.
- Programmers are lazy and tend to ignore null cases. If you make it easy for them to deal with them, they
are more likely to keep them in mind.
- This syntax really starts to pay off when you have a chain of links, each of which could potentially be
null. You can even generalise it to return 0 or null on encountering a null when a value is being
computed.
- The alternative null Object pattern requires writing an immense amount of bubblegum that essentially does
nothing. If that could be mechanically generated somehow, the null object pattern would be much more
attractive. It potentially also could be faster, since all the implied null tests, and most of the explicit
ones could in theory be eliminated. Vrroom!
Why Not?
- You might want to reserve .. for use in defining ranges.
- You should use the NullObject pattern instead.
Conversions
In Java, casting is used for two quite different purposes:
- to request a conversion.
- to request that an object be treated as an instance of some sub or superclass without actual conversion.
You are merely reassuring the compiler the object in question truly is already of the type asserted. You
as programmer know this must be so from the way the program logic works, though this is not necessarily
immediately obvious to the compiler.
There are literally hundreds of different ways of requesting a type 1 cast. The simple (type) notation only works
on a handful of cases. Type 2 casts are uniformly done with (ClassName).
In Bali there are similarly two types of cast, but the way you request them is totally uniform:
- (DesiredType) — for conversion.
- (is DesiredType) or possibly (as DesiredType) —
for treatment as sub/superclass.
You can even convert objects to primitives, objects to objects, and primitives to objects with a type 1 cast.
How are the conversions to objects done? By looking for a constructor that takes a primitive or object as the
sole argument.
How are conversions from objects to primitives done? By looking for methods with the names intValue(),
longValue(), etc.
How are conversions from primitives to primitives done? Conceptually by looking for static methods with names
like Convert.intValue(double), though in practice these are handled inline as special cases.
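Spelled out by hand in standard Java, those lookups amount to something like the sketch below. BigDecimal is used only as a convenient existing class with a suitable constructor and intValue() method. The helper names are invented, and the Bali casts in the comments are the hypothetical syntax from this section.

```java
import java.math.BigDecimal;

class ConversionDemo {
    // Bali: (BigDecimal) i  -> a constructor taking the source as sole argument.
    static BigDecimal toBigDecimal(int i) { return new BigDecimal(i); }

    // Bali: (int) d  -> an xxxValue() method on the object.
    static int toInt(BigDecimal d) { return d.intValue(); }

    // Bali: (int) x  -> primitive to primitive, handled inline.
    static int toInt(double x) { return (int) x; }

    // Bali: (is Number) o  -> no conversion, only a checked reinterpretation.
    static Number asNumber(Object o) { return (Number) o; }

    public static void main(String[] args) {
        System.out.println(toInt(toBigDecimal(42)));
        System.out.println(toInt(3.9));
        System.out.println(asNumber(Integer.valueOf(5)));
    }
}
```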
Why?
- Simplification. Code is easier to write and read when there is a uniform way of requesting
conversions.
- accuracy. You are more likely to get the correct method.
- it makes it clear when data are actually being transformed.
Why Not?
- It requires changing existing code to insert the as or is.
- It makes no sense that . is a postfix operator but (cast) is a prefix operator. It leads to complicated
parenthesis forests. Perhaps we need a new pair of postfix casts, e.g. (is DesiredType) and (to DesiredType).
Generic Cast (convert)
I further suggest that the generic caster (convert) be allowed to convert where the
compiler knows the type of the target, e.g.
float x = 1.0f;
String s = (String) x;
String t = (convert) x;
String u = String.valueOf(x);
Why?
- If the types of x or t change, e.g. x becomes a double, all the code automatically adjusts to suit. You
only need change the declarations. The cardinal rule of writing maintainable code is that you specify each fact
in one and only one place. Java flagrantly violates that rule in three places: casts, conversion functions, and
primitive temporary variables that don’t track the type of corresponding major variables.
- When you are writing code you don’t need to constantly be quite so aware of the precise types of each
variable.
Why Not?
- It makes it harder to add new types and conversion functions. Any such additions won’t be built in.
Symmetrical Then/Else
In Bali, the if is more like Eiffel’s. The C-style if is deprecated, but still supported. In Java you might
write:
if ( (a = b == c) || (d == e) )
{
    f = g;
    r = s;
}
else
{
    f = h;
    r = t;
}
In Bali you would write that as:
if (a = b == c) || (d == e)
then f = g; r = s;
else f = h; r = t; fi;
Why?
- the then and else are more symmetrical. The true and false actions align which enhances comparing their
similarities and differences.
- it is easier for a human to pick out where the condition ends and the true action clause starts.
- it helps reduce the parenthesis () forest in the condition by one level.
- it helps reduce the brace {} forest in the actions by one level.
- If you switch from a simple if action to a complex one, you don’t have to add {}. The syntax is
consistent.
Why Not?
- then fi is more keystrokes than () { }.
- fi is an ad hoc end for an if. There should be a consistent system where every block type has a unique
matching begin end pair. You can’t very well use that fi technique for rof, chtiws, esac, elihw, yrt,
hctac.
Up Front Declares
You might write something like this:
dcl maxCount : int,
s : String,
myObject : SomeUserClass;
Why?
- In Java, you have to plough through a great string of symbols before you find the name of the thing being
defined.
- In Java there is no target string you can search for to find declarations.
- With this scheme, declarations align.
- It lets you find declares in listings and email posts where your SCID tools are not available.
Why Not?
- It is too fundamental a change.
- It does not extend properly to allow definitions in the middle of expressions.
- It takes more keystrokes.
- The initialisation syntax is clumsy. It is handled separately from the declaration.
- Without changing the Java syntax, a visual editor could solve the problem by displaying a series of
declarations aligned, possibly with some detail suppressed. Similarly a visual editor could bold face the
identifiers being defined to make them easier to pick out. Java’s declarations are quite a bit simpler
than C and C++’s. Such drastic measures are not needed in Java.
Explicit Concatenation Operator
I suggest that + for concatenation be deprecated, and soon outlawed altogether. In its place we will use
|||.
Why?
- In Java, + does double duty: addition and concatenation. + on an int sometimes means add, sometimes
concatenate, depending on what surrounds it. That is just plain Mickey Mouse, not to mention hard to read and
bug inducing.
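Here is a small sketch of that double duty in today’s Java. The class and method names are invented for illustration.

```java
class ConcatDemo {
    // Evaluated left to right: (1 + 2) is addition, then + "3" concatenates.
    static String addThenConcat() { return 1 + 2 + "3"; }

    // Evaluated left to right: ("1" + 2) concatenates, and so does + 3.
    static String concatThenConcat() { return "1" + 2 + 3; }

    public static void main(String[] args) {
        System.out.println(addThenConcat());    // prints 33
        System.out.println(concatThenConcat()); // prints 123
    }
}
```

The same two ints are added in one expression and concatenated in the other, purely because of what sits to their left.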
Why Not?
- It takes three keystrokes instead of one.
Move Corresponding
I propose something similar to the COBOL MOVE CORRESPONDING that cannot be expressed
in the usual linear language syntax. It requires dialog boxes.
A MOVE CORRESPONDING should display a dialog box with a list of fields that have
matching names in object a and b. Beside each name is a little
checkbox that can have four states:
- copy
- ignore
- use custom copying code, e.g.
switch (a.x) {
    case 1: b.x = 10; break;
    case 2: b.x = 15; break;
    default: b.x = a.x; break;
}
- have not decided yet. (this generates a syntax error on compile).
Simply adding a new field to either a or b will create new
entries in all MOVE CORRESPONDING tables (marked not yet decided). The editor also
lets you view any leftover fields from a and/or leftover fields from b.
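The plain copy case can be roughly approximated at run time in standard Java with reflection. This sketch covers only the setting where every matching field is ticked copy. It has no dialog and no compile-time check for undecided fields. OldRecord, NewRecord and moveCorresponding are hypothetical names.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class MoveCorrespondingDemo {
    static class OldRecord { String name = "Ann"; int age = 30; String fax = "n/a"; }
    static class NewRecord { String name; int age; String email; }

    // Copies every field whose name and type match in both objects and
    // returns the names that were copied. Unmatched fields are left alone.
    static List<String> moveCorresponding(Object from, Object to) {
        List<String> copied = new ArrayList<>();
        for (Field src : from.getClass().getDeclaredFields()) {
            if (src.isSynthetic()) {
                continue; // skip compiler-generated fields
            }
            try {
                Field dst = to.getClass().getDeclaredField(src.getName());
                if (dst.getType().equals(src.getType())) {
                    src.setAccessible(true);
                    dst.setAccessible(true);
                    dst.set(to, src.get(from));
                    copied.add(src.getName());
                }
            } catch (NoSuchFieldException e) {
                // No counterpart in the target: a leftover field.
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        NewRecord b = new NewRecord();
        System.out.println(moveCorresponding(new OldRecord(), b));
        System.out.println(b.name + " " + b.age + " " + b.email);
    }
}
```

Like the COBOL original, this never names the fields it touches, which is exactly the maintenance hazard the dialog-based proposal is meant to remove.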
Why?
- Lists of fields copying from one object to another is common for versioning. You rearrange fields, add
fields and need to convert the old files to the new format. In Java, SQL (Structured Query Language) handles this problem automatically for
rows. You still have the problem of dealing with updating old objects or records in flat files. You also have
it when you export a file of some subset of the information.
- In my home-brew language Abundance, I found that a COBOL-style move corresponding was NOT quite what was
needed. I needed something of the form MOVE CORRESPONDING EXCEPT FOR… The
other problem was maintenance. When you add new fields, you need to examine all the move correspondings to
figure out just what should be done with the fields. It is easy to overlook them, because they don’t
explicitly name the variables affected. My proposal lets you ensure you have not forgotten to handle every
field.
Why Not?
- This form of MOVE CORRESPONDING cannot be expressed without dialog boxes that
dynamically examine the lists of fields in the two objects.
Primitive Object Methods
In an early version of Java, primitives were objects. This made the language simpler, but turned out to have too
much overhead. It may be possible to bring back part of this notion. For example, I think you should be able to
write y = x.sin() instead of y = Math.sin(x), and
i.toString() instead of Integer.toString(i).
Why?
- The new notation avoids mentioning a class that has nothing to do with the operation. In Integer.toString(i) there is no Integer object involved. It is
confusing to use a notation that implies there is.
- The notation is consistent with the way you handle objects. Why should there be different notations to do
the same thing?
- It may open the door to primitives even more like true objects, where you can create subclasses of integers
with type safety checking.
Why Not?
- You still could not define your own methods to act on primitives as objects. Until there was a way to
subclass from primitives, the notation would only be useful for built-in methods.
- It is good to have separate notations for methods on objects and primitives. It helps make the distinction
between objects and primitives clear.
Short Constant Names
In Java you might write:
MyClass x = new MyClass();
x.doSomething( MyClass.ACONSTANT1, MyClass.ACONSTANT2 );
In Bali, you could abbreviate that:
MyClass x = new MyClass();
x.doSomething( ACONSTANT1, ACONSTANT2 );
The compiler uses the class of the method being invoked, here MyClass, to resolve otherwise unknown constants
mentioned between the ( ) surrounding the method’s parameters.
Why?
- Saves typing the class name over and over.
Why Not?
- There are conditions where the name ACONSTANT2 could be ambiguous. It may be better to disambiguate it
always, rather than just when it is needed.
- Extending the scope of a class to parameter calls to its methods is just too weird.
Spaciousness
I suggest the following simplification to the compiler’s parser (the tokenizer actually). All operands and
operators must be separated by a space, with the following exceptions: comma, unary +, unary -, ++, --, ., ;, (),
[].
In Java you might write:
x=a++- -b*-(c++<<1);
In Bali, you would have to write that as:
x = a++ - -b * -(c++ << 1);
Why?
- it gets rid of potentially ambiguous (to humans) strings of + and -.
- it makes code more readable.
- It removes one more temptation to write unmaintainable code.
- it speeds compilation.
- it opens the door to Forth/Abundance-like user defined operators and methods with almost arbitrary names.
You could use alpha, numbers, any punctuation, any Unicode chars except ()[]+-,.space,; You might use
characters such as these for your new user-defined operators:
| Unicode Characters to Use In User-Defined Operators |
| \u2200 .. \u22ff | mathematical symbols |
| \u2600 .. \u26ff | miscellaneous symbols |
| \u2460 .. \u24ff | enclosed alphanumerics |
| \u25a0 .. \u25ff | geometric shapes |
| \u2100 .. \u214f | letterlike symbols |
- The alternative is to break the expression up and name the pieces. Then the relationships between the parts
are symbolic/verbal rather than visual. With various SCID-like tools to format large expressions, they
become comprehensible, more comprehensible than traditional code with named subexpressions. If
you don’t believe me, perform an experiment. Your visual bandwidth is much higher than your verbal
bandwidth.
Why Not?
- It takes up more screen space and takes more keystrokes.
- Cramming would be a very hard habit to break.
- You simply shouldn’t write such hard to read expressions.
Divide and Modulus on Negative Numbers
If you divide two numbers, you can do four things with the result:
- round: useful for approximating, e.g. speed = round( distance / time ).
- ceil: useful to calculate b = ceil( n/m ), how many bins b, each of which can hold m items, are needed to
hold n items.
- floor: useful to calculate b = floor( i/m ), which bin number b the item i falls into when each bin can
hold m items.
- trunc: Java-style division. No known use.
| Signs | Java Division | Bali Division | Java Modulus | Bali Modulus |
| + + | +7/+4=+1 | +7/+4=+1 | +7%+4=+3 | +7%+4=+3 |
| - + | -7/+4=-1 | -7/+4=-2 | -7%+4=-3 | -7%+4=+1 |
| + - | +7/-4=-1 | +7/-4=-2 | +7%-4=+3 | +7%-4=-1 |
| - - | -7/-4=+1 | -7/-4=+1 | -7%-4=-3 | -7%-4=-3 |
Bali uses floored division. Bali takes the next lowest integer if the quotient is fractional. In Bali, the
remainder has the same sign as the divisor. In Bali the absolute value of the remainder is always less than the
absolute value of the divisor.
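Since this essay was written, Java 8 added Math.floorDiv and Math.floorMod, which compute the floored results described here. A short sketch comparing them with / and % on the four sign combinations follows. FloorDemo and divide are invented names.

```java
class FloorDemo {
    // Returns { truncated quotient, truncated remainder,
    //           floored quotient,   floored modulus }.
    static int[] divide(int n, int d) {
        return new int[] { n / d, n % d, Math.floorDiv(n, d), Math.floorMod(n, d) };
    }

    public static void main(String[] args) {
        int[][] cases = { { 7, 4 }, { -7, 4 }, { 7, -4 }, { -7, -4 } };
        for (int[] c : cases) {
            int[] r = divide(c[0], c[1]);
            System.out.printf("%+d by %+d  Java: %+d rem %+d  floored: %+d mod %+d%n",
                    c[0], c[1], r[0], r[1], r[2], r[3]);
        }
    }
}
```

Both pairs satisfy quotient * divisor + remainder = dividend. They differ only when the operands have opposite signs.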
Why?
- Java division happily has the Euclidean property, namely when you multiply the
quotient by the divisor and add the remainder you get back to the dividend. Bali also has that property.
- Java’s modulus behaves, well, strangely. In Java, the sign of the
remainder follows the dividend, not the divisor. Be especially careful when corralling random numbers into a
smaller range with the modulus operator. It can produce a negative result! For example, when you ask for modulo
3 in Java, you will be astounded to sometimes get a negative answer outside
the range 0..2. Bali conforms with the principle of least astonishment by always returning a result that follows
the sign of the divisor, not the dividend.
- So often in Java code I have to write ifs
surrounding the % to get the Bali effect. I have never coded a case where I
wanted the Java effect.
- Every language I have encountered defines the way integer division and modulus work for negative numbers in a
different way. The only one I found useful in practice was Forth’s floored approach. In every other
language I found myself handling negative cases specially doing the arithmetic on the absolute values. In
BigDate you can see examples where floored divide/modulus would
simplify code.
Why Not?
- The problem with the approach is hardware usually does not work this way. Implementing this convention in
software would slow down code for the usual all positive case.
- You can’t change rules like that in midstream. It may introduce subtle bugs in existing code.
Explicit Override
In standard Java, the rules of inheritance and overriding depend on both static/instance and variable/method.
Methods override, variables shadow.
In Bali, the rules are more uniform. Methods override, but you must declare them as overriding to ensure you
don’t do it by accident (e.g. if the base class later adds a clashing method.) Constants override.
Variables may not be overridden or shadowed. It just causes confusion. In Java, if you intend to override a
method, and get the signature slightly different, or slightly misspell the method name, the compiler will not
warn you. Your new method will just be effectively ignored. This is particularly a problem in overriding methods
in adapter classes. In Bali, if you specify override and there is no matching method
to override, the compiler will warn you.
You might write something like this:
class X
    extends Y
{
    override const int AnomalyYear = 1582;
    override public int getMonth()
    {
        return month + 1;
    }
    original public int getMoonPhase()
    {
        …
    }
}
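Java 5 later added the @Override annotation, which gives part of this protection: the compiler rejects an annotated method that overrides nothing. There is still no counterpart to original, and the annotation remains optional. The sketch below uses hypothetical classes.

```java
class OverrideDemo {
    static class Base {
        int getMonth() { return 0; }
    }

    static class Careless extends Base {
        // Misspelled name: this declares a brand new method and silently
        // overrides nothing. Marking it @Override would be a compile error.
        int getMonht() { return 1; }
    }

    static class Careful extends Base {
        @Override
        int getMonth() { return 1; }
    }

    public static void main(String[] args) {
        System.out.println(new Careless().getMonth()); // Base version still runs
        System.out.println(new Careful().getMonth());  // overriding version runs
    }
}
```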
Why?
- Lets you create a subclass, reconfiguring the tweaking constants of the base class. If you use the base
class you get the original tweakers, if the new class, the new tweakers.
- You can’t accidentally override or shadow.
- Rules are more uniform. Confusion over the existing rules is a source of bugs.
- It warns you that an apparently useless method is actually being used by some possibly distant ancestor
class.
Why Not?
- You can’t fiddle with something this basic to a language.
- The explicit override and original declarations just clutter the code. They won’t help safety, because
most programmers will not bother to declare either way.
- Final currently means both do not override and no changes to the value after
definition. This change requires a redefinition of the meaning of final. Final would then mean do
not override. The new keyword const would mean no changes to the value after
definition.
- Stealing const for this purpose may block other more imaginative uses of the keyword in future.
- Shadowing is just an extension of the local variable principle. If you get rid of it you destroy
encapsulation. Changes to one module force changes to an unrelated one.
Sort Interface
In Abundance, you can sort an array or file with a piece of code like this:
BY DESCENDING Salary ASCENDING Hire-Date SORT
Abundance uses postfix notation. You don’t even need to specify the name of the array or file, since the
compiler can easily deduce that. I would like something similar for Bali like this:
HeapSort.sort(anArray, desc salary asc hireDate);
As it is, in Java, among several other things, you need to compose a new delegate.
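Java 8 eventually shortened the delegate considerably with Comparator.comparing and thenComparing. Here is a sketch of the desc salary asc hireDate ordering. The Employee class, its fields and the sample data are hypothetical, and the hire date is reduced to a year to keep the example short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortDemo {
    static class Employee {
        final String name;
        final int salary;
        final int hireYear;
        Employee(String name, int salary, int hireYear) {
            this.name = name;
            this.salary = salary;
            this.hireYear = hireYear;
        }
        int getSalary() { return salary; }
        int getHireYear() { return hireYear; }
    }

    // Roughly "desc salary asc hireDate".
    static final Comparator<Employee> ORDER =
            Comparator.comparingInt(Employee::getSalary).reversed()
                    .thenComparingInt(Employee::getHireYear);

    // Sorts a copy of the list and returns the names in sorted order.
    static List<String> sortedNames(List<Employee> staff) {
        List<Employee> copy = new ArrayList<>(staff);
        copy.sort(ORDER);
        List<String> names = new ArrayList<>();
        for (Employee e : copy) {
            names.add(e.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Employee> staff = new ArrayList<>();
        staff.add(new Employee("Ann", 50000, 1999));
        staff.add(new Employee("Bob", 70000, 2001));
        staff.add(new Employee("Cy", 70000, 1995));
        System.out.println(sortedNames(staff)); // prints [Cy, Bob, Ann]
    }
}
```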
Why?
- The current code is voluminous, hard to write, and even harder to proofread. Changing the sort keys should
be a trivial maintenance task.
- The current technique suffers from name pollution.
- By treating multi-key compares as a special case, it may be possible to generate more efficient code for the
three way split on the compare result for each key. It might be possible to avoid the heavy overhead of the two
cast checks on each compare.
- Even COBOL does better than Java.
Why Not?
- A sort statement just does not fit into the Java model. Even for a single object, you might need several
different compare routines.
- RadixSort needs a quite different sort of compare routine.
- It would force a standard interface on all sort routines.
- You could get the same effect with a smart editor that generates the compare code for you.
- In Abundance each of the 50 primitive types has a standard compare routine. Java originally had no
equivalent, then it acquired the Comparable interface to define the natural sort order for each type. You
can’t tell from a Java String declaration how case should be considered in comparing, how accented
characters should be collated etc. Since the String class is final, you can’t create such distinctions
with subclassing.
- The feature should be more general than just for an array sort. Therefore the type needs to be specified
explicitly, e.g. comparing Employee desc salary asc hireDate.
- To use this syntax you would need the collating fields to be public.
- The new syntax should also allow functions as collating fields.
Units of Measure
Java treats all ints as equivalent as far as type safety is concerned. There is no way to subclass int to create
subcategories. You want additional type safety on your ints, floats and doubles:
- For enumerated types to make sure you don’t mix fruits and vegetables.
- For units of measure so you don’t feed kilograms to a parameter measured in meters.
- For subranges, so that you assure that a variable is within a given range, perhaps throwing an exception or
corralling it into range if it is not.
If you put in a units checking scheme, it is not hard to add a dimensionality check that formulas balance in
terms of whether they measure mass, length etc on both sides of the =. It is also not hard to add automatic unit
conversions to deal with any mismatches that are still dimensionally correct.
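In today’s Java the nearest approximation is a tiny wrapper class per unit, which buys the type safety at the cost of object overhead and without any dimensional analysis or automatic conversion. A minimal sketch, with Meters, Kilograms and fenceLength as hypothetical names:

```java
class UnitsDemo {
    static final class Meters {
        final double value;
        Meters(double value) { this.value = value; }
        Meters plus(Meters other) { return new Meters(value + other.value); }
    }

    static final class Kilograms {
        final double value;
        Kilograms(double value) { this.value = value; }
    }

    // Accepts only lengths. Passing Kilograms is a compile-time error.
    static Meters fenceLength(Meters width, Meters depth) {
        return width.plus(depth).plus(width).plus(depth);
    }

    public static void main(String[] args) {
        Meters perimeter = fenceLength(new Meters(3.0), new Meters(4.5));
        System.out.println(perimeter.value + " m");
        // fenceLength(new Kilograms(3.0), new Meters(4.5)); // would not compile
    }
}
```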
Why?
- Java has this fancy pants type system, but most type errors are in dealing with enums. Java makes
absolutely no attempt to deal with enum type safety.
- Units of measure would make Java appealing to engineers, both for safety and convenience.
Why Not?
- Java is for computer scientists. We don’t want grubby engineers using it.
- There exist type safe enum kludges. So what if they are incompatible with case labels. Let them use
ifs.
- Java is a sissy enough language. Manly programmers eschew type safety.
- There is already a type system for Objects in Java. Surely it should
deal with this. You should not have to add a totally separate type system for primitives just to deal with
enums, subranges, units of measure and design by contract.
- Let the wimps use preprocessors if they want extra type safety. None of this would affect the JVM (Java Virtual Machine) anyway.
Don’t muck up our nice clean language with new syntax I don’t want to have to learn. If it was good
enough for Kernighan, it should be good enough forever.
Credits
Many people helped formulate these ideas, usually by pointing out flaws in my proposed designs. I have only
started giving credit well after the page was started. Thus early contributors are unsung.
- Patricia Shanahan <pats@acm.org>
- Richard Freedman <rich_freedman@chiinc.com>
- Andrew R. Thomas-Cramer <artc@prism-cs.com>
- Nasser <nabassi@pacbell.net>
- Tim Tyler <tt@cryogen.com>
- Mr. Tines <@ravnaandtines.com>
- Luke Webber <luke@webber.com.au>
- Chris Gray <cg@ami-cg.GraySage.Edmonton.AB.CA>
- Charles (Russ) Lyttle <lyttlec@flash.net>
- Steve Bellenot <bellenot@math.fsu.edu>
- Edwin Guenthner <edwin.guenthner@gmx.de>
- Chris Smith <cdsmith@twu.net>