Bali is my hypothetical evolutionary replacement for Java, used to highlight
problems in Java that may eventually lead to syntax changes in Java itself.
Bali — Java with a Spoonful of Syntactic Sugar
Bali is a proposed superset of Java. It takes ideas from various other
languages. The intent is to create a language for application programmers that
is safer, terser, and easier to read and maintain than standard Java. Terseness
makes the language easier to read, write and maintain. The ideas come from
Abundance (see Byte Magazine, October 1986), Eiffel, Pascal, Delphi, Smalltalk,
PL/I, Algol-68, Forth and even COBOL.
Not all the ideas are compatible with each other. They are intended to stimulate
discussion on language evolution, not as a formal definition of a new language.
I have ordered them so that the ones least likely to be controversial are first.
I wrote this essay in the days of JDK 1.0. Since then, Java has implemented some
ideas similar to my recommendations, such as enums and a foreach loop. In light
of how Java has evolved, I would revise the precise syntax of some of these
suggestions to be more congruent with where Java is now. When you read those
sections, realise I am not advocating changing the details of the new Java
implementations.
I invite you to submit your own ideas for inclusion.
The Bali/Abundance Philosophy of Language Design
Simplicity is a wonderful thing, if not taken to an extreme. Some Nazis decided that
after they had conquered the world, they would need a simplified German for the
conquered masses to learn. A watermelon became a "large green egg-shaped
water fruit".
When a language (computer or natural) lacks a specific way to express an idea,
people will kludge a million variants. An attempt at simplicity leads to a
confusing tower of Babel (e.g. kludged enums in Java).
Human languages have thousands of words and a rich syntax. Computer languages
are ridiculously impoverished in comparison. I suggest maintaining computer code
would be much easier if we were much more liberal with syntactic sugar, so there
were terse standard ways of solving standard problems such as specifying
the keys to a sort, rather than writing pages of esoteric, eccentric bubblegum
to compare each field in turn.
Further, languages should be pronounceable. To the human mind,
unpronounced punctuation is just so much annoying fluff. Note that in natural
languages, people make far fewer mistakes in word choice than punctuation. As we
move away from the keyboard toward voice control (Dragon
Naturally Speaking) this becomes ever more important.
Computer languages should be pronounceable, so that you can think about them
aurally in the mind, and talk about them with others over the phone. This means
reducing the amount and pickiness of the punctuation in computer languages. I
waste an hour every day balancing () and {}. Much of this could be avoided by
computer language design that was closer to natural speech.
In his book, Pinker points out that the human brain is hardwired with grammar
processing wetware. I wonder if with our supposedly simple artificial computer
languages, we inadvertently bypass our hardwired syntax parsers. No wonder we
make so many programming mistakes. We should be aiming for languages more like
natural spoken languages that don’t depend so much on fussy punctuation.
Natural spoken languages evolve with new vocabulary and short-forms introduced
on-the-fly.
Every time I bring up these ideas publicly somebody will argue that I wish to
destroy Java by gilding the lily and needlessly “complicating” it.
I laugh to myself at such pompous asses, much as I would chuckle at a child who
pontificated on the relative merits of sex versus masturbation when he had only
tried one (or neither). Simplicity is a romantic notion. Living without
electricity is certainly "simpler" but it is a heck of a lot less
convenient, and a heck of a lot less productive. The code I advocate is orders
of magnitude simpler than the low level futzing that passes for programming
today. I have tried both. I promise you, the old way is horrible in
comparison. You will never want to go back, other than perhaps to rescue others
from their ignorant misery.
You will see these thoughts influencing the way I am pushing Java to evolve.
Variable Sized ( ) Display
In Java a piece of code might be displayed like this:
int a = ((b+c)/(e+f))*(g(i)+h);
That same piece of code displayed in Bali might look like this, with each
enclosing pair of parentheses drawn larger than the pairs it contains:
int a = ((b + c)/(e + f)) * (g(i) + h);
Red could be used just to highlight the outsized (), and colour-coding matching
() and {} is not such a bad idea either. It might even optionally be displayed
like this:
        b + c
int a = ----- * (g(i) + h);
        e + f
Why?
- The variable-size parentheses make it easier to balance them visually. This is
not a change to the language, but to how the language is presented. I consider
it immaterial whether we are talking about a change to the core language, the
standard libraries, or the way SCIDs present code. What counts is how the
language presents itself to the average maintenance programmer. She sees it as a
seamless whole.
- The alternative of named expressions means the links between the pieces are
abstract, not visual.
Why Not?
- When you get deep nesting you need extra space between lines.
- You should try hard to not use complex expressions needing this feature. Split
them into subexpressions and assign them to temporary variables or extract them
to methods with meaningful names.
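As a sketch of that advice, here is the example expression from the top of this
section split into named temporaries. The temporary names and the body given to
g are hypothetical; the essay never defines g.

```java
final class SplitExpression {
    private SplitExpression() {}

    // Hypothetical stand-in: the essay never says what g does.
    static int g(int i) {
        return i * 2;
    }

    // int a = ((b+c)/(e+f))*(g(i)+h);  rewritten with named temporaries.
    static int compute(int b, int c, int e, int f, int h, int i) {
        int numerator = b + c;
        int denominator = e + f;
        int ratio = numerator / denominator;   // integer division, exactly as in the one-liner
        int adjusted = g(i) + h;
        return ratio * adjusted;
    }
}
```

The price is that the reader now has to follow the names to reassemble the
formula, which is exactly the "abstract, not visual" link the Why? section
complains about.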
Assertions
Assertions should be brought over from Eiffel with as little modification as
possible. Better minds than I could concoct a more complete Java assertion
syntax. Eiffel assertions are much more extensive than I have limned here.
public int corral ( int low, int value, int high )
{
   require low <= high;
   ensure low <= result && result <= high;
   if value <= low then return low; fi;
   if value >= high then return high; fi;
   return value;
} end corral
Why?
- Assertions formally document the preconditions on a method’s parameters.
Whose job is it to handle invalid data, the caller or the callee?
- Assertions formally document the postconditions on what the method does, and
what changes it makes to the current object (this).
- Assertions constrain the behaviour of overriding methods.
- When turned on during debugging, assertions help flush out bugs.
Why Not?
- Assertions sometimes fail. Then what? Eiffel has a whole recovery system quite
different from try, throw, catch, finally. It would not graft well onto Java.
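Java later gained (in 1.4) a plain assert statement, which covers only a small
part of what Eiffel offers. A minimal sketch of corral using it, assuming the
require and ensure clauses are simply translated into assert statements at the
top and bottom of the method. The checks run only when the JVM is started with
-ea, there is no result keyword (the postcondition is checked on a local
variable before the single return), and nothing forces overriding methods to
honour the contract.

```java
final class Corral {
    private Corral() {}

    /** Clamps value into the closed range [low, high]. */
    static int corral(int low, int value, int high) {
        assert low <= high : "require low <= high";       // precondition
        int result;
        if (value <= low) {
            result = low;
        } else if (value >= high) {
            result = high;
        } else {
            result = value;
        }
        assert low <= result && result <= high : "ensure low <= result <= high"; // postcondition
        return result;
    }
}
```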
Canonical Counting For Loop
for i to n step 1 {… }
is just shorthand for:
for ( int i=0; i<n; i++ )
{
}
except that it still works correctly when the limit is Integer.MAX_VALUE, and the
step clause is optional when the step is 1.
Why?
Java for loops as they stand have some problems:
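- They don’t work properly when the limits involve Integer.MAX_VALUE or
Integer.MIN_VALUE. For example, i <= Integer.MAX_VALUE is always true, so i++
silently wraps around and the loop never ends.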
- The end condition traditionally is expressed with < n. Sometimes code
sneakily reads <= n, which is easy to misread.
- Innocent looking for loops that masquerade as a canonical for
(int i=0; i< n; i++) can actually do something quite different.
- They are needlessly verbose for the canonical case.
Why Not?
- It might be confusing to have two different ways to do the standard for.
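The overflow problem is easy to demonstrate: in Java, i++ on Integer.MAX_VALUE
wraps to Integer.MIN_VALUE, so for (int i = lo; i <= hi; i++) never terminates
when hi is Integer.MAX_VALUE. A minimal sketch of one way a compiler could lower
an inclusive counting loop so that it works at the limits: test for the last
value before incrementing. The class and method names are hypothetical.

```java
final class CountingLoop {
    private CountingLoop() {}

    // Calls body for every i from lo to hi inclusive, even when hi == Integer.MAX_VALUE.
    static void forInclusive(int lo, int hi, java.util.function.IntConsumer body) {
        if (lo > hi) {
            return;                 // empty range: body never runs
        }
        for (int i = lo; ; i++) {
            body.accept(i);
            if (i == hi) {
                break;              // checked before i++, so i never wraps past hi
            }
        }
    }
}
```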
Relaxed Semicolon Rules
I would like to relax the rule on semicolons. If a compiler can detect a
semicolon is "missing" surely it can logically insert it. Similarly,
if it can determine a semicolon is "extraneous", it can logically
remove it. For example, a semicolon before a } would become optional. Similarly,
the compiler would not freak out about an extra semicolon at the end of a for.
This would be perfectly legal:
for ( int i=0; i<n; i++; )
{
   a.doSomething();
   a.doSomethingElse()
}
A compiler might even forgive this with just a warning:
if ( i < 0 )
{
   a.doSomething()
   a.doSomethingElse()
}
A tidying program could insert and remove semicolons to some canonical standard
before saving code in the repository. The canonical form would likely have
semicolons before } just as now, to make it faster to insert a new line of code
at the end of the block.
Why?
- Even experienced programmers waste hours every week just finding and fixing
these pedantic little semicolon errors. It is ridiculous to pay people $100
per hour to manually find and fix semicolons, something a computer could easily
do automatically.
- Code with fewer semicolons is actually easier to read.
Why Not?
- Programmers shouldn’t make mistakes. Experienced programmers don’t
make those kinds of mistakes.
- I almost never make semicolon mistakes. Therefore the
feature is useless.
- It just encourages sloppy thinking.
- It will just start religious wars about the canonical form.
- Real men don’t need computers to help them get their semicolons correct.
- You can make semicolon errors impossible by using a SCID.
There is no need to change the language definition.
- The compiler might think there is only a semicolon missing - but inserting it
might just hide the real problem. Of course the compiler could ask to insert the
semicolon for you - in fact there are compilers doing this, for example
VisualAge for Java.
Optional Named End Statements
Java tends to overuse the { } characters. It is so easy to get them unbalanced.
The compiler is no help in finding the imbalance. Further, even when they are
perfectly balanced, it is hard to match them up by eye, especially when they are
widely separated or when the source has not been through a tidier to align them.
To solve this problem, Bali takes a leaf from PL/I and Algol-68. I don’t
dare yet propose a solution as radical as Abundance uses.
In Java you might write:
public void aMethod ( int aParm )
{
for ( int i=0; i<aParm; i++ )
{
System.out.println(i);
}
}
In Bali you may add optional end statements like this:
public void aMethod ( int aParm )
{
   for ( int i=0; i<aParm; i++ )
   {
      System.out.println(i);
   } end for
} end aMethod
The keywords you may use after end include:
- the name of the current method
- the name of the current class
- the name of the current loop
- for, then, else, while, switch, case, class, method, init, try, catch, finally.
If you use end statements, the compiler checks to ensure the preceding } does
indeed match that syntactic element.
Why?
- the end statements act as documentation to make the program easier to read.
- they are guaranteed accurate, unlike similar comments.
- They could be automatically generated by SCID-like tools.
- They would be automatically renamed along with the method by tools like VAJ
global method rename.
- they help the compiler generate more accurate error messages for unbalanced { }.
The compiler can nail down precisely where the mismatch occurred.
- they are shorter to write than the corresponding comments.
- they make it more likely that when you insert new code into an existing program
you will put it in the correct place, particularly adding a new method to the
class just before the final } of the class.
- they are likely to flush out {} nesting errors that technically balance, but
which don’t do what you intend.
- The alternative is to break the code into separate named methods. Then the links
between the pieces would be symbolic/verbal, rather than visual. With a SCID you
can collapse blocks so that the stringiness of inline code is not such a problem.
Why Not?
- end xxxx is quite verbose just to mark the ends of blocks. It might be
wiser to use more compact icons to mark begin and end.
- If you can’t easily match brackets at first glance, refactor your code
so that you can. Use an automatic code formatter to align them properly. If
there is too much code between them, extract parts of it into its own method.
- With a SCID, matching begin-end elements can be done visually. There is no need
for names.
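To make the compiler check concrete, here is a minimal sketch of the stack-based
matching it would do. It assumes a hypothetical, already-tokenised input: "{name"
is a { tagged with the construct it opens, "}" is a plain close, and "}name" is a
close followed by the optional end name. A real compiler would do this inside its
parser, but the principle is the same: a labelled close that disagrees with the
top of the stack pinpoints the mismatch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class EndChecker {
    private EndChecker() {}

    /** Returns the index of the first mismatched token, or -1 if everything matches. */
    static int firstMismatch(List<String> tokens) {
        Deque<String> open = new ArrayDeque<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.startsWith("{")) {
                open.push(t.substring(1));            // remember what this { opened
            } else if (t.startsWith("}")) {
                if (open.isEmpty()) {
                    return i;                         // } with no matching {
                }
                String expected = open.pop();
                String label = t.substring(1);
                if (!label.isEmpty() && !label.equals(expected)) {
                    return i;                         // end label names the wrong block
                }
            }
        }
        return open.isEmpty() ? -1 : tokens.size();   // unclosed blocks reported at end of input
    }
}
```

Unlabelled closes are accepted as before, so the check costs nothing in code
that does not use end statements.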
Explicit modifiers
Permit the use of the modifier instance to explicitly declare a variable
or method as non-static. Permit the use of the modifier package or friend
to explicitly declare a variable or method as not private, protected or public.
Why?
- In Java, you can declare a method or variable static, but there is no way to
explicitly declare it not static, i.e. instance. The lack of a declaration could
be an oversight, or it could be deliberate. You can’t tell. Similarly for
friend/package scope.
- There is currently not even a vocabulary to talk about friend visibility.
- There is no target string to search in the source code for friend and instance
methods.
- It makes people think about whether a method should be/is static or instance. So
often programmers are puzzled when they use instance methods as if they were
static. The instance keyword would jog their memories.
Why Not?
- There are already scads of modifiers. New modifiers would just clutter code.
Explicit modifiers would only make sense if you selected the modifiers with
radio buttons in the source code editor.
Extended CASE Labels on SWITCH: SELECT WHEN
In Bali case labels can have the following forms:
- integer constants
- ranges, e.g. 300..500
- strings
- variables (including objects compared with .equals). In regular Java it is not
good enough to have constants for your case labels known at load time; you need
ones known at compile time.
- boolean expressions, e.g. isValid(x), or any other expression that evaluates to
a boolean.
To avoid confusion with the current SWITCH, I suggest a new set of keywords:
select x
{
   when selected > 1000 :
      System.out.println( "huge" );
   when selected % 2 == 0 :
      System.out.println( "even" );
   when -100 .. -10, 0, +10 .. +100 :
      System.out.println( "boring" );
      System.out.println( "Note: no braces needed for each case" );
      System.out.println( "Note: no fallthrough" );
   other :
      System.out.println( "something else" );
}
Conceptually, each when clause is evaluated in turn to see if it is true.
However, the code can usually be compiled much more efficiently than that, with
a combination of nested ifs, binary search for ranges, jump tables, delegate
arrays, hashtables and indexed lookup tables. The code is both easier to read and
faster than traditional switches. See my student project on extended case, where
I go into more detail on implementation.
If you leave off the other clause and a value falls out the bottom unaccounted
for, you will get a runtime exception, or perhaps a compile-time error. If you
want nothing to happen, you must have an explicit empty other clause. You might
also implement decision tables that generate similar logic.
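For comparison, a sketch of how the select example might be hand-translated into
plain Java: an if/else-if chain that names the selector in every test, with the
other clause as the final else. The class and method names are hypothetical, and
the sketch returns the text instead of printing it. Because the clauses are tried
in order, 0 and the even numbers inside the ranges are reported as "even" before
the "boring" clause is ever reached, in the translation just as in the original.

```java
final class SelectDemo {
    private SelectDemo() {}

    static String classify(int selected) {
        if (selected > 1000) {
            return "huge";
        } else if (selected % 2 == 0) {
            return "even";
        } else if ((selected >= -100 && selected <= -10)
                || selected == 0
                || (selected >= 10 && selected <= 100)) {
            return "boring";
        } else {
            return "something else";   // the "other" clause
        }
    }
}
```

Omitting other in Bali would correspond to replacing the final else with
something like throw new IllegalStateException(...).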
Why?
- Maintenance is safer. The equivalent if statements redundantly specify the
switch variable in every test. That allows accidental divergence (what if one of
the tests accidentally specifies a different variable?). With select, the switch
variable appears in a single location, which also makes maintenance cheaper.
- It is easier to proofread code to see that all cases have been handled. By
leaving out the other clause, you can get the compiler
and runtime to help you ensure you have covered all cases.
- It is easier to read code to associate the actions with the conditions.
- It is terser than the equivalent ifs or individual case labels.
- The other case can be used to ensure no cases were overlooked. Granted, you can
do this with else, but with nested ifs it is much easier to have a leak: a case
that is never handled.
- A compiler might do better at optimising because of the additional structure
over the equivalent nested ifs. For example, for large range bands that
are contiguous, the code need test only one end of the range per band, not both
as would typically be done in nested ifs. The compiler might generate a binary
search to determine the range band. Implementing that sort of search by hand in
nested ifs would be unmaintainable.
- Ideally you want to avoid complicated case or nested if logic, but there are
times when there appears to be no other way to handle it. You need every tool
possible to help tame the mess and make the code comprehensible. There be
dragons in such places even if they are rare.
Why Not?
- It is fairly easy for a case to accidentally fall between the cracks and be
handled by other.
- Compilers don’t know enough to safely generate optimal code for the cases.
Humans writing code know more and can therefore generate faster code even if the
code is harder to follow.
- It is cool to have ifs nested so deep that it doesn’t fit on your screen
any more, even though you indent only one character per if-statement. See How
to Write Unmaintainable Code.
- Proper use of design patterns should make most complicated nested if
statements unnecessary.
C#-like JavaBean Properties
Bali properties could look like Eiffel, Delphi or C#. I think the best approach
is C#.
In Java you write a pair of accessor methods, such as getHeight and setHeight,
and clients must call them explicitly.
In Bali you would write that as:
class MyClass
{
   published int height
      get
      {
         return mHeight;
      }
      set
      {
         if ( 50 < value && value < 275 ) mHeight = value;
      }
   private int mHeight;
}
x.height++;
published is a sort of super public declaration that means this
property should be visible in the beanbox. To provide read/write access you
write the get/set code. To deny read/write access you
leave out the corresponding get/set code. To the
clients of the property, the property looks like an ordinary public member
variable.
get, set and value are reserved words. get and set
introduce accessor/guard routines for internal variables. If you leave out the
get or set clause, you suppress the ability to read or write the property.
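For comparison, a minimal sketch of the JavaBean-style accessor pair that the
Bali height property stands in for, using the same guard. In this style the
client's x.height++ becomes x.setHeight(x.getHeight() + 1).

```java
final class MyClass {
    private int mHeight;

    public int getHeight() {
        return mHeight;
    }

    public void setHeight(int value) {
        if (50 < value && value < 275) {
            mHeight = value;   // out-of-range values are silently ignored, as in the Bali set clause
        }
    }
}
```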
Why?
- Properties let you convert a public variable into a pair of guard functions
without having to change the client code.
- Client get/sets written with properties are much easier to read and write. They
look just like ordinary variables.
- In situations where it is important to be able to tell if an identifier
represents a variable or a function a SCID could come to the rescue. It could
colour code them, or even expand properties back to function notation. You could
also use a naming convention to help in distinguishing access to a private
member variable directly and via the property within the class. You would call
the private member mSize and the property pSize. All members begin with m and
all properties with p. See coding conventions.
- This syntax makes a clear distinction between external properties and internal
variables needed to implement them. There is no overloading of names with the
ensuing confusion. There is no problem controlling visibility independently.
Why Not?
- Property already has a meaning in Java —
a list of keyword/value pairs. Some other name such as attribute should
be used for such a feature. Perhaps they should be called pseudovariables.
- Clients should be aware whether they are using a function or a variable.
- Competent programmers never use public variables. They set them up as functions
right off the bat even if the routines are just dummies. Thus there is no need
ever to flip client code when a public variable changes to a function.
- Perhaps it might be better to simply allow functions without arguments to be
written without trailing (). That would allow you to convert a variable to an
accessor function without changing any client code and without coding any extra
declarations.
- Properties are an attempt to invent PL/I-style pseudo functions, functions that
can be used on the left hand side of the = sign. Another way of looking at the
problem, properties are an attempt to allow you to override the effect of the =
operator. Perhaps instead of properties, we should invent a more general
mechanism that allows overloaded functions that can be used on either side of
the = sign, with multiple arguments. Then we would have a way of dealing with
tuples returned from a function. If a function can have multiple inputs, why
should it not have multiple outputs? If a function can accept several variables
to access, surely a function should be able to accept several variables to store.
A more general mechanism like this would lead to simpler syntax for accessing
the Vector methods.
- If you are debugging and tracing code, you want to know if an identifier
represents a variable or a function.
- The existence of language support for get/set methods applying to private data
may encourage treatment of classes as structs with get/set methods for every
physical data member. This is a common misconception even without language
support, especially among beginners; fostering it would be harmful.
The Grand Collection Unification
Containers of all types should have a uniform and simple syntax for manipulating
them. The concept of container should be expanded as it is in Abundance to
include files, Hashtables, b-trees (both ram and disk based), maps, sets, arrays,
ArrayLists and any other Collection you could imagine. There are three basic
operations: get, set and iterate. I suggest the following uniform syntax to
apply to all Map-like things, and set-like things where applicable. I will use a
Hashtable as an example:
Hashtable() animal;
animal["cow"] = "moo";
animal["pig"] = "oink";
String noise = animal["cow"];
if ( noise == null ) noise = "don't know";
for each key in animal
{
   System.out.println( key + " says " + animal[key] );
}
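Today's java.util.Map interface gets part of the way there. A minimal sketch of
the animal example in modern Java (8 or later, for Map.getOrDefault), assuming a
LinkedHashMap so the loop visits keys in insertion order; a Hashtable or HashMap
would visit them in an unspecified order. The class and method names are
hypothetical, and the sketch returns its lines instead of printing them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AnimalNoises {
    private AnimalNoises() {}

    static List<String> demo() {
        // Only this line names the implementation; everything else uses the Map interface.
        Map<String, String> animal = new LinkedHashMap<>();
        animal.put("cow", "moo");
        animal.put("pig", "oink");

        List<String> lines = new ArrayList<>();
        lines.add(animal.getOrDefault("cow", "don't know"));    // key present
        lines.add(animal.getOrDefault("horse", "don't know"));  // key absent: the default
        for (Map.Entry<String, String> entry : animal.entrySet()) {
            lines.add(entry.getKey() + " says " + entry.getValue());
        }
        return lines;
    }
}
```

The get and set still read as method calls rather than subscripts, which is the
part of the proposal Java never adopted.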
Why?
- It makes it easy to change your mind later how to implement a collection. All
you need do is change the declaration. All the rest of the code stays the same.
- The code for file handling is very much simpler.
- It makes the language easier to learn.
- The code is easier to proofread.
Why Not?
- This permits functions of a single variable on the left hand side of the equal
sign. Surely you should be able to have multiple selector arguments. You might
as well generalize the notion of properties to allow them to have 0 to n
selectors, or generalise all methods to have both a right hand side producer
implementation and an optional left hand side consumer implementation. You might
as well use the familiar () syntax, e.g.
map(latitude, longitude) = "Johnny's Bar & Grill";
String restaurant = map(latitude, longitude, slop);
- All collections code looks alike. You can’t tell if file I/O or an array
or an ArrayList is being manipulated. It isn’t natural!
- Iterators need to be smarter than that. You need ways of selecting subsets.
Enumerations
Java currently has no enumerations. There is the Enumeration interface, but that
is just an Iterator over a collection, not a language feature. I mean
enumerations in the sense of Pascal enum or Ada-95 enumeration type. Java has
only static final constants. There is no formal connection between groups of
constants, or between the groups of constants and the variables that use them.
There is no formal mechanism to distinguish the two kinds of enumeration
constants:
- enumeration integer constants that represent one of a possible list of choices.
- mask constants that can be ORed together to represent combinations of choices.
I suggest Java should properly support enumerations. This proposal is influenced
by Ada, Pascal and C’s enum. The type declarations would look like this:
enum DayOfWeek { SUN, MON, TUE, WED, THU, FRI, SAT };
set DaysOfWeek { SUN, MON, TUE, WED, THU, FRI, SAT };
or in shorthand:
set DaysOfWeek of DayOfWeek;
You could then declare a variable as:
DayOfWeek weekday = DayOfWeek.SAT;
or
DaysOfWeek weekEnd = DaysOfWeek.SAT | DaysOfWeek.SUN;
Enums are much like classes or interfaces. They could be standalone or could be
declared inside a class or interface, and made private or public.
Inside the class you would reference the enum value as DayOfWeek.WED (possibly
just WED if there were no ambiguity); outside, as TheClass.DayOfWeek.WED. The
enum constants could be used in case labels.
To handle enums that are individual bits I suggest a syntax like this:
set PizzaToppings { MOZZARELLA, PEPPERONI, TOMATOES, GREENPEPPER, PINEAPPLE, GOUDA, CRANBERRIES, ANCHOVIES };
You could then declare a variable as:
PizzaToppings pizzaOrder = PizzaToppings.TOMATOES | PizzaToppings.CRANBERRIES;
if ( pizzaOrder & PizzaToppings.ANCHOVIES ) …
enums and sets are types, and are type checked in parameter passing and
assignment. Currently Java’s static final ints have no type checking to
make sure you don’t try to add a Monday topping to your pizza.
You would have a cast of the form (DayOfWeek) or (PizzaToppings)
that converted a plain int into a member of that enum. It would raise an
exception if it did not match one of the official values of the enum or set. You
could also cast an enum to an int and a set to a long. You could cast an enum to
its related set but you could not cast a set back to an enum.
There is no formal super/subset relationship between different enums or sets.
There are a number of pseudo functions you can perform on enums. They have a
syntax that simulates both static and instance forms: e.g.
day = day.first();
day = day.prev();
day = day.next();
day = day.last();
day = DayOfWeek.first();
day = DayOfWeek.prev( DayOfWeek.WED );
day = DayOfWeek.next( DayOfWeek.WED );
day = DayOfWeek.last();
next and prev raise an InvalidEnumException if you run off the end. Case labels
may use enum constants, pseudo-functions of enum constants, and enum ranges, e.g.
switch ( day )
{
   case DayOfWeek.first() .. DayOfWeek.WED :
      dosomething();
      break;
   case DayOfWeek.next( DayOfWeek.WED ) :
      dosomethingelse();
      break;
   case DayOfWeek.last() :
      dosomethingelseagain();
      break;
   default :
      complain();
      break;
}
Enum variables have the following operators defined on them: + int, - int,
enum - enum, <, >, ==, !=, []. Set variables have the following operators
defined on them: | & ^ ~ == !=.
Implementation
Internally, for efficiency an enum is an int 0..63 and a set is a long. Since
there is no inheritance, type checking could be done completely at compile time.
A standalone enum is implemented as an interface with a group of static final
ints. You might test the idea by implementing it with a preprocessor.
When you say PizzaToppings.ANCHOVIES the compiler
automatically selects the index or the bitmap representation depending on
context. This will get rid of one of the main sources of error in handling
enumerations in Java, confusion over whether
an enum constant is intended as an index or a bit map.
I debate with myself back and forth if you should be allowed to assign explicit
values to the enum constants the way you can in C++.
The disadvantage of doing that is you ruin the next
and prev functions. The advantage is it makes it
easier to interface with the outside world such as databases that use ints, not
enums. At this point in time I feel it best to let the compiler assign the
numbers always contiguous, starting at 0. If you need breaks in the ordering,
write a method to produce it.
Languages like Pascal manage to use the same enumeration constants in both enum
and set contexts. That trick is probably beyond the ability of simple-minded
Java compiler logic, though it would be preferable from the maintenance
programmer’s point of view.
I think by default you should be able to use your enums in either mode, either
as indexes or as bits in a set. The compiler should figure out which from the
context and should generate both forms internally.
Just how big should an enum be: an 8-bit index with a 256-bit set, or a 16-bit
index with 8 KB (65,536-bit) sets?
Have a good look at how enums work in other languages. They may have cleaner
solutions than I have suggested here.
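Java 5 eventually added enum classes and java.util.EnumSet, which cover much of
this proposal with different syntax; EnumSet is documented as being represented
internally as a bit vector. A minimal sketch of the essay's examples in that
style. The next and prev methods are hypothetical helpers built on ordinal() and
values(), and they throw NoSuchElementException where the essay proposes an
InvalidEnumException. The plain enum PizzaTopping plus an EnumSet takes the place
of the proposed set PizzaToppings.

```java
import java.util.EnumSet;
import java.util.NoSuchElementException;

enum DayOfWeek {
    SUN, MON, TUE, WED, THU, FRI, SAT;

    // Hypothetical helpers mirroring the essay's next/prev pseudo-functions.
    DayOfWeek next() {
        if (this == SAT) {
            throw new NoSuchElementException("no day after SAT");
        }
        return values()[ordinal() + 1];
    }

    DayOfWeek prev() {
        if (this == SUN) {
            throw new NoSuchElementException("no day before SUN");
        }
        return values()[ordinal() - 1];
    }
}

enum PizzaTopping {
    MOZZARELLA, PEPPERONI, TOMATOES, GREENPEPPER, PINEAPPLE, GOUDA, CRANBERRIES, ANCHOVIES
}

final class EnumDemo {
    private EnumDemo() {}

    static String kindOfDay(DayOfWeek day) {
        switch (day) {              // enum constants are legal case labels
            case SAT:
            case SUN:
                return "weekend";
            default:
                return "weekday";
        }
    }

    static boolean wantsAnchovies(EnumSet<PizzaTopping> order) {
        // Replaces the mask test  pizzaOrder & PizzaToppings.ANCHOVIES;
        // putting a DayOfWeek into this set would be a compile-time type error.
        return order.contains(PizzaTopping.ANCHOVIES);
    }
}
```

Here EnumSet.of(PizzaTopping.TOMATOES, PizzaTopping.CRANBERRIES) plays the role of
TOMATOES | CRANBERRIES. Ranges in case labels and arithmetic such as enum + int
have no direct equivalent.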
Why?
- Enums have type checking. You can’t accidentally use the wrong enum
constant, or use the enum form where you needed the set form. Consider a class
like GridBagLayout that has all manner of groups of enum constants. Currently it
is very easy to use one from the wrong group, e.g. LEFT and WEST.
- Various kludges using enumeration objects won’t work in case clauses, don’t
have a simple predictable external representation, and have the extra overhead
of an object rather than a simple int.
- It ensures you don’t use a constant from the wrong enumeration group.
- It ensures you use an enumeration constant in the correct way, as a value or as
a bit.
- It makes maintaining code much easier. It is quite difficult without
enumerations to figure out which constants are legitimate to use with which
variables. My suggested scheme makes it completely clear, and recruits
the compiler’s help to ensure the rules are followed.
- The formal connection enables a debugger to display the enum value by name,
rather than just by value.
- Having associated integers allows comparison for order, storage in a database,
permanent values for use in native classes, and enumeration over all possible
values.
- Making the compiler assign the enum values without gaps means that next and
prev are well defined and simple to implement. They can be used in for loops to
enumerate over all possibilities.
- You ideally want to view your code two different ways:
  - method focus: How does each subclass implement this particular method? I want
    to see all the implementations side by side.
  - subclass focus: What are the particular behaviours of this subclass?
  Enumerations work better for method focus. You have all the code for all
  subclasses right under your nose, combined in one method. Subclassing works
  better for subclass focus. With SCIDs, you could use subclassing but still get
  the benefits of the old enumeration method focus. You could flip back and forth
  viewing and coding either way.
Why Not?
- It may be too complicated to figure out the type rules. What sorts of arithmetic
are you allowed on the various enumeration values without casts to store them
back in variables? You want to avoid casts that could introduce errors. You
probably want + int, - int for regular enumerations, | & ~ for set style
enumerations. How much run-time type checking should be done to ensure
enumeration constants are always valid?
- Templates are coming. There will be better ways to kludge enumerations as
primitives that take only 20 pages of code. This will give Roedy a chance to
become famous by writing an amanuensis to generate all the bubblegum from a list
of the enumeration names.
- Enums are not primitives. They are not classes. They are a whole new thing.
Fitting an entire new class of animal into a language takes some deep thinking.
It has to get along with everything else. Ideally enums could be defined in a
way that did not require any changes to the class file format. They might only
need exist at compile time. At run time, they would be treated just like ints.
Enums are related to subranges and units of measure. Perhaps some more general
solution should be sought that handles all three.
- Enumerations are mostly used in switch statements and the like. They don’t
make use of polymorphism. The Refactoring.com site has better solutions, such as
Replace Type Code with Subclasses, Replace Type Code with State/Strategy and
Replace Type Code with Class.
Fetch
The fetch keyword just helps shorten a common idiom. I find myself typing
declarations over and over that spell the same name three times: as the type,
as the variable and in the getter call.
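Judging by the "three repetitions" mentioned under Why? below, the idiom is
presumably a declaration like the one in this sketch, where a single name appears
as the type, the variable and the getter. All the classes here are hypothetical.
The sketch also shows Java 10's var, which removes the type name but leaves the
other two repetitions.

```java
final class LongComplicatedThingamagig {
    final String label;

    LongComplicatedThingamagig(String label) {
        this.label = label;
    }
}

final class MotherObject {
    private final LongComplicatedThingamagig thing = new LongComplicatedThingamagig("widget");

    LongComplicatedThingamagig getLongComplicatedThingamagig() {
        return thing;
    }
}

final class FetchDemo {
    private FetchDemo() {}

    static String demo(MotherObject motherObject) {
        // The idiom: one name spelled three times (type, variable, getter).
        LongComplicatedThingamagig longComplicatedThingamagig = motherObject.getLongComplicatedThingamagig();
        // Java 10+ var infers the type, so the type name is no longer written.
        var thingamagig = motherObject.getLongComplicatedThingamagig();
        return longComplicatedThingamagig.label + "/" + thingamagig.label;
    }
}
```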
This pattern is so common it would be useful to have an abbreviation both for
typing and proofreading. Maybe:
fetch motherObject.getLongComplicatedThingamagig();
Why?
It is faster to type and easier to proofread. One letter out in those three
repetitions, and the code has a totally different meaning.
Why Not?
It legitimizes the goofy getXXX convention for what should be Delphi-like
properties.
Augments
The augments keyword is much like the extends
keyword.
- It stops you from deliberately or accidentally overriding any method in the base
class.
- It lets you extend even final classes with additional convenience methods
without creating wrapper methods.
Why?
- Saves writing repetitive wrapper methods.
- Ensures methods are not accidentally overridden, especially in superclasses of
the immediate base class.
- Ensures methods are not accidentally overridden if the base class is later
updated.
Why Not?
- Perhaps this would be better handled with explicit override, original and
overridable keywords.
via
The via keyword lets you implement an is-a interface
via a has-a reference to some class that implements it. It makes it much easier
to write wrapper methods. You might use it like this:
public class DoIt implements SomeInterface via foo
{
    private AClassThatImplementsSomeInterface foo;
}
Why?
- Saves writing repetitive wrapper methods to implement an interface with a has-a
reference.
- It reduces arthritis. You can change the interface without having to maintain
gobs of code.
Why Not?
- It muddles is-a with has-a logic a bit more than the purists would like.
- It encourages sloppy design, as it tempts you to extend a class with
functionality without thinking about the actual responsibilities of the class.
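Without via, you write the forwarding methods by hand. A sketch of what the compiler might generate for DoIt, assuming SomeInterface declares two methods; the method names name() and size() are hypothetical.

```java
public class ViaSketch {
    // Hypothetical interface; the real SomeInterface could declare anything.
    interface SomeInterface {
        String name();
        int size();
    }

    static class AClassThatImplementsSomeInterface implements SomeInterface {
        public String name() { return "delegate"; }
        public int size() { return 3; }
    }

    // What "implements SomeInterface via foo" would save you from writing:
    // one forwarding method per interface method, kept in sync by hand.
    static class DoIt implements SomeInterface {
        private final AClassThatImplementsSomeInterface foo = new AClassThatImplementsSomeInterface();

        public String name() { return foo.name(); }
        public int size() { return foo.size(); }
    }

    public static void main(String[] args) {
        SomeInterface s = new DoIt();
        System.out.println(s.name() + " " + s.size());
    }
}
```

Every method added to SomeInterface means another forwarding method in every such wrapper, which is the maintenance burden via is meant to remove.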
Constructor Shortcut
In Java you would write:
BigDate d = new BigDate( 1997, 5 , 6 );
In Eiffel you would write:
d : BigDate;
d.Create( 1997, 05, 06 );
In Bali you can use the usual Java syntax or this shortcut:
BigDate (1997, 5, 6) d;
Why?
- You don’t have to write the name of the class twice.
- You are less likely to forget to initialise.
- It is impossible to create an object of the wrong type.
Why Not?
- The new syntax masks the fact you are creating a new object.
- The new syntax looks too much like an ordinary method call.
Iterate Shortcut
In Java you might write:
for ( CitiesByState iter = new CitiesByState(); iter.hasMoreElements(); )
{
    City city = (City) iter.nextElement();
    System.out.println( city.population );
}
In Bali, you could abbreviate this to:
for each city in CitiesByState
{
println (city.population);
}
Why?
- it is a lot easier to read and proofread.
- there is less likelihood of error typing it.
- it opens the door for abbreviations to other for idioms.
Why Not?
- The Java scheme gives you extra typing practice and makes your keystroke per day
output higher.
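Java 5 later added an enhanced for loop that comes close to this proposal. It works on arrays and on anything implementing java.lang.Iterable. A sketch, assuming CitiesByState is reworked into an Iterable&lt;City&gt;; both classes here are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IterateSketch {
    // Hypothetical stand-ins for the City and CitiesByState classes in the text.
    static class City {
        final String name;
        final int population;
        City(String name, int population) { this.name = name; this.population = population; }
    }

    static class CitiesByState implements Iterable<City> {
        private final List<City> cities = new ArrayList<>();
        CitiesByState add(City c) { cities.add(c); return this; }
        public Iterator<City> iterator() { return cities.iterator(); }
    }

    static int totalPopulation(CitiesByState state) {
        int total = 0;
        // Enhanced for: no explicit iterator variable, no cast.
        for (City city : state) {
            System.out.println(city.population);
            total += city.population;
        }
        return total;
    }

    public static void main(String[] args) {
        CitiesByState state = new CitiesByState()
                .add(new City("Alpha", 100))
                .add(new City("Beta", 250));
        System.out.println(totalPopulation(state));
    }
}
```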
Null Method Shortcut
In Java you might write:
if ( person != null ) person.requestPassport();
In Bali, you could abbreviate this to:
person..requestPassport();
or
if ( a != null && a.b != null && a.b.c != null ) d = a.b.c.doit();
else d = null;
would become
d = a..b..c..doit();
Why?
- There is no possibility of a minor mismatch of the name tested in the if and the
variable used to access the method.
- Programmers are lazy and tend to ignore null cases. If you make it easy for them
to deal with them, they are more likely to keep them in mind.
- This syntax really starts to pay off when you have a chain of links, each of
which could potentially be null. You can even generalise it to return 0 or null
on encountering a null when a value is being computed.
- The alternative null Object pattern requires writing an immense amount of
bubblegum that essentially does nothing. If that could be mechanically generated
somehow, the null object pattern would be much more attractive. It potentially
also could be faster, since all the implied null tests, and most of the explicit
ones could in theory be eliminated. Vrroom!
Why Not?
- You might want to reserve .. for use in defining ranges.
- You should use the NullObject
pattern instead.
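Java 8's java.util.Optional gets part of the way there without new syntax: Optional.ofNullable followed by a chain of map calls yields an empty result at the first null link. A sketch with hypothetical classes A, B and C modelling the a.b.c.doit() chain:

```java
import java.util.Optional;

public class NullSafeChain {
    // Hypothetical classes modelling the a.b.c.doit() chain from the text.
    static class C { String doit() { return "done"; } }
    static class B { C c; B(C c) { this.c = c; } }
    static class A { B b; A(B b) { this.b = b; } }

    // Roughly the Bali expression d = a..b..c..doit(): null if any link is null.
    static String chain(A a) {
        return Optional.ofNullable(a)
                .map(x -> x.b)     // map() turns a null result into an empty Optional
                .map(x -> x.c)
                .map(C::doit)
                .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(chain(new A(new B(new C()))));
        System.out.println(chain(new A(new B(null))));
        System.out.println(chain(null));
    }
}
```

It works, but it is hardly terser than the explicit if, which rather supports the case for a dedicated operator.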
Conversions
In Java, casting is used for two quite different purposes:
- to request a conversion.
- to request that an object be treated as an instance of some sub or superclass
without actual conversion. You are merely reassuring the compiler the object in
question truly is already of the type asserted. You as programmer know
this must be so from the way the program logic works, though this is not
necessarily immediately obvious to the compiler.
There are literally hundreds of different ways of requesting the first kind of
cast, a conversion. The simple (type) notation only works in a handful of cases.
The second kind of cast is uniformly done with (ClassName).
In Bali there are similarly two types of cast, but the way you request them is
totally uniform:
- (DesiredType) — for conversion.
- (is DesiredType) or possibly (as
DesiredType) — for treatment as sub/superclass.
You can even convert objects to primitives, objects to objects, and primitives
to objects with a conversion cast.
How are conversions to objects done? By looking for a constructor that takes
a primitive or object as the sole argument.
How are conversions from objects to primitives done? By looking for methods with
the names intValue(), longValue(), etc.
How are conversions from primitives to primitives done? Conceptually by looking
for static methods with names like Convert.intValue(double), though in practice
these are handled inline as special cases.
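To make the lookup rules concrete, here is a run-time emulation using reflection. It is only a sketch: a Bali compiler would resolve these lookups at compile time, and the convert method is hypothetical. It also demands an exact match on the argument's class, so a boxed Integer would not find a constructor declared with an int parameter, such as BigDecimal(int).

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

public class ConvertSketch {
    /**
     * Hypothetical emulation of the Bali (DesiredType) conversion rules:
     * object to primitive via a no-arg xxxValue() method, object to object via
     * a public constructor whose sole parameter is exactly the value's class.
     */
    static Object convert(Object value, Class<?> target) {
        try {
            if (target.isPrimitive()) {
                // int.class -> "intValue", double.class -> "doubleValue", ...
                Method m = value.getClass().getMethod(target.getName() + "Value");
                return m.invoke(value);
            }
            Constructor<?> c = target.getConstructor(value.getClass());
            return c.newInstance(value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "no conversion from " + value.getClass().getName() + " to " + target.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert(3.7, int.class));             // via Double.intValue(), truncates
        System.out.println(convert("abc", StringBuilder.class)); // via new StringBuilder(String)
    }
}
```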
Why?
- Simplification. Code is easier to write and read when there is a uniform way of
requesting conversions.
- accuracy. You are more likely to get the correct method.
- it makes it clear when data are actually being transformed.
Why Not?
- It requires changing existing code to insert the as or is.
- It makes no sense that . is a postfix operator but (cast)
is a prefix operator. It leads to complicated parenthesis forests. Perhaps we
need a new pair of postfix casts, e.g. (is DesiredType)
and (to DesiredType).
Generic Cast (convert)
I further suggest that the generic caster (convert) be
allowed to convert where the compiler knows the type of the target, e.g.
float x = 1.0f;
String s = (String) x;
String t = (convert) x;
String u = String.valueOf(x);
Why?
- If the types of x or t change, e.g. x becomes a double, all the code
automatically adjusts to suit. You only need change the declarations. The
cardinal rule of writing maintainable code is that you specify each fact in one
and only one place. Java flagrantly violates that rule in three places: casts,
conversion functions, and primitive temporary variables that don’t track
the type of corresponding major variables.
- When you are writing code you don’t need to constantly be quite so aware
of the precise types of each variable.
Why Not?
- It makes it harder to add new types and conversion functions. Any such
additions won't be built in.
Symmetrical Then/Else
In Bali, the if is more like Eiffel's. The C-style if is deprecated, but
still supported. In Java you might write:
if ( (a = b == c) || (d == e) )
{
    f = g;
    r = s;
}
else
{
    f = h;
    r = t;
}
In Bali you would write that as:
if (a = b == c) || (d == e)
then f = g; r = s;
else f = h; r = t; fi;
Why?
- the then and else are more symmetrical. The true and false actions align which
enhances comparing their similarities and differences.
- it is easier for a human to pick out where the condition ends and the true
action clause starts.
- it helps reduce the parenthesis () forest in the condition by one level.
- it helps reduce the brace {} forest in the actions by one level.
- If you switch from a simple if action to a complex one, you don’t have to
add {}. The syntax is consistent.
Why Not?
- then fi is more keystrokes than () { }.
- fi is an ad hoc end for an if. There should be a consistent system where every
block type has a unique matching begin end pair. You can’t very well use
that fi technique for rof, chtiws, esac, elihw, yrt, hctac.
Up Front Declares
You might write something like this:
dcl maxCount : int,
s : String,
myObject : SomeUserClass;
Why?
- In Java, you have to plough through a great string of symbols before you find
the name of the thing being defined.
- In Java there is no target string you can search for to find declarations.
- With this scheme, declarations align.
- It lets you find declares in listings and email posts where your SCID tools are
not available.
Why Not?
- It is too fundamental a change.
- It does not extend properly to allow definitions in the middle of expressions.
- It takes more keystrokes.
- The initialisation syntax is clumsy. It is handled separately from the
declaration.
- Without changing the Java syntax, a visual editor could solve the problem by
displaying a series of declarations aligned, possibly with some detail
suppressed. Similarly a visual editor could bold face the identifiers being
defined to make them easier to pick out. Java’s declarations are quite a
bit simpler than C and C++’s. Such
drastic measures are not needed in Java.
Explicit Concatenation Operator
I suggest that + for concatenation be deprecated, and soon outlawed altogether.
In its place we will use |||.
Why?
- In Java, + does double duty: addition and concatenation. + on an int sometimes
means add, sometimes concatenate, depending on what surrounds it. That is just
plain Mickey Mouse, not to mention hard to read and bug inducing.
Why Not?
- It takes three keystrokes instead of one.
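The double duty of + is easy to demonstrate: the operator is evaluated left to right, and once one operand is a String, every later + concatenates. A small sketch; the examples method is only a convenient container.

```java
public class ConcatSketch {
    // + is left-associative: once a String appears, each later + concatenates.
    static String[] examples(int a, int b) {
        return new String[] {
            "" + a + b,        // both + concatenate
            a + b + "",        // first + adds the ints, second concatenates
            "Sum: " + a + b,   // the classic bug: digits are glued together
            "Sum: " + (a + b)  // parentheses force the addition
        };
    }

    public static void main(String[] args) {
        for (String s : examples(1, 2)) {
            System.out.println(s);
        }
    }
}
```

With a = 1 and b = 2 the four strings are "12", "3", "Sum: 12" and "Sum: 3".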
Move Corresponding
I propose something similar to the COBOL MOVE CORRESPONDING
that cannot be expressed in the usual linear language syntax. It requires dialog
boxes.
A MOVE CORRESPONDING should display a dialog box with
a list of fields that have matching names in object a
and b. Beside each name is a little checkbox that can
have four states:
- copy
- ignore
- use custom copying code, e.g.
    switch (a.x) {
        case 1: b.x = 10; break;
        case 2: b.x = 15; break;
        default: b.x = a.x; break;
    }
- have not decided yet (this generates a syntax error on compile).
Simply adding a new field to either a or b
will create new entries in all MOVE CORRESPONDING
tables (marked not yet decided). The editor also lets you view any leftover
fields from a and/or leftover fields from b.
Why?
- Copying lists of fields from one object to another is common when versioning.
You rearrange fields, add fields and need to convert the old files to the new format.
In Java, SQL handles this problem automatically for rows. You still have the
problem of dealing with updating old objects or records in flat files. You also
have it when you export a file of some subset of the information.
- In my home-brew language Abundance, I found that a COBOL-style move
corresponding was NOT quite what was needed. I needed something of the form MOVE
CORRESPONDING EXCEPT FOR… The other problem was maintenance. When
you add new fields, you need to examine all the move correspondings to figure
out just what should be done with the fields. It is easy to overlook them,
because they don’t explicitly name the variables affected. My proposal
lets you ensure you have not forgotten to handle every field.
Why Not?
- This form of MOVE CORRESPONDING cannot be expressed
without dialog boxes that dynamically examine the lists of fields in the two
objects.
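The dialog box cannot be shown in code, but the run-time mechanics of a MOVE CORRESPONDING EXCEPT FOR can be sketched with reflection. This is a minimal sketch, not the proposed editor feature: moveCorresponding, OldRecord and NewRecord are hypothetical, only same-named fields of identical type are copied, and the returned list of leftovers stands in for the editor's view of unmatched fields.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class MoveCorrespondingSketch {
    // Hypothetical old and new versions of a record.
    static class OldRecord { String name = "Ann"; int age = 42; String fax = "555-0100"; String notes = "x"; }
    static class NewRecord { String name; int age; String email; String notes; }

    /**
     * Copies each instance field of from into the same-named, same-typed field of to,
     * skipping names in except. Returns the names of from's fields with no counterpart.
     */
    static List<String> moveCorresponding(Object from, Object to, Set<String> except) {
        List<String> leftovers = new ArrayList<>();
        for (Field src : from.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(src.getModifiers()) || src.isSynthetic()
                    || except.contains(src.getName())) {
                continue;
            }
            try {
                Field dst = to.getClass().getDeclaredField(src.getName());
                if (dst.getType() != src.getType() || Modifier.isStatic(dst.getModifiers())) {
                    leftovers.add(src.getName());
                    continue;
                }
                src.setAccessible(true);
                dst.setAccessible(true);
                dst.set(to, src.get(from));
            } catch (NoSuchFieldException e) {
                leftovers.add(src.getName());
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return leftovers;
    }

    public static void main(String[] args) {
        OldRecord from = new OldRecord();
        NewRecord to = new NewRecord();
        // MOVE CORRESPONDING EXCEPT FOR notes
        List<String> leftovers = moveCorresponding(from, to, Collections.singleton("notes"));
        System.out.println(to.name + " " + to.age + " leftovers=" + leftovers);
    }
}
```

Note that this run-time version cannot produce the compile-time "not yet decided" error; that is exactly the part that needs the editor.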
Primitive Object Methods
In an early version of Java, primitives were objects. This made the language
simpler, but turned out to have too much overhead. It may be possible to bring
back part of this notion. For example, I think you should be able to write
y = x.sin() instead of y = Math.sin(x), and i.toString()
instead of Integer.toString(i).
Why?
- The new notation avoids mentioning a class that has nothing to do with the
operation. In Integer.toString(i) there is no Integer
object involved. It is confusing to use a notation that implies there is.
- The notation is consistent with the way you handle objects. Why should there be
different notations to do the same thing?
- It may open the door to primitives even more like true objects, where you can
create subclasses of integers with type safety checking.
Why Not?
- You still could not define your own methods to act on primitives as objects.
Until there was a way to subclass from primitives, the notation would only be
useful for built-in methods.
- It is good to have separate notations for methods on objects and primitives. It
helps make the distinction between objects and primitives clear.
Short constant names
In Java you must write:
MyClass x = new MyClass();
x.doSomething( MyClass.ACONSTANT1, MyClass.ACONSTANT2 );
In Bali, you could abbreviate that:
MyClass x = new MyClass();
x.doSomething( ACONSTANT1, ACONSTANT2 );
The compiler resolves otherwise unknown constants appearing between the ( ) of a
method call by implicitly prefixing them with the class whose method is being
invoked, here MyClass.
Why?
- Saves typing the class name over and over.
Why Not?
- There are conditions where the name ACONSTANT2 could be ambiguous. It may be
better to disambiguate it always, rather than just when it is needed.
- Extending the scope of a class to parameter calls to its methods is just too
weird.
Spaciousness
I suggest the following simplification to the compiler’s parser (the
tokeniser actually). All operands and operators must be separated by a space,
with the following exceptions: comma, unary +, unary -, ++, --, ., ;, (), []
In Java you might write:
x=a++--b*-(c++<<1);
in Bali, you would have to write that as:
x = a++ - -b * -(c++ << 1);
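The spaced form happens to be legal Java too, which makes it easy to check what it means. The crammed Java line above is in fact rejected by the compiler: the tokeniser always takes the longest operator it can, so it reads a++ followed by --b with no operator between them. A small sketch with arbitrary starting values:

```java
public class SpacedExpression {
    // Returns { x, a, c } after evaluating the spaced expression from the text.
    static int[] evaluate(int a, int b, int c) {
        // Crammed as x=a++--b*-(c++<<1); this would not compile, because
        // "--" is tokenised as a single decrement operator.
        int x = a++ - -b * -(c++ << 1);
        return new int[] { x, a, c };
    }

    public static void main(String[] args) {
        int[] r = evaluate(5, 3, 2);
        // a++ yields 5, -b is -3, c++ yields 2 and 2 << 1 is 4,
        // so x = 5 - ((-3) * (-4)) = -7; afterwards a is 6 and c is 3.
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```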
Why?
- it gets rid of potentially ambiguous (to humans) strings of + and -.
- it makes code more readable.
- It removes one more temptation to write unmaintainable code.
- it speeds compilation.
- it opens the door to Forth/Abundance-like user-defined operators and methods
with almost arbitrary names. You could use letters, digits, any punctuation and
any Unicode characters except ( ) [ ] + - , . ; and space. You might use
characters such as these for your new user-defined operators:
| Range | Unicode Characters to Use in User-Defined Operators |
| \u2200 .. \u22ff | mathematical operators |
| \u2600 .. \u26ff | miscellaneous symbols |
| \u2460 .. \u24ff | enclosed alphanumerics |
| \u25a0 .. \u25ff | geometric shapes |
| \u2100 .. \u214f | letterlike symbols |
- The alternative is to break the expression up and name the pieces. Then the
relationships between the parts are symbolic/verbal rather than visual. With
SCID-like tools to format large expressions, such expressions become
comprehensible, more comprehensible than traditional code with named
subexpressions. If you don't believe me, perform an experiment. Your
visual bandwidth is much higher than your verbal bandwidth.
Why Not?
- It takes up more screen space and takes more keystrokes.
- Cramming would be a very hard habit to break.
- You simply shouldn’t write such hard to read expressions.
Divide and Modulus on Negative Numbers
If you divide two numbers, you can do four things with the result:
- round: useful for approximating, e.g. speed = round( distance / time )
- ceil: useful to calculate b = ceil( n/m ), how many bins b, each of which can
hold m items, are needed to hold n items.
- floor: useful to calculate b = floor( i/m ), which bin number b the item i
falls into when each bin can hold m items.
- trunc: Java-style division. No known use.
| Signs (dividend, divisor) | Java Division | Bali Division | Java Modulus | Bali Modulus |
| + + | +7/+4=+1 | +7/+4=+1 | +7%+4=+3 | +7%+4=+3 |
| - + | -7/+4=-1 | -7/+4=-2 | -7%+4=-3 | -7%+4=+1 |
| + - | +7/-4=-1 | +7/-4=-2 | +7%-4=+3 | +7%-4=-1 |
| - - | -7/-4=+1 | -7/-4=+1 | -7%-4=-3 | -7%-4=-3 |
Bali uses floored division: if the quotient is fractional, Bali takes the next
lower integer. In Bali, the remainder has the same sign as the divisor, and the
absolute value of the remainder is always less than the absolute value of the
divisor.
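Java 8 later added Math.floorDiv and Math.floorMod, which compute the Bali columns of the table while leaving / and % unchanged. A small sketch that reproduces the table and the random-number pitfall; binsNeeded and binOf are hypothetical helper names, and binsNeeded assumes non-negative n, positive m and no overflow.

```java
public class FlooredDivision {
    // ceil(n / m) in integer arithmetic, assuming n >= 0, m > 0 and no overflow.
    static int binsNeeded(int n, int m) {
        return (n + m - 1) / m;
    }

    // floor(i / m): the bin holding item i, correct for negative i as well.
    static int binOf(int i, int m) {
        return Math.floorDiv(i, m);
    }

    public static void main(String[] args) {
        int[][] cases = { {7, 4}, {-7, 4}, {7, -4}, {-7, -4} };
        for (int[] c : cases) {
            int n = c[0], d = c[1];
            // Java's / and % truncate; floorDiv and floorMod give the Bali columns.
            System.out.printf("%3d/%3d  Java q=%2d r=%2d  floored q=%2d r=%2d%n",
                    n, d, n / d, n % d, Math.floorDiv(n, d), Math.floorMod(n, d));
        }
        int r = -5;                               // e.g. a value from Random.nextInt()
        System.out.println(r % 3);                // -2: outside 0..2
        System.out.println(Math.floorMod(r, 3));  // 1: in 0..2 for a positive divisor
        System.out.println(binsNeeded(10, 4));    // 3
        System.out.println(binOf(-1, 4));         // -1, whereas -1 / 4 is 0
    }
}
```

Both conventions satisfy quotient * divisor + remainder == dividend; they differ only in which way a fractional quotient is rounded.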
Why?
- Java division happily satisfies the division identity: when you multiply the
quotient by the divisor and add the remainder, you get back the dividend. Bali
also has that property.
- Java's modulus behaves, well, strangely. In Java, the sign of the remainder
follows the dividend, not the divisor. Be especially careful when corralling
random numbers into a smaller range with the modulus operator: it can produce a
negative result. For example, when you ask for a number modulo 3 in Java, you
will be astounded to sometimes get a negative answer outside the range 0..2.
Bali conforms with the principle of least astonishment by always returning a
result that follows the sign of the divisor.
- So often in Java code I have to write ifs
surrounding the % to get the Bali effect. I have
never coded a case where I wanted the Java effect.
- Every language I have encountered defines the way integer division and modulus
work for negative numbers differently. The only one I found useful in practice was
Forth’s floored approach. In every other language I found myself handling
negative cases specially doing the arithmetic on the absolute values. In BigDate
you can see examples where floored divide/modulus would simplify code.
Why Not?
- The problem with the approach is hardware usually does not work this way.
Implementing this convention in software would slow down code for the usual all
positive case.
- You can’t change rules like that in midstream. It may introduce subtle
bugs in existing code.
Explicit Override
In standard Java, the rules of inheritance and overriding depend on both
static/instance and variable/method. Methods override, variables shadow.
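The asymmetry is easy to demonstrate. Since Java 5, the @Override annotation also supplies part of what the proposed override keyword asks for: if the annotated method does not actually override anything, for instance because its name is misspelled, compilation fails. X and Y are hypothetical classes.

```java
public class OverrideVsShadow {
    static class Y {
        String name = "Y field";
        String describe() { return "Y method"; }
    }

    static class X extends Y {
        String name = "X field";   // hides (shadows) Y.name; it does not override it

        @Override                  // compile error if this did not match a method of Y
        String describe() { return "X method"; }
    }

    public static void main(String[] args) {
        Y y = new X();
        System.out.println(y.describe()); // methods override: chosen by the run-time class
        System.out.println(y.name);       // fields shadow: chosen by the static type Y
    }
}
```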
In Bali, the rules are more uniform. Methods override, but you must declare them
as overriding to ensure you don’t do it by accident (e.g. if the base
class later adds a clashing method.) Constants override. Variables may not be
overridden or shadowed. It just causes confusion. In Java, if you intend to
override a method, and get the signature slightly different, or slightly
misspell the method name, the compiler will not warn you. Your new method will
just be effectively ignored. This is particularly a problem in overriding
methods in adapter classes. In Bali, if you specify override
and there is no matching method to override, the compiler will warn you.
You might write something like this:
class X extends Y
{
    override const int AnomalyYear = 1582;

    override public int getMonth()
    {
        return month + 1;
    }

    original public int getMoonPhase()
    {
        …
    }
}
Why?
- Lets you create a subclass, reconfiguring the tweaking constants of the base
class. If you use the base class you get the original tweakers, if the new class,
the new tweakers.
- You can’t accidentally override or shadow.
- Rules are more uniform. Confusion over the existing rules is a source of bugs.
- It warns you that an apparently useless method is actually being used by some
possibly distant ancestor class.
Why Not?
- You can’t fiddle with something this basic to a language.
- The explicit override and original declarations just clutter the code. They
won't help safety, because most programmers will not bother to declare either way.
- Final currently means both "do not override" and "no changes to
the value after definition." This change requires a redefinition of the
meaning of final. Final would then mean "do not override". The new
keyword const would mean "no changes to the value after definition."
- Stealing const for this purpose may block other more imaginative uses of the
keyword in future.
- Shadowing is just an extension of the local variable principle. If you get rid
of it you destroy encapsulation. Changes to one module force changes to an
unrelated one.
Sort Interface
In Abundance, you can sort an array or file with a piece of code like this:
BY DESCENDING Salary ASCENDING Hire-Date SORT
Abundance uses postfix notation. You don’t even need to specify the name
of the array or file, since the compiler can easily deduce that. I would like
something similar for Bali like this:
HeapSort.sort(anArray, desc salary asc hireDate);
As it is, in Java, among several other things,
you need to compose a new delegate class with a compare routine something like
this:
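A sketch of such a delegate, assuming the standard java.util.Comparator interface as the delegate type and a hypothetical Employee class, followed by the Java 8 key-extractor form, which is the closest standard Java comes to desc salary asc hireDate:

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortSketch {
    // Hypothetical employee record; hireDate is a plain day number to keep the sketch short.
    static class Employee {
        final String name;
        final int salary;
        final long hireDate;
        Employee(String name, int salary, long hireDate) {
            this.name = name;
            this.salary = salary;
            this.hireDate = hireDate;
        }
    }

    // Delegate-class style: one hand-written three-way comparison per key.
    static class BySalaryDescHireDateAsc implements Comparator<Employee> {
        public int compare(Employee a, Employee b) {
            int bySalary = Integer.compare(b.salary, a.salary); // b before a: descending
            if (bySalary != 0) {
                return bySalary;
            }
            return Long.compare(a.hireDate, b.hireDate);        // ascending
        }
    }

    // Java 8+ key-extractor style.
    static final Comparator<Employee> DESC_SALARY_ASC_HIRE_DATE =
            Comparator.comparingInt((Employee e) -> e.salary).reversed()
                      .thenComparingLong(e -> e.hireDate);

    static String[] sortedNames(Comparator<Employee> order) {
        Employee[] staff = {
            new Employee("Ann", 50000, 300),
            new Employee("Bob", 70000, 200),
            new Employee("Cy", 50000, 100)
        };
        Arrays.sort(staff, order);
        String[] names = new String[staff.length];
        for (int i = 0; i < staff.length; i++) {
            names[i] = staff[i].name;
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedNames(new BySalaryDescHireDateAsc())));
        System.out.println(Arrays.toString(sortedNames(DESC_SALARY_ASC_HIRE_DATE)));
    }
}
```

Both orderings put Bob first, then Cy and Ann, who share a salary and are ordered by hire date.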
Why?
- The current code is voluminous, hard to write, and even harder to proofread.
Changing the sort keys should be a trivial maintenance task.
- The current technique suffers from name pollution.
- By treating multi-key compares as a special case, it may be possible to generate
more efficient code for the three way split on the compare result for each key.
It might be possible to avoid the heavy overhead of the two cast checks on each
compare.
- Even COBOL does better than Java.
Why Not?
- A sort statement just does not fit into the Java model. Even for a single object,
you might need several different compare routines.
- RadixSort needs a quite different sort of compare routine.
- It would force a standard interface on all sort routines.
- You could get the same effect with a smart editor that generates the compare
code for you.
- In Abundance each of the 50 primitive types has a standard compare routine. Java
originally had no equivalent, then it acquired the Comparable interface to
define the natural sort order for each type. You can’t tell from a Java
String declaration how case should be considered in comparing, how accented
characters should be collated etc. Since the String class is final, you can’t
create such distinctions with subclassing.
- The feature should be more general than just for an array sort. Therefore the
type needs to be specified explicitly, e.g. comparing
Employee desc salary asc hireDate.
- To use this syntax you would need the collating fields to be public.
- The new syntax should also allow functions as collating fields.
Units of Measure
Java treats all ints as equivalent as far as type safety is concerned. There is
no way to subclass int to create subcategories. You want additional type safety
on your ints, floats and doubles:
- For enumerated types to make sure you don’t mix fruits and vegetables.
- For units of measure so you don’t feed kilograms to a parameter measured
in meters.
- For subranges, so that you assure that a variable is within a given range,
perhaps throwing an exception or corralling it into range if it is not.
If you put in a units checking scheme, it is not hard to add a dimensionality
check that formulas balance in terms of whether they measure mass, length etc on
both sides of the =. It is also not hard to add automatic unit conversions to
deal with any mismatches that are still dimensionally correct.
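Until something like this is built in, the usual workaround is a small wrapper class per unit. A sketch with hypothetical Kilograms and Meters classes: the compiler now keeps the two apart, at the cost of an object per value, and with none of the automatic dimensional algebra described above. The feet conversion uses the exact definition 1 ft = 0.3048 m.

```java
public class UnitsSketch {
    // Hypothetical wrapper types: distinct classes, so the compiler rejects mixing them.
    static final class Kilograms {
        final double value;
        Kilograms(double value) { this.value = value; }
    }

    static final class Meters {
        final double value;
        Meters(double value) { this.value = value; }
        Meters plus(Meters other) { return new Meters(value + other.value); }
        // A hand-written conversion for a dimensionally correct mismatch.
        static Meters fromFeet(double feet) { return new Meters(feet * 0.3048); }
    }

    // Accepts only a length; passing a Kilograms is a compile-time error.
    static double lengthInCentimeters(Meters m) { return m.value * 100; }

    // Accepts only a mass.
    static double massInGrams(Kilograms k) { return k.value * 1000; }

    public static void main(String[] args) {
        Meters total = new Meters(2).plus(Meters.fromFeet(10));
        System.out.println(lengthInCentimeters(total));
        System.out.println(massInGrams(new Kilograms(1.5)));
        // lengthInCentimeters(new Kilograms(5)); // does not compile: incompatible types
    }
}
```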
Why?
- Java has this fancy pants type system, but most type errors are in dealing with
enums. Java makes absolutely no attempt to deal with enum type safety.
- Units of measure would make Java appealing to engineers, both for safety and
convenience.
Why Not?
- Java is for computer scientists. We don’t want grubby engineers using it.
- There exist type safe enum kludges. So what if they are incompatible with case
labels. Let them use ifs.
- Java is a sissy enough language. Manly programmers eschew type safety.
- There is already a type system for Objects in Java.
Surely it should deal with this. You should not have to add a totally separate
type system for primitives just to deal with enums, subranges, units of measure
and design by contract.
- Let the wimps use preprocessors if they want extra type safety. None of this
would affect the JVM anyway. Don’t muck up our nice clean language with
new syntax I don’t want to have to learn. If it was good enough for
Kernighan, it should be good enough forever.
Credits
Many people helped formulate these ideas, usually by pointing out flaws in my
proposed designs. I only started giving credit well after the page was
started. Thus early contributors are unsung.
- Patricia Shanahan <pats@acm.org>
- Richard Freedman <rich_freedman@chiinc.com>
- Andrew R. Thomas-Cramer <artc@prism-cs.com>
- Nasser <nabassi@pacbell.net>
- Tim Tyler <tt@cryogen.com>
- Mr. Tines <@ravnaandtines.com>
- Luke Webber <luke@webber.com.au>
- Chris Gray <cg@ami-cg.GraySage.Edmonton.AB.CA>
- Charles (Russ) Lyttle <lyttlec@flash.net>
- Steve Bellenot <bellenot@math.fsu.edu>
- Edwin Guenthner <edwin.guenthner@gmx.de>
- Chris Smith <cdsmith@twu.net>