Bali is my hypothetical evolutionary replacement for Java. It is meant to highlight problems in
Java that may lead to changes in Java's syntax itself.
Bali — Java with a Spoonful of Syntactic Sugar
Bali is a proposed superset of Java. I wrote this essay in the days of
JDK (Java Development Kit) 1.0. Some of the features
I requested, such as enums, are now part of Java. In light of how Java has evolved, I
would revise the precise syntax of some of these suggestions to be more congruent
with where Java is now. It takes ideas from various other languages. The intent is to
create a language for application programmers that is safer, terser, easier to read
and maintain than standard Java. Terseness makes the language easier to read, write
and maintain. The ideas come from Abundance (See Byte Magazine October 1986), Eiffel, Pascal,
Delphi, Smalltalk, PL/I, Algol-68, Forth and even COBOL.
Not all the ideas are compatible with each other. They are intended to stimulate
discussion on language evolution, not as a formal definition of a new language. I
have ordered them so that the ones least likely to be controversial are first.
Since this was written, Java has implemented some ideas similar to my
recommendations, such as enums and a foreach loop. When you read those sections,
realise I am not advocating changing the details of the new Java implementations.
I invite you to submit your own ideas for inclusion.
The Bali/Abundance Philosophy of Language Design
Simplicity is a wonderful thing, if not taken to extremes. Some Nazis decided that after they had
conquered the world, they would need a simplified German for the conquered masses to
learn. A watermelon became a large green egg-shaped water fruit.
When a language (computer or natural) lacks a specific way to express an idea,
people will kludge a million variants. An attempt at simplicity leads to a confusing
tower of Babel (e.g. kludged enums in Java).
Human languages have thousands of words and a rich syntax. Computer languages are
ridiculously impoverished in comparison. I suggest maintaining computer code would be
much easier if we were much more liberal with syntactic sugar, so there were terse
standard ways of solving standard problems such as specifying the
keys to a sort, rather than writing pages of esoteric, eccentric bubblegum to compare
each field in turn.
Further, languages should be pronounceable. To the human mind,
unpronounced punctuation is just so much annoying fluff. Note that in natural
languages, people make far fewer mistakes in word choice than punctuation. As we move
away from the keyboard toward voice control (Dragon Naturally Speaking) this becomes ever more important.
Computer code should be pronounceable, so that you can think about it aurally in
the mind and talk about it with others over the phone. This means reducing the
amount and pickiness of the punctuation in computer languages. I waste an hour every
day balancing () and {}. Much of this could be avoided by computer language design
that was closer to natural speech.
Recommended book: The Language Instinct: How the Mind Creates Language
by Steven Pinker (born 1954-09-18)
publisher: Harper Perennial, published 2000-11-07
ISBN: 978-0-06-095833-6 (paperback), 978-0-688-12141-9 (hardcover), 978-0-06-203252-2 (eBook), 978-1-4558-3970-4 (audio); Kindle: B0049B1VOU
This is one of the most fascinating books I have ever read. It turns out that the brain has hardware for parsing language. Shortly after birth, the parsing engine gets configured to the particular language it hears. Humans without access to a common language will spontaneously develop a pidgin language and within a generation or two will develop a full creole with a complex grammar.
In his book, Pinker points out that the human brain is
hardwired with grammar processing wetware. I wonder if with our supposedly simple
artificial computer languages, we inadvertently bypass our hardwired syntax parsers.
No wonder we make so many programming mistakes. We should be aiming for languages
more like natural spoken languages that don’t depend so much on fussy
punctuation. Natural spoken languages evolve with new vocabulary and short-forms
introduced on-the-fly.
Every time I bring up these ideas publicly somebody will argue that I wish to
destroy Java by gilding the lily and needlessly complicating it. I laugh to myself at such pompous asses, much as I
would chuckle at a child who pontificated on the relative merits of sex versus
masturbation when he had only tried one (or neither). Simplicity is a romantic
notion. Living without electricity is certainly simpler
but it is a heck of a lot less convenient and a heck of a lot less productive. The
code I advocate is orders of magnitude simpler than the low level futzing that passes
for programming today. I have tried both. I promise you, the old way is
horrible in comparison. You will never want to go back, other than
perhaps to rescue others from their ignorant misery.
You will see these thoughts influencing the way I am pushing Java to evolve.
Variable Sized ( ) Display
Bali, in the editor, might display like this:
int a = ( (b+c)/(e+f) ) * ( g(i)+h );
That is how JDisplay shows Java listings now. It might even be optionally displayed like
this:
        b + c
int a = ----- * (g(i) + h);
        e + f
Why?
- The variable size parentheses make it easier to visually balance. This is not a
change to the language, but to how the language is presented. I find it immaterial
whether we are talking about a change to the core language, the standard libraries,
or the way the SCIDs present code.
What counts is how the language presents itself to the average
maintenance programmer. She sees it as a seamless whole.
- The alternative of named expressions means the links between the pieces are
abstract, not visual.
Why Not?
- When you get deep nesting you need extra space between lines.
- You should try hard to not use complex expressions needing this feature. Split
them into subexpressions and assign them to temporary variables or extract them to
methods with meaningful names.
Assertions
Assertions should be brought over from Eiffel with
as little modification as possible. Better minds than I could concoct a more complete
Java assertion syntax. Eiffel assertions are much more extensive than I have limned
here.
Why?
- Assertions formally document the preconditions on a method’s parameters.
Whose job is it to handle invalid data, the caller or the callee?
- Assertions formally document the postconditions on what the method does and
what changes it makes on this.
- Assertions constrain the behaviour of overriding methods.
- When turned on during debugging, assertions help flush out bugs.
Why Not?
- Assertions sometimes fail. Then what? Eiffel has a whole recovery system quite
different from try, throw, catch, finally. It would not graft well onto Java.
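As a rough illustration of the pre- and postconditions listed above, here is a minimal sketch in present-day Java. It uses explicit checks rather than Eiffel-style clauses. The class Account and its methods are hypothetical names invented for this example; they are not part of the Bali proposal.

```java
// Hypothetical example class, invented for illustration only.
final class Account {
    private long balanceCents;

    long balance() {
        return balanceCents;
    }

    void deposit(long cents) {
        // Precondition: the caller is responsible for supplying a positive amount.
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive: " + cents);
        }
        long old = balanceCents;
        // addExact throws ArithmeticException instead of silently overflowing.
        balanceCents = Math.addExact(old, cents);
        // Postcondition: the balance grew by exactly the deposited amount.
        if (balanceCents - old != cents) {
            throw new IllegalStateException("postcondition violated");
        }
    }
}
```

Unlike Eiffel assertions, checks written this way do nothing to constrain overriding methods; each override would have to repeat them by hand.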
Canonical Counting For Loop
In Java you would write:
for ( int i=0; i<n; i++ )
{
}
In Bali you would write:
for i to n step 1
{
...
} end for
The Bali loop differs in that it works even if n is Integer.MAX_VALUE. The step clause is
optional if the step is 1.
Why?
Java for loops as they stand have some problems:
- They don’t work properly when the limits involve Integer.MAX_VALUE or
Integer.MIN_VALUE.
- The end condition traditionally is expressed with < n. Sometimes code
sneakily reads <= n, which is easy to misread.
- Innocent looking for loops that masquerade as a canonical for (int i=0; i< n; i++) can actually do something quite
different.
- They are needlessly verbose for the canonical case.
Why Not?
- It might be confusing to have two different ways to do the standard for.
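The Integer.MAX_VALUE problem can be shown with a small sketch. An inclusive loop written as for (int i = from; i <= to; i++) never terminates when to is Integer.MAX_VALUE, because i++ wraps around to Integer.MIN_VALUE and the test stays true. One possible workaround in plain Java, shown below, is to count with a long. The class and method names are hypothetical.

```java
final class InclusiveLoop {
    // Counts the iterations of an inclusive loop from..to.
    // The counter is a long, so i++ cannot wrap when to == Integer.MAX_VALUE.
    static long countIterations(int from, int to) {
        long count = 0;
        for (long i = from; i <= to; i++) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // With an int counter and the same <= test, this loop would never end.
        System.out.println(countIterations(Integer.MAX_VALUE - 2, Integer.MAX_VALUE)); // prints 3
    }
}
```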
Relaxed Semicolon Rules
I would like to relax the rule on
semicolons. If a compiler can detect a semicolon is missing, surely it can logically insert it. Similarly, if it can
determine a semicolon is extraneous, it can logically
remove it. For example, a semicolon before a } would become optional. Similarly, the
compiler would not freak about an extra semicolon at the end of a for. For example, this
would be perfectly legal:
for ( int i=0; i<n; i++ ; )
{
    a.doSomething();
    a.doSomethingElse()
}
A tidying program could insert and remove semicolons to some canonical standard
before saving code in the repository. The canonical form would likely have semicolons
before } just as now, to make it faster to insert a new line of code at the end of
the block.
Why?
- Even experienced programmers waste hours every week just finding and fixing
these pedantic little semicolon errors. It is ridiculous to pay people $100 per hour to manually find and fix semicolons, something a
computer could easily do automatically.
- Code with fewer semicolons is actually easier to read.
Why Not?
- Programmers shouldn’t make mistakes. Experienced
programmers don’t make those kind of mistakes.
- I almost never make semicolon mistakes. Therefore
the feature is useless.
- It just encourages sloppy thinking.
- It will just start wars about the canonical form.
- Real men don’t need computers to help them get their semicolons
correct.
- You make semicolon errors impossible by using a SCID. There is no need to change the language definition.
- The compiler might think there is only a semicolon missing — but
inserting it might just hide the real problem. Of course, the compiler could ask to
insert the semicolon for you — in fact there are compilers doing this, for
example VisualAge for Java.
Optional Named End Statements
Java tends to overuse the { }
characters. It is so easy to get them unbalanced. The compiler is no help on finding
the imbalance. Further, even when they are perfectly balanced, it is hard to match
them up by eye, especially when they are widely separated or when the source has not
been through a tidier to align them.
To solve this problem, Bali takes a leaf from PL/I and Algol-68. I don’t
dare yet propose a solution as radical as Abundance uses.
In Java you might write:
public void aMethod( int aParm )
{
    for ( int i=0; i<aParm; i++ )
    {
        System.out.println( i );
    }
}
In Bali you may add optional end statements, like this:
public void aMethod( int aParm )
{
    for ( int i=0; i<aParm; i++ )
    {
        System.out.println( i );
    } end for
} end aMethod
The keywords you may use after end include:
- the name of the current method
- the name of the current class
- the name of the current loop
- for, then, else, while, switch, case, class, method, init, try, catch,
finally.
If you use end statements, the compiler checks to ensure the preceding } does
indeed match that syntactic element.
Why?
- the end statements act as documentation to make the program easier to
read.
- they are guaranteed accurate, unlike similar comments.
- They could be automatically generated by SCID-like tools.
- They would be automatically renamed along with the method by tools like
VAJ (Visual Age for Java) global method
rename.
- they help the compiler generate more accurate error messages for unbalanced {
}. The compiler can nail down precisely where the mismatch occurred.
- they are shorter to write than the corresponding comments.
- they make it more likely that when you insert new code into an existing program
you will put it in the correct place, particularly adding a new method to the class
just before the final } of the class.
- they are likely to flush out {} nesting errors that technically balance, but
which don’t do what you intend.
- The alternative is to break the code into separate named methods. Then the
links between the pieces would be symbolic/verbal, rather than visual. With a
SCID (Source Code In Database) you can collapse blocks
so that the stringiness of inline code is not such a problem.
Why Not?
- end xxxx is quite verbose just to mark the ends of blocks. It
might be wiser to use more compact icons to mark begin and end.
- If you can’t easily match brackets at first glance, refactor your code
so that you can. Use an automatic code formatter to align them properly. If there
is too much code between them, extract parts of it into its own method.
- With a SCID, matching begin-end elements can be done visually.
There is no need for names.
Explicit modifiers
Permit the use of the modifier
instance to explicitly declare a variable or method as non-static.
Permit the use of the modifier package or friend to
explicitly declare a variable or method as not private, protected or public.
Why?
- In Java, you can declare a method or variable static, but there is no way to
explicitly declare it not static, i.e. instance. The lack of a declaration could be
an oversight, or it could be deliberate. You can’t tell. Similarly for friend
/package scope.
- There is currently not even a vocabulary to talk about friend visibility.
- There is no target string to search in the source code for friend and instance
methods.
- It makes people think about whether a method should be/is static or instance.
So often programmers are puzzled when they use instance methods as if they were
static. The instance keyword would jog their memories.
Why Not?
- There are already scads of modifiers. New modifiers would just clutter code.
Explicit modifiers would only make sense if you selected the modifiers with radio
buttons in the source code editor.
Extended CASE labels on SWITCH: SELECT WHEN
In Bali case labels can have the following forms:
- integer constants
- ranges, e.g. 300..500
- strings
- variables (including objects compared with .equals). In regular Java it is not
good enough to have constants for your case labels known at load time, you need
ones known at compile time.
- Boolean expressions, e.g. isValid(x), or regular expressions that evaluate to a
Boolean.
To avoid confusion with the current SWITCH I suggest a new set of keywords:
Conceptually each when clause is executed in turn to
see if it is true. However, the code can usually be compiled much more
efficiently than that with a combination of nested if, binary search for range, jump
tables, delegate arrays, hashtables and indexed lookup tables. The code is both
easier to read and faster than traditional switches. See my student project on extended case where I go into more
detail on implementation.
If you leave off the other clause, you will get a
runtime exception, or perhaps a compile time error if a value falls out the bottom
unaccounted for. If you want nothing to happen, you must have an explicit empty
other clause. You might also implement decision tables that generate similar logic.
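To make the "binary search for range" idea concrete, here is a minimal sketch of what a compiler might generate for contiguous range bands, hand-written in ordinary Java. The band boundaries and labels are invented for the example, and the class name RangeBands is hypothetical. Because the bands are contiguous, only the lower bound of each band is stored and tested. A value below the first band throws, mimicking a missing other clause.

```java
import java.util.Arrays;

final class RangeBands {
    // Lower bounds of contiguous bands in ascending order (made-up values).
    private static final int[] LOWER = {0, 300, 500, 1000};
    private static final String[] LABEL = {"low", "medium", "high", "extreme"};

    static String classify(int value) {
        if (value < LOWER[0]) {
            // Nothing matches: behaves like falling out the bottom with no "other" clause.
            throw new IllegalArgumentException("no band for " + value);
        }
        int pos = Arrays.binarySearch(LOWER, value);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1,
        // so the band containing the value is insertionPoint - 1.
        int band = pos >= 0 ? pos : -pos - 2;
        return LABEL[band];
    }
}
```

Written as nested ifs, the same logic would test both ends of every band and would be much harder to proofread.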
Why?
- Maintenance is safer. The equivalent if statements would redundantly
specify the switch variable, which allows accidental divergence (what if one of the
cases accidentally specifies a different switch variable?). Maintenance is also cheaper,
since the switch variable can be modified in a single location.
- It is easier to proofread code to see that all cases have been handled. By
leaving out the other clause, you can get the compiler
and runtime to help you ensure you have covered all cases.
- It is easier to read code to associate the actions with the conditions.
- It is terser than the equivalent ifs or individual case labels.
- The other case can be used to ensure no cases were overlooked. Granted you can
do this with else, but it is much easier with nested ifs to have
a leak — a case that is never handled.
- A compiler might do better at optimising because of the additional structure
over the equivalent nested ifs. For example, for large range bands that
are contiguous, the code need test only one end of the range per band, not both as
would typically be done in nested if. The compiler might generate a binary
search to determine range band. That sort of code implemented in nested if
would create unmaintainable code.
- Ideally you want to avoid complicated case or nested if logic, but there are
times when there appears to be no other way to handle it. You need every tool
possible to help tame the mess and make the code comprehensible. There be dragons
in such places even if they are rare.
Why Not?
- It is fairly easy for a case to accidentally fall between the cracks and be
handled by other.
- Compilers don’t know enough to safely generate optimal code for the
cases. Humans writing code know more and can therefore generate faster code even if
the code is harder to follow.
- It is cool to have ifs nested so deep that it doesn’t fit on your screen
any more, even though you indent only one character per if-statement. See
How to Write Unmaintainable Code.
- Proper use of design patterns should make most complicated nested if
statements unnecessary.
C#-like JavaBean Properties
Bali properties could look like
Eiffel, Delphi or C#. I think the best approach is C#.
In Java you write:
In Bali you would write that as:
published is a sort of super public declaration that means this property should be visible in the
beanbox. To provide read/write access you write the get/set
code. To deny read/write access you leave out the corresponding get/set code. To the clients of the property, the property looks like
an ordinary public member variable.
get, set and value are reserved words. get and
set introduce accessor/guard routines for internal
variables. If you leave out the get or set clause, you suppress the ability to read
or write the property.
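For readers unfamiliar with the convention being replaced, the following is a minimal sketch of a guarded property written as an ordinary Java getter/setter pair. It is not the listing from the original essay. The class Gauge and the property size are hypothetical names chosen for illustration.

```java
// Hypothetical class illustrating today's JavaBean get/set convention.
final class Gauge {
    private int size; // internal variable behind the property

    // Read accessor. Leaving it out would make the property write-only.
    public int getSize() {
        return size;
    }

    // Write accessor acting as a guard. Leaving it out would make the property read-only.
    public void setSize(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("size must not be negative: " + value);
        }
        this.size = value;
    }
}
```

A client has to write gauge.setSize( gauge.getSize() + 1 ), whereas with properties the same client code would read like an assignment to an ordinary public member variable.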
Why?
- Properties let you convert a public variable into a pair of guard functions
without having to change the client code.
- Client get/sets written with properties are much easier to read and write. They
look just like ordinary variables.
- In situations where it is important to be able to tell whether an identifier
represents a variable or a function, a SCID
could come to the rescue. It could colour code them, or even expand properties
back to function notation. You could also use a naming convention to help in
distinguishing access to a private member variable directly and via the property
within the class. You would call the private member mSize and the property pSize.
All members begin with m and all properties with p. See coding conventions.
- This syntax makes a clear distinction between external properties and internal
variables needed to implement them. There is no overloading of names with the
ensuing confusion. There is no problem controlling visibility independently.
Why Not?
- Property already has a meaning in Java — a list of keyword/value pairs. Some other name
such as attribute should be used for such a feature. Perhaps they should
be called pseudovariables.
- Clients should be aware whether they are using a function or a variable.
- Competent programmers never use public variables. They set them up as functions
right off the bat even if the routines are just dummies. Thus there is no need ever
to flip client code when a public variable changes to a function.
- Perhaps it might be better to simply allow functions without arguments to be
written without trailing (). That would allow you to convert a variable to an
accessor function without changing any client code and without coding any extra
declarations.
- Properties are an attempt to invent PL/I-style pseudo functions, functions that
can be used on the left hand side of the = sign. Another way of looking at the
problem, properties are an attempt to allow you to override the effect of the =
operator. Perhaps instead of properties, we should invent a more general mechanism
that allows overloaded functions that can be used on either side of the = sign,
with multiple arguments. Then we would have a way of dealing with tuples returned
from a function. If a function can have multiple inputs, why should it not have
multiple outputs? If a function can accept several variables to access, surely a
function should be able to accept several variables to store. A more general
mechanism like this would lead to simpler syntax for accessing the Vector
methods.
- If you are debugging and tracing code, you want to know if an identifier
represents a variable or a function.
- The existence of language support for get/set methods applying to private data
may encourage treatment of classes as structs with get/set methods for every
physical data member. This is a common misconception even without language support,
especially among beginners; fostering it would be harmful.
The Grand Collection Unification
Containers of all types
should have a uniform and simple syntax for manipulating them. The concept of
container should be expanded as it is in Abundance to include files, Hashtables,
b-trees (both ram and disk based), maps, sets, arrays, ArrayLists and any other
Collection you could imagine. There are three basic operations: get, set and iterate.
I suggest the following uniform syntax to apply to all Map-like things and set-like
things where applicable. I will use a Hashtable as an example:
Why?
- It makes it easy to change your mind later how to implement a collection. All
you need do is change the declaration. All the rest of the code stays the
same.
- The code for file handling is very much simpler.
- It makes the language easier to learn.
- The code is easier to proofread.
Why Not?
- This permits functions of a single variable on the left hand side of the equal
sign. Surely you should be able to have multiple selector arguments. You might as
well generalize the notion of properties to allow them to have 0 to n selectors,
or generalise all methods to have both a right hand side producer implementation
and an optional left hand side consumer implementation. You might as well use the
familiar () syntax, e.g.
map( latitude, longitude ) = "Johnny's Bar & Grill";
String restaurant = map( latitude, longitude, slop );
- All collections code looks alike. You can’t tell if file I/O or an array
or an ArrayList is being manipulated. It isn’t natural!
- Iterators need to be smarter than that. You need ways of selecting
subsets.
Enumerations
Java currently has no enumerations. There is
the Enumeration interface, but that is just an Iterator over a collection, not a
language feature. I mean enumerations in the sense of Pascal enum or Ada-95
enumeration type. Java has only static final constants. There is no formal connection
between groups of constants, or between the groups of constants and the variables
that use them. There is no formal mechanism to distinguish the two kinds of
enumeration constants:
- enumeration integer constants that represent one of a possible list of
choices.
- mask constants that can be ORed together to represent combinations of
choices.
I suggest Java should properly support enumerations. This proposal is influenced
by Ada, Pascal and C’s enum. The type declarations would look like
this:
enum DayOfWeek { SUN, MON, TUE, WED, THU, FRI, SAT };
set DaysOfWeek { SUN, MON, TUE, WED, THU, FRI, SAT };
or in shorthand:
DayOfWeek aDay = DayOfWeek.SAT;
DaysOfWeek weekEnd = DaysOfWeek.SAT | DaysOfWeek.SUN;
Enums are much like classes or interfaces. They could be standalone or could be
declared inside a class or interface and made private or public.
Inside the class you would reference the enum value by DaysOfWeek.WED (possibly just WED if there
were no ambiguity), and outside the class by TheClass.DaysOfWeek.WED. The
enum constants could be used in case labels.
To handle enums that are individual bits I suggest a syntax like this:
set PizzaToppings { MOZZARELLA, PEPPERONI, TOMATOES, GREENPEPPER, PINEAPPLE, GOUDA, CRANBERRIES, ANCHOVIES };
You could then declare a variable as:
PizzaToppings pizzaOrder = PizzaToppings.TOMATOES | PizzaToppings.CRANBERRIES;
if (pizzaOrder & PizzaToppings.ANCHOVIES)…
enums and sets are types and are type checked in parameter passing and
assignment. Currently Java’s static final ints have no type checking to make
sure you don’t try to add a Monday topping to your pizza.
You would have a cast of the form (DayOfWeek) or
(PizzaToppings) that converted a plain int into a member of
that enum. It would raise an exception if it did not match one of the official values
of the enum or set. You could also cast an enum to an int and a set to a long. You
could cast an enum to its related set but you could not cast a set back to an
enum.
There is no formal super/subset relationship between different enums or sets.
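Since this essay was written, Java has gained enum types and java.util.EnumSet, which cover much of the enum/set pairing proposed here. The sketch below shows the pizza example in that style. It illustrates present-day Java, not the Bali syntax, and the names PizzaDemo and Topping are chosen only for this example.

```java
import java.util.EnumSet;

final class PizzaDemo {
    enum Topping { MOZZARELLA, PEPPERONI, TOMATOES, GREENPEPPER, PINEAPPLE, GOUDA, CRANBERRIES, ANCHOVIES }

    static EnumSet<Topping> order() {
        // Type-checked set: a constant from a different enum type would not compile here.
        return EnumSet.of(Topping.TOMATOES, Topping.CRANBERRIES);
    }

    public static void main(String[] args) {
        EnumSet<Topping> pizzaOrder = order();
        System.out.println(pizzaOrder.contains(Topping.ANCHOVIES)); // prints false
        System.out.println(pizzaOrder.contains(Topping.TOMATOES));  // prints true
    }
}
```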
There are a number of pseudo functions you can perform on enums. They have a
syntax that simulates both static and instance forms: e.g.
day = day.first();
day = day.prev();
day = day.next();
day = day.last();
day = DayOfWeek.first();
day = DayOfWeek.prev(DayOfWeek.WED);
day = DayOfWeek.next(DayOfWeek.WED);
day = DayOfWeek.last();
next and prev raise an
InvalidEnumException if you run off the end. Case labels may use enum constants,
pseudofunctions of enum constants and enum ranges, e. g.
switch (day)
{
    case DayOfWeek.first() .. DayOfWeek.WED :
        dosomething();
        break;
    case DayOfWeek.next(DayOfWeek.WED) :
        dosomethingelse();
        break;
    case DayOfWeek.last() :
        dosomethingelseagain();
        break;
    default :
        complain();
        break;
}
Enum variables have the following operators defined on them: + int, - int, enum - enum, <, >, =, !=, []. Set variables have the
following operators defined on them: | & ^ ~ = !=.
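The first, last, next and prev pseudo functions can be approximated on a present-day Java enum. The following is only a sketch under stated assumptions: the names DayDemo and Day are hypothetical, and IllegalStateException stands in for the proposed InvalidEnumException, which does not exist in Java.

```java
final class DayDemo {
    enum Day {
        SUN, MON, TUE, WED, THU, FRI, SAT;

        // Cached once; values() would otherwise allocate a fresh array on every call.
        private static final Day[] ALL = values();

        static Day first() {
            return ALL[0];
        }

        static Day last() {
            return ALL[ALL.length - 1];
        }

        // Throws rather than wrapping around when running off the end.
        Day next() {
            if (this == last()) {
                throw new IllegalStateException("no day after " + this);
            }
            return ALL[ordinal() + 1];
        }

        Day prev() {
            if (this == first()) {
                throw new IllegalStateException("no day before " + this);
            }
            return ALL[ordinal() - 1];
        }
    }
}
```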
Implementation
Internally, for efficiency an enum is an int 0..63 and a set
is a long. Since there is no inheritance, type checking could be done completely at
compile time. A standalone enum is implemented as an interface with a group of static
final ints. You might test the idea by implementing it with a preprocessor.
When you say PizzaToppings.ANCHOVIES the compiler
automatically selects the index or the bitmap representation depending on context.
This will get rid of one of the main sources of error in handling enumerations in
Java, confusion over whether an enum constant is
intended as an index or a bit map.
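The two representations can be sketched by hand in ordinary Java: the index form is a small int and the set form is a single bit of a long at that index. This shows the idea only and is not how any real compiler is known to implement it. The class ToppingBits and its method names are hypothetical.

```java
final class ToppingBits {
    enum Topping { MOZZARELLA, PEPPERONI, TOMATOES, GREENPEPPER, PINEAPPLE, GOUDA, CRANBERRIES, ANCHOVIES }

    // Index form: a small int, assigned contiguously starting at 0.
    static int index(Topping t) {
        return t.ordinal();
    }

    // Set form: one bit of a long, at the position given by the index.
    static long mask(Topping t) {
        return 1L << t.ordinal();
    }

    // Membership test on a set stored as a long.
    static boolean contains(long set, Topping t) {
        return (set & mask(t)) != 0;
    }
}
```

For example, mask(TOMATOES) | mask(CRANBERRIES) builds the pizza order as a long, and contains(order, ANCHOVIES) tests it without the programmer ever mixing up index and bit values.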
I debate with myself back and forth if you should be allowed to assign explicit
values to the enum constants the way you can in C++. The disadvantage of doing that is you ruin the
next and prev functions. The
advantage is it makes it easier to interface with the outside world such as databases
that use ints, not enums. At this point in time I feel it best to let the compiler
assign the numbers always contiguous, starting at 0. If you need breaks in the
ordering, write a method to produce it.
Languages like Pascal manage to use the same enumeration constants in both enum
and set contexts. That trick is probably beyond the ability of simple-minded Java
compiler logic, though it would be preferable from the maintenance programmer’s
point of view.
I think by default you should be able to use your enums in either mode, either as
indexes or as bits in a set. The compiler should figure out which from the context
and should generate both forms internally.
Just how long should an enum be: an 8-bit index with a 256-bit
set, or a 16-bit index with 8K-byte
sets?
Have a good look at how enums work in other languages. They may have cleaner
solutions than I have suggested here.
Why?
- Enums have type checking. You can’t accidentally use the wrong enum
constant, or use the enum form where you needed the set form. Consider a class like
GridBagLayout that has all manner of groups of enum constants. Currently it is very
easy to use one from the wrong group, e. g. LEFT and WEST.
- Various kludges using enumeration objects won’t work in case clauses,
don’t have a simple predictable external representation and have the extra
overhead of an object rather than a simple int.
- It ensures you don’t use a constant from the wrong enumeration
group.
- It ensures you use an enumeration constant in the correct way, as a value or as
a bit.
- It makes maintaining code much easier. It is quite difficult without
enumerations to figure out which constants are legitimate to use with which
variables. My suggested scheme makes it completely clear and recruits the
compiler’s help to ensure the rules are followed.
- The formal connection enables a debugger to display the enum value by name,
rather than just by value.
- Having associated integers allows comparison for order, storage in a database,
permanent values for use in native classes and enumeration over all possible
values.
- Making the compiler assign the bit patterns for the enums without gaps means
that next and prev are well
defined and simple to implement. They can be used in for loops to enumerate over
all possibilities.
- You ideally want to view your code two different ways:
- method focus: How does each subclass implement this particular method? I
want to see all the implementations side by side.
- subclass focus: What are the particular behaviours of this subclass.
Enumerations work better for method focus. You have all the code for all
subclasses right under your nose combined in one method. Subclassing works better
for subclass focus. With SCIDs, you could use subclassing but still get the
benefits of the old enumeration method focus. You could flip back and forth
viewing and coding either way.
Why Not?
- It may be too complicated to figure out the type rules. What sorts of
arithmetic are you allowed on the various enumeration values without casts to store
them back in variables? You want to avoid casts that could introduce errors. You
probably want + int, - int for regular enumerations, | & ~ for set style
enumerations. How much run-time checking should be done to ensure enumeration
constants are always valid?
- Templates are coming. There will be better ways to kludge enumerations as
primitives that take only 20 pages of code. This will give Roedy a chance to become
famous by writing an amanuensis to generate all the bubblegum from a list of the
enumeration names.
- Enums are not primitives. They are not classes. They are a whole new thing.
Fitting an entire new class of animal into a language takes some deep thinking. It
has to get along with everything else. Ideally enums could be defined in a way that
did not require any changes to the class file format. They might only need exist at
compile time. At run time, they would be treated just like ints. Enums are related
to subranges and units of measure. Perhaps some more general solution should be
sought that handles all three.
- Enumerations are mostly used in switch-statements and the like. They
don’t make use of polymorphism.
Fetch
The fetch keyword just helps
shorten a common idiom. I find myself typing stuff like this
This pattern is so common it would be useful to have an abbreviation both for typing
and proofreading. Maybe:
fetch motherObject.getLongComplicatedThingamagig();
Why?
It is faster to type and easier to proofread. One letter out in those
three repetitions and the code has a totally different meaning.
Why Not?
It legitimizes the goofy getXXX convention for what should be
Delphi-like properties.
Augments
The augments keyword is much
like the extends keyword.
- It stops you from deliberately or accidentally overriding any method in the
base class.
- It lets you extend even final classes with additional convenience methods
without creating wrapper methods.
Why?
- Saves writing repetitive wrapper methods.
- Ensures methods are not accidentally overridden, especially in superclasses of
the immediate base class.
- Ensures methods are not accidentally overridden if the base class is later
updated.
Why Not?
- Perhaps this would be better handled with explicit override, original, overridable keywords.
via
The via keyword lets you implement an
is-a interface via a has-a reference to some class that implements it. It makes it
much easier to write wrapper methods. You might use it like this:
public class DoIt implements SomeInterface via foo
{
    private AClassThatImplementsSomeInterface foo;
}
Why?
- Saves writing repetitive wrapper methods to implement an interface with a has-a
reference.
- It reduces arthritis. You can change the interface without having to maintain
gobs of code.
Why Not?
- It muddles is-a with has-a logic a bit more than the purists would like.
- It encourages sloppy design, as it tempts you to extend a class with
functionality without thinking about the actual responsibilities of the class.
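For comparison, here is a minimal sketch of the hand-written forwarding that via would make unnecessary in plain Java. The names Greeter, PoliteGreeter and GreeterWrapper are hypothetical, invented only for this illustration.

```java
// Hypothetical interface, invented for this sketch.
interface Greeter {
    String greet(String name);
    String farewell(String name);
}

// A class that already implements the interface.
class PoliteGreeter implements Greeter {
    public String greet(String name) { return "Good day, " + name; }
    public String farewell(String name) { return "Goodbye, " + name; }
}

// In plain Java the wrapper must forward every interface method by hand.
// Under the proposed "implements Greeter via delegate" these forwarders would be implied.
class GreeterWrapper implements Greeter {
    private final Greeter delegate = new PoliteGreeter();

    public String greet(String name) { return delegate.greet(name); }
    public String farewell(String name) { return delegate.farewell(name); }
}
```

Every method added to the interface later means another forwarder in every wrapper, which is the maintenance burden the proposal aims to remove.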
Constructor Shortcut
In Java you would write:
BigDate d = new BigDate( 1997, 5 , 6 );
In Eiffel you would write:
d : BigDate;
d.Create( 1997,05,06 );
In Bali you can use the usual Java syntax or this shortcut:
BigDate (1997, 5, 6) d;
Why?
- You don’t have to write the name of the class twice.
- You are less likely to forget to initialise.
- It is impossible to create an object of the wrong type.
Why Not?
- The new syntax masks the fact you are creating a new object.
- The new syntax looks too much like an ordinary method call.
Iterate Shortcut
In Java you might write:
for ( CitiesByState iter = new CitiesByState(); iter.hasMoreElements(); )
{
City city=iter.nextElement();
println( city.population );
}
In Bali, you could abbreviate this to:
for each city in CitiesByState
{
println (city.population);
}
Why?
- It is a lot easier to read and proofread.
- There is less likelihood of error typing it.
- It opens the door for abbreviations to other for idioms.
Why Not?
- The Java scheme gives you extra typing practice and makes your keystroke per
day output higher.
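Java 5 later added an enhanced for loop that covers much the same ground. The sketch below assumes a hypothetical City class and arbitrary sample values; it is not the CitiesByState iterator from the example above.

```java
import java.util.Arrays;
import java.util.List;

class ForEachDemo {
    // Hypothetical City type, invented for this sketch.
    static class City {
        final String name;
        final int population;

        City(String name, int population) {
            this.name = name;
            this.population = population;
        }
    }

    // Sums populations with the enhanced for loop: no explicit iterator variable.
    static long totalPopulation(Iterable<City> cities) {
        long total = 0;
        for (City city : cities) {
            total += city.population;
        }
        return total;
    }

    public static void main(String[] args) {
        // Arbitrary illustrative values.
        List<City> cities = Arrays.asList(new City("A", 100), new City("B", 250));
        System.out.println(totalPopulation(cities));
    }
}
```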
Null Method Shortcut
In Java you might write:
if ( person != null ) person.requestPassport();
In Bali, you could abbreviate this to:
person..requestPassport();
or
if ( a != null && a.b != null && a.b.c != null ) d = a.b.c.doit();
else d = null;
would become
d = a..b..c..doit();
Why?
- There is no possibility of a minor mismatch of the name tested in the if and
the variable used to access the method.
- Programmers are lazy and tend to ignore null cases. If you make it easy for
them to deal with them, they are more likely to keep them in mind.
- This syntax really starts to pay off when you have a chain of links, each of
which could potentially be null. You can even generalise it to return 0 or null on
encountering a null when a value is being computed.
- The alternative null Object pattern requires writing an immense amount of
bubblegum that essentially does nothing. If that could be mechanically generated
somehow, the null object pattern would be much more attractive. It potentially also
could be faster, since all the implied null tests and most of the explicit ones
could in theory be eliminated. Vrroom!
Why Not?
- You might want to reserve .. for use in defining ranges.
- You should use the NullObject pattern instead.
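In Java 8 and later, java.util.Optional gives a rough approximation of the chained form, though it is far wordier than the proposed .. operator. This is a sketch only; the classes A, B and C are hypothetical stand-ins for the links in the chain.

```java
import java.util.Optional;

class NullChainDemo {
    // Hypothetical linked types standing in for a, b and c.
    static class C { String doit() { return "done"; } }
    static class B { C c; }
    static class A { B b; }

    // Rough equivalent of the proposed d = a..b..c..doit():
    // the result is null as soon as any link in the chain is null.
    static String safeDoit(A a) {
        return Optional.ofNullable(a)
                .map(x -> x.b)
                .map(x -> x.c)
                .map(C::doit)
                .orElse(null);
    }

    public static void main(String[] args) {
        A a = new A();
        System.out.println(safeDoit(a)); // b is still null at this point
        a.b = new B();
        a.b.c = new C();
        System.out.println(safeDoit(a));
    }
}
```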
Conversions
In Java, casting is used for two quite different
purposes:
- to request a conversion.
- to request that an object be treated as an instance of some sub or superclass
without actual conversion. You are merely reassuring the compiler the object in
question truly is already of the type asserted. You as programmer
know this must be so from the way the program logic works, though this is not
necessarily immediately obvious to the compiler.
There are literally hundreds of different ways of requesting a type 1 cast. The
simple (type) notation only works on a handful of cases. Type 2 casts are uniformly
done with (ClassName).
In Bali there are similarly two types of cast, but the way you request them is
totally uniform:
- (DesiredType): for conversion.
- (is DesiredType) or possibly (as DesiredType): for treatment as sub/superclass.
You can even convert objects to primitives, objects to objects and primitives
to objects with a type 1 cast.
How are the conversions to objects done? By looking for a constructor that takes a
primitive or object as the sole argument.
How are conversions from objects to primitives done? By looking for methods with
the names intValue(), longValue(), etc.
How are conversions from primitives to primitives done? Conceptually by looking
for static methods with names like Convert.intValue(double), though in practice these
are handled inline as special cases.
Why?
- Simplification. Code is easier to write and read when there is a uniform way of
requesting conversions.
- Accuracy. You are more likely to get the correct method.
- It makes it clear when data are actually being transformed.
Why Not?
- It requires changing existing code to insert the as or is.
- It makes no sense that . is a postfix operator but
(cast) is a prefix operator. It leads to complicated
parenthesis forests. Perhaps we need a new pair of postfix casts, e.g.
(is DesiredType) and (to DesiredType).
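To make the non-uniformity concrete, here is a small sketch of four conversions in standard Java, each requested a different way. The class name ConversionDemo and its method names are invented for illustration; the library calls are standard.

```java
class ConversionDemo {
    // Primitive to primitive: the (type) cast notation, which truncates toward zero.
    static int doubleToInt(double d) { return (int) d; }

    // Object to primitive: an intValue()-style instance method on the wrapper.
    static int boxedToInt(Double d) { return d.intValue(); }

    // String to primitive: a static parse method on the wrapper class.
    static int stringToInt(String s) { return Integer.parseInt(s); }

    // Primitive to object: a static factory on the target class.
    static String intToString(int i) { return String.valueOf(i); }

    public static void main(String[] args) {
        System.out.println(doubleToInt(3.7));
        System.out.println(boxedToInt(Double.valueOf(3.7)));
        System.out.println(stringToInt("42"));
        System.out.println(intToString(42));
    }
}
```

Under the proposal, all four would be written with the same (DesiredType) conversion cast.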
Generic Cast (convert)
I further suggest that the generic caster
(convert) be allowed to convert where the compiler knows
the type of the target, e.g.
float x = 1.0f;
String s = (String) x;
String t = (convert) x;
String u = String.valueOf(x);
Why?
- If the types of x or t change, e.g. x becomes a double, all the code
automatically adjusts to suit. You only need change the declarations. The cardinal
rule of writing maintainable code is that you specify each fact in one and only one
place. Java flagrantly violates that rule in three places: casts, conversion
functions and primitive temporary variables that don’t track the type of
corresponding major variables.
- When you are writing code you don’t need to constantly be quite so aware
of the precise types of each variable.
Why Not?
- It makes it harder to add new types and conversion functions. Any such
won’t be built-in.
Symmetrical Then/Else
In Bali, the if is more like Eiffel’s.
The C-style if is deprecated, but still supported. In Java you might write:
if ( (a = b== c ) || (d== e ) )
{
f= g;
r= s;
}
else
{
f= h;
r= t;
}
In Bali you would write that as:
if (a = b == c) || (d == e)
then f = g; r = s;
else f = h; r = t; fi;
Why?
- the then and else are more symmetrical. The true and false actions align which
enhances comparing their similarities and differences.
- it is easier for a human to pick out where the condition ends and the true
action clause starts.
- it helps reduce the parenthesis () forest in the condition by one level.
- it helps reduce the brace {} forest in the actions by one level.
- If you switch from a simple if action to a complex one, you don’t have to
add {}. The syntax is consistent.
Why Not?
- then fi is more keystrokes than () { }.
- fi is an ad hoc end for an if. There should be a consistent system where every
block type has a unique matching begin end pair. You can’t very well use that
fi technique for rof, chtiws, esac, elihw, yrt, hctac.
Up Front Declares
You might write something like this:
dcl maxCount : int,
s : String,
myObject : SomeUserClass;
Why?
- In Java, you have to plough through a great string of symbols before you find
the name of the thing being defined.
- In Java there is no target string you can search for to find declarations.
- With this scheme, declarations align.
- It lets you find declares in listings and email posts where your SCID tools
are not available.
Why Not?
- It is too fundamental a change.
- It does not extend properly to allow definitions in the middle of
expressions.
- It takes more keystrokes.
- The initialisation syntax is clumsy. It is handled separately from the
declaration.
- Without changing the Java syntax, a visual editor could solve the problem by
displaying a series of declarations aligned, possibly with some detail suppressed.
Similarly a visual editor could bold face the identifiers being defined to make
them easier to pick out. Java’s declarations are quite a bit simpler than C
and C++’s. Such drastic measures are not
needed in Java.
Distinct Concatenation Operator
I suggest that + for
concatenation be deprecated and soon outlawed altogether. In its place we will use
|||.
Why?
- In Java, + does double duty: addition and concatenation. + on an int sometimes
means add, sometimes concatenate, depending on what surrounds it. That is just
plain Mickey Mouse, not to mention hard to read and bug inducing.
- If Java acquired a distinct concatenation operator, what would it be?
Pretty much all the ASCII (American Standard Code for Information Interchange) combinations are taken, e.g. ||
used in PL/I and SQL (Structured Query Language). Maybe ||| . % -|- :: # ## _ >< (though >< should be reserved for != ).
Maybe a non-ASCII glyph, such as ⊕ (0x2295) or ⋅ (0x22c5).
Unicode has more characters than you could shake a stick at, but no official concatenation operator.
It may also be possible to use space. Where there is no operator between two strings, a concatenation operator is presumed.
We might invent a brand new glyph and get a code point for it inserted into Unicode.
Mathematicians use two closely spaced vertical lines.
Perhaps something on that line that would not be confused with ||. Unicode already has ‖ (0x2016), ∥ (0x2225) and ⏸ (0x23f8).
The IDE (Integrated Development Environment) could convert + to the new operator where appropriate so you would not have to directly key it. (In that case it could not have higher priority than +.)
Why Not?
- It takes three keystrokes instead of one.
- If this were added to Java, we would have a big fight about what to do with legacy code that used + for both addition and
concatenation. Code could be automatically converted with no problem. The old form could be deprecated. The old form could be left as is without warning.
You would not get any benefit until code were converted.
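The double duty of + is easy to demonstrate in standard Java. In this short sketch (the class and method names are invented for illustration), the same operands give different results depending only on where the string sits, because + is evaluated left to right.

```java
class ConcatDemo {
    // The left operand is a String from the start, so both + are concatenations: "12".
    static String stringFirst() { return "" + 1 + 2; }

    // The first + sees two ints and adds; only the second concatenates: "3".
    static String stringLast() { return 1 + 2 + ""; }

    public static void main(String[] args) {
        System.out.println(stringFirst()); // prints 12
        System.out.println(stringLast());  // prints 3
    }
}
```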
Move Corresponding
I propose something similar to the
COBOL MOVE CORRESPONDING that cannot be expressed in the
usual linear language syntax. It requires dialog boxes.
A MOVE CORRESPONDING should display a dialog box with a
list of fields that have matching names in object a and
b. Beside each name is a little checkbox that can have
four states:
- copy
- ignore
- use custom copying code, e. g.
switch (a.x) {
case 1: b.x = 10; break;
case 2: b.x = 15; break;
default: b.x = a.x; break; }
- have not decided yet. (this generates a syntax error on compile).
Simply adding a new field to either a or
b will create new entries in all MOVE
CORRESPONDING tables (marked not yet decided). The editor also lets you view
any leftover fields from a and/or leftover fields from
b.
Why?
- Lists of fields copying from one object to another is common for versioning.
You rearrange fields, add fields and need to convert the old files to the new
format. In Java, SQL handles this problem automatically for rows. You
still have the problem of dealing with updating old objects or records in flat
files. You also have it when you export a file of some subset of the information.
- In my home-brew language Abundance, I found that a COBOL-style move
corresponding was NOT quite what was needed. I needed something of the form
MOVE CORRESPONDING EXCEPT FOR… The other problem
was maintenance. When you add new fields, you need to examine all the move
correspondings to figure out just what should be done with the fields. It is easy
to overlook them because they don’t explicitly name the variables affected.
My proposal lets you ensure you have not forgotten to handle every field.
Why Not?
- This form of MOVE CORRESPONDING cannot be expressed
without dialog boxes that dynamically examine the lists of fields in the two
objects.
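A rough run-time approximation of the MOVE CORRESPONDING EXCEPT FOR form can be written with Java reflection. This sketch is an assumption-laden illustration, not the dialog-driven proposal: it cannot flag fields as not yet decided at compile time, and the names MoveCorresponding, move, OldRecord and NewRecord are all hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class MoveCorresponding {
    // Copies each non-static field of "from" that has a same-named, same-typed,
    // non-final field in "to", skipping names listed in "except".
    // Returns the names actually copied (in no guaranteed order).
    static List<String> move(Object from, Object to, Set<String> except) {
        List<String> copied = new ArrayList<>();
        for (Field src : from.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(src.getModifiers()) || except.contains(src.getName())) {
                continue;
            }
            try {
                Field dst = to.getClass().getDeclaredField(src.getName());
                if (!dst.getType().equals(src.getType())
                        || Modifier.isStatic(dst.getModifiers())
                        || Modifier.isFinal(dst.getModifiers())) {
                    continue;
                }
                src.setAccessible(true);
                dst.setAccessible(true);
                dst.set(to, src.get(from));
                copied.add(src.getName());
            } catch (NoSuchFieldException e) {
                // Leftover field: the target has nothing with this name.
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return copied;
    }

    // Hypothetical old and new record layouts.
    static class OldRecord { int id = 7; String name = "x"; int x = 2; }
    static class NewRecord { int id; String name; int x; double added; }

    public static void main(String[] args) {
        OldRecord a = new OldRecord();
        NewRecord b = new NewRecord();
        // Copy everything that corresponds, except for field x.
        System.out.println(move(a, b, Collections.singleton("x")));
    }
}
```

Because the matching happens silently at run time, this version has exactly the maintenance weakness described above: a newly added field is copied or skipped without anyone being asked.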
Primitive Object Methods
In an early version of Java,
primitives were objects. This made the language simpler, but turned out to have too
much overhead. It may be possible to bring back part of this notion. For example, I
think you should be able to write y = x.sin() instead of
y = Math.sin(x) and i.toString()
instead of Integer.toString(i).
Why?
- The new notation avoids mentioning a class that has nothing to do with the
operation. In Integer.toString(i) there is no
Integer object involved. It is confusing to use a
notation that implies there is.
- The notation is consistent with the way you handle objects. Why should there be
different notations to do the same thing?
- It may open the door to primitives even more like true objects, where you can
create subclasses of integers with type safety checking.
Why Not?
- You still could not define your own methods to act on primitives as objects.
Until there was a way to subclass from primitives, the notation would only be
useful for built-in methods.
- It is good to have separate notations for methods on objects and primitives. It
helps make the distinction between objects and primitives clear.
Short constant names
In Java you might write:
MyClass x = new MyClass();
x.doSomething( MyClass.ACONSTANT1, MyClass.ACONSTANT2 );
In Bali, you could abbreviate that:
MyClass x = new MyClass();
x.doSomething( ACONSTANT1, ACONSTANT2 );
The MyClass prefix is used to help define unknown constants mentioned between
the ( ) surrounding the parameters invoking a method of a class.
Why?
- Saves typing the class name over and over.
Why Not?
- There are conditions where the name ACONSTANT2 could be ambiguous. It may be
better to disambiguate it always, rather than just when it is needed.
- Extending the scope of a class to parameter calls to its methods is just too
weird.
Spaciousness
I suggest the following simplification to the
compiler’s parser (the tokenizer actually). All operands and operators must be
separated by a space, with the following exceptions: comma, unary +, unary -, ++, --,
., ;, (), []
In Java you might write:
x=a++--b*-(c++<<1);
in Bali, you would have to write that as:
x = a++ - -b * -(c++ << 1);
Why?
- It gets rid of potentially ambiguous (to humans) strings of + and -.
- It makes code more readable.
- It removes one more temptation to write unmaintainable code.
- It speeds compilation.
- it opens the door to Forth/Abundance-like user defined operators and methods
with almost arbitrary names. You could use alpha, numbers, any punctuation, any
Unicode chars except ()[]+-,.space,; You might use characters such as these for
your new user-defined operators:
Unicode Characters to Use In User-Defined Operators
Range | Description
\u2200 .. \u22ff | mathematical symbols
\u2600 .. \u26ff | miscellaneous symbols
\u2460 .. \u24ff | enclosed alphanumerics
\u25a0 .. \u25ff | geometric shapes
\u2100 .. \u214f | letterlike symbols
- The alternative is to break the expression up and name the pieces. Then the
relationships between the parts are symbolic/verbal rather than visual. With
various SCID-like tools to format large expressions, they become
comprehensible, more comprehensible than traditional code with
named subexpressions. If you don’t believe me, perform an experiment. Your
visual bandwidth is much higher than your verbal bandwidth.
Why Not?
- It takes up more screen space and takes more keystrokes.
- Cramming would be a very hard habit to break.
- You simply shouldn’t write such hard to read expressions.
Divide and Modulus on Negative Numbers
If you divide two
numbers, you can do four things with the result:
- round: useful for approximating, e.g. speed = round( distance / time ).
- ceil: useful to calculate b = ceil( n/m ), how many bins b, each of which can hold m items, are needed to hold n items.
- floor: useful to calculate b = floor( i/m ), which bin number b the item i falls into when each bin can hold m items.
- trunc: Java style division. No known use.
division
Signs | Java division | Bali division | Java modulus | Bali modulus
+ + | +7/+4=+1 | +7/+4=+1 | +7%+4=+3 | +7%+4=+3
- + | -7/+4=-1 | -7/+4=-2 | -7%+4=-3 | -7%+4=+1
+ - | +7/-4=-1 | +7/-4=-2 | +7%-4=+3 | +7%-4=-1
- - | -7/-4=+1 | -7/-4=+1 | -7%-4=-3 | -7%-4=-3
Bali uses floored division. Bali takes the next lowest integer if the
quotient is fractional. In Bali, the remainder has the same sign as the divisor. In Bali the
absolute value of the remainder is always less than the absolute value of the divisor.
Why?
- Java division happily has the Euclidean property,
namely when you multiply the quotient by the divisor and add the remainder you get
back to the dividend. Bali also has that property.
- Java’s modulus behaves, well,
strangely. In Java, the sign of the remainder follows the dividend, not the
divisor. Be especially careful when corralling random numbers into a smaller range
with the modulus operator. It can produce a negative result! For example when you
ask for modulo 3 in Java, you will be astounded
to sometimes get a negative answer outside the range 0..2. This is because Java’s
modulus follows the sign of the dividend, not the divisor. Bali conforms with the
principle of least astonishment by always returning a result that follows the sign
of the divisor, not the dividend.
- So often in Java code I have to write
ifs surrounding the % to
get the Bali effect. I have never coded a case where I wanted the Java effect.
- Every language I have encountered defines the way integer division and modulus work for
negative numbers in a different way. The only one I found useful in practice was
Forth’s floored approach. In every other language I found myself handling
negative cases specially doing the arithmetic on the absolute values. In
BigDate you can see
examples where floored divide/modulus would simplify code.
Why Not?
- The problem with the approach is hardware usually does not work this way.
Implementing this convention in software would slow down code for the usual all
positive case.
- You can’t change rules like that in midstream. It may introduce subtle
bugs in existing code.
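Since Java 8 the standard library has offered Math.floorDiv and Math.floorMod, which give the floored results shown in the Bali columns of the table without changing the / and % operators. A minimal sketch follows; the class name and the bin helper are invented for illustration.

```java
class FlooredDivisionDemo {
    // Which bin a (possibly negative) item index falls into, with m items per bin.
    static int bin(int i, int m) { return Math.floorDiv(i, m); }

    // Corrals any int into 0..size-1 when size is positive.
    static int wrap(int value, int size) { return Math.floorMod(value, size); }

    public static void main(String[] args) {
        int[][] cases = { {7, 4}, {-7, 4}, {7, -4}, {-7, -4} };
        for (int[] c : cases) {
            int n = c[0];
            int d = c[1];
            // Truncating operators beside their floored library counterparts.
            System.out.printf("%d,%d: / gives %d, %% gives %d, floorDiv gives %d, floorMod gives %d%n",
                    n, d, n / d, n % d, Math.floorDiv(n, d), Math.floorMod(n, d));
        }
    }
}
```

Both pairs satisfy quotient * divisor + remainder == dividend; they differ only in which way the quotient is rounded.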
Explicit Override
In standard Java, the rules of inheritance
and overriding depend on both static/instance and variable/method. Methods override,
variables shadow.
In Bali, the rules are more uniform. Methods override, but you must declare them
as overriding to ensure you don’t do it by accident (e.g. if the base class
later adds a clashing method.) Constants override. Variables may not be overridden or
shadowed. It just causes confusion. In Java, if you intend to override a method and
get the signature slightly different, or slightly misspell the method name, the
compiler will not warn you. Your new method will just be effectively ignored. This is
particularly a problem in overriding methods in adapter classes. In Bali, if you
specify override and there is no matching method to
override, the compiler will warn you.
You might write something like this:
class X extends Y
{
override const int AnomalyYear = 1582;
override public int getMonth()
{
return month + 1;
}
original public int getMoonPhase()
{
…
}
}
Why?
- Lets you create a subclass, reconfiguring the tweaking constants of the base
class. If you use the base class you get the original tweakers, if the new class,
the new tweakers.
- You can’t accidentally override or shadow.
- Rules are more uniform. Confusion over the existing rules is a source of
bugs.
- It warns you that an apparently useless method is actually being used by some
possibly distant ancestor class.
Why Not?
- You can’t fiddle with something this basic to a language.
- The explicit override and original declarations just clutter the code. They
won’t help safety because most programmers will not bother to declare either
way.
- Final currently means both "do not override" and
"no changes to the value after definition". This change
requires a redefinition of the meaning of final. Final would then mean "do
not override". The new keyword const would mean "no
changes to the value after definition".
- Stealing const for this purpose may block other more imaginative uses of the
keyword in future.
- Shadowing is just an extension of the local variable principle. If you get rid
of it you destroy encapsulation. Changes to one module force changes to an
unrelated one.
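Java 5 later added the @Override annotation, which covers the override half of this idea for methods. It is optional, and there is no counterpart to original. A minimal sketch, with hypothetical Base and Derived classes:

```java
class OverrideDemo {
    static class Base {
        int getMonth() { return 0; }
    }

    static class Derived extends Base {
        // @Override asks the compiler to reject this method unless it really
        // overrides something; misspelling the name or changing the parameter
        // list would become a compile-time error instead of a silent new method.
        @Override
        int getMonth() { return super.getMonth() + 1; }
    }

    public static void main(String[] args) {
        Base b = new Derived();
        System.out.println(b.getMonth());
    }
}
```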
Sort Interface
In Abundance, you can sort an array or file with a
piece of code like this:
BY DESCENDING Salary ASCENDING Hire-Date SORT
Abundance uses postfix notation. You don’t even need to specify the name
of the array or file, since the compiler can easily deduce that. I would like
something similar for Bali like this:
HeapSort.sort(anArray, desc salary asc hireDate);
As it is, in Java, among several other
things, you need to compose a new delegate
Why?
- The current code is voluminous, hard to write and even harder to proofread.
Changing the sort keys should be a trivial maintenance task.
- The current technique suffers from name pollution.
- By treating multi-key compares as a special case, it may be possible to generate
more efficient code for the three way split on the compare result for each key. It
might be possible to avoid the heavy overhead of the two cast checks on each
compare.
- Even COBOL does better than Java.
Why Not?
- A sort statement just does not fit into the Java model. Even for a single
object, you might need several different compare routines.
- RadixSort needs a quite different sort of
compare routine.
- It would force a standard interface on all sort routines.
- You could get the same effect with a smart editor that generates the compare
code for you.
- In Abundance each of the 50 primitive types has a standard compare routine.
Java originally had no equivalent, then it acquired the Comparable interface to
define the natural sort order for each type. You can’t tell from a Java
String declaration how case should be considered in comparing, how accented
characters should be collated etc. Since the String class is final, you can’t
create such distinctions with subclassing.
- The feature should be more general than just for an array sort. Therefore the
type needs to be specified explicitly, e. g. comparing Employee
desc salary asc hireDate.
- To use this syntax you would need the collating fields to be public.
- The new syntax should also allow functions as collating fields.
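Java 8 later came closer to this with chained comparators. The sketch below expresses desc salary asc hireDate using java.util.Comparator; the Employee class, its int hireDate field and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortDemo {
    // Hypothetical Employee type; hireDate is a plain int to keep the sketch short.
    static class Employee {
        final String name;
        final int salary;
        final int hireDate;

        Employee(String name, int salary, int hireDate) {
            this.name = name;
            this.salary = salary;
            this.hireDate = hireDate;
        }
    }

    // desc salary asc hireDate
    static final Comparator<Employee> BY_SALARY_DESC_HIRE_ASC =
            Comparator.comparingInt((Employee e) -> e.salary).reversed()
                      .thenComparingInt(e -> e.hireDate);

    // Returns the names in sorted order without disturbing the caller's list.
    static List<String> sortedNames(List<Employee> staff) {
        List<Employee> copy = new ArrayList<>(staff);
        copy.sort(BY_SALARY_DESC_HIRE_ASC);
        List<String> names = new ArrayList<>();
        for (Employee e : copy) {
            names.add(e.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Employee> staff = new ArrayList<>();
        staff.add(new Employee("A", 100, 2001));
        staff.add(new Employee("B", 200, 2005));
        staff.add(new Employee("C", 200, 1999));
        System.out.println(sortedNames(staff));
    }
}
```

Changing the sort keys is now a one-line edit, though the key list is still spelled out in method calls rather than in a dedicated syntax.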
Units of Measure
Java treats all ints as equivalent as
far as type safety is concerned. There is no way to subclass int to create
subcategories. You want additional type safety on your ints, floats and doubles:
- For enumerated types to make sure you don’t mix fruits and
vegetables.
- For units of measure so you don’t feed kilograms to a parameter measured
in meters.
- For subranges, so that you assure that a variable is within a given range,
perhaps throwing an exception or corralling it into range if it is not.
If you put in a units checking scheme, it is not hard to add a dimensionality
check that formulas balance in terms of whether they measure mass, length etc on both
sides of the =. It is also not hard to add automatic unit conversions to deal with
any mismatches that are still dimensionally correct.
Why?
- Java has this fancy pants type system, but most type errors are in dealing with
enums. Java makes absolutely no attempt to deal with enum type safety.
- Units of measure would make Java appealing to engineers, both for safety and
convenience.
Why Not?
- Java is for computer scientists. We don’t want grubby engineers using
it.
- There exist type safe enum kludges. So what if they are incompatible with case
labels. Let them use ifs.
- Java is a sissy enough language. Manly programmers eschew type safety.
- There is already a type system for Objects in Java. Surely it should deal with this. You should not have
to add a totally separate type system for primitives just to deal with enums,
subranges, units of measure and design by contract.
- Let the wimps use preprocessors if they want extra type safety. None of this
would affect the JVM (Java Virtual Machine) anyway. Don’t muck up our nice clean
language with new syntax I don’t want to have to learn. If it was good
enough for Kernighan, it should be good enough forever.
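The workaround available in plain Java is to wrap each unit in its own small class, so the object type system does the checking. This is a sketch of that workaround rather than of the proposal itself: it costs an object per value and gives no dimensional analysis or automatic conversion. The names Kilograms, Meters and areaOfSquare are hypothetical.

```java
class UnitsDemo {
    // One tiny immutable wrapper class per unit of measure.
    static final class Kilograms {
        final double value;

        Kilograms(double value) { this.value = value; }

        // Adding two masses stays within the same unit.
        Kilograms plus(Kilograms other) { return new Kilograms(value + other.value); }
    }

    static final class Meters {
        final double value;

        Meters(double value) { this.value = value; }
    }

    // Accepts only Meters; passing a Kilograms here is a compile-time error.
    static double areaOfSquare(Meters side) { return side.value * side.value; }

    public static void main(String[] args) {
        System.out.println(areaOfSquare(new Meters(3.0)));
        System.out.println(new Kilograms(1.5).plus(new Kilograms(2.5)).value);
    }
}
```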
Credits
Many people helped formulate these ideas, usually by
pointing out flaws in my proposed designs. I have only started giving credit well
after the page was started. Thus early contributors are unsung.
- Patricia Shanahan <pats@acm.org>
- Richard Freedman <rich_freedman@chiinc.com>
- Andrew R. Thomas-Cramer <artc@prism-cs.com>
- Nasser <nabassi@pacbell.net>
- Tim Tyler <tt@cryogen.com>
- Mr. Tines <@ravnaandtines.com>
- Luke Webber <luke@webber.com.au>
- Chris Gray <cg@ami-cg.GraySage.Edmonton.AB.CA>
- Charles (Russ) Lyttle <lyttlec@flash.net>
- Steve Bellenot <bellenot@math.fsu.edu>
- Edwin Guenthner <edwin.guenthner@gmx.de>
- Chris Smith <cdsmith@twu.net>