Sunday, October 2, 2011

Java 7, Java 8 and Aspect Oriented Programming

The following code snippet is a method body from an SQL abstraction layer framework I'm working on.
The framework logic itself is nearly irrelevant for this post; it's really all about the syntax.
All the method does is drop all columns identified as obsolete from an existing table and add all newly defined columns still missing in that table.
It does so by getting a Connection instance from a datasource managing a connection pool, executing the alterations through it, wrapping any occurring SQLExceptions in an exception type of this layer and finally (pun) giving that connection back to the datasource (the pool).
final int[] updateCounts;
final DbmsDatasource<H2DbmsAdaptor> datasource = this.sqlContext.datasource();
final Connection connection = datasource.getConnection();
Statement statement = null;
try {
    statement = connection.createStatement();
    this.addBatchDropColumns(table, obsoleteColumns, statement);
    this.addBatchCreateColumns(table, newColumns, statement);
    updateCounts = statement.executeBatch();
}
catch(final SQLException e){
    throw new SqlEngineException(e);
}
finally {
    JdbcUtil.closeSilently(statement); // swallow the closing exception (helper name illustrative)
    datasource.takeBack(connection);   // give the connection back to the pool (method name illustrative)
}
This first code snippet uses Java 6 syntax. Get a connection and do the usual try-catch-finally dance with the JDBC resources. Around 14 lines of gross code for 5 lines of net code. That's roughly 200% here, or generally around 9 LoC of overhead, required not once, nicely centralized, but every time, everywhere some SQL operation has to be executed. Very much boilerplate code. The horror.

The next is the same snippet at Java 7 syntax level. The new Automatic Resource Management (ARM, a.k.a. try-with-resources) at least eliminates the annoying silent-closing stuff and the blemish of a mutable Statement variable.
final int[] updateCounts;
final DbmsDatasource<H2DbmsAdaptor> datasource = this.sqlContext.datasource();
final Connection connection = datasource.getConnection();
try {
    try(final Statement statement = connection.createStatement()){
        this.addBatchDropColumns(table, obsoleteColumns, statement);
        this.addBatchCreateColumns(table, newColumns, statement);
        updateCounts = statement.executeBatch();
    }
}
catch(final SQLException e){
    throw new SqlEngineException(e);
}
finally {
    datasource.takeBack(connection); // give the connection back to the pool (method name illustrative)
}
The major pain point, the every-time explicit exception-handling boilerplate and the pooled instance management, remains. This is a typical example of what the Aspect-Oriented Programming (AOP) guys call a "cross-cutting concern", or an orthogonal aspect of program code. While they still work on things like magically weaving cut points and point cuts and undebuggable whatevers that nobody actually wanted into bytecode, and still grapple with problems like "doing it right would blow either memory or performance or both" (as I read somewhere in AOP articles), the same abstraction can be achieved utilizing functional programming.

What about shifting the cross-cutting aspects of managing the pooled instance and the exception handling (both together, or loosely connected as separate modules) into a method accepting the net code as a modular function?
Like in the following method in the DbmsDatasource implementation:
public <R> R executeJdbc(final JdbcExecution<R> jdbcExecution)
{
    final Connection connection = this.getConnection();
    try {
        return jdbcExecution.execute(connection);
    }
    catch(final SQLException e){
        throw new SqlEngineException(e);
    }
    finally {
        this.takeBack(connection); // give the connection back to the pool (method name illustrative)
    }
}
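The JdbcExecution callback type consumed by this method isn't spelled out in the snippets; it would be a plain single-method interface along these lines (a sketch, the actual framework type may carry a different name or additional members):

```java
import java.sql.Connection;
import java.sql.SQLException;

// Sketch of the callback type passed to executeJdbc(): the "net code"
// gets a live connection and may return any result type R.
interface JdbcExecution<R> {
    R execute(Connection connection) throws SQLException;
}
```

Since it has exactly one abstract method, it also qualifies as a functional interface, which is what makes the Java 8 lambda version at the end of the post possible.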
This reduces the on-site code to this:
final int[] updateCounts = this.sqlContext.datasource().executeJdbc(new JdbcExecution<int[]>(){
    @Override public int[] execute(final Connection conn) throws SQLException {
        try(final Statement statement = conn.createStatement()){
            Outer.this.addBatchDropColumns(table, obsoleteColumns, statement);
            Outer.this.addBatchCreateColumns(table, newColumns, statement);
            return statement.executeBatch();
        }
    }
});
Note that this is not just shifting boilerplate code into a method. It's shifting it from n on-site locations into 1 "aspect method" that merely gets called in each of the n locations. So it doesn't really "count" as boilerplate when looking at the n code locations containing the important net logic.
Of course the aspect could be abstracted even further into a dedicated function instead of a hard-coded method, but that's not required for this example.
Anyway: using functional programming in that way is exactly what AOP is about: abstracting aspects like exception handling or resource management away from the business logic locations into a centralized point.

Currently, the anonymous inner class brings in the usual ugliness as a trade-off. Still, it's much less boilerplate code and, more importantly, cleaner architecture.
And even that will go away when Java 8's Project Lambda arrives (at last ^^):
final int[] updateCounts = this.sqlContext.datasource().executeJdbc(conn -> {
    try(final Statement statement = conn.createStatement()){
        this.addBatchDropColumns(table, obsoleteColumns, statement);
        this.addBatchCreateColumns(table, newColumns, statement);
        return statement.executeBatch();
    }
});
Now it finally looks like it should: hardly any boilerplate code, yet everything is contained and properly in place: pooled instance management, exception handling, resource management, all modularized, encapsulated where they belong, but still fully controllable and debuggable. And of course the actual hand-tailored logic on-site.

Thursday, July 28, 2011

Collection Content Behaviour Constraints

A quick and hopefully short post about collections again, conveniently built around a nice little diagram I drew a while ago. It shows in a very colorful way the deficiencies in "Collection Content Behaviour Constraints" (the different constraints between List, Set, etc.) in the JDK collections and how it should (must) be done properly.

Here it is: CollectionContentBehaviourConstraints.pdf.

This is only one of many aspects that are similarly problematic in the JDK collections. Others are broken equality mechanisms, a broken fail-fast concept, missing fast but safe storage access (internal iteration), missing built-in weak referencing, organically grown but still insufficient collection-extending operations, missing immutable collections, etc.
I'll try to draw comparable diagrams to demonstrate them for the paper about my collections framework.

The fascinating thing about this diagram is: It's complete.
I tried very hard, multiple times, to come up with additional meaningful constraints (apart from thread safety, parallel execution, observability, etc., which belong to a completely different concern group outside of content behaviour constraints). There aren't any more!
Okay, there are Maps, OrderedMaps, SortedMaps, but those are all just optimized implementations of KeyValue-Sets with the key being the constraint-dominant element.
What else could there be? A constraint that says a collection MUST always contain duplicates? Or triples? Nonsense. That would just be a List<Pair<E,E>> or something.
A collection that allows every element to be added only once and never again? Nonsense. Any even weirder constraint? Nah...

So "order > sortation" and "duplicates <-> uniques" and that is it as far as content behaviour constraints are concearned.
Quite calming if you ask me, that there's one (tiny) field in informatics that is really "complete" and won't have to get extended and extended over and over in the future.
Even more sad so that it's not even nearly complete in the Java API's collections.

Saturday, July 9, 2011

Illusions are powerful abstractions

A little quote from the video in the previous post that I moved to a separate post because it makes such a good title as well:

"Illusions are powerful abstractions"

In the video, this sentence carries a slightly different meaning than one might expect, because Cliff uses the term "illusion" as a synonym for "proper architectural separation" and wants to express a good thing with it (at the beginning, at least, hehe. Later on, I'm not so sure. Maybe that's why he picked the term: to evolve it during the talk).

But it's also a very fitting sentence for the common naivety and half-knowledge of many people. Those things happen everywhere, all the time: someone (mostly from management) with only half or even less knowledge of some matter (say, software development, surprisingly) thinks around a little from his (shall I really add "or her"?) tiny point of view, has an idea based exclusively on the simplest of all cases (which actually never happen in non-trivial projects beyond simple marketing examples), comes up with some sort of "solution" for them and then shouts out "Eureka! This is ingeniously simple and will cover ALL possible cases! Go implement it, you take care of the remaining 'corner cases', I'll build the next great idea meanwhile".

(Design) illusions are indeed powerful abstractions. They cover all the real world cases with a minimal effort. Bravo!
Now we just need a universe, a world, a hardware, a language and a JVM that can implement such design illusions to actually work and we're fine.

Questionable use of Soft References

While this post isn't really about my collections framework, there's a good story in it that is suitable as an introduction to what I actually want to write about:

There's inbuilt support for volatile elements (weakly referenced elements, etc.). From a developer's perspective, there's simply a factory method for, let's say, a new hash set collection instance to which you can pass a ReferenceType.WEAK (or SOFT or STRONG, with STRONG being the default of an overloaded alternative version in case you don't bother selecting a reference type at all).
Internally, an appropriate HashLogic instance gets created and plugged into the new hash set instance to handle elements (or, to be more precise, references to them) that can just vanish at any time.
Some months ago, I removed all the HashLogic variations for soft references. Not that they can't be handled any more: if at some point someone (maybe me) implements them again (for the now matured architecture, compared to that of some months ago), they'll just work fine without the framework having to be adjusted.
The reason was: while weak references are pretty useful, I never saw any soft reference in practice, only in theory and in academic examples for the different reference types. And personally, I could never get rid of the feeling: if your program touches the boundaries of the heap, something else already went wrong before. Soft references might be useful here, but only for a few corner cases which I don't care (at the moment) to support.

Now to the actual part:
I just watched the Google Tech Talk "A JVM Does That?" from Cliff Click (is that really his name? ^^). Very interesting. He has a high throughput of information, but not too fast, just about right.
At 30:50, he gives a very good example of why soft references and phantom references (the latter, to be honest, I never really understood... :-/ ) are of questionable use.
Just two things as a comment:
1.) Good example
2.) I knew it! (hehe)

Later in the talk he states something similar for weak references, but at this point he just slides in a "maybe weak refs are useful".

Maybe a little objection about weak references: yes, they are useful. Probably not how they are used most of the time (like "make everything weak and you can never get memory leaks" and then all of a sudden: unplanned null, exception, crash), but I have one very good example where they are useful:
If you have some sort of "meta registry" that has to watch over some/all instances in your running application and manage them in some way, it has to keep its fingers out of the "actual work" your application does. Meaning it may only observe living instances, but never be the cause for them to live in the first place. If such a meta registry hard-referenced the instances it observes, no instance could ever get garbage collected. Or in other words: it would be the biggest, most complete, "perfect" memory leak. For this reason, a meta registry MUST use weak references.
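A minimal sketch of such a weakly referencing registry, built here simply on the JDK's WeakHashMap (class and method names are illustrative, not any actual ORM's API):

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Observes instances without keeping them alive: once the application drops
// its last strong reference to a registered key, the GC may reclaim it and
// the corresponding entry silently disappears from the registry.
class MetaRegistry<T, M> {
    private final Map<T, M> metadata = Collections.synchronizedMap(new WeakHashMap<T, M>());

    public void register(final T instance, final M meta) {
        this.metadata.put(instance, meta);
    }

    public M lookup(final T instance) {
        return this.metadata.get(instance);
    }

    public int observedCount() {
        return this.metadata.size();
    }
}
```

Note that WeakHashMap only weakly references its keys; the metadata values must not strongly reference their key, or the entry can never be collected.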

Btw: what's a real use for such a "meta registry"? The reference registry of an OR mapper, for example (and btw: yes, it's a REGISTRY, not a CACHE as all the existing ORMs call it. Its main reason is to keep track of already handled instances to guarantee referential integrity). The ORM has to know which of the living instances has which id etc. associated, even if the instance doesn't contain that information on its own (because ORM should be orthogonal and not require adding thousands of annotations and special fields like existing ORMs simple-mindedly do). For it to know the instances, it has to reference them. But it may not strongly reference them, as that would cause the "perfect" memory leak (or burden the developer using the ORM with doing its management work manually, by explicitly unregistering instances etc., which is a really horrible thought).

I guess those kinds of situations are what brought him to saying "maybe weak refs are useful", and his later remark about them not being proper architecture was aimed more at the unwise uses of them.

Still, I would agree that such application logic should be done on the application level (the ORM framework in this case) and not via some "magic" on the GC level. But for that to work on the application level, the VM would have to provide a simple and fast way of telling how many times a certain instance is referenced. If that count were only 1, a strongly referencing registry could derive "oh, no one except me is referencing it anymore, so I can let go of it for it to get collected".
Problem with that would be:
a.) AFAIK, JVMs and GCs don't work like that. There is no hard-cached per-instance "reference count" the application could access.
b.) Such logic is actually part of garbage collection. If the registry did that, JVM engineers like Cliff would come along again and say "That's GC work, leave it to the GC. Take weak references instead and let the GC handle it!" I hope he would say that, because if he just denied both options, he'd be ignoring the real-world and proper-design need for an unintrusive automated reference registry in certain situations like ORM. OR, hehe, the JVM could provide a built-in service for ORM. But even then there would be other applications for keeping track of references inside the application.
One way or another: weak references do have their right to exist (even for common cases).
Soft references most probably don't.

Yesterday, a colleague pointed out to me that SoftReferences do have their use for caches. The stuff in the cache is present as long as there's enough memory. If the heap runs out, the caches get cleared as needed and filled again only on demand.
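Such a memory-sensitive cache can be sketched with SoftReference along these lines (a toy illustration; a real cache would additionally want size bounds, a ReferenceQueue to clean out stale entries, etc.):

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Values stay cached as long as the heap has room; under memory pressure the
// GC may clear the SoftReferences, and a value is simply reloaded on demand.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader;

    public SoftCache(final Function<K, V> loader) {
        this.loader = loader;
    }

    public synchronized V get(final K key) {
        final SoftReference<V> ref = this.entries.get(key);
        V value = ref == null ? null : ref.get();
        if (value == null) { // never loaded, or cleared by the GC
            value = this.loader.apply(key);
            this.entries.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

The skepticism below applies exactly to this pattern: the cache only starts shedding entries when the heap is already under pressure.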

Makes sense, but I'm still a little skeptical whether working with a constantly full heap by design is really a good thing to do. And what if, once the heap is full, the caches swamp each other all the time, creating a livelock or at least extremely bad performance?
Still, soft referencing sounds only like some sort of "last resort" means to let the application live "a little longer" when it's already too late (memory-wise), where the actual problem is a design that doesn't prevent memory from running out.
A classic example of "I don't care about memory consumption, I just use soft references".

Another mentionable part in the video:
18:57 - People don't know how to write concurrent programs.
There's such a fitting fictional quote he gives: "I don't know why it's busted, I've got some data race, so I throw another lock in, throw another lock in, thro - oh, the bugs went away! Okay, I'm done."

That is very much to the point. If you just look at the stupid Vector implementation in the JDK collections (oops, here's the collections topic and the evil JDK bashing again), which btw everyone on the Java development side nowadays says is virtually deprecated, you see a very good example of what he means. Let alone everything that normal application developers tinker together all day when stumbling across concurrency.
Synchronisation (or locking, to be more precise) is good and important. But it has to be used in the right way. For example, it's very easy to implement a ThreadLocal that is NOT synchronized for reading accesses and doesn't even have to bloat the Thread class, but is still thread safe (simply because you can exploit the special behaviour of thread instances when used as a hash key, so that every thread will only ever reach its own hash entry and never that of another thread, and whooops: unsynchronized, thread-safe, read-only access. You can find my complete and well documented implementation on SourceForge if you google a little).

Btw: just because acquiring locks gets faster and faster as the JVM guys optimize it further and further, it does NOT mean that one no longer has to care about them (I can already hear people saying "but the video says they only take a few nanoseconds nowadays, so why should I care?").
The lock management itself may someday cost zero time, but the code that is locked and executed while all other threads wait to enter it can still take any amount of time. So synchronisation problems are nothing that the JVM will someday optimize away; they are of architectural nature (luckily, because otherwise: if not for such complex problems, there would be no need for human developers in the future in the first place).

Thursday, July 7, 2011

Separation of Concerns meets Collections

This is the first post actually dealing with the next-generation collections framework I'm working on, which I've mentioned in side notes before.
One of the many tremendous improvements over the existing JDK collections, aside from better performance, more inbuilt functionality and an SQL-like querying framework, is (maybe most importantly) proper separation of concerns.

Of course the JDK collections deal with separation of concerns as well, but only on a very rough, if not to say dilettantish, level.
There is hierarchical typing to some extent, dealing with general collection operations (in java.util.Collection) and then extending it for specific concerns like allowing duplicates and order (in java.util.List) or disallowing duplicates (in java.util.Set), although the latter is pretty buggy due to misconceptions in the equality concept (which I already blogged about here: Why equals() is broken and how to do it right). Even this group of concerns (which I call content behaviour operations - maybe a little clumsy, but it's actually quite hard to come up with a nicer yet fitting name) isn't modelled properly. E.g. there are types that are sorted (which is a more specific form of being ordered) but not ordered (which logically can't be; it's just a misconception in the JDK), there is no type only defining order without allowing duplicates, no type allowing duplicates without order, and no set that is ordered (ordered, not sorted!), which every developer sooner or later badly misses.

Other groups of concerns have been completely ignored in the JDK collections. For example, what about separating concerns like the getting, adding, removing, etc. operations of a collection? This is absolutely crucial, for example to define in an API that a passed collection has to provide at least certain operations (say, getting and adding) or that a method will only return an immutable collection. With JDK collection types, you simply CANNOT do that typing. When designing an efficient API, this is for collections what Object or missing Generics was in past times: a massive lack of badly needed typing. Yes, I know there's Collections.unmodifiable~(). But there is no TYPE reflecting those (subset) operations. Those methods only return intentionally broken instances of the general purpose types Collection, List, Set, and all the nice typing is null and void when passing those cripple-collections around as full-scale general purpose types.
For a similar reason, you can't just publicly return a reference to your internal collection for others to add elements to it, because outside code might as well delete or read content it should never be allowed to by design. That's the reason for many defensive copies, inconsistent data structures, decoration of collection operations in business logic classes, etc. All complicating development and costing runtime performance (because everything is copied back and forth all the time).

So this post is about how properly separated element operation concerns SHOULD look for collections (and how they DO look in the framework I'm working on).

First of all, a new (or maybe not so new, but definitely new regarding the JDK's current collections API) paradigm of interface design:
Specialized interfaces don't always have to be inheritance specializations; they can also be inheritance generalizations, because they cleanly handle only one type of concern and then get recombined to form a general purpose type.

For example:
In the JDK's collections, there's only the type Collection, declaring (on its level of content behaviour abstraction) all element operations in one monolithic cauldron: getting, adding, removing.

In my extended collections framework, there are (among others) the interfaces:

XGettingCollection
XAddingCollection
XRemovingCollection
And then:
XCollection extends XGettingCollection, XAddingCollection, XRemovingCollection

(As a side note: The X stands for extended, of course. Not only as a short distinction from the java.util. types, but also to quickly narrow down the IntelliSense content to collection types. Very similar to Swing's J~, with the difference that here it's about proper types and not narrow-minded classes).

I'll spare explaining everything in detail; it should be quite obvious that the getting (or querying) operations go into XGettingCollection, and so on.
The important thing is: This does NOT complicate things.
You can still just write "XCollection<String> strings = ..." and have the familiar general purpose type, all fine.
But now you CAN go into the fine grained details of single concearns if you want/have to.
You can write a simple "View" wrapper implementation wrapping a general purpose XCollection instance but implementing only XGettingCollection, thus reducing the possible operations to read-only access. (Of course I already wrote it, but you get the idea.)
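For illustration, such a View relay could look roughly like this, using a minimal stand-in for XGettingCollection (the real framework interface declares far more querying operations):

```java
import java.util.Collection;
import java.util.Iterator;

// Minimal stand-in for the framework's getting-only concern type.
interface XGettingCollection<E> extends Iterable<E> {
    int size();
    boolean contains(E element);
}

// Read-only relay: wraps a still-mutable collection but exposes only
// querying operations, so receivers cannot modify the subject.
final class View<E> implements XGettingCollection<E> {
    private final Collection<E> subject;

    View(final Collection<E> subject) {
        this.subject = subject;
    }

    public int size() {
        return this.subject.size();
    }

    public boolean contains(final E element) {
        return this.subject.contains(element);
    }

    public Iterator<E> iterator() {
        // a real implementation would return an iterator whose remove() throws
        return this.subject.iterator();
    }
}
```

Because the View is only a relay, later changes to the wrapped collection remain visible through it; only the ability to cause changes is taken away.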

Now you can easily write something like:

public View<String> getStrings(){
    // View implements XGettingCollection
    return new View<String>(this.strings);
}

Or the other way around if some client code shall be able to input additional strings but is not allowed to query what's already contained:

public Adder<String> getStringAdder(){
    // Adder implements XAddingCollection
    return new Adder<String>(this.strings);
}

Or may add and query strings, but never remove them:

public Collector<String> getStringCollector(){
    // Collector implements XGettingCollection, XAddingCollection
    return new Collector<String>(this.strings);
}

And, very important, properly typed immutable collections:

XImmutableCollection extends XGettingCollection

public XImmutableCollection<String> getStrings(){
    return this.strings.immure();
}

(Btw, it's no typo: "immure" as a figurative shortcut for "toImmutable()". Note that "immute()" would imply making THIS collection immutable instead of instantiating a separate immutable copy.)

And all of a sudden, multithreaded development and API return values become way less hurtful and way more elegant, concise, etc.

Note that there's a difference between a View and a ConstCollection (an implementation of XImmutableCollection): a View acts as a relay reducing the operations of an otherwise still mutable collection to read-only concerns, while a ConstCollection is a hard fact, never to be changed again (unless some reflection operations mess everything up, but that's another story, applying to all immutable objects, of course).
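The ConstCollection side of that distinction can be sketched in a few lines (an illustrative stand-in; the framework's real ConstCollection of course implements the full XImmutableCollection type):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// A hard immutable fact: snapshots the source at construction time and is
// never affected by later changes to it (in contrast to a relaying View).
final class ConstCollection<E> {
    private final List<E> elements;

    ConstCollection(final Collection<E> source) {
        this.elements = new ArrayList<>(source); // defensive snapshot
    }

    public int size() {
        return this.elements.size();
    }

    public boolean contains(final E element) {
        return this.elements.contains(element);
    }
}
```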

Those are the basics of element operation concern separation (and I really can't help but become cynical when working with the half-baked JDK collections that don't even provide them).

On to the next step:
How about integrating those element operation concerns of getting, adding, removing, etc. with the content behaviour concerns (List, Set, etc.)?
Quite straightforward:

XGettingList extends XGettingCollection
XAddingList extends XAddingCollection
XRemovingList extends XRemovingCollection
XList extends XGettingList, XAddingList, XRemovingList, XCollection

(note that I leave out the Generics "<E>" element typing, but of course it's there in real code)

This does the trick quite nicely: there are still separated element operation concerns on the List level of content behaviour, but at the same time XList is still an XCollection and even XGettingList is an XGettingCollection.

Same for XSet, of course.
And for XBag (the content behaviour type defined only to allow duplicates but no order), XSequence (defining only order; btw XSequence is actually a middle step between XList and XCollection that I left out above for greater familiarity for people who so far only know the mixed-up JDK collection types) and XEnum (which combines XSet and XSequence to have ordered uniques - salvation at last).

Admittedly, this two-dimensional inheritance causes a little more complexity than just a single super type. There are sometimes 4-5 superinterfaces in one interface. But that's not a bad thing. It's still as simple as writing XSet<Double> or XCollection<File> or XGettingList<File> or whatever you need, without caring much about the net of concern combinations. Yet it's complex enough IF you need it for fine-grained purposes.

Next step: More than basic
I called Getting, Adding and Removing "basic" concerns for a reason: they apply to every content behaviour type of collection. Others only apply to more specific collection types.
For example, everything that has to do with an index is only available from sequence on (yes, sequence already, not list!), because to be able to access elements via an index, the collection must guarantee to "play along", i.e. to maintain order.

So there are advanced element operation concerns like inserting, prepending and shifting, and 2-3 more (but then it's complete, really).

Not that there is a separate implementation for every single concern or every combination of concerns, but it's reasonable that it might become necessary to create one for a certain combination at some point. Or to create a reducing relay wrapper (like one only allowing shifting of contained elements among each other without removing or adding anything, e.g. for sorting externally).

There's also a nice payoff included if you take the effort to properly separate those concerns: those wildly grown special border-types like Stack, Queue and Deque become superfluous. A Stack is nothing more than the appropriate concern types plus some convenience methods like first(), last(), pop(), etc. A Deque is nothing more than a general purpose type implementing XAddingSequence and XPrependingSequence. So there's a medium number of properly designed and combined fundamental types, which conceptually replace and outperform the JDK's small number of simplistic types plus its medium number of organically grown special-purpose types. Apart from that, the JDK excrescences are mostly not even proper types (interfaces, in Java), but just extensions of implementations. Like Stack extending Vector (not even ArrayList, but the deprecated Vector!). Oh dear, really: proper design in an interface-based language (which Java is and always has been from the start) looks different from what can be found in the JDK collections. And that for the most fundamental modules of both data structures and algorithms (which collections are. Fundamental tools, really. Not just some ".util"). Very sad (honestly: I'd rather spend my time driving my next-generation SQL and ORM technologies forward, not shift down a gear to fix the basic tools).

There'd still be much to write about.
Like the XSortation, XSortedList and XSortedEnum types, XMap and XTable (with table being an ordered map, intuitively) and their attached satellite instances for Keys, Values and Entries. Or whole new groups of concerns like implementation storage types (mostly arrays and chains - btw "chain" is a more fitting term for what everyone refers to with the add-on adjective "Linked~"), null handling, proper hash equality, functional programming (without all the continued back-and-forth copying and the overcomplicated eager and lazy design the lambda-dev list is discussing for extending the old broken collections for functional programming), integration of volatile elements (weak referencing), etc.
Oh, and of course how to make it all backward compatible to the JDK collections (yes, it is) using satellite instances, and how to provide reverse-upward compatibility to wrap up JDK collections as extended collections.

While I'm currently in the process of finishing the architecture and the commonly used implementations of the framework, I'm concurrently working on the structure and chapters of a paper in which I will describe all the ideas and new concepts that went into the next-generation collections framework. Any collection topic that does not show up here in the blog will surely be contained in the paper once it's done. Sometimes it feels like I'm working on a PhD as a hobby ^^. But nevertheless.

Sunday, July 3, 2011

Variables are not fields, functions are not methods

There's a certain mistake in terminology that I notice more often the more insight I get into advanced Java development (and that, of course, I made in times past as well) and that increasingly annoys me:

A "function" is not the same as "method". And "variable" is not the same as "field". There really is a reason why they have different terms, a very important one. Really.
If you are not aware of the reasons behind it, you might very quickly think "Oh my, what's he up to now? Let go already, it's about the same, so no point in being so pedantic." I for sure did in the past. So it's fair if it's me who writes a text saying "No really, it's important, listen...".

So what's the important difference:
Methods and fields are object-oriented constructs with enhanced characteristics, while variables and functions are just plain procedural things with reduced characteristics. And that difference in characteristics can mean the difference between proper understanding (and, in the end, working code) and half-baked understanding (or broken code). If we want to be very correct and play around with OOP terminology a bit, we could say "Field extends Variable" and "Method extends Function" (or even more precisely: "Function extends Procedure" and, well, "Method extends Procedure or Function, depending on context"). But I'll keep that part simple and say that they are "completely" different things.
One might argue here that it actually should be "Field extends Variable" and "LocalVariable extends Variable", with Variable being the abstract super type, and thus calling a field a variable is absolutely correct, and the word "local" before "variable" simply has to be used consistently to mark the difference. I say: okay, but: a) the problem remains that just saying "variable" leaves unclear what is meant, and b) it's much easier and clearer if "field" stands for fields and "variable" stands for local variables. Honestly, think about it. This still allows adding the "local" all the time if you wish, no problem, but it solves the ambiguity when talking/writing.

1.) Field vs Variable
Variables (in Java) are those things you write inside blocks. Like "int value = 5;". It holds a value (a primitive value or a reference value) and it stand for a index in the stack (the fast accessible exclusively owned peace of memory of every thread). That's about it.
Fields are much more. They, too, hold values (the same ones variables do), but there the similarity ended already. First difference: their values lie on the heap (the not so fast and big thread-common area in memory). Performance optimisations like escape analysis etc. can cause the values to lie on the stack, as well, but in general, field values are located on the heap. That means: every thread can access the same field. Which of course is good, because it allows for efficient thread communication. But it can also cause all kinds of trouble (see video below).
Fields also belong to a class. Fields are accessible and analyzable (and corruptible) via reflection. Fields are complex object oriented structures, not just mere procedural labels representing a value.
The simple nature of (local) variables is also a notable advantage: they are fast. And they are (thread-)safe. If an algorithm works on a variable, there's peace, calm, efficiency, as no other thread can possibly mess with it while "your" thread is working on it. And it's also as fast as you can get, as everything happens on the stack. When working on fields, on the other hand, you always have to keep in mind "can this field be accessed by other threads simultaneously?" and "are there unnecessarily repeated accesses to that field in a loop?". Fields are powerful, of course, but they are also much more hassle that has to be taken care of.
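To make the contrast concrete, here is a minimal sketch; the class and its names are invented for illustration:

```java
class Counter {
    // A field: its value lives on the heap, so every thread that can reach
    // this Counter instance can read and write it. Declared volatile here
    // to at least guarantee visibility across threads.
    private volatile int hits;

    void hit() {
        this.hits++; // note: visible to other threads, but still not atomic
    }

    int hits() {
        return this.hits;
    }

    // A local variable: it lives on this thread's stack. No other thread
    // can possibly interfere with it, so no synchronization is needed.
    int countLocally(final int n) {
        int local = 0;
        for (int i = 0; i < n; i++) {
            local++;
        }
        return local;
    }
}
```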

So if someone talks about a field but says "variable" all the time and you have really internalized the difference, you have a hard time understanding what he's talking about. Funny thing is: if you see both as synonyms and don't know or care about the crucial differences, you aren't confused at all. Like having less insight into a matter lets you understand it better o_0. Of course this is just an illusion, as ignoring the differences will at some point make you pay for it.

Nice example: There's an excellent Google Tech Talk about "The Java Memory Model" (which in the end just deals with concurrency - and I'm proud to say that I already knew the important stuff, like: "synchronized" has the double meaning of block & synchronize, volatile is a synchronize-only for fields - NOT variables, by the way! - and you have to care about "happens before" relationships when dealing with concurrency. But really, if he says "volable" just ONE more time, then I'm... argh).

What caused a little confusion was that the guy kept mixing up "field" and "variable", for example by saying things like "another thread might change your variable". Knowing about the difference and seeing him as the expert that he undoubtedly is, I immediately "panicked" and thought: "WHAT? Can threads access variables without synchronization as well? Is everything I build my concepts on wrong?". Luckily for me, my world doesn't have to fall apart. He just used the wrong term. Phew.
More or less the same sloppiness is to omit the "this." before a field. Of course it is optional for the compiler, but it makes it so much easier to read and quickly understand which ones are (complicated) fields and which ones are simple variables in a piece of code. This is so important and so simple to do (even automated by the IDE) that I slowly but increasingly can't help seeing people who omit it as newbies or ignorant at best.
He does that as well (see 38:00 or 43:00), but of course I wouldn't call him a noob. I'd rather say he (sloppily) doesn't care about it, the same way he mixes the terms up when talking. No problem if he gets along with it, but it's definitely more difficult for others to read his code and follow what he's saying.

On a side note: we'd need far fewer concurrency lessons like that if little things like the difference between fields and variables were propagated better. Most of the problems with concurrency come from not making that distinction in the first place. At many of his examples of common mistakes and wrong assumptions one might fall for, I just thought "how could one possibly expect this field to be safely usable in a multithreaded program?" - and that just because I know about the difference between fields and variables.
If you don't care about the difference, don't watch your terminology and don't use "this.", then the problems start (or get several times worse).

I keep two simple rules concerning fields and variables that help me tremendously in day-to-day Java development:
1.) Fields are potentially always accessed by another thread (unless clean architecture guarantees otherwise) and therefore have to be at least volatile or used synchronized if the code is to be used in multithreaded environments.
2.) Working with fields is slower than working with variables, so they should be cached in variables when used in a loop.

Keeping those two rules in mind for every piece of code you write increases the software quality by a felt 200%. At least that's my personal experience.
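Rule 2 in code - a small sketch with invented names. (Note that a modern JIT can often hoist the field load out of the loop itself when it can prove no concurrent writes; the rule is about not relying on that.)

```java
class SumExample {
    private final int[] data = {1, 2, 3, 4};

    int sumSlow() {
        int sum = 0;
        // reads the field this.data on every single iteration
        for (int i = 0; i < this.data.length; i++) {
            sum += this.data[i];
        }
        return sum;
    }

    int sumFast() {
        // cache the field in a local variable once, then work on the stack
        final int[] data = this.data;
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }
}
```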

A side note about performance, because it's very interesting:
There's a kind of surprising relation between the performance of read and write operations on fields and variables, which is as follows (1 being the fastest, 4 the slowest):
1.) reading from variable
2.) reading from field
3.) writing to variable
4.) writing to field

One might think that both variable operations occupy places 1 and 2 and the field operations come behind them. But that's not true (as explained to me by the running VM *cough*).
Reading from a field is faster than writing to a variable (a register). It may have something to do with caching fields on the stack etc., but nevertheless, what we can derive from this is: assigning something to a variable/field is actually pretty expensive, at least compared to reading. That's why it doesn't pay off to cache a field value in a local variable and then only use it 2-4 times. Yes, those 4 local reads are faster than the field reads, but the store beforehand took something like 10 reads' time, ruining the optimisation. So caching field values in local variables pays off, but mostly only in loops (or to ease concurrency situations, of course).
But again a very valuable piece of the puzzle. Knowing this, one might think of field accesses as "tiny micro database accesses" from an algorithm's point of view, with the "application" being the local algorithm and the "database" being the instance lying on the heap. Tiny of course. Micro. You get it. Still a helpful picture to keep in mind for writing fast software.

To sum it up, it's a very good idea (if not mandatory if you want to call yourself a quality Java developer) to take advantage of your IDE's syntax coloring and automation to make the difference as readable as possible. I came up with a pattern that is quite intuitive and helpful:

1.) Activate automated "this." qualification.
2.) Make missing "this." qualification be indicated as a problem/warning (yes, really, because it IS a problem).
3.) Make local variables and parameters green (green implies peace, calm, nothing critical - simply "green light", all is good).
4.) Eclipse's default blue color for fields fits in there very well: all fields and variables are colored, either green or blue, where blue means (not critical, but at least) "Uuh, special, watch out".

Of course the green color has the consequence that comments have to be recolored from green to... hm, what would be a good color for some syntactically meaningless, add-on, optional, external text? Grey! Of course. Come on, what were they drinking when they decided that a comment should be colored green? Ah, whatever.

Here's a small example for my color scheme:

It immediately shows all the safe, uncritical, stack-based local variables at a glance, with the object oriented, heap-based, "micro-database" fields clearly distinct from them.

One might argue that having too many "this." qualifiers everywhere in your methods clutters up the code too much. I'd say if you consider "this" qualification clutter that should be avoided, you are ignoring large parts of the concept of object orientation, which is very bad (if not for you internally, then at least for others who have to read your code) - and maybe your code should even be overhauled to make fewer repeated field accesses.
There's one rule of thumb in Java development: reading is more important than writing. So if you say you can write code better if you leave out readability helpers like "this.", honestly: no one cares. Make it so that many others can read and maintain your code well (at best easily self-explaining code with speaking variable names etc., plus explaining comments where they are useful), then it's good code. Not if you typed it a little faster, once.

2.) Methods vs. Functions

Most of the stuff said for fields applies to methods as well: part of a class, etc. Qualify with "this.", although here the intention is more to clearly distinguish them from static methods. Because in Java, believe it or not, there are NO functions at all. Nothing that ever gets mistakenly called a "function" actually lives outside a class. There is no ".java" file containing just a bunch of "public int calculateStuff(int a, int b)" or whatever. They are all always in a class.

In contrast to fields, there are of course, object-oriented-wise, two types of methods: static methods and instance methods. Okay. Still, both of them are methods belonging to a class, having access to non-public fields of that class, etc. But none of them are functions.

Making the distinction still makes sense and is even more important than the one between fields and variables, because, imagine: (actual) functions are coming to Java!
They're called lambdas for now (google "project lambda", scheduled for Java 8) and of course they will internally just be classes themselves, just like anonymous inner class instances (or maybe be substituted by method references, as Remi wrote on lambda-dev - I don't look into the matter closely enough to understand it completely), but conceptually, they're real functions.
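With the lambda syntax that eventually shipped in Java 8, the conceptual difference can be sketched like this (the Adder interface and Calculator class are invented for the example):

```java
interface Adder {
    int add(int a, int b); // a single abstract method, implementable by a lambda
}

class Calculator {
    private int base;

    Calculator(final int base) {
        this.base = base;
    }

    // a method: belongs to the class, has a receiver, can access this.base
    int addToBase(final int value) {
        return this.base + value;
    }

    static int demo() {
        // a "function": no receiver, no fields, just parameters and a result
        final Adder plus = (a, b) -> a + b;
        return plus.add(2, 3);
    }
}
```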
Bad times for all the people currently calling methods "functions" as well.
I already see Massive Confusion 2 coming up in Java theatres: "We moved it to a function" - "A function? Are you nuts?!" - "I mean a function in the class. In the instance. You know. The old function, not those new ones." - "Ah, you mean a method. Then just say method please." - "Isn't it the same?" - "OMFG..."

Honestly: No, it's not.

From a more general point of view, it's also very helpful to make something clear for one's understanding of source code: we still develop procedurally. Object orientation is not something different from procedural development, it's an extension. Again, more like "ObjectOrientedProgramming extends ProceduralProgramming". And this is very simple to see in the code itself:

The "blocks" (mostly method blocks, but also initializer and static initializer blocks - so all the stuff triggering the actions between curly braces, excluding array initializers and type bodies) are still plain procedural programming. Procedural "islands", if you will. (Only!) the things around them are the object oriented stuff. Or the object oriented foundation (that older, procedural-only languages are missing) in which the procedural algorithms are embedded. The curly braces of class and interface bodies are something completely different from those enclosing blocks. They might as well have used another symbol for them. The constructs floating around in the OO waters are all accessible via reflection (Class, Field, Method, Constructor, Modifier, etc.), the stuff in the procedural blocks is not. Those block braces are like a sign saying "you are entering procedural land now, reflection does not apply here". The same goes for modifiers like public, protected, etc.: they're just not applicable to procedural code. After learning (or no longer ignoring) the difference between fields and variables, it's also easy to see why a "volatile" modifier makes no sense for a local variable.
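A tiny illustration of the "modifiers stop at the block braces" point (the class is invented for the example):

```java
class ProceduralIsland {
    // OO land: modifiers apply, and reflection can see this field
    public volatile int shared;

    public int run() {
        // procedural land: "volatile int local = 0;" would not even compile.
        // A stack slot is owned by exactly one thread, so there is nothing
        // for volatile (or public, private, ...) to regulate here.
        int local = 0;
        local++;
        return local;
    }
}
```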

There's a ... <diplomacy>not so good</diplomacy> programming language that shows this difference in principle very well: Caché Object Script. There, the OO land has one syntax style (pretty much copied from Java, but anyway) and the blocks still (mostly) have the syntax style of its predecessor language MUMPS (<- no joke). For example, outside blocks (OO land), you have to end a declaration with semi-colons, while inside (procedural land), you may not end a statement with semi-colons. Etc. If this language has any value, then it's the visibility of the difference between OO waters and procedural islands.

Seen on that general level, the difference of terms represents the difference of context:
Objectoriented: field, method
Procedural: variable, function

Thank you for reading (got a little long) :-D

Sonntag, 19. Juni 2011

Why a NotImplementedYetError is needed

Everyone working on non-trivial software projects knows code snippets like the following. Especially if work-in-progress parts (say, a framework) are already used in other parts (like an actual customer project) while still growing and being under construction. Still, there are two things badly wrong with it. What are they?
public boolean prepend(final E element)
{
    // TODO Auto-generated method stub
    return false;
}

First the positive thing: There's an IDE-processable construct, the TODO, that marks this method as to be implemented for the developer. The IDE can easily list all TODOs, search and filter for particular ones, etc. And the code gets compiled just fine and the class (its already implemented parts) is usable. Fine.
This is completely sufficient if the method has been created just now and gets completely implemented the same day or maybe the next. That should always be the case in theory, and it is in fact very often the case in practice as well. But practice also shows that many of those stubs remain. Not for "ever" or for years, of course. At some point, you should really get to implement them and get rid of the stubs. But in all but smaller projects, it can really take up to a few months to fix the last of them (and while you do, the structure of the still evolving project keeps changing a little here and there, requiring new interface methods, which in turn - to keep the classes compilable - create... right: more stubs).

So let's just say: until a piece of (non-trivial) software gets to a major release version, it will most certainly contain a number of "not-implemented-yet" stubs. For example, the next generation collections framework I'm currently working on contains 224 of them as I type this sentence (which sounds like more than it is, as many of them will be solved by delegating calls to the same methods; but anyway: I surely won't fix them all today or tomorrow. Some of them maybe not until half a year from now).
Not desirable in theory, but simply reality in practice.

Given this situation, again the question: which two things are badly wrong?
The lesser one: there's no annotation (existing in the java language) telling the IDE "this method is not implemented yet, mark it in warnings / intellisense and the like".
But the far greater one: the problem (actually the bug) of not being implemented yet doesn't show at runtime! The method as it stands there (representing all "stubs") is a construction site in disguise, pretending to be a completed, fully functional method. In this case: a boolean returning method called "prepend", taking (and ignoring) one argument of generic type E and always returning false. Why not. Valid code.
One might be tempted to say "well, that's the idea of the error the compiler gives you when you leave out the return". Not quite, because: a) there are void methods, too. And b) the compiler only watches over your syntax, not over the semantics of your code. Meaning it still makes sense to make the syntax compilable even if some parts of the logic it defines are still missing.

The problem is that a stub like this pretends to be something it isn't: finished work. If it gets executed, you have exactly the same problem you had before exceptions and automatic null and array bound checking and type safety etc. were invented: the program just continues, doing stupid things, and eventually breaks, maybe somewhere deep inside the program at a completely different point. The solution is always the same: the program has to fail right where the error is - not carry on like a zombie that finally stumbles against a tree, but fail right there.

The first idea to solve this is to replace the vicious return stub with a "throw new UnsupportedOperationException()". I did that, too, for a long time. At least it causes the program to immediately fail at the right spot. But it's only half of what should actually be done, because it is an unclean, abusive workaround, not a proper means for the situation.
The concept of UnsupportedOperationException is to cause an exception if a method gets called on an implementation that does not or cannot support the requested operation. "Cannot" in terms of "even if it's completely implemented, it can't support this operation". A good example are methods like add() or set() in java.util.Collections$UnmodifiableList. The class implements java.util.List, which defines add() and set(), but does not and won't ever support them, as it's intentionally not modifiable, or "immutable", or just a constant list. (Which, btw., is a terrible misconception in the old JDK collections: if a list implementation is immutable, it should be of a type that only provides reading operations in the first place, not implement a mindless, undifferentiated all-in-one type that defines operations running against the idea of the implementation - because again you have the situation that some piece of code, the constant list, can pretend to be something else, a general list, be passed along as such and only eventually, at some completely different point, crash the program. It's a little like having no type system at all.)
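The legitimate use of UnsupportedOperationException - a fully implemented type that intentionally rejects an operation forever - can be shown in a few lines:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class UnsupportedDemo {
    static boolean addRejected() {
        final List<String> constant =
            Collections.unmodifiableList(new ArrayList<>(Arrays.asList("A", "B")));
        try {
            constant.add("C"); // fully implemented, but intentionally unsupported
            return false;
        } catch (final UnsupportedOperationException e) {
            return true; // correct use: this operation will NEVER be supported
        }
    }
}
```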

But that is not the point here. The implementation can and will support the operation, it's just not implemented yet. So if I keep abusing UnsupportedOperationException for cases like this, all I get is people writing code that eventually calls that method (maybe because the unfinished SubList implementation gets used in a piece of code that normally only deals with the general purpose List implementation), getting an unsupported operation exception and being confused, thinking "it's a normal operation, why isn't it supported?".
The clean, proper, professional way is to have some exception (actually not an exception but a true error) saying "Sorry, this method is not implemented yet, but you can bother the author to move it up in priority right now for you if you need it."

So what is really needed (and sadly one of the many many trivial and basic things that are missing in the JDK) is a "NotImplementedYetError".
The slightly weird, even funny, thing about it is: it's a piece of code, a part of the software, that should actually never be executed (in theory, where every part of a software would be completed immediately). So it's unnecessary by design, but still necessary in practice as a pragmatic "construction site" sign. Maybe it is best seen as a "meta" error.
Positive side effect: you can easily search the project for all occurrences of that class and you'll get precisely all methods and classes that have construction sites, without getting all the actual occurrences of unsupported operations that you would otherwise have to filter out manually every time.

For these reasons, I finally extended my core project by a "net.jadoth.meta.NotImplementedYetError" class and overhauled all workaround stubs to use it (btw., that's also how I can tell that I currently have 224 of them). Of course I would immediately (and gladly) replace it by a java.meta.NotImplementedYetError (or a java.lang.NotImplementedYetError or UnimplementedMethodError or whatever, as long as it's the proper means). But I doubt that will ever happen. Independent of how useful, how needed or how unproblematically orthogonal a new feature is, there are always people screaming "no, this has no business in the Java language, it's horribly bad style because of blablabla".
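A minimal version of such an error class might look like the following; the actual net.jadoth.meta implementation may well differ, this is just a sketch:

```java
// A true Error, not an Exception: calling an unfinished method is a
// construction-site bug that should fail fast, not be caught and handled
// as part of normal program flow.
class NotImplementedYetError extends Error {
    public NotImplementedYetError() {
        super();
    }

    public NotImplementedYetError(final String message) {
        super(message);
    }
}
```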

So the example from above now looks like this (and with it, every other stub should look like this from now on). Oh, and as a side note: a not-yet-implemented method is of course an error to be fixed (hence the critical FIXME), not a mere "TODO" in the sense of "maybe, if you have time, come back and implement it".

public boolean prepend(final E element)
{
    // FIXME Auto-generated method stub
    throw new net.jadoth.meta.NotImplementedYetError();
}

At the end, a small note about the mentioned annotation: same problem here, as said. The current workaround is to use @Deprecated, but that is of course no clean solution. Again, an extension of Java itself would be required, like @java.lang.NotImplementedYet. I don't bother too much here, as methods mostly get called via interfaces and not via classes, so the markup would not be visible anyway. The important thing is the runtime error.

Sonntag, 22. Mai 2011

Assignment Inlining

What is the difference between the following two code samples:

final int[] values = this.someValues;
if(index < values.length){
    return values[index];
}

final int[] values;
if(index < (values = this.someValues).length){
    return values[index];
}

Both do exactly the same thing and both are equally long. The second one is a little more complicated to read, granted. But it is also slightly faster.
Why is it faster?

For a very natural reason, that applies to CPUs as well as to human beings. Let's say CPUs are only human and work on a desk with a shelf (= CPU registers). A local variable is effectively just a CPU register or a slot in the shelf, for that matter. To fetch a sheet of paper (load a value of the variable) from the shelf (the register) or to put (store) it there takes time.

Example 1 instructs the CPU via bytecode to:
- write the address of the array on a sheet of paper
- put the sheet into slot 1 of the shelf
- fetch the sheet from slot 1
- compare the length of the array on the sheet with the index
- ...

Example 2 instructs the CPU via bytecode to:
- write the address of the array on a sheet of paper
- and compare the length of the array on the sheet with the index

(Okay the comparison is a little lopsided: actually, the value on the stack is duplicated and stored away once instead of stored away and loaded back. But the picture stays the same)

It's quite obvious: The instructions of the first example are a little... well.. silly for the executing CPU or human being, because the newly written sheet is put away just to be fetched again. Number two is a little smarter, because it says "continue to use what you've already got".

The performance gain is very small in numbers, BUT: If a method doesn't do much else (like in the example), or if multiple optimizations of that type can be done, then the small gain compared to the anyway small workload becomes relatively high.
And that comes without "side effect". No structural disadvantage, no instability, nothing. Even variable finality is kept. It's just that the characters in the source code are arranged differently without changing their meaning and the stuff gets faster. That's all.

Sometimes, such an "assignment inlining" can even spare a local variable completely - and then it gets really fast (because storing to a register is surprisingly expensive).

The internet provides hardly any information on that topic, which surprises me a little. Only one sophisticated bytecode optimization source talks about replacing an equivalent STORE/LOAD pair by a DUP(licate) instruction. So I researched it on my own. It really works that way; every little test project can show it.

Of course that doesn't mean that everyone shall write bytecode-optimized code all the time.
It depends on the situation if it makes sense to do it.
If you have a piece of code that is called only once when pressing a button, or that assembles a SQL statement (which itself takes computer-AGES to be executed), etc., then there's absolutely NO point in reducing readability to gain some 0.000001% performance. In such cases, readability and maintainability of source code is much more important than any performance optimizations.

On the other hand, if you implement something like a Comparator - a tiny piece of code (3-4 lines maybe) which gets called a million or a billion times in a row - then such a small change can make a big difference. Especially with algorithms of superlinear runtime complexity (like sorting's O(n log n), for example), this can quickly mean a difference of several seconds of response time per user action.
(and honestly: in 3-4 lines of code, even inlined statements still keep up readability)
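For illustration, here is a sketch of such a tiny Comparator with the assignment inlined into the comparison (the names are invented; whether the gain actually materializes on a given VM should of course be measured):

```java
import java.util.Comparator;

class InliningDemo {
    // compares int arrays by their first element; the assignment to "first"
    // is inlined into the comparison, so the freshly loaded value can stay
    // on the operand stack instead of being stored away and reloaded
    static final Comparator<int[]> BY_FIRST = new Comparator<int[]>() {
        @Override
        public int compare(final int[] a, final int[] b) {
            final int first;
            return (first = a[0]) < b[0] ? -1 : first > b[0] ? 1 : 0;
        }
    };
}
```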

This is one of many performance optimizations I use in my "Next Generation Collections" framework, all in all resulting for example in GrowList.add() (the behaviour-wise cleanly named equivalent to the implementation-wise named ArrayList of JDK) being 200% to 300% as fast as ArrayList.add().

Montag, 16. Mai 2011

Why equals() is broken and how to do it right

Two simple explanations about the intention of equals() (and always associated, of course, hashCode()):
1.) I once read in Josh Bloch's Effective Java something like: "When designing a class, you should think about: is it supposed to be a value type? Then override equals() and hashCode(); otherwise don't, and keep the default identity comparison."
2.) Most tutorials only briefly touch on the implementation of equals() as considering the "significant" fields of the class, and then go on and on about how equals() and hashCode() have to be implemented consistently (a little unnatural, isn't it?) and that this is needed for hash collections etc.

Both statements are conveniently simple, and like most conveniently simple things, they are only partly correct.

Ad 1.) Nice idea, but what ARE value types? Okay, String and Integer are value types. What about Date? Or Person? Aren't those value types, too? What the simple value type definition is lacking is the view of immutability. Date is not immutable; an instance of Person most probably won't be, either. Collections themselves are not immutable in general. So what happens when you put those mutable objects into a hash collection and change their internal state? The collection becomes inconsistent. It becomes broken (I won't go into the basics here). This is often described in guides as "the collections will do weird things" (how funny, if your application depends on correctness), or they sell it as being the programmer's fault (hello? I simply NEED to have mutable objects in many places. Why does that mean I can't use elemental data structures with them any longer?).

On the other hand, still some kind of logic is needed to evaluate if two objects of one type are content-wise equal.

Ad 2.) So what about the "significant" fields? Wait a second: what ARE the significant fields? Doesn't that depend on the situation? How should we know when implementing a class? None of those guides explain which fields to use for it. Because it CAN'T be decided once and for all.
In one place, your business logic has to ensure that a collection contains only distinct person instances (say with distinct first name, last name and date of birth). Somewhere else, the persons have to be distinct by hometown. Or SSID. Or whatever. And in the next place, the logic is about actual instances (reference equality) again (say in an OR mapper registry that keeps track of loaded instances).

There really is a TON of things that "equality" can mean in different contexts:
Equality of...
... reference (identity)
... data identity (what a PRIMARY KEY in a database is)
... full data (with transient fields, without transient fields?)
... some special key data combination (like additional UNIQUE keys in databases)
... data identity and behaviour (type)
... full data and behaviour (type)
... and so on

Which one to implement in a single hard coded equals() method?
The answer is: none, because it is not that simple. Because equality changes with the situation.
It also won't lead you anywhere to extend subclasses every now and then just to override a troublesome equals() (yes, and hashCode()) implementation (SsidEqualityPerson, lol). Your software complexity will explode without need, and apart from that, there's no multiple class inheritance, which alone will kill attempts like that. Almost the same goes for creating decoration classes over and over. In any case, both approaches will kill referential integrity (which is crucial for advanced object graph technologies like OR mappers).

The right way to do it is: define an Equalator<T> interface, pretty similar to the already existing Comparator<T>.
That way, and only that way, you can define the right equality for the right situation.

And what about the hashCode that has to be consistent with the equals logic? Trivial: HashEqualator<T> extends Equalator<T>.
Here you are. And notice: that way, you have NATURALLY defined the contract that whoever implements equals() (or equal(T t1, T t2) now) also has to implement hashCode() (or hash(T t) now) as well. And all of a sudden, the architecture gets less weird (unlike the unnatural "you have to implement equals() and hashCode() consistently, although it's not really obvious from the language why") and yet more powerful and flexible.
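Sketched as code - the names follow the post, the exact signatures are my assumption:

```java
interface Equalator<T> {
    boolean equal(T t1, T t2);
}

// implementing hashing forces you to implement the matching equality as
// well: the equals()/hashCode() contract, expressed by the type system
interface HashEqualator<T> extends Equalator<T> {
    int hash(T t);
}

// one concrete situational equality: plain String value equality
final class StringValueEquality implements HashEqualator<String> {
    @Override
    public boolean equal(final String s1, final String s2) {
        return s1 == null ? s2 == null : s1.equals(s2);
    }

    @Override
    public int hash(final String s) {
        return s == null ? 0 : s.hashCode();
    }
}
```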

Another indication of why equals() is misconceived: why, for probably non-existing god's sake, does the parameter type have to be Object?
"Oh, that's because on such a low level of the language, typing has to be flexible enough to... blabla... because collections must... bla... no generics back then... blablabla". Bullshit. Because it's misconceived, period. Just like any other MyClass.doStuffWithOtherStuff(Object shouldActuallyBeOfTypeStuffButHeySorry) for which we all got a dressing-down in our early days, when we tried to plug another load of functionality into a conceptually unfitting architecture. Nothing else.

Here's another nice example of the misconception - actually even a bug - of equals() and hashCode():
All JDK collections assume that all equals() methods are implemented as either identity equality or "safe" simple value type equality. Yet they themselves break that assumption.
The following example (again, I'll spare myself typing the explicit code sample):
Define two List<String>, an ArrayList and a LinkedList, both with content "A", "B", "C".
Define a List<List<String>> and add the ArrayList and then the LinkedList.
Tell the list-list to remove the LinkedList.
Which list remains in the list-list? Right: The LinkedList remains. The ArrayList has been removed by the command to remove the LinkedList.
Why? Because lists inherently always use equals() internally to determine identity (which is extremely stupid, tbh). And ArrayList and LinkedList both implement equals() only by (again, stupidly) comparing their content, but NOT by comparing the type. This means a LinkedList and an ArrayList with equal content are seen as completely equal, up to being "identical" (huh?). If that is so, why are there two implementations in the first place? Why is everyone (rightly) advised to think about which implementation is needed for a certain situation, if all list implementation types are considered irrelevantly equal among each other - and by the JDK itself?
The answer is: because it's ill-conceived.
An academic example? Hardly relevant in practice? I say: it's a clear indication of a half-baked architecture that every now and then causes situations like "oh my, JDK classes don't work like that, although it would be perfectly logical. Now I have to implement a workaround and copy huge collections into arrays and back all over the place, which will bloat my code, kill performance and create tons of potential errors".
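The scenario above, spelled out in runnable code (a small sketch; it really behaves as described):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class RemoveDemo {
    static List<List<String>> run() {
        final List<String> arrayList  = new ArrayList<String>(Arrays.asList("A", "B", "C"));
        final List<String> linkedList = new LinkedList<String>(Arrays.asList("A", "B", "C"));

        final List<List<String>> listOfLists = new ArrayList<List<String>>();
        listOfLists.add(arrayList);
        listOfLists.add(linkedList);

        // remove(Object) searches via equals(); list equals() compares content
        // only, not type, so the FIRST content-equal element - the ArrayList
        // at index 0 - is the one that gets removed
        listOfLists.remove(linkedList);
        return listOfLists;
    }
}
```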

The conceptual problem here is that equals() got overloaded with different meanings. On the one hand, it is used for identifying objects (either identity-wise or immutable-value-type-wise, as with Strings), and on the other hand, it is used for content comparison of mutable objects (as with Date or with collections). Through this unguided overuse, no matter how equals() gets implemented, it is always broken for the one concern or the other. That's a typical overlapping of concerns instead of a clean separation of concerns.

And don't get started on "that's why there are collections like IdentityHashSet". Because
a) they explicitly break their own collection contracts (also a very good sign of the oversimplified equality misconception), and
b) what about all the other kinds of situational equality? Do we really have to implement hundreds of PersonFullDataHashSet, PersonFullDataArrayList, PersonTypedDataIdentityArrayList and so on for every single type we want to use in collections? Of course not. This may be ... Sparta, but it's no clean separation of concerns and sure as hell no quality software development.
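Regarding a): the javadoc of the JDK's own IdentityHashMap openly admits that the class "intentionally violates Map's general contract" by using reference equality for its keys. A quick demonstration of what that means:

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Two distinct String instances with equal content: a regular HashMap
// treats them as one key, an IdentityHashMap as two.
class IdentityMapDemo {
    static int[] keyCounts() {
        String a = new String("key");
        String b = new String("key");

        Map<String, Integer> regular = new HashMap<>();
        regular.put(a, 1);
        regular.put(b, 2); // overwrites the first entry, since a.equals(b)

        Map<String, Integer> identity = new IdentityHashMap<>();
        identity.put(a, 1);
        identity.put(b, 2); // separate entry, since a != b

        return new int[]{regular.size(), identity.size()};
    }

    public static void main(String[] args) {
        int[] sizes = keyCounts();
        System.out.println(sizes[0] + " vs " + sizes[1]); // prints "1 vs 2"
    }
}
```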

Again, the revolutionarily simple solution is: inherent equality of ALL things is equality of reference at first. Everything else is done via Equalators as required.
For example:
stringList.contains("Hello"); // identity comparison
stringList.contains("Hello", String.valueEquality); // value equality Equalator
stringList.contains("Hello", MyUtils.firstLetterEquality);
stringList.contains("Hello", MyUtils.stringLengthEquality);
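No Equalator type exists in the JDK, of course; a minimal sketch of the idea behind the calls above (all names are illustrative, not an actual API) could look like this:

```java
import java.util.List;

// Hypothetical Equalator: like Comparator, but answering "equal or not"
// instead of an ordering. Not a JDK type; a sketch of the post's idea.
@FunctionalInterface
interface Equalator<T> {
    boolean equal(T a, T b);
}

final class Equality {
    static final Equalator<Object> IDENTITY = (a, b) -> a == b;
    static final Equalator<String> VALUE = (a, b) -> a != null && a.equals(b);
    static final Equalator<String> LENGTH =
        (a, b) -> a != null && b != null && a.length() == b.length();

    // contains() with an explicitly passed equality, as in the example above
    static <T> boolean contains(List<? extends T> list, T sample, Equalator<? super T> eq) {
        for (T element : list) {
            if (eq.equal(element, sample)) {
                return true;
            }
        }
        return false;
    }
}
```

With that, a value-equal but non-identical String is found via Equality.VALUE, while Equality.IDENTITY only matches the very same instance.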

If this basic model then becomes suboptimal in certain situations (e.g. because Strings are compared value-wise 95% of the time and you don't want to pass equalator functions every time), there can be ONE alternative implementation that accepts a custom equalator to be used as its internal logic, which will fulfill all needs for custom equality. Something like:

VarList<String> stringList = new VarList<String>(String.valueEquality);
stringList.contains("Hello"); // value equality

Or when using Sets:
VarSet<String> stringSet = new VarSet<String>(String.valueHashEquality);
stringSet.contains("Hello"); // internal value hashing and value equality
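Neither VarList nor VarSet exists anywhere yet; the mechanics they imply can be sketched with a tiny wrapper (the name EqList and the use of BiPredicate as a stand-in equalator are my assumptions):

```java
import java.util.ArrayList;
import java.util.function.BiPredicate;

// Sketch of a list with a pluggable *internal* equality, as VarList implies.
final class EqList<E> {
    private final ArrayList<E> elements = new ArrayList<>();
    private final BiPredicate<E, E> equality;

    EqList(BiPredicate<E, E> equality) {
        this.equality = equality;
    }

    void add(E element) {
        elements.add(element);
    }

    // uses the equality passed at construction time, not equals()
    boolean contains(E sample) {
        for (E element : elements) {
            if (equality.test(element, sample)) {
                return true;
            }
        }
        return false;
    }
}
```

An EqList built with (a, b) -> a == b only reports containment for the identical instance, while one built with String::equals behaves like today's collections.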

Only then are collections braced for all situations of equality, which is needed for efficient, quality software development. Note that this is not just some simple improvement that fixes one certain type of equality handling; it fixes equality handling in principle by exploiting functional programming (it is really surprising that Comparator has been around for so long, yet an Equalator was never introduced).

And what about Object.equals() and Object.hashCode() ?
Well, there are several options here:
1.) Implement one HashEqualator<?> valueHashEquality that calls them internally to keep backwards compatibility, if you must.
2.) For actual value types, meaning immutable types (or at least types immutable in the state used for equality determination and hash calculation, which I call "hashimmutable"), it's still okay to use them.
3.) Apart from that, as crazy as it might sound: @Deprecated equals(), @Deprecated hashCode() (along with the even more broken @Deprecated clone(), of course, which really cleans up the Object class).
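Option 1.) could look like the following sketch (HashEqualator is, again, a hypothetical type, not part of the JDK):

```java
import java.util.Objects;

// A hash-aware equalator that simply delegates to the old
// equals()/hashCode() pair, preserving backwards compatibility.
interface HashEqualator<T> {
    boolean equal(T a, T b);
    int hash(T element);
}

final class FallbackEquality {
    static final HashEqualator<Object> VALUE_HASH_EQUALITY = new HashEqualator<Object>() {
        @Override
        public boolean equal(Object a, Object b) {
            return Objects.equals(a, b); // null-safe delegation to equals()
        }
        @Override
        public int hash(Object element) {
            return Objects.hashCode(element); // 0 for null, hashCode() otherwise
        }
    };
}
```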

This new (and the first complete) concept of equality is one of many revolutionary core concepts of the "Next Generation Collections" framework I'm working on, which is nearing completion and will be presented in a corresponding paper here once it's done to an all-around usable degree. It fixes many other misconceptions and bugs of the current JDK collections framework and adds a whole new world of functionality and performance to collections, thus massively reducing software complexity (it already does in my project at work) while increasing collection usability and performance at the same time.

Donnerstag, 5. Mai 2011

Java Single Page Reference

Some time ago, I had the idea of condensing all non-trivial Java syntax, specially treated standard API (JDK) components and basic conceptional terms on one single page - a Java Single Page Reference.
The intention is not to write another programming guide or such (that would hardly fit on one page :-D), but to have a nice and lean overview of common language items that are encountered in basic and advanced Java development, with each item being a link to a detailed explanation somewhere else on the web. For example, if someone hasn't dealt with, say, the Externalizable interface or reflection or proxy classes, etc., it would always be a simple thing to open the reference page and find a link to a nice article on the unknown topic (because, believe it or not, some Java topics are still hard to google, or the search takes at least more than a quick glance. For example, researching what that "SAM" meant that everybody on lambda-dev was talking about was anything but a quick thing to do back then ^^).
Also, I find it intriguing and inspiring to have the whole language (on a very condensed level, of course) at a glance.

Sadly, for now it's just a plain PDF without any links, but the goal is to turn it into a nice HTML site, of course.
Still, it might be worthwhile to present it even in this early form. I'm sure many people can find an interesting item here and there to google that they aren't aware of yet. Or post a comment about a missing item that I don't know of yet, of course :-).

So, here it is: Java Single Page Reference (PDF)

Montag, 25. April 2011

The Jadoth Programming Paradigm

The first actual blog entry is about the name of the blog (or rather, the name of my project framework).

"Jadoth" is a kind-of-acronym for "Java dot h". It is a (somewhat funny) allusion to header files in C/C++ (dot h) and hints at the programming paradigm of defining everything that is used as a type as an interface instead of a concrete class at first, and including the actual class as a nested class called "Implementation" inside the interface.

Here's the motivation for it:
If you think about it, is it really the best thing to define a type (an interface), say named "MyMessageProcessor", and then define an implementation (a class) named "DefaultMyMessageProcessor", and that for EVERY type in your project? Once you have seen a project with dozens of "Default~" classes next to the actual types (or even in a misused neighboring package created exclusively for grouping all the "Default~" stuff), you (at least I) can't help rolling your eyes. And if you then want to quickly find a certain implementation in the IDE's autocompletion but all you see is a large list of "DefaultThis", "DefaultThat", "DefaultSomethingElse", you can't help but think "come on, this can't be real!".
To get some relief, what most developers do is simply define their (actually architecturally abstract) types as concrete classes right away and name the implementation itself "MyMessageProcessor". I did that too, in the past, blithely thinking "a class is sufficient here, I only need one implementation at the moment anyway, and for the future... well, that will surely be alright somehow". But of course, at some point in the future you pay big time for that little sin that grew with your program: either by having to expensively refactor a concrete-class type into an interface across the whole project (and all projects depending on it), or by being forced to say "it must stay a class because too much already structurally depends on it, damn...". Neither is anything anyone can desire.
The actual problem was the short-sighted decision in the first place to implement a type in the project as a class instead of as an interface.
But then there's still the ugly default implementation naming, what to do about it?

There's a better solution for that.
Recap what you actually want to achieve: You have a type ("MyMessageProcessor" in the example) and need a convenient default implementation of that type.
Wouldn't it be good to simply be able to write something like new MyMessageProcessor.Implementation()?

Well, that is possible. Right away.
Consider the following construct:

public interface MyMessageProcessor
{
    // methods, etc.

    public class Implementation implements MyMessageProcessor
    {
        // implementation, etc.
    }
}

That way, you can easily, and in perfectly clean object-oriented fashion, write things like:
MyMessageProcessor p = new MyMessageProcessor.Implementation();
Or maybe
MyMessageProcessor p = MyMessageProcessor.Factory.newInstance();
And even static util method calls scoped right inside the interface itself.
(btw.: that is, at least syntactically, although not truly bytecode-wise, the often-wished possibility to include static methods in the interfaces they belong to. I really wonder why hardly anyone mentions that possibility. On closer inspection, it's not even a workaround but, object-orientation-wise, the cleaner and thus superior alternative to softening up interfaces to allow them to contain method bodies of their own. And it even improves structure by visually grouping all static util methods in a properly named interface-inner class.)
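Putting the pieces together, the construct behind the calls above could look like this (the process method is just an illustrative member I added for the sketch):

```java
// The type itself is an interface; its default implementation and its
// "static method holder" live inside it as nested classes.
interface MyMessageProcessor {
    void process(String message);

    // the default implementation, nested right inside its type
    class Implementation implements MyMessageProcessor {
        @Override
        public void process(String message) {
            System.out.println("processing: " + message);
        }
    }

    // interface-scoped static methods via a nested Factory class
    class Factory {
        public static MyMessageProcessor newInstance() {
            return new Implementation();
        }
    }
}
```

Nested classes in an interface are implicitly public and static, so both new MyMessageProcessor.Implementation() and MyMessageProcessor.Factory.newInstance() work as written.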

This programming paradigm of including the default implementation of an interface in the interface itself reminded me a little of old C's header files. Hence, the (slightly humorous) expression "Java.h" or "Java dot h" was quickly found. And from there, to make a unique name of it: Jadoth.

Fancy naming or not, the important conclusion is:
NEVER name your default interface implementation classes "Default~" or interface util classes "~Utils"; it will just ruin readability and seduce you into dropping clean interface architecture. Instead, embed those classes where they name-scopingly belong: inside the interface they refer to. And consider defining EVERY class that is architecturally meant to be a type (i.e. neither a static util method class nor a trivial helper class) as an interface instead. You can still include your desired implementation in it, for the small price of one sole indentation level, but with the gain of a much better program structure.

About This Blog

So I finally decided (and found the motivation *cough*) to start a personal Java tech blog as a public platform for my personal projects, ideas and technologies.
Pretty late to enter Web 2.0, of course, especially for someone with a high affinity for informatics and for the net as a source of information and knowledge.
Anyway, here it is.

This blog will be a front-end counterpart to my web domain (which is currently just a place to host files, with no website of its own; sadly no time to build one :( ) and my open source SourceForge repository (where most of my projects are located).

It's about time to have a platform for publishing, linking and discussing articles about concepts (e.g. "Throwable is no Exception" or "Why .equals() is broken") and upcoming papers about nearly completed projects (e.g. one of mine: a whole new and, with all due respect to the JDK, "proper" Java collections framework).