Donnerstag, 7. Juli 2011

Seperation of Concearns meets Collections

This is the first post actually dealing with the next generation collections framework I'm working on I mentioned in side notes.
One of the many tremendous improvements over extisting JDK collections aside from better performance, more inbuilt functionality and an SQL-like querying framework is - maybe it's the most important one - proper seperation of concearns.

Of course JDK collections are dealing with seperation of concearns as well, but only on a very rough, if not to say dilettantish level.
There is a hierarchical typing to some extent dealing with general collection operations (in java.util.Collection) and then extending it for specific concearns like allowing duplicates, order (in java.util.List) or disallowing duplicates (in java.util.Set), although the latter is pretty bugged, due to misconceptions in the equality concept (which I already blogged about here: Why equals() is broken and how to do it right) . Even this group of concearns (which I call content behaviour operations - may be a little clumsy but it's actually quite hard to come up with a nicer yet fitting name) isn't modelled properly, e.g. there are types that are sorted (which is a more specific type of being ordered), but that aren't ordered (which logically can't be, it's just a misconception in JDK) or there is no type only defining order without allowing duplicates or no type allowing duplicates without order or no set that is ordered (ordered, not sorted!), which every developer sooner or later is badly missing.

Other groups of concearns have even been completely ignored in the JDK collections. For example what about seperting concearns like getting, adding, removing, etc. operations of a collection? This is absolutely crucially needed, for example to define in an API that a passed collection has to at least provide some certain operations (like say getting and adding) or that a method will only return an immutable collection. With JDK collection types, you simply CANNOT do that typing. When designing an efficient API, this is for collection what Object or missing Generics was in past times: massively lacking of badly needed typing. Yes I know there's Collections.unmodifiable~(). But there is no TYPE reflecting those (subset) operations. Those methods only return intentionally broken instances of the general purpose types Collection, List, Set and all the nice typing system is null and void when passing those cripple-collections around as full scale general purpose types.
For a similar reason, you can't just publicly return a reference to your internal collection for others to add elements to it. Because outside code might as well delete or read content it actually should never be allowed to by design. That's the reason for many defensive copies, inconsistent data structures, decoration of collection operation in business logic classes, etc. All complicating development and costing runtime performance (because everything is copied back and forth all the time).

So this post is about how properly seperated element operation concearns SHOULD look like for collections (and how the DO look in the framework I'm working on).

First of all a new (or maybe not so new but defintely new regarding the JDK's current collections API) paradigm of interface design:
Specialized interfaces don't always have to be inheritance specialization, they can also be inheritance generalization, because they cleanly handle only one type of concearn and then get recombined to form a general purpose type.

For example:
In the JDK's collections, there's only the type Collection, declaring (on its level of content behaviour abstraction) all element operations in one monolithic cauldron: getting, adding, removing.

In my extended collections framework, there are (among others) the interfaces:
XGettingCollection
XAddingCollection
XRemovingCollection

And then:
XCollection extends XGettingCollection, XAddingCollection, XRemovingCollection

(As a side note: The X stands for extended, of course. Not only as a short distinction from the java.util. types, but also to quickly narrow down the IntelliSense content to collection types. Very similar to Swing's J~, with the difference that here it's about proper types and not narrow-minded classes).

I spare to explain everything in detail, it should be quite obvious that the getting (or querying) operations go into XGettingCollection, and so on.
The important thing is: This does NOT complicate things.
You can still just write "XCollection<string> strings = ... " and have the familiar general purpose type, all fine.
But now you CAN go into the fine grained details of single concearns if you want/have to.
You can write a simple "View" wrapper implementation wrapping a general purpose XCollection instance but implementing only XGettingCollection and thus reducing the possible operations to read-only access. (Of course I already wrote it, but you get the idea).

Now you can easily write something like:

public View<String> getStrings(){
    // View implements XGettingCollection
    return new View<String>(this.strings); 
}

Or the other way around if some client code shall be able to input additional strings but is not allowed to query what's already contained:

public Adder<String> getStringAdder(){
    // Adder implements XAddingCollection
    return new Adder<String>(this.strings); 
}

Or may add and query strings, but never remove them:

public Collector<String> getStringAdder(){
    // Collector implements XGettingCollection, XAddingCollection
    return new Collector<String>(this.strings); 
}


And, very important, properly typed immutable collections:

XImmutableCollection extends XGettingCollection

public XImmutableCollection<String> getStrings(){
    return this.strings.immure();
}


(btw it's no typo: "immure" as a figurative shortcut of "toImmutable()". Note that "immute()" would imply to make THIS collection immutable instead of instantiating a seperate immutable copy)

And all of a sudden multithreaded development and API return values become way less hurtfull but more elegant, concise, etc.

Note that there's a difference between a View and a ConstCollection (an implementation of XImmutableCollection): A View acts as a relay to reduce the operations of an otherwise still mutable collection to read-only concearns, while a ConstCollection is a hard fact, never to be changed again (unless some reflection operations mess everything up, but that's another story, applying to all immutable objects, of course).


Those are the basics of element operation concearn seperation (and I really can't help but to become a cynical when working with the half baked JDK collections that don't even provide them).



On to the next step:
How about integrating those element operation concearns of getting, adding, removing, etc. with the content behaviour concearns (List, Set, etc.)?
Quite straight forward:

XGettingList extends XGettingCollection
XAddingList extends XAddingCollection
XRemovingList extends XRemovingCollection
XList extends XGettingList, XAddingList, XRemovingList, XCollection

(note that I leave out the Generics "<E>" element typing, but of course it's there in real code)

This does the trick quite nicely: There are still seperated element operation concearns on the List level of content behaviour, but at the same time XList is still a XCollection and even XGettingList is a XGettingCollection.

Same for XSet, of course.
And for XBag (the content behaviour type defining only to allow duplicates but not order), XSequence (defining only order. Btw XSequence is actually a middle step between XList and XCollection that I left out above for greater familiarity for people only knowing the mixed up JDK collection types so far) and XEnum (which combines XSet and XSequence to have ordered uniques - salvation at last).

Admitted, this two-dimensional inheritance causes a little more complexity than just a single super type. There are sometimes 4-5 superinterfaces in one interface. But that's not a bad thing. It's still as simple as writing XSet<Double> or XCollection<File> or XGettingList<File> or whatever you need without caring much about the net of concearn combinations. Yet it's complex enough IF you need it for fine grained purposes.



Next step: More than basic
I called Getting, Adding and Removing "basic" concearns for a reason: they apply to every content behaviour type of collection. Others only apply to more specific collection types.
Like for example everything that has to do with an index is only available from sequence on (yes, sequence already, not list!), because to being able to access elements via an index, the collection must guarantee to "play along", to maintain order.

So there are advanced element operation concearns:
XPrependingSequence
XInsertingSequence
XShiftingSequence
And 2-3 more (but then it's complete, really)

Not that there is a seperate implementation for every single concearn or every combination of concearns, but it's reasonable that it might get necassary to create one for a certain combination at some point. Or to create a reducing relay wrapper (like one only allowing shifting of contained elements among each other without removing or adding anything. E.g. for sorting externally).

There's also a nice payoff included if you take the effort to properly seperate those concearns: Those wild grown special border-types like Stack, Queue and Deque become superfluous. A Stack is nothing more than the concearn types already containing some convenience methods like first(), last(), pop(), etc. A Deque is nothing more than a general purpose type implementing XAddingSequence and XPrependingSequence. So there's a medium number of properly designed and combined fundamental types, which conceptionally replace and outperform the JDK's small number of simplistic types plus medium mumber of organically grown special-purpose types. Apart from that, the JDK excrescences are mostly not even proper types (interface in Java), but just extensions of implementations. Like Stack extending Vector (not even ArrayList but the deprecated Vector!). Oh dear, really: properly designing in an interface-based language (which Java is and always has been from the start) looks different from what can be found in the JDK collections. And that for the most fundamental modules of both data structures and algorithms (which collections are. Fundamentel tools, really. Not just some ".util"). Very sad (honestly: I'd rather spent my time driving forward my next generation SQL- and ORM-technologies and not change down to fix the basic tools).




There'd still be much to write about.
Like XSortation, XSortedList, XSortedEnum types, XMap and XTable (with table being a ordered map, intuitively) and their attached satellite instances for Keys, Values and Entries. Or whole new groups of concearns like implementation storage types (mostly arrays and chains - btw. chain is a more fitting term for what everyone talks about as addon adjective "Linked~"), null handling, proper HashEquality, Functional Programming (without all the continued back-and-forth-copying and overcomplicating eager and lazy design the lambda-dev is talking about for extending the old broken collections for functional programming), integration of volatile elements (weak referencing), etc.
Oh and of course how to get it all backward compatible to JDK collections (yes it is) using satellite instances and provide reverse-upward compatibility to wrap-up JDK collections as extended collections.

While I'm currently in the process of finishing architecture and commonly used implementations in the framework, I'm concurrently working on structure and chapters for a paper in which I will describe all the ideas and new concepts that went into the next generation collections framework. Any collection topic that do not show up here in the blog will surely be contained in the paper once it's done. Sometimes it feels like I'm working on a PhD as a hobby ^^. But nevertheless.

Keine Kommentare:

Kommentar veröffentlichen