A quick and hopefully short post about collections again, conveniently built around a nice little diagram I drew a while ago. It shows in a very colorful way the deficiencies in the "Collection Content Behaviour Constraints" (the different constraints between List, Set, etc.) of the JDK collections and how they should (must) be modelled to do it properly.
Here it is: CollectionContentBehaviourConstraints.pdf.
This is only one of many aspects that are similarly problematic in the JDK collections. Others are broken equality mechanisms, a broken fail-fast concept, missing fast but safe storage access (internal iteration), built-in weak referencing, collection-increasing operations that have grown uncontrolled but are still insufficient, immutable collections, etc.
I'll try to draw comparable diagrams to demonstrate them for the paper about my collections framework.
The fascinating thing about this diagram is: It's complete.
I tried very hard, multiple times, to come up with additional meaningful constraints (apart from thread safety, parallel execution, observability, etc., which belong to a completely different concern group outside of content behaviour constraints). There aren't any more!
Okay, there are Maps, OrderedMaps and SortedMaps, but those are all just optimized implementations of KeyValue-Sets with the key being the constraint-dominant element.
What else could there be? A constraint that says a collection MUST always contain duplicates? Or Triples? Nonsense. That would be just a List<Pair<E,E>> or something.
A collection that allows every element to be added only once and never again? Nonsense. Any even weirder constraint? Nah...
So "order > sortation" and "duplicates <-> uniques", and that is it as far as content behaviour constraints are concerned.
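To make the two axes a bit more tangible, here is a minimal sketch that uses only existing JDK classes and feeds the same input into four of them. The class name ContentConstraintsDemo and the helper fill() are made up for this illustration. It only shows which combinations of constraints some JDK implementations happen to enforce; it is not the diagram itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

// Illustration only: fills JDK collections with the same input to show
// which combination of the two constraint axes each of them enforces.
class ContentConstraintsDemo {

    static final List<String> INPUT = Arrays.asList("c", "a", "b", "a");

    // Adds INPUT to the given (empty) collection and returns its content in iteration order.
    static List<String> fill(Collection<String> target) {
        target.addAll(INPUT);
        return new ArrayList<>(target);
    }

    public static void main(String[] args) {
        System.out.println("order + duplicates:   " + fill(new ArrayList<>()));     // [c, a, b, a]
        System.out.println("order + uniques:      " + fill(new LinkedHashSet<>())); // [c, a, b]
        System.out.println("sortation + uniques:  " + fill(new TreeSet<>()));       // [a, b, c]
        // HashSet guarantees uniqueness only; its iteration order is unspecified.
        System.out.println("uniques only (count): " + new HashSet<>(INPUT).size()); // 3
    }
}
```

Note that "order + uniques" shows up here only as an implementation class (LinkedHashSet), and that a plain "duplicates without order" collection does not appear at all.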
Quite calming if you ask me, that there's one (tiny) field in informatics that is really "complete" and won't have to get extended and extended over and over in the future.
All the sadder that it's not even nearly complete in the Java API's collections.
Thursday, 28 July 2011
Saturday, 9 July 2011
Illusions are powerful abstractions
A little quote from the video in the previous post that I moved to a separate post because it makes such a good title as well:
"Illusions are powerful abstractions"
In the video, this sentence is meant a little differently than one might expect, because Cliff uses the term "illusion" as a synonym for "proper architectural separation" and wants to express a good thing with it (at the beginning, at least, hehe. Later on, I'm not so sure. Maybe that's why he picked the term, to evolve it during the talk).
But it's also a very fitting description of the common naivety and half-knowledge of many people. Those things that happen everywhere all the time: someone (mostly from management) with only half or even less knowledge of some matter (say, software development, surprisingly) thinks around a little from his (shall I really add "or her"?) tiny point of view, has an idea based exclusively on the simplest of all cases (which actually never happen in non-trivial projects beyond simple marketing examples), comes up with some sort of "solution" for them and then shouts out "Eureka! This is ingeniously simple and will cover ALL possible cases! Go implement it, you take care of the remaining 'corner cases', I'll build the next great idea meanwhile".
(Design) illusions are indeed powerful abstractions. They cover all the real world cases with a minimal effort. Bravo!
Now we just need a universe, a world, a hardware, a language and a JVM that can implement such design illusions to actually work and we're fine.
Questionable use of Soft References
While this post isn't really about my collections framework, there's a good story in it suitable as an introduction to what I actually wanna write about:
The framework has built-in support for volatile elements (weakly referenced elements, etc.). From a developer's perspective, there's simply a factory method for, let's say, a new hash set collection instance, to which you can pass a ReferenceType.WEAK (or SOFT or STRONG, with STRONG being the default of an overloaded alternative version in case you don't bother selecting a reference type at all).
Internally, an appropriate HashLogic instance is created and plugged into the new hash set instance to handle elements (or references to them, to be more precise) that can just vanish at any time.
Some months ago, I removed all the HashLogic variations for soft references. Not that they can't be handled any more: if at some point someone (maybe me) implements them again (for the architecture that has matured since then), they'll just work fine without the framework having to be adjusted.
The reason was: while weak references are pretty useful, I never saw any soft reference in practice, only in theory and in academic examples for the different reference types. And personally I never could get rid of the feeling: if your program touches the boundaries of the heap, something else already went wrong before. Soft references might be useful here, but only for a few corner cases which I don't care to support (at the moment).
Now to the actual part:
I just watched the Google Tech Talk "A JVM Does That?" from Cliff Click (is that really his name? ^^). Very interesting. He has a high throughput of information, but not too fast, just about right.
At 30:50, he gives a very good example for why soft references and phantom references (the latter tbh I never really understood... :-/ ) are of questionable use.
Just two things as a comment:
1.) Good example
2.) I knew it! (hehe)
Later in the talk he says something similar about weak references, but at this point he just slips in: "maybe weak refs are useful".
A little objection regarding weak references: Yes, they are useful. Probably not the way they are used most of the time (like "make everything weak and you can never get memory leaks" and then all of a sudden: unplanned null, exception, crash.), but I have one very good example where they are useful:
If you have some sort of "meta registry" that has to watch over some/all instances in your running application and has to manage them in some way, it has to keep its fingers out of the "actual work" your application does. Meaning it may only observe living instances, but never be a cause for them to live in the first place. If such a meta registry hard-referenced the instances it observes, no instance could ever get garbage collected. Or in other words: it would be the biggest, complete, "perfect" memory leak. For this reason, a meta registry MUST use weak references.
Btw: What's a real use for such a "meta registry"? The reference registry of an OR-Mapper, for example (and btw: yes, it's a REGISTRY, not a CACHE as all the existing ORMs call it. Its main purpose is to keep track of already handled instances to guarantee referential integrity). The ORM has to know which of the living instances has which id etc. associated, even if the instance doesn't contain that information on its own (because ORM should be orthogonal and not require adding thousands of annotations and special fields like existing ORMs simple-mindedly do). For it to know the instances, it has to reference them. But it may not strongly reference them, as that would cause the "perfect" memory leak (or burden the developer using the ORM with doing its management work manually by unregistering instances etc. explicitly, which is a really horrible thought).
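As a rough illustration of such a registry, here is a minimal sketch built on java.lang.ref.WeakReference and ReferenceQueue. The class WeakRegistry and its methods register(), lookup() and size() are hypothetical names chosen for this example, not the API of any existing ORM or of my framework. It only covers the id-to-instance direction; the reverse direction (instance to id) would additionally need an identity-based weak map.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an id-to-instance registry that never keeps its instances alive.
class WeakRegistry<T> {

    // Weak reference that remembers its id so that a stale entry can be removed later.
    private static final class Entry<T> extends WeakReference<T> {
        final long id;

        Entry(long id, T instance, ReferenceQueue<T> queue) {
            super(instance, queue);
            this.id = id;
        }
    }

    private final Map<Long, Entry<T>> entries = new HashMap<>();
    private final ReferenceQueue<T> queue = new ReferenceQueue<>();

    synchronized void register(long id, T instance) {
        purge();
        entries.put(id, new Entry<>(id, instance, queue));
    }

    // Returns the registered instance, or null if the id is unknown
    // or the instance has already been garbage collected.
    synchronized T lookup(long id) {
        purge();
        Entry<T> entry = entries.get(id);
        return entry == null ? null : entry.get();
    }

    synchronized int size() {
        purge();
        return entries.size();
    }

    // Drops all entries whose instances the GC has already cleared.
    private void purge() {
        Entry<?> stale;
        while ((stale = (Entry<?>) queue.poll()) != null) {
            // Remove only if the map still holds this very entry
            // (the id may have been registered again in the meantime).
            entries.remove(stale.id, stale);
        }
    }
}
```

The registry only observes: as soon as the application drops its last strong reference to an instance, the GC may clear the entry, and the next call silently purges it. When exactly that happens is up to the GC, so nothing should rely on the timing.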
I guess those kinds of situations are what brought him to say "maybe weak refs are useful", and his later remark about them not being proper architecture was aimed more at the unwise uses of them.
Still, I would agree that such application logic should be done on the application level (the ORM framework in this case) and not via some "magic" on the GC level. But for that to work on the application level, the VM would have to provide a simple and fast way of telling how many times a certain instance is referenced. If that count were only 1, a strongly referencing registry could conclude "oh, no one except me is referencing it anymore, so I can let go of it for it to get collected".
Problem with that would be:
a.) AFAIK, JVMs and GCs don't work like that. There is no hard cached per-instance "reference count" the application could access.
b.) Such logic is actually part of garbage collection. If the registry did that, JVM engineers like Cliff would come along again and say "That's GC work, leave it to the GC. Take weak references instead and let the GC handle it!" I hope he would say that, because if he simply denied both options, he'd be ignoring the real-world and proper design need for an unintrusive automated reference registry in certain situations like ORM. OR, hehe, the JVM would provide a built-in service for ORM. But then there would still be other applications for keeping track of references inside the application.
One way or another: weak references do have their right to exist (even for common cases).
Soft references most probably don't.
UPDATE:
Yesterday, a colleague pointed out to me that SoftReferences do have their use for caches. The stuff in the cache is present as long as there's enough memory. If heap runs out, the caches get cleared as needed and filled again only on demand.
Makes sense, but I'm still a little skeptical whether working with a constantly full heap by design is really a good thing to do. And what if, once the heap is full, the caches keep swamping each other all the time, creating a livelock or at least extremely bad performance?
Still, soft referencing sounds only like some sort of "last resort" means to let the application live "a little longer" when it's already too late (memory-wise), where the actual problem is a design that doesn't prevent memory from running out.
Classic example of "I don't care about memory consumption, I just use soft references".
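For reference, the kind of cache my colleague described could look roughly like the following sketch. SoftCache and its loader function are hypothetical names for this example; the only JDK specifics used are java.lang.ref.SoftReference and ConcurrentHashMap. It deliberately leaves out cleanup of cleared map entries and does not prevent two threads from loading the same key twice.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical memory-sensitive cache: values are softly referenced and reloaded on demand.
class SoftCache<K, V> {

    private final Map<K, SoftReference<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> reference = entries.get(key);
        V value = reference == null ? null : reference.get();
        if (value == null) {
            // Either never cached or cleared by the GC under memory pressure: load again.
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

The sketch also shows where my skepticism comes from: once the heap is tight, every get() may turn into a reload, and the reloaded value immediately competes for the same scarce memory again.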
Another mentionable part in the video:
18:57 - People don't know how to write concurrent programs.
There's a very fitting fictional quote he gives: "I don't know why it's busted, I've got some data race, so I throw another lock in, throw another lock in, thro - oh, the bugs went away! Okay, I'm done."
That is very much to the point. If you just look at the stupid Vector implementation in the JDK collections (oops, here's the collections topic and the evil JDK bashing again), which btw. everyone on the Java development side nowadays calls virtually deprecated, you see a very good example of what he means. Let alone everything that normal application developers tinker together all day when stumbling across concurrency.
Synchronisation (or locking, to be more precise) is good and important. But it has to be used in the right way. For example it's very easy to implement a ThreadLocal that is NOT synchronized for reading accesses and doesn't even have to bloat the Thread class but is still thread safe (simply because you can exploit the special behaviour of thread instances if used as a hash key so that every thread will always only reach its own hash entry and never that of another thread and whooops: unsynchronized thread safe read-only access. You can find my complete and well documented implementation on sourceforge if you google a little).
Btw: just because acquiring locks gets faster and faster as the JVM guys are optimizing it further and further, it does NOT mean that one no longer has to care about them (I already hear people saying "but the video says they only take a few nanoseconds nowadays, so why should I care?").
The lock management itself may someday cost zero time, but the code that is locked and executed while all other threads wait to enter it can still take any amount of time. So synchronisation problems are nothing that the JVM will someday optimize away, but are of an architectural nature (luckily, because otherwise: if not for such complex problems, there would be no need for human developers in the future in the first place).
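A tiny sketch of what "architectural" means here: the same result can be produced with the lock held during the whole computation or only during the actual shared update. ResultLog, addCoarse(), addNarrow() and expensive() are made-up names for illustration, and expensive() is just a stand-in for arbitrarily slow work that touches no shared state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; all names are made up for illustration.
class ResultLog {

    private final List<String> results = new ArrayList<>();

    // Coarse locking: the lock is held during the whole computation,
    // so every other caller waits however long expensive() takes.
    synchronized void addCoarse(int input) {
        results.add(expensive(input));
    }

    // Narrow locking: the computation runs outside the lock,
    // only the update of the shared list is guarded.
    void addNarrow(int input) {
        String result = expensive(input);
        synchronized (this) {
            results.add(result);
        }
    }

    // Returns a copy so callers never iterate the shared list unguarded.
    synchronized List<String> snapshot() {
        return new ArrayList<>(results);
    }

    // Stand-in for slow work that does not touch shared state.
    static String expensive(int input) {
        return "result-" + (input * input);
    }
}
```

No matter how cheap acquiring the lock becomes, addCoarse() still serializes all callers for the full duration of expensive(), while addNarrow() serializes them only for the list update. Which of the two is correct for a given case is a design decision the JVM cannot take over.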
Thursday, 7 July 2011
Separation of Concerns meets Collections
This is the first post actually dealing with the next generation collections framework I'm working on, which I have only mentioned in side notes so far.
One of the many tremendous improvements over the existing JDK collections, aside from better performance, more built-in functionality and an SQL-like querying framework, is - maybe it's the most important one - proper separation of concerns.
Of course the JDK collections deal with separation of concerns as well, but only on a very rough, not to say dilettantish, level.
There is hierarchical typing to some extent, dealing with general collection operations (in java.util.Collection) and then extending it for specific concerns like allowing duplicates and order (in java.util.List) or disallowing duplicates (in java.util.Set), although the latter is pretty bugged due to misconceptions in the equality concept (which I already blogged about here: Why equals() is broken and how to do it right). Even this group of concerns (which I call content behaviour operations - maybe a little clumsy, but it's actually quite hard to come up with a nicer yet fitting name) isn't modelled properly. E.g. there are types that are sorted (which is a more specific form of being ordered) but that aren't ordered (which logically can't be, it's just a misconception in the JDK), there is no type only defining order without allowing duplicates, no type allowing duplicates without order and no set that is ordered (ordered, not sorted!), which every developer sooner or later badly misses.
Other groups of concerns have been completely ignored in the JDK collections. For example, what about separating concerns like the getting, adding, removing, etc. operations of a collection? This is absolutely crucial, for example to define in an API that a passed collection has to provide at least certain operations (say, getting and adding) or that a method will only return an immutable collection. With JDK collection types, you simply CANNOT express that in typing. When designing an efficient API, this is for collections what Object or missing Generics were in past times: a massive lack of badly needed typing. Yes, I know there's Collections.unmodifiable~(). But there is no TYPE reflecting those (subset) operations. Those methods only return intentionally broken instances of the general purpose types Collection, List and Set, and all the nice typing system is null and void when passing those crippled collections around as full scale general purpose types.
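The typing problem with Collections.unmodifiable~() is easy to reproduce with plain JDK code. In the following sketch, the class UnmodifiableTypingDemo and its two methods are made up for the demonstration; Collections.unmodifiableList and UnsupportedOperationException are the real JDK parts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: the declared type promises add(), the instance refuses it at runtime.
class UnmodifiableTypingDemo {

    // The return type is the full general purpose List; nothing in the
    // signature tells the caller that modifications will be rejected.
    static List<String> names() {
        List<String> internal = new ArrayList<>();
        internal.add("a");
        return Collections.unmodifiableList(internal);
    }

    // Returns true if adding to the passed list is rejected at runtime.
    static boolean addIsRejected(List<String> list) {
        try {
            list.add("b"); // compiles without any complaint
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(addIsRejected(new ArrayList<>())); // false
        System.out.println(addIsRejected(names()));           // true
    }
}
```

Both calls type-check identically; the difference only shows up as an exception when the code runs.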
For a similar reason, you can't just publicly return a reference to your internal collection for others to add elements to it. Because outside code might as well delete or read content it actually should never be allowed to by design. That's the reason for many defensive copies, inconsistent data structures, decoration of collection operation in business logic classes, etc. All complicating development and costing runtime performance (because everything is copied back and forth all the time).
So this post is about how properly separated element operation concerns SHOULD look for collections (and how they DO look in the framework I'm working on).
First of all, a new (or maybe not so new, but definitely new regarding the JDK's current collections API) paradigm of interface design:
Specialized interfaces don't always have to be inheritance specialization, they can also be inheritance generalization, because they cleanly handle only one type of concern and then get recombined to form a general purpose type.
For example:
In the JDK's collections, there's only the type Collection, declaring (on its level of content behaviour abstraction) all element operations in one monolithic cauldron: getting, adding, removing.
In my extended collections framework, there are (among others) the interfaces:
XGettingCollection
XAddingCollection
XRemovingCollection
And then:
XCollection extends XGettingCollection, XAddingCollection, XRemovingCollection
(As a side note: The X stands for extended, of course. Not only as a short distinction from the java.util. types, but also to quickly narrow down the IntelliSense content to collection types. Very similar to Swing's J~, with the difference that here it's about proper types and not narrow-minded classes).
I'll spare you the detailed explanation; it should be quite obvious that the getting (or querying) operations go into XGettingCollection, and so on.
The important thing is: This does NOT complicate things.
You can still just write "XCollection<String> strings = ... " and have the familiar general purpose type, all fine.
But now you CAN go into the fine grained details of single concerns if you want/have to.
You can write a simple "View" wrapper implementation wrapping a general purpose XCollection instance but implementing only XGettingCollection and thus reducing the possible operations to read-only access. (Of course I already wrote it, but you get the idea).
Now you can easily write something like:
Or the other way around if some client code shall be able to input additional strings but is not allowed to query what's already contained:
Or may add and query strings, but never remove them:
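A rough sketch of how the View wrapper and the three cases above could look in code. Only the interface names XGettingCollection, XAddingCollection, XRemovingCollection, XCollection and the idea of a View come from this post; all method names, SimpleCollection and NameService are assumptions made up for this example and are not the framework's actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Concern interfaces named as in the post; their members are assumptions.
interface XGettingCollection<E> extends Iterable<E> {
    int size();
    boolean contains(E element);
}
interface XAddingCollection<E> {
    boolean add(E element);
}
interface XRemovingCollection<E> {
    boolean remove(E element);
}
interface XCollection<E>
    extends XGettingCollection<E>, XAddingCollection<E>, XRemovingCollection<E> {
}

// Hypothetical general purpose implementation backed by an ArrayList.
final class SimpleCollection<E> implements XCollection<E> {
    private final List<E> data = new ArrayList<>();
    @Override public int size() { return data.size(); }
    @Override public boolean contains(E element) { return data.contains(element); }
    @Override public boolean add(E element) { return data.add(element); }
    @Override public boolean remove(E element) { return data.remove(element); }
    @Override public Iterator<E> iterator() {
        // Read-only iterator, so iteration cannot be abused to remove elements.
        return Collections.unmodifiableList(data).iterator();
    }
}

// Read-only relay: implements only the getting concern.
final class View<E> implements XGettingCollection<E> {
    private final XGettingCollection<E> subject;
    View(XGettingCollection<E> subject) { this.subject = subject; }
    @Override public int size() { return subject.size(); }
    @Override public boolean contains(E element) { return subject.contains(element); }
    @Override public Iterator<E> iterator() { return subject.iterator(); }
}

// Hypothetical business class exposing its internal collection per concern.
class NameService {
    private final XCollection<String> names = new SimpleCollection<>();

    // Callers may query the names, but can neither add nor remove.
    XGettingCollection<String> names() { return new View<>(names); }

    // Callers may add names, but cannot query what is already contained.
    XAddingCollection<String> nameSink() { return names::add; }

    // Accepts any type that can be queried and added to; removing is not required.
    static <C extends XGettingCollection<String> & XAddingCollection<String>>
    void addIfAbsent(C target, String name) {
        if (!target.contains(name)) {
            target.add(name);
        }
    }
}
```

The returned instances really are reduced: the View cannot be cast back to anything that adds or removes, and the sink exposes nothing but add(). No defensive copy is needed in either case.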
And, very important, properly typed immutable collections:
XImmutableCollection extends XGettingCollection
(btw it's no typo: "immure" as a figurative shortcut of "toImmutable()". Note that "immute()" would imply making THIS collection immutable instead of instantiating a separate immutable copy)
And all of a sudden multithreaded development and API return values become far less painful and more elegant, concise, etc.
Note that there's a difference between a View and a ConstCollection (an implementation of XImmutableCollection): A View acts as a relay to reduce the operations of an otherwise still mutable collection to read-only concerns, while a ConstCollection is a hard fact, never to be changed again (unless some reflection operations mess everything up, but that's another story, applying to all immutable objects, of course).
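The difference is easiest to see when the underlying collection changes afterwards. The following is a simplified sketch: the names XGettingCollection, XImmutableCollection, View, ConstCollection and immure() are taken from this post, but the members, the constructors and the use of a java.util.List as the wrapped subject are assumptions made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins; members and constructors are assumptions.
interface XGettingCollection<E> {
    int size();
    boolean contains(E element);
    XImmutableCollection<E> immure(); // creates a separate immutable copy
}
interface XImmutableCollection<E> extends XGettingCollection<E> {
}

// A hard fact: copies the content once and never changes afterwards.
final class ConstCollection<E> implements XImmutableCollection<E> {
    private final List<E> data;
    ConstCollection(List<E> source) { this.data = new ArrayList<>(source); }
    @Override public int size() { return data.size(); }
    @Override public boolean contains(E element) { return data.contains(element); }
    @Override public XImmutableCollection<E> immure() { return this; } // already immutable
}

// A relay: read-only access to a subject that may still change elsewhere.
final class View<E> implements XGettingCollection<E> {
    private final List<E> subject;
    View(List<E> subject) { this.subject = subject; }
    @Override public int size() { return subject.size(); }
    @Override public boolean contains(E element) { return subject.contains(element); }
    @Override public XImmutableCollection<E> immure() { return new ConstCollection<>(subject); }
}

class ViewVersusConstDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        strings.add("a");

        XGettingCollection<String> view = new View<>(strings);
        XImmutableCollection<String> constant = view.immure();

        strings.add("b"); // the owner keeps modifying its collection

        System.out.println(view.size());     // 2: the view reflects the change
        System.out.println(constant.size()); // 1: the constant copy does not
    }
}
```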
Those are the basics of element operation concern separation (and I really can't help becoming cynical when working with the half baked JDK collections that don't even provide them).
On to the next step:
How about integrating those element operation concerns of getting, adding, removing, etc. with the content behaviour concerns (List, Set, etc.)?
Quite straightforward:
XGettingList extends XGettingCollection
XAddingList extends XAddingCollection
XRemovingList extends XRemovingCollection
XList extends XGettingList, XAddingList, XRemovingList, XCollection
(note that I leave out the Generics "<E>" element typing, but of course it's there in real code)
This does the trick quite nicely: there are still separated element operation concerns on the List level of content behaviour, but at the same time XList is still an XCollection and even XGettingList is an XGettingCollection.
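Written out with the generics included, the hierarchy above might look like the following sketch. The interface names and extends relations are the ones listed above; the methods in them, SimpleList and TwoDimensionalTypingDemo are assumptions for illustration only (the XSequence step in between is left out here as well).

```java
import java.util.ArrayList;
import java.util.List;

// Element operation concerns on the collection level (members are assumptions).
interface XGettingCollection<E>  { int size(); }
interface XAddingCollection<E>   { boolean add(E element); }
interface XRemovingCollection<E> { boolean remove(E element); }
interface XCollection<E>
    extends XGettingCollection<E>, XAddingCollection<E>, XRemovingCollection<E> { }

// The same concerns on the list level, each extending its collection level counterpart.
interface XGettingList<E>  extends XGettingCollection<E>  { E get(int index); }
interface XAddingList<E>   extends XAddingCollection<E>   { }
interface XRemovingList<E> extends XRemovingCollection<E> { E removeAt(int index); }
interface XList<E>
    extends XGettingList<E>, XAddingList<E>, XRemovingList<E>, XCollection<E> { }

// Hypothetical implementation backed by an ArrayList.
final class SimpleList<E> implements XList<E> {
    private final List<E> data = new ArrayList<>();
    @Override public int size() { return data.size(); }
    @Override public boolean add(E element) { return data.add(element); }
    @Override public boolean remove(E element) { return data.remove(element); }
    @Override public E get(int index) { return data.get(index); }
    @Override public E removeAt(int index) { return data.remove(index); }
}

class TwoDimensionalTypingDemo {

    // Accepts anything readable, no matter which content behaviour type it is.
    static int count(XGettingCollection<?> elements) {
        return elements.size();
    }

    // Requires positional read access, and by its type cannot add or remove.
    static <E> E firstOf(XGettingList<E> list) {
        return list.get(0);
    }

    public static void main(String[] args) {
        XList<String> strings = new SimpleList<>();
        strings.add("a");
        strings.add("b");
        System.out.println(count(strings));   // an XList is an XGettingCollection
        System.out.println(firstOf(strings)); // and an XGettingList
    }
}
```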
Same for XSet, of course.
And for XBag (the content behaviour type defining only to allow duplicates but not order), XSequence (defining only order. Btw XSequence is actually a middle step between XList and XCollection that I left out above for greater familiarity for people only knowing the mixed up JDK collection types so far) and XEnum (which combines XSet and XSequence to have ordered uniques - salvation at last).
Admittedly, this two-dimensional inheritance causes a little more complexity than just a single super type. There are sometimes 4-5 superinterfaces in one interface. But that's not a bad thing. It's still as simple as writing XSet<Double> or XCollection<File> or XGettingList<File> or whatever you need without caring much about the net of concern combinations. Yet it's complex enough IF you need it for fine grained purposes.
Next step: More than basic
I called Getting, Adding and Removing "basic" concerns for a reason: they apply to every content behaviour type of collection. Others only apply to more specific collection types.
For example, everything that has to do with an index is only available from sequence on (yes, sequence already, not list!), because to be able to access elements via an index, the collection must guarantee to "play along", i.e. to maintain order.
So there are advanced element operation concerns:
XPrependingSequence
XInsertingSequence
XShiftingSequence
And 2-3 more (but then it's complete, really)
Not that there is a separate implementation for every single concern or every combination of concerns, but it's reasonable to expect that it might become necessary to create one for a certain combination at some point. Or to create a reducing relay wrapper (like one only allowing shifting of contained elements among each other without removing or adding anything, e.g. for sorting externally).
There's also a nice payoff included if you take the effort to properly separate those concerns: those wildly grown special border types like Stack, Queue and Deque become superfluous. A Stack is nothing more than the concern types already containing some convenience methods like first(), last(), pop(), etc. A Deque is nothing more than a general purpose type implementing XAddingSequence and XPrependingSequence. So there's a medium number of properly designed and combined fundamental types, which conceptually replace and outperform the JDK's small number of simplistic types plus medium number of organically grown special-purpose types. Apart from that, the JDK excrescences are mostly not even proper types (interfaces in Java), but just extensions of implementations. Like Stack extending Vector (not even ArrayList but the virtually deprecated Vector!). Oh dear, really: proper design in an interface-based language (which Java is and always has been from the start) looks different from what can be found in the JDK collections. And that for the most fundamental modules of both data structures and algorithms (which collections are. Fundamental tools, really. Not just some ".util"). Very sad (honestly: I'd rather spend my time driving forward my next generation SQL and ORM technologies than shift down to fix the basic tools).
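To illustrate the Deque remark, here is a small sketch of a type that becomes double-ended simply by implementing both concerns. XAddingSequence and XPrependingSequence are named in this post; their methods, the XGettingSequence interface and the class DoubleEndedSequence are assumptions invented for this example. Ironically, a JDK ArrayDeque serves as backing store, purely to keep the sketch short.

```java
import java.util.ArrayDeque;

// XAddingSequence and XPrependingSequence are named in the post; their members,
// XGettingSequence and DoubleEndedSequence are assumptions for this sketch.
interface XGettingSequence<E> {
    E first(); // null if the sequence is empty
    E last();  // null if the sequence is empty
    int size();
}
interface XAddingSequence<E> {
    void add(E element); // appends at the end
}
interface XPrependingSequence<E> {
    void prepend(E element); // inserts at the start
}

// "Deque" behaviour as a mere combination of concerns, no special border type needed.
final class DoubleEndedSequence<E>
    implements XGettingSequence<E>, XAddingSequence<E>, XPrependingSequence<E> {

    // ArrayDeque is used only as a convenient backing store for the sketch.
    private final ArrayDeque<E> data = new ArrayDeque<>();

    @Override public E first() { return data.peekFirst(); }
    @Override public E last() { return data.peekLast(); }
    @Override public int size() { return data.size(); }
    @Override public void add(E element) { data.addLast(element); }
    @Override public void prepend(E element) { data.addFirst(element); }
}

class SequenceConcernsDemo {

    // This producer only gets the prepending concern and can do nothing else.
    static void pushUrgent(XPrependingSequence<String> queue, String job) {
        queue.prepend(job);
    }

    public static void main(String[] args) {
        DoubleEndedSequence<String> jobs = new DoubleEndedSequence<>();
        jobs.add("b");
        jobs.add("c");
        pushUrgent(jobs, "a");
        System.out.println(jobs.first() + " .. " + jobs.last()); // a .. c
    }
}
```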
There'd still be much to write about.
Like XSortation, XSortedList, XSortedEnum types, XMap and XTable (with table being an ordered map, intuitively) and their attached satellite instances for Keys, Values and Entries. Or whole new groups of concerns like implementation storage types (mostly arrays and chains - btw. chain is a more fitting term for what everyone refers to with the add-on adjective "Linked~"), null handling, proper HashEquality, Functional Programming (without all the continued back-and-forth-copying and overcomplicating eager and lazy design the lambda-dev is talking about for extending the old broken collections for functional programming), integration of volatile elements (weak referencing), etc.
Oh and of course how to get it all backward compatible to JDK collections (yes it is) using satellite instances and provide reverse-upward compatibility to wrap-up JDK collections as extended collections.
While I'm currently in the process of finishing architecture and commonly used implementations in the framework, I'm concurrently working on structure and chapters for a paper in which I will describe all the ideas and new concepts that went into the next generation collections framework. Any collection topic that do not show up here in the blog will surely be contained in the paper once it's done. Sometimes it feels like I'm working on a PhD as a hobby ^^. But nevertheless.
One of the many tremendous improvements over the existing JDK collections, aside from better performance, more built-in functionality and an SQL-like querying framework, is proper separation of concerns. It may well be the most important one.
Of course the JDK collections deal with separation of concerns as well, but only on a very rough, not to say dilettantish, level.
There is a hierarchical typing that to some extent deals with general collection operations (in java.util.Collection) and then extends it for specific concerns like allowing duplicates and order (in java.util.List) or disallowing duplicates (in java.util.Set), although the latter is pretty buggy due to misconceptions in the equality concept (which I already blogged about here: Why equals() is broken and how to do it right). Even this group of concerns (which I call content behaviour operations - may be a little clumsy, but it's actually quite hard to come up with a nicer yet fitting name) isn't modelled properly. For example, there are types that are sorted (which is a more specific form of being ordered) but that aren't ordered (which is logically impossible, it's just a misconception in the JDK). There is no type that defines only order without also allowing duplicates, no type that allows duplicates without also defining order, and no set that is ordered (ordered, not sorted!), which every developer sooner or later badly misses.
Other groups of concerns have even been completely ignored in the JDK collections. For example, what about separating concerns like the getting, adding, removing, etc. operations of a collection? This is absolutely crucial, for example to define in an API that a passed collection has to provide at least certain operations (say, getting and adding) or that a method will only ever return an immutable collection. With the JDK collection types, you simply CANNOT express that in the type system. When designing an efficient API, this is for collections what Object or missing generics was in times past: a massive lack of badly needed typing. Yes, I know there's Collections.unmodifiable~(). But there is no TYPE reflecting those (subset) operations. Those methods only return intentionally broken instances of the general purpose types Collection, List and Set, and the whole nice type system is null and void when those crippled collections are passed around as full-scale general purpose types.
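To make the Collections.unmodifiable~() complaint concrete, here is a small sketch using only the plain JDK API. The class and method names (UnmodifiableDemo, addFailsOnlyAtRuntime) are made up for this illustration. It shows that the "read-only" wrapper still has the full List type, so the forbidden add() compiles and only fails when it is executed:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of any framework.
final class UnmodifiableDemo {

    // Returns true if add() was accepted by the compiler but rejected at runtime.
    static boolean addFailsOnlyAtRuntime() {
        List<String> internal = new ArrayList<String>();
        internal.add("a");

        // The static type is still the full general purpose List.
        List<String> readOnly = Collections.unmodifiableList(internal);
        try {
            readOnly.add("b"); // compiles without any complaint
            return false;
        } catch (UnsupportedOperationException e) {
            return true;       // the restriction only shows up here
        }
    }
}
```

Nothing in the signature of a method returning such a list tells the caller that adding is off limits.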
For a similar reason, you can't just publicly return a reference to your internal collection for others to add elements to it, because outside code might as well delete or read content that it should never be allowed to by design. That's the reason for many defensive copies, inconsistent data structures, decoration of collection operations in business logic classes, etc. All of that complicates development and costs runtime performance (because everything is copied back and forth all the time).
So this post is about what properly separated element operation concerns SHOULD look like for collections (and how they DO look in the framework I'm working on).
First of all, a new (or maybe not so new, but definitely new regarding the JDK's current collections API) paradigm of interface design:
Specialized interfaces don't always have to be inheritance specialization, they can also be inheritance generalization, because they cleanly handle only one type of concern and then get recombined to form a general purpose type.
For example:
In the JDK's collections, there's only the type Collection, declaring (on its level of content behaviour abstraction) all element operations in one monolithic cauldron: getting, adding, removing.
In my extended collections framework, there are (among others) the interfaces:
XGettingCollection
XAddingCollection
XRemovingCollection
And then:
XCollection extends XGettingCollection, XAddingCollection, XRemovingCollection
(As a side note: The X stands for extended, of course. Not only as a short distinction from the java.util. types, but also to quickly narrow down the IntelliSense content to collection types. Very similar to Swing's J~, with the difference that here it's about proper types and not narrow-minded classes).
I'll spare you a detailed explanation of everything; it should be quite obvious that the getting (or querying) operations go into XGettingCollection, and so on.
The important thing is: This does NOT complicate things.
You can still just write "XCollection<String> strings = ... " and have the familiar general purpose type, all fine.
But now you CAN go into the fine-grained details of single concerns if you want/have to.
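A minimal sketch of that recombination follows. The interface names come from the list above. Everything else is an assumption made for this illustration: the method names (contains, size, add, remove), the ArrayList-backed class SimpleCollection and the Importer helper. The real framework's signatures may well look different.

```java
import java.util.ArrayList;
import java.util.List;

// One interface per element operation concern (method names are assumptions).
interface XGettingCollection<E> {
    boolean contains(E element);
    int size();
}

interface XAddingCollection<E> {
    boolean add(E element);
}

interface XRemovingCollection<E> {
    boolean remove(E element);
}

// The general purpose type is just the recombination of the single concerns.
interface XCollection<E>
    extends XGettingCollection<E>, XAddingCollection<E>, XRemovingCollection<E> {
}

// Hypothetical implementation, backed by an ArrayList to keep the sketch short.
final class SimpleCollection<E> implements XCollection<E> {
    private final List<E> elements = new ArrayList<E>();

    @Override public boolean contains(E element) { return this.elements.contains(element); }
    @Override public int size()                  { return this.elements.size(); }
    @Override public boolean add(E element)      { return this.elements.add(element); }
    @Override public boolean remove(E element)   { return this.elements.remove(element); }
}

// Hypothetical client code: the parameter type states that it can only add.
final class Importer {
    static void fill(XAddingCollection<String> target) {
        target.add("a");
        target.add("b");
        // target.size() or target.remove("a") would not even compile here.
    }
}
```

A SimpleCollection<String> can be passed to Importer.fill() as it is, without any wrapping or copying, and the callee still only sees the adding concern.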
You can write a simple "View" wrapper implementation wrapping a general purpose XCollection instance but implementing only XGettingCollection and thus reducing the possible operations to read-only access. (Of course I already wrote it, but you get the idea).
Now you can easily write something like:
public View<String> getStrings(){
    // View implements XGettingCollection
    return new View<String>(this.strings);
}
Or the other way around if some client code shall be able to input additional strings but is not allowed to query what's already contained:
public Adder<String> getStringAdder(){
    // Adder implements XAddingCollection
    return new Adder<String>(this.strings);
}
Or may add and query strings, but never remove them:
public Collector<String> getStringCollector(){
    // Collector implements XGettingCollection, XAddingCollection
    return new Collector<String>(this.strings);
}
And, very important, properly typed immutable collections:
XImmutableCollection extends XGettingCollection
public XImmutableCollection<String> getStrings(){ return this.strings.immure(); }
(btw it's no typo: "immure" as a figurative shortcut of "toImmutable()". Note that "immute()" would imply making THIS collection immutable instead of instantiating a separate immutable copy)
And all of a sudden multithreaded development and API return values become way less painful and more elegant, concise, etc.
Note that there's a difference between a View and a ConstCollection (an implementation of XImmutableCollection): a View acts as a relay to reduce the operations of an otherwise still mutable collection to read-only concerns, while a ConstCollection is a hard fact, never to be changed again (unless some reflection operations mess everything up, but that's another story, applying to all immutable objects, of course).
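The difference between the relay and the hard fact can be sketched in a few lines. This is not the framework's actual code. To keep the example short, both wrappers are backed by a plain java.util.List and the getting concern is reduced to a single assumed method, size():

```java
import java.util.ArrayList;
import java.util.List;

// Read-only concern; the single method is an assumption for this sketch.
interface XGettingCollection<E> {
    int size();
}

interface XImmutableCollection<E> extends XGettingCollection<E> {
}

// Relay: forwards every call to the still mutable subject.
final class View<E> implements XGettingCollection<E> {
    private final List<E> subject;

    View(List<E> subject) {
        this.subject = subject;
    }

    @Override
    public int size() {
        return this.subject.size();
    }
}

// Hard fact: copies the elements once and never changes afterwards.
final class ConstCollection<E> implements XImmutableCollection<E> {
    private final List<E> elements;

    ConstCollection(List<E> source) {
        this.elements = new ArrayList<E>(source);
    }

    @Override
    public int size() {
        return this.elements.size();
    }
}
```

If the wrapped list grows after both wrappers have been created, the View reports the new size while the ConstCollection keeps reporting the size it was created with.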
Those are the basics of element operation concern separation (and I really can't help becoming a little cynical when working with the half-baked JDK collections that don't even provide them).
On to the next step:
How about integrating those element operation concerns of getting, adding, removing, etc. with the content behaviour concerns (List, Set, etc.)?
Quite straightforward:
XGettingList extends XGettingCollection
XAddingList extends XAddingCollection
XRemovingList extends XRemovingCollection
XList extends XGettingList, XAddingList, XRemovingList, XCollection
(note that I leave out the Generics "<E>" element typing, but of course it's there in real code)
This does the trick quite nicely: there are still separated element operation concerns on the List level of content behaviour, but at the same time XList is still an XCollection and even XGettingList is an XGettingCollection.
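Written out as real Java with the generics included, the two-dimensional hierarchy could look roughly like this. The interface names are the ones listed above. The bodies are left empty on purpose, and get(int) as well as the Reader helper are assumptions added only so that there is something to call:

```java
// Element operation concerns on the collection level.
interface XGettingCollection<E> {}
interface XAddingCollection<E> {}
interface XRemovingCollection<E> {}
interface XCollection<E>
    extends XGettingCollection<E>, XAddingCollection<E>, XRemovingCollection<E> {}

// The same concerns again on the list level; get(int) is an assumed example method.
interface XGettingList<E> extends XGettingCollection<E> {
    E get(int index);
}
interface XAddingList<E> extends XAddingCollection<E> {}
interface XRemovingList<E> extends XRemovingCollection<E> {}
interface XList<E>
    extends XGettingList<E>, XAddingList<E>, XRemovingList<E>, XCollection<E> {}

// Hypothetical client code that only needs the reading concern of a list.
final class Reader {
    static <E> E first(XGettingList<E> list) {
        return list.get(0);
    }
}
```

Any XList can be handed to Reader.first(), and the same instance can just as well be assigned to an XGettingCollection or an XCollection reference.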
Same for XSet, of course.
And for XBag (the content behaviour type that only defines allowing duplicates, but not order), XSequence (defining only order. Btw, XSequence is actually a middle step between XList and XCollection that I left out above for greater familiarity for people who only know the mixed-up JDK collection types so far) and XEnum (which combines XSet and XSequence to have ordered uniques - salvation at last).
Admittedly, this two-dimensional inheritance causes a little more complexity than just a single super type. There are sometimes 4-5 superinterfaces in one interface. But that's not a bad thing. It's still as simple as writing XSet<Double> or XCollection<File> or XGettingList<File> or whatever you need, without caring much about the net of concern combinations. Yet it's complex enough IF you need it for fine-grained purposes.
Next step: More than basic
I called Getting, Adding and Removing "basic" concerns for a reason: they apply to every content behaviour type of collection. Others only apply to more specific collection types.
For example, everything that has to do with an index is only available from sequence on (yes, sequence already, not list!), because to be able to access elements via an index, the collection must guarantee to "play along", i.e. to maintain order.
So there are advanced element operation concerns:
XPrependingSequence
XInsertingSequence
XShiftingSequence
And 2-3 more (but then it's complete, really)
Not that there is a separate implementation for every single concern or every combination of concerns, but it's reasonable to expect that it might become necessary to create one for a certain combination at some point. Or to create a reducing relay wrapper (like one only allowing shifting of contained elements among each other without removing or adding anything, e.g. for sorting externally).
There's also a nice payoff included if you take the effort to properly separate those concerns: those wildly grown special border types like Stack, Queue and Deque become superfluous. A Stack is nothing more than the concern types, which already contain convenience methods like first(), last(), pop(), etc. A Deque is nothing more than a general purpose type implementing XAddingSequence and XPrependingSequence. So there's a medium number of properly designed and combined fundamental types, which conceptually replace and outperform the JDK's small number of simplistic types plus medium number of organically grown special-purpose types. Apart from that, the JDK excrescences are mostly not even proper types (interfaces in Java), but just extensions of implementations. Like Stack extending Vector (not even ArrayList but the legacy Vector!). Oh dear, really: proper design in an interface-based language (which Java is and always has been from the start) looks different from what can be found in the JDK collections. And that for the most fundamental modules of both data structures and algorithms (which collections are. Fundamental tools, really. Not just some ".util"). Very sad (honestly: I'd rather spend my time driving forward my next generation SQL and ORM technologies than shift down to fix the basic tools).
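The Deque remark can be illustrated with a rough sketch. XAddingSequence and XPrependingSequence are the names used above. The rest is assumed for this example only: the interface XGettingSequence, all method names, and the class ChainSequence, which simply borrows java.util.LinkedList as its storage:

```java
import java.util.LinkedList;

// Appending at the end (method name is an assumption).
interface XAddingSequence<E> {
    void add(E element);
}

// Inserting at the front (method name is an assumption).
interface XPrependingSequence<E> {
    void prepend(E element);
}

// Hypothetical reading concern with the convenience methods mentioned above.
interface XGettingSequence<E> {
    E first();
    E last();
    int size();
}

// A general purpose sequence that happens to cover everything a deque is used for.
final class ChainSequence<E>
    implements XGettingSequence<E>, XAddingSequence<E>, XPrependingSequence<E> {

    private final LinkedList<E> chain = new LinkedList<E>();

    @Override public void add(E element)     { this.chain.addLast(element); }
    @Override public void prepend(E element) { this.chain.addFirst(element); }
    @Override public E first()               { return this.chain.getFirst(); }
    @Override public E last()                { return this.chain.getLast(); }
    @Override public int size()              { return this.chain.size(); }
}
```

No dedicated deque type is needed in such a design: code that only pushes to the front can ask for an XPrependingSequence, code that only appends asks for an XAddingSequence.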
There'd still be much to write about.
Like the XSortation, XSortedList and XSortedEnum types, XMap and XTable (with a table being an ordered map, intuitively) and their attached satellite instances for Keys, Values and Entries. Or whole new groups of concerns like implementation storage types (mostly arrays and chains - btw. "chain" is a more fitting term for what everyone refers to with the add-on adjective "Linked~"), null handling, proper HashEquality, functional programming (without all the continued back-and-forth copying and the overcomplicated eager and lazy design that lambda-dev is discussing for extending the old broken collections for functional programming), integration of volatile elements (weak referencing), etc.
Oh, and of course how to make it all backward compatible with the JDK collections (yes, it is) using satellite instances, and how to provide reverse-upward compatibility by wrapping JDK collections as extended collections.
While I'm currently in the process of finishing the architecture and the commonly used implementations in the framework, I'm concurrently working on the structure and chapters for a paper in which I will describe all the ideas and new concepts that went into the next generation collections framework. Any collection topic that does not show up here in the blog will surely be contained in the paper once it's done. Sometimes it feels like I'm working on a PhD as a hobby ^^. But nevertheless.
Sunday, 3 July 2011
Variables are not fields, functions are not methods
There's a certain mistake in terminology that I notice more often the more insight I get into advanced Java development (and that I of course made myself in times past), and it increasingly annoys me:
A "function" is not the same as a "method". And a "variable" is not the same as a "field". There really is a reason why they have different terms, a very important one. Really.
If you are not aware of the reasons behind it, you might very quickly think "Oh my, what's he up to now? Let go already, it's about the same, so no point in being so pedantic." I for sure did in the past. So it's fair if it's me who writes a text saying "No really, it's important, listen...".
So what's the important difference:
Methods and fields are object oriented constructs with enhanced characteristics, while variables and functions are just plain procedural things with reduced characteristics. And that difference in characteristics can mean the difference between proper understanding (and in the end working code) and half-baked understanding (or broken code). If we want to be very correct and play around with OOP terminology a bit, we could say "Field extends Variable" and "Method extends Function" (or even more precisely: "Function extends Procedure" and, well, "Method extends Procedure or Function depending on context"). But I'll keep that part simple and say that they are "completely" different things.
One might argue here that it actually should be "Field extends Variable" and "LocalVariable extends Variable", with Variable being the abstract super type, and that calling a field a variable is therefore absolutely correct and the word "local" has to be used consistently before "variable" to make the difference. I say: okay, but a) the problem remains that by just saying "variable" it's not clear what is meant and b) it's much easier and clearer if "field" stands for fields and "variable" stands for local variables. Honestly, think about it. This still allows you to add the "local" all the time if you wish, no problem, but it solves the ambiguity when talking/writing.
1.) Field vs Variable
Variables (in Java) are those things you write inside blocks. Like "int value = 5;". A variable holds a value (a primitive value or a reference value) and it stands for a slot on the stack (the quickly accessible, exclusively owned piece of memory of every thread). That's about it.
Fields are much more. They, too, hold values (the same kinds variables do), but that's where the similarity ends. First difference: their values lie on the heap (the big, not so fast, thread-shared area of memory). Performance optimisations like escape analysis etc. can cause the values to lie on the stack as well, but in general, field values are located on the heap. That means: every thread can access the same field. Which of course is good, because it allows for efficient thread communication. But it can also cause all kinds of trouble (see video below).
Fields also belong to a class. Fields are accessible and analyzable (and corruptible) via reflection. Fields are complex object oriented structures, not just mere procedural labels representing a value.
The simple nature of (local) variables is also a notable advantage: they are fast. And they are (thread-)safe. If an algorithm works on a variable, there's peace, calm and efficiency, as no other thread can possibly mess with it while "your" thread is working on it. And it's also as fast as you can get, as everything happens on the stack. When working on fields, on the other hand, you always have to keep in mind "can this field be accessed by other threads simultaneously?" and "are there unnecessarily repeated accesses to that field in a loop?". Fields are powerful, of course, but they also mean much more hassle that has to be taken care of.
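A small sketch of that contrast, with made-up names (Counter, countTo, runTwoThreads). The counting loop works on a local variable that every thread owns exclusively, so it needs no locking. Only the single write to the shared field is guarded. Note that this applies to the variable itself: if a local variable holds a reference, the referenced object can of course still be shared between threads.

```java
// Hypothetical example class.
final class Counter {
    private int total; // field: lives with the instance on the heap, reachable by every thread

    void countTo(int limit) {
        int local = 0; // local variable: exists once per call, owned by the calling thread
        for (int i = 0; i < limit; i++) {
            local++;   // no other thread can interfere here
        }
        synchronized (this) {
            this.total += local; // the shared field is only touched under the lock
        }
    }

    synchronized int total() {
        return this.total;
    }

    // Lets two threads count concurrently and returns the combined result.
    static int runTwoThreads(final int limit) {
        final Counter counter = new Counter();
        Runnable work = new Runnable() {
            @Override
            public void run() {
                counter.countTo(limit);
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.total();
    }
}
```

Without the synchronized block, the two "+=" operations on the field could interleave and lose an update, whereas the local counting can never be disturbed.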
So if someone talks about a field but says "variable" all the time, and you have really internalized the difference, you have a hard time understanding what he's talking about. Funny thing is: if you see both as synonyms and don't know or care about the crucial differences, you aren't confused at all. As if having less insight into a matter lets you understand it better o_0. Of course this is just an illusion, as ignoring the differences will make you pay for it at some point.
Nice example: There's an excellent Google Tech Talk about "The Java Memory Model" (which in the end just deals with concurrency - and I'm proud to say that I already knew the important stuff. Like that "synchronized" has a double meaning of block & synchronize, that volatile is a synchronize-only for fields - NOT variables, by the way! - and that you have to care about "happens before" relationships when dealing with concurrency. But really, if he says "volable" just ONE more time, then I'm... argh).
What caused a little confusion was that the guy always mixed up "field" and "variable". For example by saying things like "another thread might change your variable". Knowing about the difference and seeing him as the expert that he undoubtedly is, I immediately "panicked" and thought: "WHAT? Can threads access variables without synchronization as well? Is everything I built my concepts on wrong?". Luckily for me, my world doesn't have to fall apart. He just used the wrong term. Phew.
More or less the same sloppiness is omitting the "this." before a field. Of course it is optional for the compiler, but it makes it so much easier to read and quickly understand which ones are (complicated) fields and which ones are simple variables in a piece of code. This is so important and so simple to do (even automated by the IDE) that I slowly but increasingly can't help seeing people who omit it as newbies or ignorant at best.
He does that as well (see 38:00 or 43:00), but of course I wouldn't call him a noob. I'd rather say he (sloppily) doesn't care about it, the same way he mixes the terms up when talking. No problem if he gets along with it, but it definitely makes it more difficult for others to read his code and follow what he's saying.
On a side note: we'd need far fewer concurrency lessons like that if little things like the difference between fields and variables were taught better. Most of the problems with concurrency come from not making that distinction in the first place. At many of his examples of common mistakes and wrong assumptions one might fall for, I just thought "how could one possibly expect this field to be safely usable in a multithreaded program?", and that just because I know about the difference between fields and variables.
If you don't care about the difference, don't watch your terminology and don't use "this.", then the problems start (or get several times worse).
I keep two simple rules concerning fields and variables that help me tremendously in day-to-day Java development:
1.) Fields are potentially always accessed by another thread (unless clean architecture guarantees otherwise) and therefore have to be at least volatile or used synchronized if the code is to be used in multithreaded environments.
2.) Working with fields is slower than working with variables, so they should be cached in variables when used in a loop.
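Both rules fit into one tiny, hypothetical class (Samples and its members are invented for this illustration). The field is declared volatile because another thread might replace the array, and the loop reads the field exactly once and then works on a local variable only:

```java
// Hypothetical example class.
final class Samples {
    // Rule 1: the field may be written by another thread, so it is at least volatile.
    private volatile int[] values = {1, 2, 3, 4};

    void replace(int[] newValues) {
        this.values = newValues;
    }

    // Rule 2: cache the field in a local variable before using it in a loop.
    int sum() {
        final int[] cached = this.values; // the only field access in this method
        int sum = 0;
        for (int i = 0; i < cached.length; i++) {
            sum += cached[i]; // works on the stack-local reference only
        }
        return sum;
    }
}
```

Besides saving repeated (volatile) field reads, the local copy also guarantees that the whole loop runs over one and the same array, even if replace() is called concurrently.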
Keeping those two rules in mind for all the code you write increases the software quality by what feels like 200%. At least that's my personal experience.
A side note about performance, because it's very interesting:
There's a somewhat surprising relation between the performance of read and write operations on fields and variables, which is as follows (1 being the fastest, 4 the slowest):
1.) reading from variable
2.) reading from field
3.) writing to variable
4.) writing to field
One might think that both variable operations are on places 1 and 2 and the field operations are behind them. But that's not true (as explained to me by the running VM *cough*).
Reading from a field is faster than writing to a variable (a register). This may have something to do with caching fields on the stack etc., but nevertheless, what we can derive from this is: assigning stuff to a variable/field is actually pretty expensive, at least compared to reading. That's why it doesn't pay off to cache a field value in a local variable and then only use it 2-4 times. Yes, those 4 local reads are faster than the field reads, but the storing beforehand took something like 10 reads' time, ruining the optimisation. So caching field values in local variables pays off, but mostly only in loops (or to ease concurrency situations, of course).
But again a very valuable piece of the puzzle. Knowing this, one might think of field accesses as "tiny micro database accesses" from an algorithm's point of view, with the "application" being the local algorithm and the "database" being the instance lying on the heap. Tiny of course. Micro. You get it. Still a helpful picture to keep in mind for writing fast software.
To sum it up, it's a very good idea (if not mandatory if you want to call yourself a quality Java developer) to take advantage of your IDE's syntax coloring and automation to make the difference as readable as possible. I came up with a pattern that is quite intuitive and helpful:
1.) Activate automated "this." qualification.
2.) Make missing "this." qualification be indicated as a problem/warning (yes, really. Because it IS a problem)
3.) Make local variables and parameters green (green implies peace, calm, not critical, simply "green light", all is good)
4.) Eclipse's default blue color for fields fits in there very well: all fields and variables are colored, either green or blue, where blue means (not critical but at least) "Uuh, special, watch out".
Of course the green color has the consequence that comments have to be recolored from green to... hm, what would be a good color for some syntactically meaningless, add-on, optional, external text? Grey! Of course. Come on, what were they drinking when they decided that a comment should be colored green? Ah, whatever.
Here's a small example for my color scheme:
It immediately shows all the safe, uncritical, stack-based local variables at a glance and the object oriented, heap-based, "micro-database" fields clearly distinct from them.
One might argue that having too much "this." everywhere in your methods clutters up the code too much. I'd say if you consider "this" qualification as clutter that should be avoided, you are ignoring large parts of the concept of object orientation, which is very bad (if not for you personally, then at least for others who have to read your code), and maybe your code should even be overhauled to make fewer repeated field accesses.
There's one rule of thumb in Java development: reading is more important than writing. So if you say you can write code better if you leave out readability helpers like "this.", honestly: no one cares. Make it so that many others can read and maintain your code well (at best easily self-explanatory code with meaningful variable names etc. plus explanatory comments where they are useful), then it's good code. Not if you typed it a little faster, once.
2.) Methods vs. Functions
Most of the stuff said about fields applies to methods as well. Part of a class, etc. Qualify with "this.", although here the intention is more to clearly distinguish them from static methods. Because in Java, believe it or not, there are NO functions at all. Every "function" that was ever mistakenly named as such did in fact belong to a class. There is no ".java" file containing just a bunch of "public int calculateStuff(int a, int b)" or whatever. They are all always in a class.
From an object oriented point of view there are, of course, two types of methods: static methods and instance methods. Okay. Still, both of them are methods belonging to a class, having access to non-public fields of that class, etc. But none of them are functions.
Making the difference still makes sense and is even more important than the difference between fields and variables, because, imagine: (actual) functions are coming to Java!
They're called lambdas for now (google "Project Lambda", scheduled for Java 8) and of course they will internally just be classes themselves, just like anonymous inner class instances (but may be substituted by method references, as Remi wrote on lambda-dev; I haven't looked into the matter closely enough to understand it completely), but conceptually, they're real functions.
Bad times for all the people currently calling methods "functions" as well.
I already see Massive Confusion 2 coming up in Java theatres: "We moved it to a function" - "A function? Are you nuts?!" - "I mean a function in the class. In the instance. You know. The old function, not those new ones." - "Ah, you mean a method. Then just say method please." - "Isn't it the same?" - "OMFG..."
Honestly: No, it's not.
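For illustration, here is the distinction written in the lambda syntax as it eventually shipped with Java 8, using the standard java.util.function.IntBinaryOperator interface. The class Calculator and its members are made up for this sketch:

```java
import java.util.function.IntBinaryOperator;

// Hypothetical example class.
final class Calculator {
    private final int offset;

    Calculator(int offset) {
        this.offset = offset;
    }

    // A method: bound to the class and able to use the instance's fields.
    int addWithOffset(int a, int b) {
        return a + b + this.offset;
    }

    // A function in the lambda sense: a plain value that can be stored and passed around.
    static final IntBinaryOperator ADD = (a, b) -> a + b;
}
```

The method is called on an instance ("new Calculator(10).addWithOffset(2, 3)"), while the function is just a value that is applied ("Calculator.ADD.applyAsInt(2, 3)").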
From a more general point of view, it's also very helpful to make something clear for one's understanding of source code: we still develop procedurally. Object orientation is not something different from procedural development, it's an extension. Again, more like "ObjectOrientedProgramming extends ProceduralProgramming". And this is very simple to see in the code itself:
The "blocks" (mostly method blocks, but also initializer and static initializer blocks. So all the stuff triggering the actions between the curly braces, excluding array initializers and type bodies) are still plain procedural programming. Procedural "islands", if you will. (Only!) the things around them are the object oriented stuff. Or the object oriented foundation (which older, procedural-only languages are missing) in which the procedural algorithms are embedded. The curly braces for class and interface bodies are something completely different from those enclosing blocks. They might as well have used another symbol for them. The constructs floating around in the OO waters are all accessible by reflection (Class, Field, Method, Constructor, Modifiers, etc.), the stuff in the procedural blocks is not. Those block braces are like a sign "you are entering procedural land now, reflection does not apply here". The same goes for modifiers like public, protected, etc. They're just not applicable to procedural code. After learning (or no longer ignoring) the difference between fields and variables, it's also easy to see why a "volatile" modifier makes no sense for a local variable.
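The reflection part is easy to check with the standard java.lang.reflect API. In the following sketch, the classes Island and ReflectionDemo and all member names are invented for the example. The field and the method can be looked up by name, the local variable inside the method block cannot:

```java
import java.lang.reflect.Method;

// Hypothetical class with one field, one method and one local variable.
final class Island {
    private int field = 1;

    int method() {
        int local = this.field + 1; // procedural land: invisible to reflection
        return local;
    }
}

final class ReflectionDemo {

    // True if Island declares a field with the given name.
    static boolean declaresField(String name) {
        try {
            Island.class.getDeclaredField(name);
            return true;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    // True if Island declares a method with the given name.
    static boolean declaresMethod(String name) {
        for (Method method : Island.class.getDeclaredMethods()) {
            if (method.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }
}
```

There simply is no reflective counterpart to Field or Method for "local": the Class object only knows the members of the type body, not the contents of its blocks.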
There's a ... <diplomacy>not so good</diplomacy> programming language that shows this difference in principle very well: Caché Object Script. There, the OO land has one syntax style (pretty much copied from Java, but anyway) and the blocks still (mostly) have the syntax style of its predecessor language MUMPS (<- no joke). Like outside blocks (OO land), you have to end a declaration with semi-colons, while inside (procedural land), you may not end a statement with semi-colons. Etc. If this language has any value, then it's the visibility of the difference between OO waters and procedural islands.
Seen on that general level, the difference of terms represents the difference of context:
Object-oriented: field, method
Procedural: variable, function
Thank you for reading (got a little long) :-D
A "function" is not the same as "method". And "variable" is not the same as "field". There really is a reason why they have different terms, a very important one. Really.
If you are not aware of the reasons behind it, you might very quickly think "Oh my, what's he up to now? Let go already, it's about the same, so no point in being so pedantic." I for sure did in the past. So it's fair if it's me who writes a text saying "No really, it's important, listen...".
So what's the important difference:
Methods and fields are object oriented constructs with enhanced characteristics, while variables and functions are just plain procedural things with reduced characteristics. And that difference in characteristics can mean the difference between proper understanding (and in the end working code) and half-baked understanding (or broken code). If we want to be very correct and play around with OOP terminology a bit, we could say "Field extends Variable" and "Method extends Function" (or even more precisely: "Function extends Procedure" and, well, "Method extends Procedure or Function depending on context"). But I'll keep that part simple and say that they are "completely" different things.
One might argue here that it actually should be "Field extends Variable" and "LocalVariable extends Variable" with Variable being the abstract super type and thus calling a field a variable is absolutely correct and that the word "local " before the variable has to be used consequently to determine the difference. I say: okay, but: a) the problem remains that by just saying "variable" it's not clear what is meant with it and b) it's much easier and clearer if "field" stands for fields and "variable" stands for local variables. Honestly, think about it. This still allows to add the "local" all the time if you wish, no problem, but it solves the ambiguity when talking/writing.
1.) Field vs Variable
Variables (in Java) are those things you write inside blocks. Like "int value = 5;". It holds a value (a primitive value or a reference value) and it stand for a index in the stack (the fast accessible exclusively owned peace of memory of every thread). That's about it.
Fields are much more. They, too, hold values (the same ones variables do), but there the similarity ended already. First difference: their values lie on the heap (the not so fast and big thread-common area in memory). Performance optimisations like escape analysis etc. can cause the values to lie on the stack, as well, but in general, field values are located on the heap. That means: every thread can access the same field. Which of course is good, because it allows for efficient thread communication. But it can also cause all kinds of trouble (see video below).
Fields also belong to a class. Fields are accessible and analyzable (and corruptable) via reflection. Fields are complex object oriented structures, not just mere procedural labels representing a value.
The simple nature of (local) variables is also an notable advantage: they are fast. And they are (thread)safe. If an algorithms works on a variable, there's peace, calm, efficiency, as no other thread can possible mess up with it while "your" thread is working on it. And it's also as fast as you can get, as everything happens on the stack. While on the other hand when working on fields, you always have to keep in mind "can this field be accessed by other threads simulateously?" and "are there unnecessarily repeated accesses to that field in a loop?". Fields are powerful, of course, but they are also much more hazzle that has to be taken care of.
So if someone talks about a field, but says "variable" all the time and you really internalized the difference, you have a hard time understanding what he's talking about. Funny thing is: if you see both as synonyms and don't know or care about the crucial differences, you aren't confused at all. Like having less insight into a matter lets you understand it better o_0. Of course this is just an illusive phrase as ignoring the differences will at some point make you pay for it.
Nice example: There's an excellent Google Tech Talk about "The Java Memory Model" (which in the end just deals with concurrency - and I'm proud to say that I already knew the important stuff. Like "synchronized" has a double meaning of block & synchronize, that volatile is a synchronize-only for fields - NOT variables, by the way! - and that you have to care about "happens before" relationships when dealing with concurrency. But really, if he'll say "volable" just ONE more time, then I'm... argh).
What caused a little confusion was that the guy always mixed up "field" and "variable". For example by saying things like "another thread might change your variable". Knowing about the difference and seeing him as the expert that he undoubtedly is, I immediately "panicked" and thought: "WHAT? Can threads access variables without synchronization as well? Is everything I built my concepts on wrong?". Luckily for me, my world doesn't have to fall apart. He just used the wrong term. Phew.
More or less the same sloppiness is omitting the "this." before a field. Of course it is optional for the compiler, but it makes it so much easier to read a piece of code and quickly understand which names are (complicated) fields and which are simple variables. This is so important and so simple to do (even automated by the IDE) that I slowly but increasingly can't help but see people who omit it as newbies or ignorant at best.
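A tiny sketch of what that looks like (the Account class is invented for illustration): with consistent qualification you can tell heap access from stack access without looking up a single declaration.

```java
// Hypothetical class, made up for illustration only.
class Account {

    private long balance; // field: heap-based, part of the object's state

    long deposit(final long amount) {
        // "amount" and "newBalance" are plain locals on the stack.
        // Everything prefixed with "this." is visibly a field access.
        final long newBalance = this.balance + amount;
        this.balance = newBalance;
        return newBalance;
    }
}
```

Without the prefix, "balance + amount" would compile just the same, but the reader would have to scroll up to find out which of the two names is shared state.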
He does that as well (see 38:00 or 43:00), but of course I wouldn't call him a noob. I'd rather say he (sloppily) doesn't care about it, same as he mixes the terms up when talking. No problem if he gets along with it, but it's definitely more difficult for others to read his code and follow what he's saying.
On a side note: we'd need far fewer concurrency lessons like that if little things like the differences between fields and variables were taught better. Most of the problems with concurrency come from not making that distinction in the first place. At many of his examples of common mistakes and wrong assumptions one might fall for, I just thought "how could one possibly expect this field to be safely usable in a multithreaded program?", and that just because I know about the difference between fields and variables.
If you don't care about the difference, don't watch your terminology and don't use "this.", then the problems start (or get several times worse).
I keep two simple rules concerning fields and variables that help me tremendously in day-to-day Java development:
1.) Fields are potentially always accessed by another thread (unless a clean architecture guarantees otherwise) and therefore have to be at least volatile, or only accessed inside synchronized code, if the code is to be used in multithreaded environments.
2.) Working with fields is slower than working with variables, so they should be cached in variables when used in a loop.
Keeping those two rules in mind for every piece of code you write increases the software quality by what feels like 200%. At least that's my personal experience.
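Here's a rough sketch of rule 1 in practice (StoppableWorker is a hypothetical class, not from any library). A flag that is only written by one thread and read by another gets "volatile". A counter that is read, modified and written back gets synchronized access, because visibility alone isn't enough for that.

```java
// Hypothetical worker, made up for illustration.
class StoppableWorker implements Runnable {

    // Set by a controlling thread, read by the working thread:
    // volatile makes the write visible to the reader.
    private volatile boolean stopRequested;

    // Updated via read-modify-write, so it is only touched in synchronized code.
    private long processed;

    void requestStop() {
        this.stopRequested = true;
    }

    synchronized long processed() {
        return this.processed;
    }

    private synchronized void addProcessed(final long count) {
        this.processed += count;
    }

    @Override
    public void run() {
        long localCount = 0; // local: nobody else can see or change it
        do {
            localCount++; // stands in for one unit of real work
        } while (!this.stopRequested);
        this.addProcessed(localCount); // publish to the field once at the end
    }
}
```

Typical use would be new Thread(worker).start() in one place and worker.requestStop() somewhere else. Without the volatile, the Java Memory Model doesn't guarantee that the loop ever sees the stop request.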
A side note about performance, because it's very interesting:
There's a kind of surprising relation between the performance of read and write operations on fields and variables, which is as follows (1 being the fastest, 4 the slowest):
1.) reading from variable
2.) reading from field
3.) writing to variable
4.) writing to field
One might think that both variable operations are on places 1 and 2 and the field operations are behind them. But that's not true (as explained to me by the running VM *cough*).
Reading from a field is faster than writing to a variable (a register). This may have something to do with caching fields on the stack etc., but nevertheless, what we can derive from this is: assigning stuff to a variable/field is actually pretty expensive, at least compared to reading. That's why it doesn't pay off to cache a field value in a local variable and then only use it 2-4 times. Yes, those 4 local reads are faster than the field reads, but the storing beforehand took like 10 reads' time, ruining the optimisation. So caching field values in local variables pays off, but mostly only in loops (or to ease concurrency situations, of course).
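A minimal sketch of the loop case (the Histogram class is hypothetical, invented for this example). Both methods compute the same thing. The second one reads each field once before the loop and writes the result back once after it. Whether that's measurably faster depends on the VM and on what the JIT already hoists on its own, so treat it as a pattern to measure, not as a guaranteed win.

```java
// Hypothetical class, made up for illustration.
class Histogram {

    private final int[] buckets = new int[10];
    private long total;

    // Touches the fields on every single iteration.
    void addAllUncached(final int[] values) {
        for (int i = 0; i < values.length; i++) {
            this.buckets[Math.floorMod(values[i], this.buckets.length)]++;
            this.total++;
        }
    }

    // Caches the fields in locals, loops on the locals, writes back once.
    void addAllCached(final int[] values) {
        final int[] localBuckets = this.buckets; // one field read
        long localTotal = this.total;            // one field read
        for (int i = 0; i < values.length; i++) {
            localBuckets[Math.floorMod(values[i], localBuckets.length)]++;
            localTotal++;
        }
        this.total = localTotal;                 // one field write
    }

    int count(final int bucket) {
        return this.buckets[bucket];
    }

    long total() {
        return this.total;
    }
}
```

Note that the cached version is not thread-safe either. It just makes the window in which the field is touched a lot smaller and easier to reason about.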
But again a very valuable piece of the puzzle. Knowing this, one might think of field accesses as "tiny micro database accesses" from an algorithm's point of view, with the "application" being the local algorithm and the "database" being the instance lying on the heap. Tiny of course. Micro. You get it. Still a helpful picture to keep in mind for writing fast software.
To sum it up, it's a very good idea (if not mandatory if you want to call yourself a quality Java developer) to take advantage of your IDE's syntax coloring and automation to make the difference as readable as possible. I came up with a pattern that is quite intuitive and helpful:
1.) Activate automated "this." qualification.
2.) Make missing "this." qualification be indicated as a problem/warning (yes, really. Because it IS a problem)
3.) Make local variables and parameters green (green implies peace, calm, not critical, simply "green light", all is good)
4.) Eclipse's default blue color for fields fits in there very well: all fields and variables are colored, either green or blue, where blue means (not critical but at least) "Uuh, special, watch out".
Of course the green color has the consequence that comments have to be recolored from green to... hm, what would be a good color for some syntactically meaningless, add-on, optional, external text? Grey! Of course. Come on, what were they drinking when deciding that a comment should be colored green? Ah, whatever.
Here's a small example for my color scheme:
It immediately shows all the safe, uncritical, stack-based local variables at a glance and the object-oriented, heap-based, "micro-database" fields clearly distinct from them.
One might argue that having too many "this." everywhere in your methods clutters up the code too much. I'd say if you consider "this" qualification as clutter that should be avoided, you are ignoring large parts of the concept of object orientation, which is very bad (if not for you internally, then at least for others who have to read your code), and maybe your code should even be overhauled to make fewer repeated field accesses.
There's one rule of thumb in Java development: Reading is more important than writing. So if you say you can write code better if you leave out readability helpers like "this.", honestly: no one cares. Make it so that many others can read and maintain your code well (at best easily self-explaining code with speaking variable names etc. plus explaining comments where they are useful), then it's good code. Not if you typed it a little faster, once.
2.) Methods vs. Functions
Most of the stuff said about fields applies to methods as well. Part of a class, etc. Qualify with "this.", although here the intention is more to clearly distinguish them from static methods. Because in Java, believe it or not, there are NO functions at all. Every single thing that was ever mistakenly called a "function" belonged to a class. There is no ".java" file containing just a bunch of "public int calculateStuff(int a, int b)" or whatever. They are all always in a class.
In contrast to fields, there are, object-oriented-wise, two types of methods, of course: static methods and instance methods. Okay. Still both of them are methods belonging to a class, having access to non-public fields of that class, etc. But none of them are functions.
Making the difference still makes sense and is even more important than the difference between fields and variables, because, imagine: (actual) functions are coming to Java!
They're called lambdas for now (google "Project Lambda", scheduled for Java 8) and of course they will internally just be classes themselves, just like anonymous inner class instances (but may be substituted by method references, as Remi wrote on lambda-dev; I haven't looked into the matter closely enough to understand it completely), but conceptually, they're real functions.
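For illustration, here's how the three things look side by side in the syntax that eventually shipped with Java 8 (the Calculator class and its members are made up for this sketch). The instance method and the static method both belong to the class. The lambda is just a value of a function type that can be stored and passed around, even though it still has to be written down somewhere inside a class.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical class, made up for illustration.
class Calculator {

    private final int offset = 10;

    // Instance method: belongs to the class, works on this instance's state.
    int addWithOffset(final int a, final int b) {
        return this.offset + a + b;
    }

    // Static method: no instance involved, but still a member of the class.
    static int add(final int a, final int b) {
        return a + b;
    }

    // A function value: conceptually just "two ints in, one int out".
    static final IntBinaryOperator ADD = (a, b) -> a + b;

    // Functions can be handed to other code like any other value.
    static int apply(final IntBinaryOperator function, final int a, final int b) {
        return function.applyAsInt(a, b);
    }
}
```

Calculator.apply(Calculator.ADD, 2, 3) and Calculator.apply(Calculator::add, 2, 3) both work, the second one via a method reference that turns the static method into a function value.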
Bad times for all the people currently calling methods "functions" as well.
I already see Massive Confusion 2 coming up in Java theatres: "We moved it to a function" - "A function? Are you nuts?!" - "I mean a function in the class. In the instance. You know. The old function, not those new ones." - "Ah, you mean a method. Then just say method please." - "Isn't it the same?" - "OMFG..."
Honestly: No, it's not.
From a more general point of view, it's also very helpful to make something clear for one's understanding of source code: We still develop procedurally. Object orientation is not something different from procedural development, it's an extension. Again more like "ObjectOrientedProgramming extends ProceduralProgramming". And this is very simple to see in the code itself:
The "blocks" (mostly method blocks, but also initializer and static initializer blocks; so all the stuff triggering actions between curly braces, excluding array initializers and type bodies) are still plain procedural programming. Procedural "islands", if you will. (Only!) the things around them are the object-oriented stuff. Or the object-oriented foundation (that older, procedural-only languages are missing) in which the procedural algorithms are embedded. The curly braces for class and interface bodies are something completely different from those enclosing blocks. They might as well have used another symbol for them. The constructs floating around in the OO waters are all accessible by reflection (Class, Field, Method, Constructor, Modifiers, etc.), the stuff in the procedural blocks is not. Those block braces are like a sign saying "you are entering procedural land now, reflection does not apply here". Same goes for modifiers like public, protected, etc. They're just not applicable to procedural code. After learning (or no longer ignoring) the difference between fields and variables, it's also easy to see why a "volatile" modifier makes no sense for a local variable.
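A small sketch to try this out (the Island class is hypothetical, only the java.lang.reflect calls are real API). Reflection lists the field and the methods with their modifiers, but the local variable inside the method block simply doesn't exist for it.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical class, made up for illustration.
class Island {

    // OO waters: a field may carry modifiers like private or volatile.
    private volatile int lastValue;

    int remember(final int value) {
        // Procedural island: "doubled" is a local variable.
        // Writing "volatile int doubled" here would not even compile.
        final int doubled = value * 2;
        this.lastValue = doubled;
        return doubled;
    }

    // Lists everything reflection can see in this class.
    static String describe() {
        final StringBuilder result = new StringBuilder();
        for (final Field field : Island.class.getDeclaredFields()) {
            result.append("field ")
                  .append(Modifier.toString(field.getModifiers()))
                  .append(' ')
                  .append(field.getName())
                  .append('\n');
        }
        for (final Method method : Island.class.getDeclaredMethods()) {
            result.append("method ").append(method.getName()).append('\n');
        }
        return result.toString();
    }
}
```

The output of Island.describe() mentions lastValue, remember and describe (in no guaranteed order), but nothing about "doubled" or "value".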
There's a ... <diplomacy>not so good</diplomacy> programming language that shows this difference in principle very well: Caché ObjectScript. There, the OO land has one syntax style (pretty much copied from Java, but anyway) and the blocks still (mostly) have the syntax style of its predecessor language MUMPS (<- no joke). For example, outside of blocks (OO land) you have to end a declaration with a semicolon, while inside (procedural land) you may not end a statement with a semicolon. Etc. If this language has any value, then it's the visibility of the difference between OO waters and procedural islands.
Seen on that general level, the difference of terms represents the difference of context:
Objectoriented: field, method
Procedural: variable, function
Thank you for reading (got a little long) :-D
Labels: Architecture, Code Quality, General, Performance