Samstag, 9. Juli 2011

Questionable use of Soft References

While this post isn't really about my collections framework, there's a good story in it suitable as an introduction to what I actually wanna write about:

There's an inbuilt support for volatile elements (weak referenced elements, etc.). From a developer's perspective, there's simply a factory method for, let's say a new hash set collection instance to which you can pass a ReferenceType.WEAK (or SOFT or STRONG, with STRONG being the default for an overloaded alternate version in case you don't bother selecting reference type at all).
Internally, there will be created an apropriate HashLogic instance pluged in the new hash set instance to handle elements (or references to them, to be more precise) that can just vanish any time.
Some months ago, I removed all the HashLogic variations for soft references. Not they can't be handled any more. If at some point someone (maybe me) implements them again (for the then matured architecture compared to that some months ago), they'll just work fine without the framework having to be adjusted.
The reason was: while weak references are pretty useful, I never saw any soft reference in practice, only in theory and in academic examples for the different reference types. And personally I never could get rid of the feeling: if your programm touches the boundaries of the heap, something else already went wrong before. Soft reference might be useful here, but only for a few corner cases which I don't care (at the moment) to support.

Now to the actual part:
I just watched the Google Tech Talk "A JVM Does That?" from Cliff Click (is that really his name? ^^). Very interesting. He has a high throughput of information, but not too fast, just about right.
At 30:50, he gives a very good example for why soft references and phantom references (the latter tbh I never really understood... :-/ ) are of questionable use.
Just two things as a comment:
1.) Good example
2.) I knew it! (hehe)

Later in the talk he states similar for weak reference, but at this point he just slides in ("maybe weak refs are useful").

Maybe a little objection about weak references: Yes, they are useful. Probably not how they are used most of the time (like "make everything weak an you can never get memory leaks" and then all of a sudden: unplanned null, exception, crash.), but I have one very good example where they are useful:
If you have some sort of "meta registry" that has to watch over some/all instances in your running application and has to manage them in some way, it has to keep its fingers out of the "actual work" your application does. Meaning if may only observe living instances, but never be a cause for them to live in the first place. If such a meta registry would hard-reference the instances it observes, no instance could ever get garbage collected. Or in other words: it would be the biggest, complete, "perfect" memory leak. For this reason, a meta registry MUST use weak references.

Btw: What's a real use for such a "meta registry?". The reference registry of an OR-Mapper, for example (and btw: yes, it's a REGISTRY, not a CACHE as all the existing ORMs call it. It main reason is to keep track of already handled instances to guarantee referential integrity). The ORM has to know which of the living instances as which id etc. associated even if the instance doesn't contain that information on its own (because ORM should be orthogonal and not require to add thousands of annotations and special fields like existing ORMs do simple-mindedly). For it to know the instances, it has to reference them. But it may not strongly reference them as it would cause the "perfect" memory leak (or burdon the developer using the ORM to do its management work manually by unregistering instances etc. explicitely which is a really horrible thought).

I guess those kind of situations is what brought him to saying "maybe weak refs are useful" and the reference to them not being proper architecture later on aimed more at the unwise uses of them.

Still I would agree that such application logic should be done on the application level (the ORM framework in this case) and not via some "magic" on the GC level. But for that to work on application level, the VM would have to provide a simple and fast way of telling how many times a certain instance is referenced. If that count would be only 1, a strongly referencing registry could derive "oh, no one except me is referencing it anymore, so I can let go if it for it to get collected".
Problem with that would be:
a.) AFAIK, JVMs and GCs don't work like that. There is no hard cached per-instance "reference count" the application could access.
b.) Such a logic is actually part of garbage collection. If the registry would do that, JVM engineers like Cliff would come along again and say "That's GC work, leave it to the GC. Take weak references instead and let the GC handle it!" I hope he would say that because if he would just deny both options, he'd just ignore real world and proper design needs of having an unintrusive automated reference registry for certain situations like ORM. OR, hehe, the JVM would provide a built-in service for ORM. But then still there would be other applications for keeping track of references inside the application.
One way or another: weak references do have their right to exist (even for common cases).
Soft reference most probably don't.


UPDATE:
Yesterday, a colleague pointed out to me that SoftReferences do have their use for caches. The stuff in the cache is present as long as there's enough memory. If heap runs out, the caches get cleared as needed and filled again only on demand.

Makes sense, but I'm still a little skeptic if working if a constantly full heap by design is really a good thing to do. And what about once heap is full the caches swamp each other all the time creating a live lock or at least extremely bad performance.
Still SoftReferencing sound only like some sort of "last resort" means to let the application live "a little longer" if it's already too late (memory-wise), where the actual problem is in the design that doesn't prevent memory running out.
Classic example of "I don't care about memory consumption, I just use soft references".



Another mentionable part in the video:
18:57 - People don't know how to write concurrent programms.
There's such a fitting fictional quote he's saying: "I don't know why it's busted, I've got some data race, so I throw another lock in, throw another lock in, thro - oh, the bugs went away! Okay, I'm done."

That is very well to the point. If you just look at the stupid Vector implementation in the JDK collections (oops, here's the collections topic and the evil JDK bashing again), which btw. everyone on the Java development side nowdays says that it's virtually deprected - you see a very good example for what he means. Let alone everything that normal application developers tinker all day when stumbling accross concurrency.
Synchronisation (or locking, to be more precise) is good and important. But it has to be used in the right way. For example it's very easy to implement a ThreadLocal that is NOT synchronized for reading accesses and doesn't even have to bloat the Thread class but is still thread safe (simply because you can exploit the special behaviour of thread instances if used as a hash key so that every thread will always only reach its own hash entry and never that of another thread and whooops: unsynchronized thread safe read-only access. You can find my complete and well documented implementation on sourceforge if you google a little).

Btw: just because acquiring locks gets faster and faster as the JVM guys are optimizing it further and further, it does NOT mean that one has no longer to care about them (I already hear the people saying "but the video says they only take a few nano seconds nowadays, so why should I care?").
The lock managment itself may someday cost zero time, but the code that is locked and executed while all other threads wait to enter it can still take any amount of time. So synchronisation problems are nothing that the JVM will someday optimize away, but are of architectural nature (luckily, because otherwise: if not for such complex problems, there would be no need for human developers in the future in the first place).

Keine Kommentare:

Kommentar posten