Sonntag, 3. Juli 2011

Variables are not fields, functions are not methods

There's a certain mistake in terminology I notice more often the more insight I get in advanced Java development (and that I of course did in times past as well) that increasingly annoys me:

A "function" is not the same as "method". And "variable" is not the same as "field". There really is a reason why they have different terms, a very important one. Really.
If you are not aware of the reasons behind it, you might very quickly think "Oh my, what's he up to now? Let go already, it's about the same, so no point in being so pedantic." I for sure did in the past. So it's fair if it's me who writes a text saying "No really, it's important, listen...".

So what's the important difference:
Methods and fields are object oriented constructs with enhanced characteristics, while variables and functions are just plain procedural things with reduced characteristics. And that difference in characteristics can mean the difference between proper understanding (and in the end working code) and half-baked understanding (or broken code). If we want to be very correct and play around with OOP terminology a bit, we could say "Field extends Variable" and "Method extends Function" (or even more precisely: "Function extends Procedure" and, well, "Method extends Procedure or Function depending on context"). But I'll keep that part simple and say that they are "completely" different things.
One might argue here that it actually should be "Field extends Variable" and "LocalVariable extends Variable" with Variable being the abstract super type and thus calling a field a variable is absolutely correct and that the word "local " before the variable has to be used consequently to determine the difference. I say: okay, but: a) the problem remains that by just saying "variable" it's not clear what is meant with it and b) it's much easier and clearer if "field" stands for fields and "variable" stands for local variables. Honestly, think about it. This still allows to add the "local" all the time if you wish, no problem, but it solves the ambiguity when talking/writing.

1.) Field vs Variable
Variables (in Java) are those things you write inside blocks. Like "int value = 5;". It holds a value (a primitive value or a reference value) and it stand for a index in the stack (the fast accessible exclusively owned peace of memory of every thread). That's about it.
Fields are much more. They, too, hold values (the same ones variables do), but there the similarity ended already. First difference: their values lie on the heap (the not so fast and big thread-common area in memory). Performance optimisations like escape analysis etc. can cause the values to lie on the stack, as well, but in general, field values are located on the heap. That means: every thread can access the same field. Which of course is good, because it allows for efficient thread communication. But it can also cause all kinds of trouble (see video below).
Fields also belong to a class. Fields are accessible and analyzable (and corruptable) via reflection. Fields are complex object oriented structures, not just mere procedural labels representing a value.
The simple nature of (local) variables is also an notable advantage: they are fast. And they are (thread)safe. If an algorithms works on a variable, there's peace, calm, efficiency, as no other thread can possible mess up with it while "your" thread is working on it. And it's also as fast as you can get, as everything happens on the stack. While on the other hand when working on fields, you always have to keep in mind "can this field be accessed by other threads simulateously?" and "are there unnecessarily repeated accesses to that field in a loop?". Fields are powerful, of course, but they are also much more hazzle that has to be taken care of.

So if someone talks about a field, but says "variable" all the time and you really internalized the difference, you have a hard time understanding what he's talking about. Funny thing is: if you see both as synonyms and don't know or care about the crucial differences, you aren't confused at all. Like having less insight into a matter lets you understand it better o_0. Of course this is just an illusive phrase as ignoring the differences will at some point make you pay for it.

Nice example: There's an excellent Google Tech Talk about "The Java Memory Model" (which in the end just deals with concurrency - and I'm proud to say that I already knew the important stuff. Like "synchronized" has a double meaning of block & synchronize, that volatile is a synchronize-only for fields - NOT variables, by the way! - and that you have to care about "happens before" relationships when dealing with concurrency. But really, if he'll say "volable" just ONE more time, then I'm... argh).

What caused a little confusion was, that the guy always mixed up "field" and "variable". For example by saying things like "another thread might change your variable". Knowing about the difference and seeing him as the expert that he undoubtly is, I immediatly "paniced" and thought: "WHAT? Can threads unsynchronizedly access variables as well? Is everything I build my concepts on wrong?". Luckily for me, my world doesn't have to fall apart. He just used the wrong term. Phew.
Moreless the same slopiness is to omit the "this." before a field. Of course it is optional for the compiler, but it makes it so much easier to read and quickly understand which ones are (complicated) fields and which ones are simple variables in a piece of code. This is so important and so simple to do (even automated by the IDE) that I slowly but increasingly can't help to see people who omit it as newbies or ignorant at best.
He does that as well (see 38:00 or 43:00), but of course I wouldn't call him a noob. I'd rather say he (slopily) doesn't care about it, same as he mixes them up when talking. No problem if he gets along with it, but it's definitely more difficult for others to read his code and follow what he's saying.

On a side note: we'd need much less concurrency lessons like that if little things like differences between fields and variables would be propagated better. Most of the problems with concurrency come from not making that distinction in the first place. At many of his examples what common mistakes and wrong intentions one might fall for I just thought "how could one possibly expect this field to be safely usable in a multithreaded program?" and that just because I know about the difference of fields and variables.
If you don't care for the difference, don't watch your terminogly and don't use "this.", then the problems start (or get severel times worse).


I keep two simple rules concearning fields and variables that help me tremendously in day to day Java developing:
1.) Fields are potentially always accessed by another thread (unless clean architecture guarantees otherwise) and therefore have to be at least volatile or used synchronized if the code is to be used in multithreaded environments.
2.) Working with fields is slower than working with variables, so they should be cached in variables when used in a loop.

Keeping those two rules in mind for every code you write increases the software quality by a felt 200%. At least that's my personal experience.



A side not about performance, because it's very interessting:
There's a kind of suprising relation between the performane of read and write operations of fields and variables, which is as follows (1 being the fastest, 4 the slowest)
1.) reading from variable
2.) reading from field
3.) writing to variable
4.) writing to field

One might think that both variable operations are on places 1 and 2 and the field operations are behind them. But that's not true (as explained to me by the running VM *cough*).
Reading from a field is faster than writing to a variable (a register). May have something to do with caching fields on the stack etc., but nevertheless: What we can derive from this is: Assigning stuff to a variable/field is actually pretty expensive, at least compared to reading. That's why it doesn't pay off to cache a field value in a local variable and then only use it 2-4 times. Yes those 4 local reads are faster then the field reads, but the storing beforehand took like 10 reads' time, ruining the optimisation. So caching field values in local variables pays off, but mostly only in loops (or to ease concurrency situations, of course).
But again a very valuable piece of the puzzle. Knowing this, one might think of field accesses as "tiny micro database accesses" from an algorithm's point of view, with the "application" being the local algorithm and the "database" being the instance lying on the heap. Tiny of course. Micro. You get it. Still a helpful picture to keep in mind for writing fast software.




To sum it up, it's a very good idea (if not mandatory if you want to call yourself a quality Java developer) to take advantage of your IDE's syntax coloring and automation to make the difference as readable as possible. I came up with a pattern that is quite intuitive and helpful:

1.) Activate automated "this." qualification.
2.) Make missing "this." qualification be indicated as a problem/warning (yes, really. Because it IS a problem)
3.) Make local variables and parameter green (green implies piece, calm, not critical, simply "green light", all is good)
4.) Eclipse's default blue color for fields fits in there very well: all fields and variables are colored, either green or blue, where blue means (not critical but at least) "Uuh, special, watch out".

Of course the green color has the consequence that comments have to be recolored from green to... hm, what would be a good color for some syntactiallcy meaningless, addon, optional, external text? Grey! Of course. Come one, what where they drinking when deciding that a comment should be colored green? Ah, whatever.

Here's a small example for my color scheme:

It immediately shows all the safe, uncritical, stackbased local variables at a glance and the object oriented, heap based, "micro-database" fields clearly distinct from them.

One might argue that having too much "this." everywhere in your methods clutter up the code too much. I'd say if you consider "this" qualification as clutter that should be avoided, you are ignoring large parts of the concept of object orientation which is very bad (if not for you internally, then at least for others who have to read your code) and maybe even your code should be overhauled to make less repeated field accesses.
There's one rule of thumb in Java development: Reading is more important than writing. So if you say you can write code better if you leave out readability helpers like "this.", honestly: no one cares. Make is so that many others can read and maintain your code well (at best easily self-explaining code with speaking variable names etc. plus explaining comments where they are useful), than it's good code. Not if you typed it a little faster, once.





2.) Methods vs. Functions

Most of the stuff said for variables applies to methods as well. Part of a class, etc. Qualify with "this.", although here the intention is more to clearly distinct them from static methods. Because in Java, believe it or not, there are NO functions at all. No "function" whatsoever ever being mistakenly named as such did not belong to a class. There is no ".java" file containing just a bunch of "public int calculateStuff(int a, int b)" or whatever. They are all always in a class.

In contrast to fields, there are object-oriented-wise two types of methods, of course. Static methods and instance methods. Okay. Still both of them are methods belonging to a class, having access to non-public fields of that class, etc. But non of them are functions.

Making the difference still makes sense and is even more important than the difference between fields and variables, because, imagine: (actual) functions are coming to Java!
They're called lambdas for now (google project lambda, scheduled for Java 8) and of course they will internally just be classes themselves, just like anonymous inner class instances (but maybe be substituted by method references as Remi wrote on lambda-dev, I don't exactely look into the matter enough to understand it completely), but conceptionally, they're real functions.
Bad times for all the people currently calling methods "functions" as well.
I already see Massive Confusion 2 coming up in Java theatres: "We moved it to a function" - "A function? Are you nuts?!" - "I mean a function in the class. In the instance. You know. The old function, not those new ones." - "Ah, you mean a method. Then just say method please." - "Isn't it the same?" - "OMFG..."

Honestly: No, it's not.





On a more general point of view, it's also very helpful to make something clear for one's understand of source code: We still develop procedurally. Objectorientation is not something different from procedural development, it's an extension. Again more like "ObjectOrientedProgramming extends ProceduralProgramming". And this is very simple to see in the code itself:

The "blocks" (mostly method blocks, but also initializer and static initializer blocks. So all the stuff triggering the actions between the curly braces, excluding array initializers and type bodies) are still plain procedural programming. Procedural "islands", if you will. (Only!) the things around them is the object oriented stuff. Or object oriented foundation (that older, procedural-only languages are missing) where the procedural algorithms are embedded. The curly braces for class and interface bodies are something completely different from those enclosing blocks. They might as well have used another symbol for them. The constructs floating around in the OO waters are all accessible by reflection (Class, Field, Method, Constructor. Modifiers, etc.), the stuff in the procedural blocks is not. Those block braces are like a sign "you are entering procedural land now, reflection does not apply here". Same goes for modifiers like public, protected, etc. They're just not applicable for procedural code. After learning (or no longer ignoring) the difference between fields and variables, it's also easy to see why a "volatile" modifier makes no sense for a local variable.

There's a ... <diplomacy>not so good</diplomacy> programming language that shows this difference in principle very well: Caché Object Script. There, the OO land has one syntax style (pretty much copied from Java, but anyway) and the blocks still (mostly) have the syntax style of its predecessor language MUMPS (<- no joke). Like outside blocks (OO land), you have to end a declaration with semi-colons, while inside (procedural land), you may not end a statement with semi-colons. Etc. If this language has any value, then it's the visibility of the difference between OO waters and procedural islands.

Seen on that general level, the difference of terms represents the difference of context:
Objectoriented: field, method
Procedural: variable, function


Thank you for reading (got a little long) :-D

Keine Kommentare:

Kommentar veröffentlichen