Sonntag, 22. Mai 2011

Assignment Inlining

What is the difference between the following two code samples:

final int[] values = this.someValues;
if(index < values.length){
    return values[index];

final int[] values;
if(index < (values = this.someValues).length){
    return values[index];

Both do exactly the same thing and both a equally long. The second one is a little more complicated to read, granted. But it is also reasonably faster.
Why is it faster?

For a very natural reason, that applies to CPUs as well as to human beings. Let's say CPUs are only human and work on a desk with a shelf (= CPU registers). A local variable is effectively just a CPU register or a slot in the shelf, for that matter. To fetch a sheet of paper (load a value of the variable) from the shelf (the register) or to put (store) it there takes time.

Example 1 instructs the CPU via bytecode to:
- write the address of the array on a sheet of paper
- put the sheet into slot 1 of the shelf
- fetch the sheet from slot 1
- compare the length of the array on the sheet with the index
- ...

Example 2 instructs the CPU via bytecode to:
- write the address of the array on a sheet of paper
- and compare the length of the array on the sheet with the index

(Okay the comparison is a little lopsided: actually, the value on the stack is duplicated and stored away once instead of stored away and loaded back. But the picture stays the same)

It's quite obvious: The instructions of the first example are a little... well.. silly for the executing CPU or human being, because the newly written sheet is put away just to be fetched again. Number two is a little smarter, because it says "continue to use what you've already got".

The performance gain is very small in numbers, BUT: If a method doesn't do much else (like in the example), or if multiple optimizations of that type can be done, then the small gain compared to the anyway small workload becomes relatively high.
And that comes without "side effect". No structural disadvantage, no instability, nothing. Even variable finality is kept. It's just that the characters in the source code are arranged differently without changing their meaning and the stuff gets faster. That's all.

Sometimes, such an "assignment inlining" can even spare a local variable completely - and then it gets really fast (because storing to a register is surprisingly expansive).

The internet provides hardly any information to that topic, which I wonder about a little. Only one sophisticated bytecode optimization talks about replacing an equivalent STORE/LOAD by a DUP(licate) instruction. So I researched it on my own. It really works that way, every little test project can show it.

Of course that doesn't mean that everyone shall write bytecode-optimized code all the time.
It depends on the situation if it makes sense to do it.
If you have a piece of code that is called only once when pressing a button, or that assembles a SQL statement (which itself takes computer-AGES to be executed), etc., then there's absolutely NO point in reducing readability to gain some 0.000001% performance. In such cases, readability and maintainability of source code is much more important than any performance optimizations.

On the other hand, if you implement something like a Comparator, a tiny piece of code (3-4 lines maybe) but which gets called a million or a billion times subsequently, then such a small change can make a big difference. Especially when algorithms grow logarithmical or even exponential in runtime complexity (like sorting, for example), this can quickly mean the difference of several seconds response time per action for the user.
(and honestly: in 3-4 lines of code, even inlined statements still keep up readability)

This is one of many performance optimizations I use in my "Next Generation Collections" framework, all in all resulting for example in GrowList.add() (the behaviour-wise cleanly named equivalent to the implementation-wise named ArrayList of JDK) being 200% to 300% as fast as ArrayList.add().

Keine Kommentare:

Kommentar veröffentlichen