Tuesday, May 22, 2007

Minimizing Java bytecode size

I looked into this topic when I was considering entering the Java 4K Programming Contest. I ended up not participating in the contest because I couldn't dream up any cool ideas for a game, but I did come up with a list of things that help reduce code size. Note that the 4K limit applies to the executable class file and not the source file, so all of these actions are aimed at shrinking the compiled class file.

Some of these are obvious. Others might be less obvious to the masses.

Action: Don't bother with generics.
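Justification: Generic type information is erased at compile time and is not used by the JVM when executing code; it only takes up space in the class file (in Signature attributes).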

Action: Compile without debug information or strip it afterwards.
Justification: This includes source code line number information, the source file name and local variable names. The class file attributes to remove are SourceFile, LineNumberTable and LocalVariableTable. They exist only for debugging purposes and are not required to run the code.
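
As a rough way to see the effect, the sketch below compiles a tiny throwaway class twice through the standard javax.tools API, once with -g and once with -g:none, and returns the resulting class file sizes. The class T and its contents are made up for illustration, and the code assumes it runs on a JDK, since a plain JRE has no system compiler.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class DebugInfoSize {
    // Hypothetical throwaway class, only here to have something to compile.
    private static final String SOURCE =
        "class T { static int f(int count) { int total = 0;"
        + " for (int i = 0; i < count; i++) { total += i; } return total; } }";

    /** Compiles SOURCE with the given -g flag and returns the size of T.class in bytes. */
    static long compiledSize(String debugFlag) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a bare JRE
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run this on a JDK");
        }
        try {
            Path dir = Files.createTempDirectory("debuginfo");
            Path src = dir.resolve("T.java");
            Files.write(src, SOURCE.getBytes(StandardCharsets.UTF_8));
            // null streams mean: use System.in, System.out and System.err.
            int status = compiler.run(null, null, null,
                    debugFlag, "-d", dir.toString(), src.toString());
            if (status != 0) {
                throw new IllegalStateException("Compilation failed with status " + status);
            }
            return Files.size(dir.resolve("T.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("-g      : " + compiledSize("-g") + " bytes");
        System.out.println("-g:none : " + compiledSize("-g:none") + " bytes");
    }
}
```

The exact numbers depend on the compiler version, so I'm not quoting any; the -g:none figure should simply be the smaller of the two.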

Action: Rename all class, method and field names to be one character.
Justification: One character takes up less space than two or three or four.

Action: Do not define a package.
Justification: The package definition provides no functionality (in this context) and takes up space.

Action: Minimize the number of methods.
Justification: Method headers take up a lot of space.
2 (accessflags) +
2 (name index) +
2 (descriptor index) +
20 (attribute count + empty Code attribute)

That's 26 bytes, and that still assumes you manage to reuse a name string and a method descriptor (which defines the return type and parameters) that are already in the constant pool, and that the method doesn't declare any thrown exceptions.

Action: Minimize the number of fields.
Justification: Field headers take up a lot of space.
2 (accessflags) +
2 (name index) +
2 (descriptor index) +
2 (0 attributes)

That's 8 bytes, and that still assumes you manage to reuse a name string and a descriptor that are already in the constant pool.
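
The two tallies above can be reproduced from the class file layout. This sketch adds up the fixed-width items of field_info, method_info and the Code attribute as laid out in the JVM specification; the class and method names are made up for illustration.

```java
final class MemberInfoSize {
    private static final int U2 = 2; // two-byte class file item
    private static final int U4 = 4; // four-byte class file item

    /** Size of a field_info structure that carries no attributes. */
    static int fieldInfo() {
        return U2   // access_flags
             + U2   // name_index
             + U2   // descriptor_index
             + U2;  // attributes_count (0)
    }

    /** Size of a Code attribute with no exception handlers and no sub-attributes. */
    static int codeAttribute(int codeLength) {
        return U2          // attribute_name_index
             + U4          // attribute_length
             + U2          // max_stack
             + U2          // max_locals
             + U4          // code_length
             + codeLength  // the bytecode itself
             + U2          // exception_table_length (0)
             + U2;         // attributes_count (0)
    }

    /** Size of a method_info structure whose only attribute is Code. */
    static int methodInfo(int codeLength) {
        return U2   // access_flags
             + U2   // name_index
             + U2   // descriptor_index
             + U2   // attributes_count (1)
             + codeAttribute(codeLength);
    }

    public static void main(String[] args) {
        System.out.println("field_info: " + fieldInfo());                 // 8
        System.out.println("method_info, no bytecode: " + methodInfo(0)); // 26
        System.out.println("method_info, lone return: " + methodInfo(1)); // 27
    }
}
```

Since the specification requires code_length to be greater than zero, a concrete method needs at least one instruction (a lone return is one byte), which makes 27 bytes the practical floor.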

Action: Strip any method throws information.
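Justification: The VM doesn't use this information (it lives in the Exceptions attribute of each method); checked exceptions are enforced only by the compiler. You can't tell the compiler not to create it, but it can be stripped afterwards.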

Action: Rearrange local variables, putting the four that are the most used first. Heavily reuse these variables.
Justification: Instructions that refer to the local variables 0-3 take up one byte and instructions that refer to the rest of the local variables take up two bytes.

Action: Set the scope of the local variables so that only 4 variables are in scope at any given time.
Justification: Two variables, even if they're of different types, can be stored in the same local variable "slot" if their scopes don't overlap. Why using the first four local variables helps is explained in the previous item.
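
Here is a small sketch of what that looks like in source code. The slot numbers in the comments are what I would expect javac to assign, not something the language guarantees, and the class and method are made up for the example. The method is static on purpose: in an instance method slot 0 is already taken by "this".

```java
final class SlotReuse {
    /** Returns max - min of a non-empty array while staying within slots 0-3. */
    static int rangeOf(int[] values) {                 // values: expected slot 0
        int min = values[0];                           // expected slot 1
        int max = values[0];                           // expected slot 2
        for (int i = 1; i < values.length; i++) {      // i: expected slot 3
            if (values[i] < min) {
                min = values[i];
            }
            if (values[i] > max) {
                max = values[i];
            }
        }
        // i is out of scope here, so its slot is free again.
        {
            int range = max - min;                     // expected to reuse slot 3
            return range;
        }
    }

    public static void main(String[] args) {
        System.out.println(rangeOf(new int[] {4, 9, 2})); // prints 7
    }
}
```

Running javap -c on the compiled class shows which slots were actually used. Keep in mind that long and double variables occupy two slots each.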

Action: Reuse string constants. As well as any string literals in the code, this includes class, field and method names. If you have one class which has one field and the main method (to have a Java application entry point), name the class and the field "main" as well.
Justification: The compiler will only need to put one string entry into the class file constant pool.
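
A minimal sketch of that naming trick follows. The class is hypothetical and does nothing useful; the point is only that the class name, the field name, the method name and the string literal are all the same text, so they can share one UTF-8 entry in the constant pool.

```java
// Hypothetical size-golfed entry point: every identifier is "main".
final class main {
    // A field named like the class and the entry method.
    static int main;

    public static void main(String[] args) {
        main++;                     // refers to the field, not the method
        System.out.println("main"); // the literal reuses the same text again
    }
}
```

Fields and methods live in separate namespaces in Java, which is why this compiles. Running javap -v on the class file lists the constant pool if you want to check how many entries the names ended up using.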

Thursday, March 01, 2007

Java 6 == NoClassDefFoundError

I thought I might as well share this.

Problem

I'm working on an open source project and recently I tried to run the executable .jar file on my Windows machine. It wouldn't run, which was surprising since I had done some basic testing on it and I had certainly been able to run it before.

That version still had no logging support, so I went to the console to try and run it, and sure enough it died with a:
java.lang.NoClassDefFoundError: com.sun.jdi.StackFrame

Background


That com.sun.jdi package quickly gave a pretty good idea about what was going on. Time for some background information.

First of all, the project had no external dependencies. I wasn't using anything that required extra jars to be packaged with the binary. This wasn't really as much a conscious decision as it was just something that happened.

When I added support for debugging another VM to my project, I used the com.sun.jdi.* classes from the tools.jar that comes with a standard JDK. I had the bright idea to still not introduce a forced dependency, making this functionality optional: if the classes were found on the classpath the functionality would be there; if not, the rest of the application would still be usable.

I did this in a somewhat lazy and half-hearted, experimental manner, just testing around until it worked the way I wanted.

Reason for problems

So when I got that NoClassDefFoundError I knew my fragile construction had broken down. I initially assumed it had something to do with the changes to class verification in Java 6, but it turned out to be about changes to the Swing classes: I had a JPanel subclass with a method that had that JDI class in its signature, and in Java 6 a call to the JPanel constructor does something like:

this.getClass().getDeclaredMethods();

And that bit of reflection causes the classloader to try to load all the classes referenced in the signatures of the methods (with any version of Java).

Fix

Having learned my lesson, I wrapped all the references to the JDI classes in any method signatures in my own local interfaces and now the project once again works fine with Java 6 as well.
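
The shape of that fix, roughly, is sketched below. This is not the project's actual code: DebugFrame, DebugSupport and their methods are names I made up for the illustration, and the part that would adapt a real com.sun.jdi.StackFrame is replaced by a placeholder so that the sketch compiles without tools.jar.

```java
import java.util.Optional;

/** Hypothetical local interface: no JDI type appears in any of its signatures. */
interface DebugFrame {
    String location();
}

final class DebugSupport {
    /** True if the named class can be loaded, false if the optional jar is absent. */
    static boolean isPresent(String className) {
        try {
            // initialize=false: only check that the class can be found and loaded.
            Class.forName(className, false, DebugSupport.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * UI classes declare only DebugFrame in their method signatures, so
     * reflecting over them never needs to load com.sun.jdi.* classes.
     */
    static Optional<DebugFrame> currentFrame() {
        if (!isPresent("com.sun.jdi.StackFrame")) {
            return Optional.empty(); // debugging support is simply switched off
        }
        // Assumption: a separate adapter class (not shown) would wrap a real
        // com.sun.jdi.StackFrame here. This placeholder stands in for it.
        DebugFrame placeholder = () -> "placeholder frame";
        return Optional.of(placeholder);
    }
}
```

The important design point is that the JDI types are confined to one adapter class that is only touched after the presence check, while everything Swing might reflect over sees nothing but the local interface.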

Is this your problem?


I don't think that a lot of sensible people would try something like my original unplanned bubblegum attempt, but I do see some other possible scenarios where the same problem might arise. They all involve an existing system with some problems which are triggered by a change to Java 6.

For example, you might have a reusable component A which is compiled separately and has some dependencies on an external library B. The component A is used by two systems, C and D. System C uses functionality which requires the library B and System D uses functionality that doesn't require the library B. For some reason library B hasn't been added to the classpath of System D and it's never caused a problem before, but there's a reference to one of the classes in library B in some signatures of a user interface class and thus moving to Java 6 triggers the problem. It's easy enough to fix by adding the library B to the classpath, but it's a very annoying problem in the case of a system in maintenance whose inner workings nobody is very familiar with.

Thursday, January 25, 2007

Four basic ways to avoid annoying bugs

Let's start with a disclaimer: This is not, and doesn't even try to be, a comprehensive list of any sort. Nor am I claiming that these are the most important things. These are simply the first four things that came to my mind within the time I was willing to spend on this post. They are all rather obvious and some may be debatable.

1) Don't use empty catch blocks. Ever.

When you first write a catch block don't navigate away from it without at least putting in a call to the printStackTrace method.

try {
    ...
} catch (Exception e) {
    e.printStackTrace();
}

The reason why this is important has to do with the predictability of a piece of code. When people first start off with a new language (or programming in general), code that behaves unexpectedly often leads to silly things like:

int value = 3;
if (value != 3) {
    System.out.println("Problem with the assignment.");
}

The code doesn't work the way you think it should work and you start doubting the most basic things like a simple variable assignment. "Well, maybe there's some catch to it", you think. As you grow more comfortable with the language, you get past doubts like this, which in turn speeds up your debugging process a lot.

The problem with exceptions is that if you get no feedback whatsoever from the program, the effect of the exception appears to be just that: an assignment that fails. Granted, "int value = 3;" cannot throw an exception, but if the right-hand side of the assignment is a simple method call, it might.

I actually like to keep a printStackTrace in place until I've finished a piece of code and tested that it works even if it's a block catching something very specific and controlled such as InterruptedException or NumberFormatException.

What if the catch block handles something that should never happen? Or better yet, what if you're absolutely certain that it will never get executed? Surely then it's ok to leave an empty catch block, right? Wrong. If it never gets executed, fine, the printStackTrace never does anything so it doesn't hurt. And in the eventuality that you break the code and the thing that was "never going to happen" does happen, you won't be totally lost.

For those fairly exceptional situations where you really need to silently discard the exception, I suggest you originally write the catch block with at least the stack trace printing and only remove that once the code block in question is finished and tested. And when you remove it, put in a comment explaining the silent discard.
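
To make the two stages concrete, here is a small sketch with a made-up helper that parses a number and falls back to a default. The first version is what I'd write while the code is still in flux, the second is the "finished and tested" version with the silent discard documented.

```java
final class CatchBlocks {
    /** Work-in-progress version: the exception is handled, but never invisible. */
    static int parseOrDefaultNoisy(String text, int fallback) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            e.printStackTrace();
            return fallback;
        }
    }

    /** Finished version: the stack trace is gone, the explanation stays. */
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            // Deliberately ignored: non-numeric input is expected here and
            // the caller-supplied fallback is the documented result.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", 0));       // prints 42
        System.out.println(parseOrDefault("forty-two", 0)); // prints 0
    }
}
```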


2) Minimize scope

If you only need a variable within a loop, define it in the loop, and not at the start of the method even though it might seem like a good idea in terms of organization.

If you know your variable's scope is limited to that loop, you know that any code outside that loop can't access/modify your variable. It makes reading the code easier. It makes debugging easier. It makes modifying the code easier. It helps you avoid some nasty bugs and makes the rest easier to spot.
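
A tiny illustration, with made-up names: the variable "length" below is declared inside the loop, so nothing before or after the loop can read or change it by accident.

```java
final class ScopeExample {
    /** Counts the words whose trimmed length is at least minLength. */
    static int countLongWords(String[] words, int minLength) {
        int count = 0; // needed after the loop, so it lives outside it
        for (String word : words) {
            int length = word.trim().length(); // only exists inside the loop
            if (length >= minLength) {
                count++;
            }
        }
        // "length" and "word" are not visible here; only "count" survives.
        return count;
    }

    public static void main(String[] args) {
        String[] words = {"a", "scope", " tiny ", "variable"};
        System.out.println(countLongWords(words, 5)); // prints 2
    }
}
```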


3) Use descriptive names

The bigger the scope of your types/variables, the more important a good name is. If the scope of a variable is 2 lines, the importance of the name is not so great.

If the scope is global - such as the name of a class - a good name is vital.

When your names are descriptive, pieces of code that do illogical things - you know, stuff that you wrote in the morning before actually waking up, stuff that you wrote when your mind was already out to lunch, etc - stand out more clearly.

Consider, for example:
bankAccountBalance = -7;

4) Don't reuse variables

Recycling is good, but it's also inherently complex. So unless you have a really valid performance reason to do this, you shouldn't. This is somewhat related to items 2 and 3. Allow me to explain:

-If you're reusing a variable instead of having 2 variables with a small scope, you'll have one variable with at least double the scope.

-If your one variable holds the bank account balance, mail server port and the width/height ratio of your dialog window it's fairly hard to come up with a good, descriptive name for it. (In case you were curious: no, in my opinion "balancePortRatio" is not a good name.)
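
Here is how I'd rather see it, as a sketch with made-up names: one variable per meaning, each declared final so that it cannot be recycled for something else later on.

```java
final class NoReuse {
    /** Describes a dialog and a mail server port without recycling any variable. */
    static String summary(int widthPx, int heightPx, int mailServerPort) {
        final double widthHeightRatio = (double) widthPx / heightPx;
        final boolean wideDialog = widthHeightRatio > 1.0;
        final boolean standardSmtpPort = mailServerPort == 25;
        return (wideDialog ? "wide" : "tall") + " dialog, "
                + (standardSmtpPort ? "standard" : "custom") + " mail port";
    }

    public static void main(String[] args) {
        System.out.println(summary(800, 600, 25)); // prints "wide dialog, standard mail port"
    }
}
```

Each name says what the value means, which is exactly what a shared "balancePortRatio" variable could never do.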

Obfuscating by overloading method and field names

Some time ago, while testing reJ I came across an interesting form of obfuscation that I hadn't realized was possible.

This obfuscated class file had several fields with the exact same name but different types, as well as several methods with identical names and parameters but different return types.

For example:

public class Example {
    private int a;
    private String a;
    private double[] a;

    public void method() {
    }

    public String method() {
        return null;
    }
}

Obviously, this is illegal in a Java source file. But in the compiled code it is not a problem, because every bytecode instruction that refers to a field or method specifies the full descriptor of the member in question, that is, including the (return) type.
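
To show what the JVM actually sees, this sketch prints the descriptors for the members of the Example class above. It leans on java.lang.invoke.MethodType to build the descriptor strings; taking the field descriptor from the tail of a no-argument method descriptor is just a shortcut of mine, and the helper names are made up.

```java
import java.lang.invoke.MethodType;

final class Descriptors {
    /** JVM descriptor of a no-argument method with the given return type. */
    static String noArgMethod(Class<?> returnType) {
        return MethodType.methodType(returnType).toMethodDescriptorString();
    }

    /** JVM descriptor of a field type: the part after "()" in "()<type>". */
    static String field(Class<?> type) {
        return noArgMethod(type).substring(2);
    }

    public static void main(String[] args) {
        // Same name, different descriptors: distinct members to the JVM.
        System.out.println("a " + field(int.class));             // a I
        System.out.println("a " + field(String.class));          // a Ljava/lang/String;
        System.out.println("a " + field(double[].class));        // a [D
        System.out.println("method " + noArgMethod(void.class));   // method ()V
        System.out.println("method " + noArgMethod(String.class)); // method ()Ljava/lang/String;
    }
}
```

Each name plus descriptor pair is unique, which is all the class file needs in order to tell the members apart.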

Apparently ProGuard's aggressive overloading produces this kind of obfuscation.

(http://proguard.sourceforge.net/manual/usage.html#overloadaggressively):
Specifies to apply aggressive overloading while obfuscating. Multiple fields and methods can then get the same names, as long as their arguments and return types are different (not just their arguments). This option can make the output jar even smaller (and less comprehensible). Only applicable when obfuscating.