Thursday, January 28, 2010

On Subclassing

And now for another excursion into the depths of my experience and thinking about object-oriented design and philosophy. This time I'm going to talk about some pitfalls I've seen with two of the most commonly used, and abused, tools in the object-oriented developer's toolbox: subclassing and access modifiers.

Subclassing has been around since the dawn of object-oriented programming. It's just one of many ways to enhance code reuse, but I've seen enough blood shed due to the misuse of this feature that I've stopped using it in cases where I can get away with another approach. The core of the problem with subclassing tends to be an eagerness to reuse as much code as possible without any consideration of how code reuse decisions will affect the overall design. Yes, reusing code means that there are fewer places to change when a change must be made, but subclassing is merely a means to an end. There are other, better approaches that give the same benefits without the problems.

So what are some of the problems with subclassing? I think the central failing is that subclassing is the tightest form of coupling there is. When you decide to subclass an implementation, you've solidified part of your implementation. A common word of wisdom is that you should always code to interfaces so that implementations can be changed at a later date with a minimum of fuss. By subclassing an implementation you are publicly declaring that you are and always will be a subclass of X.

Another problem with subclassing is that you cannot narrow the interface of the class to anything smaller than the superclass's interface, even if parts of it don't make sense. A textbook example of this in Java is the java.util.Stack class. java.util.Stack extends java.util.Vector, which in turn extends java.util.AbstractList, which is an abstract implementation of java.util.List. The java.util.List interface contains methods such as add, get, indexOf, etc. If you've followed along, that means that java.util.Stack has these methods in addition to its own interface. A stack should really only have two methods for mutating the stack (push and pop) and perhaps some methods to determine how many items are on the stack. By subclassing, java.util.Stack has guaranteed that it implements the java.util.List interface and therefore has methods that allow a developer to work around the stack encapsulation. Ah, but you can override those methods in the subclass with ones that do nothing or throw a java.lang.UnsupportedOperationException. Certainly you could, but then if an instance of java.util.Stack was passed to a method that accepted java.util.List it would not produce the desired result: it would either throw an exception or silently skip the operation.
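To make the java.util.Stack problem concrete, here is a small sketch of how the inherited List methods let a caller get around push and pop. The class StackLeakDemo and the method insertAtBottom are names I made up for illustration; only java.util.Stack and java.util.List come from the JDK.

```java
import java.util.List;
import java.util.Stack;

class StackLeakDemo {
    // Hypothetical helper: accepts any List, so a Stack qualifies
    // because Stack extends Vector, which implements List.
    static void insertAtBottom(List<String> list, String value) {
        list.add(0, value);
    }

    static String demo() {
        Stack<String> stack = new Stack<>();
        stack.push("first");
        stack.push("second");

        // Neither call below goes through push or pop.
        insertAtBottom(stack, "intruder"); // inserts underneath everything
        stack.remove(1);                   // removes "first" from the middle

        return stack.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [intruder, second]
    }
}
```

Nothing here is a trick. Every call is part of the public interface that java.util.Stack inherits, which is exactly the problem.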

No discussion of subclassing would be complete without mentioning the Liskov Substitution Principle. I think it's great advice in theory, but as we've all seen it's not exactly easy to follow. If the JDK developers can make such an error with all of their review processes, how does a lowly developer stand a chance?

Aside from these problems there's always the minefield of access modifiers. In my opinion there should really only be two at the class level: public and private. Protected variables are a major cause of headaches since subclasses can easily muck around with the internals of a superclass, violating encapsulation. Protected methods provide a second interface to the class which has to be maintained and documented, as if a single interface weren't trouble enough. So private and public are really all that's needed: stuff that's internal and stuff that's public.
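As a rough illustration of the protected-variable headache, consider the sketch below. Account and OverdraftAccount are hypothetical classes invented for this example. The superclass tries to guarantee that the balance never goes negative, and the subclass quietly breaks that guarantee by touching the field directly.

```java
class Account {
    // Intended invariant: never negative. Being protected, any subclass can ignore that.
    protected long balanceCents;

    void deposit(long cents) {
        if (cents < 0) throw new IllegalArgumentException("negative deposit");
        balanceCents += cents;
    }

    void withdraw(long cents) {
        if (cents < 0 || cents > balanceCents) {
            throw new IllegalArgumentException("invalid withdrawal");
        }
        balanceCents -= cents;
    }

    long balance() {
        return balanceCents;
    }
}

class OverdraftAccount extends Account {
    // Bypasses the check in withdraw() by writing to the superclass field.
    void forceWithdraw(long cents) {
        balanceCents -= cents;
    }
}
```

Had balanceCents been private, the subclass would have been forced to go through deposit and withdraw, and the invariant would live in exactly one place.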

So if not subclassing then what? In a word: delegates, specifically the delegation pattern. With the delegation pattern you simply implement interfaces, if applicable, and delegate parts of the implementation to a concrete class. By doing so you don't have to expose any information about how your class works internally. And since you're not subclassing you don't inherit an interface that doesn't make sense for your class. As an added bonus, since you're using the public interface to the concrete class you don't have to worry about methods internal to the implementation changing or becoming deprecated.
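Here is a minimal sketch of what that might look like for a stack. SimpleStack and ListBackedStack are hypothetical names of my own, and the choice of java.util.ArrayList as the delegate is just an assumption; any list implementation would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// The only interface callers ever see.
interface SimpleStack<T> {
    void push(T item);
    T pop();
    int size();
}

final class ListBackedStack<T> implements SimpleStack<T> {
    // The delegate is private, so none of List's methods leak out.
    private final List<T> items = new ArrayList<>();

    @Override
    public void push(T item) {
        items.add(item);
    }

    @Override
    public T pop() {
        if (items.isEmpty()) {
            throw new NoSuchElementException("stack is empty");
        }
        return items.remove(items.size() - 1);
    }

    @Override
    public int size() {
        return items.size();
    }
}
```

Because the list is a private field rather than a superclass, a ListBackedStack cannot be passed to a method that expects a java.util.List, and swapping ArrayList for something else later touches only this one class.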

Delegates are also an effective strategy for reusing code. You can encapsulate logic into a single class and use it from anywhere in your application. This is a great benefit when you have an irregular class hierarchy: it effectively gives you mixins without the headaches of trying to figure out which pieces of implementation are coming from which mixin.
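A quick sketch of that kind of reuse, with made-up names throughout: ChangeListeners holds the shared logic, and two otherwise unrelated classes, Thermostat and Invoice, each keep a private instance of it instead of inheriting from a common base class.

```java
import java.util.ArrayList;
import java.util.List;

// Reusable piece of logic: keeps a list of callbacks and runs them on demand.
final class ChangeListeners {
    private final List<Runnable> listeners = new ArrayList<>();

    void add(Runnable listener) {
        listeners.add(listener);
    }

    void fire() {
        for (Runnable listener : listeners) {
            listener.run();
        }
    }
}

class Thermostat {
    private final ChangeListeners listeners = new ChangeListeners();
    private int target;

    void onChange(Runnable listener) { listeners.add(listener); }

    void setTarget(int degrees) {
        target = degrees;
        listeners.fire(); // notification logic lives in the delegate
    }

    int target() { return target; }
}

class Invoice {
    private final ChangeListeners listeners = new ChangeListeners();
    private long totalCents;

    void onChange(Runnable listener) { listeners.add(listener); }

    void addLine(long cents) {
        totalCents += cents;
        listeners.fire(); // same delegate, no shared superclass required
    }

    long totalCents() { return totalCents; }
}
```

It is always obvious where the behavior comes from, because each class names its delegate explicitly in a field.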

Obviously there are times when subclassing is the right thing to do. But in my experience I see it used much more often than it should be and usually for the wrong reasons. I hope my thoughts have given you something to think about next time you reach for that subclassing hammer.