Types as Units

A few years ago, Steve Yegge wrote a great piece arguing that software developers can be divided into conservatives and liberals, and moreover that, like in politics, there are some issues where the dividing lines are very clear. The first issue on his list was type systems. Given how different Swift’s type system is from Objective-C’s, I’m going to take a more general look at types as a concept. You’ll find that, like many people who were enthusiastic about Swift from the get-go, I am a software conservative on this issue.

The prototypical example of a liberal language is Ruby. Ruby has no compile-time type checking. As long as your syntax looks like it belongs to a Ruby program, the interpreter will happily load it and try to do something with it — but it will crash your app if you messed up and tried to add a Restaurant and a PoisonDart (you have an interesting problem space).

The paragon of conservative languages is Haskell. Its compiler will exhaustively check every implication of your declared types and only produce an executable if they are consistent. Your application will rarely crash once it’s running, but the compiler will not let you get away with trying to juggle an Airplane.

On the spectrum, Swift falls much closer to Haskell than to Ruby. Objective-C, as used by most people I know and by the open-source projects I’ve seen, is still closer to Haskell than to Ruby, but only by a little; it can potentially be very close to Ruby, since it uses dynamic dispatch under the hood and practically everything can be cast to id.

But really … what are these “types” that so divide us?

• • • • •

Before I ever wrote a line of code, I studied engineering. Early physics classes emphasized the importance of units: a ball doesn’t have a speed of 5, it has a speed of 5 meters per second. Non-controversial, but there’s a subtlety here. Suppose I want to find another speed. Nothing in arithmetic prevents me from multiplying 5 meters per second by 10 apples per pear. I’ll get fifty…meter-apples per second-pear? That makes zero sense. But why?

To say that 5 is the speed of a ball implies that 5 has units of distance per unit of time, and vice versa. The reason the result was nonsense is clearly that meter-apples per second-pear is by definition not a unit of speed — any such unit must be some distance over some time interval.

Put that way, units seem tautological, except for this: arithmetic doesn’t understand units. Units constrain arithmetic, defining how quantities with units combine to create quantities with other units — and letting you know that certain arithmetical operations give nonsense results for your problem and certain operations aren’t allowed at all.

For instance, you can always multiply quantities with disparate units — but you might get results that are in meter-apples per second-pear, which might not be what you were looking for. On the other hand, you can’t add quantities with disparate units: 5 meters per second can’t be added to 20 inches per year; you have to convert one unit to the other. Units annotate our quantities and imbue them with additional information that, when enforced, keeps results consistent.
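
Worked out explicitly (using 1 inch = 0.0254 meters and one year ≈ 3.156 × 10⁷ seconds), the two cases look like this:

\[
5\,\tfrac{\mathrm{m}}{\mathrm{s}} \times 10\,\tfrac{\text{apples}}{\text{pear}}
  = 50\,\tfrac{\mathrm{m}\cdot\text{apples}}{\mathrm{s}\cdot\text{pear}}
\]
\[
20\,\tfrac{\mathrm{in}}{\mathrm{yr}}
  = \frac{20 \times 0.0254\ \mathrm{m}}{3.156 \times 10^{7}\ \mathrm{s}}
  \approx 1.6 \times 10^{-8}\,\tfrac{\mathrm{m}}{\mathrm{s}},
  \qquad\text{so}\qquad
5\,\tfrac{\mathrm{m}}{\mathrm{s}} + 20\,\tfrac{\mathrm{in}}{\mathrm{yr}}
  \approx 5.000000016\,\tfrac{\mathrm{m}}{\mathrm{s}}
\]

The first is legal arithmetic that lands in a unit nobody asked for; the second makes sense only after the conversion.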

• • • • •

Several of my classes featured equations galloping across the whiteboard like Cossack armies in the Russian winter, and my professors often brought up dimensional analysis as a useful tool for double-checking our work. After a derivation that would often take a page or two, they recommended looking at the resulting units of all the quantities before plugging the final numbers in, to make sure that we were getting the right units out.

None of us ever did that.

It’s not that we didn’t care. We were learning to design airplanes; an error would result in fiery death; of course we cared. (Okay, we didn’t care that much; we were students.) The problem was that it was an extra set of computations, time-consuming and about as good as looking at our results and making sure they passed the smell test, i.e. that they were in the right ballpark. It wouldn’t save us on a test if we really screwed up, but we trusted our symbolic manipulation skills enough to say that if 1. we got a result and 2. the result was in the right ballpark, the result was probably right.

If we’d had a way to put the final algebraic result in a computer and have it spit out the units, though, a lot more of us would have done it. After all, manually computing the units is tedious (and error-prone), but using an automated tool to sanity-check our work would have been just common sense.

Here’s where I’m going with this: types in software development are directly equivalent to units in engineering and science. They aren’t necessary to do your work, they don’t prevent you from screwing up, and they don’t prevent you from being inconsistent — but they help you understand what the entities you’re dealing with are.

Automated type checking, on the other hand, is a tool that helps you be consistent. And the more powerful your type system, the surer you can be of your code’s internal consistency.
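
To make the analogy concrete, here is a minimal Swift sketch (the MetersPerSecond and InchesPerYear types are invented for illustration; they are not part of any library): give every unit its own type, and the compiler enforces the same constraint that dimensional analysis does.

struct MetersPerSecond {
    var value: Double
}

struct InchesPerYear {
    var value: Double
}

// Adding two quantities with the same unit is well-defined.
func + (lhs: MetersPerSecond, rhs: MetersPerSecond) -> MetersPerSecond {
    return MetersPerSecond(value: lhs.value + rhs.value)
}

let walk = MetersPerSecond(value: 5)
let jog = MetersPerSecond(value: 3)
let total = walk + jog                // fine: 8 meters per second

let drift = InchesPerYear(value: 20)
// let nonsense = walk + drift        // rejected at compile time: mismatched types

Converting between the two units would be an explicit function, just as conversion is an explicit step on paper.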

• • • • •

The value of consistency to a software project is hard to quantify. Consistency doesn’t equal quality, and it doesn’t even equal sanity. But I am a conservative when it comes to type systems because it does equal focus.

Back in school, when I made a mistake that resulted in a quantity with nonsense units, I felt like an idiot. Seriously, at what point did I think I’d multiply speed by distance and get a pressure? The mistake was completely preventable, and I had nevertheless burned time thinking about my problem with an inconsistent mental model of the world. For however many computations it took me to catch a given unit error, my mental model of the universe treated that error as a valid thing to do.

And sometimes the model would stay wrong for a good long while! A page or two of calculations! If someone had told me, from the get-go, that what I was trying to do was nonsense, all that work would have been saved, and I would have spent more time building an appropriate model and solving the actual problem. That someone, when I program, is the compiler.

In a language like Ruby, a type error will go undetected until runtime. Consider this simple method to add the numbers in an array:

def add_numbers(arr)
  arr.reduce { |memo, obj|
    memo + obj
  }
end
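
# A couple of calls to illustrate:
add_numbers([1, 2, 3])       # => 6
add_numbers(["1", "2", "3"]) # => "123"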

It does its job magnificently — as long as we’re passing in an array with numbers. If we pass in an array with strings, though, it concatenates them. Most of the time, this is fine. But what if I am manipulating an array of numbers represented as strings, say ["1", "2", "3"]? Mentally, I could very well be thinking of that array as numbers, and when I pass it to the method, I would get back "123" instead of 6. At some point, this inconsistency will bite me — but not until I’ve expended significant effort writing code that assumes there is no inconsistency.

In Swift, by contrast, the equivalent function is explicit about the types it accepts and returns:

func addNumbers(nums: [Int]) -> Int {
    return nums.reduce(0) { memo, num in
        return memo + num
    }
}

I could never call addNumbers with ["1", "2", "3"], because that array’s type would be inferred as [String], and the compiler would immediately let me know that I am trying to do something inconsistent. That wouldn’t by itself solve the problem (do I need to make the method more generic, or do I need to convert the strings into integers?), but at least it would prevent me from writing code as if there were no problem.
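
Concretely (a quick sketch against the definition above; the exact diagnostic wording depends on the compiler version):

addNumbers(nums: [1, 2, 3])          // 6
// addNumbers(nums: ["1", "2", "3"]) // error: cannot convert value of type '[String]'
//                                   // to expected argument type '[Int]'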

• • • • •

I emphasize that this is personal preference. I can’t claim that I will write code faster, or that my code will be better, due to the type system. I merely say that the compiler will catch a certain class of mistake early in the development process, and force me to keep an internally consistent model of the world — to focus on my model. I value that, but others might prefer the freewheeling development afforded to them by Ruby. That’s fine.

What I do want to emphasize, though, is how much like units types are. You can do science and engineering correctly without doing dimensional analysis. But if you get into an inconsistent state with your units, they will bite you — perhaps catastrophically. The only way to avoid such issues is to do (correct) dimensional analysis before using the results of any calculations.

Software has it a little easier. During development, it’s okay to get in an inconsistent state: the app will crash in the test server, or the compiler will whine, and the bug will be fixed before being rolled out to the real world. Even in production, most applications aren’t so critical that an occasional crash will have dire consequences.

Some are, though. And those applications — avionics, guidance, air traffic control — are often written in type-safe languages like Ada. Even when the applications are written in more traditional languages, their code style guidelines are very stringent, in hopes that the compiler and analyzer will catch more bugs. Isn’t that interesting?

It makes sense, if you think about it. Mission critical software has a very high cost of failure, and is often difficult to test until it is actually controlling a plane, or guiding a missile, or directing air traffic — and those are terrible, horrible, no good, very bad times to discover an internal consistency error. Even in a testing context, crashing a test plane is a multi-million dollar error, and I’m sure air traffic control validation tests are not cheap.

What type systems give us, then, is a particular class of verifiable facts about our code base. In some contexts, knowing those facts is not particularly useful; in others, it is absolutely critical. But no matter what the actual needs of the software system itself, one can have a preference for up-front verifiability or a preference for implicit trust in the developer’s abilities. I tend to prefer verifiability, stodgy conservative that I am.

• • • • •

“Verifiability” … that’s very, very close to “provability”, isn’t it …? Does that mean that type systems let us prove certain things?

Yes, as a matter of fact.

And this newfound knowledge is slowly changing how I approach programming. I’ll go into that soon.

 