r

Multiplication Considered Harmful

Today we’ll be talking about the dangers of multiplication integer
overflows in C, the type of bug responsible for the
OpenSSH hole of 2002.

Integer overflows occur when an operation causes a number to
grow or shrink out of bounds,
resulting in a much smaller or much larger number than expected.
For example, a char can only hold 256 values.
If two chars are multiplied together,
there is a possibility of overflow:

    char a, b;
    a = b = 127;
    a = a * b;  /* overflow */

Because arithmetic is done in code all over the place,
adding checks can be very tedious and is often ignored.
A common idiom to allocate memory is:

    size_t num, size;
    char *ptr;
    …
    if ((ptr = malloc(num * size)) == NULL) /* overflow */
        return (NULL);

The multiplication can create a number larger than a size_t can hold,
causing an integer overflow.
The integer overflow in this case causes less memory to be allocated
than the program expects.
The program could then take user input and write past the actual small
allocation into memory it thinks it owns,
causing problems like a generic buffer overflow.
This was the idiom used in the OpenSSH hole of 2002,
demonstrating the severity of these bugs.

We recommend using calloc(3) whenever allocating multiple objects
of the same size.
The above code would be written as:

    size_t num, size;
    char *ptr;
    …
    if ((ptr = calloc(num, size)) == NULL)
        return (NULL);

calloc(3) has checks to make sure the multiplication does not overflow.

Unfortunately there is no safe replacement for realloc(3).
To prevent multiplication overflow, a check must be added:

    size_t newsize, num, size;
    char *newptr, *ptr;
    …
    if (num && size && SIZE_MAX / num < size) {
        errno = ENOMEM;
        return (NULL);
    }
    newsize = num * size;
    if ((newptr = realloc(ptr, newsize)) == NULL)
        return (NULL);
    ptr = newptr;

This is a hassle, but it is important to add these safeguards,
especially when doing multiplication, which can quickly overflow.

To lessen developer work and simplify code,
an xrealloc() function was added to ssh.
xrealloc() has a similar API to calloc(3),
taking both a num parameter and a size parameter.
The above example can be written using xrealloc() as:

    size_t num, size;
    char *newptr, *ptr;
    …
    if ((newptr = xrealloc(ptr, num, size)) == NULL)
        return (NULL);
    ptr = newptr;

Due to its elegance, this API was also imported for OpenCVS.

For more information, please read a -current malloc(3) man page,
which was recently updated with better usage examples.

March 11, 2006

I am now an OpenBSD developer.
It’s interesting how in less than four months
I went from being a guy writing his first large (1000+ lines) C program
to an OpenBSD developer,
a status which I highly respect.

Here’s the sequence of events that occurred.

In 2004, I started submitting diffs for various stuff I noticed in man pages.
jmc@ is very responsive about these things, he’s awesome.
But these were mainly minor things (grammar, spelling, et cetera)
and weren’t really what I wanted to do, which was to code.

I knew some C, I’ve had lots of accumulated experience with
various C-like languages (C, C++, Java, PHP, Perl, in that order).
In school I never actually wrote anything big with C,
so while I knew the general syntax, I didn’t know any of the trickier
things (such as pointers, as otto@ can attest) nor did I have
experience with any of the standard libraries.

I knew I needed to learn C to code for OpenBSD,
so I set myself to learn C. I read, cover to cover, K&R.
I submitted simple diffs to tech@, mostly to do code cleanups
(style(9) and -Wall sweeps).
Cleaning old code forced me to understand the code I was cleaning up,
to make sure I didn’t introduce any bugs.
In the process, I found various old bugs lurking.

This was still not enough, though.
I needed to demonstrate that I could code,
not just be a lint(1) brush.
After all, a developer means develops code.
I also needed experience in actual code,
learning the intricacies of C.
I needed to develop a project,
one that could be committed to OpenBSD.
I then thought of writing sdiff(1).

sdiff(1) was originally a part of OpenBSD,
until the GNU diff utilities were replaced by nicer licensed ones.
diff(1), diff3(1), and patch(1) were all replaced but not sdiff(1).
I disliked having to install the GNU diff utilities just to use
mergemaster from ports.
Diff output looked relatively easy to parse,
just some line numbers here and there, lines preceeded by +, -, <, or >.
Certainly doable, with enough effort.

I set out to write sdiff(1).
To make life easier for myself I looked for the simplest diff output to parse.
I initially wrote sdiff to parse diff -n output.
Its output looks similar to ed lines, d# #, a# #.
I had a version that worked well using diff’s -n flag,
but I wanted the code to be portable across different diffs.
This required a major rewrite to parse the default diff output,
which is standardized.

I was very reluctant to do so, but there came a point when I just
didn’t have anything else to do with sdiff, and I knew that I
needed to do this eventually. It would be good experience anyway.
So I rewrote major parts of sdiff and wrote lots of regression
tests along the way, making sure that sdiff still worked with
every change.

Eventually I got so sick of doing sdiff that I didn’t want to
nitpick little things anymore. It seemed to be working fine,
I ran it several times through mergemaster. So I mailed it to
misc@, seeking reviews or comments.

A couple of days later I got some complaints from deraadt@
about my poor coding style. I fixed the points he brought up
and a few days later tedu@ committed my code. I was happy.

Developing sdiff was a very helpful experience.
As I was struggling to master C, I looked through countless
man pages and code examples in /usr/src to see how other
developers implemented the action I wanted to take.
This revealed to me bugs in the manuals,
userland utilities, and even the libraries, many of which
I submitted fixes for.

Eventually I was contacted separately by marco@ and otto@,
both inviting me to different chat rooms to get to know me
better. Several more fixes later, I was offered an account.

If anyone wants to be a developer,
start looking for things to do.
Look through old crusty code nobody cares about,
those generally are poorly maintained and can be cleaned up.
Cleaning up code forces you to understand the code to make sure you don’t
mess it up along the way. Understanding the code reveals bugs.
If you don’t understand something, look in the man pages.
If the man pages are lacking, submit diffs to improve the manuals.
If you need examples on how to use certain functions or system calls,
look through the source tree. They usually provide great examples.
When you see all the different ways functions like fgetln(3)
are used, you start seeing more bugs. That was how I came across
the cgetnext(3) and cap_mkdb(1) bugs. Repeating these steps
just dropped me deeper in the rabbit hole.

There are certain essential skills to have.
You need to learn to use CVS.
All code is kept in CVS, diffs are easy to create (diff -upN),
and you can keep track of your own changes in your projects.
You need to learn to submit good sendbug reports.
It’s good to report bugs, but it’s better to submit a fix.
It’s much easier to look at a fix and verify that it works
than to come up with a fix for a problem that wasn’t encountered
by myself.
Learn to use GDB. Core dumps are suddenly not just a waste of space.
You can use backtraces to figure out where the NULL pointer dereference
was, or the double free(), or buffer overflow. It’s much nicer
than using printf(3) all over the place.

Good luck!