Devious Fish
Music daemons & more

C is both a great and a shit language

I know C probably more fluently than any other programming language, so I’m ready to say: C has problems.

If you look at the structure of the language, it’s fine. It’s old-school and comes without bells and whistles, but as a bare-bones programming language it’s fantastic. It doesn’t hide the metal on which your code runs from you. The language needs only minimal run-time support to get a program started. Compared to other languages, it’s simple to compile: first, you need a macro processor to handle the #defines and #ifdefs. Then for the actual compilation, it’s got arithmetic, boolean, and bit operations; some simple types; a few more complicated types (structs/unions/enums); typedefs; a few loops and conditionals; and function calls.

C has no built-in commands for things like reading or writing files. That makes C a very clean, simple language.

Everything else is provided by the library, and that library is where the mess is. The C standard library was written for a different age, when machines were smaller and simpler, with considerably more limited resources. Thus, most of the standard library is simple to the point of brokenness.

  • gets is a buffer overflow waiting to happen.
  • fgets takes a buffer length, but it’s up to you to check for and handle long lines. Fail to do this, and long lines may be treated as multiple, shorter lines.
  • strncpy may not null-terminate your string.
  • scanf and strtok can split fields at a particular character, but both silently skip empty fields.
  • Avoiding buffer overflows with strncat is on you.
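
To make two of these concrete, here is a minimal sketch assuming a C99 compiler. The helper names count_tokens and read_line are hypothetical, made up for illustration; only strtok, fgets, getc, and strlen come from the standard library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the tokens strtok() reports. strtok() modifies `s` in place. */
int count_tokens(char *s, const char *delims)
{
    assert(s != NULL && delims != NULL);
    int n = 0;
    for (char *tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;   /* "a,,c" split on "," gives 2: the empty field vanishes */
}

/*
 * Read one line with fgets(). If the line does not fit in `buf`, keep the
 * part that fits, discard the rest up to the newline, and set *truncated.
 * Returns 1 if something was read, 0 at end of file or on error.
 */
int read_line(FILE *fp, char *buf, int size, int *truncated)
{
    assert(fp != NULL && buf != NULL && truncated != NULL && size > 0);
    *truncated = 0;
    if (fgets(buf, size, fp) == NULL)
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';      /* complete line: strip the newline */
        return 1;
    }

    /* No newline seen: last line of the file, or a line longer than buf. */
    int c;
    while ((c = getc(fp)) != EOF && c != '\n')
        *truncated = 1;           /* there was more text on this line */
    return 1;
}
```

With a 5-byte buffer, an input line of abcdefghij comes back as abcd with *truncated set, and the next call starts at the following line instead of handing you efgh as if it were a line of its own. None of this is hard, but every caller of fgets has to write it.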

These are a few standard annoyances off the top of my head. You’ve probably got some of your own.

Now, before you start complaining that I don’t understand the essence of C, let me say that I don’t mind that it’s on the programmer to match format strings with parameters in printf and friends. No, it’s not safe; if you do it wrong, you get a core dump or garbage output. printf can do great things with a small format string, but it’s on you to do it right. That’s perfectly okay: it’s the trade-off for dense code, and when you get it right, it’s reliable. What I’m saying instead is that oftentimes, to get it right (and that means reliable), the libraries aren’t adequate, which forces us to jump through hoops in our code. Consider some of the attempts to fix things:

In the BSD world, there are strlcpy and strlcat, which offer easier bounds checking and guarantee null-termination if the inputs are null-terminated, or better yet strdup and strndup, which make freshly allocated copies. glibc offers a different swath of improvements: getline reads a line from a file, allocating space as needed; strsep is a strtok-like function that treats each delimiter as significant; and asprintf makes it easy to assemble strings into a freshly allocated buffer, with no risk of buffer overflow. Awesome!
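
As a rough sketch of what these buy you, assuming a glibc or BSD-like system that provides strsep, getline, and asprintf: the helper names count_fields, longest_line, and make_pair are hypothetical, and the _GNU_SOURCE define is an assumption about what glibc wants before it will declare all three.

```c
#define _GNU_SOURCE      /* assumption: glibc; must come before any #include */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Count delimiter-separated fields with strsep(); empty fields count too. */
int count_fields(char *s, const char *delims)
{
    assert(delims != NULL);
    int n = 0;
    while (strsep(&s, delims) != NULL)   /* modifies the string in place */
        n++;
    return n;
}

/* Length of the longest line in fp, not counting the newline. */
size_t longest_line(FILE *fp)
{
    assert(fp != NULL);
    char *line = NULL;    /* getline() allocates and grows this buffer */
    size_t cap = 0;
    size_t longest = 0;
    ssize_t len;

    while ((len = getline(&line, &cap, fp)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            len--;
        if ((size_t)len > longest)
            longest = (size_t)len;
    }
    free(line);
    return longest;
}

/* Build "name=value" in a fresh buffer; caller frees. NULL on failure. */
char *make_pair(const char *name, int value)
{
    assert(name != NULL);
    char *out = NULL;
    if (asprintf(&out, "%s=%d", name, value) < 0)
        return NULL;
    return out;
}
```

Here count_fields sees three fields in "a,,c" where strtok sees two, longest_line never needs a guess at a maximum line length, and make_pair never needs a guess at a buffer size.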

But the problem with all of these is that they’re extensions. They’re not in the standard, so while they may be available, there’s no guarantee they’ll be there. And using them sometimes requires a #define of some feature-test macro, even on platforms where they exist.

And inasmuch as C (the core language) isn’t really meaningful except as C (including the standard library), this makes C suck.

C is supposed to be a least-common-denominator of a language, something you can always rely on being there. But if that’s what you need, then the only facilities the language provides are so primitive as to be useless. You’ll end up reinventing wheels to work around deficiencies in the standard library, because the only way to use existing wheels is to rely on extensions, thus making your code non-portable.
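
Here is one such reinvented wheel, as a sketch rather than a recommendation: a stand-in for asprintf built only from C99’s vsnprintf and malloc. The name format_alloc is hypothetical, and the code assumes a conforming C99 vsnprintf that returns the needed length when given a zero-sized buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into a freshly allocated string using only standard C99 calls.
 * Returns NULL on formatting or allocation failure; the caller frees.
 */
char *format_alloc(const char *fmt, ...)
{
    assert(fmt != NULL);
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);                        /* the arguments are walked twice */

    int len = vsnprintf(NULL, 0, fmt, ap);   /* pass 1: measure only */
    va_end(ap);

    char *out = NULL;
    if (len >= 0) {
        out = malloc((size_t)len + 1);
        if (out != NULL)
            vsnprintf(out, (size_t)len + 1, fmt, ap2);   /* pass 2: format */
    }
    va_end(ap2);
    return out;
}
```

A call such as format_alloc("%s-%03d", "id", 7) yields a heap-allocated "id-007". It is about twenty lines, and it is twenty lines that end up copied, with small variations, into one portable project after another.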

If we want to fix C, then the standard library needs to be reworked. Things like strsep, asprintf, and strdup, and either glibc’s getline or BSD’s fgetln, need to be incorporated and reliably available, and gets and other security-holes-in-waiting should just go away. And we still won’t be able to use these enhancements for a few years while we’re waiting for them to become ubiquitous. Eventually, though, this could fix the toy that much of C’s library currently is.

The primitive library made sense when least-common-denominator hardware meant 128KB, and I respect the original developers for what they could accomplish with so little. It was a 1970s approach for 1970s hardware. But unless you’re developing for an embedded device, that makes no sense anymore. And software for embedded devices that small can avoid bulky functions, if it uses the standard library at all.

In a time when the guys writing the programs were in charge of the data that would be fed through them, the existing library was probably fine. Exploitation wasn’t a concern, but making it fit and making it quick were. But today we have to worry about buffer overflows and security holes if our code is deficient, and line-length and buffer-size limitations are arbitrary hassles to be tripped over down the road.

We should eliminate the constraints imposed by a half-century-old library, and give ourselves the tools to make the next half-century’s worth of software.