The C Programming Language’s Inadequacy
I know C probably more fluently than any other programming language, so I’m ready to say: C has problems.
If you look at the structure of the language, it’s fine.
Old-school, comes without bells and whistles, but as a bare-bones
programming language it’s fantastic. It doesn’t hide
the metal on which your code runs from you. The language needs only
minimal run-time support to get a program started. Compared to
other languages, it’s simple to compile: first, you need a
macro processor to handle the #defines and #ifdefs. Then for the
actual compilation, it’s got arithmetic, boolean and bit operations,
some simple types, a few more complicated types
(structs/unions/enums), typedefs, a few loops and conditionals, and
function calls.
C has no built-in commands for tasks like reading or writing files. That makes C a very clean, simple language.
Everything else is provided by the library, and that library is where the mess is. The C standard library was written for a different age, when machines were smaller and simpler, with considerably more limited resources. Thus, most of the standard library is simple to the point of brokenness.
- gets is a buffer overflow waiting to happen.
- fgets takes a buffer length, but it’s up to you to check for and handle long lines. Fail to do this, and long lines may be treated as multiple, shorter lines.
- strncpy may not null-terminate your string.
- scanf and strtok can split fields at a particular character, but both will silently skip empty fields.
- Avoiding buffer overflows with strncat is on you.
These are a couple of standard annoyances off the top of my head. You’ve probably got some of your own.
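To make the strtok complaint concrete, here is a minimal sketch that counts the fields in a line twice: once the way strtok sees them, and once treating every delimiter as significant. The names count_fields_strtok and count_fields_strict are hypothetical helpers invented for this illustration, not library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts fields the way strtok sees them: a run of delimiters is treated
 * as a single separator, so empty fields vanish. Modifies `line` in
 * place, as strtok does. (Hypothetical helper for illustration.) */
size_t count_fields_strtok(char *line, const char *delims)
{
    assert(line != NULL && delims != NULL);
    size_t n = 0;
    for (char *tok = strtok(line, delims); tok != NULL;
         tok = strtok(NULL, delims))
        n++;
    return n;
}

/* Counts fields treating every delimiter as significant, using only
 * strcspn from the standard library. (Hypothetical helper.) */
size_t count_fields_strict(const char *line, const char *delims)
{
    assert(line != NULL && delims != NULL);
    size_t n = 1;                    /* even an empty line has one field */
    for (const char *p = line; ; p++) {
        p += strcspn(p, delims);     /* skip to next delimiter or end */
        if (*p == '\0')
            break;
        n++;                         /* a delimiter starts another field */
    }
    return n;
}
```

For the input "a,,c" with "," as the delimiter, the first function reports 2 fields and the second reports 3. The empty middle field is exactly what strtok drops without telling you.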
Now before you start complaining that I don’t understand
the essence of C, let me say that I don’t mind that
it’s on the programmer to match format strings with
parameters in printf and friends. No, it’s not
safe; if you do it wrong, you get a core dump or garbage output.
printf can do great things with a small format string,
but it’s on you to do it right. That’s perfectly okay:
it’s the trade-off to have dense code, and when you get it
right, it’s reliable. Instead, I’m saying that
oftentimes, to get it right (and that means reliable), the
libraries aren’t adequate, and thus require us to jump
through hoops in our code. Consider various attempts to fix
things:
In the BSD world, there are strlcpy and strlcat which offer easier
bounds checking and guarantees of null-termination if the inputs are
null terminated, or better yet strdup and strndup which make
freshly-allocated copies. glibc offers a different swath of
improvement: getline will read a line from a file, allocating space
as needed; strsep is a strtok-like function but treats each delimiter
as significant, and asprintf makes it easy to assemble strings into a
freshly allocated buffer, no risk of buffer overflow. Awesome!
But the problem with all of these is that they’re
all extensions. They’re not in the standard, so while they
may be available, there’s no guarantee they’ll be
there. And using them sometimes requires a #define of some
feature macro, even on platforms where they exist.
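As a rough sketch of what using these extensions looks like, the following assumes a glibc-style platform where defining _GNU_SOURCE before any #include exposes strsep and asprintf. Other platforms may need a different macro, or none. The function join_fields is a hypothetical helper made up for this example.

```c
/* Feature macro: must appear before the first #include. This is the
 * glibc convention; it is an assumption about the target platform. */
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Wraps each comma-separated field of `record` in brackets, keeping
 * empty fields. Returns a malloc'd string the caller must free, or
 * NULL on allocation failure. (Hypothetical helper for illustration.) */
char *join_fields(const char *record)
{
    assert(record != NULL);
    char *copy = strdup(record);     /* strsep modifies its input */
    if (copy == NULL)
        return NULL;

    char *out = strdup("");
    char *cursor = copy;
    char *field;
    /* strsep returns every field, including empty ones. */
    while (out != NULL && (field = strsep(&cursor, ",")) != NULL) {
        char *next = NULL;
        /* asprintf sizes and allocates the result buffer itself. */
        if (asprintf(&next, "%s%s[%s]", out, *out ? " " : "", field) < 0)
            next = NULL;
        free(out);
        out = next;
    }
    free(copy);
    return out;
}
```

Calling join_fields("a,,c") yields "[a] [] [c]": no fixed-size buffers, and the empty field survives. The catch is the first line, which ties the code to platforms that honor that macro.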
And inasmuch as C (the core language) isn’t really meaningful except as C (including the standard library), this makes C suck.
C is supposed to be a least-common-denominator of a language, something you can always rely on being there. But if that’s what you need, then the only facilities the language provides are so primitive as to be useless. You’ll end up reinventing wheels to work around deficiencies in the standard library, because the only way to use existing wheels is to rely on extensions, thus making your code non-portable.
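Here is the kind of wheel that gets reinvented: a minimal sketch of a line reader that accepts lines of any length while using nothing beyond fgets, malloc and realloc from the standard library. The name read_line and the initial capacity of 64 are arbitrary choices for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one line of any length from `fp` using only standard C.
 * Returns a malloc'd string without the trailing newline, or NULL at
 * end of file, on read error, or on allocation failure. The caller
 * frees the result. (Hypothetical helper, not a library function.) */
char *read_line(FILE *fp)
{
    assert(fp != NULL);
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* Each fgets call appends to what we have; stop once we see '\n'. */
    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';
            return buf;
        }
        /* No newline yet: the buffer filled up or the file ended
         * mid-line. Grow the buffer and try again. */
        char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }
    if (len == 0 || ferror(fp)) {    /* nothing usable was read */
        free(buf);
        return NULL;
    }
    return buf;                      /* last line had no newline */
}
```

That is roughly thirty lines to do what getline does in one call, and this version still ignores details such as embedded NUL bytes. Every project that wants portable C ends up carrying some variation of it.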
If we want to fix C, then the standard library needs to be
reworked. Things like strsep, asprintf, and strdup, and either
glibc’s getline or BSD’s fgetln, need to be
incorporated and reliably available, and gets and
other security-holes-in-waiting should just go away. And we still
won’t be able to use these enhancements for a few years while
we’re waiting for them to become ubiquitous. Eventually,
though, this could fix the current toy that much of C’s
library is.
The primitive library made sense when least-common-denominator hardware meant 128KB, and I respect the original developers for what they could accomplish with so little. It was a 1970s approach for 1970s hardware. But unless you’re developing for an embedded device, that makes no sense anymore. And software for embedded devices that small can avoid bulky functions, if they use the standard library at all.
In a time when the guys writing the programs were in charge of the data that would be fed through them, the existing library was probably fine. Exploitation wasn’t a concern, but making it fit and making it quick were. But today we have to worry about buffer overflows and security holes if our code is deficient, and line-length and buffer-size limitations are arbitrary hassles to be tripped over down the road.
We should eliminate the constraints imposed by a half-century-old library, and give ourselves the tools to make the next half-century’s worth of software.
However, there’s no point in doing that in C, because it’s already been done in several other languages. The insistence on keeping C’s library restricted to what AT&T made in the 1970s, and refusing to evolve, means it is now useless and so far behind that it’s not worth fixing.
Most of C’s replacements (be they C++, Rust, Go, C# or any of many others) make improvements to the language too. But a big part of what they offer is a richer, more full-featured library than C’s. And critically, those libraries don’t come with the haphazard, awkward pitfalls at the core of the C library.
There was a time C could have been saved, and doing so might have cut down on the number of diverging languages we have now. But at this point, it’s too late. For most projects, there are better choices than C.