C is both a great and a shit language
I know C probably more fluently than any other programming language, so I’m ready to say: C has problems.
If you look at the structure of the language, it’s fine.
Old-school, comes without bells and whistles, but as a bare-bones
programming language it’s fantastic. It doesn’t hide
the metal on which your code runs from you. The language needs only
minimal run-time support to get a program started. Compared to
other languages, it’s simple to compile: first, you need a
macro processor to handle the #ifdefs. Then for the actual
compilation, it’s got arithmetic, boolean and bit operations, some
simple types, a few more complicated types, typedefs, a few loops
and conditionals, and functions.
There are no built-in commands in C, like reading or writing files. That makes C a very clean, simple language.
Everything else is provided by the library, and that library is where the mess is. The C standard library was written for a different age, when machines were smaller and simpler, with considerably more limited resources. Thus, most of the standard library is simple to the point of brokenness.
- gets is a buffer overflow waiting to happen.
- fgets takes a buffer length, but it’s up to you to check for and handle long lines. Fail to do this, and long lines may be treated as multiple, shorter lines.
- strncpy may not null-terminate your string.
- strtok can split fields at particular characters, but it will silently skip empty fields.
- Avoiding buffer overflows with strncat is on you.
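To make two of these concrete, here is a minimal sketch in plain standard C. The helper names count_fields_strtok and read_line_checked are made up for this illustration; they are not part of any library. The first shows strtok quietly dropping an empty field, and the second shows the kind of bookkeeping fgets leaves to the caller if long lines are not to be mistaken for several short ones.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the fields strtok reports.
 * strtok modifies `s` in place and merges runs of delimiters,
 * so an empty field between two delimiters is never returned. */
int count_fields_strtok(char *s, const char *delims)
{
    int n = 0;
    for (char *tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}

/* Hypothetical helper: read one line with fgets and say whether it fit.
 * Returns 1 for a complete line (newline stripped), 0 for a line that
 * was too long (the excess is discarded), -1 on end-of-file or error. */
int read_line_checked(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    /* No newline in the buffer: peek at what comes next. */
    int c = fgetc(fp);
    if (c == EOF || c == '\n')
        return 1;               /* the line ended exactly here */

    /* The line was longer than the buffer: skip the remainder. */
    while (c != '\n' && c != EOF)
        c = fgetc(fp);
    return 0;
}
```

With the input "a,,b" and the delimiter ",", count_fields_strtok reports two fields, not three. Whether an over-long line should be discarded, rejected, or reassembled is a design choice; this sketch simply throws the tail away and tells the caller.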
These are a couple of standard annoyances off the top of my head. You’ve probably got some of your own.
Now before you start complaining that I don’t understand
the essence of C, let me say that I don’t mind that
it’s on the programmer to match format strings with
printf and friends. No, it’s not
safe; if you do it wrong, you get a core dump or garbage output.
printf can do great things with a small format string,
but it’s on you to do it right. That’s perfectly okay:
it’s the trade-off to have dense code, and when you get it
right, it’s reliable. Instead, I’m saying that
oftentimes, to get it right (and that means reliable), the
libraries aren’t adequate, and thus require us to jump
through hoops in our code. Consider some of the various attempts to fix
this:
In the BSD world, there are functions such as
strlcat, which offer easier bounds checking and
guarantee null-termination if the inputs are null-terminated,
or, better yet, functions that
make freshly allocated copies.
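For readers who haven’t used them, here is a rough sketch of what that style of interface looks like, written in standard C. The names bounded_concat and copy_string are hypothetical stand-ins invented for this example. bounded_concat is modeled on my understanding of strlcat’s contract (take the full buffer size, always terminate when there is room, return the length the result would have had), so treat it as an approximation rather than the BSD implementation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical strlcat-style append. `dstsize` is the full size of the
 * destination buffer, not the space remaining. The result is always
 * null-terminated if `dst` was terminated within `dstsize` bytes.
 * Returns the length the combined string would have had; a return
 * value >= dstsize means the output was truncated. */
size_t bounded_concat(char *dst, const char *src, size_t dstsize)
{
    size_t dlen = 0;
    while (dlen < dstsize && dst[dlen] != '\0')
        dlen++;

    size_t slen = strlen(src);
    if (dlen == dstsize)            /* dst was not terminated: touch nothing */
        return dstsize + slen;

    size_t room = dstsize - dlen - 1;          /* keep one byte for '\0' */
    size_t ncopy = slen < room ? slen : room;
    memcpy(dst + dlen, src, ncopy);
    dst[dlen + ncopy] = '\0';
    return dlen + slen;
}

/* Hypothetical "fresh copy" helper: returns a malloc'd duplicate of `s`,
 * or NULL if allocation fails. The caller frees the result. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}
```

The point of the return value is that truncation becomes a one-line check: if bounded_concat(buf, s, sizeof buf) >= sizeof buf, the string didn’t fit.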
glibc offers a
different swath of improvements:
getline will read a
line from a file, allocating space as needed;
there is a strtok-like function that treats each delimiter as
significant, so empty fields survive; and
asprintf makes it easy to assemble
strings into a freshly allocated buffer, with no risk of buffer
overflow.
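To show the shape of these conveniences without depending on glibc, here is a sketch of two of them in standard C. Both names, read_whole_line and next_field, are hypothetical and exist only in this example; they are loosely in the spirit of a line reader that allocates as needed and a tokenizer that keeps empty fields, and they make no attempt to match any real library’s signatures.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical line reader: returns one line of any length as a
 * malloc'd string with the newline removed, or NULL at end-of-file
 * or on allocation failure. The caller frees the result. */
char *read_whole_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {               /* keep room for the '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {             /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

/* Hypothetical field splitter: every delimiter ends a field, so two
 * delimiters in a row yield an empty field. `*cursor` is advanced past
 * the field and set to NULL after the last one. Modifies the input. */
char *next_field(char **cursor, const char *delims)
{
    char *start = *cursor;
    if (start == NULL)
        return NULL;

    char *end = start + strcspn(start, delims);
    if (*end != '\0') {
        *end = '\0';
        *cursor = end + 1;
    } else {
        *cursor = NULL;
    }
    return start;
}
```

Run over "a,,b" with the delimiter ",", next_field hands back "a", an empty string, and "b", which is exactly the case strtok loses.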
But the problem with all of these is that they’re
all extensions. They’re not in the standard, so while they
may be available, there’s no guarantee they’ll be
there. And using them sometimes requires a
#define of some
feature-test macro, even on platforms where they exist.
And inasmuch as C (the core language) isn’t really meaningful except as C (including the standard library), this makes C suck.
C is supposed to be a least-common-denominator of a language, something you can always rely on being there. But if that’s what you need, then the only facilities the language provides are so primitive as to be useless. You’ll end up reinventing wheels to work around deficiencies in the standard library, because the only way to use existing wheels is to rely on extensions, thus making your code non-portable.
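As one example of the wheel-reinventing this leads to, here is roughly what you end up writing when you want "format into a freshly allocated string" but can’t count on an extension being present. The name format_alloc is hypothetical. The sketch assumes a C99 library, where vsnprintf with a zero-size buffer returns the length the output would need.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: printf-style formatting into a malloc'd buffer
 * that is exactly large enough. Returns NULL on formatting or
 * allocation failure. The caller frees the result. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);       /* the argument list must be walked twice */

    /* First pass: measure. Assumes C99 vsnprintf semantics. */
    int needed = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);

    char *out = NULL;
    if (needed >= 0) {
        out = malloc((size_t)needed + 1);
        if (out != NULL)
            vsnprintf(out, (size_t)needed + 1, fmt, ap2);   /* second pass: write */
    }
    va_end(ap2);
    return out;
}
```

It’s only about twenty lines, but every project that wants to stay portable carries its own copy of something like it, each with its own name and its own bugs.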
If we want to fix C, then the standard library needs to be
reworked. Things like
strdup and either glibc’s
getline or BSD’s
fgetln need to be
incorporated and reliably available, and
the security-holes-in-waiting should just go away. And we still
won’t be able to use these enhancements for a few years while
we’re waiting for them to become ubiquitous. Eventually,
though, this could fix the current toy that much of C’s
standard library is.
The primitive library made sense when least-common-denominator hardware meant 128KB, and I respect the original developers for what they could accomplish with so little. It was a 1970s approach for 1970s hardware. But unless you’re developing for an embedded device, that makes no sense anymore. And software for embedded devices that small can avoid bulky functions, if they use the standard library at all.
In a time when the guys writing the programs were in charge of the data that would be fed through them, the existing library was probably fine. Exploitation wasn’t a concern, but making it fit and making it quick were. But today we have to worry about buffer overflows and security holes if our code is deficient, and line-length and buffer-size limitations are arbitrary hassles to be tripped over down the road.
We should eliminate the constraints imposed by a half-century-old library, and give ourselves the tools to make the next half-century’s worth of software.