Devious Fish
Music daemons & more

Thoughts on Programming Languages

C

Devised in 1972, C is one of the oldest and probably the most widely available programming languages in the world today. It’s a simple language in terms of writing a compiler: it’s got operators, conditionals, loops, and types. Beyond that, everything is provided by the libraries.

Dynamic memory, I/O, string handling, and everything else is provided by libraries; the C compiler really has no concept of these. Variable function arguments allow dense, powerful functions like printf, at the expense of type safety.
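To make the type-safety trade-off concrete, here is a minimal sketch of a variadic function in the style of printf. The names sum_ints and check_sum_ints are hypothetical, made up for illustration; the mechanism is the standard va_list machinery, written so it compiles as C++.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical helper: adds up `count` int arguments.
// The compiler cannot verify that the caller really passed `count` ints;
// the function trusts the count and reads each argument as an int.
int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}

// Expected results, written as assertions.
void check_sum_ints() {
    assert(sum_ints(3, 1, 2, 3) == 6);
    assert(sum_ints(0) == 0);
}
```

Calling check_sum_ints() from main runs the assertions. A call such as sum_ints(2, 1.5, 2) would also compile, but reading a double as an int is undefined behavior, and that is the price paid for the flexibility.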

C is an awesome old language. Often, you can see the metal below when coding in it. That’s great for doing low-level coding.

But C’s simplicity is a double-edged sword. On the upside, interfacing libraries written in C with other languages for reuse is straightforward. On the downside, the problems we are solving with software become ever more complex, and trying to use a simple language requires contortions to make it work. (If you want to see this craziness in person, take a look at GObject from the GLib project, or the libraries built on it: object-oriented code in a non-object language.)

In short: The world is outgrowing C. Unless there is a damn good reason to use it, other choices are better.

C with Classes: C++

C++ started as “C with classes,” and has grown into its own. Despite small refinements, C has been stagnant for decades. C++ is where the language continues to develop.

C++ adds object-oriented programming and generic programming paradigms, allowing more output with less code. Both add benefits of type safety when done correctly.

Namespaces help organize code and prevent naming collisions; modules can be written with shorter forms of names, instead of having to prepend a prefix to every function name to avoid collisions. It’s a small thing, but it matters.
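A small sketch of the difference, assuming two hypothetical modules called audio and video. In C, their functions would need names like audio_name() and video_name(); with namespaces both can simply be name().

```cpp
#include <cassert>
#include <string>

// Hypothetical module: inside the namespace, short names are enough.
namespace audio {
    std::string name() { return "audio device"; }
    std::string describe() { return "opened " + name(); }
}

// A second module can reuse the same short names without colliding.
namespace video {
    std::string name() { return "video device"; }
    std::string describe() { return "opened " + name(); }
}

// From outside, the namespace acts as the prefix, but only where needed.
void check_namespaces() {
    assert(audio::describe() == "opened audio device");
    assert(video::describe() == "opened video device");
}
```

Calling check_namespaces() from main runs the assertions. Each describe() picks up the name() from its own namespace without any qualification.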

C++ has a built-in concept of dynamic memory and an object life-cycle. With RAII (resource acquisition is initialization), the C problem of “uninitialized memory” goes away. When an object is allocated, it gets initialized; when it’s deallocated, it gets cleaned up before disposal. Those rules apply to both the stack (variables going into/out of scope) and the heap (dynamic allocations).
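The life-cycle can be observed with a minimal sketch. The Resource class and the EventLog type are hypothetical; the constructor stands in for acquiring something (a file, a lock, memory) and the destructor for releasing it.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> EventLog;

// Hypothetical resource wrapper: acquire in the constructor,
// release in the destructor.
class Resource {
public:
    Resource(const std::string &name, EventLog &log) : name_(name), log_(log) {
        log_.push_back("acquire " + name_);
    }
    ~Resource() { log_.push_back("release " + name_); }
    Resource(const Resource &) = delete;             // one owner per resource
    Resource &operator=(const Resource &) = delete;
private:
    std::string name_;
    EventLog &log_;
};

void use_resources(EventLog &log) {
    Resource on_stack("stack", log);                 // initialized on creation
    Resource *on_heap = new Resource("heap", log);   // same rule on the heap
    delete on_heap;                                  // destructor runs, then memory is freed
}   // on_stack leaves scope here and is cleaned up automatically

// Expected order of events, written as assertions.
void check_lifecycle() {
    EventLog log;
    use_resources(log);
    assert(log.size() == 4);
    assert(log[0] == "acquire stack");
    assert(log[1] == "acquire heap");
    assert(log[2] == "release heap");
    assert(log[3] == "release stack");
}
```

Calling check_lifecycle() from main runs the assertions. Nothing in use_resources() explicitly cleans up on_stack; the compiler inserts that call at the closing brace.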

And there’s the STL (standard template library). It provides containers, iterators, and algorithms. While C++ doesn’t provide a garbage collector, the standard library provides smart pointers (such as std::unique_ptr and std::shared_ptr) which, because of RAII, clean up their pointed-to objects when the last owner goes out of scope.
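A minimal sketch of that cleanup, using the standard std::unique_ptr and std::shared_ptr. The Track type and its live counter are hypothetical, added only so the deletions can be observed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that counts how many instances are currently alive.
struct Track {
    static int live;
    Track() { ++live; }
    ~Track() { --live; }
};
int Track::live = 0;

int live_inside_scope() {
    std::unique_ptr<Track> owner(new Track);                   // sole owner
    std::shared_ptr<Track> shared = std::make_shared<Track>();
    std::shared_ptr<Track> alias = shared;                     // two owners, one object
    return Track::live;                                        // two objects alive here
}   // every owner leaves scope; both Track objects are deleted

// Expected counts, written as assertions.
void check_cleanup() {
    assert(live_inside_scope() == 2);
    assert(Track::live == 0);
}
```

Calling check_cleanup() from main runs the assertions. There is no delete anywhere in the sketch; the pointers' destructors do it.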

A downside is that C++ syntax is ugly. C++11 fixes a lot of the worst problems, including my personal favorite, adding initializers on class member variables.
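For readers who haven’t met the feature, a short sketch of C++11 member initializers. The Player class and its members are hypothetical.

```cpp
#include <cassert>
#include <string>

// With C++11, defaults sit next to the member declarations, and every
// constructor starts from them unless it overrides a value.
class Player {
public:
    Player() {}                                               // all defaults
    explicit Player(const std::string &name) : name_(name) {}
    const std::string &name() const { return name_; }
    int volume() const { return volume_; }
    bool paused() const { return paused_; }
private:
    std::string name_ = "unnamed";
    int volume_ = 50;
    bool paused_ = false;
};

// Expected values, written as assertions.
void check_defaults() {
    Player anonymous;
    assert(anonymous.name() == "unnamed");
    assert(anonymous.volume() == 50);

    Player named("kitchen");
    assert(named.name() == "kitchen");   // overridden by the constructor
    assert(named.volume() == 50);        // still the declared default
    assert(!named.paused());
}
```

Calling check_defaults() from main runs the assertions. Before C++11, each constructor had to repeat those defaults in its own initializer list, and it was easy to forget one.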

But it’s got a well-defined standard and rich libraries, which (if you stick to them) offer a decent chance of portable and efficient code.

C# (C Sharp)

I have limited experience with C#, but I like what I’ve seen. The syntax is cleaner than C++, taking the best ideas from C/C++ and less popular languages like Pascal, Delphi and scripting languages to make an overall nice, modern language.

It’s very legible, though I haven’t used it at great depth. I certainly wouldn’t mind using it more.

Objective C

Objective C is a language whose death is long overdue. Taking a different tack from “C with classes”, Objective C crufts some Smalltalk additions to the C compiler, which then bolts a dispatcher to the executable by adding an extra library during linking.

The result is a very different language than C++. The C and Smalltalk syntaxes are entirely different, and in case that’s not good enough, the Smalltalk is [set_off in: brackets with: named_parameters] as opposed to C’s (positional, parameters). More fundamentally, C++ function calls (including object member functions) are still resolved at compile time, like everything in C; even virtual functions are checked against the class declaration by the compiler. In Objective C, this isn’t true: when you send a message to an object, it goes through a dynamic dispatcher, a fancy name for some high-speed, compile-time-assisted interpretation.

This means Objective C’s compiler isn’t as rigorous in finding problems. Apple made great strides trying to get it to, since their GUI is built with Objective C, but in the end, there’s still runtime interpretation.

As a result, Objective C poses interpreted-style troubles. Even though it’s compiled, you might get a runtime error for a missing function. Or not, if you check at run-time that it exists before dispatching to it. Objective C was an interesting idea, but it’s been taken way too far.
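The idea of run-time dispatch can be modeled with a toy in C++. This is not how the Objective C runtime is implemented; DynamicObject, send, and responds_to are hypothetical names chosen to show the behavior described above: methods are found by name while the program runs, so a missing one is a run-time failure unless you check first.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Toy object whose methods are looked up by name at run time.
class DynamicObject {
public:
    void add_method(const std::string &selector, std::function<int(int)> body) {
        methods_[selector] = body;
    }
    // The run-time check: does this object have such a method?
    bool responds_to(const std::string &selector) const {
        return methods_.count(selector) != 0;
    }
    // The "dispatcher": find the method by name, or fail at run time.
    int send(const std::string &selector, int argument) const {
        auto found = methods_.find(selector);
        if (found == methods_.end()) {
            throw std::runtime_error("unrecognized selector: " + selector);
        }
        return found->second(argument);
    }
private:
    std::map<std::string, std::function<int(int)>> methods_;
};

// Expected behavior, written as assertions.
void check_dispatch() {
    DynamicObject object;
    object.add_method("twice", [](int x) { return 2 * x; });
    assert(object.send("twice", 21) == 42);
    assert(!object.responds_to("thrice"));   // checking first avoids the error
    bool failed = false;
    try {
        object.send("thrice", 1);            // compiles fine, fails when run
    } catch (const std::runtime_error &) {
        failed = true;
    }
    assert(failed);
}
```

Calling check_dispatch() from main runs the assertions. The compiler has no way to reject the send of "thrice", because the selector is just data until the program runs.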

And just to make the brain hurt, there’s Objective C++. Objective C and C++ have sufficiently different syntax and behaviors that you can use both sorts of object orientation in one program. Together, in the same bit of code. And somehow, it works, perilous as it feels.

Objective C’s main proponent, Apple, has recognized the limitations and has devised Swift to replace Objective C going forward. Swift can call Objective C code and vice-versa, although the syntaxes are quite different. Swift is coherent; it doesn’t look like two languages mashed together. It’s got generics and other modern concepts, which make it more type-safe than Objective C.

But I suspect Objective C will remain around for decades, both due to legacy code and as a sort of intermediary when using C, C++, and other libraries with Swift.

Java & Groovy

I haven’t used Java enough to comment deeply on it. I remember the enumeration types (which didn’t make it into the language for several years) were quite nicely done when they finally arrived.

But I did experiment with Groovy, which is derived from Java and compiles to Java byte code. It can be linked with standard Java. Groovy adds a bunch of interesting ideas, including closures and operators such as safe-navigation, regular expression, spread, and elvis.

But it also seems to have a war on punctuation: Parentheses can be omitted from function calls, and statement terminators can be omitted at end-of-line. The Groovy people talk all about “syntactical sugar,” and how annoying all this punctuation is.

But there’s a joke in English, a joke which even had a book named after it:

A Panda walks into a restaurant, sits down at a table. The waiter takes his order and brings his food. The Panda eats up, then pulls out a gun, fires twice into the air, and walks out without paying.

“What the heck was that about?” the waiter asks.

“It was a Panda bear. Look it up in the dictionary,” his manager directs.

He does so, where he finds: “Panda: a bear, from Asia, which eats, shoots and leaves.”

Punctuation matters. Groovy’s disdain of punctuation leads to ambiguity. In the parentheses case, the compiler objects when the meaning is unclear, requiring that the parentheses be restored. And although obscure, it is possible for code to do different things when split onto two lines, versus put on a single line.

Instead of doing away with punctuation and calling it “syntactical sugar”, we should instead appreciate the semantic clarity punctuation yields.

Object Pascal/Delphi

I cut my object oriented teeth on Object Pascal, but it’s been ages since I used it. The object lifecycle, at least back in those days, was very simple; there was no RAII. It was up to you to call constructors after allocating memory.

I enjoyed the languages. They were perhaps a little more verbose than C, but objects are wonderful things. But both have gone by the wayside over time.

The Shells: sh, ash, dash, ksh, bash, zsh

The shell is so awesome, we keep inventing it over and over with little subtleties.

The original shell wasn’t very powerful. Mainly, it accepted a command name, looked through the $PATH to find it, and executed it with arguments. Optionally, the output could be redirected, piped, or captured into a variable. The exit status could be checked, and based on that some conditions and looping constructs were provided.

With just that bit of glue, one could do amazing things, building on top of the other utilities provided with UNIX: sed, awk, cut, grep, echo and more.

But executing external programs has overhead. And since pipeline components become separate subprocesses, setting variables in pipelines doesn’t work as expected. Over time, features were added, and more powerful clones were written that added even more features and addressed these shortcomings. Unfortunately, each clone went in a different direction in its subtleties. As did the numerous different implementations of sed, awk, cut, grep, echo, regular expressions and more.

Consequently, writing a portable non-trivial shell script is troublesome: you have to obey the most stringent limitations of the worst shell, or assume some specific clone will be available to run the script correctly.

What makes the shell an enticing option is that some variation of it will always be there on Unix and lookalikes.

The C shell

To make shells even more interesting, there is a great divide in the shells: the original, and the C shell. C shell (along with its clones and derivatives) attempted to make a more interactive, C-like shell. It’s completely incompatible. See also: csh programming considered hazardous.

Perl

Perl came on the scene in the late ’80s and made quick inroads, probably due to the shell’s limitations. If you needed something more capable or faster than a shell script, a Perl program could be written faster and more easily than a corresponding C one.

Perl is even rich compared to C: it provides hashes and regular expressions right out-of-the-box.

Perl has shortcomings, though. Super-flexible syntax makes it difficult to read, especially the postfix conditional clauses (print "hello world" if $true; instead of if ($true) { print "hello world" }). Parentheses are often optional. There are a crap-ton of magic variables, like $@, $!, $% and dozens more. When referencing variables, a type-indicator is prefixed: $variable, @array, %hash. If you use a %hash where a list is expected, you get the hash contents as a list, the keys in even elements, values in odd. And you can make something into a reference with a backslash, but it gets hairy there. If you don’t provide parameters, Perl uses defaults, often some of the magic variables. Perl wasn’t designed with object-oriented code in mind.

The upshot is that a spurt of line noise might be a valid Perl program. Undisciplined programmers (read: sysadmins) write indecipherable Perl. Even at the best of times, it’s difficult.

There is a reason why Perl went by the wayside after a sane alternative arrived. Perl’s remaining devotees tend to be older sysadmins for whom it has seemed an adequate hammer to avoid learning another language.

Python

Python came about in the early ’90s, but took a while to catch on. Perhaps everyone was enamored with Perl already, so that got the attention and a fan base that grew its support, slowing Python’s acceptance. But over time, Python has been supplanting Perl. It seemed to hit my circle in the mid- to late ’90s, and the common reaction was, “It’s like Perl, only legible.”

Python’s syntax is more rigid than Perl’s, making it easier to understand. It’s also more object-friendly than Perl. A number of unusual language decisions make Python more sane and accessible than other scripting languages:

  • Local by default: unless you explicitly mark a variable as global, it’s assumed to be local. This prevents undeclared variables “leaking” into a global namespace and conflicting with each other.
  • Significant whitespace: indenting signifies code-grouping, replacing curly-brackets in C and other languages. Despite a pitfall of mixing multi-space indents and tabs in a single source file, I think this forces neophytes and non-programmers to indent correctly, and makes it easier to understand than braces.
  • Parameter count checking: function calls must provide the number of parameters the function definition expects (allowing for any declared defaults). This catches errors earlier, and helps catch problems when making changes to a function’s signature.

Python is presently my favorite choice for scripting.

JavaScript

JavaScript came about in the mid-1990s to support small-scale scripting needs for web pages. But in the intervening two decades, page complexity has grown significantly.

Unfortunately, in the browser war rush, JavaScript hit the field with some bad design decisions. Undeclared local variables end up as globals. Parameters are completely unchecked. Although it’s object-oriented, this (the current object) is context-sensitive and doesn’t always work as expected. Statement terminators are optional, creating ambiguity similar to that of Groovy.

As the language supported by web browsers for client scripting, JavaScript’s continued existence is ensured in a way Perl’s is not. And for compatibility with existing scripts, the existence of its flaws is ensured too. To work around this, additions have been made, such as variable declaration with let as an alternative to var; variables declared with let are block-scoped, which corrects several issues. And code can opt into strict mode, which improves things.

Despite the improvements so far, JavaScript still doesn’t scale well. Perhaps in time, the situation will improve. But the need to support the oldest browsers in use means it may be years before it’s safe to rely on the latest JavaScript syntaxes.

TypeScript

Recognizing JavaScript’s scalability issues, TypeScript came about. It’s a superset of JavaScript that adds typed variables and parameters, and more. It compiles to standard JavaScript, the compiler validating the sanity of the code along the way.

This seems a sane approach: it extends the existing JavaScript syntax, rather than starting anew. Existing JavaScript frameworks can be reused, rather than requiring reinvention. This seems like a better solution than devising an entire new language, such as Google has done with Dart.

As an idea, TypeScript is fantastic. But in practice, there are some major battles to get it to work.

Perhaps my issue was taking on the language right after the release of TypeScript 2.0. The documentation on the website still referred to earlier versions, as did most of the answers on StackOverflow. I could pre-order a book, due out in 5 months, on this new language version, but in the meantime, I’m on my own.

This is not a problem unique to TypeScript: Python 3’s print suddenly required parentheses, which were previously optional. But TypeScript changed the way modules are imported. It was originally done with comments, which were replaced with import, neither of which is JavaScript’s require, which is really node.js’s require. Transpiled code is meant for node.js (server-side) use; using it on the client side requires feeding it through Browserify, which documents in incredible detail how it could be used in different ways with other components, yet says remarkably little on how to use it in the most straightforward of manners. Coming into this cold, with the documentation out of date, posed a lot of trouble.

In time, documentation and answers will get updated. But these sorts of problems shouldn’t occur if those promoting a language want it to become mainstream.

Summing Up

These are some of the languages I’ve used, but there are plenty of others, including some I haven’t encountered: Go, D, Swift, SQL, R, Fortran, Visual Basic, Simula, Clojure, Erlang, Lisp, Haskell, PHP, Ruby, CoffeeScript to name a few. And more seem to pop up regularly.

I raise the question: Why do we need all these languages?

I don’t deny that some of the older languages are out-of-date, and that new ones accomplish the same goal with less code, with better diagnostics that solve troubles more quickly.

And there are definitely languages for specific needs: HTML, CSS, and SQL.

Still, do we really need 80 languages? This seems insane.

The trouble comes about because:

  • Old languages, like C, have the advantage of longevity. They are well-specified and understood. Mucking with them invites trouble.

  • A few languages, like C++, are regulated by standards committees so they evolve clearly but slowly. New features are tested and researched before being added to the language specification, with thought given to compatibility. But the cost is having to wait for the latest whiz-bang ideas.

  • The newest languages tend to embrace the latest nifty ideas and features, at the expense of syntax being in flux because they aren’t nailed down yet.

  • The rest seem to fall in somewhere between “new” and “standard”: with an established code base, care must be given to compatibility. But they do evolve, and occasionally they introduce breaking changes in the name of fixing problems.

Sometimes, it seems like the standards bodies are too slow: if C++ evolved faster, and kept up with new features, fewer competing languages might appear.

But then we need to think about Perl and Objective C. Perl evolved so fast, adding great features and syntax varieties, that it turned into a mess, and in doing so invited its own replacement by Python. Objective C started as two languages crufted together, and although it subsequently evolved much more slowly, it was ugly all the way, and starting over was eventually a better choice than more syntax retrofits.

And incompatibilities between versions cause trouble for existing code bases: Python’s print change, the variations between the different shells, and TypeScript’s import changes show this to anyone who has encountered them.

Perhaps some of the faster-evolving languages act as test-beds for new ideas, prior to their consideration in the slower-evolving languages. And although Groovy’s elvis and safe-navigation operators are great, would they be helpful in C++? Well, elvis does seem handy. But on reflection, multi-level structures are different in Java/Groovy vs. C++: C++ produces an aggregate data structure in a single block of memory, rather than connecting it from pieces via references. In C++, the safe-navigation operator would be mostly superfluous.
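A rough sketch of that reasoning. All the types here are hypothetical: Person holds its Address directly, as C++ aggregates do, while LinkedPerson emulates the Java/Groovy layout with a pointer that may be null. The helper city_or_unknown approximates what a Groovy expression like person?.address?.city ?: "unknown" is understood to do.

```cpp
#include <cassert>
#include <string>

// C++ style: the Address lives inside the Person, in the same block of
// memory, so there is no link that could be null.
struct Address { std::string city; };
struct Person  { Address address; };

// Java/Groovy style, emulated: the Address is reached through a reference
// that may be missing.
struct LinkedPerson { const Address *address; };

// Safe navigation plus elvis, spelled out by hand.
std::string city_or_unknown(const LinkedPerson *person) {
    if (person == nullptr || person->address == nullptr) {
        return "unknown";                    // what ?. guards against
    }
    if (person->address->city.empty()) {
        return "unknown";                    // what ?: falls back on
    }
    return person->address->city;
}

// Expected behavior, written as assertions.
void check_navigation() {
    Person direct;
    direct.address.city = "Oslo";
    assert(direct.address.city == "Oslo");   // no null check needed

    Address home;
    home.city = "Lima";
    LinkedPerson linked;
    linked.address = &home;
    assert(city_or_unknown(&linked) == "Lima");
    linked.address = nullptr;
    assert(city_or_unknown(&linked) == "unknown");
    assert(city_or_unknown(nullptr) == "unknown");
}
```

Calling check_navigation() from main runs the assertions. With the aggregate layout, direct.address.city cannot fail, which is why a safe-navigation operator would have little to do; it only earns its keep once pointers enter the picture.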

My current favorite language is C++: I can still use C code, interface with Objective C when necessary, it supports multiple paradigms, and it produces efficient executables.

But I don’t confine myself to C++, and I acknowledge that other languages have advantages and disadvantages that make them better or worse for different sorts of uses. Among other things, C++ is a beast and not the ideal thing for beginners.

Still, I find it crazy that there are 80 languages in common enough use that they are tracked on the Tiobe index. We programmers need to be more judicious about reinventing the wheel. For sanity, I will pick and choose among the languages in the top 20, taking into account utility, suitability, and longevity.