Devious Fish
Music daemons & more

XML vs JSON

XML and JSON are two different beasts, meant for two different things. But back in the Bad Old Early Days of HTML and JavaScript, developers needed something to represent data exchanged between servers and clients. And since client-server transactions were already happening in markup (XHTML being XML-compliant, and earlier HTML closely related), XML seemed a natural choice.

XML has advantages. If you’re annotating data, it’s the better choice:

<person>Perette Barella</person> lives in
<address>Rochester, New York</address>.

In this case, the data is the whole document, and XML annotates bits of it. Pieces of specific types of data are interspersed with more general text. How would we represent this in JSON? Not easily or sensibly.
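To make the annotation idea concrete, here is a minimal Python sketch using the standard library's ElementTree. The wrapping <p> root is an assumption added only so the fragment parses as a single document; it is not part of the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical <p> root: a fragment needs one enclosing element to parse.
doc = ET.fromstring(
    "<p><person>Perette Barella</person> lives in "
    "<address>Rochester, New York</address>.</p>"
)

# itertext() yields element text and the text between elements, in order,
# so joining it recovers the sentence with the markup stripped.
plain = "".join(doc.itertext())

# The annotations are the child elements: (tag, annotated text) pairs.
annotations = [(child.tag, child.text) for child in doc]

print(plain)
print(annotations)
```

The sentence survives intact and the annotations ride along with it, which is the arrangement JSON has no natural way to express.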

But XML is a markup language. It doesn’t neatly represent the structure of data. Take, for example:

struct Person {
        name: struct Name {
                family: string;
                personal: string;
        };
        address: string; // oversimplifying
        phone_numbers: list of strings;
}

JSON represents this clearly and concisely:

{
        "name": {
                "family": "Barella",
                "personal": "Perette"
        },
        "address": "Rochester, NY",
        "phone_numbers": [
                "585-317-3013",
                "585-555-121",
                "585-555-212"
        ]
}
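As a rough illustration of how directly this maps onto native data structures, here is a short Python sketch using the standard json module. It assumes the nested object is keyed "name", following the Name struct; the variable names are just for illustration.

```python
import json

text = """
{
    "name": {"family": "Barella", "personal": "Perette"},
    "address": "Rochester, NY",
    "phone_numbers": ["585-317-3013", "585-555-121", "585-555-212"]
}
"""

# Objects become dicts and arrays become lists; no schema or walking needed.
person = json.loads(text)

family = person["name"]["family"]   # nested struct -> nested dict
phones = person["phone_numbers"]    # list of strings -> Python list

print(family, phones)
```

Each field is one keyed lookup away, and the list arrives as a list.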

How would we represent this in XML? Well, we’ve got choices:

<person>
        <name>
                <family>Barella</family>
                <personal>Perette</personal>
        </name>
        <address>Rochester, NY</address>
        <phone_numbers>585-317-3013</phone_numbers>
        <phone_numbers>585-555-121</phone_numbers>
        <phone_numbers>585-555-212</phone_numbers>
</person>

See the problem? Instead of being grouped into a list, the phone numbers are placed directly where the list belongs. Because the element names now repeat, we can’t store and find the children of <person> in an efficient hash/map/dictionary. If we want the phone numbers, we have to walk through <person> and check each element to see if it’s the right type.
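A small Python sketch of that problem, assuming ElementTree from the standard library and a trimmed-down copy of the document above. The naive tag-to-text dictionary is deliberately wrong, to show what the duplicates do to it.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring("""
<person>
    <address>Rochester, NY</address>
    <phone_numbers>585-317-3013</phone_numbers>
    <phone_numbers>585-555-121</phone_numbers>
    <phone_numbers>585-555-212</phone_numbers>
</person>
""")

# Naive tag -> text map: each duplicate tag overwrites the previous one,
# so only the last phone number survives.
as_dict = {child.tag: child.text for child in person}

# Getting all of them means scanning every child and checking its tag.
phones = [child.text for child in person if child.tag == "phone_numbers"]

print(as_dict)
print(phones)
```

ElementTree's findall("phone_numbers") does the same scan for you, but it is still a scan rather than a keyed lookup.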

To keep the phone numbers grouped, we’d need a new element to identify the individual numbers within their group:

<person>
        <name>
                <family>Barella</family>
                <personal>Perette</personal>
        </name>
        <address>Rochester, NY</address>
        <phone_numbers>
                <phone>585-317-3013</phone>
                <phone>585-555-121</phone>
                <phone>585-555-212</phone>
        </phone_numbers>
</person>

But this isn’t a great solution: it requires introducing a <phone> element that has no counterpart in the original structure.
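For comparison, a sketch of reading the grouped version in Python, again assuming ElementTree and a trimmed-down document. Note that the <phone> name does no work in the result; it exists only to satisfy the markup.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring("""
<person>
    <address>Rochester, NY</address>
    <phone_numbers>
        <phone>585-317-3013</phone>
        <phone>585-555-121</phone>
        <phone>585-555-212</phone>
    </phone_numbers>
</person>
""")

# One lookup finds the group; its children are the list items.
group = person.find("phone_numbers")
phones = [phone.text for phone in group]

print(phones)
```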

So let’s add some elements that need attributes: height and age.

<person>
        <name>
                <family>Barella</family>
                <personal>Perette</personal>
        </name>
        <height units="cm">178</height>
        <age units="years">49</age>
        <address>Rochester, NY</address>
        <phone_numbers>585-317-3013</phone_numbers>
        <phone_numbers>585-555-121</phone_numbers>
        <phone_numbers>585-555-212</phone_numbers>
</person>

In a certain organizing sense, this is nice: the units are an attribute of the element containing the data. When data is freeform, this approach seems practical, maybe necessary: where else will attributes go?

But now we’ve got a data structure/representation issue again. There’s one element, but storing it requires two fields: the value and the units.
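Here is one way that mismatch might look in Python. The "value" and "units" key names are assumptions chosen for illustration, not anything the XML dictates.

```python
import json
import xml.etree.ElementTree as ET

height = ET.fromstring('<height units="cm">178</height>')

# One XML element carries two pieces of data: its text and its attribute.
# A plain data structure needs a field for each ("value" and "units" are
# hypothetical names).
record = {"value": int(height.text), "units": height.get("units")}

print(json.dumps({"height": record}))
```

The single <height> element has turned into a small structure of its own, which is exactly the extra layer the markup was hiding.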