Parsnip
parsing library

Parsnip is a parsing library written in C++ that provides:

  • A generic type, Data, that can contain integers, floating point numbers, strings, booleans, dictionaries (keyed by strings), and lists.
  • Parsing of JSON into Data, and serialization of Data to JSON.
  • Pattern-based command line parsing.
  • JSON Schema construction from command-line parsers.

Parsnip::Data type

Data stores values in runtime-defined types and structures, in the style of non-compiled languages. The flexibility comes at a cost: dynamic typing forgoes compile-time error detection. Nevertheless, JSON has become a popular interchange and protocol language, and the Data structure maps closely to it. Data is thus useful as an intermediary when extracting data from files or messages into native structures, or when preparing data for storage or transmission as JSON.

Data objects behave in a conventional C++ manner, supporting copy and move construction and assignment, and deep comparison. Loading a file is easy and accessing the data is straightforward. Flexibility is provided where it makes sense; beyond that, values are expected to be of the anticipated types, and exceptions are thrown when inconsistencies are encountered.

Performance

Parsnip enables rapid development via a straightforward, familiar-feeling interface that's usually easy to work with. It offers reasonable performance and handles average situations well. If you face massive files, or if runtime performance matters more than developer productivity, consider other JSON libraries.

If performance is a concern, there are two issues to beware of: copy construction, and the make_dictionary function.

Use moves instead of copies

If you're assembling a list of dictionaries, for example, you might do something like:

Parsnip::Data response = Parsnip::make_list();
for (const auto &record : matches) {
     Parsnip::Data rec = Parsnip::Data::make_dictionary ({
        {"id", record.id},
        {"name", record.name},
        ...
     });
     rec ["ytd_income"] = some_calculation();
     response.push_back (rec); // Copy construction
}
transmit (response);

The push_back will copy rec when it adds it to response. The more complex or deeply nested rec gets, the greater the performance impact. Changing the offending line to

 response.push_back (std::move (rec)); // Move construction

will avoid the copy. So before you ship, set a breakpoint on the copy constructor and make sure it never gets called.
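
If a debugger isn't handy, a stand-in type that counts its own copies shows the same effect. The sketch below does not use Parsnip at all: Tracked and count_copies are hypothetical names, and Tracked merely plays the role of a deep Data value.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a value that is expensive to copy.
struct Tracked {
    static inline int copies = 0;  // incremented by every copy construction
    Tracked() = default;
    Tracked(const Tracked &) { ++copies; }
    Tracked(Tracked &&) noexcept = default;
    Tracked &operator=(const Tracked &) = default;
    Tracked &operator=(Tracked &&) noexcept = default;
};

// Appends `n` records to a list and reports how many copy constructions occurred.
int count_copies(std::size_t n, bool use_move) {
    Tracked::copies = 0;
    std::vector<Tracked> response;
    response.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        Tracked rec;
        if (use_move) {
            response.push_back(std::move(rec));  // move construction
        } else {
            response.push_back(rec);             // copy construction
        }
    }
    return Tracked::copies;
}
```

Calling count_copies (5, false) reports one copy per record, while count_copies (5, true) reports none.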

Avoid make_dictionary

The make_dictionary function (see the above example) takes a std::initializer_list of std::pair<const char *, const Data>, and thus implicitly relies on copy construction. In the above example, make_dictionary could be replaced with:

 Parsnip::Data rec { Parsnip::Data::Dictionary,
     "id", record.id,
     "name", record.name,
     ...
 };

The downside is the loss of the code clarity offered by the braced pairs of make_dictionary, made worse by clang-format and astyle reformatting away neatly arranged pairings.
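
The reason an initializer list forces copies is a property of the language rather than of Parsnip: the elements of a std::initializer_list are const, so they cannot be moved out. The following sketch illustrates this with standard containers only. Counted, from_initializer_list and from_arguments are hypothetical names, and the variadic function is only an analogy for the flat constructor form, not Parsnip's implementation.

```cpp
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical value type that counts how often it is copied.
struct Counted {
    static inline int copies = 0;
    std::string text;
    explicit Counted(std::string t) : text(std::move(t)) {}
    Counted(const Counted &other) : text(other.text) { ++copies; }
    Counted(Counted &&) noexcept = default;
};

// Elements of an initializer_list are const, so each one is copied out.
std::vector<Counted> from_initializer_list(std::initializer_list<Counted> items) {
    return std::vector<Counted>(items.begin(), items.end());
}

// A variadic function receives forwarding references and can move each value.
template <typename... Values>
std::vector<Counted> from_arguments(Values &&...values) {
    std::vector<Counted> result;
    result.reserve(sizeof...(values));
    (result.emplace_back(std::forward<Values>(values)), ...);
    return result;
}
```

Building two elements through from_initializer_list costs two copies; building the same two through from_arguments costs none.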

Construction

Parsnip::Data can be constructed in code:

Parsnip::Data(optional initial value)
Parsnip::make_dictionary (optional initializer list)
Parsnip::make_list (optional initializer list)

Or Parsnip can construct them from JSON:

Parsnip::parse_json (std::string)
Parsnip::parse_json (std::istream, bool check_termination)

These parse JSON from either a string or a stream. For strings, the entire string must be consumed or DataFormatError is thrown. For streams, this is enforced only if check_termination is true.

After construction, manipulation is straightforward:

Parsnip::Data myvalue;
myvalue = 555; // Parsnip::Data myvalue {555}; would have worked too.
std::cout << "myvalue=" << myvalue.asInteger() << std::endl; // "myvalue=555"
std::cout << "myvalue=" << myvalue.asReal() << std::endl; // also "myvalue=555"
std::cout << "myvalue=" << myvalue.asString() << std::endl; // throws IncorrectDataType.

Alternately, there are template versions of the accessor functions. Templated numeric accessors perform applicable range validation:

std::cout << "myvalue=" << myvalue.as<int>() << std::endl; // "myvalue=555"
std::cout << "myvalue=" << myvalue.as<char>() << std::endl; // throws DataRangeError
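
The idea behind range validation can be sketched independently of Parsnip. In the hypothetical helper below, checked_as is an invented name and std::out_of_range stands in for DataRangeError; it shows one way a narrowing conversion can refuse a value instead of truncating it.

```cpp
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper: converts a stored 64-bit integer to a narrower integral
// type, throwing instead of silently truncating.
template <typename Target>
Target checked_as(long long stored) {
    static_assert(std::is_integral_v<Target>, "Target must be an integer type");
    static_assert(sizeof(Target) < sizeof(long long) || std::is_signed_v<Target>,
                  "Target's range must fit within long long");
    if (stored < static_cast<long long>(std::numeric_limits<Target>::min()) ||
        stored > static_cast<long long>(std::numeric_limits<Target>::max())) {
        throw std::out_of_range("value does not fit in the requested type");
    }
    return static_cast<Target>(stored);
}
```

With this sketch, checked_as<int> (555) returns 555, while checked_as<signed char> (555) throws.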

List manipulation

List members can be accessed via indexing or iterator. They can be updated via indexed assignment; new elements can be added with push_back.

For lists, both the list and the list members are Data:

Parsnip::Data mylist = Parsnip::make_list (3.14, 4, 7, 9);
mylist [1] = 5; // Implicitly invokes Data constructor on 5, then does assignment.
std::cout << mylist [0].asDouble() << std::endl; // "3.14"
std::cout << mylist [0].asInteger() << std::endl; // throws IncorrectDataType
std::cout << mylist [1].asDouble() << std::endl; // "5"
auto vector_of_float = mylist.as <std::vector <float>>(); // Succeeds
auto vector_of_int = mylist.as <std::vector <int>>(); // Throws IncorrectDataType because of 3.14
mylist.push_back ("eleven"); // Implicitly invokes Data constructor and adds to list.

The list can be iterated in the usual ways:

for (const auto &element : mylist) {
    std::cout << element.asDouble() << std::endl;
}

Given the list previously created, this would output: "3.14 <newline> 5 <newline> 7 <newline> 9 <newline>", then throw upon encountering the string "eleven". This demonstrates the limited utility of iterating over lists of mixed datatypes.
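
The same hazard exists with any dynamically typed container. As a standard-library analogy (not Parsnip code), the sketch below uses std::variant as a hypothetical stand-in for a list element. The strict loop throws at the first non-numeric element, much like the loop above; the tolerant loop checks each element's type first.

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a list element: a number or a string.
using Element = std::variant<double, std::string>;

// Reads every element as a double; std::get throws std::bad_variant_access
// at the first element that holds a string instead.
double sum_strict(const std::vector<Element> &list) {
    double total = 0;
    for (const auto &element : list) {
        total += std::get<double>(element);
    }
    return total;
}

// Checks the held type before reading, and skips anything that is not a number.
double sum_numbers_only(const std::vector<Element> &list) {
    double total = 0;
    for (const auto &element : list) {
        if (const double *number = std::get_if<double>(&element)) {
            total += *number;
        }
    }
    return total;
}
```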

An alternate implementation of the loop is to use foreach, which exhibits the same behavior but involves lambda or functor awkwardness:

std::function<void (const Parsnip::Data &)> lambda = [] (const Parsnip::Data &element) -> void {
    std::cout << element.asDouble() << std::endl;
};
mylist.foreach (lambda);

Dictionary manipulation

For dictionaries, the dictionary and the dictionary values are Data. Keys, however, are always strings (as they are in JavaScript).

const Parsnip::Data mydict = Parsnip::make_dictionary ({
        {"pi", 3.1415926535897932385},
        {"e", 2.71828182844}
});
std::cout << "e=" << mydict ["e"].as<double>() << std::endl; // "e=2.71828" at the default stream precision
std::cout << "pi=" << mydict ["pi"].as<float>() << std::endl; // "pi=3.14159"
std::cout << "i=" << mydict ["i"].asInteger() << std::endl; // throws NoSuchKey

Note: If mydict were not const, this would throw IncorrectDataType after implicitly constructing member "i" with a null value.

The getOr method retrieves a dictionary value, or returns a default value if the key does not exist.

std::cout << "pi=" << mydict.getOr ("pi", 3.14) << std::endl; // "pi=3.14159" at the default stream precision
// The compiler treats floating-point literals as double, so the dictionary value is retrieved as a double.
std::cout << "avogadro=" << mydict.getOr ("avogadro", 6.0221409e+23) << std::endl;
// "avogadro=6.02214e+23"; the number comes from the provided default.
Parsnip::Data imaginaries = mydict.getOr ("imaginary", Parsnip::make_dictionary());
// Assigns an empty dictionary to `imaginaries` since the element isn't present.
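
The get-or-default idiom itself is simple to sketch over a plain std::map. The function below is a hypothetical illustration of the pattern, not Parsnip's getOr; as in the examples above, the type of the default determines the type of the result.

```cpp
#include <map>
#include <string>

// Hypothetical get-or-default over a std::map: returns the stored value when
// the key exists, otherwise the caller's fallback.
template <typename Value>
Value get_or(const std::map<std::string, Value> &dict, const std::string &key, Value fallback) {
    auto it = dict.find(key);
    return it == dict.end() ? fallback : it->second;
}
```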

Iterating over dictionary values requires using a lambda or functor:

std::function<void (const std::string &, const Parsnip::Data &)> lambda =
    [] (const std::string &key, const Parsnip::Data &element) -> void {
        std::cout << key << "=" << element.asDouble() << std::endl;
    };
mydict.foreach (lambda);

This example outputs "pi=3.14159" and "e=2.71828" (at the default stream precision), but the ordering is unpredictable.

Reading and Writing JSON

Reading a JSON file into Data is straightforward:

Parsnip::Data load_json_file (const char *filename) {
    std::ifstream input (filename);
    return Parsnip::parse_json (input);
}

Writing is similarly easy:

void write_json_file (const char *filename, const Parsnip::Data &data) {
    std::ofstream output (filename);
    data.toJson (output);
}

For logging, there's a helpful dump function:

foobar.dumpJson ("foobar", std::cerr);

This prefaces the JSON representation with "foobar: ", and indents the rendered JSON for easier reading.

JSON Schemas

A schema is defined in JSON. Once you have written the definition, load it and construct a Schema from it:

#include <parsnip.h>
std::ifstream insch ("myschema.json");
Parsnip::Data def = Parsnip::parse_json (insch);
Parsnip::Schema schema (def);

To validate a Parsnip::Data object against a schema:

std::ifstream indata ("mydata.json");
Parsnip::Data my_data = Parsnip::parse_json (indata);
schema.validate (my_data);

The schema language is defined by the JSON Schema specification. In Parsnip's implementation of schemas:

  • boolean and null are fully implemented
  • number is fully implemented, including ranges.
  • integer is fully implemented, including ranges and multiples.
  • string supports minLength and maxLength. String schemas disallow control characters (newline, return, backspace, etc.). pattern (regular expressions) is supported (and lifts the no-control-characters prohibition). format implements the more common formats using regexps.
  • array supports minItems and maxItems. items must be a single type; tuples are not supported. (If you want a tuple, use a dictionary where things get names!).
  • object has mixed support:
    • minProperties, maxProperties, and required are fully supported.
    • dependencies can only specify other existing properties; they cannot create new ones, and will throw if they try.
    • additionalProperties can only be true or false; it can't specify a type (which would be stupid anyway: if you don't know what it is yet, how do you know what type it will be? This is a great way to make trouble for your future self) and will throw if you try.
    • patternProperties are unsupported, but ignored if present.

Additionally:

  • A proprietary any type accepts anything without checks.
  • A proprietary attribute nullable can be set to true to allow the checked item to also accept null. Default is false.
  • A proprietary attribute ignoreCase exists for enumerations, const, and regular expressions.
  • enum and const only apply to strings.
  • Annotations and $comment are allowed, and ignored.
  • The combining schemas allOf, anyOf, and oneOf are supported.
  • The logic schema not is also supported.
  • The conditional schemas if, then and else are not implemented.
  • JSON Pointers and $ref are not implemented.
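
To make the string rules above concrete, here is a minimal sketch of such a check written against the standard library only. StringRule and conforms are hypothetical names, lengths are counted in bytes for simplicity, and the pattern is applied unanchored as the JSON Schema specification describes; how Parsnip implements these checks internally is not shown here.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

// Hypothetical rule set mirroring the string constraints listed above.
struct StringRule {
    std::size_t min_length = 0;
    std::optional<std::size_t> max_length;  // unset means no upper bound
    std::optional<std::regex> pattern;      // when set, replaces the control-character check
};

// Returns true if `value` satisfies every constraint in `rule`.
bool conforms(const StringRule &rule, const std::string &value) {
    if (value.size() < rule.min_length) return false;
    if (rule.max_length && value.size() > *rule.max_length) return false;
    if (rule.pattern) {
        // Unanchored search: the pattern may match anywhere in the value.
        return std::regex_search(value, *rule.pattern);
    }
    for (unsigned char ch : value) {
        if (ch < 0x20 || ch == 0x7f) return false;  // reject control characters
    }
    return true;
}
```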

Command line parsing

Parsing and validating command lines is a hassle. Parsnip's command line parser accepts patterns against which command lines are later matched and validated. The ID of the matched pattern is returned, or an exception is thrown if no pattern matched.

Command patterns can contain fields marked for extraction. Values from these "blanks" are inserted into a Data object. Extraction fields may accept any value, a required value from a list, or an optional value from a list. Some patterns might be:

  • login as {username} with password {password}
  • kick user {username} [{message...}] // "message" is optional
  • shutdown <manner:immediate|graceful> // "immediate" or "graceful" is required
  • stop [now:now] // "now" is optional

When parsed, extracted values are put into a Parsnip::Data dictionary keyed by the names specified in the patterns. The above 4 examples could result in:

  • { "username": "perette", "password": "i0remember" }
  • { "username": "annoyingguy", "message": "Stop being annoying" }, or alternately, { "username": "annoyingguy" }, since message is marked as optional.
  • { "manner": "graceful" }
  • {} or {"now": "now"}
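
The extraction idea can be sketched in a few lines for the simplest case. The function below is a hypothetical, much-reduced matcher that understands only keywords and {name} blanks (no alternation, optional words, or trailing "..."), and it returns a std::map rather than a Parsnip::Data dictionary. It is meant to illustrate the concept, not to reproduce Parsnip's parse tree.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

using Fields = std::map<std::string, std::string>;

// Splits text into whitespace-separated words.
static std::vector<std::string> split_words(const std::string &text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string word; in >> word;) words.push_back(word);
    return words;
}

static std::string lowercase(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return text;
}

// Matches a command line against a pattern of keywords and {name} blanks.
// Returns the extracted fields, or nothing if the line does not fit.
std::optional<Fields> match_pattern(const std::string &pattern, const std::string &line) {
    const auto expected = split_words(pattern);
    const auto actual = split_words(line);
    if (expected.size() != actual.size()) return std::nullopt;
    Fields fields;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const std::string &token = expected[i];
        if (token.size() > 2 && token.front() == '{' && token.back() == '}') {
            fields[token.substr(1, token.size() - 2)] = actual[i];  // a blank: extract
        } else if (lowercase(token) != lowercase(actual[i])) {
            return std::nullopt;  // keyword mismatch (case is ignored)
        }
    }
    return fields;
}
```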

Command patterns can include references to a previously-defined option parser. Common uses of an option parser are:

  • Define common patterns for reuse.
  • Accept a series of optional, unordered name-value pairs

Parsnip also provides Interpreter and Dispatcher templates. Interpreter is an abstract class; classes that will be accepting input for processing can subclass and implement this.

The dispatcher is initialized with a reference to a parser. It must also be taught mappings from command IDs to handlers. Subsequently, it may be called with a command line and context object; the command line is parsed, and the appropriate handler/interpreter invoked.
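
The ID-to-handler mapping at the heart of a dispatcher can be sketched as follows. MiniDispatcher is a hypothetical class, not Parsnip's Dispatcher template: it skips the parsing step entirely and assumes the command ID has already been determined.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical dispatcher: maps command IDs to handlers that receive a
// caller-supplied context object.
template <typename Context>
class MiniDispatcher {
public:
    using Handler = std::function<void(Context &)>;

    // Registers (or replaces) the handler for a command ID.
    void add_handler(int command_id, Handler handler) {
        handlers_[command_id] = std::move(handler);
    }

    // Invokes the handler registered for `command_id`; throws if none exists.
    void dispatch(int command_id, Context &context) const {
        auto it = handlers_.find(command_id);
        if (it == handlers_.end()) {
            throw std::out_of_range("no handler for command " + std::to_string(command_id));
        }
        it->second(context);
    }

private:
    std::unordered_map<int, Handler> handlers_;
};
```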

Configuring the parser

To use the parser, create a parser instance, then load statement definitions into it. Each statement has an ID number, which may or may not be unique depending on your needs. The parser constructs a parse tree from the statements; if statements are invalid or conflict, the parser throws an exception during its configuration.

There are 3 types of parsers:

  • Parsnip::Parser - a plain parser that does not allow use of OptionParsers.
  • Parsnip::AggregateParser - an enhanced parser that allows use of OptionParsers.
  • Parsnip::OptionParser - a parser that contains patterns somewhat akin to subroutines.

A parser definition might look like this, though usually much longer:

typedef enum my_commands_t {
    NOP, HELP, STATUS, HISTORY, CREATEUSER,
    ADDTOGROUP, DELETEUSER
} Command_t;
static Parsnip::Parser::Definitions statements = {
    { NOP,        "" },
    { NOP,        "# [{comment...}]" },
    { HELP,       "help [{command}] ..." },
    { HELP,       "? [{command}] ..." },
    { STATUS,     "status" },
    { HISTORY,    "history [{#index:1-9999}]" },
    { CREATEUSER, "create <guest|user|admin> {user} {passwd}" },
    { ADDTOGROUP, "add user {user} to [group] {group}" },
    { DELETEUSER, "delete user {user} ..." }
};

Statement formats can be composed of:

  • keyword matches on that word in that position (case is ignored).
  • {name} accepts any value in that position, extracting the value into name
  • <one|two> accepts any of the listed words in that position
  • <name:one|two> accepts any of the listed words in that position, extracting the matched word into the key name
  • [optional] accepts an optional keyword, which is automatically named.
  • [three|four] optionally accepts any single keyword
  • [name:three|four] names an optional single keyword which is extracted
  • [{optional-value}] accepts an optional value, only as the final word
  • ... allows 0 or more additional parameters

Values may have a type and range as follows:

  • {string}
  • {#numeric:3-5} – Accepts a decimal integer in the range
  • {#numeric:3.0-5.0} – Accepts a decimal value in the range
  • {#numeric:0x3-0x5} – Accepts an octal, decimal, or hexadecimal integer in the range
  • {name:optionParser} – Accepts the option parser's sequence
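
A ranged integer check of this kind can be sketched with std::stol, whose base-0 mode accepts decimal, 0x-prefixed hexadecimal, and 0-prefixed octal input. The helper parse_in_range below is hypothetical and only illustrates the validation step; it is not how Parsnip evaluates its patterns.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses `word` as an integer and checks it against an
// inclusive range. Base 0 lets std::stol detect decimal, hex, or octal.
std::optional<long> parse_in_range(const std::string &word, long low, long high) {
    try {
        std::size_t used = 0;
        const long value = std::stol(word, &used, 0);
        if (used != word.size()) return std::nullopt;          // trailing junk such as "4x"
        if (value < low || value > high) return std::nullopt;  // outside the range
        return value;
    } catch (const std::invalid_argument &) {
        return std::nullopt;  // not a number at all
    } catch (const std::out_of_range &) {
        return std::nullopt;  // does not fit in a long
    }
}
```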

Values may be optional only if they are at the end of the command line, where they may be followed by "..." to indicate repetition:

  • {name} – Accepts exactly one value.
  • [{name}] – Accepts 0 or 1 value.
  • {name} ... – Accepts 1 or more values.
  • [{name}] ... – Accepts 0 or more values.

Option Parsers

Option parsers may be used inline or in trailing form. Inline option parsers accept exactly one option sequence. Used in trailing form, they accept 0 or more sequences, in the same manner as for values (above).

  • inline {target:targetSpecifier} <action:abort|crash|segfault> – match 1 targetSpecifier
  • trailer <type:movie|show> [{options:trailerOptions}] ... – match 0 or more trailerOptions

It may be useful to allow other patterns alongside an option sequence:

  • example <type:math|science>
  • example {grade:gradeSpecifier} <type:math|science>

This is allowed if all gradeSpecifier patterns begin with a keyword and those keywords don't collide with other patterns. This is accommodated automatically if the other sequences are defined first. If the option sequence is optional, or if the pattern using it must be defined first, wrap the option parser's name in angle brackets (less-than/greater-than) to avoid a runtime exception while processing parser definitions:

  • example [{grade:<gradeSpecifier>}] <type:math|science>

Suppose gradeSpecifier was defined as:

  • <year:k|kindergarten>
  • grade <year:k|kindergarten|freshman|sophomore|junior|senior>
  • grade {#year:1-12}
  • <category:undergrad|undergraduate|graduate> {#year:1-8}

Values extracted by the option parser are put into an inner dictionary named by the parser utilizing it:

  • example k math – { "type": "math" }
  • example 12 math – parse error at "12"
  • example grade 12 science – { "type": "science", "grade": { "year": 12}}
  • example undergrad 4 science – {"type": "science", "grade": { "category": "undergrad", "year": 4}}

Schemas

Schemas provide a way to ensure data conforms to a format. Schemas can be constructed in two ways:

  • Loading a schema from a file
  • Creating a schema from a parser.

Schemas for Parsers

To create a parser-based schema:

#include <parsnip_command.h>
Parsnip::AggregateParser my_parser; // an AggregateParser is needed to use option parsers
my_parser.addOptionParser (...);
my_parser.addStatements (...);
SchemaSet my_schema { my_parser };

You can tweak the schema with 3 calls: addMember, replaceMember, and removeMember.

Recall, command lines have a distinct command ID. The SchemaSet contains schemas for each of these. Where multiple patterns are defined for the same command ID, the schema accommodates all variants.

There are different approaches to embedding the request name in the request. For this example, let's say it is stored as a string in a member named "requestType".

We'll add a new member "requestType" to all the schemas; otherwise the unexpected member will cause schema validation to fail:

#include <parsnip/parsnip_schema.h>
my_schema.addMember ("requestType", Parsnip::StringSchema(), true);

Now we can validate:

void process_json_message (const std::string &message_text) {
    Parsnip::Data message = Parsnip::parse_json (message_text);
    int command_id = my_lookup_function (message ["requestType"]);
    my_schema.validate (command_id, message);
    // No exception was thrown, so do the processing.
    switch (command_id) {
        case ...
    }
}