Parsnip parsing library
Parsnip is a parsing library written in C++ that provides:

- `Data`, a type that can contain integers, floating point numbers, strings, booleans, dictionaries (keyed by strings), and lists.
- Parsing of JSON into `Data`, and serialization of `Data` to JSON.

`Data` allows storing data in runtime-defined types and structures, in the style of non-compiled languages. The flexibility comes at a cost: dynamic typing gives up compile-time error detection. Nevertheless, JSON has become a popular interchange and protocol language, and the `Data` structure maps closely to it. `Data` is thus useful as an intermediary when extracting data from files or messages into native structures, or when preparing data for storage or transmission via JSON.
`Data` objects behave in a C++ manner, including copy and move construction and assignment, and deep comparison. Loading a file is easy and accessing the data is straightforward. Flexibility is provided where it makes sense; beyond that, values are expected to hold the types the caller asks for, and exceptions are thrown when inconsistencies are encountered.
Parsnip enables rapid development via a straightforward, familiar-feeling interface that's usually easy to work with. It offers reasonable performance and handles average situations well. If you face massive files, or if runtime performance matters more than developer productivity, consider other JSON libraries.
If performance is a concern, there are two issues to watch for: copy construction, and the `make_dictionary` function.
If you're assembling a list of dictionaries, for example, you might do something like:

    Parsnip::Data response = Parsnip::make_list();
    for (const auto &record : matches) {
        Parsnip::Data rec = Parsnip::Data::make_dictionary (
            {"id", record.id},
            {"name", record.name},
            ...
        );
        rec ["ytd_income"] = some_calculation();
        response.push_back (rec); // Copy construction
    }
    transmit (response);

The `push_back` will copy `rec` when it adds it to `response`. The more complex or deep `rec` gets, the greater the performance impact. Changing the offending line to

    response.push_back (std::move (rec)); // Move construction

will avoid the copy. So before you ship, set a breakpoint on the copy constructor and make sure it never gets called.
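The difference between the two `push_back` calls can be observed in isolation. The following is a minimal sketch that does not use Parsnip at all: `Record` is a hypothetical stand-in for a deep `Parsnip::Data` value, `std::vector` stands in for the list, and the global counter `record_copies` exists only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Counts copy constructions of Record (illustration only, not part of Parsnip).
static int record_copies = 0;

// Hypothetical stand-in for a deep Parsnip::Data value.
struct Record {
    std::string payload;

    explicit Record (std::string text) : payload (std::move (text)) {}
    Record (const Record &other) : payload (other.payload) { ++record_copies; }
    Record (Record &&other) noexcept : payload (std::move (other.payload)) {}
};

// Appends `count` records by copy; returns how many copy constructions ran.
int copies_when_copying (std::size_t count) {
    record_copies = 0;
    std::vector<Record> response;
    response.reserve (count); // no reallocation, so only push_back is measured
    for (std::size_t i = 0; i < count; ++i) {
        Record rec ("record " + std::to_string (i));
        response.push_back (rec); // copy construction
    }
    return record_copies;
}

// Same loop, but each record is handed over with std::move.
int copies_when_moving (std::size_t count) {
    record_copies = 0;
    std::vector<Record> response;
    response.reserve (count);
    for (std::size_t i = 0; i < count; ++i) {
        Record rec ("record " + std::to_string (i));
        response.push_back (std::move (rec)); // move construction
    }
    return record_copies;
}

// Usage: one copy per record when copying, none when moving.
void demo_copy_vs_move () {
    assert (copies_when_copying (3) == 3);
    assert (copies_when_moving (3) == 0);
}
```

A counter like this is a programmatic alternative to the breakpoint: if it stays at zero across a test run, no copies were made.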
The `make_dictionary` function (see the above example) uses a `std::initializer_list` of `std::pair<const char *, const Data>`, and thus implicitly relies on copy construction. In the above example, `make_dictionary` could be replaced with:

    Parsnip::Data rec {
        Parsnip::Data::Dictionary,
        "id", record.id,
        "name", record.name,
        ...
    };

The downside is the loss of the code clarity offered by the braced pairs of `make_dictionary`, made worse by `clang-format` and `astyle` reformatting away neatly arranged pairings.
`Parsnip::Data` can be constructed in code:

    Parsnip::Data(optional initial value)
    Parsnip::make_dictionary (optional initializer list)
    Parsnip::make_list (optional initializer list)

Or Parsnip can construct them from JSON:

    Parsnip::parse_json (std::string)
    Parsnip::parse_json (std::istream, bool check_termination)

These parse JSON from either a string or a stream. For strings, the entire string must be consumed or `DataFormatError` is thrown. For streams, this holds only if `check_termination` is `true`.
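The termination check can be illustrated by analogy with something simpler than JSON. The sketch below is not Parsnip code: `parse_integer` and `parses` are hypothetical helpers that read a leading integer and, when `check_termination` is true, reject any leftover text. Allowing trailing whitespace is an assumption of this sketch, not a statement about Parsnip.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical illustration (not Parsnip): parses a leading integer and, when
// check_termination is true, rejects anything but whitespace after it.
long long parse_integer (const std::string &text, bool check_termination) {
    std::size_t used = 0;
    // std::stoll throws std::invalid_argument if no number can be read.
    long long value = std::stoll (text, &used);
    if (check_termination) {
        for (std::size_t i = used; i < text.size (); ++i) {
            if (!std::isspace (static_cast<unsigned char> (text [i]))) {
                throw std::invalid_argument ("unexpected text after value");
            }
        }
    }
    return value;
}

// Reports whether parsing succeeds, without propagating the exception.
bool parses (const std::string &text, bool check_termination) {
    try {
        parse_integer (text, check_termination);
        return true;
    } catch (const std::invalid_argument &) {
        return false;
    }
}

// Usage: leftover text is an error only when termination is checked.
void demo_check_termination () {
    assert (parse_integer ("42 trailing", false) == 42);
    assert (!parses ("42 trailing", true));
    assert (parses ("42  ", true));
}
```

Leaving the check off is what lets a caller read several values in sequence from the same stream.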
After construction, manipulation is straightforward:

    Parsnip::Data myvalue;
    myvalue = 555; // Parsnip::Data myvalue {555}; would have worked too.
    std::cout << "myvalue=" << myvalue.asInteger() << std::endl; // "myvalue=555"
    std::cout << "myvalue=" << myvalue.asReal() << std::endl; // also "myvalue=555"
    std::cout << "myvalue=" << myvalue.asString() << std::endl; // throws IncorrectDataType.

Alternatively, there are template versions of the accessor functions. Templated numeric accessors perform applicable range validation:

    std::cout << "myvalue=" << myvalue.as<int>() << std::endl; // "myvalue=555"
    std::cout << "myvalue=" << myvalue.as<char>() << std::endl; // throws DataRangeError

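The kind of range validation a templated numeric accessor performs can be sketched independently of Parsnip. In the sketch below, `checked_as` and `fits` are hypothetical helpers, and `std::out_of_range` stands in for Parsnip's `DataRangeError`. It is only meant for integral target types narrower than `long long`.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not Parsnip's implementation): converts a stored
// integer to the requested integral type, throwing if it does not fit.
// Intended for integral types narrower than long long.
template <typename T>
T checked_as (long long value) {
    if (value < static_cast<long long> (std::numeric_limits<T>::min ()) ||
        value > static_cast<long long> (std::numeric_limits<T>::max ())) {
        throw std::out_of_range ("value does not fit in the requested type");
    }
    return static_cast<T> (value);
}

// Reports whether the conversion would succeed, without propagating the exception.
template <typename T>
bool fits (long long value) {
    try {
        checked_as<T> (value);
        return true;
    } catch (const std::out_of_range &) {
        return false;
    }
}

// Usage: 555 fits in an int but not in a char-sized type.
void demo_checked_as () {
    assert (checked_as<int> (555) == 555);
    assert (!fits<signed char> (555));
    assert (fits<signed char> (-128));
}
```

Throwing instead of silently truncating means an out-of-range value surfaces at the point of access.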
List members can be accessed via indexing or iterators. They can be updated via indexed assignment; new elements can be added with `push_back`.
For lists, both the list and the list members are `Data`:

    Parsnip::Data mylist = Parsnip::make_list (3.14, 4, 7, 9);
    mylist [1] = 5; // Implicitly invokes Data constructor on 5, then does assignment.
    std::cout << mylist [0].asDouble() << std::endl; // "3.14"
    std::cout << mylist [0].asInteger() << std::endl; // throws IncorrectDataType
    std::cout << mylist [1].asDouble() << std::endl; // "5"
    auto vector_of_float = mylist.as <std::vector <float>>(); // Succeeds
    auto vector_of_int = mylist.as <std::vector <int>>(); // Throws IncorrectDataType because of 3.14
    mylist.push_back ("eleven"); // Implicitly invokes Data constructor and adds to list.

The list can be iterated in the usual ways:

    for (const auto &element : mylist) {
        std::cout << element.asDouble() << std::endl;
    }

Given the list previously created, this would output: "3.14 <newline> 5 <newline> 7 <newline> 9 <newline>", then throw upon encountering the string "eleven". This demonstrates the limited utility of iterating over lists of mixed datatypes.
An alternate implementation of the loop is to use `foreach`, which exhibits the same behavior but involves lambda or functor awkwardness:

    std::function<void (const Parsnip::Data &)> lambda = [] (const Parsnip::Data &element) -> void {
        std::cout << element.asDouble() << std::endl;
    };
    mylist.foreach (lambda);

For dictionaries, the dictionary and the dictionary values are `Data`. Keys, however, are always strings (as they are in JavaScript).

    const Parsnip::Data mydict = Parsnip::make_dictionary (
        {"pi", 3.1415926535897932385},
        {"e", 2.71828182844}
    );
    std::cout << "e=" << mydict ["e"].as<double>() << std::endl; // "e=2.71828182844"
    std::cout << "pi=" << mydict ["pi"].as<float>() << std::endl; // "pi=3.14159"
    std::cout << "i=" << mydict ["i"].asInteger() << std::endl; // throws NoSuchKey

Note: *If `mydict` were not const, the last line would instead throw IncorrectDataType, after implicitly constructing member "i" with a null value.*
The `getOr` method retrieves a dictionary value, or returns a default value if the key does not exist.

    std::cout << "pi=" << mydict.getOr ("pi", 3.14) << std::endl; // "pi=3.14159265358979324"
    // Compiler interprets floating points as double, so value from dictionary treated as such
    std::cout << "avogadro=" << mydict.getOr ("avogadro", 6.0221409e+23) << std::endl;
    // "avogadro=6.0221409e+23"; number comes from the provided default.
    Parsnip::Data imaginaries = mydict.getOr ("imaginary", Parsnip::make_dictionary());
    // assigns an empty dictionary to `imaginaries` since the element isn't present

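The lookup-with-default idea is easy to sketch with standard containers. The following is not Parsnip's implementation: `get_or` is a hypothetical free function over `std::map`, written to show how the type of the default value determines the type of the result.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical free function mirroring the idea of getOr (not Parsnip's code):
// returns the stored value for `key`, or `fallback` if the key is absent.
template <typename Value>
Value get_or (const std::map<std::string, Value> &dict,
              const std::string &key,
              const Value &fallback) {
    auto found = dict.find (key);
    return found == dict.end () ? fallback : found->second;
}

// Usage: a present key yields the stored value, a missing key the default.
void demo_get_or () {
    const std::map<std::string, double> constants {
        {"pi", 3.14159265358979},
        {"e", 2.71828182844}
    };
    assert (get_or (constants, "pi", 3.14) == 3.14159265358979);
    assert (get_or (constants, "avogadro", 6.0221409e+23) == 6.0221409e+23);
}
```

Unlike `operator[]` on a non-const container, a lookup of this shape never inserts the missing key.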
Iterating over dictionary values requires using a lambda or functor:

    std::function<void (const std::string &, const Parsnip::Data &)> lambda =
        [] (const std::string &key, const Parsnip::Data &element) -> void {
            std::cout << key << "=" << element.asDouble() << std::endl;
        };
    mydict.foreach (lambda);

This example outputs "pi=3.1415926535897932385" and "e=2.71828182844", but ordering is unpredictable.
Reading a JSON file into `Data` is straightforward:

    #include <fstream>

    Parsnip::Data load_json_file (const char *filename) {
        std::ifstream input (filename);
        return Parsnip::parse_json (input);
    }

Writing is similarly easy:

    void write_json_file (const char *filename, const Parsnip::Data &data) {
        std::ofstream output (filename);
        data.toJson (output);
    }

For logging, there's a helpful dump function:
foobar.dumpJson ("foobar", std::cerr);
This prefaces the JSON representation with "foobar: ", and indents the rendered JSON for easier reading.
A schema is defined in JSON. Once you have written the definition:

    #include <parsnip.h>

    std::ifstream insch ("myschema.json");
    Parsnip::Data def = Parsnip::parse_json (insch);
    Parsnip::Schema schema (def);

To validate a `Parsnip::Data` object against a schema:

    std::ifstream indata ("mydata.json");
    Parsnip::Data my_data = Parsnip::parse_json (indata);
    schema.validate (my_data);

The schema language is defined by JSON Schema. In Parsnip's implementation of schemas:

- `boolean` and `null` are fully implemented.
- `number` is fully implemented, including ranges.
- `integer` is fully implemented, including ranges and multiples.
- `string` supports `minLength` and `maxLength`. String schemas disallow control characters (newline, return, backspace, etc.). `pattern` (regular expressions) is supported (and lifts the no-control-characters prohibition). `format` implements the more common formats using regexps.
- `array` supports `minItems` and `maxItems`. `items` must be a single type; tuples are not supported. (If you want a tuple, use a dictionary where things get names!)
- `object` has mixed support:
  - `minProperties`, `maxProperties`, and `required` are fully supported.
  - `dependencies` can only specify other existing properties; they cannot create new ones, and will throw if they try.
  - `additionalProperties` can only be `true` or `false`; it can't specify a type (which would be stupid anyway: if you don't know what it is yet, how do you know what type it will be? This is a great way to make trouble for your future self) and will throw if you try.
  - `patternProperties` are unsupported, but ignored if present.

Additionally:

- `any` type accepts anything without checks.
- `nullable` can be set to `true` to allow the checked item to also accept `null`. Default is false.
- `ignoreCase` exists for enumerations, `const`, and regular expressions.
- `enum` and `const` only apply to strings.
- `$comment` is allowed, and ignored.
- `allOf`, `anyOf`, and `oneOf` are supported.
- `not` is also supported.
- `if`, `then` and `else` are not implemented.
- `$ref` is not implemented.

Parsing and validating command lines is a hassle. Parsnip's command line parser accepts patterns against which command lines are later matched and validated. The ID of the matched pattern is returned, or an exception is thrown if no pattern matched.
Command patterns can contain fields marked for extraction. Values from these "blanks" are inserted into a Data object. Extraction fields may accept any value, a required value from a list, or an optional value from a list. Some patterns might be:
- `login as {username} with password {password}`
- `kick user {username} [{message...}]` // "message" is optional
- `shutdown <manner:immediate|graceful>` // "immediate" or "graceful" is required
- `stop [now:now]` // "now" is optional

When parsed, extracted values are put into a `Parsnip::Data` dictionary keyed by the names specified in the patterns. The above 4 examples could result in:

- `{ "username": "perette", "password": "i0remember" }`
- `{ "username": "annoyingguy", "message": "Stop being annoying" }`, or alternately, `{ "username": "annoyingguy" }`, since `message` is marked as optional.
- `{ "manner": "graceful" }`
- `{}` or `{"now": "now"}`

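The extraction idea can be sketched in a few lines of standard C++. This is a deliberately simplified illustration, not Parsnip's parser: `match_pattern`, `split_words`, `same_keyword`, and `Fields` are hypothetical names, and only plain keywords and required `{name}` fields are handled (no alternation, optional parts, or `...`).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Fields = std::map<std::string, std::string>;

// Splits a line into whitespace-separated words.
std::vector<std::string> split_words (const std::string &line) {
    std::istringstream stream (line);
    std::vector<std::string> words;
    std::string word;
    while (stream >> word) {
        words.push_back (word);
    }
    return words;
}

// Case-insensitive comparison, since keywords ignore case.
bool same_keyword (const std::string &a, const std::string &b) {
    if (a.size () != b.size ()) return false;
    for (std::size_t i = 0; i < a.size (); ++i) {
        if (std::tolower (static_cast<unsigned char> (a [i])) !=
            std::tolower (static_cast<unsigned char> (b [i]))) return false;
    }
    return true;
}

// Hypothetical matcher: returns true and fills `fields` when the command
// fits the pattern word for word; leaves `fields` untouched otherwise.
bool match_pattern (const std::string &pattern, const std::string &command, Fields &fields) {
    const std::vector<std::string> expected = split_words (pattern);
    const std::vector<std::string> actual = split_words (command);
    if (expected.size () != actual.size ()) return false;
    Fields extracted;
    for (std::size_t i = 0; i < expected.size (); ++i) {
        const std::string &token = expected [i];
        if (token.size () > 2 && token.front () == '{' && token.back () == '}') {
            // A {name} field accepts any word and stores it under that name.
            extracted [token.substr (1, token.size () - 2)] = actual [i];
        } else if (!same_keyword (token, actual [i])) {
            return false;
        }
    }
    fields = extracted;
    return true;
}

// Usage: the login pattern from the list above.
void demo_match_pattern () {
    const std::string pattern = "login as {username} with password {password}";
    Fields fields;
    assert (match_pattern (pattern, "LOGIN as perette with password i0remember", fields));
    assert (fields ["username"] == "perette");
    assert (fields ["password"] == "i0remember");
    assert (!match_pattern (pattern, "login as perette", fields));
}
```

Matching every pattern in turn like this is linear in the number of patterns; it says nothing about how Parsnip itself organizes its patterns.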
Command patterns can include references to a previously-defined option parser. Common uses of an option parser are:
Parsnip also provides Interpreter and Dispatcher templates. Interpreter is an abstract class; classes that will be accepting input for processing can subclass and implement this.
The dispatcher is initialized with a reference to a parser. It must also be taught mappings from command IDs to handlers. Subsequently, it may be called with a command line and context object; the command line is parsed, and the appropriate handler/interpreter invoked.
To use the parser, create a parser instance, then load statement definitions into it. Each statement has an ID number, which may or may not be unique depending on your needs. The parser constructs a parse tree from the statements; if statements are invalid or conflict, the parser throws an exception during its configuration.
There are 3 types of parsers:
A parser definition might look like this, though usually much longer:

    typedef enum my_commands_t {
        NOP, HELP, STATUS, HISTORY, CREATEUSER, ADDTOGROUP, DELETEUSER
    } Command_t;

    static Parsnip::Parser::Definitions statements = {
        { NOP, "" },
        { NOP, "# [{comment...}]" },
        { HELP, "help [{command}] ..." },
        { HELP, "? [{command}] ..." },
        { STATUS, "status" },
        { HISTORY, "history [{#index:1-9999}]" },
        { CREATEUSER, "create <guest|user|admin> {user} {passwd}" },
        { ADDTOGROUP, "add user {user} to [group] {group}" },
        { DELETEUSER, "delete user {user} ..." }
    };

Statement formats can be composed of:
- `keyword` matches on that word in that position (case is ignored).
- `{name}` accepts any value in that position, extracting the value into `name`.
- `<one|two>` accepts any of the listed words in that position.
- `<name:one|two>` is an alternation, extracting the matched word into the key `name`.
- `[optional]` accepts an optional keyword, which is automatically named.
- `[three|four]` optionally accepts any single keyword.
- `[name:three|four]` names an optional single keyword which is extracted.
- `[{optional-value}]` accepts an optional value, only as the final word.
- `...` allows 0 or more additional parameters.

Values may have a type and range as follows:

- `{string}`
- `{#numeric:3-5}` – Accepts a decimal integer in the range.
- `{#numeric:3.0-5.0}` – Accepts a decimal value in the range.
- `{#numeric:0x3-0x5}` – Accepts an octal, decimal, or hexadecimal integer in the range.
- `{name:optionParser}` – Accepts the option parser's sequence.

Values may be optional only if they are at the end of the command line, where they may be followed by "..." to indicate repetition:

- `{name}` – Accepts exactly one value.
- `[{name}]` – Accepts 0 or 1 value.
- `{name} ...` – Accepts 1 or more values.
- `[{name}] ...` – Accepts 0 or more values.

Option parsers may be used inline or in trailing form. Inline option parsers accept exactly one option sequence. Used in trailing form, they accept 0 or more sequences, in the same manner as for values (above).

- `inline {target:targetSpecifier} <action:abort|crash|segfault>` – match 1 targetSpecifier
- `trailer <type:movie|show> [{options:trailerOptions}] ...` – match 0 or more trailerOptions

It may be useful to allow other patterns alongside an option sequence:
example <type:math|science>
example {grade:gradeSpecifier} <type:math|science>
example [{grade:<gradeSpecifier>}] <type:math|science>
Suppose `gradeSpecifier` was defined as:
<year:k|kindergarten>
grade <year:k|kindergarten|freshman|sophomore|junior|senior>
grade {#year:1-12}
<category:undergrad|undergraduate|graduate> {#year:1-8}
Values extracted by the option parser are put into an inner dictionary named by the parser utilizing it:
{ "type": "math" }
{ "type": "science", "grade": { "year": 12}}
{"type": "science", "grade": { "category": "undergrad", "year": 4}}
Schemas provide a way to ensure data conforms to a format. Schemas can be constructed in two ways:
To create a parser-based schema:

    #include <parsnip_command.h>

    Parsnip::Parser my_parser;
    my_parser.addOptionParser (...);
    my_parser.addStatements (...);
    SchemaSet my_schema { my_parser };

You can tweak the schema with 3 calls: `addMember`, `replaceMember`, and `removeMember`.
Recall that each command line pattern has a command ID. The `SchemaSet` contains a schema for each of these. Where multiple patterns are defined for the same command ID, the schema accommodates all variants.
There are different approaches to embedding the request name in the request. For this example, let's say it is stored as a string in a member named "requestType".
We'll add a new member "requestType" to all the schemas; otherwise the unexpected member will cause schema validation to fail:

    #include <parsnip/parsnip_schema.h>

    my_schema.addMember ("requestType", Parsnip::StringSchema(), true);

Now we can validate:

    void process_json_message (const std::string &message_text) {
        Parsnip::Data message = Parsnip::parse_json (message_text);
        int command_id = my_lookup_function (message ["requestType"]);
        my_schema.validate (command_id, message);
        // No exception was thrown, so do the processing.
        switch (command_id) {
            case ...
        }
    }

The following compile-time defines affect Parsnip behavior:
- `PARSNIP_JSON_ENCODE_SOLIDUS`: When defined, the slash character, also known as "solidus", is literal encoded. The JSON specification describes this as optional.
- `PARSNIP_JSON_COMMENTS`: When defined, `/* C-style */` and `// C++ style` comments are ignored when reading JSON files.
- `PARSNIP_JSON_HEXADECIMAL_NUMBERS`: If defined, hexadecimal numbers are allowed in input files (e.g., 0x1f). Decimal will always be used when writing JSON.
- `PARSNIP_JSON_TRACK_POSITION`: If defined, JSON parsing errors include line and character numbers.