A minimal and portable JSON tokenizer written in standard C and C++ (two separate versions). It performs validating, highly efficient parsing suitable for reading JSON directly into custom data structures. There are no code dependencies: simply include jsont.{h,hh,c,cc} in your project.
Build and run unit tests:
    make
C API:
    jsont_ctx_t* S = jsont_create(0);
    jsont_reset(S, inbuf, inbuf_len);  // const uint8_t* inbuf, size_t inbuf_len
    jsont_tok_t tok = jsont_next(S);
    // branch on `tok` ...
    // V = jsont_*_value(S[, ...]);
    jsont_destroy(S);
New C++ API:
    jsont::Tokenizer S(inbuf, length);  // const char* inbuf, size_t length
    jsont::Token token;
    while ((token = S.next())) {
      if (token == jsont::Float) {
        printf("%g\n", S.floatValue());
      } ... else if (token == jsont::Error) {
        // handle error
        break;
      }
    }
    jsont::Builder json;
    json.startObject()
        .fieldName("foo").value(123.45)
        .fieldName("bar").startArray()
          .value(678)
          .value("nine \"ten\"")
        .endArray()
        .endObject();
    std::cout << json.toString() << std::endl;
    // {"foo":123.45,"bar":[678,"nine \"ten\""]}
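The last line of the builder example shows the inner quotes of the string value escaped in the output. How jsont performs this escaping internally is not described here, so the following is only an independent, minimal sketch of the JSON string escaping rules (RFC 8259). The helper `escapeJsonString` and the function `escapeExample` are hypothetical names and are not part of jsont.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of jsont): wraps `in` in double quotes and
// escapes the characters JSON does not allow unescaped inside a string:
// '"', '\\' and control characters below U+0020. All other bytes, including
// multi-byte UTF-8 sequences, are copied through unchanged.
std::string escapeJsonString(const std::string& in) {
  std::string out = "\"";
  for (unsigned char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:
        if (c < 0x20) {
          // Remaining control characters use the six-character \u00XX form.
          char buf[7];
          std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  out += '"';
  return out;
}

// Usage: the string value from the builder example above.
void escapeExample() {
  assert(escapeJsonString("nine \"ten\"") == "\"nine \\\"ten\\\"\"");
}
```

Escaping only what the grammar requires keeps the output compact and leaves UTF-8 input untouched; a real builder may make different choices.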
See jsont.h and jsont.hh for a complete overview of the API, including more detailed documentation. Here's an overview:
namespace jsont
Builder build()
— convenience builder factory
Tokenizer reads a sequence of bytes and produces tokens and values while doing so.
Tokenizer(const char* bytes, size_t length, TextEncoding encoding)
— initialize a new Tokenizer to read length bytes starting at bytes, in the given encoding
void reset(const char* bytes, size_t length, TextEncoding encoding)
— Reset the tokenizer, making it possible to reuse this parser so as to avoid unnecessary memory allocation and deallocation.
const Token& next() throw(Error)
— Read next token, possibly throwing an Error
const Token& current() const
— Access current token
bool hasValue() const
— True if the current token has a value
size_t dataValue(const char const** bytes)
— Returns a slice of the input which represents the current value, or nothing (returns 0) if the current token has no value (e.g. start of an object).
std::string stringValue() const
— Returns a copy of the current string value.
double floatValue() const
— Returns the current value as a double-precision floating-point number.
int64_t intValue() const
— Returns the current value as a signed 64-bit integer.
ErrorCode error() const
— Returns the error code of the last error
const char* errorMessage() const
— Returns a human-readable message for the last error. Never returns NULL.
const char* inputBytes() const
— A pointer to the input data as passed to reset or the constructor.
size_t inputSize() const
— Total number of input bytes
size_t inputOffset() const
— The byte offset into the input at which the tokenizer is currently positioned. In the event of an error, this will point to the source of the error.
Token values:
End
— Input ended
ObjectStart
— {
ObjectEnd
— }
ArrayStart
— [
ArrayEnd
— ]
True
— true
False
— false
Null
— null
Integer
— number value without a fraction part (access as int64 through Tokenizer::intValue())
Float
— number value with a fraction part (access as double through Tokenizer::floatValue())
String
— string value (access value through Tokenizer::stringValue() et al.)
FieldName
— field name (access value through Tokenizer::stringValue() et al.)
Error
— an error occurred (access error code through Tokenizer::error() et al.)
TextEncoding values:
UTF8TextEncoding
— Unicode UTF-8 text encoding
ErrorCode values:
UnspecifiedError
— Unspecified error
UnexpectedComma
— Unexpected comma
UnexpectedTrailingComma
— Unexpected trailing comma
InvalidByte
— Invalid input byte
PrematureEndOfInput
— Premature end of input
MalformedUnicodeEscapeSequence
— Malformed Unicode escape sequence
MalformedNumberLiteral
— Malformed number literal
UnterminatedString
— Unterminated string
SyntaxError
— Illegal JSON (syntax error)
Builder aids in building JSON, providing a final sequential byte buffer.
Builder()
— initialize a new builder with an empty backing buffer
Builder& startObject()
— Start an object (appends a '{' character to the backing buffer)
Builder& endObject()
— End an object (a '}' character)
Builder& startArray()
— Start an array ('[')
Builder& endArray()
— End an array (']')
const void reset()
— Reset the builder to its neutral state. Note that the backing buffer is reused in this case.
Builder& fieldName(const char* v, size_t length, TextEncoding encoding=UTF8TextEncoding)
— Adds a field name by copying length bytes from v.
Builder& fieldName(const std::string& name, TextEncoding encoding=UTF8TextEncoding)
— Adds a field name by copying name.
Builder& value(const char* v, size_t length, TextEncoding encoding=UTF8TextEncoding)
— Adds a string value by copying length bytes from v, whose content is encoded according to encoding.
Builder& value(const char* v)
— Adds a string value by copying strlen(v) bytes from the C string v. Uses the default encoding of value(const char*,size_t,TextEncoding).
Builder& value(const std::string& v)
— Adds a string value by copying v. Uses the default encoding of value(const char*,size_t,TextEncoding).
Builder& value(double v)
— Adds a possibly fractional number
Builder& value(int64_t v), void value(int v), void value(unsigned int v), void value(long v)
— Adds an integer number
Builder& value(bool v)
— Adds the "true" or "false" atom, depending on v
Builder& nullValue()
— Adds the "null" atom
size_t size() const
— Number of readable bytes at the pointer returned by bytes()
const char* bytes() const
— Pointer to the backing buffer, holding the resulting JSON.
std::string toString() const
— Return a std::string object holding a copy of the backing buffer, representing the JSON.
const char* seizeBytes(size_t& size_out)
— "Steal" the backing buffer. After this call, the caller is responsible for calling free() on the returned pointer. Returns NULL on failure. Sets the value of size_out to the number of readable bytes at the returned pointer. The builder will be reset and ready to use (which will act on a new backing buffer).
C API types:
jsont_ctx_t
— A tokenizer context ("instance" in OOP lingo).
jsont_tok_t
— A token type (see "Token types").
jsont_err_t
— A user-configurable error type, which defaults to const char*.
C API functions:
jsont_ctx_t* jsont_create(void* user_data)
— Create a new JSON tokenizer context.
void jsont_destroy(jsont_ctx_t* ctx)
— Destroy a JSON tokenizer context.
void jsont_reset(jsont_ctx_t* ctx, const uint8_t* bytes, size_t length)
— Reset the tokenizer to parse the data pointed to by bytes.
jsont_tok_t jsont_next(jsont_ctx_t* ctx)
— Read and return the next token.
jsont_tok_t jsont_current(const jsont_ctx_t* ctx)
— Returns the current token (last token read by jsont_next).
int64_t jsont_int_value(jsont_ctx_t* ctx)
— Returns the current integer value.
double jsont_float_value(jsont_ctx_t* ctx)
— Returns the current floating-point number value.
size_t jsont_data_value(jsont_ctx_t* ctx, const uint8_t** bytes)
— Returns a slice of the input which represents the current value.
char* jsont_strcpy_value(jsont_ctx_t* ctx)
— Retrieve a newly allocated C string.
bool jsont_data_equals(jsont_ctx_t* ctx, const uint8_t* bytes, size_t length)
— Returns true if the current data value is equal to the length bytes at bytes.
bool jsont_str_equals(jsont_ctx_t* ctx, const char* str)
— Returns true if the current data value is equal to the C string str.
Note that the data is not parsed until you call one of these functions. This means that if you know that a value transferred as a string will fit in a 64-bit signed integer, it's completely valid to call jsont_int_value to parse the string as an integer.
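To illustrate the note above: because the tokenizer only hands out a slice of the input (as jsont_data_value does), converting that slice can be deferred until a typed accessor is asked for it. The sketch below is not jsont's implementation. `Slice`, `sliceToInt64` and `lazyIntExample` are hypothetical names, and the sketch assumes the value fits in a signed 64-bit integer (there is no overflow check).

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the (pointer, length) pair that a call such as
// jsont_data_value yields: a view into the input buffer, nothing is copied.
struct Slice {
  const char* bytes;
  size_t length;
};

// Converts a slice of ASCII digits, with an optional leading '-', into a
// signed 64-bit integer. Returns false if the slice has no digits or contains
// any other character. Assumption: the value fits in int64_t.
bool sliceToInt64(Slice s, int64_t* out) {
  size_t i = 0;
  bool negative = false;
  if (i < s.length && s.bytes[i] == '-') {
    negative = true;
    ++i;
  }
  if (i == s.length) return false;  // empty, or a lone '-'
  int64_t v = 0;
  for (; i < s.length; ++i) {
    char c = s.bytes[i];
    if (c < '0' || c > '9') return false;
    v = v * 10 - (c - '0');  // accumulate as a negative number
  }
  *out = negative ? v : -v;
  return true;
}

// Usage: a number transferred as the JSON string "12345". The slice covers
// the five characters between the quotes inside the input buffer.
void lazyIntExample() {
  const char input[] = "{\"id\":\"12345\"}";
  Slice value = { input + 7, 5 };
  int64_t id = 0;
  assert(sliceToInt64(value, &id) && id == 12345);
}
```

Deferring the conversion means values the caller never asks for cost nothing beyond tokenization.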
uint8_t jsont_current_byte(jsont_ctx_t* ctx)
— Get the last byte read.
size_t jsont_current_offset(jsont_ctx_t* ctx)
— Get the offset of the last byte read.
jsont_err_t jsont_error_info(jsont_ctx_t* ctx)
— Get information on the last error.
void* jsont_user_data(const jsont_ctx_t* ctx)
— Returns the value passed to jsont_create.
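A byte offset such as the one returned by jsont_current_offset (or Tokenizer::inputOffset() in the C++ API) is often easier to act on in an error message once it is turned into a line and column. A minimal sketch of that conversion follows. `Position`, `locate` and `locateExample` are hypothetical helpers, not part of jsont, and columns are counted in bytes rather than in characters.

```cpp
#include <cassert>
#include <stddef.h>

// 1-based line and column of a byte within the input.
struct Position {
  size_t line;
  size_t column;
};

// Walks the input up to `offset`, counting newlines. An offset past the end
// of the input is clamped to `size`.
Position locate(const char* bytes, size_t size, size_t offset) {
  if (offset > size) offset = size;
  Position p = {1, 1};
  for (size_t i = 0; i < offset; ++i) {
    if (bytes[i] == '\n') {
      ++p.line;
      p.column = 1;
    } else {
      ++p.column;
    }
  }
  return p;
}

// Usage: offset 9 is the '?' on the second line of this invalid input.
void locateExample() {
  const char input[] = "{\n  \"a\": ?\n}";
  Position p = locate(input, sizeof(input) - 1, 9);
  assert(p.line == 2 && p.column == 8);
}
```

The scan is linear in the offset, which is usually acceptable because it only runs on the error path.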
Token types:
JSONT_END
— Input ended.
JSONT_ERR
— Error. Retrieve details through jsont_error_info.
JSONT_OBJECT_START
— {
JSONT_OBJECT_END
— }
JSONT_ARRAY_START
— [
JSONT_ARRAY_END
— ]
JSONT_TRUE
— true
JSONT_FALSE
— false
JSONT_NULL
— null
JSONT_NUMBER_INT
— number value without a fraction part (access through jsont_int_value or jsont_float_value)
JSONT_NUMBER_FLOAT
— number value with a fraction part (access through jsont_float_value)
JSONT_STRING
— string value (access through jsont_data_value or jsont_strcpy_value)
JSONT_FIELD_NAME
— field name (access through jsont_data_value or jsont_strcpy_value)
See example*.c for working sample programs.
See LICENSE for the MIT-style license under which this project is licensed.