| Title: | Repair Malformed JSON Strings |
|---|---|
| Description: | Repairs malformed JSON strings, particularly those generated by Large Language Models. Handles missing quotes, trailing commas, unquoted keys, and other common JSON syntax errors. |
| Authors: | Dyfan Jones [aut, cre] |
| Maintainer: | Dyfan Jones |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-03 08:39:46 UTC |
| Source: | https://github.com/DyfanJones/llmjson |
json_schema() compiles a schema definition into an internal representation that can be reused across multiple JSON repair operations. The schema is parsed only once, which improves performance when many JSON strings are repaired against the same schema.

    json_schema(schema, ...)

    ## S3 method for class 'LLMJsonSchema'
    json_schema(schema, ...)

    ## S3 method for class 'ellmer::Type'
    json_schema(schema, ...)

| schema | A schema definition; the supported kinds are listed below |
|---|---|
| ... | Additional arguments passed to methods |

The function is a generic that supports:

- LLMJsonSchema objects: created with json_object(), json_integer(), etc.
- ellmer Type objects: automatically converted from ellmer's type system (requires the ellmer package)

An LLMJsonSchemaBuilt object (external pointer) that can be passed to repair_json_str(), repair_json_file(), or repair_json_raw().
See also: repair_json_str(), repair_json_file(), repair_json_raw(), repair_json_conn(), schema()

    # Create a schema using llmjson functions
    schema <- json_object(
      name = json_string(),
      age = json_integer(),
      email = json_string()
    )

    # Build it once
    built_schema <- json_schema(schema)

    # Reuse many times - much faster than rebuilding each time!
    repair_json_str('{"name": "Alice", "age": 30}', built_schema)
    repair_json_str('{"name": "Bob", "age": 25}', built_schema)

    ## Not run:
    # Convert from ellmer types (requires ellmer package)
    library(ellmer)
    user_type <- type_object(
      name = type_string(required = TRUE),
      age = type_integer(),
      status = type_enum(c("active", "inactive"), required = TRUE)
    )

    # Automatically converts ellmer type to llmjson schema
    built_schema <- json_schema(user_type)
    repair_json_str(
      '{"name": "Alice", "age": 30, "status": "active"}',
      schema = built_schema,
      return_objects = TRUE
    )
    ## End(Not run)

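The build-once, reuse-many pattern described for json_schema() can be sketched over a batch of inputs. This is a minimal illustration that uses only functions documented on this page; the `raw_outputs` vector is hypothetical sample data, and the exact repaired strings are not shown because they depend on the package version.

```r
library(llmjson)

# Schema assembled from the constructors documented on this page
user_schema <- json_object(
  name = json_string(),
  age  = json_integer()
)

# Compile the schema once, outside the loop
built <- json_schema(user_schema)

# Hypothetical batch of LLM outputs, each with a common defect
raw_outputs <- c(
  '{"name": "Alice", "age": 30,}',  # trailing comma
  '{name: "Bob", "age": 25}',       # unquoted key
  '{"name": "Carol", "age": 41}'    # already valid
)

# Reuse the same compiled schema for every string
repaired <- lapply(raw_outputs, repair_json_str, schema = built)
```

Keeping the json_schema() call outside the lapply() is the point of the sketch: only the repair step runs once per string.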
repair_json_conn() reads JSON from an R connection (such as a file, URL, or pipe) and repairs it. The connection is read and the content is passed to repair_json_str() for repair.

    repair_json_conn(
      conn,
      schema = NULL,
      return_objects = FALSE,
      ensure_ascii = TRUE,
      int64 = "double"
    )

| conn | A connection object (e.g., from …) |
|---|---|
| schema | Optional schema definition for validation and type conversion |
| return_objects | Logical indicating whether to return R objects (TRUE) or JSON string (FALSE, default) |
| ensure_ascii | Logical; if TRUE, escape non-ASCII characters |
| int64 | Policy for handling 64-bit integers: "double" (default, may lose precision), "string" (preserves exact value), or "bit64" (requires bit64 package) |

A character string containing the repaired JSON, or an R object if return_objects is TRUE
See also: repair_json_str(), repair_json_file(), repair_json_raw(), schema(), json_schema()

    ## Not run:
    # Read from a file connection
    conn <- file("malformed.json", "r")
    result <- repair_json_conn(conn)
    close(conn)

    # Read from a URL
    conn <- url("https://example.com/data.json")
    result <- repair_json_conn(conn, return_objects = TRUE)
    close(conn)

    # Read from a compressed file
    conn <- gzfile("data.json.gz", "r")
    result <- repair_json_conn(conn, return_objects = TRUE, int64 = "string")
    close(conn)

    # Or use tryCatch(finally = ) to ensure the connection is closed,
    # even if the repair step raises an error
    conn <- file("malformed.json", "r")
    result <- tryCatch(repair_json_conn(conn), finally = close(conn))
    ## End(Not run)

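The description of repair_json_conn() says the connection is read and its content is handed to repair_json_str(). A rough manual equivalent might look like the sketch below. The helper name `repair_via_text` is hypothetical, and reading with readLines() is an assumption about how the content is consumed, not a statement about the package internals.

```r
library(llmjson)

# Hypothetical helper mirroring the described behaviour: read the whole
# connection as text, then hand that text to repair_json_str().
# Assumption: the content is line-oriented text.
repair_via_text <- function(conn, ...) {
  text <- paste(readLines(conn, warn = FALSE), collapse = "\n")
  repair_json_str(text, ...)
}

# Write a small malformed document to a temporary file
tmp <- tempfile(fileext = ".json")
writeLines('{"key": "value",}', tmp)

# Close the connection whether or not the repair succeeds
conn <- file(tmp, "r")
manual <- tryCatch(repair_via_text(conn), finally = close(conn))
unlink(tmp)
```

In practice repair_json_conn() should be preferred; the sketch only makes the read-then-repair flow explicit.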
repair_json_file() reads a file containing malformed JSON and repairs it.

    repair_json_file(
      path,
      schema = NULL,
      return_objects = FALSE,
      ensure_ascii = TRUE,
      int64 = "double"
    )

| path | A character string with the file path |
|---|---|
| schema | Optional schema definition for validation and type conversion |
| return_objects | Logical indicating whether to return R objects (TRUE) or JSON string (FALSE, default) |
| ensure_ascii | Logical; if TRUE, escape non-ASCII characters |
| int64 | Policy for handling 64-bit integers: "double" (default, may lose precision), "string" (preserves exact value), or "bit64" (requires bit64 package) |

A character string containing the repaired JSON, or an R object if return_objects is TRUE
See also: repair_json_str(), repair_json_raw(), repair_json_conn(), schema(), json_schema()

    ## Not run:
    repair_json_file("malformed.json")
    repair_json_file("malformed.json", return_objects = TRUE)
    repair_json_file("data.json", return_objects = TRUE, int64 = "string") # Preserve large integers
    ## End(Not run)

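The examples above assume that malformed.json already exists. A self-contained variant, sketched here with a temporary file and hypothetical sample content, writes the malformed input first so the calls can be run as-is.

```r
library(llmjson)

# Hypothetical malformed document: unquoted key and trailing commas
tmp <- tempfile(fileext = ".json")
writeLines(
  c('{', '  name: "Alice",', '  "tags": ["a", "b",],', '}'),
  tmp
)

# Default: repaired JSON returned as text
as_text <- repair_json_file(tmp)

# Alternatively, convert the repaired JSON to an R object
as_object <- repair_json_file(tmp, return_objects = TRUE)

unlink(tmp)
```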
repair_json_raw() repairs malformed JSON supplied as a raw vector of bytes.

    repair_json_raw(
      raw_bytes,
      schema = NULL,
      return_objects = FALSE,
      ensure_ascii = TRUE,
      int64 = "double"
    )

| raw_bytes | A raw vector containing malformed JSON bytes |
|---|---|
| schema | Optional schema definition for validation and type conversion |
| return_objects | Logical indicating whether to return R objects (TRUE) or JSON string (FALSE, default) |
| ensure_ascii | Logical; if TRUE, escape non-ASCII characters |
| int64 | Policy for handling 64-bit integers: "double" (default, may lose precision), "string" (preserves exact value), or "bit64" (requires bit64 package) |

A character string containing the repaired JSON, or an R object if return_objects is TRUE
See also: repair_json_str(), repair_json_file(), repair_json_conn(), schema(), json_schema()

    ## Not run:
    raw_data <- charToRaw('{"key": "value",}')
    repair_json_raw(raw_data)
    repair_json_raw(raw_data, return_objects = TRUE)
    repair_json_raw(raw_data, return_objects = TRUE, int64 = "bit64") # Use bit64 for large integers
    ## End(Not run)

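Raw vectors often come from binary reads rather than from charToRaw(). As a hedged sketch, the bytes of a file can be obtained with base R's readBin() and then passed to repair_json_raw(); the temporary file and its content are hypothetical.

```r
library(llmjson)

# Write a small malformed document to a temporary file
tmp <- tempfile(fileext = ".json")
writeLines('{"key": "value",}', tmp)

# Read the file as raw bytes instead of text
raw_data <- readBin(tmp, what = "raw", n = file.size(tmp))
unlink(tmp)

# Repair directly from the bytes
repaired <- repair_json_raw(raw_data)
```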
repair_json_str() repairs malformed JSON strings, particularly those generated by Large Language Models. It handles missing quotes, trailing commas, unquoted keys, and other common JSON syntax errors.

    repair_json_str(
      json_str,
      schema = NULL,
      return_objects = FALSE,
      ensure_ascii = TRUE,
      int64 = "double"
    )

| json_str | A character string containing malformed JSON |
|---|---|
| schema | Optional schema definition for validation and type conversion |
| return_objects | Logical indicating whether to return R objects (TRUE) or JSON string (FALSE, default) |
| ensure_ascii | Logical; if TRUE, escape non-ASCII characters |
| int64 | Policy for handling 64-bit integers: "double" (default, may lose precision), "string" (preserves exact value), or "bit64" (requires bit64 package) |

A character string containing the repaired JSON, or an R object if return_objects is TRUE
See also: repair_json_file(), repair_json_raw(), repair_json_conn(), schema(), json_schema()

    repair_json_str('{"key": "value",}') # Removes trailing comma
    repair_json_str('{key: "value"}') # Adds quotes around unquoted key
    repair_json_str('{"key": "value"}', return_objects = TRUE) # Returns R list

    # Handle large integers (beyond i32 range)
    json_str <- '{"id": 9007199254740993}'

    # Preserves as "9007199254740993"
    repair_json_str(
      json_str,
      return_objects = TRUE,
      int64 = "string"
    )

    # May lose precision
    repair_json_str(
      json_str,
      return_objects = TRUE,
      int64 = "double"
    )

    # Requires bit64 package
    repair_json_str(
      json_str,
      return_objects = TRUE,
      int64 = "bit64"
    )

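The reason the "double" policy may lose precision can be shown with base R alone, independent of llmjson. The sketch below uses the same value as the example above, 9007199254740993, which is 2^53 + 1 and has no exact double representation. The bit64 lines are optional and run only if that package is installed.

```r
# 2^53 + 1 cannot be represented exactly as a double;
# R stores the nearest representable value instead
big <- 9007199254740993
sprintf("%.0f", big)        # "9007199254740992"
big == 9007199254740992     # TRUE

# R's native integer type is 32-bit, so it cannot hold the value either
.Machine$integer.max        # 2147483647

# A character string keeps every digit
id_chr <- "9007199254740993"
nchar(id_chr)               # 16

# bit64 offers a 64-bit integer type (only run if bit64 is available)
if (requireNamespace("bit64", quietly = TRUE)) {
  id_i64 <- bit64::as.integer64(id_chr)
  print(id_i64)
}
```

This is why identifiers of this size are usually safer under int64 = "string" or int64 = "bit64".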
The schema constructors (json_object(), json_integer(), json_string(), and the others listed below) create schema definitions that guide JSON repair and conversion to R objects. Schemas ensure that the repaired JSON conforms to the expected types and structure.

    json_object(..., .required = FALSE)

    json_integer(.default = 0L, .required = FALSE)

    json_number(.default = 0, .required = FALSE)

    json_string(.default = "", .required = FALSE)

    json_boolean(.default = FALSE, .required = FALSE)

    json_enum(.values, .default = .values[1], .required = FALSE)

    json_array(items, .required = FALSE)

    json_any(.required = FALSE)

    json_date(.default = NULL, .format = "iso8601", .required = FALSE)

    json_timestamp(
      .default = NULL,
      .format = "iso8601",
      .tz = "UTC",
      .required = FALSE
    )

| ... | Named arguments defining the schema for each field (json_object only) |
|---|---|
| .required | Logical; if TRUE, field must be present (default FALSE) |
| .default | Default value to use when field is missing. Only applies to required fields (.required = TRUE) |
| .values | Character vector of allowed values (json_enum only) |
| items | Schema definition for array elements (json_array only) |
| .format | Format string(s) for parsing dates/timestamps (json_date/json_timestamp only) |
| .tz | Timezone to use for parsing timestamps (json_timestamp only). Defaults to "UTC" |

A schema definition object
See also: repair_json_str(), repair_json_file(), repair_json_raw(), repair_json_conn(), json_schema()

    # Basic types
    json_string()
    json_integer()
    json_number()
    json_boolean()
    json_any()

    # Object with fields
    schema <- json_object(
      name = json_string(),
      age = json_integer(),
      email = json_string()
    )

    # Array of integers
    json_array(json_integer())

    # Enum with allowed values
    json_enum(c("active", "inactive", "pending"))

    # Optional fields with defaults
    json_object(
      name = json_string(.required = TRUE),
      age = json_integer(.default = 0L),
      active = json_boolean(.default = TRUE, .required = TRUE),
      status = json_enum(c("active", "inactive"), .required = TRUE)
    )

    # Date and timestamp handling
    json_object(
      birthday = json_date(.format = "us_date"),
      created_at = json_timestamp(.format = "iso8601z", .tz = "UTC")
    )

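The constructors above can be composed and then used end to end. The following sketch assumes that json_array() accepts a json_object() as its items schema, which the argument description suggests but the examples do not show directly. The `order_schema` fields and the `malformed` string are hypothetical.

```r
library(llmjson)

# Hypothetical nested schema: an order with an array of line items.
# Assumption: json_array() accepts a json_object() as its element schema.
order_schema <- json_object(
  id     = json_integer(.required = TRUE),
  status = json_enum(c("open", "shipped", "closed"), .required = TRUE),
  items  = json_array(
    json_object(
      sku = json_string(.required = TRUE),
      qty = json_integer(.default = 1L, .required = TRUE)
    )
  )
)

# Compile the schema, then repair and convert in one step
built <- json_schema(order_schema)

malformed <- '{id: 17, "status": "open", "items": [{"sku": "A-1", "qty": 2,},]}'
order <- repair_json_str(malformed, schema = built, return_objects = TRUE)

# Inspect the structure of the converted result
str(order)
```

Calling str() on the result is a quick way to check that each field came back with the type the schema asked for.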