Basic types

Constants

const std::size_t orcus::INDEX_NOT_FOUND

Generic constant to be used to indicate that a valid index value is expected but not found.
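As a rough illustration of how such a sentinel is typically consumed, the sketch below uses a stand-in constant and a hypothetical `find_index` helper. Neither is part of orcus, and the value chosen for the stand-in (the largest `std::size_t`) is an assumption, since the actual value of `orcus::INDEX_NOT_FOUND` is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string_view>
#include <vector>

// Stand-in for orcus::INDEX_NOT_FOUND; the concrete value is an assumption.
constexpr std::size_t index_not_found = std::numeric_limits<std::size_t>::max();

// Hypothetical helper: return the position of `key` in `items`, or the
// sentinel when `key` is absent.
std::size_t find_index(const std::vector<std::string_view>& items, std::string_view key)
{
    for (std::size_t i = 0; i < items.size(); ++i)
    {
        if (items[i] == key)
            return i;
    }
    return index_not_found;
}
```

A caller would compare the result against the sentinel before using it as an index.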

const xmlns_id_t orcus::XMLNS_UNKNOWN_ID

Value associated with an unknown XML namespace.

const xml_token_t orcus::XML_UNKNOWN_TOKEN

Value associated with an unknown XML token.

Type aliases

using orcus::xml_token_attrs_t = std::vector<xml_token_attr_t>

using orcus::xml_token_t = std::size_t

Integral type that represents a tokenized XML element name.

using orcus::xmlns_id_t = const char*

Type that represents a normalized XML namespace identifier. Internally it is a pointer value that points to a static char buffer that stores a namespace name.
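The practical consequence of a pointer-based identifier is that two namespaces can be compared by pointer equality rather than by string comparison. The following is a minimal sketch of that idea using a hypothetical `ns_pool` class and `ns_id_t` alias; it is not how orcus itself stores namespace names, only one way to obtain one stable pointer per distinct name.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical stand-in for orcus::xmlns_id_t.
using ns_id_t = const char*;

// Hypothetical pool that hands out one stable pointer per distinct name.
class ns_pool
{
    std::unordered_set<std::string> m_names;

public:
    ns_id_t intern(std::string_view name)
    {
        // Elements of std::unordered_set are never relocated after insertion,
        // so the character buffer of each stored string stays where it is.
        auto it = m_names.emplace(name).first;
        return it->c_str();
    }
};

// Two identifiers denote the same namespace exactly when the pointers match.
bool same_namespace(ns_id_t a, ns_id_t b)
{
    assert(a && b); // identifiers handed out by the pool are never null
    return a == b;
}
```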

Structs

struct date_time_t

Struct that holds a date or date-time value.

Public Functions

date_time_t()
date_time_t(int _year, int _month, int _day)
date_time_t(int _year, int _month, int _day, int _hour, int _minute, double _second)
date_time_t(const date_time_t &other)
~date_time_t()
date_time_t &operator=(date_time_t other)
bool operator==(const date_time_t &other) const
bool operator!=(const date_time_t &other) const
bool operator<(const date_time_t &other) const
std::string to_string() const

Convert the date-time value to an ISO-formatted string representation.

Returns:

ISO-formatted string representation of the date-time value.

void swap(date_time_t &other)

Swap the value with another instance.

Parameters:

other – another instance to swap values with.

Public Members

int year
int month
int day
int hour
int minute
double second

Public Static Functions

static date_time_t from_chars(std::string_view str)

Parse an ISO-formatted string representation of a date-time value and convert it into a date_time_t value. The string may contain either a date only or a date and time, but not a time only.

Here are some examples of ISO-formatted date and date-time values:

  • 2013-04-09 (date only)

  • 2013-04-09T21:34:09.55 (date and time)

Parameters:

str – string representation of a date-time value.

Returns:

converted date-time value consisting of a set of numeric values.
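To make the accepted input shapes concrete, here is a small standalone sketch that parses the two forms listed above. The `simple_date_time` struct and `parse_iso` function are hypothetical stand-ins, not the orcus implementation, and the sketch performs no range validation on the parsed fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in with the same fields as date_time_t.
struct simple_date_time
{
    int year = 0;
    int month = 0;
    int day = 0;
    int hour = 0;
    int minute = 0;
    double second = 0.0;
};

// Parse "YYYY-MM-DD" or "YYYY-MM-DDThh:mm:ss[.fff]". Returns std::nullopt for
// anything else, including a time-only string.
std::optional<simple_date_time> parse_iso(std::string_view str)
{
    const std::string buf{str}; // sscanf needs a null-terminated buffer
    simple_date_time dt;
    int consumed = 0;

    // Date part; %n records how many characters have been consumed.
    if (std::sscanf(buf.c_str(), "%d-%d-%d%n", &dt.year, &dt.month, &dt.day, &consumed) != 3)
        return std::nullopt;

    if (static_cast<std::size_t>(consumed) == buf.size())
        return dt; // date only

    // Optional time part introduced by 'T'.
    int tail = 0;
    if (std::sscanf(buf.c_str() + consumed, "T%d:%d:%lf%n", &dt.hour, &dt.minute, &dt.second, &tail) != 3)
        return std::nullopt;

    if (static_cast<std::size_t>(consumed + tail) != buf.size())
        return std::nullopt; // trailing characters after the seconds

    return dt;
}
```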

struct length_t

Holds a length value with unit of measurement.

Public Functions

length_t()
length_t(length_unit_t _unit, double _value)
length_t(const length_t &other)
length_t &operator=(const length_t &other)
std::string to_string() const
bool operator==(const length_t &other) const noexcept
bool operator!=(const length_t &other) const noexcept

Public Members

length_unit_t unit
double value

struct parse_error_value_t

Parser token that represents the state of a parse error, used by threaded_json_parser and threaded_sax_token_parser when transferring parse status between threads.

Public Functions

parse_error_value_t()
parse_error_value_t(const parse_error_value_t &other)
parse_error_value_t(std::string_view _str, std::ptrdiff_t _offset)
parse_error_value_t &operator=(const parse_error_value_t &other)
bool operator==(const parse_error_value_t &other) const
bool operator!=(const parse_error_value_t &other) const

Public Members

std::string_view str

error message associated with the parse error.

std::ptrdiff_t offset

offset in stream where the error occurred.

struct xml_declaration_t

Struct holding XML declaration properties.

Public Functions

xml_declaration_t()
xml_declaration_t(uint8_t _version_major, uint8_t _version_minor, character_set_t _encoding, bool _standalone)
xml_declaration_t(const xml_declaration_t &other)
~xml_declaration_t()
xml_declaration_t &operator=(const xml_declaration_t &other)
bool operator==(const xml_declaration_t &other) const
bool operator!=(const xml_declaration_t &other) const

Public Members

uint8_t version_major
uint8_t version_minor
character_set_t encoding
bool standalone

struct xml_name_t

Represents a name with a normalized namespace in XML documents. This can be used either as an element name or as an attribute name.

Public Types

enum to_string_type

Values:

enumerator use_alias
enumerator use_short_name

Public Functions

xml_name_t() noexcept
xml_name_t(xmlns_id_t _ns, std::string_view _name)
xml_name_t(const xml_name_t &other)
xml_name_t &operator=(const xml_name_t &other)
bool operator==(const xml_name_t &other) const noexcept
bool operator!=(const xml_name_t &other) const noexcept
std::string to_string(const xmlns_context &cxt, to_string_type type) const

Convert a namespace-name value pair to a string representation with the namespace value converted to either an alias or a unique “short name”. Refer to get_alias() and get_short_name() for explanations of an alias and a short name.

Parameters:
  • cxt – namespace context object associated with the XML stream currently being parsed.

  • type – policy on how to convert a namespace identifier to a string representation.

Returns:

string representation of a namespace-name value pair.

std::string to_string(const xmlns_repository &repo) const

Convert a namespace-name value pair to a string representation with the namespace value converted to a unique “short name”. Refer to get_short_name() for an explanation of a short name.

Parameters:

repo – namespace repository.

Returns:

string representation of a namespace-name value pair.

Public Members

xmlns_id_t ns
std::string_view name

struct xml_token_attr_t

Struct containing properties of a tokenized XML attribute.

Public Functions

xml_token_attr_t()
xml_token_attr_t(const xml_token_attr_t &other)
xml_token_attr_t(xmlns_id_t _ns, xml_token_t _name, std::string_view _value, bool _transient)
xml_token_attr_t(xmlns_id_t _ns, xml_token_t _name, std::string_view _raw_name, std::string_view _value, bool _transient)
xml_token_attr_t &operator=(const xml_token_attr_t &other)

Public Members

xmlns_id_t ns
xml_token_t name
std::string_view raw_name
std::string_view value
bool transient

Whether the attribute value is transient. A transient value is guaranteed to be valid only until the end of the start_element call. A non-transient value remains valid for the life cycle of the XML stream it belongs to.
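In practice this means a handler that wants to keep an attribute value past the callback should copy it when the flag is set. The sketch below shows one way to do that with a hypothetical `attr_view` struct (a trimmed-down stand-in for xml_token_attr_t) and a hypothetical `value_store` class; neither name comes from orcus.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <string_view>

// Hypothetical, trimmed-down stand-in for xml_token_attr_t.
struct attr_view
{
    std::string_view value;
    bool transient = false;
};

// Hypothetical store that owns copies of transient values.
class value_store
{
    // std::deque keeps existing elements in place when new ones are appended,
    // so views into earlier copies stay valid.
    std::deque<std::string> m_pool;

public:
    // Return a view that is safe to keep after the callback returns.
    std::string_view retain(const attr_view& attr)
    {
        if (!attr.transient)
            return attr.value; // still backed by the stream for its lifetime

        m_pool.emplace_back(attr.value); // copy the bytes before they go away
        return m_pool.back();
    }
};
```

Non-transient values are passed through without copying, so the extra allocation is paid only where it is needed.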

struct xml_token_element_t

Struct containing XML element properties passed to the handler of sax_token_parser via its start_element() and end_element() calls.

Public Functions

xml_token_element_t &operator=(xml_token_element_t) = delete
xml_token_element_t()
xml_token_element_t(xmlns_id_t _ns, xml_token_t _name, std::string_view _raw_name, std::vector<xml_token_attr_t> &&_attrs)
xml_token_element_t(const xml_token_element_t &other)
xml_token_element_t(xml_token_element_t &&other)

Public Members

xmlns_id_t ns
xml_token_t name
std::string_view raw_name
xml_token_attrs_t attrs

Enums

enum class orcus::character_set_t

Character set types, generated from IANA character-sets specifications.

Values:

enumerator unspecified
enumerator adobe_standard_encoding
enumerator adobe_symbol_encoding
enumerator amiga_1251
enumerator ansi_x3_110_1983
enumerator asmo_449
enumerator big5
enumerator big5_hkscs
enumerator bocu_1
enumerator brf
enumerator bs_4730
enumerator bs_viewdata
enumerator cesu_8
enumerator cp50220
enumerator cp51932
enumerator csa_z243_4_1985_1
enumerator csa_z243_4_1985_2
enumerator csa_z243_4_1985_gr
enumerator csn_369103
enumerator dec_mcs
enumerator din_66003
enumerator dk_us
enumerator ds_2089
enumerator ebcdic_at_de
enumerator ebcdic_at_de_a
enumerator ebcdic_ca_fr
enumerator ebcdic_dk_no
enumerator ebcdic_dk_no_a
enumerator ebcdic_es
enumerator ebcdic_es_a
enumerator ebcdic_es_s
enumerator ebcdic_fi_se
enumerator ebcdic_fi_se_a
enumerator ebcdic_fr
enumerator ebcdic_it
enumerator ebcdic_pt
enumerator ebcdic_uk
enumerator ebcdic_us
enumerator ecma_cyrillic
enumerator es
enumerator es2
enumerator euc_jp
enumerator euc_kr
enumerator extended_unix_code_fixed_width_for_japanese
enumerator gb18030
enumerator gb2312
enumerator gb_1988_80
enumerator gb_2312_80
enumerator gbk
enumerator gost_19768_74
enumerator greek7
enumerator greek7_old
enumerator greek_ccitt
enumerator hp_desktop
enumerator hp_legal
enumerator hp_math8
enumerator hp_pi_font
enumerator hp_roman8
enumerator hz_gb_2312
enumerator ibm00858
enumerator ibm00924
enumerator ibm01140
enumerator ibm01141
enumerator ibm01142
enumerator ibm01143
enumerator ibm01144
enumerator ibm01145
enumerator ibm01146
enumerator ibm01147
enumerator ibm01148
enumerator ibm01149
enumerator ibm037
enumerator ibm038
enumerator ibm1026
enumerator ibm1047
enumerator ibm273
enumerator ibm274
enumerator ibm275
enumerator ibm277
enumerator ibm278
enumerator ibm280
enumerator ibm281
enumerator ibm284
enumerator ibm285
enumerator ibm290
enumerator ibm297
enumerator ibm420
enumerator ibm423
enumerator ibm424
enumerator ibm437
enumerator ibm500
enumerator ibm775
enumerator ibm850
enumerator ibm851
enumerator ibm852
enumerator ibm855
enumerator ibm857
enumerator ibm860
enumerator ibm861
enumerator ibm862
enumerator ibm863
enumerator ibm864
enumerator ibm865
enumerator ibm866
enumerator ibm868
enumerator ibm869
enumerator ibm870
enumerator ibm871
enumerator ibm880
enumerator ibm891
enumerator ibm903
enumerator ibm904
enumerator ibm905
enumerator ibm918
enumerator ibm_symbols
enumerator ibm_thai
enumerator iec_p27_1
enumerator inis
enumerator inis_8
enumerator inis_cyrillic
enumerator invariant
enumerator iso_10367_box
enumerator iso_10646_j_1
enumerator iso_10646_ucs_2
enumerator iso_10646_ucs_4
enumerator iso_10646_ucs_basic
enumerator iso_10646_unicode_latin1
enumerator iso_10646_utf_1
enumerator iso_11548_1
enumerator iso_2022_cn
enumerator iso_2022_cn_ext
enumerator iso_2022_jp
enumerator iso_2022_jp_2
enumerator iso_2022_kr
enumerator iso_2033_1983
enumerator iso_5427
enumerator iso_5427_1981
enumerator iso_5428_1980
enumerator iso_646_basic_1983
enumerator iso_646_irv_1983
enumerator iso_6937_2_25
enumerator iso_6937_2_add
enumerator iso_8859_1
enumerator iso_8859_10
enumerator iso_8859_13
enumerator iso_8859_14
enumerator iso_8859_15
enumerator iso_8859_16
enumerator iso_8859_1_windows_3_0_latin_1
enumerator iso_8859_1_windows_3_1_latin_1
enumerator iso_8859_2
enumerator iso_8859_2_windows_latin_2
enumerator iso_8859_3
enumerator iso_8859_4
enumerator iso_8859_5
enumerator iso_8859_6
enumerator iso_8859_6_e
enumerator iso_8859_6_i
enumerator iso_8859_7
enumerator iso_8859_8
enumerator iso_8859_8_e
enumerator iso_8859_8_i
enumerator iso_8859_9
enumerator iso_8859_9_windows_latin_5
enumerator iso_8859_supp
enumerator iso_ir_90
enumerator iso_unicode_ibm_1261
enumerator iso_unicode_ibm_1264
enumerator iso_unicode_ibm_1265
enumerator iso_unicode_ibm_1268
enumerator iso_unicode_ibm_1276
enumerator it
enumerator jis_c6220_1969_jp
enumerator jis_c6220_1969_ro
enumerator jis_c6226_1978
enumerator jis_c6226_1983
enumerator jis_c6229_1984_a
enumerator jis_c6229_1984_b
enumerator jis_c6229_1984_b_add
enumerator jis_c6229_1984_hand
enumerator jis_c6229_1984_hand_add
enumerator jis_c6229_1984_kana
enumerator jis_encoding
enumerator jis_x0201
enumerator jis_x0212_1990
enumerator jus_i_b1_002
enumerator jus_i_b1_003_mac
enumerator jus_i_b1_003_serb
enumerator koi7_switched
enumerator koi8_r
enumerator koi8_u
enumerator ks_c_5601_1987
enumerator ksc5636
enumerator kz_1048
enumerator latin_greek
enumerator latin_greek_1
enumerator latin_lap
enumerator macintosh
enumerator microsoft_publishing
enumerator mnem
enumerator mnemonic
enumerator msz_7795_3
enumerator nats_dano
enumerator nats_dano_add
enumerator nats_sefi
enumerator nats_sefi_add
enumerator nc_nc00_10_81
enumerator nf_z_62_010
enumerator nf_z_62_010_1973
enumerator ns_4551_1
enumerator ns_4551_2
enumerator osd_ebcdic_df03_irv
enumerator osd_ebcdic_df04_1
enumerator osd_ebcdic_df04_15
enumerator pc8_danish_norwegian
enumerator pc8_turkish
enumerator pt
enumerator pt2
enumerator ptcp154
enumerator scsu
enumerator sen_850200_b
enumerator sen_850200_c
enumerator shift_jis
enumerator t_101_g2
enumerator t_61_7bit
enumerator t_61_8bit
enumerator tis_620
enumerator tscii
enumerator unicode_1_1
enumerator unicode_1_1_utf_7
enumerator unknown_8bit
enumerator us_ascii
enumerator us_dk
enumerator utf_16
enumerator utf_16be
enumerator utf_16le
enumerator utf_32
enumerator utf_32be
enumerator utf_32le
enumerator utf_7
enumerator utf_7_imap
enumerator utf_8
enumerator ventura_international
enumerator ventura_math
enumerator ventura_us
enumerator videotex_suppl
enumerator viqr
enumerator viscii
enumerator windows_1250
enumerator windows_1251
enumerator windows_1252
enumerator windows_1253
enumerator windows_1254
enumerator windows_1255
enumerator windows_1256
enumerator windows_1257
enumerator windows_1258
enumerator windows_31j
enumerator windows_874

enum class orcus::dump_format_t

Formats supported by orcus as output formats.

Values:

enumerator unknown
enumerator none
enumerator check
enumerator csv
enumerator flat
enumerator html
enumerator json
enumerator xml
enumerator yaml
enumerator debug_state

enum class orcus::format_t

Input formats that orcus can import.

Values:

enumerator unknown
enumerator ods
enumerator xlsx
enumerator gnumeric
enumerator xls_xml
enumerator csv
enumerator parquet

enum class orcus::length_unit_t

Unit of length, as used in length_t.

Values:

enumerator unknown
enumerator centimeter
enumerator millimeter
enumerator xlsx_column_digit

Special unit of length used by Excel, defined as the maximum digit width of the font used as the “Normal” style font.

Note

Since it’s not possible to determine the actual length using this unit, it is approximated by 1.9 millimeters.

enumerator inch
enumerator point
enumerator twip

One twip is a twentieth of a point, which equals 1/1440 of an inch.

enumerator pixel
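The relationships between these units can be summarized with a small conversion sketch. The `unit` enum and `to_millimeters` function below are hypothetical and not part of orcus; the sketch assumes the usual 72 points per inch (consistent with the twip definition above) and reuses the 1.9 mm approximation quoted for xlsx_column_digit. Pixels are left unconverted because their physical size depends on a resolution that is not given here.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a subset of length_unit_t.
enum class unit { centimeter, millimeter, xlsx_column_digit, inch, point, twip, pixel };

// Convert a length to millimeters, or return std::nullopt when the unit has
// no fixed physical size.
std::optional<double> to_millimeters(double value, unit u)
{
    constexpr double mm_per_inch = 25.4;

    switch (u)
    {
        case unit::millimeter:        return value;
        case unit::centimeter:        return value * 10.0;
        case unit::inch:              return value * mm_per_inch;
        case unit::point:             return value * mm_per_inch / 72.0;   // 72 points per inch
        case unit::twip:              return value * mm_per_inch / 1440.0; // 20 twips per point
        case unit::xlsx_column_digit: return value * 1.9;                  // approximation only
        case unit::pixel:             break;                               // needs a DPI value
    }
    return std::nullopt;
}
```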

Utility functions

std::vector<std::pair<std::string_view, dump_format_t>> orcus::get_dump_format_entries()

Get a list of available output format entries. Each entry consists of the name of a format and its enum value equivalent.

Returns:

list of available output format entries.

character_set_t orcus::to_character_set(std::string_view s)

Parse a string that represents a character set and convert it to a corresponding enum value.

Parameters:

s – string representing a character set.

Returns:

enum value representing a character set, or character_set_t::unspecified in case it cannot be determined.

dump_format_t orcus::to_dump_format_enum(std::string_view s)

Parse a string that represents an output format type and convert it to a corresponding enum value.

Parameters:

s – string representing an output format type.

Returns:

enum value representing an output format type, or dump_format_t::unknown in case it cannot be determined.
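The pairing of a list of name/value entries with a string-to-enum lookup can be sketched as follows. Everything in this block is a hypothetical stand-in: the `out_format` enum, the `format_entries` and `to_format` functions, and the format names in the table are assumptions chosen for illustration and are not taken from orcus.

```cpp
#include <cassert>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a subset of dump_format_t.
enum class out_format { unknown, csv, json, xml };

// Hypothetical analogue of get_dump_format_entries(); the names are assumed.
std::vector<std::pair<std::string_view, out_format>> format_entries()
{
    return {
        {"csv", out_format::csv},
        {"json", out_format::json},
        {"xml", out_format::xml},
    };
}

// Hypothetical analogue of to_dump_format_enum(): scan the entries and fall
// back to the "unknown" enumerator when nothing matches.
out_format to_format(std::string_view s)
{
    for (const auto& [name, value] : format_entries())
    {
        if (name == s)
            return value;
    }
    return out_format::unknown;
}
```

Deriving the lookup from the entry list keeps the two in agreement, at the cost of a linear scan that is negligible for a handful of formats.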