[/
 / Copyright (c) 2008 Marcin Kalicinski (kalita <at> poczta dot onet dot pl)
 / Copyright (c) 2009 Sebastian Redl (sebastian dot redl <at> getdesigned dot at)
 /
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 /]
[section XML Parser]
[def __xml__ [@http://en.wikipedia.org/wiki/XML XML format]]
[def __xml_parser.hpp__ [headerref boost/property_tree/xml_parser.hpp xml_parser.hpp]]
[def __RapidXML__ [@http://rapidxml.sourceforge.net/ RapidXML]]
[def __boost__ [@http://www.boost.org Boost]]
The __xml__ is an industry standard for storing information in textual
form. Unfortunately, there is no XML parser in __boost__ as of the
time of this writing. The library therefore bundles the fast and tiny
__RapidXML__ parser (currently version 1.13) to provide XML parsing support.
RapidXML does not fully support the XML standard: it cannot parse DTDs and
therefore cannot do full entity substitution.

By default, the parser preserves most whitespace, but removes element content
that consists only of whitespace. Encoded whitespace (e.g. &#32;) does not
count as whitespace in this regard. Pass the trim_whitespace flag if you
want all leading and trailing whitespace trimmed and each contiguous run of
whitespace collapsed into a single space.

Please note that RapidXML does not understand the encoding specification. If
you pass it a character buffer, it assumes the data is already correctly
encoded; if you pass it a filename, it reads the file using the character
conversion of the locale you give it (or of the global locale if you give it
none). This means that, in order to parse a UTF-8-encoded XML file into a
wptree, you have to supply an alternate locale, either directly or by replacing
the global one.

XML / property tree conversion schema (__read_xml__ and __write_xml__):

* Each XML element corresponds to a property tree node. The child elements
  correspond to the children of the node.
* The attributes of an XML element are stored in the subkey [^<xmlattr>]. There
  is one child node per attribute in the attribute node. When an element has no
  attributes, the [^<xmlattr>] node is not guaranteed to exist, nor is it
  required.
* XML comments are stored in nodes named [^<xmlcomment>], unless comment
  ignoring is enabled via the flags.
* Text content is stored in one of two ways, depending on the flags. The default
  way concatenates all text nodes and stores them as the data of the element
  node. This way, the entire content can be conveniently read, but the
  relative ordering of text and child elements is lost. The other way stores
  each run of text as a separate node, all named [^<xmltext>].

The XML storage encoding does not round-trip perfectly. A read-write cycle loses
trimmed whitespace, low-level formatting information, and the distinction
between normal data and CDATA nodes. Comments are preserved only when comment
parsing is enabled. A write-read cycle loses trimmed whitespace; that is, if
the original tree has string data that starts or ends with whitespace, that
whitespace is lost.
[endsect] [/xml_parser]