| 1 | = WPILib Packed Struct Serialization Specification, Version 1.0 |
| 2 | WPILib Developers <wpilib@wpi.edu> |
| 3 | Revision 1.0 (0x0100), 6/8/2023 |
| 4 | :toc: |
| 5 | :toc-placement: preamble |
| 6 | :sectanchors: |
| 7 | |
| 8 | A simple format and schema for serialization of packed fixed size structured data. |
| 9 | |
| 10 | [[motivation]] |
| 11 | == Motivation |
| 12 | |
| 13 | Schema-based serialization formats such as Protobuf and FlatBuffers are extremely flexible and can handle data type evolution, complex nested data structures, variable-size or repeated data, optional fields, and more. However, this flexibility comes at a cost in both serialized data size and processing overhead. Many simple data structures, such as screen coordinates or robot poses, are fixed in size and can be stored much more compactly and serialized/deserialized much more quickly, especially on embedded or real-time platforms. |
| 14 | |
| 15 | Simply storing a C-style packed structure is very compact and fast, but the layout of the structure and the meaning of each member must be communicated separately before other tools, such as interactive dashboards that analyze individual structure members, can introspect the data. This standard layout and schema provide a standardized means to communicate that information and enable dynamic decoding. |
| 16 | |
| 17 | Python's struct module uses a character-based approach to describe data layout of structures, but has no provisions for naming each member to communicate intent/meaning. |
| 18 | |
| 19 | [[references]] |
| 20 | == References |
| 21 | |
| 22 | [[c-struct-declaration]] |
| 23 | * Struct declaration, https://en.cppreference.com/w/c/language/struct |
| 24 | |
| 25 | [[definitions]] |
| 26 | == Definitions |
| 27 | |
| 28 | [[schema]] |
| 29 | == Schema |
| 30 | |
| 31 | The schema is a text-based format with similar syntax to the list of variable declarations in a C structure. The C syntax is flexible, easy to parse, and matches the intent of specifying a fixed size structure. |
| 32 | |
| 33 | Each member of the struct is defined by a single declaration. Each declaration is either a standard declaration or a bit-field declaration. Declarations are separated by semicolons. The last declaration may optionally have a trailing semicolon. Empty declarations (e.g. two semicolons back-to-back or separated by only whitespace) are allowed but are ignored. Unlike C structures, every declaration must be separated by a semicolon; commas cannot be used to declare multiple members with the same type. Declarations may also start and end with whitespace. |
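The separation rules above can be sketched in a few lines of Python. This is only an illustration, not a reference parser; the function name `split_declarations` is hypothetical. It relies on the observation that semicolons appear nowhere else in the grammar described in this document, so a plain split is sufficient.

```python
def split_declarations(schema: str) -> list[str]:
    """Split a schema string into trimmed, non-empty declarations."""
    # Semicolons only act as separators in this grammar, so a plain split
    # works. Empty declarations (";;" or whitespace only) are ignored, and
    # leading/trailing whitespace around each declaration is removed.
    return [part.strip() for part in schema.split(";") if part.strip()]


print(split_declarations("bool b; int16 i;; double arr[4] ; "))
# ['bool b', 'int16 i', 'double arr[4]']
```

The trailing semicolon and the doubled semicolon in the example are both accepted and produce no extra entries.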
| 34 | |
| 35 | [[variable]] |
| 36 | === Standard Declaration |
| 37 | |
| 38 | Standard declarations declare a member of a certain type or a fixed-size array of that type. The structure of a standard declaration is: |
| 39 | |
| 40 | * optional enum specification (integer data types only) |
| 41 | * optional whitespace |
| 42 | * type name |
| 43 | * whitespace |
| 44 | * identifier name |
| 45 | * optional array size, consisting of: |
| 46 | ** optional whitespace |
| 47 | ** `[` |
| 48 | ** optional whitespace |
| 49 | ** size of array |
| 50 | ** optional whitespace |
| 51 | ** `]` |
| 52 | |
| 53 | The type name may be one of these: |
| 54 | |
| 55 | [cols="1,1,3", options="header"] |
| 56 | |=== |
| 57 | |Type Name|Description|Payload Data Contents |
| 58 | |`bool`|boolean|single byte (0=false, 1=true) |
| 59 | |`char`|character|single byte (assumed UTF-8) |
| 60 | |`int8`|integer|1-byte (8-bit) signed value |
| 61 | |`int16`|integer|2-byte (16-bit) signed value |
| 62 | |`int32`|integer|4-byte (32-bit) signed value |
| 63 | |`int64`|integer|8-byte (64-bit) signed value |
| 64 | |`uint8`|unsigned integer|1-byte (8-bit) unsigned value |
| 65 | |`uint16`|unsigned integer|2-byte (16-bit) unsigned value |
| 66 | |`uint32`|unsigned integer|4-byte (32-bit) unsigned value |
| 67 | |`uint64`|unsigned integer|8-byte (64-bit) unsigned value |
| 68 | |`float` or `float32`|float|4-byte (32-bit) IEEE-754 value |
| 69 | |`double` or `float64`|double|8-byte (64-bit) IEEE-754 value |
| 70 | |=== |
| 71 | |
| 72 | If it is not one of the above, the type name must be the name of another struct. |
| 73 | |
| 74 | Examples of valid standard declarations: |
| 75 | |
| 76 | * `bool value` (boolean value, 1 byte) |
| 77 | * `double arr[4]` (array of 4 doubles, 32 bytes total) |
| 78 | * `enum {a=1, b=2} int8 val` (enumerated value, 1 byte) |
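As a rough sketch of how a standard declaration might be recognized, the following Python uses a regular expression that mirrors the structure listed above. Several details are assumptions made for the example: identifiers and type names are matched with `\w+`, the array size is a run of decimal digits, and the names `STANDARD_DECL`, `PRIMITIVE_SIZES`, and `parse_standard` are hypothetical. The sketch does not check that an enum specification is only attached to an integer type.

```python
import re

# Sizes in bytes of the built-in types from the table above.
PRIMITIVE_SIZES = {
    "bool": 1, "char": 1,
    "int8": 1, "int16": 2, "int32": 4, "int64": 8,
    "uint8": 1, "uint16": 2, "uint32": 4, "uint64": 8,
    "float": 4, "float32": 4, "double": 8, "float64": 8,
}

STANDARD_DECL = re.compile(
    r"\s*(?P<enum>(?:enum)?\s*\{[^}]*\})?"  # optional enum specification
    r"\s*(?P<type>\w+)"                      # type name
    r"\s+(?P<name>\w+)"                      # identifier name
    r"(?:\s*\[\s*(?P<size>\d+)\s*\])?"       # optional array size
    r"\s*$"
)


def parse_standard(decl: str) -> dict:
    """Parse one standard declaration into a small description dict."""
    m = STANDARD_DECL.match(decl)
    if m is None:
        raise ValueError(f"not a standard declaration: {decl!r}")
    count = int(m["size"]) if m["size"] is not None else 1
    # A type that is not in the table is taken to be a nested struct,
    # whose size cannot be known without that struct's own schema.
    elem_size = PRIMITIVE_SIZES.get(m["type"])
    return {
        "type": m["type"],
        "name": m["name"],
        "count": count,
        "has_enum": m["enum"] is not None,
        "bytes": None if elem_size is None else elem_size * count,
    }


print(parse_standard("double arr[4]")["bytes"])  # 32
```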
| 79 | |
| 80 | [[enum]] |
| 81 | ==== Enum Specification |
| 82 | |
| 83 | Integer declarations may have an enum specification to provide meaning to specific values. Values that are not specified may be communicated, but have no specific defined meaning. The structure of an enum specification is: |
| 84 | |
| 85 | * optional `enum` |
| 86 | * optional whitespace |
| 87 | * `{` |
| 88 | * zero or more enum values, consisting of: |
| 89 | ** optional whitespace |
| 90 | ** identifier |
| 91 | ** optional whitespace |
| 92 | ** `=` |
| 93 | ** optional whitespace |
| 94 | ** integer value |
| 95 | ** optional whitespace |
| 96 | ** comma (optional for last value) |
| 97 | * optional whitespace |
| 98 | * `}` |
| 99 | |
| 100 | Examples of valid enum specifications: |
| 101 | |
| 102 | * `enum{}` |
| 103 | * `enum { a = 1 }` |
| 104 | * `enum{a=1,b=2,}` |
| 105 | * `{a=1}` |
| 106 | |
| 107 | Examples of invalid enum specifications: |
| 108 | |
| 109 | * `enum` (no `{}`) |
| 110 | * `enum{=2}` (missing identifier) |
| 111 | * `enum{a=1,b,c}` (missing values) |
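The valid and invalid examples above can be checked with a small parser. The sketch below is one possible approach, not a normative implementation. It assumes identifiers match `\w+` and that integer values are decimal with an optional leading minus sign; this document does not state either detail. The names `ENUM_SPEC`, `ENUM_VALUE`, and `parse_enum` are hypothetical.

```python
import re

ENUM_SPEC = re.compile(r"\s*(?:enum)?\s*\{(?P<body>[^{}]*)\}\s*$")
ENUM_VALUE = re.compile(r"\s*(?P<name>\w+)\s*=\s*(?P<value>-?\d+)\s*$")


def parse_enum(spec: str) -> dict[str, int]:
    """Parse an enum specification into a name -> value mapping."""
    m = ENUM_SPEC.match(spec)
    if m is None:
        raise ValueError(f"invalid enum specification: {spec!r}")
    body = m["body"]
    if not body.strip():
        return {}  # "enum{}" declares no values
    items = body.split(",")
    if not items[-1].strip():
        items.pop()  # a trailing comma after the last value is allowed
    values = {}
    for item in items:
        vm = ENUM_VALUE.match(item)
        if vm is None:
            raise ValueError(f"invalid enum value: {item.strip()!r}")
        values[vm["name"]] = int(vm["value"])
    return values


print(parse_enum("enum{a=1,b=2,}"))  # {'a': 1, 'b': 2}
```

With this sketch, `enum`, `enum{=2}`, and `enum{a=1,b,c}` all raise `ValueError`, matching the invalid examples listed above.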
| 112 | |
| 113 | [[]] |
| 114 | === Bit-field Declaration |
| 115 | |
| 116 | Bit-field declarations declare a member with an explicit width in bits. The structure of a bit-field declaration is: |
| 117 | |
| 118 | * optional enum specification (integer data types only) |
| 119 | * optional whitespace |
| 120 | * type name; must be boolean or one of the integer data types |
| 121 | * whitespace |
| 122 | * identifier name |
| 123 | * optional whitespace |
| 124 | * colon (`:`) |
| 125 | * optional whitespace |
| 126 | * integer number of bits; minimum 1; maximum 1 for boolean types; for integer types, maximum is the width of the type (e.g. 32 for int32) |
| 127 | |
| 128 | As with non-bit-field integer variable declarations, an enum can be specified for integer bit-fields (e.g. `enum {a=1, b=2} uint32 value : 2`). |
| 129 | |
| 130 | It is not possible to have an array of bit-fields. |
| 131 | |
| 132 | Examples of valid bit-field declarations: |
| 133 | |
| 134 | * `bool value : 1` |
| 135 | * `enum{a=1,b=2}int8 value:2` |
| 136 | |
| 137 | Examples of invalid bit-field declarations: |
| 138 | |
| 139 | * `double val:2` (must be integer or boolean) |
| 140 | * `int32 val[2]:2` (cannot be array) |
| 141 | * `bool val:3` (bool must be 1 bit) |
| 142 | * `int16 val:17` (bit field larger than storage size) |
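The width limits and type restrictions above can be expressed as a short validation routine. The following is a sketch under the same assumptions as a simple regular-expression parser (identifiers match `\w+`, the bit count is decimal); `INT_BITS`, `BITFIELD_DECL`, and `parse_bitfield` are hypothetical names chosen for the example.

```python
import re

# Width in bits of each integer type that may back a bit-field.
INT_BITS = {
    "int8": 8, "int16": 16, "int32": 32, "int64": 64,
    "uint8": 8, "uint16": 16, "uint32": 32, "uint64": 64,
}

BITFIELD_DECL = re.compile(
    r"\s*(?P<enum>(?:enum)?\s*\{[^}]*\})?"  # optional enum specification
    r"\s*(?P<type>\w+)"                      # type name
    r"\s+(?P<name>\w+)"                      # identifier name
    r"\s*:\s*(?P<bits>\d+)\s*$"              # colon and number of bits
)


def parse_bitfield(decl: str) -> tuple[str, str, int]:
    """Return (type, name, bits) or raise ValueError if invalid."""
    m = BITFIELD_DECL.match(decl)
    if m is None:
        # Also rejects arrays such as "int32 val[2]:2".
        raise ValueError(f"not a bit-field declaration: {decl!r}")
    type_name, bits = m["type"], int(m["bits"])
    if type_name == "bool":
        max_bits = 1
    elif type_name in INT_BITS:
        max_bits = INT_BITS[type_name]
    else:
        raise ValueError("bit-field type must be boolean or integer")
    if not 1 <= bits <= max_bits:
        raise ValueError(f"width must be between 1 and {max_bits} bits")
    if m["enum"] is not None and type_name == "bool":
        raise ValueError("enum specifications apply to integer types only")
    return type_name, m["name"], bits


print(parse_bitfield("enum{a=1,b=2}int8 value:2"))  # ('int8', 'value', 2)
```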
| 143 | |
| 144 | [[layout]] |
| 145 | == Data Layout |
| 146 | |
| 147 | Members are stored in the same order they appear in the schema. Individual members are stored in little-endian order. Members are not aligned to any particular boundary; no byte-level padding is present in the data. |
| 148 | |
| 149 | [source] |
| 150 | ---- |
| 151 | bool b; |
| 152 | int16 i; |
| 153 | ---- |
| 154 | |
| 155 | results in a 3-byte encoding: |
| 156 | |
| 157 | `bbbbbbbb iiiiiiii iiiiiiii` |
| 158 | |
| 159 | where the first `iiiiiiii` is the least significant byte of `i`. |
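For comparison, the same 3-byte layout can be reproduced with Python's `struct` module, which is used here only as an illustration. The `<` prefix selects little-endian byte order with no alignment padding, `?` is a 1-byte boolean, and `h` is a 2-byte signed integer.

```python
import struct

# Schema: bool b; int16 i;
encoded = struct.pack("<?h", True, 0x1234)

print(len(encoded))      # 3
print(encoded.hex(" "))  # 01 34 12

# Decoding with the same format string recovers the members in order.
b, i = struct.unpack("<?h", encoded)
```

The least significant byte of `i` (`0x34`) appears immediately after the byte for `b`.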
| 160 | |
| 161 | [[layout-array]] |
| 162 | === Array Data Layout |
| 163 | |
| 164 | For array members, the individual items of the array are stored consecutively with no padding between each item. |
| 165 | |
| 166 | [source] |
| 167 | ---- |
| 168 | int16 i[2]; |
| 169 | ---- |
| 170 | |
| 171 | results in a 4-byte encoding: |
| 172 | |
| 173 | `i0i0i0i0 i0i0i0i0 i1i1i1i1 i1i1i1i1` |
| 174 | |
| 175 | where `i0` is the first element of the array, `i1` is the second element. |
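A brief Python illustration of the array layout, again using the `struct` module as a stand-in encoder: a repeat count in the format string packs consecutive elements with nothing between them.

```python
import struct

# Schema: int16 i[2];
encoded = struct.pack("<2h", 1, -2)

print(encoded.hex(" "))  # 01 00 fe ff
```

The first two bytes hold element 0 and the last two hold element 1, each in little-endian order.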
| 176 | |
| 177 | [[layout-nested-structure]] |
| 178 | === Nested Structure Data Layout |
| 179 | Nested structures also have no surrounding padding. |
| 180 | |
| 181 | Given the Inner schema |
| 182 | |
| 183 | [source] |
| 184 | ---- |
| 185 | int16 i; |
| 186 | int8 x; |
| 187 | ---- |
| 188 | |
| 189 | and an outer schema of |
| 190 | |
| 191 | [source] |
| 192 | ---- |
| 193 | char c; |
| 194 | Inner s; |
| 195 | bool b; |
| 196 | ---- |
| 197 | |
| 198 | the result is a 5-byte encoding: |
| 199 | |
| 200 | `cccccccc iiiiiiii iiiiiiii xxxxxxxx bbbbbbbb` |
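Because nested structures add no padding, a decoder can treat the outer structure as if the inner members were written inline. The sketch below shows this with Python's `struct` module; the names `INNER_FMT` and `OUTER_FMT` are hypothetical, and splicing format strings is just one convenient way to express the flattening.

```python
import struct

INNER_FMT = "hb"                    # Inner: int16 i; int8 x;
OUTER_FMT = "<c" + INNER_FMT + "?"  # Outer: char c; Inner s; bool b;

encoded = struct.pack(OUTER_FMT, b"A", 0x0102, -1, True)

print(struct.calcsize(OUTER_FMT))  # 5
print(encoded.hex(" "))            # 41 02 01 ff 01
```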
| 201 | |
| 202 | [[layout-bit-field]] |
| 203 | === Bit-Field Data Layout |
| 204 | |
| 205 | Multiple adjacent bit-fields of the same integer type width are packed together to fit in the minimum number of elements of that type. The bit-fields are packed, starting from the least significant bit, in the order they appear in the schema. Individual bit-fields must not span multiple elements of the underlying type; if a bit-field is larger than the remaining space in the current element, a new element of that type is started and the bit-field starts from the least significant bit of the new element. Unused bits should be set to 0 during serialization and must be ignored during deserialization. |
| 206 | |
| 207 | Boolean bit-fields are always a single bit wide. The underlying data type is by default uint8, but if a boolean bit-field immediately follows a bit-field of another integer type (and fits), it is packed into that type. |
| 208 | |
| 209 | [source] |
| 210 | ---- |
| 211 | int8 a:4; |
| 212 | int16 b:4; |
| 213 | ---- |
| 214 | |
| 215 | results in a 3-byte encoding: |
| 216 | |
| 217 | `0000aaaa 0000bbbb 00000000` |
| 218 | |
| 219 | as the integer type widths are different, even though the bits would fit. |
| 220 | |
| 221 | [source] |
| 222 | ---- |
| 223 | int16 a:4; |
| 224 | uint16 b:5; |
| 225 | bool c:1; |
| 226 | int16 d:7; |
| 227 | ---- |
| 228 | |
| 229 | results in a 4-byte encoding: |
| 230 | |
| 231 | `bbbbaaaa 000000cb 0ddddddd 00000000` |
| 232 | |
| 233 | as `c` is packed into the preceding int16, and `d` is too large to fit in the remaining bits of the first int16. |
| 234 | |
| 235 | [source] |
| 236 | ---- |
| 237 | uint8 a:4; |
| 238 | int8 b:2; |
| 239 | bool c:1; |
| 240 | int16 d:1; |
| 241 | ---- |
| 242 | |
| 243 | results in a 3-byte encoding: |
| 244 | |
| 245 | `0cbbaaaa 0000000d 00000000` |
| 246 | |
| 247 | as `d` is an int16, whereas the preceding bit-fields are stored in an 8-bit type. |
| 248 | |
| 249 | [source] |
| 250 | ---- |
| 251 | bool a:1; |
| 252 | bool b:1; |
| 253 | int8 c:2; |
| 254 | ---- |
| 255 | |
| 256 | results in a 1-byte encoding: |
| 257 | |
| 258 | `0000ccba` |
| 259 | |
| 260 | as `c` is an int8, which has the same width as the default uint8 storage of the preceding boolean bit-fields. |
| 261 | |
| 262 | [source] |
| 263 | ---- |
| 264 | bool a:1; |
| 265 | bool b:1; |
| 266 | int16 c:2; |
| 267 | ---- |
| 268 | |
| 269 | results in a 3-byte encoding: |
| 270 | |
| 271 | `000000ba 000000cc 00000000` |
| 272 | |
| 273 | as `c` is an int16, which is wider than the default uint8 storage of the preceding boolean bit-fields. |
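The packing rules in this section can be sketched as a small Python routine. This is an illustrative reading of the rules, not a reference implementation. It handles only a run of consecutive bit-fields (no standard members or nested structures in between), assumes the widths have already been validated, and takes each value as a raw bit pattern. The names `WIDTHS` and `pack_bitfields` are hypothetical.

```python
# Width in bits of each integer type that may back a bit-field.
WIDTHS = {
    "int8": 8, "int16": 16, "int32": 32, "int64": 64,
    "uint8": 8, "uint16": 16, "uint32": 32, "uint64": 64,
}


def pack_bitfields(fields):
    """Pack a run of bit-fields given as (type_name, bits, value) tuples."""
    out = bytearray()
    unit_bits = 0  # width of the storage element being filled (0 = none)
    used = 0       # bits already occupied in that element
    acc = 0        # value accumulated for that element

    def flush():
        nonlocal unit_bits, used, acc
        if unit_bits:
            out.extend(acc.to_bytes(unit_bits // 8, "little"))
        unit_bits = used = acc = 0

    for type_name, bits, value in fields:
        if type_name == "bool":
            # A bool joins the current element if it fits; otherwise it
            # starts a new uint8 element.
            if unit_bits == 0 or used + bits > unit_bits:
                flush()
                unit_bits = 8
        else:
            width = WIDTHS[type_name]
            # A different type width, or not enough room, starts a new element.
            if width != unit_bits or used + bits > unit_bits:
                flush()
                unit_bits = width
        # Place the field starting at the lowest unused bit.
        acc |= (value & ((1 << bits) - 1)) << used
        used += bits

    flush()
    return bytes(out)


# int16 a:4; uint16 b:5; bool c:1; int16 d:7;
data = pack_bitfields([("int16", 4, 0xA), ("uint16", 5, 21),
                       ("bool", 1, 1), ("int16", 7, 0x7F)])
print(data.hex(" "))  # 5a 03 7f 00
```

In the example, `a`, `b`, and `c` share the first int16 element, while `d` does not fit in the six remaining bits and starts a second element.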
| 274 | |
| 275 | Bit-fields do not "look inside" of nested structures. Given Inner |
| 276 | |
| 277 | [source] |
| 278 | ---- |
| 279 | int8 a:1; |
| 280 | ---- |
| 281 | |
| 282 | and outer |
| 283 | |
| 284 | [source] |
| 285 | ---- |
| 286 | int8 b:1; |
| 287 | Inner s; |
| 288 | int8 c:1; |
| 289 | ---- |
| 290 | |
| 291 | the result is a 3-byte encoding: |
| 292 | |
| 293 | `0000000b 0000000a 0000000c` |
| 294 | |
| 295 | [[layout-character-arrays]] |
| 296 | === Character Array (String) Data Layout |
| 297 | |
| 298 | Character arrays, as with other arrays, must be fixed length. The text they contain should be UTF-8. If a string is shorter than the length of the character array, the string starts at the first byte of the array, and any unused bytes at the end of the array must be filled with 0. |
| 299 | |
| 300 | [source] |
| 301 | ---- |
| 302 | char s[4]; |
| 303 | ---- |
| 304 | |
| 305 | with a string of "a" results in: |
| 306 | |
| 307 | `01100001 00000000 00000000 00000000` |
| 308 | |
| 309 | with a string of "abcd" results in: |
| 310 | |
| 311 | `01100001 01100010 01100011 01100100` |
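A minimal Python sketch of the zero-fill rule follows. The function names are hypothetical. Rejecting strings that are too long and stripping trailing zero bytes when decoding are assumptions made for the example; this document only specifies how shorter strings are padded.

```python
def encode_char_array(text: str, length: int) -> bytes:
    """Encode text as UTF-8 and zero-fill to the fixed array length."""
    data = text.encode("utf-8")
    if len(data) > length:
        # Assumption: an over-long string is an error rather than truncated.
        raise ValueError("string does not fit in the character array")
    return data.ljust(length, b"\x00")


def decode_char_array(data: bytes) -> str:
    """Decode a character array, dropping trailing zero fill (assumption)."""
    return data.rstrip(b"\x00").decode("utf-8")


encoded = encode_char_array("a", 4)
print(" ".join(f"{byte:08b}" for byte in encoded))
# 01100001 00000000 00000000 00000000
```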