Brian Silverman | 8ed424f | 2018-08-04 23:36:27 -0700 | [diff] [blame^] | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
| 2 | |
| 3 | <html> |
| 4 | <head> |
| 5 | <meta http-equiv="Content-Language" content="en-us"> |
| 6 | <meta http-equiv="Content-Type" content="text/html; charset=us-ascii"> |
| 7 | |
| 8 | <title>Type-safe 'printf-like' format class</title> |
| 9 | </head> |
| 10 | |
| 11 | <body bgcolor="#FFFFFF" text="#000000"> |
| 12 | <h1><img align="middle" alt="boost.png (6897 bytes)" height="86" src= |
| 13 | "../../../boost.png" width="277">Type-safe 'printf-like' <b>format |
| 14 | class</b></h1> |
| 15 | |
| 16 | <h2>Choices made</h2> |
| 17 | |
| 18 | <p>"Le pourquoi du comment" ( - "the why of the how")</p> |
| 19 | <hr> |
| 20 | |
| 21 | <h3>The syntax of the format-string</h3> |
| 22 | |
| 23 | <p>Format is a new library. One of its goals is to provide a replacement for |
| 24 | printf: format can parse a format-string designed for printf, |
| 25 | apply it to the given arguments, and produce the same result as printf |
| 26 | would have.<br> |
| 27 | With this constraint, there were roughly 3 possible choices for the syntax |
| 28 | of the format-string:</p> |
| 29 | |
| 30 | <ol> |
| 31 | <li>Use the exact same syntax as printf. It's well known by many |
| 32 | experienced users, and fits almost all needs. But with C++ streams, the |
| 33 | type-conversion character, crucial to determine the end of a directive, |
| 34 | is only useful to set some associated formatting options |
| 35 | (%x for setting hexadecimal output, etc.). It would be better to make |
| 36 | this obligatory type-conversion character, with its modified meaning, |
| 37 | optional.</li> |
| 38 | |
| 39 | <li>Extend printf syntax while maintaining compatibility, by using |
| 40 | characters and constructs not yet valid as printf syntax, e.g. "%1%", |
| 41 | "%[1]", "%|1$d|". Using begin / end marks, all sorts of extensions can |
| 42 | be considered.</li> |
| 43 | |
| 44 | <li>Provide a non-legacy mode, alongside the printf-compatible one, |
| 45 | that can be designed to fit other objectives without the constraints of |
| 46 | compatibility with the existing printf syntax.<br> |
| 47 | But designing a replacement for printf's syntax that would be clearly |
| 48 | better, and just as powerful, is a separate task from building a format |
| 49 | class. When such a syntax is designed, we should consider splitting |
| 50 | Boost.format into 2 separate libraries: one working hand in hand with |
| 51 | this new syntax, and another supporting the legacy syntax (possibly a |
| 52 | fast version, built with safety improvements over snprintf or the |
| 53 | like).</li> |
| 54 | </ol>In the absence of a full, clever, new syntax clearly better adapted to |
| 55 | C++ streams than printf, the second approach was chosen. Boost.format uses |
| 56 | printf's syntax, with additional features (tabulations, centered alignments) that |
| 57 | can be expressed using extensions to this syntax.<br> |
| 58 | Alternate compatible notations are also provided to address the weaknesses |
| 59 | of printf's syntax: |
| 60 | |
| 61 | <ul> |
| 62 | <li><i>"%<b>N</b>%"</i> as a simpler positional, typeless and optionless |
| 63 | notation.</li> |
| 64 | |
| 65 | <li><i>%|spec|</i> as a way to encapsulate a printf directive in a more |
| 66 | visually evident structure, at the same time making printf's |
| 67 | 'type-conversion character' optional.</li> |
| 68 | </ul> |
| 69 | <hr> |
| 70 | |
| 71 | <h3>Why are arguments passed through an operator rather than a function |
| 72 | call ?</h3><br> |
| 73 | The inconvenience of the operator approach (for some people) is that it |
| 74 | might be confusing. It is a common warning that too much operator |
| 75 | overloading confuses people.<br> |
| 76 | But format objects will be used in specific contexts (most often |
| 77 | right after a "cout << "), where they look like a formatting string followed |
| 78 | by arguments: |
| 79 | |
| 80 | <blockquote> |
| 81 | <pre> |
| 82 | format(" %s at %s with %s\n") % x % y % z; |
| 83 | </pre> |
| 84 | </blockquote>so we can hope it won't confuse people that much. |
| 85 | |
| 86 | <p>Another fear about operators is precedence problems. What if I someday |
| 87 | write <b>format("%s") % x+y</b><br> |
| 88 | instead of <i>format("%s") % (x+y)</i>?<br> |
| 89 | It produces a compile-time error, so the mistake is detected |
| 90 | immediately.<br> |
| 91 | Indeed, this line calls <i>tmp = operator%( format("%s"), x)</i><br> |
| 92 | and then <i>operator+(tmp, y)</i>.<br> |
| 93 | tmp will be a format object, for which no implicit conversion is defined, |
| 94 | and thus the call to operator+ will fail (unless you define such an |
| 95 | operator, of course). So you can safely assume precedence mistakes will be |
| 96 | noticed at compilation.</p> |
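<p>The precedence argument can be checked with a small self-contained sketch.
Everything in it is an assumption made for illustration: <i>mini_format</i>,
<i>try_add</i>, <i>can_add</i> and <i>demo_grouped</i> are hypothetical names
and are not part of Boost.Format. The sketch only shows that a type with
operator% but without operator+ (and without implicit conversions) makes
the ungrouped expression ill-formed.</p>

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a format class; it only records what it is fed.
class mini_format {
public:
    explicit mini_format(std::string s) : fmt_(std::move(s)) {}

    // Each operator% call feeds one argument and returns the format object.
    template <class T>
    mini_format& operator%(const T& x) {
        std::ostringstream os;
        os << x;
        fed_ += os.str();
        fed_ += ';';
        return *this;
    }

    const std::string& fed() const { return fed_; }

private:
    std::string fmt_;
    std::string fed_;
};

// Detection helper: the first overload is viable only if `A + B` compiles.
template <class A, class B>
auto try_add(int)
    -> decltype(std::declval<A>() + std::declval<B>(), std::true_type{});
template <class A, class B>
std::false_type try_add(...);

template <class A, class B>
struct can_add : decltype(try_add<A, B>(0)) {};

// `mini_format("%s") % x + y` parses as `(mini_format("%s") % x) + y`.
// The left operand is a mini_format&, which has no operator+ and no
// implicit conversion, so the compiler rejects the expression.
static_assert(!can_add<mini_format&, int>::value,
              "format + int must not compile");
static_assert(can_add<int, int>::value, "sanity check: int + int compiles");

// With parentheses the sum is computed first and fed as a single argument.
inline void demo_grouped() {
    int x = 1, y = 2;
    assert((mini_format("%s") % (x + y)).fed() == "3;");
}
```

<p>The same reasoning stops applying as soon as someone defines an
operator+ taking the format type, which is the exception noted above.</p>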
| 97 | |
| 98 | <p><br> |
| 99 | On the other hand, the function approach has a real drawback. It requires |
| 100 | defining many template functions like:</p> |
| 101 | |
| 102 | <blockquote> |
| 103 | <pre> |
| 104 | template <class T1, class T2, .., class TN> |
| 105 | string format(string s, const T1& x1, .... , const TN& xN); |
| 106 | |
| 107 | </pre> |
| 108 | </blockquote>and even if we define those for N up to 500, that is still a |
| 109 | limitation that C's printf does not have.<br> |
| 110 | Also, since format emulates printf in some cases, but is far from |
| 111 | being fully equivalent to printf, it's best to use a radically different |
| 112 | appearance, and using operator calls succeeds very well in that! |
| 113 | |
| 114 | <p><br> |
| 115 | Anyhow, if we actually chose the function call template approach, it |
| 116 | would only be able to print classes T for which there is an</p> |
| 117 | |
| 118 | <blockquote> |
| 119 | <pre> |
| 120 | operator<< ( stream, const T&) |
| 121 | </pre> |
| 122 | </blockquote>because allowing both const and non-const produces a |
| 123 | combinatorial explosion: if we go up to 10 arguments, we need 2^10 |
| 124 | functions.<br> |
| 125 | (Providing overloads on T& / const T& is at the frontier of defects |
| 126 | of the C++ standard, and thus is far from guaranteed to be supported. But |
| 127 | right now several compilers support those overloads.)<br> |
| 128 | There is a good chance that a class which only provides the non-const |
| 129 | equivalent is badly designed, but it is still another unjustified restriction |
| 130 | on the user.<br> |
| 131 | Also, some manipulators are functions, and can not be passed as const |
| 132 | references. The function call approach thus does not support manipulators |
| 133 | well. |
| 134 | |
| 135 | <p>In conclusion, using a dedicated binary operator is the simplest, most |
| 136 | robust, and least restrictive mechanism to pass arguments when you can't |
| 137 | know the number of arguments at compile-time.</p> |
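<p>As a rough illustration of this conclusion, the sketch below shows how a
single template operator% accepts any number of arguments of any streamable
type, with no family of N-argument overloads. It is a minimal sketch under
stated assumptions, not the Boost.Format implementation:
<i>tiny_format</i> and <i>demo_reorder</i> are hypothetical names, only the
"%N%" notation with N from 1 to 9 is handled, and a directive with no
matching argument is silently dropped.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of the operator-based design (not Boost.Format).
class tiny_format {
public:
    explicit tiny_format(std::string s) : fmt_(std::move(s)) {}

    // One template covers every argument count and every streamable type.
    template <class T>
    tiny_format& operator%(const T& x) {
        std::ostringstream os;
        os << x;
        args_.push_back(os.str());
        return *this;
    }

    // Replace each "%N%" (N = 1..9) with the N-th argument fed so far.
    // Assumption of this sketch: a directive without a matching argument
    // is dropped instead of being reported as an error.
    std::string str() const {
        std::string out;
        for (std::size_t i = 0; i < fmt_.size(); ++i) {
            const bool is_directive =
                fmt_[i] == '%' && i + 2 < fmt_.size() &&
                fmt_[i + 1] >= '1' && fmt_[i + 1] <= '9' &&
                fmt_[i + 2] == '%';
            if (is_directive) {
                const std::size_t index =
                    static_cast<std::size_t>(fmt_[i + 1] - '1');
                if (index < args_.size()) out += args_[index];
                i += 2;  // skip the digit and the closing '%'
            } else {
                out += fmt_[i];
            }
        }
        return out;
    }

private:
    std::string fmt_;
    std::vector<std::string> args_;
};

// Arguments of different types are fed one by one and can be reordered.
inline void demo_reorder() {
    assert((tiny_format("%2% %1%") % 1 % "b").str() == "b 1");
}
```

<p>Because each call to operator% handles exactly one argument, the number
of arguments never appears in any signature.</p>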
| 138 | <hr> |
| 139 | |
| 140 | <h3>Why operator% rather than a member function 'with(..)' |
| 141 | ??</h3>Technically, |
| 142 | |
| 143 | <blockquote> |
| 144 | <pre> |
| 145 | format(fstr) % x1 % x2 % x3; |
| 146 | </pre> |
| 147 | </blockquote>has the same structure as |
| 148 | |
| 149 | <blockquote> |
| 150 | <pre> |
| 151 | format(fstr).with( x1 ).with( x2 ).with( x3 ); |
| 152 | </pre> |
| 153 | </blockquote>which does not have any precedence problem. The only drawback |
| 154 | is that it's harder for the eye to catch what is done in this line than when we |
| 155 | are using operators: with calls to .with(..), it looks just like any other line |
| 156 | of code. So it may be a better solution, depending on tastes. The extra |
| 157 | characters, and overall cluttered aspect of the line of code using |
| 158 | 'with(..)' were enough for me to opt for a true operator. |
| 159 | <hr> |
| 160 | |
| 161 | <h3>Why operator% rather than usual formatting operator<< ??</h3> |
| 162 | |
| 163 | <ul> |
| 164 | <li>Because passing arguments to a format object is *not* the same as |
| 165 | sending variables, sequentially, into a stream, and because a format |
| 166 | object is neither a stream nor a manipulator.<br> |
| 167 | We use an operator to pass arguments. format will use them as a |
| 168 | function would; it simply takes arguments one by one.<br> |
| 169 | format objects can not provide stream-like behaviour. When you try to |
| 170 | implement a format object that acts like a manipulator, returning a |
| 171 | stream, you make the user believe it is completely like a |
| 172 | stream-manipulator. And sooner or later, the user is misled by this |
| 173 | point of view.<br> |
| 174 | The most obvious example of that difference in behaviour is |
| 175 | |
| 176 | <blockquote> |
| 177 | <pre> |
| 178 | cout << format("%s %s ") << x; |
| 179 | cout << y ; // uh-oh, format is not really a stream manipulator |
| 180 | </pre> |
| 181 | </blockquote> |
| 182 | </li> |
| 183 | |
| 184 | <li>The precedence of % is higher than that of <<. This can be viewed as a |
| 185 | problem, because expressions using + and - thus need to be grouped inside parentheses, |
| 186 | while it is not necessary with '<<'. But if the user forgets, the |
| 187 | mistake is caught at compilation, and hopefully he won't forget |
| 188 | again.<br> |
| 189 | On the other hand, the higher precedence makes format's behaviour very |
| 190 | straightforward. |
| 191 | |
| 192 | <blockquote> |
| 193 | <pre> |
| 194 | cout << format("%s %s ") % x % y << endl; |
| 195 | </pre> |
| 196 | </blockquote>is treated exactly like: |
| 197 | |
| 198 | <blockquote> |
| 199 | <pre> |
| 200 | cout << ( format("%s %s ") % x % y ) << endl; |
| 201 | </pre> |
| 202 | </blockquote>So using %, the life of a format object does not interfere |
| 203 | with the surrounding stream context. This is the simplest possible |
| 204 | behaviour, and thus the user is able to continue using the stream after |
| 205 | the format object.<br> |
| 206 | <br> |
| 207 | With operator<<, things are much more problematic in this |
| 208 | situation. This line : |
| 209 | |
| 210 | <blockquote> |
| 211 | <pre> |
| 212 | cout << format("%s %s ") << x << y << endl; |
| 213 | </pre> |
| 214 | </blockquote>is understood as : |
| 215 | |
| 216 | <blockquote> |
| 217 | <pre> |
| 218 | ( ( ( cout << format("%s %s ") ) << x ) << y ) << endl; |
| 219 | </pre> |
| 220 | </blockquote>Several alternative implementations chose |
| 221 | operator<<, and there is only one way to make it work :<br> |
| 222 | the first call to |
| 223 | |
| 224 | <blockquote> |
| 225 | <pre> |
| 226 | operator<<( ostream&, format const&) |
| 227 | </pre> |
| 228 | </blockquote>returns a proxy, encapsulating both the final destination |
| 229 | (cout) and the format-string information.<br> |
| 230 | Passing arguments to format, and passing them to the final destination after |
| 231 | completion of the format, are indistinguishable. This is a problem. |
| 232 | |
| 233 | <p>I examined several possible implementations, and none is completely |
| 234 | satisfying.<br> |
| 235 | E.g.: in order to catch user mistakes, it makes sense to raise |
| 236 | exceptions when the user passes too many arguments. But in this |
| 237 | context, supplementary arguments are most certainly aimed at the final |
| 238 | destination. There are several choices here:</p> |
| 239 | |
| 240 | <ul> |
| 241 | <li>You can give-up detection of arity excess, and have the proxy's |
| 242 | template member operator<<( const T&) simply forward all |
| 243 | supplementary arguments to cout.</li> |
| 244 | |
| 245 | <li>Require the user to close the format arguments with a special |
| 246 | manipulator, 'endf', in this way : |
| 247 | |
| 248 | <blockquote> |
| 249 | <pre> |
| 250 | cout << format("%s %s ") << x << y << endf << endl; |
| 251 | </pre> |
| 252 | </blockquote>You can define endf to be a function that returns the |
| 253 | final destination stored inside the proxy. Then it's okay, after |
| 254 | endf the user is calling << on cout again. |
| 255 | </li> |
| 256 | |
| 257 | <li>An intermediate solution is to address the most frequent use, |
| 258 | where the user simply wants to output one more manipulator item to |
| 259 | cout (a std::flush, or endl, ..) |
| 260 | |
| 261 | <blockquote> |
| 262 | <pre> |
| 263 | cout << format("%s %s \n") << x << y << flush ; |
| 264 | </pre> |
| 265 | </blockquote>Then, the solution is to overload operator<< |
| 266 | for manipulators. This way you don't need endf, but outputting a |
| 267 | non-manipulator item right after the format arguments is a mistake. |
| 268 | </li> |
| 269 | </ul><br> |
| 270 | The most complete solution is the one with the endf manipulator. With |
| 271 | operator%, there is no need for this end-format function, plus you |
| 272 | instantly see which arguments are going into the format object, and |
| 273 | which are going to the stream. |
| 274 | </li> |
| 275 | |
| 276 | <li>Aesthetically: '%' is the same character as used inside the |
| 277 | format-string. It is quite nice to have the same character used for |
| 278 | passing each argument. '<<' is 2 characters, '%' is one. '%' is also |
| 279 | smaller in size. Overall it improves readability (we see what goes with |
| 280 | what): |
| 281 | |
| 282 | <blockquote> |
| 283 | <pre> |
| 284 | cout << format("%s %s %s") %x %y %z << "And avg is" << format("%s\n") %avg; |
| 285 | </pre> |
| 286 | </blockquote>compared to : |
| 287 | |
| 288 | <blockquote> |
| 289 | <pre> |
| 290 | cout << format("%s %s %s") << x << y << z << endf <<"And avg is" << format("%s\n") << avg; |
| 291 | </pre> |
| 292 | </blockquote>"<<" misleadingly puts the arguments at the same |
| 293 | level as any object passed to the stream. |
| 294 | </li> |
| 295 | |
| 296 | <li>Python also uses % for formatting, so you see it's not so "unheard |
| 297 | of" ;-)</li> |
| 298 | </ul> |
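<p>The proxy problem described in the list above can be made concrete with a
short sketch. All names in it (<i>simple_format</i>, <i>format_proxy</i>,
<i>forwarded</i>, <i>demo_indistinguishable</i>) are hypothetical and only
the "%s" directive is recognised; it is an illustration of the
operator<< design under those assumptions, not code from any of the
alternative implementations mentioned. It follows the first choice (no
detection of excess arguments) and adds the manipulator overload from the
third.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical format type for the operator<< design; knows only "%s".
struct simple_format {
    std::string text;
};

// Proxy holding both the final destination and the format-string state.
class format_proxy {
public:
    format_proxy(std::ostream& os, std::string fmt)
        : os_(os), fmt_(std::move(fmt)), pos_(0), forwarded_(0) {
        emit_literal();  // text before the first directive
    }

    // Format arguments and ordinary stream output take the same path.
    template <class T>
    format_proxy& operator<<(const T& x) {
        os_ << x;
        if (pos_ < fmt_.size()) {
            pos_ += 2;  // consume the "%s" that this argument filled
            emit_literal();
        } else {
            ++forwarded_;  // excess argument, or meant for the stream?
        }
        return *this;
    }

    // Separate overload so that manipulators such as std::endl are accepted.
    format_proxy& operator<<(std::ostream& (*manip)(std::ostream&)) {
        os_ << manip;
        return *this;
    }

    int forwarded() const { return forwarded_; }

private:
    // Copy literal text up to the next "%s" (or to the end of the string).
    void emit_literal() {
        std::size_t next = fmt_.find("%s", pos_);
        if (next == std::string::npos) next = fmt_.size();
        os_ << fmt_.substr(pos_, next - pos_);
        pos_ = next;
    }

    std::ostream& os_;
    std::string fmt_;
    std::size_t pos_;
    int forwarded_;
};

inline format_proxy operator<<(std::ostream& os, const simple_format& f) {
    return format_proxy(os, f.text);
}

// The third item cannot be told apart from a mistaken extra argument.
inline void demo_indistinguishable() {
    std::ostringstream out;
    const int forwarded =
        (out << simple_format{"[%s %s]"} << 1 << 2 << "extra").forwarded();
    assert(out.str() == "[1 2]extra");
    assert(forwarded == 1);
}
```

<p>Inside the proxy there is no information that says whether "extra" was
a third format argument passed by mistake or output intended for the stream,
which is why this design cannot report too many arguments.</p>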
| 299 | <hr> |
| 300 | |
| 301 | <h3>Why operator% rather than operator(), or operator[] ??</h3> |
| 302 | |
| 303 | <p>operator() has the merit of being the natural way to send an argument |
| 304 | into a function. And some think that the meaning of operator[] applies well to |
| 305 | the usage in format.<br> |
| 306 | They're as good as operator% technically, but quite ugly (that's a matter |
| 307 | of taste).<br> |
| 308 | And deep down, using operator% for passing arguments that were referred to |
| 309 | by "%" in the format string seems much more natural to me than using those |
| 310 | operators.</p> |
| 311 | <hr> |
| 312 | |
| 313 | <p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src= |
| 314 | "../../../doc/images/valid-html401.png" alt="Valid HTML 4.01 Transitional" |
| 315 | height="31" width="88"></a></p> |
| 316 | |
| 317 | <p>Revised |
| 318 | <!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->02 December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38510" --></p> |
| 319 | |
| 320 | <p><i>Copyright © 2001 Samuel Krempp</i></p> |
| 321 | |
| 322 | <p><i>Distributed under the Boost Software License, Version 1.0. (See |
| 323 | accompanying file <a href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or |
| 324 | copy at <a href= |
| 325 | "http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p> |
| 326 | </body> |
| 327 | </html> |