{"id":5280,"date":"2012-05-06T00:18:13","date_gmt":"2012-05-05T16:18:13","guid":{"rendered":"http:\/\/onthe8spot.com\/?p=5280"},"modified":"2012-05-06T00:18:13","modified_gmt":"2012-05-05T16:18:13","slug":"erik-naggums-xml-rant","status":"publish","type":"post","link":"http:\/\/onthe8spot.com\/index.php\/2012\/05\/06\/erik-naggums-xml-rant\/","title":{"rendered":"Erik Naggum&#039;s XML rant"},"content":{"rendered":"<blockquote><p>| And how would you improve on it?<br \/>\nA brief summary, then: Remove the syntactic mess that is attributes.<br \/>\n(You will then find that you do not need them at all.) Enclose the<br \/>\n\/element\/ in matching delimiters, not the tag. These simple things<br \/>\nmake people think differently about how they use the language.<br \/>\nContrary to the foolish notion that syntax is immaterial, people<br \/>\noptimize the way they express themselves, and so express themselves<br \/>\ndifferently with different syntaxes. Next, introduce macros that<br \/>\nlook exactly like elements, but that are expanded in place between<br \/>\nthe reader and the &#8220;object model&#8221;. Then, remove the obnoxious<br \/>\ncharacter entities and escape special characters with a single<br \/>\ncharacter, like \\, and name other entities with letters following<br \/>\nthe same character. If you need a rich set of publishing symbols,<br \/>\ndiscover Unicode. Finally, introduce a language for micro-parsers<br \/>\nthat can take more convenient syntaxes for commonly used elements<br \/>\nwith complex structure and make them \/return\/ element structures<br \/>\nmore suitable for processing on the receiving end, and which would<br \/>\nalso make validation something useful. 
The overly simple regular<br \/>\nexpression look-alike was a good idea when processing was expensive<br \/>\nand made all decisions at the start-tag, but with a DOM and less<br \/>\nstream-like processing, a much better language should be specified<br \/>\nthat could also do serious computation before validating a document<br \/>\n&#8212; so that once again processing could become cheaper because of the<br \/>\n&#8220;markup&#8221;, not more expensive because of it.<br \/>\nBut the one thing I would change the most from a markup language<br \/>\nsuitable for marking up the incidental instruction to a type-setter<br \/>\nto the data representation language suitable for the &#8220;market&#8221; that<br \/>\nXML wants, is to go for a binary representation. The reasons for<br \/>\n\/not\/ going binary when SGML competed with ODA have been reversed:<br \/>\nWhen information should survive changes in the software, it was an<br \/>\nimportant decision to make the data format verbose enough that it<br \/>\nwas easy to implement a processor for it and that processors could<br \/>\nliberally accept what other processors conservatively produced, but<br \/>\nnow that the data formats that employ XML are so easily changed<br \/>\nthat the software can no longer keep up with it, we need to slam on<br \/>\nthe brakes and tell the redefiners to curb their enthusiasm, get it<br \/>\nright before they share their experiments with the world, and show<br \/>\nsome respect for their users. One way to do that is to increase the<br \/>\ncost of changes to implementations without sacrificing readability<br \/>\nand without making the data format more &#8220;brittle&#8221;, by going binary.<br \/>\nOur information infrastructure has become so much better that the<br \/>\nnature of optimization for survivability has changed qualitatively.<br \/>\nThe question of what we humans need to read and write no longer has<br \/>\nany bearing on what the computers need to work with. 
One of the<br \/>\nmost heinous crimes against computing machinery is therefore to<br \/>\nforce them to parse XML when all they want is the binary data. As<br \/>\nan example, think of the Internet Protocol and Transmission Control<br \/>\nProtocol in XML terms. Implementors of SNMP regularly complained<br \/>\nthat parsing the ASN.1 encodings took a disproportionate amount of<br \/>\nprocessing time, but they also acknowledged that properly done, it<br \/>\nmapped directly to the values they needed to exchange. Now, think<br \/>\nof what would have happened had it not been a Simple, but instead<br \/>\nsome moronic excuse for an eXtensible Network Management Protocol.<br \/>\nAnother thing is that we have long had amazingly rich standards for<br \/>\nsuch &#8220;display attributes&#8221; as many now use HTML and the like. The<br \/>\nchoice to use SGML for web publication was not entirely braindead,<br \/>\nbut it should have been obvious from the outset that page display<br \/>\nwould become important, if not immediately, then after watching what<br \/>\npeople were trying to do with HTML. The Web provided me with a much<br \/>\nneeded realization that information cannot be \/fully\/ separated from<br \/>\nits presentation, and showed me something I knew without verbalizing<br \/>\nexplicitly, that the presentation form we choose communicates real<br \/>\ninformation. Encoding all of it via markup would require a very<br \/>\nfine level of detail, not to mention \/awareness\/ of issues so widely<br \/>\ndispersed in the population that only a handful of people per<br \/>\nmillion grasp them. 
Therefore, to be successful, there must be an<br \/>\nupper limit to the complexity of the language defined with SGML, and<br \/>\none must go on to solve the next problem, not sit idle with a set of<br \/>\ngreat tools and think &#8220;I ought to use these tools for something&#8221;.<br \/>\nStultifying as the language of content models may be, it amazes me<br \/>\nthat people do not grasp that they need to use something else when<br \/>\nit becomes too painful to express with SGML, but I am in the highly<br \/>\nprivileged position of knowing a lot more than SGML when I pronounce<br \/>\nmy judgment on XML. For one thing, I knew Lisp before I saw SGML,<br \/>\nso I know what brilliant minds can do under optimal conditions and<br \/>\nwhen they ensure that the problem is still bigger than the solution.<br \/>\n&#8212;<br \/>\nErik Naggum, Oslo, Norway<br \/>\nAct from reason, and failure makes you rethink and study harder.<br \/>\nAct from faith, and failure makes you blame someone and push harder.<br \/>\nvia <a href=\"http:\/\/www.schnada.de\/grapt\/eriknaggum-xmlrant.html\">Erik Naggum&#8217;s XML rant<\/a>.<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>| And how would you improve on it? A brief summary, then: Remove the syntactic mess that is attributes. (You will then find that you do not need them at all.) Enclose the \/element\/ in matching delimiters, not the tag. These simple things make people think differently about how they use the language. 
Contrary to &hellip; <\/p>\n<p class=\"link-more\"><a href=\"http:\/\/onthe8spot.com\/index.php\/2012\/05\/06\/erik-naggums-xml-rant\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Erik Naggum&#039;s XML rant&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[79],"tags":[],"class_list":["post-5280","post","type-post","status-publish","format-standard","hentry","category-reposts"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"http:\/\/onthe8spot.com\/index.php\/wp-json\/wp\/v2\/posts\/5280","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/onthe8spot.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/onthe8spot.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/onthe8spot.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/onthe8spot.com\/index.php\/wp-json\/wp\/v2\/comments?post=5280"}],"version-history":[{"count":0,"href":"http:\/\/onthe8spot.com\/index.php\/wp-json\/wp\/v2\/posts\/5280\/revisions"}],"wp:attachment":[{"href":"http:\/\/onthe8spot.com\/index.php\/wp-json\/wp\/v2\/media?parent=5280"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/onthe8spot.com\/index.php\/wp-json\/wp\/v2\/categories?post=5280"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/onthe8spot.com\/index.php\/wp-json\/wp\/v2\/tags?post=5280"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}