Erik Naggum's XML rant

| And how would you improve on it?
A brief summary, then: Remove the syntactic mess that is attributes.
(You will then find that you do not need them at all.) Enclose the
/element/ in matching delimiters, not the tag. These simple things
make people think differently about how they use the language.
Contrary to the foolish notion that syntax is immaterial, people
optimize the way they express themselves, and so express themselves
differently with different syntaxes. Next, introduce macros that
look exactly like elements, but that are expanded in place between
the reader and the “object model”. Then, remove the obnoxious
character entities and escape special characters with a single
character, like \, and name other entities with letters following
the same character. If you need a rich set of publishing symbols,
discover Unicode. Finally, introduce a language for micro-parsers
that can take more convenient syntaxes for commonly used elements
with complex structure and make them /return/ element structures
more suitable for processing on the receiving end, and which would
also make validation something useful. The overly simple regular
expression look-alike was a good idea when processing was expensive
and made all decisions at the start-tag, but with a DOM and less
stream-like processing, a much better language should be specified
that could also do serious computation before validating a document
— so that once again processing could become cheaper because of the
“markup”, not more expensive because of it.
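To make the first few proposals concrete, here is a minimal sketch of a reader for such a syntax. The post does not fix a notation here, so everything specific below is an assumption made for illustration: elements are written `<name|content>` so that the delimiters enclose the whole element, there are no attributes, and `\` escapes the next character. The function name `parse` and the tuple-based tree are hypothetical as well.

```python
def parse(text):
    """Parse hypothetical `<name|content>` markup into a list of nodes.

    Each node is either a string (character data) or a tuple
    (name, children). The notation is an assumption, not a standard.
    """
    pos = 0

    def parse_content(top_level):
        nonlocal pos
        children, buf = [], []
        while pos < len(text):
            ch = text[pos]
            if ch == "\\":                      # single-character escape
                if pos + 1 >= len(text):
                    raise ValueError("dangling escape")
                buf.append(text[pos + 1])
                pos += 2
            elif ch == "<":                     # nested element starts
                if buf:
                    children.append("".join(buf))
                    buf = []
                pos += 1
                children.append(parse_element())
            elif ch == ">":                     # enclosing element ends
                if top_level:
                    raise ValueError("unmatched '>'")
                break
            else:
                buf.append(ch)
                pos += 1
        if buf:
            children.append("".join(buf))
        return children

    def parse_element():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "|>":
            pos += 1
        if pos >= len(text):
            raise ValueError("unterminated element")
        name = text[start:pos]
        children = []
        if text[pos] == "|":                    # element has content
            pos += 1
            children = parse_content(top_level=False)
            if pos >= len(text):
                raise ValueError("unterminated element")
        pos += 1                                # consume the closing '>'
        return (name, children)

    return parse_content(top_level=True)


if __name__ == "__main__":
    print(parse(r"<p|Use <em|\<b\>> sparingly.>"))
```

A macro facility of the kind described would sit between this reader and the consumer of the tree, rewriting selected `(name, children)` nodes before anything else sees them; that step is left out of the sketch.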

But the one thing I would change the most from a markup language
suitable for marking up the incidental instruction to a type-setter
to the data representation language suitable for the “market” that
XML wants, is to go for a binary representation. The reasons for
/not/ going binary when SGML competed with ODA have been reversed:
When information should survive changes in the software, it was an
important decision to make the data format verbose enough that it
was easy to implement a processor for it and that processors could
liberally accept what other processors conservatively produced, but
now that the data formats that employ XML are so easily changed
that the software can no longer keep up with them, we need to slam on
the brakes and tell the redefiners to curb their enthusiasm, get it
right before they share their experiments with the world, and show
some respect for their users. One way to do that is to increase the
cost of changes to implementations without sacrificing readability
and without making the data format more “brittle”, by going binary.

Our information infrastructure has become so much better that the
nature of optimization for survivability has changed qualitatively.
The question of what we humans need to read and write no longer has
any bearing on what the computers need to work with. One of the
most heinous crimes against computing machinery is therefore to
force them to parse XML when all they want is the binary data. As
an example, think of the Internet Protocol and Transmission Control
Protocol in XML terms. Implementors of SNMP regularly complained
that parsing the ASN.1 encodings took a disproportionate amount of
processing time, but they also acknowledged that properly done, it
mapped directly to the values they needed to exchange. Now, think
of what would have happened had it not been a Simple, but instead
some moronic excuse for an eXtensible Network Management Protocol.
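The thought experiment about TCP in XML terms can be tried directly. The following is a rough sketch, not a measurement of any real protocol stack: it packs the eight fixed fields of a TCP header (20 bytes without options, with data offset, reserved bits and flags folded into one 16-bit field) using Python's `struct` module, and encodes the same fields as XML with `xml.etree.ElementTree`. The field names, the `<tcp>` element layout and the sample values are assumptions made for illustration.

```python
import struct
import xml.etree.ElementTree as ET

# Fixed part of a TCP header in network byte order: 20 bytes, no options.
TCP_HEADER = struct.Struct("!HHIIHHHH")

# Hypothetical field names, used as dict keys and as XML element names.
FIELDS = ("src_port", "dst_port", "seq", "ack",
          "offset_flags", "window", "checksum", "urgent")


def to_binary(header):
    """Pack the header fields into their fixed binary layout."""
    return TCP_HEADER.pack(*(header[name] for name in FIELDS))


def from_binary(data):
    """Unpack the fixed layout; each value maps directly to a field."""
    return dict(zip(FIELDS, TCP_HEADER.unpack(data)))


def to_xml(header):
    """Encode the same fields as one child element per field."""
    root = ET.Element("tcp")
    for name in FIELDS:
        ET.SubElement(root, name).text = str(header[name])
    return ET.tostring(root)


def from_xml(data):
    """Parse the XML back; every value must be converted from text."""
    return {child.tag: int(child.text) for child in ET.fromstring(data)}


# Sample values are made up; 0x5010 is data offset 5 with the ACK flag set.
sample = {"src_port": 443, "dst_port": 51514, "seq": 1000, "ack": 2000,
          "offset_flags": 0x5010, "window": 65535, "checksum": 0, "urgent": 0}

if __name__ == "__main__":
    print("binary bytes:", len(to_binary(sample)))
    print("XML bytes:   ", len(to_xml(sample)))
```

Both encodings round-trip the same values, but the binary form has a fixed size and needs no text parsing or number conversion on the receiving end, which is the cost the paragraph above is concerned with.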

Another thing is that we have long had amazingly rich standards for
such “display attributes” as many now use HTML and the like for. The
choice to use SGML for web publication was not entirely braindead,
but it should have been obvious from the outset that page display
would become important, if not immediately, then after watching what
people were trying to do with HTML. The Web provided me with a much
needed realization that information cannot be /fully/ separated from
its presentation, and showed me something I knew without verbalizing
explicitly, that the presentation form we choose communicates real
information. Encoding all of it via markup would require a very
fine level of detail, not to mention /awareness/ of issues so widely
dispersed in the population that only a handful of people per
million grasp them. Therefore, to be successful, there must be an
upper limit to the complexity of the language defined with SGML, and
one must go on to solve the next problem, not sit idle with a set of
great tools and think “I ought to use these tools for something”.

Stultifying as the language of content models may be, it amazes me
that people do not grasp that they need to use something else when
it becomes too painful to express with SGML, but I am in the highly
privileged position of knowing a lot more than SGML when I pronounce
my judgment on XML. For one thing, I knew Lisp before I saw SGML,
so I know what brilliant minds can do under optimal conditions and
when they ensure that the problem is still bigger than the solution.

Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
