| And how would you improve on it?

A brief summary, then: Remove the syntactic mess that is attributes.
(You will then find that you do not need them at all.) Enclose the
/element/ in matching delimiters, not the tag. These simple things
make people think differently about how they use the language.
Contrary to the foolish notion that syntax is immaterial, people
optimize the way they express themselves, and so express themselves
differently with different syntaxes. Next, introduce macros that
look exactly like elements, but that are expanded in place between
the reader and the “object model”. Then, remove the obnoxious
character entities and escape special characters with a single
character, like \, and name other entities with letters following
the same character. If you need a rich set of publishing symbols,
discover Unicode. Finally, introduce a language for micro-parsers
that can take more convenient syntaxes for commonly used elements
with complex structure and make them /return/ element structures
more suitable for processing on the receiving end, and which would
also make validation something useful. The overly simple regular
expression look-alike used for content models was a good idea when
processing was expensive and made all decisions at the start-tag,
but with a DOM and less stream-like processing, a much better
language should be specified that could also do serious computation
before validating a document, so that once again processing could
become cheaper because of the “markup”, not more expensive because
of it.
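
The escaping rule proposed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Naggum's own design: the entity table `NAMED`, the function name `unescape`, and the rule that an entity name ends at the first non-letter are all hypothetical choices made for the example.

```python
import re

# Hypothetical entity table: the text only says other entities are named
# "with letters following the same character"; these names are assumptions.
NAMED = {"lt": "<", "gt": ">", "amp": "&", "deg": "\u00b0"}

# A backslash followed either by a run of ASCII letters (a named entity,
# assumed to end at the first non-letter) or by any single other character
# (a literal escape such as \< or \\).
_ESCAPE = re.compile(r"\\(?:([A-Za-z]+)|(.))", re.DOTALL)


def unescape(text: str) -> str:
    """Expand \\name entities and \\c single-character escapes in text."""
    def expand(m: re.Match) -> str:
        name, literal = m.group(1), m.group(2)
        if name is not None:
            if name not in NAMED:
                raise ValueError(f"unknown entity \\{name}")
            return NAMED[name]
        return literal  # \< -> <, \\ -> \, and so on

    # A lone trailing backslash has nothing to escape and is left unchanged.
    return _ESCAPE.sub(expand, text)
```

Under these assumptions, `\lt` becomes `<`, `\<` a literal `<`, and `\\` a single backslash, so one escape character covers both quoting and naming, with no `&...;` syntax.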

But the one thing I would change the most, in going from a markup
language suitable for marking up the incidental instruction to a
typesetter to the data representation language suitable for the
“market” that XML wants, is to go for a binary representation. The
reasons for /not/ going binary when SGML competed with ODA have been
reversed: When information had to survive changes in the software,
it was an important decision to make the data format verbose enough
that it was easy to implement a processor for it and that processors
could liberally accept what other processors conservatively produced,
but now that the data formats that employ XML are so easily changed
that the software can no longer keep up with them, we need to slam on
the brakes and tell the redefiners to curb their enthusiasm, get it
right before they share their experiments with the world, and show
some respect for their users. One way to do that is to increase the
cost of changes to implementations without sacrificing readability
and without making the data format more “brittle”, by going binary.

Our information infrastructure has become so much better that the
nature of optimization for survivability has changed qualitatively.
The question of what we humans need to read and write no longer has
any bearing on what the computers need to work with. One of the
most heinous crimes against computing machinery is therefore to
force it to parse XML when all it wants is the binary data. As
an example, think of the Internet Protocol and Transmission Control
Protocol in XML terms. Implementors of SNMP regularly complained
that parsing the ASN.1 encodings took a disproportionate amount of
processing time, but they also acknowledged that, properly done, the
encoding mapped directly to the values they needed to exchange. Now,
think of what would have happened had it not been a Simple, but
instead some moronic excuse for an eXtensible Network Management
Protocol.
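
SNMP messages are encoded with the ASN.1 Basic Encoding Rules (BER), in which each value is a tag, a length, and content octets. The sketch below encodes and decodes a single INTEGER (tag 0x02) to show how directly such an encoding maps to the value. The function names are hypothetical, and only the short length form (fewer than 128 content octets) is handled; a real BER codec covers far more.

```python
def ber_integer(value: int) -> bytes:
    """Encode an int as a BER INTEGER: tag 0x02, length, content octets."""
    # Significant bits of the two's-complement form, excluding the sign bit.
    bits = value.bit_length() if value >= 0 else (~value).bit_length()
    n = bits // 8 + 1  # minimal number of octets, leaving room for the sign
    if n >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x02, n]) + value.to_bytes(n, "big", signed=True)


def ber_decode_integer(data: bytes) -> int:
    """Decode a short-form BER INTEGER such as ber_integer produces."""
    if len(data) < 2 or data[0] != 0x02 or data[1] >= 0x80:
        raise ValueError("not a short-form BER INTEGER")
    length = data[1]
    content = data[2:2 + length]
    if length == 0 or len(content) != length:
        raise ValueError("empty or truncated INTEGER content")
    # The content octets *are* the value, in big-endian two's complement.
    return int.from_bytes(content, "big", signed=True)
```

Here `ber_integer(300)` yields the four bytes `02 02 01 2C`, and decoding is a single `int.from_bytes` call; a hypothetical `<integer>300</integer>` element spends 22 bytes of text, plus a parser, on the same value.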

Another thing is that we have long had amazingly rich standards for
the kind of “display attributes” that many now use HTML and the like
to express. The choice to use SGML for web publication was not
entirely braindead, but it should have been obvious from the outset
that page display would become important, if not immediately, then
after watching what people were trying to do with HTML. The Web
provided me with a much-needed realization that information cannot
be /fully/ separated from its presentation, and showed me something
I knew without having verbalized it: that the presentation form we
choose communicates real information. Encoding all of it via markup
would require a very fine level of detail, not to mention /awareness/
of issues that only a handful of people per million grasp.
Therefore, to be successful, there must be an upper limit to the
complexity of the language defined with SGML, and one must go on to
solve the next problem, not sit idle with a set of great tools and
think “I ought to use these tools for something”. Stultifying as
the language of content models may be, it amazes me that people do
not grasp that they need to use something else when a structure
becomes too painful to express with SGML, but I am in the highly
privileged position of knowing a lot more than SGML when I pronounce
my judgment on XML. For one thing, I knew Lisp before I saw SGML,
so I know what brilliant minds can do under optimal conditions and
when they ensure that the problem is still bigger than the solution.

Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.

Act from faith, and failure makes you blame someone and push harder.

via Erik Naggum’s XML rant.
