Markup languagesContents
Some people consider XML for a standardisation device, let's
say for parsing, validation, editing, querying, transformation or
storage. XML tools and processors are indeed available for a
variety of systems and programming languages. XML serves a
purpose indeed, but XML being in the fashion, it is mandated far
too often that it should, nowadays. One should resort to XML
only for applications needing to handle complex
structures between heterogenous and incompatible systems and
languages.
There are XML-fanatics — the same as there are
Unicode-fanatics ☺. A while ago, I've seen a pushy trend by which
/etc/passwd and a lot of other simple system files
would be reformulated as XML, to be XML all over. (Maybe after an
announcement of Microsoft — real or made-up, I do not know — that
they aim such a thing for the next MS-Windows.)
standardised is like a buzzword, here. This is no
more advantage being XML-standardised for the only
sake of being XML-standardised than going XML for the
only sake of going XML. There should be a real, sounded, clear
reason to go XML for an application, and administrators should
firmly resist XML until such reason has been demonstrated.
Note that if I'm speaking about Python, below, this is merely
as the language I most often use in these times, but the same
considerations might likely apply to many other languages.
Any file is a hierarchy of some sort. We often see a file
being a sequence of lines, a line being a sequence of fields or
tokens, and tokens being a sequence of characters. In many, many,
really many applications, this organisation in lines and fields
is wholly satisfactory. Reusing the enumeration above, it is easy
to parse, easy to validate, easy to edit, easy to query, easy to
transform and easy to store. Let's be honest. People are
comfortable with lines and fields, examples and tools merely
abound. XML becomes more sensible when you have a
lot of structure, something which is complex, difficult,
and which you have to exchange with away parties. For simple
things, it is just annoying and heavy overkill, really...
Some might wonder what benefits could be gained from coming up
with one's own formats. For simple things, this is obvious: ease,
speed, simplicity, readability. Don't fear it. The world will
survive, you know, even if you sometimes don't use XML. ☺
As Guido van Rossum once put it:
- The most important user of a config file is not the
programmer who has to get data out of it; the most important
user is the user who has to edit the config file. The
outrageous verbosity of XML makes the above example a complete
usability liability.
- Now, if you're talking about config files that represent
options that the user edits in a convenient
application-specific options dialog, that's a different story;
I think XML is well-suited for that; but I'm talking about the
classic configuration file pattern where you use your favorite
flat-file text editor to edit the options file. In that
situation, using XML is insane.
XML may sometimes be useful when a configuration
has to be a tree of thick (contents heavy) nodes. For
simpler configurations files, and this covers a flurry of cases,
Guido above expresses my feelings exactly.
We should not go overboard with XML. Configuration files with
lines and words for structuring of a two-level hierarchy have
long proven their value. ConfigParser .ini files add
a third level to these. and with good sense, we can still go
another long way without resorting to XML.
Besides, when configuration files are going to have some more
complex structure, but explicitly made to drive big applications
written in Python — a common case for most of us presumably — I
found out that Python itself is a very convenient and powerful
tool for expressing such parameterisation. Users can easily
modify a carefully designed Python-written configuration template
without knowing a bit about Python itself. If they know the
concepts of the applications they are configuring, in my
experience at least, they quickly get what needs to be grasped
about the bits of syntax to respect and follow. With proper care,
this can be made incredibly powerful, while staying
way more legible than XML. Also compare speed and
simplicity, for the application programmer, for bringing such
configuration in memory, all typing concerns already
addressed.
Of course, XML better inter-operates with foreign programming
languages and systems, and would undoubtedly allow easier
communication with alien races from neighbouring galaxies ☺. But
deep down, I do not think that this need in configuration
neutrality is common enough to warrant the added complexity, and
loss of legibility, for most practical cases.
As a Python lover, I'm tempted to prefer easy and
legible over standardised wherever I can,
wherever they contradict one another. People who consider XML as
legible most usually confuse the meanings of
legible and decipherable, but they are still
speakable. On the other hand, XML fanatics confuse much more
widely and inconsiderately.
Speaking for my own situation only, as a Python lover, XML is
gross overkill even for quite complex things. It is extremely
simple to pickle rather complex structures, transmit them over
wires to applications on other machines, and unpickle them there.
Using Python as an API for such usages is natural and very
comfortable, and not to say, immensely faster than XML.
Of course, I would prefer XML is I had to speak outside a
Python environment, with people offering nothing simpler than an
XML interfaces. I've looked into some of these fashioned avenues.
So far, they invariably seem extremely complex and hairy to me,
at least for what they provide. XML is there to give users a
reinsurance on the fact they have a last-resort control, after
all, to inspect what is going on, or to intervene if they ever
need to. So, for them, I really understand how valuable XML may
be. It's a good thing.
In my simple situations, Python is much, much better than XML
as a solution. Moreover, if I have no choice than communicate
with an outside world groking XML, Python offers me a good set of
XML tools and interfaces ! For one, when I really need a marking
language for my users, without having XML imposed from the
outside, SGML is often a better solution, as it is closer to
humans than XML. There might also be better solutions than SGML,
too. XML is mainly there to help implementors. Surely, I like
humans far more than I like machines, and this feeling mainly
drives my efforts. ☺
Some people argue that XML is more readable than pickles. I
can read a pickle with Python and dump it as a readable,
pretty-printed Python structure. Conversely, I can have Python to
read in a text containing the source of a Python structure, and
produce a pickle from the result. The programs to do so are very
small. In practice, I do not maintain my original structures as
pickles, but rather as very straight Python source files
containing about nothing but data structures. These are easy to
edit.
For most people, even to those not familiar with Python, a
Python structured constant is probably easier to read that any
XML rendering of it. Python will parse and validate it for me. I
can save the original text, or the pickle if I prefer so, in
files and databases, and transmit either over networks. Python,
the language, is also a wonderful and probably unequalled generic
framework for transforming data structures.
About being compatible with nearly any language
off-the-shelf (as someone put it), of course, this is
where some difficulty may rise. Until I have this problem, I'll
rather stay comfortable with Python than push myself into
miseries I do not need. Then, I could decide to transmit either
lines and fields, or XML if available at the other end, or even
analyse and produce source or data files for the other language,
probably all from within Python.
Last modified: 2009-05-17 07:06
|