Markup languages
Some people
consider XML a standardisation device, say for
parsing, validation, editing, querying,
transformation or storage. XML tools and processors
are indeed available for a variety of systems and
programming languages. XML does serve a purpose,
but because it is in fashion, it is mandated far
more often than it should be nowadays. One should
resort to XML only for applications that need
to handle complex structures between heterogeneous
and incompatible systems and languages.
There are
XML-fanatics — the same as there are
Unicode-fanatics ☺. A while ago, I saw a pushy
trend by which /etc/passwd and a lot of
other simple system files would be reformulated as
XML, so as to be XML all over. (Maybe after an
announcement by Microsoft, real or made-up, I do
not know, that they aim for such a thing in the next
MS-Windows.)
"Standardised" is like a buzzword here. There
is no more advantage in being
XML-standardised for the sole sake of being
XML-standardised than in going XML for the sole
sake of going XML. There should be a real, sound,
clear reason to go XML for an application, and
administrators should firmly resist XML until such
a reason has been demonstrated.
Note that
if I speak about Python below, this is merely
because it is the language I use most often these
days; the same considerations likely apply to
many other languages.
Any file is
a hierarchy of some sort. We often see a file being
a sequence of lines, a line being a sequence of
fields or tokens, and a token being a sequence of
characters. In many, many, really many
applications, this organisation in lines and fields
is wholly satisfactory. Reusing the enumeration
above, it is easy to parse, easy to validate, easy
to edit, easy to query, easy to transform and easy
to store. Let's be honest: people are comfortable
with lines and fields, and examples and tools
abound. XML becomes more sensible when you
have a lot of structure, something which is
complex, difficult, and which you have to exchange
with remote parties. For simple things, it is just
annoying and heavy overkill, really...
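
To make the lines-and-fields point concrete, here is a
minimal sketch of parsing, validating and querying
colon-separated records in the style of /etc/passwd. The
sample data, the field names and the parse_records helper
are all made up for illustration.

```python
# Hypothetical sample in the colon-separated style of /etc/passwd.
SAMPLE = """\
root:x:0:0:root:/root:/bin/sh
alice:x:1000:1000:Alice:/home/alice:/bin/bash
"""

# Assumed field names, one per colon-separated column.
FIELDS = ("name", "password", "uid", "gid", "gecos", "home", "shell")


def parse_records(text):
    """Parse colon-separated lines into a list of dictionaries."""
    records = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        values = line.split(":")
        if len(values) != len(FIELDS):
            # Validation amounts to a single length check.
            raise ValueError(
                f"line {number}: expected {len(FIELDS)} fields, "
                f"got {len(values)}"
            )
        record = dict(zip(FIELDS, values))
        record["uid"] = int(record["uid"])
        record["gid"] = int(record["gid"])
        records.append(record)
    return records


records = parse_records(SAMPLE)

# Querying is an ordinary list comprehension.
regular_users = [r["name"] for r in records if r["uid"] >= 1000]
```

The whole reader fits in about twenty lines, and the file
stays editable with any text editor.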
Some might
wonder what benefits could be gained from coming up
with one's own formats. For simple things, this is
obvious: ease, speed, simplicity, readability.
Don't fear it. The world will survive, you know,
even if you sometimes don't use XML. ☺
As Guido
van Rossum once put it:
- The most important user of a config file
is not the programmer who has to get data out of
it; the most important user is the user who has
to edit the config file. The outrageous verbosity
of XML makes the above example a complete
usability liability.
- Now, if you're talking about config files
that represent options that the user edits in a
convenient application-specific options dialog,
that's a different story; I think XML is
well-suited for that; but I'm talking about the
classic configuration file pattern where you use
your favorite flat-file text editor to edit the
options file. In that situation, using XML is
insane.
XML may
sometimes be useful when a configuration
has to be a tree of thick (content-heavy)
nodes. For simpler configuration files, and this
covers a flurry of cases, Guido above expresses my
feelings exactly.
We should
not go overboard with XML. Configuration files that
use lines and words to structure a two-level
hierarchy have long proven their value.
ConfigParser .ini files add a third level to
these, and with good sense, we can still go a
long way further without resorting to XML.
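
As an illustration of that third level, here is a small
sketch using the standard configparser module (the
Python 3 name of ConfigParser). The section and option
names are invented for the example.

```python
import configparser

# Hypothetical configuration text; section and option names are made up.
INI_TEXT = """\
[server]
host = localhost
port = 8080

[logging]
level = info
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Three levels: sections, option lines within them, and the values.
host = parser["server"]["host"]
port = parser.getint("server", "port")  # converted to int on request
level = parser.get("logging", "level")
```

Values come back as strings unless a typed accessor such
as getint is used, which is the main price paid for the
simplicity of the format.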
Besides,
when configuration files need a
more complex structure, but are explicitly meant to
drive big applications written in Python (a common
case for most of us, presumably), I found that
Python itself is a very convenient and powerful
tool for expressing such parameterisation. Users
can easily modify a carefully designed
Python-written configuration template without
knowing a bit about Python itself. If they know the
concepts of the applications they are configuring,
in my experience at least, they quickly get what
needs to be grasped about the bits of syntax to
respect and follow. With proper care, this can be
made incredibly powerful, while staying
way more legible than XML. Also compare the
speed and simplicity, for the application
programmer, of bringing such a configuration into
memory, with all typing concerns already addressed.
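
One possible way of doing this, sketched under my own
assumptions: the configuration is plain Python source,
and a hypothetical load_config helper executes it and
keeps the names it defines. The option names are
invented. Since the source is executed, this only suits
configuration files that are trusted.

```python
# Hypothetical configuration template that a user would edit.
CONFIG_SOURCE = '''
# Where to listen.
host = "localhost"
port = 8080

# One entry per mirror, preferred one first.
mirrors = [
    {"name": "primary", "timeout": 2.5},
    {"name": "backup", "timeout": 10.0},
]
'''


def load_config(source):
    """Execute configuration source and return the public names it defines."""
    namespace = {}
    exec(source, namespace)  # runs the text as Python: trusted input only
    # Drop names such as __builtins__ that exec adds to the namespace.
    return {
        key: value
        for key, value in namespace.items()
        if not key.startswith("_")
    }


config = load_config(CONFIG_SOURCE)
```

The values arrive already typed: port is an integer and
mirrors is a list of dictionaries, with no conversion
code on the application side.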
Of course,
XML better inter-operates with foreign programming
languages and systems, and would undoubtedly allow
easier communication with alien races from
neighbouring galaxies ☺. But deep down, I do not
think that this need for configuration neutrality is
common enough to warrant the added complexity, and
loss of legibility, for most practical cases.
As a Python
lover, I'm tempted to prefer easy and
legible over standardised wherever I
can, whenever they contradict one another. People
who consider XML legible most often confuse
the meanings of legible and
decipherable, but one can still talk with them.
XML fanatics, on the other hand, are confused much
more widely and inconsiderately.
Speaking
for my own situation only, as a Python lover, XML
is gross overkill even for quite complex things. It
is extremely simple to pickle rather complex
structures, transmit them over wires to
applications on other machines, and unpickle them
there. Using Python as an API for such uses is
natural and very comfortable, not to mention
immensely faster than XML.
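
A minimal sketch of that round trip, using the standard
pickle module. The structure is made up, and the network
transfer itself is left out: the bytes in payload are
what would travel over the wire. Unpickling should only
be done on data from a trusted peer.

```python
import pickle

# Hypothetical structure: nested containers with mixed types.
job = {
    "id": 42,
    "tags": {"urgent", "nightly"},
    "steps": [("fetch", 3), ("build", 1)],
    "owner": None,
}

# Sending side: turn the structure into bytes.
payload = pickle.dumps(job)

# Receiving side: rebuild an equal structure from the bytes.
restored = pickle.loads(payload)
```

Sets, tuples and None all survive the trip with their
types intact, which a text format would have to encode
by convention.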
Of course,
I would prefer XML if I had to speak outside a
Python environment, with people offering nothing
simpler than an XML interface. I've looked into
some of these fashionable avenues. So far, they
invariably seem extremely complex and hairy to me,
at least for what they provide. XML is there to
give users reassurance that they have
last-resort control, after all, to inspect what is
going on, or to intervene if they ever need to. So,
for them, I really understand how valuable XML may
be. It's a good thing.
In my
simple situations, Python is much, much better than
XML as a solution. Moreover, if I have no choice
but to communicate with an outside world grokking XML,
Python offers me a good set of XML tools and
interfaces! Then again, when I really need a markup
language for my users, without having XML imposed
from the outside, SGML is often a better solution,
as it is closer to humans than XML. There might
be better solutions than SGML, too. XML is
mainly there to help implementors. Surely, I like
humans far more than I like machines, and this
feeling mainly drives my efforts. ☺
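
For the case where the outside world does impose XML,
here is a short sketch with xml.etree.ElementTree from
today's standard library. The text above does not name a
particular tool, so this choice is mine, and the element
and attribute names are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical document received from an XML-speaking peer.
XML_TEXT = """\
<users>
  <user name="alice" uid="1000"/>
  <user name="bob" uid="1001"/>
</users>
"""

# Reading: pull the XML into an ordinary Python dictionary.
root = ET.fromstring(XML_TEXT)
uids = {
    user.get("name"): int(user.get("uid"))
    for user in root.findall("user")
}

# Writing: build a small reply for the same peer.
reply = ET.Element("reply")
ET.SubElement(reply, "count").text = str(len(uids))
reply_text = ET.tostring(reply, encoding="unicode")
```

The XML stays at the boundary; inside the program, the
data is handled as plain Python structures.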
Some people
argue that XML is more readable than pickles. I can
read a pickle with Python and dump it as a
readable, pretty-printed Python structure.
Conversely, I can have Python read in a text
containing the source of a Python structure, and
produce a pickle from the result. The programs to
do so are very small. In practice, I do not
maintain my original structures as pickles, but
rather as very plain Python source files
containing almost nothing but data structures. These
are easy to edit.
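
The two small programs could look like the sketch below.
It assumes the structures are made of plain literals
(dictionaries, lists, strings, numbers and the like), so
that ast.literal_eval can read them back without
executing arbitrary code. The function names and the
sample data are invented.

```python
import ast
import pickle
import pprint


def pickle_to_text(payload):
    """Render a pickle as pretty-printed Python source."""
    return pprint.pformat(pickle.loads(payload))


def text_to_pickle(text):
    """Read the source of a Python literal and produce a pickle from it."""
    return pickle.dumps(ast.literal_eval(text))


# Hypothetical hand-edited source file holding nothing but data.
SOURCE = """\
{
    "hosts": ["alpha", "beta"],
    "retries": 3,
    "limits": {"cpu": 0.5, "memory": 2048},
}
"""

payload = text_to_pickle(SOURCE)
text = pickle_to_text(payload)
```

Reading the text back gives the same structure, so the
readable source can remain the master copy and the
pickle a derived form.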
For most
people, even those not familiar with Python, a
Python structured constant is probably easier to
read than any XML rendering of it. Python will
parse and validate it for me. I can save the
original text, or the pickle if I so prefer, in
files and databases, and transmit either over
networks. Python, the language, is also a wonderful
and probably unequalled generic framework for
transforming data structures.
About being
compatible with nearly any language
off-the-shelf (as someone put it), of course,
this is where some difficulty may arise. Until I
have this problem, I'd rather stay comfortable
with Python than push myself into miseries I do not
need. Then, I could decide to transmit either lines
and fields, or XML if available at the other end,
or even analyse and produce source or data files
for the other language, probably all from within
Python.