
Tangentium

 

January '04



All material on this site remains © the original authors: please see our submission guidelines for more information. If no author is shown material is © Drew Whitworth. For any reproduction beyond fair dealing, permission must be sought: e-mail drew@comp.leeds.ac.uk.

ISSN number: 1746-4757

 

Electronic Text Technologies

Drew Whitworth (based on materials created by Martin Thomas)


Like HTML, XML is a simplified derivative of SGML, but it adheres more closely to the principles of structural markup and more effectively enforces good practice on those who use it to mark up text. Because documents marked up in XML are more reliable than those marked up in HTML, a further technology, XSLT (Extensible Stylesheet Language Transformations), can be used to take an XML document and transform it into a variety of formats. HTML (web-friendly) would be one, but there is no reason why a second XSLT transformation could not be performed on the same file to produce content suitable for transmission to a mobile phone, say, or a digital television.
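To make the idea concrete, here is a minimal sketch. The element names (`clause`, `heading`, `body`) are invented for illustration, not taken from any real document type. The first fragment is a structurally marked-up source; the second is an XSLT template that renders the same clause as HTML. A second stylesheet could target a phone-friendly format instead, without touching the source document:

```xml
<!-- A structurally marked-up source fragment (element names invented
     for illustration). The tags describe what the text IS, not how
     it should look. -->
<clause number="1">
  <heading>Definitions</heading>
  <body>In this Treaty, "institution" means ...</body>
</clause>

<!-- An XSLT template that renders such a clause as HTML. -->
<xsl:template match="clause"
              xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <h2><xsl:value-of select="heading"/></h2>
  <p><xsl:value-of select="body"/></p>
</xsl:template>
```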

It is through this kind of application that the suggestions made by Kevin Carey in this month's feature essay might be approached. To take one of his more involved examples, it may well be possible for an interpreted or "plain English" version of the Maastricht Treaty to be derived from the "pure" version. This, or any similar project, requires two things:

  1. The willingness to go through a source document and mark it up appropriately
  2. The willingness to develop a text-handling application which could interpret this markup and then create the alternative version of the document according to certain rules.

For a document as large and complex as the Maastricht Treaty this would not be a straightforward task! "Machine translation" between natural languages is a technique that is only just beginning to be developed, and its limitations would simply be replicated in any attempt to automate translation "within" a language. Nevertheless, great improvements are being made here, and in any case what is presently lacking (as Carey suggests) is not technology but the political will to spend time and energy providing alternative versions of a text. Even HTML can quite easily encode hypertext links, as anyone who has browsed the Web will know. Placing source documents online, and linking to them from any other document which draws on that source, is therefore straightforward if the will to do it exists. As this month's second supplementary essay argues, however, there are often political reasons to withhold sources.
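The second requirement above — a text-handling tool that interprets markup according to certain rules — can be sketched in a few lines. The markup scheme here is entirely hypothetical: we assume a `<term>` element carrying a plain-English `gloss` attribute, and the sample sentence is invented for illustration, not drawn from any real treaty markup:

```python
# A minimal sketch of a tool that interprets markup and emits an
# alternative, "plain English" rendering of a document.
# The <term gloss="..."> element is a hypothetical markup convention
# invented here for illustration.
import xml.etree.ElementTree as ET

SOURCE = """<article>
  Member States shall avoid excessive
  <term gloss="government overspending">budgetary deficits</term>.
</article>"""

def plain_english(xml_text):
    """Walk the marked-up text, swapping each tagged term for its gloss."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "term" and "gloss" in child.attrib:
            parts.append(child.attrib["gloss"])   # substitute the gloss
        else:
            parts.append(child.text or "")        # leave other elements alone
        parts.append(child.tail or "")            # text following the element
    return " ".join("".join(parts).split())       # normalise whitespace

print(plain_english(SOURCE))
# -> Member States shall avoid excessive government overspending.
```

The rule applied here is trivially simple — substitute one phrase for another — but the same pattern (walk the tree, apply a rule per element) underlies any more serious text-handling application of this kind.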

Like many applications of ICT, exploiting the possibilities of electronic text is potentially democratising, for the reasons Carey describes. The convergence of computing with other technologies, such as TV or mobile phones, and the ability of XML/XSLT to produce real multimedia content, may even answer those critics who point to the increasing migration of important information to a medium (computing) which is expensive for everyone and inaccessible to many. Electronic texts are more accessible, more analysable, more searchable, and more flexible than traditional printed words.

Nor can the drawbacks of electronic text be ignored. Truly open texts, accessible (and perhaps even adjustable) by all, are first a threat to copyright and intellectual property; second, they may be politically sensitive. The book remains a very convenient medium for the storage of text, despite its rather one-dimensional nature. And as is becoming clear, the mere provision of information does not necessarily provoke the revitalisation of the public sphere which a more democratic, political society requires. (For reasons why, look at almost any other issue of Tangentium, as this is, essentially, our central theme.) At the present time intermediaries would still be required both to mark up text in the first place and to write tools which can interpret that markup: and any intermediary will insert their own assumptions and prejudices between the author and reader (as Carey observes). In the end we must remember that though "XML may help humans predict what information might lie 'between the tags'", computers have no intrinsic understanding of their own, and ultimately, to a computer "<trunk> and <i> and <booktitle> are all equally (and totally) meaningless" (both quotes from Robin Cover, XML and Semantic Transparency). Computers cannot invest any text, electronic or not, with meaning. Only humans can do that.

Despite all this, there seem to be certain types of text for which digitisation is eminently suited: public documents, as Carey notes. This essay has summarised the technologies which already exist to simplify and widen access to such documents, in all the multi-faceted ways which truly democratic "access" requires. Applying them to our public information sphere is now a political challenge as much as, or more than, a technological one.
