
Tangentium

 

January '04



All material on this site remains © the original authors: please see our submission guidelines for more information. If no author is shown material is © Drew Whitworth. For any reproduction beyond fair dealing, permission must be sought: e-mail drew@comp.leeds.ac.uk.

ISSN number: 1746-4757

 

Electronic Text Technologies

Drew Whitworth (based on materials created by Martin Thomas)


Computers were originally invented to process numerical data. Over time their functions have evolved and diversified into their modern forms. Number-crunching still takes place, of course, but networking has also turned computers into communications devices. Networking was originally developed so that computers could talk to each other, but it is now more common for people to use computers to talk to other people. Number-crunching and CMC (computer-mediated communication) are probably the main applications of ICT in most people's minds.

However, an equally important function of modern ICT is the handling of electronic text. Indeed, without the software technologies and standards that have developed in this area, one of the main media of CMC, the World Wide Web, would not exist. Nor would any of the democratising technologies suggested by Kevin Carey in this month's first feature essay be workable.

Yet the history and workings of electronic text are often unfamiliar to the general public, even to people who use ICT regularly. This essay is intended as a brief summary of these issues. For more detail, the reader is directed to the resources listed on this month's page of links.

The fundamental principle of electronic text is markup. Without markup, electronic text handling would be limited to character encoding: that is, standards for translating strings of binary digits such as 0001001100101111 into letters and other symbols. Markup actually predates computing, and punctuation is its most familiar form. Punctuation turns writing from a mere string of words into sentences with varying meanings. Quotation marks (" "), the question mark (?) and the exclamation mark (!) are not spoken, but they influence meaning and intonation, "marking" a sentence as a quotation, a question or an exclamation respectively.
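To make character encoding concrete, here is a minimal Python sketch. It assumes ASCII as the encoding (the essay does not name a particular standard), and the helper names `to_bits` and `from_bits` are our own. Notice that the question mark is encoded in exactly the same way as the letters: the encoding records which symbol is present, not what that symbol is doing in the sentence.

```python
def to_bits(text: str, encoding: str = "ascii") -> str:
    """Encode text, then show each resulting byte as eight binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


def from_bits(bits: str, encoding: str = "ascii") -> str:
    """Reverse to_bits: read space-separated 8-bit groups and decode them."""
    return bytes(int(group, 2) for group in bits.split()).decode(encoding)


if __name__ == "__main__":
    bits = to_bits("AA?")
    print(bits)             # 01000001 01000001 00111111
    print(from_bits(bits))  # AA?
```

The round trip recovers the characters exactly, but nothing more: whether "AA" means a motoring organisation or a dosage instruction is simply not recorded at this level.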

At another level, meaning is influenced by context. Many words are ambiguous, with meanings that depend on context, and this applies even more strongly to certain technical terms and acronyms. For example, Chambers' English Dictionary defines the abbreviation AA as having any of the following meanings (doubtless there are more in other specialist contexts):

  • Automobile Association
  • Alcoholics Anonymous
  • Architectural Association
  • Associate of Arts
  • Australian Army
  • in equal quantities of each [on prescriptions]

There is no guarantee that either a human or a machine interpreter would make the right selection from this (or any similar) list, though each fails for different reasons. Misinterpretations are a fact of human communication: our logic systems are fuzzy, our usage is often sloppy, and we make mistakes. More seriously, we erect boundaries around certain fields through jargon, buzzwords and slogans (see this month's second supplementary essay). Computers, on the other hand, follow rules, but their ability to interpret context is limited, and non-existent when it comes to deriving meaning from the intonation and body language of speech.

Markup compensates for some of these problems. Put simply, the computer is not left to its own devices when it tries to interpret a text: the author explicitly tells the machine what a particular word, sentence or passage actually is and how it should be handled.
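A minimal sketch of this idea in Python, using the AA example above. The markup is hypothetical: the `abbr` element with a `title` attribute holding the expansion is borrowed loosely from HTML's `<abbr>` element, and `expand_abbreviations` is an illustrative helper rather than part of any standard. Because the author has recorded the intended sense of each "AA", the program simply reads it instead of guessing from context.

```python
import xml.etree.ElementTree as ET

# A hypothetical marked-up passage: the author has labelled each
# abbreviation with the sense intended, so software need not guess.
DOCUMENT = """
<text>
  <p>After the breakdown we phoned the <abbr title="Automobile Association">AA</abbr>.</p>
  <p>Mix zinc oxide and starch, <abbr title="in equal quantities of each">AA</abbr>.</p>
</text>
"""


def expand_abbreviations(xml_source: str) -> list[tuple[str, str]]:
    """Return (abbreviation, expansion) pairs read directly from the markup."""
    root = ET.fromstring(xml_source.strip())
    return [(abbr.text, abbr.get("title")) for abbr in root.iter("abbr")]


if __name__ == "__main__":
    for short, full in expand_abbreviations(DOCUMENT):
        print(f"{short} -> {full}")
```

The two occurrences of "AA" are identical as characters; only the markup distinguishes them.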

Broadly, markup takes two forms: presentational (or stylistic) markup, which encodes information about how the text should be displayed, and structural markup, which encodes information about what the text actually is. Presentational markup is important for technologies such as the WWW, which rely heavily on stylistic choices (font, colour scheme, layout and so on), but it is of less interest to us here than structural markup. The next page discusses some examples of this technique and the technologies that have been developed over time to exploit it.
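The difference between the two forms can be sketched in Python. The element names `i`, `title` and `emph`, and the helpers below, are illustrative assumptions rather than the vocabulary of any particular standard. With presentational markup, a program looking for book titles cannot separate them from emphasised words, since both are simply italic; with structural markup it can, and the italic styling can still be generated afterwards from a style mapping.

```python
import xml.etree.ElementTree as ET

# Presentational markup: <i> only says "render in italics". The same tag
# marks a book title and an emphasised word, so software cannot tell them apart.
PRESENTATIONAL = (
    "<p>The word is defined in <i>Chambers' English Dictionary</i>, "
    "but context is <i>always</i> needed.</p>"
)

# Structural markup: each element says what the text is. How titles and
# emphasis are displayed can be decided separately.
STRUCTURAL = (
    "<p>The word is defined in <title>Chambers' English Dictionary</title>, "
    "but context is <emph>always</emph> needed.</p>"
)


def collect_text(xml_source: str, tag: str) -> list[str]:
    """Collect the text of every element with the given tag."""
    return [el.text for el in ET.fromstring(xml_source).iter(tag)]


def to_presentational(xml_source: str, style: dict[str, str]) -> str:
    """Rename structural elements to presentational ones using a style mapping."""
    root = ET.fromstring(xml_source)
    for el in root.iter():
        if el.tag in style:
            el.tag = style[el.tag]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(collect_text(PRESENTATIONAL, "i"))  # the title and the emphasised word
    print(collect_text(STRUCTURAL, "title"))  # only the dictionary title
    print(to_presentational(STRUCTURAL, {"title": "i", "emph": "i"}))
```

The conversion works in one direction only: once both elements have become `<i>`, nothing in the output records which italic phrase was a title.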
