Feature Essay: Language Engineering and Public Policy

Kevin Carey

[EDITOR'S INTRODUCTION] In this feature essay, Kevin Carey discusses the ways in which IT can enhance the provision of information from public bodies. Computing technology, particularly the improvements it delivers in our ability to process and connect texts, opens up the possibility of implementing hypertext systems which may have genuine impacts on democracy and the freedom of information. Carey discusses why this is necessary, and the technological and social developments which will assist such a project.

Kevin Carey is the Director of HumanITy, a charity, lobbying organisation and think tank working to draw attention to the problem of social exclusion caused by the increased use of information technology.

This essay is an edited version of a speech made at the University of Sunderland.


The concept of hypertext is so fundamental to the World Wide Web that it appears in two of its most ubiquitous acronyms: HTTP, the HyperText Transfer Protocol, and HTML, the HyperText Markup Language.

Yet the WWW as it currently stands is an extremely limited application of true hypertext. Connections between documents are made at the whim of the author, and often do not serve to increase understanding. What I want to talk about here is a possible implementation of hypertext which may have genuine benefits for democracy and the freedom of information within our society. First, however, I wish to spend some time setting the scene within which these new technologies have emerged.

Humans have always used technology to engineer the world around them. With information technology, our ability to do this now extends beyond the material world and into the realms of information and language. One of the reasons this has become necessary is the enormous increase in informational resources available to us. Up until the end of the 17th century it was perfectly possible for a figure such as John Milton, statesman and poet, to have read every book his culture thought significant. It is in this period, that of Milton, Newton and Wren in England and of Descartes, Spinoza and Leibniz in Continental Europe, that the first real signs of the impossibility of polymathy emerge. Why was this?

The proliferation of texts and knowledge arose from a variety of factors, the spread of printing chief among them.

These factors all had an effect on cultural attitudes to literary resources. The way of writing a text changed. St. Augustine was regarded in his day as prolific, but he wrote at such speed that his great work, The City of God, was not so much a treatise with a proposition, discussion and conclusion as an intellectual journey. He began with a set of assumptions which stayed relatively intact, but the conclusions he drew from them changed radically through time, particularly after the sack of Rome in 410 CE. There was no leisure for a second edition.

By Shakespeare's time, when printing was some 200 years old, we see a man terribly careless with his manuscripts. Pirate copies of his plays circulated, and within years of his death a number of editions of his work had been put together, more or less professionally and honestly, in an attempt to arrive at an authoritative text from a variety of sources. It became a subconscious trait in authors, editors and printers to work in the knowledge that any edition need not be the last word, intellectually or typographically.

Our contemporary word processing and web publishing packages are simply the last word in provisionality, psychologically if not physically. You only have to look at the quantity of prose, ephemeral and authoritative alike, generated by each of us as students, academics and intellectuals to know that only a very few of us will be able to maintain Augustan quality. We adjust to, acquire and lose information at an immense pace. We are the first generation not only to produce more intellectual resources in one decade than the whole of human history before it, but also the first to have forgotten more than we know.

This world of the constantly ephemeral, febrile and instant forms the context for the public sector's requirement for the applied resources of language engineering. We require them so that those in the public policy sector, and the politicians who formulate and implement public policy, can have a more effective dialogue with society.

In the present time, then, our relationship with text has become ephemeral and transient. It is true that in many cases we still hold to the idea of the "true" or "definitive" version of some text or other, but this is becoming increasingly rare. Rather, we become accustomed to revisions, interpretations, filters, intermediaries and so on, interposing themselves between reader and author. This has many implications, which I will explore under the subheadings which follow.

The Clarity of the Author's Intention

In October 2002 the Government put on the Order Paper of the House of Commons a debate on the "Reinvention of Urban Post Offices". It turned out to be a debate on the closure of some 3,000 of them. However, the Minister's opening statement also promised matched funding for the Post Offices which did not close, so that they could develop their customer bases by providing a more modern and attractive environment for customers. The Opposition spokesman deliberately took "Reinvention" to be an Orwellian euphemism for closure. One could equally take it to be a deliberate attempt to focus on the package's positive side. Does it matter?

The text of the Minister's statement puts the word "Reinvention" into its proper framework, connected to the funding package. Nor did the Minister shy away from using the word "close" to describe what would have to happen to some Post Offices. The Opposition's charge was therefore flawed. Yet if you were simply using the headline on the Order Paper to get the essence of what was being proposed, it would be entirely misleading. This is not a problem for people who live by the small print, but in the world of sound bites it is crucial. The Government chose to write a headline dealing with a secondary aspect of its package, and was therefore attacked for misleading "spin".

At the other end of the spectrum, what about the authorial intention of parts of the Bible? Even without the theological disputes which rage over how far various texts can serve as a contemporary ethical primer, there are issues of historicity, anthropology, poetics, language translation and emendation. Nor are these problems confined to ancient texts: in Salman Rushdie's novel Midnight's Children the reader faces a multitude of very similar problems.

In the digital world we may have to distinguish very clearly between what we think of as an artistic artefact and a collective artefact. There are two kinds of authorial intention: one individually owned and the other collectively owned. The first will be textually and structurally sacrosanct. The communal document, on the other hand, must be stripped of artistic pretensions and crafted specifically so that it can be readily amplified or simplified. This division is a clear illustration of both the problem and the challenge of preparing collective documents which become public property.

The Requirement for Correspondence and Backwards Compatibility

If a document is amplified or simplified it must correspond, and be traceable, to the initial source file. If a Government White Paper is edited to half its length, then to avoid any accusation of manipulation the reader must be able to return to the file from which the simplification came. I have in mind a multiple-choice offer to readers, something like this: a source (Government, or any other) offers a document in a variety of lengths, say 100%, 20% and 1% (or the full text, the briefing paper and the executive summary). Compressing a text often makes it more difficult and increases the quantity of jargon and acronyms, so the two shorter versions might carry a specialist micro-vocabulary with a glossary attached.
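As a concrete sketch of the idea, consider how such a family of renderings might be recorded so that every shorter version points back to its source. This is a minimal illustration in Python; the names, fields and identifiers are my own assumptions, not any existing standard.

```python
from dataclasses import dataclass, field

@dataclass
class Rendering:
    """One rendering of a communal document at a given compression."""
    length_pct: int                # e.g. 100, 20 or 1
    text: str
    glossary: dict = field(default_factory=dict)  # micro-vocabulary -> plain gloss

@dataclass
class CommunalDocument:
    """A source document together with its traceable derived renderings."""
    source_id: str                 # stable identifier of the full text
    renderings: list = field(default_factory=list)

    def add(self, rendering: Rendering) -> None:
        self.renderings.append(rendering)

    def source_of(self, rendering: Rendering) -> str:
        # Every rendering carries the same source identifier, so the
        # reader can always return to the unabridged text.
        return self.source_id

# A hypothetical White Paper offered at three lengths.
paper = CommunalDocument(source_id="white-paper-2002-urban-post-offices")
paper.add(Rendering(100, "...full text..."))
paper.add(Rendering(20, "...briefing paper...", {"NSP": "Network Support Payment"}))
paper.add(Rendering(1, "...executive summary..."))
```

The point of the structure is not the code but the constraint it encodes: a rendering without a source identifier simply cannot be constructed.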

In addition we should consider the issue of interpolation, or what I would call ecritology: the science of commentary and criticism. We may hold consultations over White Papers, but the answering facilities provided respond only to the questions posed. What about other commentaries on the original text? These are appropriate if a document is considered public property (as all governmental communications, by definition, should be). The technology exists to interpolate comments into texts, although this raises some organisational problems centred on the power of context-sensitive searching and sorting. It also raises questions about intermediaries, which is our next topic.
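One way to interpolate commentary without disturbing the original is stand-off annotation, in which attributed comments are anchored to character positions in the source rather than spliced into it. The following is a minimal sketch under that assumption; the structure and names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Interpolation:
    """An attributed comment anchored to a point in the unmodified source."""
    author: str
    end: int          # character offset after which the comment is shown
    comment: str

def render_with_comments(source: str, notes: list) -> str:
    """Weave comments into a display copy; the source itself is untouched."""
    out, cursor = [], 0
    for note in sorted(notes, key=lambda n: n.end):
        out.append(source[cursor:note.end])
        out.append(f" [{note.author}: {note.comment}]")
        cursor = note.end
    out.append(source[cursor:])
    return "".join(out)

text = "The reinvention of urban post offices will proceed in 2003."
notes = [Interpolation("J. Smith", 15, "a euphemism for closure?")]
print(render_with_comments(text, notes))
```

Because the comments live outside the text, they can be searched, sorted and filtered by author or context without any risk to the integrity of the source.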

The Role of Intermediaries

For a useful example of what is wrong with public interest information flows, pick up a newspaper. Obviously, these are full of editing. But they also deal extensively in processes which should be characterised as clarification, simplification, amplification and interpretation. These are all valuable (if often abused) tools but they are meaningless without attribution.

If I pick up a newspaper and read a story in which a speech is summarised, but can then get hold of the full text of the speech, any (potential) abuse by the editor is open to investigation. There are myriad examples of later reporting which bears little resemblance to the original statement, within the political sphere and outside it. An obvious problem is that most people do not have the time, resources or inclination to check sources; this is not the fault of the newspaper, nor of our current, unsophisticated use of hypertext. Many speeches, moreover, are simply not recorded verbatim. The real problem arises when there is deliberate non-attribution, as in "Sources said" or "Senior officials said". Translation without attribution is harmful, rather than simply inadequate.

The use of digital media should mean that any textual adjustment to an original document can be checked against the original. The identity of the translator should also be declared. Both the individual translator and the change in medium (from speech to newsprint) are intermediary stages in the text's journey from mind to mind, and interrogating the role and activity of any intermediary is a vital aspect of democracy.
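The checking itself need not be exotic; once the original travels with the derived text, a standard diff makes the intermediary's changes visible. Here is a sketch using Python's standard difflib module (the way the intermediary's identity is recorded is my own assumption about how such a system might work):

```python
import difflib

def audit_trail(original: str, derived: str, intermediary: str) -> str:
    """Show exactly what an intermediary changed, line by line."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        derived.splitlines(keepends=True),
        fromfile="original text",
        tofile=f"as reported by {intermediary}",
    )
    return "".join(diff)

speech = "We will reinvent urban post offices.\nSome offices will close.\n"
report = "Three thousand post offices will close.\n"
print(audit_trail(speech, report, "a hypothetical newspaper"))
```

The output is an ordinary unified diff: every line the intermediary cut, added or altered is attributed, which is precisely the transparency argued for above.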

At the moment there is an intense discussion about the relationship between what are called impartial career civil servants and political advisers, commonly known as 'spin doctors'. There are a large number of difficult issues to disentangle in this discussion, so let me stick to the most basic question: is the career civil servant genuinely impartial?

My answer is that the impartial civil servant is a myth which is perpetuated by the governing class to provide politicians with a safety net. There are all kinds of politics and turf wars within and between government departments, but the public face is not so much impartial as bland. The consultative process is, currently, largely a sham. I therefore believe that people with open political commitment are better moderators than those without; there is, after all, always the suspicion that the referee is biased.

Such bias is not, of course, limited to governments or the civil service, as the observation made above about newspapers suggests. What we must be aware of is, first, the increasing importance of intermediaries in a world of ephemeral rather than definitive texts and, second, the increasing power of the facilities available to those intermediaries as information processing technology develops. Both factors mean that we need to think about the roles, training and accountability of intermediaries. But it seems clear that the ability to access previous versions of a text will inherently make intermediation of any kind more transparent.

The Future of Intellectual Property

The current model of intellectual property rights, which we have inherited from the analogue age, is totally inadequate for our new conditions. In the analogue world the means of production meant that considerable capital was required to publish, but this is no longer the case (and in any case, of all texts commercially published only 20% merit the collection of royalties for the author). There are many small music CD publishers, stretching down the culture to, say, individual choirs or bands who sell their own CDs on the door at concerts and nowhere else. Any self-regarding poet can stick his or her stuff online. Soon there will be community digital television and peer-to-peer publishing, the digital equivalent of a letter with a bunch of snaps.

Whether the new model will be based on micropayments, or on an initial payment for launch rights with no downstream collection, is an interesting debate. Of more interest, however, is the distinction drawn earlier between the declared individual artefact and the communal document. If documents are published which may be amplified, simplified, clarified or otherwise deconstructed and reassembled, the intellectual property right required will have nothing to do with ownership but simply with attribution and backwards sourcing requirements. There might also be a condition requiring the proper flagging of interpolation or ecritology. In other words, the original author will have rights, but these will be concerned with preserving the integrity of the source material in the context of all manner of inputs to it. Some people resist this, saying that all documents necessarily evolve as they pass through waves of commentary; but without the integrity of the source we will soon be in trouble, as pointed out earlier, and without attribution we would soon face impossible tangles of documents within documents.

Metadata and other tools

Computers' handling of text is accomplished by means of structural markup and metadata. The text remains the core, but these other elements are brought to bear on it in order that the text can be displayed, parsed, translated, annotated or whatever else. [Editor's Note: see this month's supplementary essay.]

How an artistic, individual document deals with these issues should remain up to the author. But as soon as a document becomes communal, certain responsibilities and standards around the use of structure and metadata should apply. I would assert that the properties of these documents should be defined so that a text without an inbuilt architecture and certain functionalities is not a document at all but simply an unauthoritative fragment.
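As a sketch of what that assertion could mean in practice, a publishing system might simply refuse texts which arrive without the required architecture. The required fields below are illustrative assumptions, not a real standard:

```python
# Hypothetical minimum architecture for a communal document;
# the field names are illustrative, not drawn from any real standard.
REQUIRED_METADATA = {"source_id", "author", "version", "language"}
REQUIRED_STRUCTURE = {"title", "summary", "body"}

def is_document(metadata: dict, structure: dict) -> bool:
    """A text lacking the inbuilt architecture is treated as a mere fragment."""
    return (REQUIRED_METADATA <= metadata.keys()
            and REQUIRED_STRUCTURE <= structure.keys())

# A bare text with no declared source or structure fails the test.
print(is_document({"author": "anon"}, {"body": "..."}))   # False: a fragment
print(is_document(
    {"source_id": "wp-2002-01", "author": "HM Treasury",
     "version": "1.0", "language": "en"},
    {"title": "...", "summary": "...", "body": "..."},
))                                                        # True: a document
```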

This will move standards into the area of document creation instead of their being retrospectively applied. At the moment such standards are primarily concerned with accessibility issues arising from measurable and severe disability. In addition, however, we need to think carefully about the public requirements of people with small vocabularies, those for whom English is not their first language and, perhaps above all, the one fifth of our population which is classified as functionally illiterate.

Having said all that, I will finally turn to the public policy requirements for language engineering. I group these under three main points: rights of access; specialisation, heterogeneity and choice; and transparent intermediation.

Rights of Access

Not long ago I heard the following, most revealing story. A financial journalist in conversation with a Treasury official remarked that the recent Finance Bill was incomprehensible. For whom did the official think the Bill had been written? Without hesitation the official confirmed that it had been written for the lawyers. Not for citizens, and not even for financial experts. This demonstrates the need for a much wider understanding of rights of access to information than the mere assertion that I should be allowed the text, the whole text and nothing but the text.

A second example that comes to mind is the furore - artificial, of course - which blew up over then Chancellor Kenneth Clarke's admission that he had never read the Maastricht Treaty. Of course he hadn't; it reads like a series of cryptic instructions to a printer, e.g.: delete the "and" after "race" and insert a comma, then after "creed" insert a comma and add "sexual orientation and disability". That is a relatively simple example of a phrase which might have been amended from "... on the grounds of sex, race and creed" to "... on the grounds of sex, race, creed, sexual orientation and disability". Even for lawyers this kind of accretion is difficult, but for most of us it is impossible.

At a level of less complexity, then, we need to think about the citizen's right to information, now the subject of EU legislation. Is this going to be a simple, theoretical right, or are we going to make it meaningful through transparent simplification and amplification? Are we going to shift the Latin? Are we going to delete "Anything to the contrary heretofore notwithstanding" and simply state "and this over-rides all previous laws on this subject"? Will we, then, take the legal document and translate it into English that can be understood by a person with five "O" Level passes, or should we take the latter as the base document and get legal draughtsmen to sort out their side of the matter? Either way, what we do not want is a mass of manual re-writing.
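Avoiding a mass of manual re-writing implies rules which a machine can apply. The following is a deliberately toy sketch of such within-language translation, a substitution table applied mechanically; real language engineering would need parsing rather than string replacement, and the table entries are my own examples.

```python
import re

# Toy table mapping legalese to plain English; entries are illustrative.
PLAIN_ENGLISH = {
    "anything to the contrary heretofore notwithstanding":
        "and this over-rides all previous laws on this subject",
    "inter alia": "among other things",
}

def simplify(legalese: str) -> str:
    """Mechanically substitute plain English for each listed phrase."""
    text = legalese
    for term, plain in PLAIN_ENGLISH.items():
        text = re.sub(re.escape(term), plain, text, flags=re.IGNORECASE)
    return text

print(simplify("Anything to the contrary heretofore notwithstanding, the duty applies."))
# -> "and this over-rides all previous laws on this subject, the duty applies."
```

The toy makes the policy point visible: once the base document is written to known rules, translation within the language becomes a repeatable process rather than a fresh act of authorship each time.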

In short, we can no longer issue one document and expect all citizens to access it with equal benefit; the technology allows us to do better than that. Illiterate people and those for whom English is not their first language still pay taxes and still vote, and they are entitled to the level of citizenship which the technology can provide at reasonable cost; what is reasonable, of course, is a political and not a technical matter. Nonetheless, rights legislation will increasingly put pressure on the public sector not only for translation between languages but also for translation within them.

Specialisation, heterogeneity and choice

There are reasons other than differing intellectual levels which lead to the requirement for a variety of renderings of a source document. We are more specialised and more heterogeneous in our intellectual and lifestyle pursuits than we have ever been before. We may want a White Paper on housing in full text but one on foreign policy in a shortened or simplified form. We may, for example, want the Iraq Dossier without comment, or we might want the ten major points as defined by the author in his tagging. Whatever the reason, this is not simply a call for simplification to accord proper rights to those with cognitive difficulties; it is a recognition of our lifestyle. These issues naturally lead on to the idea that what consumers of public affairs information require is choice: in this case not choice of raw material or of source material but choice of intellectual rendering.

Transparent Intermediation

Currently the varying intellectual gifts and aptitudes, time and resources of our citizens mean that in the sphere of public and political debate they rely upon the services of intermediaries. The problem is that these intermediaries are biased. Even public broadcasting has been dragged by commercial newsgathering and presentation into almost unconscious mendacity. Let me give you one example from the Afghan crisis last year, which led to the first war in history where the media told more lies than the politicians. After the initial urging of George Bush not to lash out after 9/11, the 'story' looked after itself for a while, until 'Ground Zero' began to lose its pulling power. Then the round-the-clock news operations began to press for hostilities against the Taliban to begin. Almost immediately it became commonplace to state that the meteorology and topography of the country meant that any campaign would have to be over by the middle of November because of the heavy snow. As it turned out the campaign ended in mid November with correspondents reporting from Kabul in their shirt sleeves. Weather records show that it rarely snowed in the campaign area before early December, and the map showed that two-thirds of Afghanistan is indeed elevated but flat.

We have reached the very dangerous situation where elected politicians are weaker than commercial news suppliers, with public broadcasters dragged in their wake. Unless you have access to source material, the politician cannot communicate with you in a way which explains his thinking; and the intermediaries he relies upon are totally unreliable. Yet the source material is horribly opaque. This is why I see language engineering within languages as central to the survival of the democratic process: technology can provide transparent intermediation.

Conclusion

What language engineers have within their grasp is a set of vital tools to preserve our democratic system of governance. But the culture is against it, which is where I began.

As a society we are extremely conservative. This isn't remarkable; the periods when societies are not conservative are very short and very special. We usually express this conservatism in rather high-flown language. In our case that is the language of standards: standards of authorship, standards in examinations, standards even in handwriting. These standards are always articulated in absolute terms, free of their cultural environment. So, for example, the fact that the standard of written English is steadily falling is never set against the more important fact that the number of people behaving as authors is steadily rising. The simplification of some subject matter is never considered against the more important phenomenon of wider learning. The handwriting issue is rarely considered alongside other forms of writing, like computer keyboards or text messaging. The sheer ubiquity of the last has led to its condemnation by the guardians of our cultural standards.

It is, then, in this political and cultural context that we have to look at machine processes for manipulating text. The title of this presentation mentions public policy as distinct from current politicians. One of the terrible ironies of the current political situation is that our politicians, as I mentioned earlier, are plagued by partial media, yet they have not seen the relevance of what technology has to offer, not only to them but also to citizens who are equal prisoners of plutocratic bias.

What underlies these problems, however, is not some narrow allegiance to outdated standards, though that is the form it will take; what we are looking at is a failure to understand the way that technology can change the way society operates. What is on offer is immense flexibility which will allow an individual to operate at a variety of levels of richness and complexity; it will apply the classical principles of comprehensive education - the division of labour, playing to strengths - to every aspect of life involving ideas.

Let us, then, lose our outdated assumptions about the definitive nature of texts (except in the special cases of certain artistic works). We have the technology to interconnect revisions, commentary, criticism, different renderings for different purposes, and all the other ways in which texts can mutate between different times and different contexts. Through application of this technology we may reopen channels of dialogue between government and the people.