Tangentium

January '04

All material on this site remains © the original authors: please see our submission guidelines for more information. If no author is shown material is © Drew Whitworth. For any reproduction beyond fair dealing, permission must be sought: e-mail drew@comp.leeds.ac.uk.

ISSN number: 1746-4757

Feature Essay: Language Engineering and Public Policy

Kevin Carey


The Future of Intellectual Property

The current model of intellectual property rights, which we have inherited from the analogue age, is totally inadequate for our new conditions. In the analogue world the means of production required considerable capital to publish, but this is no longer the case (in any event, only 20% of commercially published texts merit the collection of royalties for the author). There are many small music CD publishers, reaching down through the culture to, say, individual choirs or bands who sell their own CDs on the door at concerts and nowhere else. Any self-regarding poet can stick his or her stuff online. Soon there will be community digital television and peer-to-peer publishing, or the digital equivalent of a letter with a bunch of snaps.

Whether the new model will be based on micropayments or on an initial payment for launch rights with no downstream collection is an interesting debate. Of more interest, however, is the distinction drawn earlier between the declared individual artefact and the communal document. If documents are published which may be amplified, simplified, clarified or otherwise deconstructed and reassembled, the intellectual property right required will have nothing to do with ownership but only with attribution and backwards sourcing requirements. There might also be a condition relating to the proper flagging of interpolation or ecritology. In other words, the original author will have rights, but these will be concerned with preserving the integrity of the source material in the context of all manner of inputs to it. Some people resist this, saying that all documents necessarily evolve as they go through waves of commentary, but without the integrity of the source we will soon be in trouble, as pointed out earlier. Without attribution we would also soon face impossible tangles of documents within documents.

Metadata and other tools

Computers handle text by means of structural markup and metadata. The text remains the core, but these other elements are brought to bear on it so that it can be displayed, parsed, translated, annotated or otherwise processed. [Editor's Note: see this month's companion essay.]

How an artistic, individual document deals with these issues should remain up to the author. But as soon as a document becomes communal, certain responsibilities and standards should develop around the use of structure and metadata. I would assert that the properties of these documents should be defined so that a text without an inbuilt architecture or certain functionalities is not a document at all but simply an unauthoritative fragment.
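To make the idea concrete, here is a minimal sketch in Python of a communal document that carries its own structure and metadata. Everything in it is an illustrative assumption rather than an existing standard: the class names `Document` and `Passage`, the `interpolated` flag, and the three required metadata fields are hypothetical, since the essay does not say which properties ought to be mandatory.

```python
from dataclasses import dataclass, field

# Hypothetical minimum metadata for a communal document.
# The essay names no specific fields; these are assumptions.
REQUIRED_METADATA = ("author", "source", "language")


@dataclass
class Passage:
    text: str
    contributor: str            # who wrote this passage
    interpolated: bool = False  # True if added to the source by a later hand


@dataclass
class Document:
    metadata: dict
    passages: list = field(default_factory=list)

    def missing_metadata(self):
        """Return the required metadata keys that are absent or empty."""
        return [key for key in REQUIRED_METADATA if not self.metadata.get(key)]

    def is_document(self):
        """True only if there is some text and all required metadata is present."""
        return bool(self.passages) and not self.missing_metadata()


draft = Document(
    metadata={"author": "A. Writer", "source": "original", "language": "en"},
    passages=[Passage("Source text.", contributor="A. Writer")],
)
# A later commentary is added, but flagged as an interpolation.
draft.passages.append(
    Passage("A later gloss.", contributor="B. Reader", interpolated=True)
)

fragment = Document(metadata={}, passages=[Passage("Unattributed text.", contributor="")])

print(draft.is_document())     # True
print(fragment.is_document())  # False
```

Under these assumptions the second text fails the check and would be treated as an unauthoritative fragment, while the flagged gloss in the first leaves the source passage distinguishable from later additions.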

This will move standards into the area of document creation instead of their being retrospectively applied. At the moment such standards are primarily concerned with accessibility issues arising from measurable and severe disability. In addition, however, we need to think carefully about the public requirements for people with small vocabularies, those for whom English is not their first language and, perhaps above all, that one fifth of our population which is classified as functionally illiterate.

Having said all that, I turn finally to the public policy requirements for language engineering. I group these under three main points:

rights of access
specialisation, heterogeneity and choice
transparent intermediaries.

Rights of Access

Not long ago I heard the following, most revealing story. A financial journalist in conversation with a Treasury official remarked that the recent Finance Bill was incomprehensible. For whom did the official think the Bill had been written? Without hesitation the official confirmed that it had been written for the lawyers. Not for citizens and not even for financial experts. This demonstrates the need for a much wider understanding of rights of access to information than the mere assertion that I should be allowed the text, the whole text and nothing but the text. A second example that comes to mind is the furore - artificial, of course - which blew up over then Chancellor Kenneth Clarke's admission that he had never read the Maastricht Treaty. Of course he hadn't; it reads like a series of cryptic instructions to a printer, e.g.: delete the "and" after "race" and insert a comma and then after "creed" insert a comma and then add "sexual orientation and disability". Now that is a relatively simple example of a phrase which might have been amended from "... on the grounds of sex, race and creed" to: "... on the grounds of sex, race, creed, sexual orientation and disability". Even for lawyers this kind of accretion is difficult but for most of us it is impossible.
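The amendment just described is mechanical enough that a machine could apply it and hand the reader the consolidated wording. The following is a minimal sketch in Python, assuming each printer's instruction can be expressed as a pair of "old wording, new wording"; the function name `apply_amendments` and that pair format are hypothetical, not a description of how any legislature actually publishes its amendments.

```python
def apply_amendments(text, amendments):
    """Apply (old, new) wording changes in order and return the consolidated text.

    Each target must occur exactly once, so an ambiguous or stale
    instruction is reported instead of being applied silently.
    """
    for old, new in amendments:
        if text.count(old) != 1:
            raise ValueError(f"amendment target {old!r} must occur exactly once")
        text = text.replace(old, new)
    return text


original = "on the grounds of sex, race and creed"

# The two instructions from the example above, in order.
amendments = [
    ("race and creed", "race, creed"),                      # delete "and", insert a comma
    ("creed", "creed, sexual orientation and disability"),  # add the new grounds
]

consolidated = apply_amendments(original, amendments)
print(consolidated)
# on the grounds of sex, race, creed, sexual orientation and disability
```

The point of the sketch is only that the reader need never see the instructions: the same source can yield either the amending text for the lawyers or the consolidated text for everyone else.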

At a level of less complexity, then, we need to think about the citizen's right to information, now the subject of EU legislation. Is this going to be a simple, theoretical right or are we going to make it meaningful through transparent simplification and amplification? Are we going to shift the Latin? Are we going to delete: "Anything to the contrary heretofore notwithstanding" and simply state: "and this over-rides all previous laws on this subject"? Will we, then, take the legal document and translate it into English that can be understood by a person with five "O" Level passes, or should we take this latter as the base document and get legal draughtsmen to sort out their side of the matter? Either way, what we do not want is a mass of manual re-writing.

In short, we can no longer issue one document and expect all citizens to access it with equal benefit. The technology allows us to do better than that. Illiterate people and those for whom English is not their first language still pay taxes, still vote and are entitled to the level of citizenship which the technology can provide at reasonable cost; what is reasonable, of course, is a political and not a technical matter. Nonetheless, rights legislation will increasingly put pressure on the public sector not only for translation between languages but also for translation within them.
