Why Do This?

The words in the text become objects in a connected structure – a structure which allows logic, states and numbers to flow within it, and where operators in the structure can add new connections or close off pathways. It is an Active Structure – a structure that can analyse itself.

Handling Complex Text

The first thing one notices about Complex Text (the text used for legislation and specifications) is that it is intimidating. Complex objects are described in the text. References are made to text tens of pages back or tens of pages forward. There is probably a glossary. References are made to other documents, which may use different meanings for their words, and the document may be frequently revised. And yet you need to be constantly au fait with the whole thing. Factor in the Four Pieces Limit (see FourPiecesLimit.com for an introduction) and it seems impossible. One way around it is to turn the text into an Active Structure: a machine reads the text and the words are built into a structure with the exact meanings they are intended to have. Many common words in English have multiple meanings: “run” has 82 definitions, “set” has 62 and “on” has 77. That makes it easy to jumble things up if one is not sure about the exact definition, particularly after a while, when one is trying to map the specification onto something that may not have been envisaged when it was written.

We need to make it clear that there will be some work upfront. A lawyer will latch onto a term and make it their own, while a technologist will latch onto the same term and think it means something else. The same problem arises between people of different specialties: a military strategist and a maintenance supervisor have very little common vocabulary. Getting people to agree on meanings at the outset is a good way of avoiding trouble down the track, where people have been pulling in different directions without realising it.

Figure 1A – Screen grab of a tool used to analyse and understand a large piece of legislation

Having a machine read the text and build a structure that embodies the precise meanings of the words and their connections would, by avoiding confusion, be reason enough to do it. But there is another significant benefit: the structure that the machine builds can be activated, so that the words function as pieces of machinery.

Here is a fragment of the Anti-Money Laundering and Counter-Terrorism Financing Act, where a complex object is being defined. (We are using legislation as an example, as few large, complex specifications are in the public domain.)

The AML/CTF Act describes a particular transaction:

  1. For the purposes of this Act, if:
    1. a person (the payer) instructs a person (the ordering institution) to make money controlled by the payer available to the payer by:
      1. being credited to an account held by the payer with the ordering institution; or
      2. being paid to the payer by the ordering institution; and
    2. the transfer is to be carried out wholly or partly by means of one or more electronic communications; and
    3. the ordering institution is:
      1. an ADI; or
      2. a bank; or
      3. a building society; or
      4. a credit union; or
      5. a person specified in the AML/CTF Rules;
    4. the instruction is a same-institution same-person electronic funds transfer instruction; and
    5. for the purposes of the application of this Act to making the money available to the payer:
      1. the payer may also be known as the payee; and
      2. the ordering institution may also be known as the beneficiary institution.
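
One way to picture how a definition like this becomes machinery is to write its conditions as a predicate. The following is a minimal Python sketch, not the Active Structure implementation. The `Instruction` class, its field names, the string values and the `classify` function are all hypothetical, and it assumes that items 1 to 3 of the fragment are conditions while items 4 and 5 are the consequences that follow when they hold.

```python
from dataclasses import dataclass
from typing import Optional

# Institution kinds listed in item 3 of the fragment. The fifth kind,
# "a person specified in the AML/CTF Rules", is modelled as a flag.
LISTED_KINDS = {"ADI", "bank", "building society", "credit union"}

# Hypothetical labels for the two delivery routes in item 1.
DELIVERY_ROUTES = {"credit to account", "payment to payer"}


@dataclass
class Instruction:
    payer: str
    ordering_institution: str
    institution_kind: str
    specified_in_rules: bool   # item 3.5
    made_available_by: str     # one of DELIVERY_ROUTES (item 1)
    electronic: bool           # item 2


def classify(instr: Instruction) -> Optional[dict]:
    """Return the names the fragment assigns, or None if a condition fails."""
    delivery_ok = instr.made_available_by in DELIVERY_ROUTES
    institution_ok = (
        instr.institution_kind in LISTED_KINDS or instr.specified_in_rules
    )
    if not (delivery_ok and instr.electronic and institution_ok):
        return None
    return {
        # Item 4: the instruction acquires the defined name.
        "wordgroup": "same-institution same-person electronic funds transfer instruction",
        # Item 5: the alternative names for the two parties.
        "payee": instr.payer,
        "beneficiary institution": instr.ordering_institution,
    }
```

Calling `classify` on an instruction that is not electronic returns `None`, because the condition in item 2 fails. An instruction that meets all three conditions comes back with the payer also recorded as the payee.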

“Same-institution same-person electronic funds transfer instruction” is treated as a wordgroup, in the same way that “ambient temperature” is treated as a wordgroup. Many wordgroups don’t need definitions, as it is sufficient to link them to the precise meanings of the words used. Some, like “same-institution …”, do have definitions, and there are wordgroups within those definitions, like “bank account”, which in turn has a definition, as does “payer”.
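
The nesting of definitions inside definitions can be sketched as a lookup that follows each defined term found in a definition. This is an illustrative Python sketch only. The `DEFINITIONS` table, the `terms_used` and `expand` functions and, in particular, the definition texts are invented placeholders, not the wording of the Act or the way the tool stores its structure.

```python
import re

# Hypothetical glossary. The definition texts are placeholders made up
# for this example, NOT the wording used in the legislation.
DEFINITIONS = {
    "same-institution same-person electronic funds transfer instruction":
        "an instruction by a payer to move money into a bank account",
    "bank account": "an account held with a bank",
    "payer": "a person who instructs an ordering institution",
}


def terms_used(text, exclude=()):
    """Defined terms that appear as whole words in text."""
    found = []
    for term in DEFINITIONS:
        if term in exclude:
            continue
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.append(term)
    return found


def expand(term, seen=frozenset()):
    """Map each defined term inside term's definition to its own expansion."""
    seen = seen | {term}  # guards against definitions that refer to themselves
    return {
        sub: expand(sub, seen)
        for sub in terms_used(DEFINITIONS[term], exclude=seen)
    }
```

With this placeholder glossary, expanding the long wordgroup yields entries for “bank account” and “payer”, each of which is expanded in turn until no further defined terms are found.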

Wordgroups are not the only large structures in complex text (ignoring paragraphs, sections, chapters).

Some words are independent of other words, such as the adjectives in “a large black car”, each of which applies to the car separately.

Some words need to be clumped, so other words can operate on an object, rather than a collection of nouns, adjectives and prepositional phrases.

One definition for a plane is:

An imaginary flat surface of infinite extent.

The adjectives “imaginary” and “flat” are not independent: “imaginary” operates on “flat surface of infinite extent”, which is a clump. These clumps are visible in the machine version of the text.
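
The difference between independent modifiers and a clump can be shown with a small tree. This is a sketch under assumptions: the `Node` class and the `render` function are hypothetical names chosen for illustration, and the bracket notation is just one way of making a clump visible. It is not the representation shown in the figures.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Node:
    # head is either a plain noun phrase or a clump acting as one object
    head: Union[str, "Node"]
    # modifiers are independent of each other; each applies to head
    modifiers: Tuple[str, ...] = ()


def render(node):
    """Write the phrase back out, bracketing any clump used as a head."""
    if isinstance(node, str):
        return node
    inner = render(node.head)
    if isinstance(node.head, Node):
        inner = f"[{inner}]"
    return " ".join([*node.modifiers, inner])


# "large" and "black" both attach directly to "car".
car = Node("car", ("large", "black"))

# "flat" attaches to the surface; "imaginary" attaches to the whole clump.
plane = Node(Node("surface of infinite extent", ("flat",)), ("imaginary",))

print(render(car))    # large black car
print(render(plane))  # imaginary [flat surface of infinite extent]
```

In the car example either adjective could be removed without affecting the other. In the plane example, “imaginary” has no direct link to “surface”; it can only reach it through the clump.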

Figure 2 – The structure behind the text is rich in information

A major problem with large specifications is that most of the processing a reader does on them is unconscious, making it very hard to know whether everyone has processed the document in the same way. The problem is worse across specialties: a lawyer will skip over the technological parts, while the technologist’s eyes will glaze over at the lawyerly parts. Having a machine show the current state of understanding allows agreement on just what a contentious passage means. The intention is for everyone involved to have a good understanding of the whole document.

The result is that the machine has the definitions of all the pieces of machinery that the words represent, and of the larger structures involved, and can use this machinery to simulate exactly what was envisaged in the specification. It can be used either to check that the complex text is complete, coherent and consistent (including links to objects tens or hundreds of pages away in the text), or to develop and test coherence and completeness on a working model of the specification. In other words, it allows Verification and Validation (V&V) of the specification long before any steel is cut or code created.
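
Two of the simplest checks of this kind are finding terms that are referred to but never defined, and finding definitions that loop back on themselves. The Python sketch below is a deliberately small stand-in for what a full structure could check. The `REFERENCES` table and both function names are hypothetical, and the entries in the table are invented for the example.

```python
# Hypothetical extract: each defined term lists the terms its
# definition refers to. The entries are invented for illustration.
REFERENCES = {
    "ordering institution": ["ADI", "bank"],
    "payer": ["person"],
    "bank": [],
    "person": [],
}


def undefined_references(refs):
    """Terms that are referred to but never defined (a completeness gap)."""
    return sorted(
        {term for used in refs.values() for term in used if term not in refs}
    )


def circular_terms(refs):
    """Terms whose definitions eventually refer back to themselves."""

    def reaches(start, current, seen):
        for nxt in refs.get(current, []):
            if nxt == start:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(start, nxt, seen):
                    return True
        return False

    return sorted(term for term in refs if reaches(term, term, set()))
```

For the table above, `undefined_references` reports “ADI”, which is used but has no entry, and `circular_terms` reports nothing. The distance between a reference and its definition makes no difference to the check, which is the point of holding the text as a connected structure.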

Are You Saying People Can’t Read?

We are saying that people have a Four Pieces Limit, which needs to be handled in complex projects, where there can be hundreds, if not thousands, of things to consider.
