by W H Inmon, Forest Rim Technology

Arguably, contracts are the most important property of a corporation. It is contracts that define the obligations of the corporation to other corporations. It is contracts that define the obligations of other corporations to the corporation.

It is contracts that define dates, terms, rates, collateral and other business conditions. Furthermore, if the conditions defined in the contract are not met, there are legal consequences.

Therefore, contracts are at the top or near the top of any list that defines the information that is important to the corporation. In their own way, contracts are more important than anything else.

An Executive Looks at Contracts

But ask an executive what is actually in his/her corporate contracts, and the truth is that the executive cannot tell you what is there. As important as contracts are, executives really don’t know what is in the contracts – collectively – of the corporation.

If it comes to one contract, an executive can find that contract, have a lawyer look at it, and then understand that single contract. But when it comes to looking at contracts collectively, that is another matter entirely.

Certainly executives have a general idea of what is in the contracts. But when it comes to precision, executives at best know the guidelines and policies that have shaped the many corporate contracts. Knowing precisely what the corporate contracts say, taken collectively, is something else again.

So if contracts are so important, why is it that executives do not have exact, up-to-the-second information about what is in them? There are at least two reasons why the content of contracts is hard to pin down. These reasons are –

  • There are a lot of contracts. It is not as if there were ten or twenty contracts. In modern corporations there are literally thousands of contracts,
  • Contracts are written in text, and text defies any attempt to quantify and organize the information wrapped up in it. Computerized data base management systems were designed for repeatable data, such as the transaction data generated by a bank doing ATM processing or by an airline doing reservation processing. In banks and airlines the same activity occurs over and over; all that changes is the actual data associated with each transaction. Text does not have this characteristic of being highly repeatable. Indeed, the text in contracts is highly non-repeatable.

For these reasons and more, the text in a corporation's contracts, taken collectively, has evaded serious attempts at automation.

Until now. Today there is Forest Rim Technology's patent-pending technology that allows text to be captured, automated into a data base and analyzed in a meaningful manner. And some of the most important text is the text found in contracts.

A Keyword Index

One approach is to create a keyword index for contracts. The problem with keyword indexes is that while they cover a lot of ground, they do not cover all the ground that the contract covers. The detailed information that falls between the cracks is, unfortunately, often exactly what is needed. Keyword indexing gets an organization part of the way to contracts management, but not all the way, and many managers and lawyers need the fine print that is not found in keyword indexes. What is needed is a way to analyze ALL of the detailed information found in a contract.
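To make the limitation concrete, here is a minimal Python sketch of a keyword index. It assumes two invented contract snippets and a hand-picked list of index terms (both hypothetical, for illustration only); it is not any particular product's indexing method. A term nobody thought to index, such as "pooled", cannot be found through the index, while a scan of every word finds it.

```python
from collections import defaultdict

# Hypothetical contract texts, invented for illustration only.
CONTRACTS = {
    "C-001": "Lease of 2400 acres. Lessor agrees the wells may be pooled.",
    "C-002": "Lease of 800 acres. Rate of 12.5 percent royalty.",
}

# Hypothetical term list: a keyword index only tracks terms chosen in advance.
INDEX_TERMS = {"lease", "acres", "rate", "royalty"}


def tokenize(text):
    """Lowercase the text and split it into words, trimming punctuation."""
    return [w.strip(".,;:").lower() for w in text.split()]


def build_keyword_index(contracts, terms):
    """Map each chosen term to the set of contract ids that contain it."""
    index = defaultdict(set)
    for contract_id, text in contracts.items():
        for word in tokenize(text):
            if word in terms:
                index[word].add(contract_id)
    return index


def full_text_search(contracts, word):
    """Check every word of every contract, so no detail is skipped."""
    return {cid for cid, text in contracts.items() if word in tokenize(text)}


index = build_keyword_index(CONTRACTS, INDEX_TERMS)
print(sorted(index["acres"]))                          # ['C-001', 'C-002']
print(sorted(index.get("pooled", set())))              # [] : the fine print is missed
print(sorted(full_text_search(CONTRACTS, "pooled")))   # ['C-001']
```

The point of the sketch is the design weakness the text describes: the index is only as complete as the term list someone chose before the question was asked.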

The Standard Contract

Another approach some organizations have tried is to create what are called “standard contracts.” The theory behind the standard contract is that all business can be conducted from essentially the same contract. Once the contracts have been standardized, commonly occurring parameters can be plucked off the contract and the contract can be treated in much the same way that banks treat information found in withdrawals, deposits or loan payments.

Fig 1 shows the reality of standard contracts: while there may be many contracts that are very similar, there really is no such thing as a standard contract.

The reality is that when you look at corporate contracts, there are many contracts that do not meet the “standard”.

The Myth of the Standard Contract

It turns out that there are a lot of good reasons for the myth of the standard contract and the reality of the non-standard contract. One of those reasons is that the corporate need for contracts changes over time. Fig 2 shows the changes over time.

On day 1 the corporation starts out with a perfectly good standard contract. The standard contract serves the corporation well for a while. But one day the business of the corporation changes. There are new products. There is new packaging of an older product. There are new marketplace conditions. In short, it is quite predictable that over time business conditions change. And as those business conditions change, the corporation needs a new “standard contract”.

There is nothing terribly difficult about creating a new corporate “standard contract”. The problem is that there are now two standard contracts. Does the corporation go back to its old customers and suppliers and renegotiate their contracts? Under most circumstances it is unthinkable to nullify existing contracts and re-create them from the new standard. Instead, the corporation learns to live with two standard contracts.

However, time passes and business conditions change once again. Now the corporation ends up with three standard contracts. And so it goes. Over time there may be any number of standard contracts.

Another reason why corporations don’t really have standard contracts is that corporations have the habit of merging, or of making acquisitions. It may be true that corporations start out with standard contracts. But one day they acquire a new company. They wake up to find that they have just inherited a whole new set of contracts, which are very likely quite different from the existing set of corporate contracts.

Fig 3 shows that corporations constantly merge or acquire other companies.

But perhaps the most common cause of the lack of standardization of contracts is that not all customers and suppliers have the same business requirements. One customer needs a new clause here. Another customer needs a special rate there. Yet another customer needs a different appendix.

In case after case, many customers have business needs that just are not covered adequately by a standard contract. The result is that there may be a lot of “almost but not quite” standard contracts. Fig 4 shows this very normal occurrence.

It simply is true that a healthy, growing business is constantly changing. Market conditions change. Economic conditions change. Global conditions change. Competition changes. Opportunities change. And contracts never stand still, as much as the business wishes they would.

Fig 5 illustrates that change is a sign of health in a business.

And as business changes, the contractual needs of the business change as well.

It therefore is absolutely normal for the corporation to end up with lots and lots of contracts with little or no real standardization. 

Delving into the Contracts

So what happens if there is a business need to delve into these contracts? There are many reasons why it may be necessary to delve into contracts. Some of those reasons might be –

  • discovery – preparing for a lawsuit
  • negotiation of a new contract
  • preparation for a merger or acquisition
  • determining the potential business impact of a change to the business, and so forth.

Given that there are very legitimate needs to occasionally examine contracts, and given that there are lots of contracts with very little real standardization, an interesting question is – how many contracts can a lawyer handle at once? This of course depends on the size and complexity of the contracts in question, but it is fair to say that a good corporate lawyer can handle – maybe – ten contracts at once.

Fig 6 shows a lawyer poring over the details of a contract.

The number of contracts that a lawyer can handle at once is a real limitation when there may be thousands of contracts, or even tens of thousands of contracts. This limitation is a real obstacle to informed decision making.

Looking for a Better Way

There has to be a better way to manage the information in contracts. The natural alternative is to manage contractual information by placing that information on a computer in a standard data base management system. Such a change in contracts management is possible with Forest Rim Technology’s Textual ETL product.

There are many advantages to automating the text found in a contract and placing the text into a data base. Some of those advantages are –

  • once automated, analysis is very fast
  • once automated, analysis is very accurate
  • once automated, analysis is very flexible
  • once automated, the data base can be stored for a long period of time and then reused effortlessly
  • once automated, the data base can be combined with other data to produce the basis for very sophisticated analysis
  • once automated, results can be rechecked effortlessly
  • once automated, many contracts can be analyzed at once, not just a handful
  • once automated, the data base can be stored in a small amount of space, etc.

There are then some very good reasons for automating the text found in a contract into a standard data base.

Fig 8 shows the possibilities for analysis that are opened up by automating the text in a contract.

To do an effective job of automating contract text into a data base, it is necessary to:

  • recognize that there are different kinds of data in a contract,
  • recognize that some specialized kinds of textual processing are needed.

Simply throwing text from a contract into a data base is not effective or useful. Instead, a specialized kind of textual processing is necessary. This specialized textual processing is generically called textual ETL, and Forest Rim Technology has a patent pending on it. (For more information about textual ETL see the book TAPPING INTO UNSTRUCTURED DATA, Prentice Hall, 2007 or the book BUILDING THE UNSTRUCTURED DATA WAREHOUSE, Technics Publications, 2011.)

Two Types of Tables

The result of this specialized handling of text, as it is moved from a document into a data base, is two types of tables. Fig 9 shows these two tables.

One type of table that is produced can be called the document variables table. It is produced through a process known as document fracturing: the software reads the text in the document and removes all extraneous words (“stop words”, such as “a”, “an”, “and”, “the”, “was”, “is”, and so forth). Other textual ETL processing is done as well. The net result is that all the important and relevant words of the contract are captured in the document variables table.
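The stop-word step of document fracturing can be sketched in a few lines of Python. This is a simplified illustration, not Forest Rim Technology's Textual ETL: the function name fracture, the small stop list (only partly taken from the examples above), and the (contract_id, position, word) row layout are all assumptions.

```python
import re

# Hypothetical, deliberately tiny stop list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "was", "is", "of", "to", "in", "be", "may"}


def fracture(contract_id, text):
    """Return (contract_id, position, word) rows for every non-stop word.

    The position is the word's offset among all words in the text, so each
    row can be traced back to where it came from in the contract.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [
        (contract_id, pos, word)
        for pos, word in enumerate(words)
        if word not in STOP_WORDS
    ]


rows = fracture("C-001", "The lessee is to drill a well in the Cambrian layer.")
for row in rows:
    print(row)
# ('C-001', 1, 'lessee')
# ('C-001', 4, 'drill')
# ('C-001', 6, 'well')
# ('C-001', 9, 'cambrian')
# ('C-001', 10, 'layer')
```

Keeping the word position is a choice made for this sketch; the further textual ETL processing mentioned above is not shown.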

The second table that is created holds the “named” variables. A named variable is recognized by finding a known label in the text of the contract and capturing the value that accompanies it. For example, the contract may contain the text “contract date” followed by “May 15, 2007”. The software knows that whatever follows “contract date” is the occurrence of the variable, so it records “contract date” as the variable and “May 15, 2007” as its value. The software may look for many variables in the creation of the named variables table. Typical named variables may be:

  • contract date
  • lessor
  • lessee
  • contract type
  • location
  • term
  • rate
  • condition, and so forth
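A minimal sketch of named-variable recognition in Python, assuming that each label is followed (after an optional colon) by its value on the same line. The sample contract text and the function name extract_named_variables are hypothetical; the labels are taken from the list above.

```python
import re

# Labels from the list of typical named variables.
NAMED_VARIABLES = ["contract date", "lessor", "lessee", "contract type", "term", "rate"]

# Hypothetical contract text, invented for illustration only.
SAMPLE = """Contract date: May 15, 2007
Lessor: HSQ Corporation
Lessee: Example Drilling Co.
Term: 5 years
The parties agree to the following conditions."""


def extract_named_variables(contract_id, text, labels):
    """Return (contract_id, variable, value) rows for each label found.

    Assumes the value is whatever follows the label, after an optional
    colon, up to the end of that line. Only the first match is used.
    """
    rows = []
    for label in labels:
        pattern = re.compile(
            r"\b" + re.escape(label) + r"\b\s*:?\s*(.+)", re.IGNORECASE
        )
        match = pattern.search(text)
        if match:
            rows.append((contract_id, label, match.group(1).strip()))
    return rows


for row in extract_named_variables("C-001", SAMPLE, NAMED_VARIABLES):
    print(row)
# ('C-001', 'contract date', 'May 15, 2007')
# ('C-001', 'lessor', 'HSQ Corporation')
# ('C-001', 'lessee', 'Example Drilling Co.')
# ('C-001', 'term', '5 years')
```

Because the sketch takes the first match for each label, a label that also appears in ordinary prose could be captured wrongly; a production tool would need more context than a single regular expression provides.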

Fig 10 shows the creation of the named variables table for the contracts that are being processed.

Fig 11 shows the creation of the document variables table for the contracts that are being processed.

While both the named variables table and the document variables table are useful and important, the real value of the tables is that they can be meaningfully joined together. Fig 12 shows that both tables share a common identifier.
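One way to picture the common identifier is as a relational schema. The following Python sketch uses the standard-library sqlite3 module; the table and column names (contract, named_variable, document_variable, contract_id) and the sample rows are hypothetical, chosen only to show two tables keyed by the same contract identifier and joined on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
-- The common identifier: one row per contract processed.
CREATE TABLE contract (contract_id TEXT PRIMARY KEY);

-- Named variables: a recognized label and the value that followed it.
CREATE TABLE named_variable (
    contract_id TEXT NOT NULL REFERENCES contract (contract_id),
    variable    TEXT NOT NULL,
    value       TEXT NOT NULL
);

-- Document variables: significant words left after stop words are removed.
CREATE TABLE document_variable (
    contract_id TEXT NOT NULL REFERENCES contract (contract_id),
    position    INTEGER NOT NULL,
    word        TEXT NOT NULL
);

-- Hypothetical sample rows.
INSERT INTO contract VALUES ('C-001'), ('C-002');
INSERT INTO named_variable VALUES
    ('C-001', 'lessor', 'HSQ Corporation'),
    ('C-001', 'term', '5 years'),
    ('C-002', 'lessor', 'Another Corp');
INSERT INTO document_variable VALUES
    ('C-001', 9, 'cambrian'), ('C-001', 15, 'pooled'),
    ('C-002', 4, 'hydrocarbons');
""")

# Because both tables carry contract_id, each lessor lines up with its contract's words.
rows = conn.execute("""
    SELECT nv.value AS lessor, dv.word
    FROM named_variable AS nv
    JOIN document_variable AS dv ON dv.contract_id = nv.contract_id
    WHERE nv.variable = 'lessor'
    ORDER BY dv.contract_id, dv.position
""").fetchall()
print(rows)
# [('HSQ Corporation', 'cambrian'), ('HSQ Corporation', 'pooled'), ('Another Corp', 'hydrocarbons')]
```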

Once the two tables can be joined together, the different kinds of information can be accessed and analyzed together. Note that if only standard contracts are processed, there will be a commonality of data. But if non-standardized contracts are processed, their text is easily combined into the data base as well. (For a deeper explanation of how the gap between standard text and non-standard text is bridged, see external category processing in TAPPING INTO UNSTRUCTURED DATA, Prentice Hall, 2007 or the book BUILDING THE UNSTRUCTURED DATA WAREHOUSE, Technics Publications, 2011.) By using textual ETL, both standardized contracts and non-standardized contracts can be mixed in the same data base with no problem. Now the text can be analyzed regardless of whether the contract is standardized or non-standardized.

The issue of contract standardization disappears in light of textual ETL.

Efficient Query Processing

Other benefits follow as well. Queries can be run very quickly. Queries are accurate down to the finest degree. Data is stored in a small amount of space. The results of a query can be rechecked, and so forth.

The benefits of automating textual data from a contract into a data base are simply unquestioned.

Queries can be made against a single table that represents data from multiple contracts. For example, when a query is run against the named variables table containing information from many contracts, the analyst might ask –

  • In how many contracts is the lessor HSQ Corporation?
  • Which contracts expire in 2009 and are for more than 2000 acres?
  • Are there more than 5 contracts for greater than $100,000 whose contract date is later than 2010?
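As a hedged illustration, the first two questions above could be written as SQL against a hypothetical named_variable table holding one (contract_id, variable, value) row per named variable. The variable names "expiration date" and "acres", the ISO date format, and the sample rows are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE named_variable (contract_id TEXT, variable TEXT, value TEXT);
-- Hypothetical rows; dates are assumed to be normalized to ISO form on load.
INSERT INTO named_variable VALUES
    ('C-001', 'lessor', 'HSQ Corporation'),
    ('C-001', 'expiration date', '2009-06-30'),
    ('C-001', 'acres', '2400'),
    ('C-002', 'lessor', 'HSQ Corporation'),
    ('C-002', 'expiration date', '2009-01-15'),
    ('C-002', 'acres', '800'),
    ('C-003', 'lessor', 'Another Corp'),
    ('C-003', 'expiration date', '2012-05-01'),
    ('C-003', 'acres', '3000');
""")

# In how many contracts is the lessor HSQ Corporation?
(hsq_count,) = conn.execute("""
    SELECT COUNT(DISTINCT contract_id) FROM named_variable
    WHERE variable = 'lessor' AND value = 'HSQ Corporation'
""").fetchone()

# Which contracts expire in 2009 and are for more than 2000 acres?
# Each condition refers to a different variable, so the table is joined to itself.
expiring = [cid for (cid,) in conn.execute("""
    SELECT e.contract_id
    FROM named_variable AS e
    JOIN named_variable AS a ON a.contract_id = e.contract_id
    WHERE e.variable = 'expiration date' AND substr(e.value, 1, 4) = '2009'
      AND a.variable = 'acres' AND CAST(a.value AS INTEGER) > 2000
    ORDER BY e.contract_id
""")]

print(hsq_count)  # 2
print(expiring)   # ['C-001']
```

Storing every value as text keeps the table uniform across standard and non-standard contracts, at the cost of converting values (CAST, substr) at query time.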

The kinds of queries that go against the named variables table are those that reference the named variables.

But queries can be run against document variables as well, as seen in Fig 15.

The kinds of queries that are run against document variables might be –

  • How many contracts are for hydrocarbons?
  • Are there any contracts that affect the Cambrian layer and the well depth of 6500 feet, where the wells are pooled?
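The document variables questions can also be sketched in SQL, against a hypothetical document_variable table of (contract_id, position, word) rows with invented sample data. The second query is only an approximation: it finds contracts in which all three words occur, but it does not check that the words refer to the same well or clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document_variable (contract_id TEXT, position INTEGER, word TEXT);
-- Hypothetical rows, as left after stop-word removal.
INSERT INTO document_variable VALUES
    ('C-001', 3, 'cambrian'), ('C-001', 7, 'well'), ('C-001', 10, '6500'),
    ('C-001', 11, 'feet'), ('C-001', 15, 'pooled'),
    ('C-002', 2, 'hydrocarbons'), ('C-002', 8, 'cambrian'),
    ('C-003', 1, 'hydrocarbons'), ('C-003', 5, 'well'), ('C-003', 9, '6500');
""")

# How many contracts are for hydrocarbons?
(hydrocarbon_count,) = conn.execute("""
    SELECT COUNT(DISTINCT contract_id) FROM document_variable
    WHERE word = 'hydrocarbons'
""").fetchone()

# Contracts in which every required word occurs at least once.
required = ("cambrian", "6500", "pooled")
placeholders = ", ".join("?" for _ in required)
matches = [cid for (cid,) in conn.execute(f"""
    SELECT contract_id FROM document_variable
    WHERE word IN ({placeholders})
    GROUP BY contract_id
    HAVING COUNT(DISTINCT word) = ?
    ORDER BY contract_id
""", (*required, len(required)))]

print(hydrocarbon_count)  # 2
print(matches)            # ['C-001']
```

The stored word positions could be used to tighten the second query, for example by requiring the words to fall close together.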

The difference between a query that goes against a named variables table and a document variables table is that the query for the document variables table does not reference named variables. It only references other text.

But the real power of analytical processing against the data bases created from text is that the two table types can be joined together. There is a common identifier that both tables relate to. By using this common contract identifier as a reference point, both named variable data and document variable data can be accessed and analyzed together. This gives the analyst a great deal of freedom in analyzing contracts, whether they are standardized or non-standardized.
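A minimal sketch of such a combined query in Python with sqlite3, assuming hypothetical named_variable and document_variable tables that share a contract_id column. It asks which contracts have HSQ Corporation as lessor and also mention pooling anywhere in their text; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE named_variable (contract_id TEXT, variable TEXT, value TEXT);
CREATE TABLE document_variable (contract_id TEXT, position INTEGER, word TEXT);
-- Hypothetical sample rows.
INSERT INTO named_variable VALUES
    ('C-001', 'lessor', 'HSQ Corporation'),
    ('C-002', 'lessor', 'HSQ Corporation'),
    ('C-003', 'lessor', 'Another Corp');
INSERT INTO document_variable VALUES
    ('C-001', 15, 'pooled'),
    ('C-002', 4, 'hydrocarbons'),
    ('C-003', 22, 'pooled');
""")

# Named variable condition (lessor) plus document variable condition (a word),
# tied together by the shared contract_id.
rows = conn.execute("""
    SELECT DISTINCT nv.contract_id
    FROM named_variable AS nv
    JOIN document_variable AS dv ON dv.contract_id = nv.contract_id
    WHERE nv.variable = 'lessor' AND nv.value = 'HSQ Corporation'
      AND dv.word = 'pooled'
    ORDER BY nv.contract_id
""").fetchall()
print(rows)  # [('C-001',)]
```

Neither table alone can answer this question: C-002 has the right lessor but no mention of pooling, and C-003 mentions pooling under a different lessor.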

But there are other possibilities as well. Not only can named variable data and document variable data be combined, but other data can be added as well. For example, it is possible to add financial data to contract data. In doing so, an entirely new and different kind of perspective can be gained.
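As an illustration of adding financial data, the sketch below joins a hypothetical payment table to the named variables on contract_id and totals payments by lessor. It assumes the financial system records the same contract identifier; in practice a mapping between identifiers may be needed. All table names and figures are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE named_variable (contract_id TEXT, variable TEXT, value TEXT);
-- Hypothetical table from a financial system, keyed by the same contract id.
CREATE TABLE payment (contract_id TEXT, paid_on TEXT, amount REAL);
INSERT INTO named_variable VALUES
    ('C-001', 'lessor', 'HSQ Corporation'),
    ('C-002', 'lessor', 'HSQ Corporation'),
    ('C-003', 'lessor', 'Another Corp');
INSERT INTO payment VALUES
    ('C-001', '2010-01-31', 1200.0),
    ('C-001', '2010-02-28', 1200.0),
    ('C-002', '2010-01-31', 800.0),
    ('C-003', '2010-01-31', 5000.0);
""")

# Total paid per lessor: the contract text supplies the lessor, finance supplies the amounts.
totals = conn.execute("""
    SELECT nv.value AS lessor, SUM(p.amount) AS total_paid
    FROM named_variable AS nv
    JOIN payment AS p ON p.contract_id = nv.contract_id
    WHERE nv.variable = 'lessor'
    GROUP BY nv.value
    ORDER BY total_paid DESC
""").fetchall()
print(totals)  # [('Another Corp', 5000.0), ('HSQ Corporation', 3200.0)]
```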

There is, then, a world of possibilities that opens up when textual data can be automated. The advantages of automation are simply tremendous.

A dashboard can be created from the data found in the contracts.
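The figures behind a dashboard are typically simple aggregates over the contract tables. As a hypothetical example, this sketch counts contracts by expiration year from an assumed named_variable table (invented rows, ISO dates assumed) and prints a crude text bar for each year; a real dashboard tool would render the same numbers graphically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE named_variable (contract_id TEXT, variable TEXT, value TEXT);
-- Hypothetical rows; expiration dates assumed normalized to ISO form.
INSERT INTO named_variable VALUES
    ('C-001', 'expiration date', '2009-06-30'),
    ('C-002', 'expiration date', '2009-01-15'),
    ('C-003', 'expiration date', '2012-05-01'),
    ('C-004', 'expiration date', '2011-11-30');
""")

# Contracts expiring per year: the kind of figure a dashboard tile might show.
by_year = conn.execute("""
    SELECT substr(value, 1, 4) AS year, COUNT(DISTINCT contract_id) AS contracts
    FROM named_variable
    WHERE variable = 'expiration date'
    GROUP BY year
    ORDER BY year
""").fetchall()

for year, count in by_year:
    print(f"{year} {'#' * count} {count}")
# 2009 ## 2
# 2011 # 1
# 2012 # 1
```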


Forest Rim Technology was formed by Bill Inmon in order to provide technology to bridge the gap between structured and unstructured data. Forest Rim Technology is located in Castle Rock, Colorado.