by W H Inmon, Forest Rim Technology

The value of safety is unquestioned. When an accident occurs and there is material damage, there is the cost of repair, the cost of lost business, and the cost of lost opportunity.

In addition, there is the disruptive effect on the day-to-day, routine activities of the organization. And if there is loss of life, it is hard to put a dollar value on that loss. So when the subject of safety comes up, no one questions its value. It is simply very difficult to put a price tag on it.

But safety is peculiar in that while everyone agrees on its importance, almost no one agrees on what can be done to improve safety.

Some industries are more dangerous than others. In many industries safety is not really a burning issue; in such an industry there may not have been a serious accident in fifty years. But in other industries safety is genuinely a burning issue. Some of the industries where safety is a burning issue include –

  • oil and gas exploration, production, refining and distribution
  • pipeline administration
  • airlines
  • mining
  • chemical manufacturing, and so forth.

These industries have been dangerous for a long time and are likely to remain dangerous into the future. Danger is simply an inherent factor in these industries. One of the ways that these industries have attempted to improve safety is to rigorously inspect the facility and to religiously analyze and document each accident and breakdown. There are many reasons why these accidents, analyses and inspections are taken seriously –

  • improving safety reduces the likelihood of a repeat of the same or a similar accident or danger
  • innocent parties can be protected from litigation, and so forth.

A result of this attitude toward safety is that there is ample and lengthy documentation regarding each incident. In many companies these safety incidents are placed onto a log tape. The log tape contains the daily record of all safety related activities that have occurred in the institution in a given 24 hour period. Once the daily log tape has been written, the organization can find selected incidents and create an analysis or report on the safety infraction. These analyses and descriptions find their way into reports based on the data found in the log tape. It is normal practice for these safety related reports to be written in narrative form, as text.

The types of data that find their way into the daily safety log include accident reports, breakdown and failure reports, inspection reports, repair reports, warranty reports, and so forth. There is a wide variety of activity that finds its way onto the daily safety log.

An interesting question then becomes – what happens over time to the reports that are found in the log tape? In truth, in many organizations these safety incident reports stack up in a corner and are seldom read or used for analysis.

While it is normal not to look for information in the daily safety log tapes unless there is a pressing need to do so, this is unfortunate, because there is a wealth of important information tied up in the log tapes. In particular, there may be information about recurring patterns of accidents, breakdowns, or other incidents that is buried in the daily safety log.

Over time there may start to be patterns relevant to safety that are piling up in the log. From a short term perspective it is not apparent that there is a problem that is developing. When looking at a single incident, nothing appears extraordinary or out of place. But when examined over time, the recurring problem or pattern pops out of the logs in a very distinctive fashion. There is much important information about safety that operates in just this fashion. The problem is that there is so much data and so many reports in the log tape that no one pays much attention to the report of all of these incidents over time. The sheer volume of text found in the log causes the information about these patterns to become lost, or never detected. These important patterns relating to safety practices and procedures “hide” behind reams of data and text over time.
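The idea that a pattern is invisible in any single report but obvious across the whole log can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the incident records, the equipment names, and the threshold of three occurrences are all hypothetical and do not come from any real safety log.

```python
from collections import Counter
from datetime import date

# Hypothetical incident records: (date, equipment, incident type).
incidents = [
    (date(2023, 1, 14), "pump P-101", "seal leak"),
    (date(2023, 3, 2), "valve V-7", "stuck closed"),
    (date(2023, 4, 20), "pump P-101", "seal leak"),
    (date(2023, 7, 9), "pump P-101", "seal leak"),
    (date(2023, 9, 30), "crane C-2", "cable fray"),
]


def recurring_patterns(records, min_count=3):
    """Return (equipment, incident type) pairs seen at least min_count times."""
    counts = Counter((equipment, kind) for _, equipment, kind in records)
    return {pair: n for pair, n in counts.items() if n >= min_count}


patterns = recurring_patterns(incidents)
print(patterns)
```

Each seal leak on its own looks routine. Only when the records are counted together over the year does the repeated failure of the same pump stand out.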

The bottom line is that there is much important information found in accident and repair reports that simply escapes notice.

As an example of the value of safety data, recently there was an oil refinery that blew up in Houston, Texas. The refinery was destroyed. The equipment that was destroyed was worth an estimated $100,000,000. And five workers lost their lives. In addition, the oil company made the headlines of major newspapers in a very unflattering way. Had this accident been averted, the value of doing so would have been almost incalculable.

A few years later the same oil company had an oil well platform blow up in the Gulf of Mexico where eleven lives were lost and the entire Gulf of Mexico was damaged with spilled oil.

There is then a gold mine waiting in the safety information that a company regularly collects into a log (but with which it does very little).

However, the sheer volume of the safety related data is only the start of the problem. The second aspect of the problem is that the safety incidents on the log are all written in text.

There are many challenges that relate to information that is bound in text. Some of the problems with textual reports are shown in Fig 6.

There is then a great opportunity for making dangerous corporations safer (and operating at a higher rate of profitability) by looking at the mundane information found in daily safety logs.

The first step to achieving this very worthwhile goal is to automate the collection and storage of the daily safety logs. As long as the safety log data sits in a pile of papers, it isn’t going to be very useful. To make that information useful, the text must be transferred from paper to an electronic form. If the information does not already exist in an electronic format, this is done through Optical Character Recognition (OCR). The process of OCR takes paper and converts the text on the paper into an electronic format.

Once the text is in an electronic format, the text is still very raw. OCR is only the first step and is needed only when the data does not already exist in an electronic format. In order to make sense of the raw text found in the daily safety log, it is necessary to read the raw text and “integrate” (or “disambiguate”) the text.
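One very simplified way to picture disambiguation is to map the many ways writers describe the same thing onto a single canonical term. The Python sketch below does only that. It is not how any particular textual ETL product works, and the taxonomy, the sample report, and the function name `canonical_terms` are all assumptions invented for illustration.

```python
import re

# Hypothetical taxonomy: raw phrase -> canonical term. Invented for illustration.
TAXONOMY = {
    "blew out": "blowout",
    "blow out": "blowout",
    "blowout": "blowout",
    "leak": "leak",
    "leaking": "leak",
    "seep": "leak",
    "fire": "fire",
    "flames": "fire",
}


def canonical_terms(text):
    """Return the sorted canonical terms whose variants appear in text."""
    lowered = text.lower()
    found = set()
    for phrase, term in TAXONOMY.items():
        # Word boundaries keep "fire" from matching inside "fired".
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.add(term)
    return sorted(found)


report = "Operator saw flames near the separator; a flange was leaking."
terms = canonical_terms(report)
print(terms)
```

Real integration has to deal with much more than synonyms (context, negation, abbreviations, misspellings), so a lookup table like this should be read as the smallest possible example of the idea.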

There is software for gathering and integrating textual data and placing the text in a data base format where standard analytical tools can be used to analyze that data. The tool is Forest Rim Technology’s Textual Foundation tool for textual ETL. Once the data has been integrated by a textual ETL tool, it can be placed on a standard data base platform, such as DB2, Teradata, Oracle, NT SQL Server, or another data base management system.
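What "placing the text in a data base format" might look like can be sketched with Python's built-in sqlite3 module. SQLite stands in here for whichever DBMS an organization uses. The table name `incident_term`, its columns, and the sample rows are hypothetical and are not the output format of any specific textual ETL tool.

```python
import sqlite3

# Hypothetical rows produced after integration: one row per term found in a report.
rows = [
    ("2023-01-14", "RPT-001", "Unit 4", "leak"),
    ("2023-01-14", "RPT-001", "Unit 4", "fire"),
    ("2023-02-03", "RPT-002", "Dock B", "fall"),
]

conn = sqlite3.connect(":memory:")  # in-memory data base, for the sketch only
conn.execute(
    """CREATE TABLE incident_term (
           log_date  TEXT NOT NULL,
           report_id TEXT NOT NULL,
           location  TEXT NOT NULL,
           term      TEXT NOT NULL,
           PRIMARY KEY (report_id, term)
       )"""
)
conn.executemany("INSERT INTO incident_term VALUES (?, ?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM incident_term").fetchone()[0]
print(count)
```

Once the terms sit in ordinary rows and columns, any tool that can issue SQL can read them.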

There is much value in being able to place textual data – in an integrated format – in a standard data base. Perhaps the biggest value is the ability to access and analyze the textual data in an automated fashion. Queries can be run against the data base, and the results are produced very quickly and accurately. In addition, the data base can store data over a long period of time.

Another advantage of storing integrated textual data in a data base is that the data can be periodically refreshed in a seamless fashion. Each month or each week as new safety related logs are created, they can be seamlessly added to the data base.
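A periodic refresh of this kind can be sketched as an append that skips rows already loaded, so that an overlapping weekly batch does not create duplicates. The example uses SQLite's `INSERT OR IGNORE`; other DBMSs have their own equivalents. The table layout, the batch contents, and the helper name `load_batch` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE incident_term ("
    "log_date TEXT, report_id TEXT, term TEXT, "
    "PRIMARY KEY (report_id, term))"
)


def load_batch(connection, batch):
    """Append a batch; rows already present (same report_id and term) are skipped."""
    connection.executemany(
        "INSERT OR IGNORE INTO incident_term VALUES (?, ?, ?)", batch
    )
    connection.commit()


# Hypothetical weekly batches; RPT-002 appears in both.
week_1 = [("2023-01-02", "RPT-001", "leak"), ("2023-01-05", "RPT-002", "fall")]
week_2 = [("2023-01-05", "RPT-002", "fall"), ("2023-01-09", "RPT-003", "leak")]

load_batch(conn, week_1)
load_batch(conn, week_2)  # the overlapping RPT-002 row is not duplicated

total = conn.execute("SELECT COUNT(*) FROM incident_term").fetchone()[0]
print(total)
```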

Once the text is integrated and placed into a data base, the stage is set for analytical processing. Even though people take it for granted, there is much advantage to having a store of data that can be analyzed where the data is on a computer. Some of the advantages of having data in a data base on a computer are –

  • the data can be analyzed quickly
  • the data can be analyzed with flexibility
  • lots of data can be analyzed
  • data can be added periodically
  • different types of data can be analyzed in comparison to other data, and so forth.

When it comes to the types of analysis that can be done, in truth the limitation is in the imagination of the analyst. The agile analyst can look at anything that is in the data. But as a short list of the kinds of analysis that can be done that relate to safety, the following questions can be addressed –

  • is there any type of activity that is particularly dangerous?
  • is there any location that is particularly dangerous?
  • is there any particular product that is dangerous?
  • is there a manufacturer of a product that is dangerous?
  • are there times of the day, or times of the year, that are more dangerous than others?
  • what occupations are the most dangerous?
  • is there any type of accident that recurs more than expected?
  • what types of accidents are the most lethal?
  • and so forth
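Two of the questions above, the most dangerous location and the most dangerous time of day, can be sketched as ordinary SQL queries. The example again uses Python's sqlite3 module as a stand-in for any relational data base. The `incident` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incident (occurred_at TEXT, location TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO incident VALUES (?, ?, ?)",
    [  # hypothetical incidents, timestamps as 'YYYY-MM-DD HH:MM'
        ("2023-01-14 02:10", "Unit 4", "leak"),
        ("2023-02-03 14:30", "Dock B", "fall"),
        ("2023-03-21 02:45", "Unit 4", "leak"),
        ("2023-05-08 03:05", "Unit 4", "fire"),
    ],
)

# Is there any location that is particularly dangerous?
by_location = conn.execute(
    "SELECT location, COUNT(*) AS n FROM incident "
    "GROUP BY location ORDER BY n DESC, location"
).fetchall()

# Are there times of the day that are more dangerous than others?
by_hour = conn.execute(
    "SELECT strftime('%H', occurred_at) AS hour_of_day, COUNT(*) AS n "
    "FROM incident GROUP BY hour_of_day ORDER BY n DESC, hour_of_day"
).fetchall()

print(by_location)
print(by_hour)
```

The other questions in the list follow the same shape: group by the attribute of interest (product, manufacturer, occupation, accident type) and count or rank the results.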

The reality of creating a data base of safety data is that there are many more avenues of interest that can be explored. And once the text has been disambiguated and placed in a standard relational data base, analysis is an easy thing to do.

Another nice thing about building a safety data base on standard DBMS technology is that analysis can be done with standard tools. In most cases the organization has already purchased and installed analytical software, and end users have been trained in doing business intelligence. It is very convenient to use the analytical infrastructure that is already in place once the safety data has been captured, integrated and placed on a standard relational data base.

But perhaps the most valuable part of the ability to analyze textual data in a standard data base is that of being able to combine textual data with classical structured data. Making such a combination is easy to do. For example, textual data can be combined with financial data, or production data, or human resource data, and so forth.
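As a sketch of such a combination, incident counts derived from text can be joined to structured hours-worked figures (the kind of data a human resource system might hold) to produce an incident rate per location. Both tables, their names, and all of the numbers below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Derived from textual safety reports (hypothetical).
conn.execute("CREATE TABLE incident (location TEXT, kind TEXT)")
# Classical structured data, e.g. from an HR system (hypothetical).
conn.execute("CREATE TABLE hours_worked (location TEXT PRIMARY KEY, hours INTEGER)")

conn.executemany(
    "INSERT INTO incident VALUES (?, ?)",
    [
        ("Unit 4", "leak"),
        ("Unit 4", "fire"),
        ("Dock B", "fall"),
        ("Dock B", "fall"),
        ("Dock B", "leak"),
    ],
)
conn.executemany(
    "INSERT INTO hours_worked VALUES (?, ?)",
    [("Unit 4", 10000), ("Dock B", 60000)],
)

# Incidents per 1,000 hours worked, highest rate first.
rates = conn.execute(
    "SELECT h.location, COUNT(i.kind) * 1000.0 / h.hours AS per_1000_hours "
    "FROM hours_worked AS h "
    "LEFT JOIN incident AS i ON i.location = h.location "
    "GROUP BY h.location, h.hours "
    "ORDER BY per_1000_hours DESC"
).fetchall()

print(rates)
```

In this made-up data Dock B has more incidents than Unit 4, but once hours worked are taken into account Unit 4 has the higher rate. Neither the textual data nor the structured data shows that on its own.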

The ability to combine textual data with structured data greatly enhances the ability of the analyst to create very innovative and wide reaching queries. Now the analyst can have a range of potential queries that is limited only by the imagination of the analyst.

But there is another advantage – an architectural one – that arises out of this capability. There is the possibility of creating a truly integrated data warehouse, one that contains some data whose origins are structured and other data whose origins are unstructured.

There is then great opportunity in the ability of the analyst to do sophisticated queries when text can be meaningfully integrated into the form of a standard data base.


Forest Rim Technology was formed by Bill Inmon in order to provide technology to bridge the gap between structured and unstructured data. Forest Rim Technology is located in Castle Rock, Colorado.