by W H Inmon, Forest Rim Technology

The basic medical record is written by healthcare providers for healthcare providers and their patients.

The primary purpose of the medical record is to document and communicate important information with regard to the care and treatment of a patient. The medical record is normally created as a result of an episode of care.

Fig 1 shows a typical healthcare record.


The Challenge of Narration

Much of the healthcare record is written in a narrative form. Many subjects are typically discussed in the narrative, including such topics as past history, family history, current diagnostics and observations, procedures, future treatment options, and so forth.

The healthcare record is valuable to many communities: the clinical community, the research community, the insurance community, and so forth. But there is a fundamental problem with healthcare records. Even when the healthcare record is written in the form of an EMR (electronic medical record), the record still contains narrative information.

Narrative information is useful and necessary to the clinical community but poses real problems for the research community.


Fig 2 shows that the medical record can be put in the form of an EMR.


However, even when the medical record is put in the form of an EMR, the medical record still contains generous amounts of narrative. And narrative is what gives the research analyst such problems.

Fig 3 shows the EMR is full of narrative.


There is no question that narration is needed in the medical record. For doctors and nurses to understand what is going on with a patient it is mandatory that information be in the form of a narration.

Fig 4 shows that it is natural and normal for medical records to be in a narrative format.


Computerized Analysis

However, as long as information is in the form of a narration, it is awkward and difficult to use that information for analytical processing. There are many challenges in trying to do analytical processing against narrative information. The primary challenge is that computer technology is based on what can be called a structured format. In order to do computerized analysis, data needs to be tightly structured. But narrative information is very unstructured. Trying to put unstructured information into a structured format is like trying to place a square peg in a round hole. And the fact that the medical record is in the form of narration means that the analyst trying to use the medical record faces severe challenges.

Fig 5 shows that the computer analyst is vexed when trying to do analytical processing on narrative information.


Textual ETL & Disambiguation

Enter into this fray Textual ETL. Textual ETL is the technology that is designed to read narrative information and to place that unstructured information into a computerized, structured format. Textual ETL uses many techniques and approaches to the reading and interpretation of text. It is said that Textual ETL “disambiguates” unstructured text into a structured format.

Textual ETL has been used in many environments. The medical records environment is merely one of the many environments where the transformation of narration into a structured data base format is useful.

Fig 6 shows that Textual ETL is able to read textual information and, by means of disambiguation, create a standard data base out of the text.


Normalization of Text

Being able to create a disambiguated data base out of unstructured narrative information is an interesting and useful technology. In addition to being disambiguated, the structured data that Textual ETL produces is in a form that is called a “normalized” form of data.

The normalized form of data has much to recommend it, especially to the computer analyst. When text is normalized it can be placed into a data base. When text is normalized the text can be edited, classified, and ordered.
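As a rough illustration of what “placed into a data base” can mean, the sketch below loads a few normalized rows into an in-memory SQLite table and counts them. The table name, column names, and sample values are assumptions made for this example; they are not the actual output layout of Textual ETL.

```python
import sqlite3

# Hypothetical normalized rows: one word or phrase per row.
# Column names and sample values are illustrative assumptions.
rows = [
    ("doc-001", 1, "metformin"),
    ("doc-001", 2, "nausea"),
    ("doc-002", 1, "metformin"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE normalized_text (document_id TEXT, seq INTEGER, phrase TEXT)"
)
conn.executemany("INSERT INTO normalized_text VALUES (?, ?, ?)", rows)

# Once the text is in rows, it can be ordered and counted like any other data.
phrase_counts = conn.execute(
    "SELECT phrase, COUNT(*) FROM normalized_text GROUP BY phrase ORDER BY phrase"
).fetchall()
```

The same query works unchanged whether the table holds three rows or three million, which is the point of putting the text into rows in the first place.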

Analysis of Many Records

Perhaps the biggest advantage to the transformation of unstructured text into a disambiguated, normalized form of data is the ability to analyze an unlimited number of records. One of the great limitations of narration is that in order to analyze the narration, the text must be read, and a person can read and absorb only a limited amount of text. But when data is transformed into a disambiguated, normalized state, the computer is capable of analyzing millions of records. Stated differently, if unstructured text must be read, only a small number of records can be analyzed. But once the text is placed into a structured format, an unlimited number of records can be ingested and analyzed.

Fig 7 depicts textual ETL and the production of normalized text.


The normalization of text is a powerful and important step forward. But there are still challenges with normalized text.

Even though normalized text can be placed in a neatly structured data base, it still is not easy to work with. The biggest challenge is that in normalizing the text, each row of text holds only one word or phrase of data. That single word or phrase will have many relationships with other units of data. Some of those relationships are straightforward and obvious, but others are not obvious at all. Trying to keep those relationships straight is quite a challenge, even for the most astute of computer analysts.

The Inherent Complexity of Normalized Data

Fig 8 shows that the individual rows of data found in linear fashion in a normalized data base can form a complex “spider’s web” of relationships. The resulting “spider’s web” is complex to deal with. Unfortunately those spider’s web relationships are needed to make sense of the normalized data found in the structured data base.


A good solution to the problem of the complexity of normalized text is that of taking the normalized text and restructuring it after it has been created. The result is the creation of a structure of text that represents the raw text in the medical record and is reasonably intuitive. (The restructured record is undoubtedly more intuitive than the raw normalized data!)

Restructuring the Normalized Text into the Medical Record

Fig 9 shows the process of restructuring the normalized data that emanates from textual ETL.


The disambiguated, restructured record contains the same data as the normalized table. The difference is that the restructured record is designed to be much more useful than the normalized data.

What the Disambiguated, Restructured Medical Record Looks Like

Fig 10 shows the disambiguated, restructured record that has been created from the disambiguated data. (Note: the patient name has been covered for the purpose of privacy.)


The first feature of the restructured disambiguated data is that it is immediately obvious what data belongs to what patient. The patient name is spread over the right hand side of the record. Therefore, at a glance there is no ambiguity as to which patient the data refers to throughout the structure.

Fig 11 shows the obviousness of the patient name.


The second feature of the restructured disambiguated record is that the data is aligned sequentially as it was found in the originating document. There is never any doubt as to what data has been found in what sequence.

Fig 12 shows that the record of data is in sequence and that the data is sorted in the appropriate order.


The center of attention is the word or phrase that has been selected by Textual ETL. The selection of the word or phrase from the originating medical record may have resulted from a variety of techniques. The word or phrase could be the result of a taxonomic resolution. Or the word could have been picked because of a homographic resolution. Or the selection could be because of an acronym resolution. In fact, there are many reasons why the word or phrase could have been selected when reading and processing the raw text in the medical record.

Fig 13 shows the selection of the word or phrase and its placement into the data base.


An important feature of both the originating medical record and the resulting data base is the support of the negation of a term. On occasion the doctor will write – “this tumor was not malignant.” In this case the analyst needs to know that there is a negation of the term or phrase. The negation is found and recognized by Textual ETL processing as the originating document is being read.

Fig 14 shows that words or phrases can be negated by the doctor or nurse writing the medical record.
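To make the idea of negation concrete, here is a minimal sketch that flags a term as negated when a negation cue appears a few words before it. This is a simple stand-in technique, not a description of how Textual ETL recognizes negation; the cue list, the window size, and the function name `is_negated` are all assumptions for illustration.

```python
import re

# Assumed, deliberately small list of negation cues; real notes need far more.
NEGATION_CUES = {"not", "no", "without", "denies", "negative"}


def is_negated(sentence: str, term: str, window: int = 4) -> bool:
    """Return True if a negation cue appears within `window` words before `term`."""
    words = re.findall(r"[a-z']+", sentence.lower())
    term_words = re.findall(r"[a-z']+", term.lower())
    if not term_words:
        return False
    n = len(term_words)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == term_words:
            # Look only at the few words immediately before the term.
            preceding = words[max(0, i - window):i]
            if any(word in NEGATION_CUES for word in preceding):
                return True
    return False
```

Applied to the doctor's sentence above, `is_negated("This tumor was not malignant.", "malignant")` flags the term as negated, while the same sentence without "not" does not.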


Another important feature of textual ETL is the support of taxonomic/ontologic resolution.

Some words and phrases written by the doctor/nurse creating the medical record have a taxonomic/ontologic classification associated with them. This is always true if the word/phrase was selected because of a taxonomic resolution. It is sometimes true under other conditions.

Taxonomic/ontological resolution is important to the disambiguation of the words used by the doctor/nurse in the writing of the medical report.

As a simple example of taxonomic resolution, the term medication may be applied to the word – “Zofran” or “metformin”.

Fig 15 shows that – when appropriate – a taxonomic resolution is associated with the word or phrase.
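In its simplest form, a taxonomic resolution can be pictured as a lookup from a word to its classification. The sketch below assumes a tiny hand-built taxonomy containing only the two medications named above; a real taxonomy or ontology would be far larger, and the names `TAXONOMY` and `classify` are hypothetical.

```python
from typing import Optional

# Hypothetical, hand-built taxonomy; only the two entries from the text are included.
TAXONOMY = {
    "zofran": "medication",
    "metformin": "medication",
}


def classify(phrase: str) -> Optional[str]:
    """Return the taxonomic classification of a phrase, or None if it has none."""
    return TAXONOMY.get(phrase.strip().lower())
```

A word with no entry simply returns `None`, which matches the observation that only some words and phrases carry a classification.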


Another important feature of the medical record is the organization of the record itself.

When creating his/her notes for the medical record, the doctor often creates small sub classifications, for the purpose of organizing the report. For example the doctor may write –


NOSE: xxxxxxxxxxxxxxxxxxxxxxx




Following the sub classification about “NOSE”, the doctor will write about observations, treatments, medications, and so forth that relate to the nose. It is useful for the analyst to know that the word or phrase that has been selected is a part of this sub classification.

Fig 16 shows that Textual ETL picks up on the sub classifications that the doctor makes.
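One simple way to picture how a sub classification could be picked up is to treat an all-capitals label followed by a colon as a heading, and to tag every following line with that heading until the next one appears. The sketch below works under that assumption only; the pattern, the function name, and the sample note are invented for illustration and do not describe the internals of Textual ETL.

```python
import re
from typing import List, Optional, Tuple

# Assumption: a sub classification is an all-caps label and a colon at line start.
HEADING = re.compile(r"^([A-Z][A-Z /]*):\s*(.*)$")


def tag_sub_classifications(note: str) -> List[Tuple[Optional[str], str]]:
    """Pair each non-empty line of a note with the sub classification it falls under."""
    current: Optional[str] = None
    tagged: List[Tuple[Optional[str], str]] = []
    for line in note.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            current = match.group(1).strip()
            line = match.group(2)
            if not line:
                continue  # heading alone on its line; the text follows
        tagged.append((current, line))
    return tagged


# Made-up sample note, used only to exercise the function.
sample_note = "NOSE: Mild congestion.\nNo polyps seen.\nTHROAT: Clear."
tagged_lines = tag_sub_classifications(sample_note)
```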


And on other occasions the doctor makes major classifications of data. These “super” classifications of information tend to be more sweeping and general in scope than the smaller sub classifications of text. These super classifications of data may be titled – “Impressions”, “Assessment”, “Treatment Plan”, and so forth. It is most useful to the analyst to know that the word or phrase that has been selected is part of the major classifications found in the doctor's notes.

Fig 17 depicts the major classifications of text when they exist.


By creating the “super” classification of text, a doctor can organize his/her notes in a manner that is easy to understand and read. And the restructured, disambiguated data found in the data base reflects that organization.

The effect of restructuring data according to the lines of thinking created by the doctor GREATLY enhances the ability of the analyst to understand what data he/she has at any moment. When the analyst has in his/her hand a word or phrase, it is immediately clear –

  •    What patient the word or phrase applies to
  •    The order in which the word or phrase appears in the originating document
  •    Whether the word or phrase has been negated in the context of the originating document
  •    Any taxonomical classifications that apply to the word or phrase
  •    Any sub classifications that the doctor has intended
  •    Any super classifications of data that the doctor has intended
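The attributes listed above can be pictured as the fields of a single record. The sketch below is one possible shape for such a record, not the actual layout shown in the figures; the class name, field names, and sample rows are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RestructuredRow:
    """One selected word or phrase with its context (hypothetical layout)."""
    patient: str
    sequence: int                        # position in the originating document
    phrase: str
    negated: bool
    taxonomy: Optional[str]              # e.g. "medication", when one applies
    sub_classification: Optional[str]    # e.g. "NOSE"
    super_classification: Optional[str]  # e.g. "Assessment"


# Made-up rows for a placeholder patient.
rows = [
    RestructuredRow("PATIENT-A", 2, "metformin", False, "medication", None, "Treatment Plan"),
    RestructuredRow("PATIENT-A", 1, "malignant", True, None, None, "Assessment"),
]

# Restore document order, then keep only phrases that were not negated.
ordered = sorted(rows, key=lambda r: (r.patient, r.sequence))
affirmed = [r.phrase for r in ordered if not r.negated]
```

Because every row carries its own patient, sequence, and classifications, the analyst can filter or sort on any of them without chasing relationships across separate rows.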

In a word, once the linear disambiguated text is restructured, it is easy and natural for the analyst to create his/her analysis.

And in any case there is an airtight relationship between the originating document and the restructured disambiguated data found in the data base. Fig 18 shows this very close relationship.


The Relationship between Narration & Database

Because of Textual ETL and the restructuring of the disambiguated data produced by Textual ETL, both the doctor/nurse community and the analytical research community have the information they need, in the form they need, to do their jobs. Fig 19 shows the unifying effect of Textual ETL and the restructuring of disambiguated data.


Bill Inmon is the founder of Forest Rim Technology. Forest Rim Technology is located in Castle Rock, Colorado. Forest Rim Technology produces Textual ETL and the restructuring of disambiguated data. Forest Rim is happy to do a proof of concept to show you the value of being able to manage your textual data found in your medical record.