by W H Inmon, Forest Rim Technology
Nearly every corporation has a call center. Ask an executive if his/her organization has a call center and the odds are good that the executive will say yes.
Then you ask the executive whether he/she knows what is going on in the call center. The executive assures you that he/she does. Then you ask the executive what really is going on in the call center. The executive says the corporation is getting 6,000 calls a day and the average call lasts 4½ minutes.
Knowing the number of calls and how long they last is one interesting measurement of the activity going through the call center. But this kind of information tells you nothing about the content of what is going on in the call center.
What would you like to know?
The kind of information you would like to find out about the call center is information about –
- What are customers complaining about?
- What are customers/prospects asking questions about?
- Do customers/prospects want to buy something?
- Do customers need more information about operating equipment?
- Are customers having installation problems?
- Are customers interested in further options associated with equipment?
- And so forth.
The call center information that is valuable to the corporation is not how much the call center is being used but the content of the conversations that are occurring there.
When you ask the executive if he/she knows the content of what is being discussed in the call center, the answer comes back – “you can’t know that kind of information”.
But in today’s world you CAN know that kind of information. Today there is textual disambiguation (or Textual ETL) and with textual disambiguation it is absolutely possible to know precisely what is being said in the call center.
In order to see just how Textual ETL creates the opportunity for the corporation to start to use the information found in the call center for better decision making, consider the following example.
The following figure is a synopsis of the call center activity that a telephone company has with its customers. In a day’s time the telephone company will get thousands of phone calls in its call center. The calls are about the many aspects of the day to day operation of the telephone company. In addition, the telephone company also services television programming.
There is no way that an individual can read the messages and assimilate what is being said by the customer. There simply are too many messages.
So Textual ETL is used in order to read the textual information of what has transpired in the call center and to assimilate that information.
The conversation information is read by Textual ETL and converted into a data base. Once the text has been converted into a data base, the data base can be read and analyzed.
The Flow of Data
The following figure shows the flow of data that is occurring.
Textual ETL Algorithms
Inside Textual ETL there is significant algorithmic processing that occurs. Depending on the text, different sorts of algorithmic processing occur. Some text requires one kind of processing and other text requires another kind of processing.
Some of the kinds of processing that occur inside of Textual ETL include –
Stop word processing. In stop word processing extraneous words such as “a”, “and”, “the”, “to”, “from” etc. are removed before the data base record is written,
Stemming. Stemming reduces words to their common word stem, so that the relationship between “move”, “moving”, “mover”, “moves”, and “moved” is recognized,
Alternate spelling. A simple example of alternate spelling is recognizing that the British spelling “colour” is the same as the American spelling “color”,
Taxonomic resolution. Taxonomic resolution allows words to be classified. A simple example of taxonomic resolution is the recognition that “Honda”, “Volkswagen”, “Porsche”, “Chevrolet” and “Toyota” are all “cars”.
Homographic resolution. Homographic resolution occurs when it is recognized that the same word or phrase has different meanings to different audiences. For example “ha” means heart attack to the cardiologist, hepatitis A to the endocrinologist, and headache to the general practitioner,
Proximity analysis. Proximity analysis is the recognition that words in proximity to each other have different meanings than when the words are separated. For example “Dallas Cowboys” refers to a once great football team, whereas “Dallas” on page 1 and “cowboys” on page 4 convey an entirely different meaning.
Negation resolution. Negation resolution refers to the practice of inferring negation of meaning upon encountering a negative term, such as “no”, “not”, “never”, etc.
Date standardization. One document has “June 5, 2016” and another document has “06/05/2016”. In order to be placed into a data base the dates need to be standardized.
And there are MANY other algorithmic practices that must be accounted for as text is read and converted into a data base.
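A few of these algorithms can be sketched in Python. The stop-word list, the suffix rules, and the date formats below are illustrative stand-ins, not the actual Textual ETL algorithms:

```python
from datetime import datetime

# Illustrative stop-word list (a real list would be much longer).
STOP_WORDS = {"a", "and", "the", "to", "from", "of", "in"}

def remove_stop_words(words):
    """Stop word processing: drop extraneous words before writing records."""
    return [w for w in words if w.lower() not in STOP_WORDS]

def stem(word):
    """Naive suffix-stripping stemmer: relates 'move', 'moving', 'moved', etc."""
    for suffix in ("ing", "ed", "er", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def standardize_date(text):
    """Date standardization: normalize 'June 5, 2016' and '06/05/2016' to ISO."""
    for fmt in ("%B %d, %Y", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            pass
    return None

print(remove_stop_words(["the", "customer", "wants", "a", "refund"]))
print(stem("moving"), stem("moved"), stem("moves"))
print(standardize_date("June 5, 2016"), standardize_date("06/05/2016"))
```

Note how both date formats collapse to the same standardized value, which is what makes the dates comparable once they land in the data base.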
Once the text is read and converted into a data base, the data base can be read and fed into a visualization tool.
The visualization tool can be used to create a dashboard.
The following figure shows a dashboard that can be created.
(Note: The dashboard seen here was created by Boulder Insight.)
The dashboard shows the activities that are occurring inside the call center.
On the top left hand side is the display of the types of calls occurring in the call center, ranked by the number of calls. Typical calls include complaints, questions, sales inquiries, installation questions, and so forth. This information tells management the general demeanor of the activities going through the call center.
On the bottom right hand side is the analysis of phone calls by hour of the day. This diagram shows that there is little or no activity between the hours of 2:00 am and 3:00 am. However during prime time – 9:00 am to 4:00 pm – there is a lot of activity. Furthermore the activities are color coded. The analyst can drill down on the color and the hour to go to a lower level of detail should deeper analysis be required.
Above the hourly analysis is the daily analysis. The daily analysis shows that different kinds of call center activity have occurred on different days of the week.
Above the daily analysis is the monthly analysis. The monthly analysis shows that for the 31 days of the month there are patterns of activity.
And finally in the center of the dashboard are the subjects that were contained in the call center conversations. The different subjects mentioned in the conversations are displayed graphically: the largest and darkest boxes show the most mentioned subjects, and the smaller and lighter boxes show the lesser mentioned subjects.
The dashboard seen in the figure shows that an organization can know what is occurring in the call center. When management says – “You can’t know what is going on in the call center” – it is because management hasn’t seen one of these dashboards, which tell you precisely what is occurring in the call center.
It is interesting to look at the text to data base processing from the standpoint of what the computer sees and what the computer has to do in order to create the data base that is behind the dashboard.
First off, what does a textual document look like to the computer?
In a word, a textual document looks to the computer like one long string of text. It is one word or character followed by another word or character, followed by yet another word or character.
A picture of what the computer sees is seen in the following figure.
It is up to the computer to read and interpret the long string of text and to take that text as input into the creation of a data base.
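As a minimal sketch of that first step, the code below (with an invented sample string) walks the long string word by word while remembering each word’s byte address, which is the raw material for the data base records described later:

```python
# To the computer a document is one long string: word after word after word.
raw = "my phone bill doubled this month and nobody can tell me why"

# Walk the string and pull out each word together with its offset.
tokens = []
position = 0
for word in raw.split():
    position = raw.index(word, position)  # byte address of this word
    tokens.append((position, word))
    position += len(word)

print(tokens[:3])
```

The resulting (offset, word) pairs preserve where each word sat in the original string, so later algorithms can still reason about proximity.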
There are (at this moment in time) approximately 67 different and separate algorithms inside of Textual ETL that are required to read and interpret text. Making matters even more complex is the fact that the 67 different algorithms have to be sequenced properly for their execution. For example, in some cases for algorithm A to run successfully, algorithm B has to have been executed first. On other occasions algorithm A has to be executed before algorithm B is run, all depending on the text that is being processed.
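The sequencing requirement can be pictured as an ordered pipeline. The two toy algorithms and the rule that stop-word removal runs before stemming are illustrative assumptions, not the actual 67-algorithm sequence:

```python
# Two toy algorithms standing in for the real ones.
def remove_stop_words(tokens):
    return [t for t in tokens if t.lower() not in {"a", "and", "the", "to", "from"}]

def stem(tokens):
    return [t[:-3] if t.endswith("ing") else t for t in tokens]

# Order matters: the pipeline is an explicit sequence, not an unordered set.
PIPELINE = [remove_stop_words, stem]

def run_pipeline(tokens):
    for algorithm in PIPELINE:
        tokens = algorithm(tokens)
    return tokens

print(run_pipeline(["the", "customer", "is", "moving", "to", "Denver"]))
```

Reversing the order of the two entries in PIPELINE would change the output, which is exactly why the sequencing of the algorithms has to be managed explicitly.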
There are many very diverse algorithms that have to be accounted for by Textual ETL.
Some of the more common algorithms include –
- Proximity analysis
- Date standardization
- Custom variable formatting
- Inline contextualization
- Taxonomy/ontology resolution
The following figure shows the application of the algorithms to the text that is input into Textual ETL.
The Relational Database
The result of the processing of the text by Textual ETL is a relational data base. While the creation of a simple relational data base is hardly new, to an organization struggling with text the ability to create a relational data base from that text represents a significant milestone.
The significance of the ability to turn text into a relational data base is this. Once the text is turned into a data base –
- There is no limit to the number of documents that can be read and analyzed
- Analysis can be done by standard analytical software.
These two features are truly significant. Stated differently, as long as the documents remain in a purely textual format, the documents must be read and analyzed manually, and there is no standard analytical software that can be used to analyze them.
The following figure shows the relational data base that has been created from the processing of text by Textual ETL.
Some of the features of the relational data base include –
- Identification of the document (or the call center record)
- Byte address of the word being analyzed
- The actual word being analyzed
- The context of the word being analyzed
The following figure shows some of the features of the data base.
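A minimal sketch of such a data base, using Python’s standard sqlite3 module, might look like the following. The table name, column names, and sample rows are invented for illustration, not Textual ETL’s actual schema:

```python
import sqlite3

# Hypothetical table reflecting the features listed above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE textual_etl (
           doc_id       TEXT,    -- identification of the document / call record
           byte_address INTEGER, -- byte address of the word being analyzed
           word         TEXT,    -- the actual word being analyzed
           context      TEXT     -- the context of the word being analyzed
       )"""
)
conn.executemany(
    "INSERT INTO textual_etl VALUES (?, ?, ?, ?)",
    [
        ("call-0001", 0, "billing", "complaint"),
        ("call-0001", 8, "error", "complaint"),
        ("call-0002", 0, "install", "installation question"),
    ],
)

# Once the text is in a relational data base, standard SQL analysis applies.
for context, n in conn.execute(
    "SELECT context, COUNT(*) FROM textual_etl GROUP BY context ORDER BY context"
):
    print(context, n)
```

The point of the sketch is the last query: once text sits in rows and columns, counting complaints by context is an ordinary GROUP BY, which is precisely what feeds a dashboard.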
While there are many aspects of the data found in the data base that are important, one of the unique and most important features of the data base is the identification of the context of the words found in the call center conversations. Stated differently, while text is important, if you are going to analyze text, you need to analyze the context of that text as well.
And context of text is a standard feature of the data base produced by Textual ETL.
As an example of the value of context, suppose two men are on a street corner and a young lady passes by. One of the gentlemen says to the other – “She’s hot”.
Now what is the meaning of “she’s hot”?
One interpretation is that the lady is attractive and the gentleman would like to have a date with the lady.
Another interpretation is that it is Houston, Texas on a July day where the temperature is 98 degrees and the humidity is 100%. The lady is covered with sweat. She’s physically hot.
Or the gentlemen could be doctors and one doctor has just taken the lady’s temperature and she has a condition that has driven her temperature to 103 degrees. She has a fever.
So when someone says – “she’s hot” – the only way you understand what is meant is to have context. It is context that gives meaning to text.
Stated differently, text without context is meaningless.
Textual ETL handles both text and context.
The following figure shows that both text and context are found in the data base created by Textual ETL.
In the figure it is seen that a call center conversation has mentioned the words – “Maricopa County”. The context of those words is a place where a real estate deed has been recorded.
Note that Maricopa County (a county in Arizona) could have referred to many things. The context of “Maricopa County” could have been –
- A place where professional baseball teams practice in the early season
- A place where a deed was recorded
- A place where a murder occurred
- A place where you took a vacation
- A place where a movie was shot
And so forth.
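The ambiguity can be made concrete with a few (text, context) pairs. The pairs below are illustrative, echoing the examples above; they are not rows from an actual Textual ETL data base:

```python
# Why context must be stored alongside text: the same words carry
# different meanings under different contexts.
rows = [
    ("Maricopa County", "place where a deed was recorded"),
    ("Maricopa County", "place where baseball teams practice"),
    ("she's hot", "physically attractive"),
    ("she's hot", "running a fever"),
]

# Text alone is ambiguous: four rows, only two distinct strings of text.
distinct_text = {word for word, _ in rows}
print(len(rows), "rows,", len(distinct_text), "distinct texts")

# With its context attached, each row is unambiguous.
for word, context in rows:
    print(f"{word!r} -> {context}")
```

Dropping the context column here would collapse four different meanings into two strings, which is exactly the information loss the article is warning about.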
From an analytical standpoint, it is crucial to capture context as well as text. Textual ETL then is valuable in unlocking the secrets of the call center.
But the applicability of Textual ETL is widespread. That is because text is widespread. Textual ETL becomes the key to unlocking the information found in the text of the corporation.
The following figure shows that there are many places where text is found in the corporation.
Forest Rim Technology was formed by Bill Inmon in order to provide technology to bridge the gap between structured and unstructured data. Forest Rim Technology is located in Castle Rock, Colorado.