
TextualETL™ Technology

The profound division between structured and unstructured data has prevented organizations from using unstructured, textual data for analytical purposes. The worlds of text and structured data have grown up side by side, each as if it existed in a vacuum. Today, with Forest Rim®’s TextualETL™, you can bring together structured and unstructured data to:

  • Integrate textual data into a data base,
  • Create an unstructured data base that is then integrated with structured data in a data warehouse,
  • Read and process unstructured data so that textual analytics can be done.

By doing so, you can leverage the investment you have already made in Business Intelligence. Your analytical infrastructure can be used – as is – to include unstructured data as part of the decision-making process. TextualETL™ reads, integrates, and prepares unstructured data so that it is ready to load into standard technologies such as Oracle, DB2, Teradata, and Microsoft SQL Server. Once the unstructured data resides in any of those technologies, standard analytical tools such as Business Objects, Cognos, MicroStrategy, SAS, and others can be used to access and analyze it.

The Power of TextualETL™

TextualETL™ is the technology that allows an organization to read text in any format and in any language and to convert it into a standard relational data base (DB2, Oracle, Teradata, SQL Server, et al) where the text is in a useful, meaningful format. TextualETL™ does not put text into a blob. Once text is placed in a blob, it is essentially useless for analysis. Instead, TextualETL™ creates a textual data base in a relational format that is fit for analytical processing.
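To make the contrast with a blob concrete, here is a minimal, generic sketch in Python (standard-library `sqlite3` and `re` only) of what "text in a relational format" can look like. It is not Forest Rim's method: the table `text_term`, its columns, and the helpers `load_document` and `documents_mentioning` are hypothetical, and the regex tokenizer is deliberately naive. Each word occurrence becomes one row, so an ordinary SQL query can answer a question that a blob column cannot.

```python
import re
import sqlite3

# Hypothetical schema (an assumption for illustration, not TextualETL's):
# one row per word occurrence, so the text is queryable with plain SQL
# instead of sitting opaquely inside a BLOB column.
SCHEMA = """
CREATE TABLE IF NOT EXISTS text_term (
    doc_id   TEXT    NOT NULL,  -- identifier of the source document
    char_pos INTEGER NOT NULL,  -- character offset of the word in the document
    term     TEXT    NOT NULL   -- the word, lower-cased
)
"""

# Naive tokenizer: runs of letters and apostrophes count as words.
WORD = re.compile(r"[A-Za-z']+")


def load_document(conn: sqlite3.Connection, doc_id: str, text: str) -> int:
    """Insert one row per word in `text`; return the number of rows inserted."""
    rows = [(doc_id, m.start(), m.group().lower()) for m in WORD.finditer(text)]
    conn.executemany("INSERT INTO text_term VALUES (?, ?, ?)", rows)
    return len(rows)


def documents_mentioning(conn: sqlite3.Connection, term: str) -> list[str]:
    """Return the sorted IDs of documents that contain `term` (case-insensitive)."""
    cur = conn.execute(
        "SELECT DISTINCT doc_id FROM text_term WHERE term = ? ORDER BY doc_id",
        (term.lower(),),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    load_document(conn, "email-001", "The shipment was late. Customer asked for a refund.")
    load_document(conn, "email-002", "Refund issued; customer satisfied.")
    print(documents_mentioning(conn, "refund"))  # ['email-001', 'email-002']
```

Keeping `doc_id` and `char_pos` on every row is one simple way to trace any row back to its place in the source document; a production system would add far richer processing (synonyms, taxonomies, context) on top of a layout like this.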

Backed by nine patent filings and based on research and design work by Bill Inmon, TextualETL™ includes the ability to:

  • Read any source or format of text – email, Hadoop, .txt, .doc, data base, etc.
  • Handle text in any common language (English, Spanish, French, German, et al)
  • Interpret both “standard” text (the language your English teacher taught you) and shorthand (the notes your doctor makes when you have a checkup)
  • Load the transformed text into any standard relational data base management system (Oracle, DB2, Teradata, SQL Server, et al)
  • Scale up to handle large amounts of data
  • Resolve the problem of multiple terms being used for the same thing
  • Apply taxonomies and ontologies to raw text
  • Recognize and manage logical subdivisions of text as they appear in the document
  • Perform both document fracturing and named value processing
  • Process multiple forms of text, such as standard text, doctor’s notes, comments, shorthand, tweets, and so forth
  • Visually display the clustering of words and terms
  • Back reference a document
  • Accomplish homographic resolution
  • Locate and recognize patterns of text
  • Many other features

TextualETL™ reads electronic data from any source. Some of the typical sources include:

  • Standard Microsoft formats
  • Email
  • HTML
  • Social media (Facebook, Twitter, LinkedIn, Google)
  • Databases
  • Big Data
  • More.

With TextualETL™, you can gain a decided market edge over your competitors by accessing the unstructured textual data across your entire organization quickly and easily. This technology is available today to help improve your bottom line, listen to your customers and employees, and make decisions more efficiently.

TextualETL™ – available exclusively from Forest Rim®.

Other vendors who do not have TextualETL™ technology may say, “We do everything that TextualETL™ from Forest Rim does.” But the fact is that TextualETL™ is a unique product with unprecedented sophistication and functionality in textual disambiguation and in the complex process of ETL (extract, transform, and load). No one else can make this claim. Here’s why:

TextualETL™ is not based on natural language processing (NLP). Forest Rim did not fall into the NLP trap. Look at how much research has been done on NLP, how many doctoral dissertations have been written about it, how long NLP has been promising results, and how little it has actually delivered. The first decision Forest Rim made in developing its product was not to use NLP. That does not mean Forest Rim ignores the context of words; indeed, Forest Rim very much takes context into account. It simply does not use NLP approaches to do so.

If there is any doubt as to the fundamental weaknesses of NLP, consider the following short scenario:

Two men are standing on a street corner in Houston, Texas. A young woman walks by. One man says to the other, “She’s hot.” What does he mean? One interpretation is that the young woman is attractive and the man would like to date her. But the temperature may be 105 degrees, the humidity 100%, and the woman may be sweating profusely. Or she may have just gotten a parking ticket and be angry. In short, without knowing much more about the look on her face, the weather, her attractiveness, and many other factors, we simply cannot tell what “She’s hot” means.

NLP attempts to understand the context of words by looking at other words. But context is much more than words, and capturing in a computer all the factors that shape context is beyond today’s technology. So NLP falls into a trap from which it is very difficult to extricate itself. Forest Rim simply avoids that trap altogether by deploying a proprietary technology for disambiguation and contextualization.

Standard data base management systems have been optimized to handle repetitive data efficiently: records with the same structure occurring over and over, such as bank transactions or sales orders.

But there is an irony here. In a typical corporation, it is estimated that 80% or more of the data is textual or otherwise non-repetitive. Yet data base technology has been optimized to process only the remaining 20% or less of the corporation’s data. Stated differently, because it is in a non-repetitive format, 80% of the corporation’s data is not found in a standard data base.

The fact that non-repetitive data cannot be meaningfully stored in a standard data base has profoundly shaped computing. Applications – the very way that computers are used – are limited to accessing and analyzing repetitive data, which constitutes only about 20% of the corporation’s data.

For years it was held that text could not be meaningfully placed in a standard data base. And for years that was true.

But now there is TextualETL™, and with it text can be meaningfully placed into a relational data base. The ability to place text into a standard relational data base opens the door to an entirely new class of applications for business analysis and visualization. These new applications can unlock the business value held in textual data and, as a result, help address fundamental business challenges and opportunities.