There are two forms of textual disambiguation. In the first, raw text is examined a single word or phrase at a time. This word-level or phrase-level disambiguation is common in Google and other dictionary-based products.
In the second form, whole bodies of text are disambiguated, rather than a single word or phrase at a time.
Textual ETL is the process by which textual disambiguation of entire documents (and bodies of documents) is accomplished.
An early form of textual disambiguation was natural language processing (NLP). While NLP has some practical uses, its problem is that much of the context needed for disambiguation is not textual at all. The environmental surroundings, the weather, the people who are conversing, the time of day, the temperature, the date, a business occasion, and many other external factors greatly influence the context and interpretation of the raw text, yet none of them appears in the text itself. This greatly complicates NLP processing.
When textual ETL is used for textual disambiguation, a large number of algorithms and a large amount of input are applied. Typical inputs to textual ETL are taxonomies and ontologies, context vocabularies, and context acronym dictionaries. Typical algorithms used in textual ETL are stemming, homographic resolution, association block processing, alternate spelling resolution, and word-delimited indexing, among others.
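How a few of these inputs and algorithms might fit together can be illustrated with a minimal Python sketch. It is not the implementation of any textual ETL product. The acronym dictionary, the alternate-spelling table, the suffix list, and the function names `naive_stem` and `disambiguate` are all assumptions made for illustration, and the stemmer is a toy stand-in for a real stemming algorithm.

```python
import re

# Hypothetical context acronym dictionary: context -> acronym -> expansion.
ACRONYMS = {
    "cardiology": {"ha": "heart attack"},
    "neurology": {"ha": "headache"},
}

# Hypothetical alternate-spelling table mapping variants to one canonical form.
ALTERNATE_SPELLINGS = {"colour": "color", "organise": "organize"}


def naive_stem(word):
    """Strip a few common English suffixes (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def disambiguate(text, context):
    """Return (raw token, resolved form) pairs for one document in one context."""
    acronyms = ACRONYMS.get(context, {})
    resolved = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in acronyms:
            # The same acronym resolves differently depending on the context.
            resolved.append((token, acronyms[token]))
        elif token in ALTERNATE_SPELLINGS:
            resolved.append((token, ALTERNATE_SPELLINGS[token]))
        else:
            resolved.append((token, naive_stem(token)))
    return resolved


print(disambiguate("Patient reported HA", "cardiology"))
print(disambiguate("Patient reported HA", "neurology"))
```

In this sketch the same raw text yields "heart attack" in one context and "headache" in the other, which is the kind of resolution a context acronym dictionary is meant to supply.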
One feature of textual ETL is its ability to operate in multiple languages and on shorthand, slang, and instant messaging as well as proper text. Textual ETL is also used for disambiguating log tapes and other forms of logged messages. In addition, textual ETL can handle improperly formed text, such as the text that comes out of OCR processing.
The output of textual ETL and textual disambiguation is a standard database (often a relational database) that can be accessed and analyzed by standard analytical software. In that sense, textual disambiguation opens the door to analytical processing of text.
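As a rough illustration of this output stage, the following Python sketch loads already-disambiguated terms into a relational table and queries them with ordinary SQL. The table layout, the column names, the sample rows, and the functions `store_terms` and `term_counts` are hypothetical. SQLite is used only because it ships with Python, not because textual ETL prescribes it.

```python
import sqlite3


def store_terms(rows):
    """Load (document, position, raw_text, resolved_term) tuples into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE term ("
        "document TEXT, position INTEGER, raw_text TEXT, resolved_term TEXT)"
    )
    conn.executemany("INSERT INTO term VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def term_counts(conn):
    """Count how often each resolved term occurs across all documents."""
    query = (
        "SELECT resolved_term, COUNT(*) FROM term "
        "GROUP BY resolved_term ORDER BY COUNT(*) DESC, resolved_term"
    )
    return conn.execute(query).fetchall()


# Hypothetical sample rows, as a disambiguation step might produce them.
rows = [
    ("note1.txt", 0, "HA", "heart attack"),
    ("note2.txt", 3, "heart attack", "heart attack"),
    ("note3.txt", 1, "HA", "headache"),
]
conn = store_terms(rows)
print(term_counts(conn))  # [('heart attack', 2), ('headache', 1)]
```

Once the text is in this form, any standard analytical tool that can read a relational table can count, join, and filter the resolved terms without knowing anything about the original raw text.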
Typical forms of input to textual ETL include files with standard Microsoft extensions (doc, docx, txt, xls, etc.), HTML, databases, email, tweets, log tapes, and Big Data.
Through textual ETL and textual disambiguation, the organization can start to store and analyze major blocks of raw text that could not previously be analyzed in an automated manner.