
In an earlier post, I discussed the broad term “text analytics” and looked at a few of the automated functions grouped under that umbrella. It’s important to be able to differentiate these functions; for example, entity extraction can help you fill gaps in your database, but won’t necessarily give you comprehensive information on what your text is about. Today, though, I’d like to look at the flip side of this issue: are there any functions associated with text analytics that are labeled separately but actually provide similar information? The labels in question are “Concept Recognition” and “Classification” (though different providers may brand these services using varying terminology). What do these functions provide? Are they distinct enough to justify storing data from each? Let’s take a look at what these functions actually do:

Concept Recognition: Picks out the key ideas to give you an at-a-glance idea of what a document is about. Listed concepts may include corresponding “type” and “subtype” information and/or an importance score relative to the other concepts in the document.

Classification: Assigns one or more categories to the document as a whole. Typically, these categories are organized into a relatively shallow taxonomy*, so an article about a biotech CEO using herself as a guinea pig for experimental anti-aging therapy might be classified to “Science” or “Health::Therapy”.

Essentially, concept recognition and document classification both tell you what subjects your text discusses; the former works at a much more granular level, while the latter gives you a broad categorization. This raises the question: how are these two functions linked? Do the individual concepts identified by a text analytics service inform the overall document classification?

In a perfect world (of text analytics), specific concepts would not be separated from broad taxonomic classifications. Niche topics like “scientific journals” would simply be nodes in a much larger, more comprehensive taxonomy:

Books & Literature::Books & Literature Products::Periodicals::Scholarly & Professional Journals::Scientific Journals

Unified topic classification is preferable to a separated approach because, ideally, all of that data fits together: each specific concept links back to a broader subject matter, and high-level classification is informed by the prominence of individual topics. eContext’s hierarchy includes 450,000 individual topics, and each one has its own address in a massive tree. Our users can classify documents to specific concepts, then view those concepts through as broad or as narrow a lens as necessary. “Concept Recognition” and “Document Classification” limit users to the very beginning and the very end of a long subject matter chain. By unifying these services with a comprehensive taxonomy, text analytics companies can provide topic classification that’s more informative, more organized, and supports a wider variety of use cases.
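To make the idea of classification “informed by the prominence of individual topics” concrete, here is a minimal Python sketch. It assumes taxonomy paths are stored as `::`-separated strings (the notation used for the scientific-journals path in this post) and that concept recognition returns an importance score per concept. The functions `roll_up` and `classify`, the sample paths, and the scores are all hypothetical illustrations, not eContext’s API or method; summing scores is just one plausible aggregation rule.

```python
from collections import defaultdict

SEP = "::"  # path separator, following the notation used in this post


def roll_up(concepts, depth):
    """Sum concept importance scores at the ancestor `depth` tiers down.

    concepts: iterable of (path, score) pairs.
    depth: number of tiers to keep, where 1 means the top level.
    """
    totals = defaultdict(float)
    for path, score in concepts:
        # Truncate the path to `depth` tiers; shorter paths are kept whole.
        node = SEP.join(path.split(SEP)[:depth])
        totals[node] += score
    return dict(totals)


def classify(concepts, depth, top_n=1):
    """Return the top_n highest-scoring taxonomy nodes at the given depth."""
    totals = roll_up(concepts, depth)
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:top_n]]


# Hypothetical concepts for the anti-aging article; paths and scores are made up.
concepts = [
    ("Health::Therapy::Experimental Therapy", 0.5),
    ("Health::Aging::Anti-Aging", 0.25),
    ("Science::Biotechnology", 0.25),
]

print(classify(concepts, depth=1))  # ['Health']
print(classify(concepts, depth=2))  # ['Health::Therapy']
```

Changing `depth` moves the same concept data between a broad view and a narrow one, which is the benefit of keeping concepts and categories in a single tree rather than in two separate outputs.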

*Occasionally, a text analytics company will feature its own proprietary taxonomy, but most of the time, providers classify text to align with IAB or IPTC standards. While it’s helpful for providers to group their content according to industry conventions, as of 2016, these taxonomies could certainly use some expansion. FWIW, eContext’s taxonomy can be overlaid on these existing structures to keep things consistent while still allowing for more granular build-outs.
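One way such an overlay could work is to map each granular node to the category of its deepest ancestor that has a counterpart in the standard taxonomy. The Python sketch below assumes a hand-built mapping; the `OVERLAY` dictionary, the `to_standard` function, and the `STD:...` labels are hypothetical placeholders, not real IAB or IPTC codes and not eContext’s actual overlay.

```python
SEP = "::"

# Hypothetical overlay: granular nodes mapped to categories in a coarser
# standard taxonomy. "STD:..." labels are placeholders, not real IAB/IPTC codes.
OVERLAY = {
    "Books & Literature": "STD:Books",
    "Books & Literature::Books & Literature Products::Periodicals": "STD:Periodicals",
}


def to_standard(path, overlay):
    """Return the standard category of the deepest mapped ancestor of `path`."""
    parts = path.split(SEP)
    # Walk from the full path up toward the root; the first hit is the most specific.
    for i in range(len(parts), 0, -1):
        ancestor = SEP.join(parts[:i])
        if ancestor in overlay:
            return overlay[ancestor]
    return None  # no ancestor of this path is mapped


journal = ("Books & Literature::Books & Literature Products::Periodicals"
           "::Scholarly & Professional Journals::Scientific Journals")
print(to_standard(journal, OVERLAY))                        # STD:Periodicals
print(to_standard("Books & Literature::Authors", OVERLAY))  # STD:Books
print(to_standard("Science", OVERLAY))                      # None
```

Because only a handful of ancestor nodes need explicit mappings, the granular build-out underneath them can grow without breaking consistency with the standard categories.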


Contextual Machine Learning

Contextual machine learning helps combine the best of machine learning capabilities with a classification system that enriches your data. The addition of a hierarchy brings meaning to the data so you can see the relationships between high-level categories and the sub-classifications underneath them. You can look at broad strokes or very specific, fine-grained categories to find associations between similar terms, or see how various topics break down by segment in the hierarchy. Structured data helps you make better judgments.

eContext’s classification system is flexible. It provides the top-level categorical view that other similar systems show you, and then it goes much deeper, classifying your data down to 21 tiers. The depth of our system means you can take a deep dive into your data and arrive at better predictions.

Getting trained

We’ve been working with Seth Grimes of the Alta Plana Corporation, who recently published a white paper titled “Contextual Machine Learning: It’s Classified”, which outlines various models for this type of machine learning and explains how taxonomy enhances it.

The paper shows how text classification can help people in a wide range of professions make better decisions: from those working at agencies and in marketing, to people focused on advanced projects such as chatbot applications, or even far-reaching logistics and manufacturing projects.

The paper also describes five ways that better text classification can improve machine learning accuracy. Examples include applying classification to improve relevancy and build more structured training data sets, and using classification resources to test machine learning outputs for model validation.
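The white paper’s specific procedures aren’t reproduced here, but one simple way a hierarchy could support model validation is to compare a model’s predicted topic paths against reference classifications at several tiers. The Python sketch below assumes both are `::`-separated paths; `truncate`, `agreement_at_tier`, and the sample labels are hypothetical and are not taken from the paper or from eContext’s tools.

```python
SEP = "::"


def truncate(path, tier):
    """Keep only the first `tier` levels of a '::'-separated path."""
    return SEP.join(path.split(SEP)[:tier])


def agreement_at_tier(predicted, reference, tier):
    """Share of documents whose predicted and reference paths match at `tier`."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        return 0.0
    matches = sum(
        truncate(p, tier) == truncate(r, tier)
        for p, r in zip(predicted, reference)
    )
    return matches / len(predicted)


# Hypothetical model outputs and reference labels for three documents.
predicted = ["Health::Therapy", "Science::Biotechnology", "Health::Nutrition"]
reference = ["Health::Aging", "Science::Biotechnology", "Business::Executives"]

print(agreement_at_tier(predicted, reference, tier=1))  # 2 of 3 match at the top tier
print(agreement_at_tier(predicted, reference, tier=2))  # 1 of 3 match at tier 2
```

Scoring at more than one tier separates a model that is broadly on topic but imprecise from one that is simply wrong, a distinction a flat list of categories cannot show.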

Whatever your profession is, it’s safe to say that machine learning will have an impact on what you’re doing at some point. Classification is ideal for anyone who works with a lot of data, so get up to speed on the latest in data trends.


Ready to learn more?

Chicago

London

INQUIRIES


Classifying the Classifiers: Distinguishing Text Analytics Functions (Part 2)
