
by Miriam Carey
by eContext
Virtual Agent (VA) technology is poised to transform the role of computers and information technology. Its share of consumer attention and wallets is set to explode as it becomes ubiquitous in people’s everyday lives. This transformation will not be limited to smartphones. VA technology is increasingly incorporated into embedded devices for ecommerce, including the Internet of Things.
The key advantage to voice-activated Virtual Agents over other human-computer interfaces is that interaction can be natural, hands-free, and faster — people can interact with a VA in the same way that they would interact with a person. Conversational interaction is the primary interface humans use with one another to manage their affairs. When VAs become capable of nearly human-level conversational interaction, the physical interface of most computer technology will essentially disappear.
The critical factors for wide-scale acceptance of VA technology are reliability and user satisfaction. Achieving improvements in this area will require developing improved methods for intelligent understanding of user queries and knowledge processing. This is not simply a technological problem with a purely technological solution, but one of technology connecting with fundamental human social structures.
Transformative VA technology will need to represent and effectively use knowledge about the world, about its users, about typical tasks, about conversational structure, about conversational errors and repairs, and so forth. Machine learning and data analytics can help, but structured knowledge bases curated by human experts will be critical to achieving these advances.
VAs will need to incorporate deep and broad taxonomies, including fully articulated ontologies, into existing methods for question understanding and answering. VAs will need to understand a larger variety of semantic relations represented between concepts as well as inference rules that allow complex reasoning to take place. VAs will model their users’ interests, goals, plans, and worlds, and will use this information to anticipate their needs, understand their requests, and be natural in conversation with them.
Owners of VA technologies will be partnering with taxonomy and ontology providers or investing in them, and other large enterprises should establish a solid VA strategy to take advantage of this disruptive technology.
Although this paper focuses on technology for answering questions, it is important to keep in mind that, as with most conversational exchanges, questions and their answers are best understood as vehicles for social action. Humans use a variety of grammatical forms to accomplish a very broad range of actions: requesting, reporting, complaining, greeting, and others that are more subtle and complex (such as “confirming an allusion,” see Schegloff, 1996). Even business, sales, and marketing application agents, both human and virtual, must be capable of dealing with a very wide variety of actions.
The most common activities users pursue with Virtual Agents are requests for action or information, and current systems are closest to being able to recognize and respond to them. However, the approach we develop below anticipates the broader range of actions humans engage in, and can be generalized to these other types of actions as systems develop.
Virtual Agents are designed to both provide information and fulfill requests. These different types of actions can be expressed in multiple grammatical forms:
“What is the capital of Nepal?” “Can you schedule a meeting with John?”
“I wonder what the capital of Nepal is.” “I need a meeting with John.”
“Tell me the capital of Nepal.” “Schedule a meeting with John.”
Thus, effective language processing for Virtual Agents must determine what the user is trying to accomplish overall, not just the form of what is said. The agent must consider how each interaction fits in with the user’s overall intent. It is not just about language, per se; broader knowledge about the user and the world must be brought into play.
But most user needs cannot be achieved through simple “one-off” interactions, and demand extended conversation. Virtual Agents need the ability to engage effectively and naturally.
This ability involves complex inferences about the real world, supported by natural language understanding (NLU) and developments in artificial intelligence. Delivering it is the industry’s key challenge over the medium term.
First, consider what these agents already do well. A question which is well-formed grammatically, pronounced clearly, and stated in terms that exactly match information in the database available to the Virtual Agent can be answered immediately and well. Answering questions like these is a relatively straightforward process:
Siri, where’s the nearest Apple store?
Alexa, what’s the price of The Divergent Series: Insurgent on instant video?
The words in these questions are not entirely unambiguous, but the phrases clarify meaning. “Apple” may be an electronics firm, a fruit, or a music company, but “Apple store” has one, more commonly used meaning. “The Divergent Series” refers to many books and films (as well as a mathematical concept), in several formats, but “The Divergent Series Insurgent instant video” narrows the possibilities to just one thing.
Questions become more difficult to answer when stated in a less direct manner, put in words that are not in the answer database, or when the words used have more than one possible meaning:
Siri, where can I get my screen fixed?
Alexa, how much for the new Divergent?
The user’s intent may be the same, but the agent’s task is more complex. The word “screen” has many possible meanings: computer, phone, TV, sun, mosquito, preliminary interview.
Asking where to get it fixed narrows the possibilities, and some of the possible meanings are used much more frequently than others, but not enough to confidently narrow the question to one clear meaning. And some of the necessary elements to identify the right answer are missing. If the intent was to fix a phone screen, for example, it may be necessary to know the brand of phone and whether it is under warranty.
Understanding the phrase “the new Divergent” requires not just knowing that it refers to a movie (or book) series rather than a dangling adjective, but also knowledge about what is “new” — knowing that the last Divergent book was published long before the most recent movie is necessary to know that the user means the movie.
VA technologies are starting to go beyond statistical speech-to-text and information-retrieval methods. Contextually meaningful responses will require VAs with articulated knowledge about the possible meanings of words and phrases, connected with each other and with the real-world context. They must continue to improve their language models, but also start to gather information to create richer situational awareness, understand both individual and general context, and become attuned to the intent of the questioner.
“Every single conversation is different. Every single context is different… we want to understand your context…I think computing is poised to evolve beyond just phones. It will be in the context of a user’s daily life. It will be on their phones, devices they wear, in their cars, and even in their living rooms.”
—Sundar Pichai, CEO, Google, Google I/O Keynote, 2016
The three key technology dimensions for VA development are question understanding, interaction, and response generation.
Each of these dimensions defines a spectrum of sophistication, and together they define a three-dimensional space for VA technology, depicted in the figure below. Question understanding ranges from basic speech-to-text capability through contextual understanding of the meaning of the question. Interaction begins with simple one-off question answering, progressing toward more sophisticated modes including phone-tree-like verbal menus, conversational questioning of the user to clarify information, and naturalistic conversation that uses full contextual information. Finally, responses generated should be relevant to the question, with improvements leading to more fully accurate, useful, and natural responses.
VA technologies are starting to reach more sophisticated levels of processing in all three areas. There are three key technological barriers that need to be overcome to achieve qualitative improvements:
Using the full context to understand the meaning of user queries. This includes the conversational context as well as situational factors such as the month, the location, the device, and the time of day, plus knowledge about the specific user, including age, gender, and topical search history. VAs will need to resolve ambiguity, either by referencing this context and background knowledge, or by actively seeking user confirmation.
Ensuring that responses are contextually relevant to the intent. This will require understanding the user’s actual need, not just the language or grammar used, and evaluating different potential responses that may satisfy that need.
For VAs to be widely accepted as intelligent, they will need to understand how to structure a conversation that makes users feel comfortable. Several aspects of extended interaction are critical:
Follow-up questions as part of disambiguation and clarification of the user’s needs.
Driving complex multi-turn actions, such as booking travel across dates, times, destinations, flights, and hotels.
Detecting and repairing a temporarily derailed conversation. VAs must detect when the user and agent are not on the same page, or when the user wants to “edit” something they already said. The VA will need to know enough about the expected structure of the conversation, the troubles it may encounter and how to recognize them, and how to initiate a conversational repair, even if the repair is just to say “I’m sorry, we seem to have gotten off-track. Shall we try again from the beginning?” Detecting such errors can greatly improve the naturalness of conversation with a VA.
Let’s look at each of these areas in more detail and consider how advances in structured knowledge can significantly improve user experiences.
The first phase in any question-answering system is processing the input query into some representation of its “meaning,” that is, a form that can be used to find an answer.
This meaning consists of at least two components: the question “type” and the question “topic.” The type informs the VA what sort of an answer to seek; for example: “How many people live in the UK?” is a quantitative question, “What is a gene?” is a definition question, and “How do I knit a sweater?” is a procedural question. Knowing the type of a question helps to narrow the field of acceptable answers, and enables the system to structure the response appropriately.
The topic represents what the question is about. A topic representation may be as simple as just the set of topical words in the question, but more sophisticated representations can greatly improve results. For example, the representation of “What pharmaceutical companies are using Oracle?” should refer to knowledge that pharmaceuticals, medicines, and drugs (in one sense) are the same thing, and that Oracle is a technology company.
Even common, highly commercial questions can present topical ambiguity. Understanding the set of likely topics allows the system to prioritize certain interpretations over others, based on additional metadata or qualifying interactions.
Determining the type of a question requires knowing what the different types of questions are. This requires the classification of question types, each with associated constraints on the kinds of answers acceptable for that type of question.
There are not yet any widely accepted taxonomies of question types. Most systems use something ad hoc, often based on one of two long-standing taxonomies, neither developed for purposes of question answering: Graesser’s taxonomy of questions in tutoring sessions¹ and Bloom’s taxonomy of educational objectives².
A good question type taxonomy will be fine-grained enough to place strong constraints on the possible answers to a question, improving relevance and accuracy, and will also have clear and accurate criteria for determining the type of a given question. In some cases, the type of question is pretty clear, as in “Who shot JFK?” where the word “who” tells us that we want to identify a person; other cases are more difficult. The word “what,” for example, says little about the type of answer: “What is the capital of Nepal?” seeks a simple fact, “What is a gene?” a definition, and “What caused the housing bubble?” an explanation.
The taxonomy must not simply classify questions by their grammatical types, but rather by what the questioner’s intent is — what they are trying to accomplish by asking the question. This is what constrains the form of relevant responses.
In a more sophisticated Virtual Agent system, questions and requests must be classified into “user actions.” Developing a taxonomy of such actions together with an accurate classification method will be a vital ingredient to VA development. This will require analysis of actual questions and answers in context. Machine learning will be a key component of classifying questions to the right type in the taxonomy. However, fully automated techniques do not yet give high enough accuracy for deployment, so some level of expert human involvement will provide cognitive context.
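As a rough sketch of how machine learning might classify questions into a type taxonomy (not a description of any particular vendor’s system), a simple text classifier over word n-grams can be trained on hand-labeled examples; the type labels and training set below are illustrative only:

```python
# A minimal sketch of question-type classification, assuming a small
# hand-labeled training set; type labels and examples are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_questions = [
    "How many people live in the UK?",
    "What is a gene?",
    "How do I knit a sweater?",
    "Who shot JFK?",
]
train_types = ["quantitative", "definition", "procedural", "person"]

# Word n-grams capture cue phrases such as "how many" and "how do I".
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
model.fit(train_questions, train_types)

print(model.predict(["How many pizza shops are in Chicago?"]))
```

In production, such a classifier would be trained on a much larger annotated corpus, with expert humans reviewing low-confidence predictions.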
In subsequent work, systems will need to expand the range of utterance formats that they can accept and generate. VAs must be able to handle directives, declaratives, and even multi-unit forms such as stories and tellings, as well as the wide range of actions these accomplish, such as reporting, evaluating, revising, complaining, and so on.
In addition, it is essential to determine the topic of the question. This is now often done using the collection of keywords in the question (a bag of words), perhaps expanded through a lexical resource such as WordNet, or mathematically modeled using statistical models of word meaning, such as vector-space models like latent semantic indexing or more sophisticated word-embedding models like word2vec.
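As a hedged illustration of one approach named above, WordNet synonyms can widen a bag of words so that “pharmaceutical” also matches related terms; the stopword list is a toy placeholder:

```python
# A sketch of bag-of-words topic extraction expanded via WordNet synonyms;
# requires NLTK with the 'wordnet' corpus downloaded.
from nltk.corpus import wordnet as wn

STOPWORDS = {"what", "are", "is", "the", "using", "a", "of"}

def topic_terms(question):
    words = [w.strip("?.,").lower() for w in question.split()]
    topical = [w for w in words if w not in STOPWORDS]
    expanded = set(topical)
    for word in topical:
        for synset in wn.synsets(word):
            expanded.update(l.name().replace("_", " ") for l in synset.lemmas())
    return expanded

print(topic_terms("What pharmaceutical companies are using Oracle?"))
```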
Syntactic analysis can help to create a more fine-grained understanding of the topic. The bag-of-words approach applied to “Can dogs be allergic to dust?” would look like a question about allergies to dogs and dust as much as a question about dog allergies. Syntax is needed to disambiguate.
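A dependency parse makes that structure explicit. A brief sketch, assuming spaCy and its small English model are installed, prints each token’s grammatical relation so the system can see that “dogs” is the subject of being “allergic,” with “dust” attached under the preposition “to”:

```python
# A sketch of syntactic disambiguation via dependency parsing; assumes
# spaCy and its en_core_web_sm model are installed.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Can dogs be allergic to dust?")

# Print each token with its dependency relation and syntactic head, which
# distinguishes "dogs ... allergic to dust" from an allergy *to* dogs.
for token in doc:
    print(f"{token.text:10} {token.dep_:8} <- {token.head.text}")
```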
Another key notion in question topic is that of focus. Consider the question “Who sells coffee that pays good wages?” The question could mean either “What coffee shops pay good wages?” or “Where can I buy fair trade coffee?” The system must determine whether the focus of the question is the seller or the coffee. However, even given that information, background knowledge is necessary to frame the question properly. If the focus is the seller, what are the other alternatives? The system must recognize that someone interested in “wages” is looking for a job, and that certain types of sellers, like brick-and-mortar retailers, are more likely to have open positions than large eCommerce retailers. This requires a taxonomy of relevant world knowledge about objects and attributes.
Parenthetically, it should be noted that improvements in speech recognition, intonation, and prosody may help resolve some of the ambiguities associated with determining focus.
“You know, I think that most people underestimate the difference between 95% accurate speech recognition, which is maybe where we are, and 99%. 99% is not an incremental 4% improvement — it’s a game changer. It’s the difference between you barely using it — maybe what you do now — versus you using it all the time.”
—Andrew Ng, Chief Scientist, Baidu, Bloomberg West, May 23, 2016
One of the difficulties in question understanding, as in natural language processing in general, is ambiguity, which arises at all levels of processing. Speech-to-text can be foiled by homophones; often knowing the general topic of the question is the only way to disambiguate; consider:
I need new sneakers — where can I get a pair/pear?
I’d like some fruit — where can I get a pair/pear?
Knowledge of typical information needs or specific search queries can even help with this:
How do I clean a flu/flue/flew?
In this case, a request for cleaning a flu or cleaning a flew would be quite rare compared to cleaning a flue. Exploring multiple homophonic words from speech-to-text and then selecting among them based on contextual awareness can increase accuracy.
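A minimal sketch of that selection step, with invented phrase counts standing in for real query-log statistics:

```python
# Choose among homophone candidates from speech-to-text by scoring each
# against context-word co-occurrence counts; the counts here are
# hypothetical placeholders for real query statistics.
HOMOPHONES = {"flu": ["flu", "flue", "flew"]}
PHRASE_COUNTS = {
    ("clean", "flue"): 5400,
    ("clean", "flu"): 80,
    ("clean", "flew"): 2,
}

def pick_homophone(context_word, heard_word):
    candidates = HOMOPHONES.get(heard_word, [heard_word])
    return max(candidates, key=lambda c: PHRASE_COUNTS.get((context_word, c), 0))

print(pick_homophone("clean", "flu"))  # -> "flue"
```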
And even when the specific words are recognized correctly by speech-to-text, knowledge of topic categories is critical to help with disambiguation; consider:
I’d like to gamble — where is the Taj Mahal? (Atlantic City)
I’d like to travel — where is the Taj Mahal? (Agra, India)
What is needed is a taxonomy or ontology of possible topic categories, together with statistics of how often users request information on particular topics and combinations of topics, with certain words and phrases, in given contexts.
To illustrate this, suppose a user asks for a “nearby jaguar park.” Does the user mean a place to park a car or to view wild cats? The VA can disambiguate the sense of this request by matching the taxonomic categories for the different words in the query — “jaguar” and “park” — with each other and with commonly used concepts. Taxonomic concept similarity³ (scored on a scale of 0.0 to 1.0) quantifies this matching: senses of “park” that sit near “jaguar” the animal, such as wildlife parks, score far higher than parking facilities.
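As a hedged stand-in for a commercial taxonomic similarity measure, WordNet’s Wu-Palmer score (also on a 0.0–1.0 scale) can be computed between the animal sense of “jaguar” and competing senses of “park”:

```python
# A sketch of taxonomic concept similarity using WordNet's Wu-Palmer
# measure as a stand-in for a proprietary taxonomy; requires NLTK with
# the 'wordnet' corpus downloaded. Scores will differ from those a
# commercial taxonomy would produce.
from nltk.corpus import wordnet as wn

jaguar_cat = wn.synset("jaguar.n.01")        # the wild cat
park_nature = wn.synset("park.n.01")         # land preserved in its natural state
park_lot = wn.synset("parking_lot.n.01")     # a place to park a car

for sense in (park_nature, park_lot):
    print(sense.name(), jaguar_cat.wup_similarity(sense))
```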
This shows how a conceptual taxonomy can bring clarity to a query which is ambiguous on a purely lexical level. The taxonomic structure also permits consideration of other concepts along hierarchical pathways, enabling the prediction that “nearby jaguar parks” are also similar to “Wolf Sanctuaries,” a sub-type of Wildlife Sanctuaries, as well as super-types like “Wildlife Facilities,” or “Animal Disease Research Institutes.”
Such information is quite valuable to question answering systems and Virtual Agents. Even if questions are to be primarily processed by humans, as currently in Facebook’s M service (working in tandem with its text understanding engine, DeepText), comprehending the topic categories of a question allows it to be routed to specialists in the relevant topics, and helps people deal with the question appropriately. This will become a more common practice as more service and retail organizations adopt chatbots as a useful type of interface. Using this information, the bot can better establish deep context and successfully interface with the most relevant third-party service or vendor on the web, instead of relying on one or two of the most generic sources. This means that although Wikipedia may return relevant answers on the ‘Cubs 2016 starting rotation’, they may not be as useful as the answers that could be supplied by ESPN or the MLB.
“If we can understand text, we can help people connect and share in a lot of different ways.”
— Hussein Mehanna, Director of Engineering, Facebook, Interview with Mike Murphy, Quartz News, June 1, 2016
As noted, knowing the context of a question is essential to properly understanding it, most easily seen in cases of ambiguity (“Where is the Taj Mahal?”) or under-specification (“How do I get there?”). Disambiguation is currently based on general statistics of the frequency of different kinds of questions. Systems may assume that “Taj Mahal” means the mausoleum rather than the casino, regardless of what the user actually wants. To do better, systems will need to understand information about the situation, including the individual user’s location and recent activity, while tracking the topics used in conversation and the interests of similar users.
An explicated taxonomy of possible topics can enable such tracking, as illustrated in the following figure. With a classification of the conversational topic, the same question can be interpreted correctly in very different ways, depending on the context.
Figure: Different interpretations of “Where Can I Get a Triumph?” based on knowing the contextual topic category.
Similarly, topic tracking can help with determining question focus. If someone just asked for job applications or résumé templates, then the focus of “Who sells coffee that pays good wages?” is probably the seller-as-employer, based on the contextual category of “Jobs.”
The same idea applies to modeling user preferences. The topical categories mentioned by a user can form a profile of the user’s interests. Systems would need to combine information about a user’s general interests with the specific context of a particular conversation, but having a taxonomic representation is necessary.
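A toy sketch of such tracking, combining a long-term interest profile with the topics of the current conversation (category names invented):

```python
# Track topic categories across turns; weight the current conversation
# above the long-term profile when inferring the active context.
from collections import Counter

class TopicTracker:
    def __init__(self):
        self.interests = Counter()   # long-term user profile
        self.session = Counter()     # current conversation only

    def observe(self, categories):
        self.interests.update(categories)
        self.session.update(categories)

    def likely_context(self, top_n=3):
        combined = Counter(self.interests)
        for category, count in self.session.items():
            combined[category] += 2 * count   # recency bonus
        return combined.most_common(top_n)

tracker = TopicTracker()
tracker.observe(["Jobs", "Resume Templates"])
tracker.observe(["Jobs"])
print(tracker.likely_context())   # "Jobs" dominates the active context
```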
Users demand VAs return accurate answers to their questions, relevant to their needs, and useful for them in context. Current systems do this reasonably well for simple common questions such as “Where is the nearest gas station?” or “When did Abraham Lincoln die?” but fail for more complex questions such as “Does New York or Chicago have more pizza shops?”, “How does someone contract Leukemia?”, “What does the United Nations do?”, or “What caused the housing bubble?”
Long-term improvements will require sophisticated language understanding methods that extract detailed representations of the meaning of documents as well as of questions. Large-scale structured (labeled) knowledge is a vital ingredient for achieving this, and at the same time it will immensely improve response accuracy, relevance, and usefulness at all stages of the process, both by improving overall question understanding and by injecting useful information into the response construction process in various ways.
At the input level, a topic taxonomy can be used to index known questions and their answers, which will enable VAs to reformulate an original question in different ways and make it easier to find a correct answer. For example, if the question “Where can I find black high-tops?” is already in the VA’s knowledge database with a high-quality answer, connected to a given topic category (for example, HIGH-TOP SNEAKERS), other similar questions that can be classified to the same category could be matched with that known question, enabling them to be answered directly as well.
So questions such as “Who sells black high-top sneakers?”, “Where can I buy black hi-tops?”, and “I’m looking for black high-top shoes” would all be mapped to the same high-quality answer. The Virtual Agent could answer such questions by searching a database to look up the question and retrieve the corresponding answer. This would significantly expand the range of questions that can be answered correctly by a Virtual Agent, improving its retrieval rate.
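A minimal sketch of that lookup, with an invented category index and a stand-in classifier:

```python
# Map a new question to the canonical answer of a known question by
# classifying both into the same topic category; names are illustrative.
known_answers = {
    "HIGH-TOP SNEAKERS": "Black high-tops are available at these retailers: ...",
}

def fallback_search(question):
    return f"Searching the web for: {question}"

def answer(question, classify):
    category = classify(question)
    return known_answers.get(category) or fallback_search(question)

# A toy classifier standing in for the real taxonomy-based one.
demo_classify = lambda q: ("HIGH-TOP SNEAKERS"
                           if "hi-top" in q.lower() or "high-top" in q.lower()
                           else "UNKNOWN")
print(answer("Where can I buy black hi-tops?", demo_classify))
```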
Adding the topic category or categories to an input query can improve retrieval of likely relevant documents, by limiting attention to those collections most likely to be relevant and by increasing the scores of likely relevant documents. Thus, for the user asking “Where can I get a Triumph?” from the example above with their preceding interactions classified to the PET FOOD category, a VA would retrieve information from web services (including APIs) it knows to offer pet supplies, or look specifically for stores selling “Triumph” whose product descriptions are classified to PET FOOD.
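A sketch of that category-constrained routing, with invented source names:

```python
# Route a query only to sources indexed under the active topic category,
# rather than to one or two generic search endpoints.
SOURCES = {
    "PET FOOD": ["petstore-api", "grocer-api"],
    "MOTORCYCLES": ["dealer-api"],
}

def route(query, active_category):
    targets = SOURCES.get(active_category, ["general-search"])
    return [(target, query) for target in targets]

print(route("Where can I get a Triumph?", "PET FOOD"))
```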
Similarly, answer extraction can be improved by picking out those phrases that are most relevant to the topic categories as well as to the question itself, as below:
For example, if the user asks a VA “What canyons are ridden in the Tour de France?”, the system can return a more useful answer by extracting snippets from its knowledge base about the brand of Canyon bicycles, one of the highly confident topic classifications of the input query, rather than geographical descriptions of the route the Tour de France follows.
Finally, knowing the topic, as well as the type, of the question can be used to filter and rank potential answers by how well they match the kind of answer and the topics that the user desires.
For example, if the user asks a VA “Where can I see Quantico?”, it is crucial to understand that Quantico can refer to the topic of geographic places or to the topic of television shows. The question type “where” often refers to a physical location; however, the addition of the action “to see” (without any conflicting signals like “where can I go to see”) weights the question type away from geography, and the VA should return information about the ABC television show with more confidence than the place in Virginia.
Besides helping to provide more accurate answers, using taxonomies to model context can improve relevance and usefulness beyond what is normally seen today. Currently, relevance and usefulness can be improved by using simple external context, such as location; “Find me a good pizza shop” can use GPS to find a nearby shop as opposed to one in another neighborhood or city.
However, by modeling the user’s interests expressed in the current conversation, remembering the user’s general interests, and structuring these within a consistent topic taxonomy, VA systems will be able to better predict what answers to a question will be most relevant and useful to a user in a specific situation.
Consider “Where can I get a reasonably priced attractive suit?” To give a truly relevant answer, a VA must know the user’s gender and age, whether a business suit or swimsuit is meant, whether the user is looking for an online or brick-and-mortar purchase, and what “reasonably priced” means to the user. None of these can be answered by analysis of just the input question, no matter how sophisticated. However, if the system tracks and classifies that the user was previously talking about swimming, beaches, online shopping, or specific upscale or downscale brands, a more relevant and useful answer can be constructed.
Even the best modeling of conversational context will not resolve every ambiguity and lack of specificity, however. At times, VA systems will need to reference multiple contextual information sources, as well as general background knowledge. If the user asks for “Jets scores” and the VA knows the user is located in Winnipeg, Manitoba, their intent is probably hockey information. If the user asks for “Jets scores” and is in New York, their intent is probably football information—unless it’s between February and June, when the NHL season is active and the NFL season is completed.
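A toy sketch of combining location and calendar context for exactly this case (the city logic is simplified and hypothetical):

```python
# Disambiguate "Jets scores" using the user's city plus the sports
# calendar; the February-June window approximates the NFL offseason.
from datetime import date

def jets_intent(city, today=None):
    today = today or date.today()
    nfl_offseason = 2 <= today.month <= 6
    if city == "Winnipeg":
        return "hockey"          # Winnipeg Jets (NHL)
    if city == "New York":
        return "hockey" if nfl_offseason else "football"
    return "football"            # default guess outside known cities

print(jets_intent("New York", date(2016, 3, 1)))   # -> "hockey"
print(jets_intent("New York", date(2016, 10, 1)))  # -> "football"
```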
At other times, VAs need to interact with their users and ask clarifying questions. Consider the previous example with ambiguous focus, “Who sells coffee that pays good wages?” A simple follow-up question might be “Would you like to see nearby barista job listings?”, but this will be confusing for a user who is not thinking about the system’s need to determine the question’s focus. A better follow-up would be “Are you interested in buying some fair trade coffee, or are you interested in jobs in coffee shops with high employee satisfaction?”
To get to that naturalistic response, the VA would need to know that “good wages” is a conceptual attribute valued by both job-seekers and shoppers looking for certain products; the VA would need to know that “good wages” on a career level is linked to employee satisfaction, but on a product level it’s linked with fair international trade. The system would thus need to have a linked taxonomy of concepts and be able to recognize when terms in a question refer to those concepts.
“We want, over the next five or ten years, to take on a road map to try to understand everything in the world semantically and map everything out. These are the big themes for us and what we are going to try and do over the next five or ten years.”
—Mark Zuckerberg, CEO, Facebook, TechCrunch Keynote, September 11, 2013
The function of structured knowledge, disambiguation, and follow-up goes beyond simple lexical clarifications, and can help at all levels of the process, including speech-to-text. Because successful taxonomies operate with controlled vocabularies or positive/negative business rules, deploying these rules would improve speech-to-text accuracy. Consider the question:
Where can I get pens?
In many accents (particularly in the southern United States), “pen” and “pin” sound very similar. Is the system sure which one was said? If not, it would use its taxonomy with knowledge of grammar to validate the concept.
A hypothetical flow of the VA may be:
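One hedged sketch of such a flow: the speech engine returns homophone candidates with acoustic scores, and taxonomy categories plus recent conversational topics break near-ties:

```python
# Rank homophone candidates by acoustic score, then fall back to taxonomy
# context; the candidates, scores, and categories are illustrative.
CATEGORY = {"pens": "Writing Supplies", "pins": "Sewing Supplies"}

def resolve(candidates, recent_topics, margin=0.1):
    (word1, score1), (word2, score2) = candidates[:2]
    if score1 - score2 >= margin:
        return word1, True                 # acoustics alone are decisive
    in_context = [w for w, _ in candidates
                  if CATEGORY.get(w) in recent_topics]
    if len(in_context) == 1:
        return in_context[0], True         # taxonomy context disambiguates
    return word1, False                    # unsure: ask a clarifying question

asr_candidates = [("pens", 0.51), ("pins", 0.49)]   # hypothetical ASR output
print(resolve(asr_candidates, {"Sewing Supplies"})) # -> ('pins', True)
print(resolve(asr_candidates, set()))               # -> ('pens', False)
```

When the function returns False, the VA falls through to a clarification question like the one below.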
If these considerations all together do not suffice to disambiguate, the VA would use its topical understanding to pose a clarification question:
Did you want something for writing or sewing? Or for something else?
The process need not end there. If the request is for ball-point pens, the system has identified a broad level of meaning in the taxonomy. It can go on to ask questions that identify more specific details of the request, such as:
What brand would you like?
What color ink would you like?
In order to ask these questions, the VA must have prior knowledge of options, and use it to organize its thinking. Or, even more desirably, some clarifying questions can be skipped if there is sufficient information in the previous conversation to draw the answers from, and reconsider previously ambiguous statements through the contextual lens of the now-established goal.
Finally, the VA should remember key aspects of the disambiguating conversation, when they imply important background knowledge needed to understand future interactions. For example, if the user responded to “Did you want something for writing or sewing?” with “I don’t sew!” the system should remember that sewing is much less likely to be relevant for that particular user.
“I’m super excited about artificial intelligence, but we like to say that there are probably a dozen or half a dozen miracles needed to really build these out to be truly intelligent things (bots)... getting to that future where we have that truly intelligent thing we can have a conversation with I think is years away.”
—Mike Schroepfer, CTO, Facebook, Bloomberg West TV, April 14, 2016
Virtual Agents need to be able to interact with their users, and not just provide one-off answers. To converse effectively, VAs must produce both questions and responses that are relevant and natural; to do this, they must understand the broad sweep of the full conversation, not just the current user request.
The VA must recognize how earlier parts of a conversation inform interpretation of later parts of the same conversation. The VA must distinguish which parts of a conversation cohere as part of a larger unit, and which parts are discrete, or one-off, question-answer pairs.
The VA should possess generic knowledge about how conversations are constructed. Many kinds of conversations are built on the skeleton of a script (Schank & Abelson 1977). Scripts establish a rough sequence of actions and information exchange in a particular context. By identifying what scripts are relevant to a conversation, a VA can better constrain the possible interpretations of user utterances and generate more relevant and helpful utterances of its own.
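As a small sketch of the idea (not Schank and Abelson’s formalism), a script can be encoded as an ordered list of slots with prompts, so the VA always knows where the conversation stands:

```python
# A toy flight-booking script: ordered slots the conversation must fill;
# slot names and prompts are illustrative.
BOOK_FLIGHT_SCRIPT = [
    ("origin", "Where are you flying from?"),
    ("destination", "Where are you flying to?"),
    ("date", "What day would you like to travel?"),
    ("time_preference", "Morning, afternoon, or evening?"),
]

def next_prompt(filled_slots):
    for slot, prompt in BOOK_FLIGHT_SCRIPT:
        if slot not in filled_slots:
            return prompt
    return "Shall I search for flights?"

print(next_prompt({"origin": "Chicago"}))  # asks for the destination next
```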
A system capable of conversational interaction will also require a system for tracking where the VA and the user are in an unfolding project, and the ability to chunk the sub-units or elements out of which it is built.
To accomplish this, VAs can draw on the intersection of three basic forms of social organization used to manage complex courses of action:
Speakers use basic grammatical forms to compose questions that initiate the project and distinct units of it (“How can I…”, “When is…”). By contrast, they use reduced, or parasitic, grammatical forms with “and prefacing” to pose questions that continue in-progress units (“And how much is that?”). They also distinguish between initiating and responsive actions, thereby enabling the parties to move between leading and following.
Speakers draw on alternative practices for managing initial and subsequent references to places, persons, dates, times, and things (Schegloff, 1996). The VA may present the user with a variety of specific flight departure times as the initial reference, and the user may respond that they are interested in “the early morning one,” indicating a subsequent reference, rather than a request for an alternative time. VAs can use these references as a method for indicating whether an utterance continues an in-progress sequence or initiates a new one.
Studies suggest that utterances initiating new sequences tend to be louder than the preceding talk and have distinct prosodic patterns (Goldberg, 2004). By contrast, talk within a sequence tends to match the volume of preceding turns.
Another key is understanding users’ goals and the typical plans they use to achieve them. This will enable systems to better predict what users are likely to say, improving comprehension and appropriate responses. Such goals are often not expressed explicitly, so VAs will need to identify implied goals — for example, if a user remarks that a flight has “a lot of transfers,” they are implying the goal of taking a non-stop flight.
This sort of understanding of the user’s goals and plans in a conversation is essential.
“In China there are more bots put on WeChat every day than there are websites put on the internet. Said another way, WeChat is the internet in China.”
—Ted Livingston, CEO, Kik Messenger, TechCrunch Disrupt, May 11, 2016
“Some 31% of Chinese WeChat users buy from retailers.”
—Mary Meeker, Partner, KPCB, Code Conference, June 1, 2016
Messenger apps like Line, Viber, QQ, and WeChat offer chatbots. Kik recently debuted bots from the Microsoft Bot Framework. It would seem inevitable that Apple’s iMessage and Google’s just-announced messenger app Allo (May 18, 2016) will eventually be open to bots; indeed, Apple announced at WWDC (June 13, 2016) that it will give developers access to its Messages app (as well as Siri). According to Ted Livingston, CEO of Kik Messenger, 40% of all U.S. teens use Kik Messenger each month. Internet web chat is ranked alongside social media as the most popular contact channel by Generation Y (born 1981–1999); in other words, more popular than email or the phone. According to David Pierce, senior staff writer at Wired magazine (June 14, 2016), there are more people using messaging services than there are on social networks, and messaging is the “interface of the future.” The figure below shows total monthly active users on selected social networks and messengers, 2011–2015 (2016 Internet Trends report).
Figure: Monthly active users on select social networks and messengers, global, 2011–2015. Messaging continues to grow rapidly; the leaders are WhatsApp, Facebook Messenger, and WeChat.
“Chatbots will fundamentally revolutionize how computing is experienced by everybody... so pretty much everyone today who is building applications, whether they be mobile apps or desktop apps or websites, will build bots as the new interface.”
— Satya Nadella, Microsoft CEO, Worldwide Partner Conference, July 11, 2016
Consider a company developing a chatbot for ecommerce sales and customer support. Whether deployed on an existing framework like Skype, Kik, Telegram, Facebook Messenger, or Slack, or in the company’s own app or site, the bot should be able to help users find the products or services they want, but also know when it is appropriate to recommend or cross-sell, by understanding the user’s goals and how they are or are not satisfied at each stage of the conversation. As contextual technologies evolve to understand user preferences, chatbots in essence become intelligent, i.e., “smartbots.” This is already being seen in niche verticals such as health and finance.
Developing all of the knowledge bases and algorithms needed to attain naturalistic conversation is a significant technical challenge, though much research progress has been made. The industry is already gathering and analyzing large amounts of conversational data to extract patterns that can be used to construct script libraries and goal/plan representations. Longer range progress will depend on fundamental advances in systems that can represent and reason about discourse, goals, and plans.
The kinds of taxonomies that will be useful will include background knowledge and topics, as well as taxonomies of question/action types, conversation turns, errors, and disambiguation/repair strategies. Improvements in the input data available, both from better analysis of prosodic patterns and from better fundamental natural language processing, will also improve results markedly.
As we have seen, a vital factor for Virtual Agent technology is creating a variety of structured knowledge, including question types, question topics, local user interests, real world context, types of speech acts, conversational structures, user goals, and so on.
The backbone of any structured knowledge representation is a taxonomy — specifying a set of concepts in a hierarchical organization as the fundamental terms of discourse (Davis, Shrobe, and Szolovits 1993). Further structure and relationships can be represented in a more complete ontology, which can enable more sophisticated inference. But the core ingredient is taxonomy; without a solid taxonomy, no other aspects of the structured knowledge will be as useful, or easily scalable.
What should we seek in a taxonomy? Seth Grimes of the Alta Plana Corporation has set forth criteria for good taxonomies for text analytics, and roughly the same criteria, adapted from Grimes (2014), apply for taxonomies underlying Virtual Agents.
It is common to find VAs relying on taxonomies or ontologies from Wikipedia and DBpedia, Freebase (now Wikidata), WordNet, Schema.org, and potentially other specialized sources.
However, the most effective taxonomies are those which provide consistent and common operation across data sources, input types, styles of speech, and linguistic practices.
A taxonomy that can universally structure and enrich the direct interaction from a user, as well as the other contextual cues that might be available to the VA — app usage, social network engagement, or content consumption, for example — will provide a richer and more consistent experience.
These abilities are also tested by canonical versus colloquial language use: for example, understanding that “RG3,” “RGIII,” and “rg three” all refer to NFL quarterback Robert Griffin III, and do so with higher confidence than they refer to a model of Lamborghini or a move in chess.
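A sketch of alias normalization against a taxonomy node, with an invented alias table and confidences:

```python
# Resolve colloquial aliases to a canonical entity with a confidence
# score; the entries are illustrative placeholders.
ALIASES = {
    "rg3": ("Robert Griffin III", 0.92),
    "rgiii": ("Robert Griffin III", 0.95),
    "rg three": ("Robert Griffin III", 0.90),
}

def canonicalize(term):
    return ALIASES.get(term.lower().strip(), (term, 0.0))

print(canonicalize("RG3"))        # -> ('Robert Griffin III', 0.92)
print(canonicalize("Lambo RG"))   # unknown alias passes through unchanged
```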
A taxonomy that offers strong links between taxonomic nodes also provides a distinct advantage. These links may be hierarchical, where nodes are nested in clear supertype/subtype relationships, or ontological, where nodes are joined by links that represent a variety of semantic relations supporting various kinds of inference. With linked data, the system can combine a series of weak signals into evidence for another, stronger signal, giving the system better understanding and the user an improved experience.
Especially for personalized experiences, a taxonomy that can be easily or automatically expanded is extremely useful. A system that can be individually tailored, adding to or reorganizing its nodes and connections based on the habits of the user, will build more user trust in the VA. The user will feel encouraged to interact with the VA more often and in more scenarios, rather than contorting or compromising their natural habits to fit within the rigid boundaries of what the VA originally offered.
A taxonomy-powered agent will understand products, services, and consumer needs at multiple levels, ranging from abstract to specific. A taxonomy-powered agent will understand brands, product classes, products, components, and attributes. A taxonomy-powered agent can bridge the divide between the user’s intent and the vendor that will satisfy the user. Through contextual, semantically structured knowledge, a VA can support progressively difficult queries, such as a request for a “yeti arctic hi top.”
Without a clear understanding of context, a “yeti” could have been understood to be a mythical creature, a crustacean, an airline, a car brand, a bicycle brand, a microphone, or a cooler. But taxonomy-powered agents understand that “arctic hi top” refers to the Arctic Zone Hi-Top lunch box; and by classifying vast numbers of search and social queries, they understand lunch boxes are more frequently mentioned in relation to Yeti Coolers than any of the other possible interpretations.
Together with an understanding of conversational structure and typical users’ goals, the system can successfully complete the exchange to the user’s satisfaction.
Establishing accurate context, and aligning the intent of the user with the proper vendor of the desired product or service, allows the taxonomy-powered agent to leverage specialized services and other structured knowledge across the web. A taxonomy-powered agent can more easily translate and standardize the highly varied language of user inputs to match the rigid expectations of existing product and service vendors.
“The opportunity here and the excitement should be around these digital assistants... this is a new platform play, it is a race to a single interface.”
— Gary Morgenthaler, Partner, Morgenthaler Ventures
(seed investor in Siri and Viv Labs), Bloomberg West TV, June 13, 2016
The analysis in this paper has one fundamental conclusion with respect to the main technological challenge that the Virtual Agent market confronts over the next decade or so: The key is structured knowledge and inference, producing rich context.
A Virtual Agent needs to represent and effectively use knowledge about the world, about its users, about typical tasks, about conversational structure, about conversational errors and repairs, and so forth. While machine learning and data analysis can help and will be essential, high-quality knowledge bases curated by human experts are well suited to improving VA technology to create more human-like exchanges.
Philosophers of language, cognitive scientists, and social psychologists agree that establishing the relevant “context” for understanding human utterances and actions poses significant analytical problems. And yet human agents manage this task routinely.
Virtual Agents need to perform this task at a high, if not human, level to be accepted as natural conversational partners and thus to realize the promise of conversational interfaces. The key will be to attain such levels of performance without having to achieve full human-level intelligence, by finding shortcuts for modeling the most relevant aspects of conversational context.
The present growing market of chat and messenger bots will encourage more companies and developers to participate in Virtual Agent-like exchanges. These Agents will start out with very narrow knowledge bases and conversational limits, constrained to only the products or services the organization offers: ordering a bouquet of flowers, or instructing which toppings to put on a pizza. Users deviating from these tightly scripted interactions will meet ungraceful errors, and will be funneled back into the script. But the data collected through these interactions will be valuable, providing any developer a massive testing ground of individuals and how they ask questions, make requests, and forge conversation patterns.
Structured knowledge, in the form of taxonomies and ontologies, will power more accurate, more responsive voice assistants, virtual agents, and conversational user interfaces. Players that are able to incorporate this key technological ingredient effectively into their systems will likely dominate, particularly due to their speed in reaching contextual understanding.
VAs built using these principles will support a wide variety of highly desirable applications, including scalable online shopping support agents; a factual research assistant; a diagnostic agent for health and medical concerns; banking services; a financial planner that can make recommendations based on market analysis as well as individual preferences; or a truly intelligent personal assistant who will remember who you are, what you want, what you do, etc.
In the longer term, Virtual Agents will model their users’ interests, goals, plans, and worlds with greater accuracy and precision, and will use this information to anticipate their needs, understand their requests, and be natural in conversation with them. As they interact with their users more like humans, users will adjust as well, and treat Virtual Agents more as conversational partners.
A fully articulated VA as described above could become the first “infomediary” as predicted by senior McKinsey consultants John Hagel III and Marc Singer in their book Net Worth, published 17 years ago (HBS Press, 1999). The company that first manages to develop this technology to this level will therefore be well-poised to achieve a large share of this enormous, growing market, as it enables users to protect themselves from intrusive advertising messages and to get intuitive, helpful solutions in their daily lives, including task management, research, ecommerce, and more.
by eContext
Modern cognitive computing – the application of adaptive machine learning to diverse real-world challenges – enables an advance from descriptive to predictive to prescriptive analytics. We expect systems to anticipate needs, round up and crunch relevant data, evaluate alternatives, and make targeted recommendations that help us reach our goals, whether they involve choosing a travel route, bringing a product to market, or enabling personalized delivery of precision medicine. We no longer need to be satisfied with static, after-the-fact pictures of what was. Today’s leading-edge solutions predict and suggest actions likely to lead to the outcomes we seek. But while the newer technologies are stunning, they are built on, and will continue to rely on, the power of established, proven classification and analysis methods such as taxonomy. This paper explains why and how.
Technology advances lead to higher expectations, which motivate further innovation: A virtuous cycle.
Machine learning is this decade’s great technical advance. The algorithms discern patterns in source data and generate predictive models. The technology has existed for decades, but new sophistication, in the form of hierarchical deep learning, powered by low-cost, on-demand computing resources and fueled by a robust data economy, now makes machine learning practical for everyday problems. Yet results remain highly reliant on the choice of inputs and algorithms.
Traditional approaches continue to out-perform machine learning for many of the most common tasks, especially classification, that are at the heart of so many business processes and decisions. Leading analytics providers continue to apply traditional, high-precision, taxonomy-based classification, for instance, for text and social analysis needs.
Pattern detection and classification are at the heart of search, social listening, and customer engagement, as well as recommendation, media analysis, and market research. In each domain, the application of human knowledge, captured for instance via taxonomy, helps deliver the most accurate and relevant insights. Outcomes are more favorable when human expertise trains models, tunes them via active learning, and evaluates and interprets the insights produced. Applying the conjoined technologies to model consumer behaviors and interests, and to messaging, video, and voice data, enhances interactions with virtual assistants. The classification advantage is unbounded.
Search
Social Listening
Customer Engagement
Recommendation
Media Analysis
Market Research
There are many approaches that harness data and analytics to meet common business challenges. We define analytics as the systematic application of numerical and statistical methods to derive and deliver quantitative information. The power and complexity of approaches has grown, and will continue to grow, hand-in-hand with business (and personal) needs and expectations.
Needs and expectations have evolved beyond descriptive analytics: a first-generation analytics that is essentially a picture of the What of a situation. The questions, however, are still relevant: What happened? How much, how many, how often, and where?
These are important questions; the insights gained in answering them can help you optimize your business. Yet they merely describe. They don’t explain, and they don’t suggest best courses of action.
Enter predictive analytics, a discipline with two basic forms: forecasting, which projects quantities forward from historical patterns, and classification, which assigns items to the categories they most likely belong to.
Neither variety of predictive analytics is new, but the state of the science is constantly improving, driven by new methods such as deep learning, by the on-demand availability of inexpensive computing resources, by data culture, and by API-enabled application flexibility.
It’s classification that’s our central interest. Consider common questions such as:
| Question | How classification helps |
| --- | --- |
| What are the key points in the Fitbit review posted on Amazon.com, and how did the writer feel about the various product features she mentioned? | Topic, feature, and sentiment extraction, and resolution and disambiguation of identified entities (people, products, places, etc.), are significant classification challenges. |
| Given the items an individual views, can we recommend additional interesting content? | Classification can create “semantic signatures” of content, of single items and of collections. |
| Given the words and phrasing of a customer interaction, can we infer inclination to cancel service or just to seek a discount, or perhaps openness to an extension, upgrade, or add-on? | Classification can assign individuals to persona categories and pattern-match particular interactions to understand intent and to identify deception and fraud. |
But here’s where analytics gets really interesting, in the jump from predictive to prescriptive…
Prescriptive analytics is about the path to a goal:
We know where we’d like to be. Which actions – which decisions – will take us there?
Think of descriptive and predictive analytics as contributing steps. Take your best-fit predictive model and evaluate what-if scenarios to find the set of controllable conditions that promises to land you closest to your goal.
Prescriptive analytics isn’t easy. The ability to execute fast, exhaustively, and accurately is key. Your modeling choices include machine learning and also traditional methods. The first excels at discerning emergent patterns in big data. Traditional methods provide reliability and high precision, especially for the central classification challenge, where taxonomy, particularly when constructed across a variety of data domains, excels. Imagine the advantage that can be gained via a combined approach.
The technology aims to identify, detect, classify, and predict interesting features in source data, both text and structured datasets. The underlying process involves modeling, evaluation, and feedback/reinforcement, the latter making the method “cognitive,” mimicking human learning. The hope is to improve on established methods, achieving greater accuracy, robustness (model coverage and maintainability), and speed-to-production without sacrificing performance or the ability to sustain the effort. (The availability and cost of data science talent is a significant concern.)
Some of the terminology is esoteric – words such as cognitive and reinforcement – but the concepts are relatively straightforward. In supervised machine learning, the software infers general decision rules – a predictive model – from training data. A human analyst annotates features of interest in a training set, choosing labels from a predefined set of types or categories. (Some organizations use crowd-sourcing for this labor-intensive task.) In unsupervised learning, by contrast, the machine makes a best guess as to the categories, grouping cases with similar characteristics. Feedback or other forms of reinforcement confirm or correct the machine’s choices.
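The contrast can be made concrete with a toy example (the data and labels are invented):

```python
# Supervised vs. unsupervised learning on toy text data: one model learns
# from human-annotated labels, the other guesses its own groupings.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great battery life", "terrible battery",
         "fast shipping", "slow shipping"]
labels = ["product", "product", "logistics", "logistics"]

X = TfidfVectorizer().fit_transform(texts)

supervised = LogisticRegression().fit(X, labels)        # uses annotations
unsupervised = KMeans(n_clusters=2, n_init=10).fit(X)   # finds its own groups

print(supervised.predict(X[:1]))   # predicted label for the first text
print(unsupervised.labels_)        # cluster ids the machine guessed
```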
Despite advances, results remain highly reliant on the choice of inputs and algorithms. Model validation to ensure accurate results and reliable performance is an essential step.
Concerns aside, the case for machine learning is clear. The prime motivator is ability to flexibly generate purpose-suited models from data. The ingredients for adoption – low-cost, on-demand computing resources and lots of data – are in place. Steps to put machine learning in production, however, can get quite complicated.
Consider IBM Watson, an example of a cognitive system. Watson feeds a knowledgebase by combining text-sourced data, extracted via text mining, with information from structured data sources. There’s a curation process involved: humans assess, select, and correct acquired knowledge. The system interprets natural-language queries and generates candidate responses. The machine weighs possibilities and offers the answer most likely to respond to the question/query.
What we have is, in essence, contextualized machine intelligence: a system generated by machine learning and context-focused via classification. The results speak for themselves: in 2011, a Watson computing system beat human Jeopardy champions. In 2014, Watson was made available on-demand via IBM’s Bluemix cloud, and more recently specialized versions have been applied to healthcare, smarter cities, and the spectrum of business challenges involving natural language.
Very recently, commodity machine learning from a variety of sources, often open source – from Google TensorFlow and Microsoft Azure Machine Learning to startups such as MonkeyLearn and MetaMind – has brought machine learning to the masses. Powerful tools in under-trained hands, however, will not produce the best results. The contextualization we’ve discussed can be applied to improve outcomes systematically, contributing at several stages to the accuracy, relevance, and usability of models and results, as we now discuss.
You seek to model diverse features, the features that matter for your business:
Fine-grained classification is high-precision classification, essential for high-relevancy search and recommendations.
The ability to detect that, for example, people who search for maternity wear (in all the variations of that term) later search for a crib is the basis of predictive intent modeling.
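A toy sketch of that pattern detection over per-user category sequences (the sessions are hypothetical):

```python
# Count which category users search for after a given category, then
# recommend the most common successor; sequences are invented examples.
from collections import Counter, defaultdict

sessions = [
    ["maternity wear", "crib"],
    ["maternity wear", "stroller"],
    ["maternity wear", "crib"],
]

successors = defaultdict(Counter)
for sequence in sessions:
    for current, following in zip(sequence, sequence[1:]):
        successors[current][following] += 1

print(successors["maternity wear"].most_common(1))  # -> [('crib', 2)]
```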
Can your technology profile young Latina women, aged 16-19, versus those aged 20-24, and differentiate the goods and services each segment purchases? Or easily interpret different seasonal buying habits of teenage boys living in San Francisco versus San Diego? One classification approach is to take a semantic fingerprint of visited and shared content, for purposes such as recommendation and ad matching.
What points and patterns stand out, judging from a holistic understanding of consumer conversations across categories, and do those anomalies matter?
How can you improve machine learning accuracy with classified context? Consider five ways:
If you train your model on data that isn’t representative of the sources you’ll use in production, your models will fail to deliver. Consider: Don’t train a sentiment model on a set of movie reviews if you’ll be analyzing Twitter reactions to automakers’ announcements. You’ll mix up Harrison Ford and a Ford Focus.
Instead, draw only from sources that provide on-topic inputs, and apply contextual classification to ensure that each input is relevant. Looking for keyword hits, on “Ford,” say, won’t do the job. You need fine-grained classification to ensure accuracy.
Annotation – labeling features of interest – for training-set preparation can be a labor-intensive process. In many cases, you’ll need to hire subject-matter expert annotators. In other cases, you can crowd-source annotation, although, due to quality concerns, crowd-sourcing requires careful management. Instead, consider applying linguistic resources to automate annotation.
Start with lexicons and gazetteers, which are lists of terms, names, places, and other entities. A thesaurus lists synonyms: a step more sophisticated, but not enough to disambiguate a polysemous term, a term with multiple meanings. (Is Ford a carmaker, an actor, or a president?) You can apply lexical networks, which capture the words that frequently precede and follow a term of interest, and look at co-occurrence of other terms with a given term. Also consider contextual frequency of use, whatever the domain. (If you’re working with recent movie reviews, odds are that Ford will be Harrison rather than Henry.)
“Ford” belongs to the conceptual class (category) of vehicle manufacturers, along with Toyota, Fiat, GM, and others. Here, we’re climbing up a level of abstraction in our classification taxonomy. And Ford vehicle models include Focus, Mustang, and F-150… descending a level. In effect, use of taxonomy allows you to provide implied annotations, for instance, to label a car-model instance with a tag for manufacturer, even when the manufacturer’s name isn’t explicitly present in the training data.
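A minimal sketch of implied annotation from a taxonomy fragment (the fragment itself is illustrative):

```python
# Tag car-model mentions with their manufacturer even when the maker's
# name never appears in the text; the taxonomy fragment is illustrative.
TAXONOMY = {
    "Focus": "Ford", "Mustang": "Ford", "F-150": "Ford",
    "Corolla": "Toyota",
}

def implied_annotations(tokens):
    return [
        {"token": token, "model": token, "manufacturer": TAXONOMY[token]}
        for token in tokens if token in TAXONOMY
    ]

print(implied_annotations("I test drove a Focus yesterday".split()))
```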
Model validation involves checking outputs against gold-standard results, which are typically produced by human evaluators. But just as automated methods provide for high-precision training-set annotation, they can provide for checks on outputs of models produced via machine learning.
Currency is ensured by using output corrections to adaptively retrain the ML-produced model. Classified context also supports model enhancement: one example is using taxonomy to associate entities and topics, annotating a video and making it point-searchable based on words spoken in its soundtrack.
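The validation-and-currency loop just described can be sketched as follows: a taxonomy-based classifier checks ML-model outputs, and disagreements are queued as corrections for retraining. Both classifier functions here are hypothetical stubs.

```python
# A minimal sketch of automated validation plus correction-driven retraining.
# Both classifiers are hypothetical stand-ins.
def ml_predict(text):
    return "entertainment"  # stand-in for a trained ML model

def taxonomy_classify(text):
    return "automotive" if "f-150" in text.lower() else "entertainment"

corrections = []
for text in ["Ford F-150 towing capacity", "Harrison Ford filmography"]:
    predicted = ml_predict(text)
    reference = taxonomy_classify(text)
    if predicted != reference:
        # Flag the disagreement and queue a corrected example for retraining.
        corrections.append((text, reference))

print(corrections)  # [('Ford F-150 towing capacity', 'automotive')]
# A retraining step would fold `corrections` into the training data
# and rebuild the model, keeping it current.
```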
One other potential accuracy booster we’ll mention: use of an ensemble approach. Combine outputs of multiple methods – perhaps machine-learned and traditional – to arrive at a best-consensus result.
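A minimal sketch of such an ensemble: collect labels from several classifiers and take the majority vote. The three classifier stubs are hypothetical.

```python
# A minimal sketch of an ensemble: majority vote across classifiers.
from collections import Counter

def ensemble_label(text, classifiers):
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]

classifiers = [
    lambda t: "automotive",                                      # ML model stub
    lambda t: "automotive" if "ford" in t.lower() else "other",  # rule-based stub
    lambda t: "entertainment",                                   # second model stub
]
print(ensemble_label("Ford unveils new Mustang", classifiers))  # automotive
```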
These are some of the many ways to improve machine learning accuracy via classified context. Creative minds can surely come up with others. Where can these methods be applied?
Finally, we consider the question:
Who can make best use of contextual machine learning approaches?
We choose a few representative examples for purposes of illustration.
Brands, agencies & marketers study social status updates, consumer-generated reviews and forum postings, survey responses, e-mail and other customer contacts, and other diverse insight sources, each with its own analytics requirements.
Social media platforms & online publishers serve constituents who include visitors and subscribers – who both consume and produce content – as well as advertisers, syndicators and aggregators, and their own editorial and business operations, all of which depend on analytics.
Retail, manufacturing, and logistics deal with often-huge counts of product and service items and their categories, components, attributes, and specifications, as well as associated information describing usage scenarios, events, and sentiment. Analytics powers many of their core functions.
But these are only examples. Really, the answer to our opening question, “Who can make best use of contextual machine learning approaches?” is:
Anyone with a lot of data – hence the applicability of machine learning – and with a real world problem where common-sense knowledge comes into play.
This paper was written for data scientists, software developers, marketing analysts, product managers, and the executives who work with them, crafting organizational data strategy. The assumption is that you and/or your colleagues have an aptitude for data wrangling and a degree of coding experience, whether for data analysis or product creation. That is, you have the facility necessary to work with the tools and techniques discussed in the paper. We assume that you’re currently applying machine learning to pressing analytical tasks or have an initiative in the works.
You’re looking to maximize model performance – precision and results relevance in particular.
The choice of machine-learning methods is out of scope for this paper – although we’ll offer the hints that a) recurrent neural networks offer the best results for text and other sequence-dependent data and b) supervised methods, with models built from annotated training data, remain quite popular for good reason – so we’ll focus on implementation of the ML-classification hybrid we’ve been describing.
You can implement the hybrid by combining components via application programming interfaces – or by devising a processing pipeline where the output from one step is fed as input to the next. The focus should be on creating a repeatable process that will generate reproducible results with consistent performance. Given the plethora of cloud-deployed services and installable components available, and the possibility of scripting your own workflow for experimentation or for production deployment, there are few barriers to prototyping and development via an agile, iterative approach. Go for it!
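A minimal sketch of such a pipeline follows. The step functions are hypothetical placeholders for real services you would invoke via their APIs; the point is the repeatable, step-feeds-step structure.

```python
# A minimal sketch of a repeatable processing pipeline: each step's output
# feeds the next. Step functions are hypothetical placeholders for real services.
def clean(record):
    record["text"] = record["text"].strip().lower()
    return record

def classify(record):
    record["category"] = "automotive" if "ford" in record["text"] else "other"
    return record

def score_sentiment(record):
    record["sentiment"] = "positive" if "great" in record["text"] else "neutral"
    return record

PIPELINE = [clean, classify, score_sentiment]

def run_pipeline(record):
    for step in PIPELINE:
        record = step(record)  # output of one step is input to the next
    return record

print(run_pipeline({"text": "  The new Ford Focus is GREAT  "}))
```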
Do prototype use of taxonomy. Focus on a detailed, holistic understanding of consumer conversations across categories to allow not only for right-level classification – by category, topic, brand, product, component, or attribute – but also for indexation multiples – measures of in-category frequency expectations – that help you assess contextual significance.
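The arithmetic behind an indexation multiple is simple: the ratio of a term’s observed in-category frequency to its expected baseline frequency across all content. Values well above 1.0 suggest contextual significance. The counts in this sketch are illustrative only.

```python
# A minimal sketch of an indexation multiple, with illustrative counts.
term_count_in_category = 120      # mentions of "recall" in automotive content
category_total_terms = 50_000     # all term occurrences in automotive content
term_count_overall = 300          # mentions of "recall" across all content
overall_total_terms = 5_000_000   # all term occurrences across all content

observed = term_count_in_category / category_total_terms    # 0.0024
expected = term_count_overall / overall_total_terms         # 0.00006
indexation_multiple = observed / expected

print(f"{indexation_multiple:.0f}x")  # 40x: "recall" over-indexes in automotive
```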
Test on your own data, judging the correctness of results for yourself and evaluating the boost that contextualization, via classification tools, provides in test cases. You’ll wish to assess classified context at multiple process points, as described in Section IV of this paper. Use of an on-demand processing service with a subscription model will allow you to make efficient use of resources and manage costs. Do ensure that the system not only meets accuracy and performance needs, providing analytical lift, but also that it has the capacity to scale to meet production needs.
This paper has described classified context, a technical approach that boosts the accuracy of models built via machine learning. Classified context improves training-data relevancy. The approach provides for rich, expanded training-data annotation and supports model validation and reinforcement learning.
The hybrid is contextual machine learning, analytical modeling for text-rich business applications drawing on social and other online media and a spectrum of enterprise information sources.
Contextual machine learning makes the most of analytical advances, the data economy, and human expertise, as captured in traditional classification methods, notably taxonomy.
Prototype with your own data, using best-fit machine-learning algorithms, and experience the advantage for yourself.
by eContext
This paper addresses the numerous challenges of automated text data handling, for applications that range from social intelligence and user-experience design to search marketing and contextual advertising. It explains the advantages delivered by use of a particular technology, taxonomy, for a key technical step, classification.
This paper describes the technology and application scenarios: The What of taxonomy and alternative approaches, and How and When to apply taxonomy-centered solutions for optimal results. The Why of taxonomy is the business advantage that stems from accurate and reliable automated processes: Customer satisfaction, sales conversion, efficient support, retention, and monetization.
We live in a world of high-velocity, high-volume, online, social, enterprise, and device data. The Internet of Things (IoT) is an emerging reality. Smartphones, sensors, machinery, and servers generate extraordinary volumes of data. The IoT is complemented by an Internet of People (IoP) – consumers, influencers, our communities, competitors, and the broad public – you and me. We create and consume a growing amount of diverse content, across multiple devices and often while on the move, whether for business or personal purposes. We share our news, views, and needs, via messaging and apps, review sites, and online and social media as well as via in-person encounters and traditional documents and channels.
It is common practice to track and measure customers’ and the public’s activities online, on social media, and in the public sphere, subject to privacy constraints, and, when appropriate, to engage and respond.
The aim of public-facing organizations, whether commercial or non-profit, is to use the data generated to understand individuals’ needs and better serve them. Business-to-business functions face similar needs. Automation is a necessity; nonetheless, our audiences expect personalized, one-to-one experiences. To support appropriate, efficient, and productive user, customer, and business interactions, accurate, reliable, and flexible methods are essential.
There are diverse technical challenges that we must meet when we automate text processing.
An array of technologies attempts to help you meet these business and technical challenges. Some have been proven through years of experience. Others are experimental. It is critical to choose the right method, sensitive to the nature of your data and analytical or operational needs, with the reliability and performance you require.
Technical challenges may seem daunting, answerable only by complex systems of language rules, statistical algorithms, or highly sophisticated machine learning. Yet in many instances, straightforward, simpler techniques perform best. Taxonomy is a textbook example.
Let’s look at taxonomy and at other language-analysis methods, before taking on application scenarios. We’ll start the look at taxonomy with a definition.
Taxonomies group things of particular types or categories into hierarchies. They model relationships – “this is an instance or example of that” – often to several levels of depth, from detailed to general. Taxonomy’s origin dates back over two thousand years, to Aristotle and other ancients, who sought to describe, via categories, everything that exists or can exist. Carl Linnaeus’s 18th-century classification of the natural world is a wonderful, well-known example. Each branching of the tree distinguishes subgroups that, while they share enough characteristics to be grouped together at the branch point, may be differentiated by other, more-detailed characteristics. Take the example of mammals, distinguished from birds, insects, fish, and other animal classes by milk production.
How would an automated process apply taxonomy? By matching words and terms found in a given text item – a Tweet, an e-mail message, an article, or a document – to a taxonomy, preferably accounting for usage density. The aim is not only to classify the item, but also to boost its usefulness – for search, classification, and information usability – by identifying (per the technical challenges above) broader categories, detailed attributes, and related concepts.
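Here is a minimal Python sketch of that matching process, scoring candidate categories by usage density (matched taxonomy terms per word of text). The taxonomy terms are hypothetical toys.

```python
# A minimal sketch of applying a taxonomy to a text item, scoring candidate
# categories by usage density. The taxonomy terms are hypothetical.
TAXONOMY_TERMS = {
    "automotive": {"ford", "focus", "mustang", "mileage", "dealership"},
    "entertainment": {"movie", "film", "actor", "premiere"},
}

def classify_by_density(text):
    words = text.lower().split()
    scores = {}
    for category, terms in TAXONOMY_TERMS.items():
        hits = sum(1 for w in words if w in terms)
        scores[category] = hits / len(words)  # usage density, not raw count
    best = max(scores, key=scores.get)
    return best, scores[best]

print(classify_by_density("New Ford Focus mileage beats rivals"))
# ('automotive', 0.5)
```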
The capacity and performance of modern computing allow for the creation of deep and broad taxonomies that capture the objects and types within a given industry or business, along with their many attributes. The same Web-crawling technology that harvests new content for Google indexing – and the same social media monitoring technology that underlies listening and engagement solutions – can be applied to identify new, topical things for inclusion in industry-domain and business-function taxonomies. Modern technology allows taxonomy-based solutions to be applied to a diverse set of challenges, answering just the sort of online and social business needs we have examined.
Contrast: Linguistic analysis, statistics, and machine learning
The information content of documents, messages, speech, and search queries (to name a few of the many forms text takes) is communicated via words, organized into phrases, sentences, narrative, and conversations. Let’s put aside emoticons and emoji and creative punctuation (!!!), while word morphology (tense, conjugation, declension, plurals, and gender agreement), misspellings and abbreviations, and every sort of grammar and syntax used “in the wild” remain in-bounds.
Some natural language processing (NLP) methods seek to decode subject, object, and verb – via part-of-speech tagging and syntactic parsing – in order to understand relationships among the things the words refer to. A more basic first step, however, after tokenizing individual words, is named entity recognition via look-up of person, place, company, and product names in lexicons or gazetteers. Some tools rely on language rules that encode generalized associations among words; for instance, “Mrs. <word>” probably indicates a person while “state of <word>” may name a place (or a condition such as confusion). Rules trade the exactness of list look-up for flexibility that may identify things not already on your list.
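A minimal sketch combining gazetteer look-up with rules of the “Mrs. <word>” variety follows. The gazetteer entries and rule set are illustrative, not a real linguistic resource.

```python
# A minimal sketch of rule-plus-gazetteer named entity recognition.
# The gazetteer and rules are illustrative only.
import re

GAZETTEER = {"chicago": "PLACE", "ford": "COMPANY", "econtext": "COMPANY"}

def extract_entities(text):
    entities = []
    # Rule: an honorific followed by a capitalized word likely names a person.
    for match in re.finditer(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+([A-Z]\w+)", text):
        entities.append((match.group(1), "PERSON"))
    # Rule: "state of <Word>" often names a place.
    for match in re.finditer(r"\bstate of\s+([A-Z]\w+)", text):
        entities.append((match.group(1), "PLACE"))
    # Gazetteer look-up on individual tokens.
    for token in re.findall(r"\w+", text):
        label = GAZETTEER.get(token.lower())
        if label:
            entities.append((token, label))
    return entities

print(extract_entities("Mrs. Carey met Ford executives in Chicago"))
# [('Carey', 'PERSON'), ('Ford', 'COMPANY'), ('Chicago', 'PLACE')]
```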
More sophisticated is theme or topic extraction via simple word and term counts or via statistical clustering. We might use word adjacency or co-occurrence to infer attributes; further, lexical chains and word networks help us decide the contextual meaning of ambiguous terms. Other computational linguistics algorithms resolve coreference, multiple ways of referring to a given thing (e.g., Barack Obama, Mr. Obama, the President, and, in certain cases, the pronouns “he” and “his”).
Decoding language is hard; decades of research and development have been dedicated to the task. As a result, natural language processing approaches may be quite involved, especially when they need to deal not only with multiple human languages, but also with bad grammar, misspellings, acronyms, slang, and sarcasm. As an automation end run, or to facilitate creation of linguistics lexicons, networks, and rules, we have machine learning.
The term machine learning covers many methods. A basic distinction is between supervised methods, which build models from training data, and unsupervised methods, where the software creates a classification model from whatever the algorithm determines is statistically interesting. Machine learning can work quite well, if you have enough model-building data and if you retrain your models to keep up with new terminology. But machine learning on its own will never deliver the descriptive precision – the exact modeling of a knowledge domain – possible with a well-crafted taxonomy.
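The supervised/unsupervised distinction can be sketched in a few lines with scikit-learn. The tiny corpus and labels below are illustrative only; real training would require far more data.

```python
# A minimal sketch contrasting supervised and unsupervised learning on text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.cluster import KMeans

texts = [
    "ford recalls focus over brakes",
    "toyota unveils new hybrid sedan",
    "harrison ford stars in new film",
    "the movie premiere drew huge crowds",
]
labels = ["automotive", "automotive", "entertainment", "entertainment"]

X = TfidfVectorizer().fit_transform(texts)

# Supervised: a model built from annotated training data.
clf = MultinomialNB().fit(X, labels)
print(clf.predict(X[:1]))  # ['automotive']

# Unsupervised: the algorithm groups texts by statistical similarity,
# with no labels; the clusters then need human interpretation.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```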
Recapping, our analytics aim is to fully understand customers’, prospects’, and market needs in order to optimize product and service delivery, creating excellent experiences and boosting satisfaction, loyalty, and profitability. Our audiences expect personalized, one-to-one experiences.
Every aspect of the customer journey – and of supporting market research, product and service design, and demand forecasting – relies on accurate classification technology.
Classification is at the heart of describing, modeling, and predicting individuals’ interests, behaviors, and affinities, based on pattern detection and demographic and behavioral profiling. It enables software to create topical “semantic signatures” of online and social content, in order to facilitate automated matching and recommendations, and also to avoid negative associations.
There are many situations where taxonomy-based classification is the most accurate and effective classification approach. Top-value scenarios include:
Consumer-insight components: taxonomies provide an excellent mechanism for handling geographic reference data.
These scenarios include direct application of the technology, through domain-adapted user interfaces and also as part of larger analytically reliant solutions, where analyses are invoked as-a-service, via application programming interfaces, with workflow managed by a purpose-built solution. Keep in mind that implementations are not either/or. A well-constructed taxonomy will complement other methods, handling classification needs when other methods fall short.
Different tools have different capabilities, different uses and strengths. In applying taxonomy (or any other technology) to text analysis tasks, you’ll want to identify the solution that best meets your needs. Here are attributes that will influence your selection:
These attributes reflect familiar concepts. Domain and task suitability are related to relevance. What we call scope here is similar to search recall, or result-set completeness. Accuracy and currency concerns are universal and independent of the method.
There is a temptation to choose a solution based on source or on endorsements. While these are important factors, also consider the provider’s industry experience, objectivity, and record of providing exemplary customer service.
Seth Grimes consults on business applications of text analytics, sentiment analysis, and data visualization. He founded Alta Plana Corporation in 1997 and the Sentiment Analysis Symposium conference series in 2010. Follow him on Twitter at @SethGrimes.
This paper was sponsored by eContext, a SaaS text-classification technology. eContext discovers insights and intent in vast amounts of data to give brands, publishers, and marketers unique and valuable advantages. eContext is a division within Info.com.
116 West Hubbard Street
Suite 201
Chicago, Illinois 60654
United States
+1-312-477-7300
167-169 Great Portland Street
Floor 5
London
W1W 5PF
United Kingdom
+44 (0)20 7834 5000