ContentBlogger is the 2007 SIIA CODiE Award Winner for Best Media Blog
Monday, September 07, 2009
The word "semantic" is bandied about quite a bit these days in online publishing, where it is used to label everything from systems that automatically categorize content based on the presence of key terms in its body to more human-assisted forms of content organization. Whatever the particular technology or methodology, though, using the language structure of content and queries to infer more than merely the presence of key terms or concepts can get a little tricky with content on the open Web. An example of the challenges found in inferring meaning from both search queries and related online content surfaced recently with the launch of the new HealthBase online portal.

HealthBase is a showcase for the technologies of NetBase, a Mountain View, CA-based company specializing in using semantic language processing to unearth relationships in content collections not easily revealed by traditional keyword technologies. NetBase claims that HealthBase can help people to sort through Web content to find solutions to medical problems by parsing their queries through natural language semantic filters and then using semantic processing to find content organized by specific aspects of possible causes and solutions for medical problems. While HealthBase attracted some kind words from Search Engine Land, some test queries by Technorati delivered less flattering results. For the search query "aids," for example, a list of possible causes identified in Web content by HealthBase included "Jews," based on HealthBase interpreting the word "aids" as the verb meaning to assist people rather than as the acronym for the disease. The possible cures for this possible cause for "aids" included "salt" and "alcohol."

There can be little doubt that NetBase took an enormous risk by exposing its cutting-edge technology in an open Web service focused on something as critical as healthcare, a field in which services from many well-funded providers have been focused for several years online. With many people doubting the reliability of the Web as a source of medical information, glitches in a new service are not likely to make people feel more comfortable with using online content from unvetted sources to consider courses of treatment. But the real problem is not the NetBase technology so much as the expectations of how well some technologies can deal with a wide array of semantic issues found in subject domains only tangentially related to a field of science.

The idea of exploring sources of content using semantic tools to parse out possible causal relationships can be made to work, but these technologies need a lot of pre-defined context to guide their efforts. For example, semantic analysis tools tend to work well on documents that are highly structured - say, a research paper abstract or a news article in which a lede paragraph contains key information in a fairly structured pattern. Getting semantic processing to work on more unstructured sources of content such as emails, Web pages and other more open-ended content formats requires a lot of "training data," documents that are typical of successful matches for a given domain of information. Similarly, search engines or databases that use natural language processing to infer a particular kind of topic from a query entered in a text interface may not be given enough words to infer the right context for those words in relation to a specific subject.
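To make the short-query problem concrete, here is a minimal sketch in Python. It is not how HealthBase or any real engine works; the sense names and signature words are hypothetical, chosen only to show that a one-word query gives a context-matching approach nothing to work with.

```python
# Hypothetical sense inventory: each sense of "aids" gets a few signature words.
# These word lists are made up for illustration only.
SENSES = {
    "disease": {"hiv", "virus", "immune", "treatment", "symptoms", "infection"},
    "assistance": {"helps", "charity", "hearing", "financial", "teaching", "support"},
}

def guess_sense(query, senses=SENSES):
    """Return the sense whose signature words overlap most with the query.

    Returns None when no single sense wins, which is what happens when the
    query carries no context words at all.
    """
    words = set(query.lower().split())
    scores = {name: len(words & signature) for name, signature in senses.items()}
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None

print(guess_sense("aids"))                         # no context words to go on
print(guess_sense("aids symptoms and treatment"))  # medical context present
print(guess_sense("hearing aids"))                 # assistance context present
```

With the bare query "aids" the function has no basis for choosing a sense; one or two extra words are enough to tip it either way.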

Keyword-oriented search engines such as Google remain popular in part because they don't try to infer too much semantic knowledge from a given query. Instead, they rely on the human understanding of the semantic context of a given keyword - for example, looking at the number of people visiting or linking to a page that appears to be a match - to help select possible matches for a given keyword. Type "aids" into Google, for example, and you get a lot of documents relevant to the disease AIDS. If you had this type of collection as a starting point and then applied semantic filters to look at causal relationships, then you'd probably be in a better context for applying domain-specific semantic processing tools.
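The two-stage idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the toy documents, the inbound-link counts standing in for human signals, and the deliberately naive "X causes Y" pattern standing in for a semantic filter. It is not a description of how Google or NetBase actually work.

```python
import re

# Hypothetical toy corpus: (text, inbound_links). The link counts stand in for
# the human signals (visits, links) that a keyword engine leans on.
DOCS = [
    ("HIV causes AIDS, a disease of the immune system.", 950),
    ("Antiretroviral therapy treats AIDS and slows its progression.", 800),
    ("Hearing aids are small devices worn in the ear.", 300),
    ("Drought causes losses, so the state aids farmers.", 120),
]

def keyword_search(query, docs, top_n=2):
    """Stage 1: keep documents containing the keyword, ranked by inbound links."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    hits = [doc for doc in docs if pattern.search(doc[0])]
    hits.sort(key=lambda doc: doc[1], reverse=True)
    return [text for text, _ in hits[:top_n]]

# Stage 2: a naive stand-in for a semantic filter that only recognizes
# "X causes Y" and "X treats Y".
RELATION = re.compile(r"(\w[\w ]*?)\s+(causes|treats)\s+(\w+)", re.IGNORECASE)

def extract_relations(texts):
    """Return (subject, verb, object) triples found in the given texts."""
    relations = []
    for text in texts:
        for subject, verb, obj in RELATION.findall(text):
            relations.append((subject.strip(), verb.lower(), obj))
    return relations

popular = keyword_search("aids", DOCS)
print(extract_relations(popular))
# Compare with running the filter over every document, unranked:
print(extract_relations([text for text, _ in DOCS]))
```

Run over the popularity-ranked subset, the filter only sees relations about the disease; run over the whole corpus, it also picks up a relation about drought that happens to sit in a document using "aids" as a verb.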

Semantic processing applied in the manner of HealthBase can help to expose exciting possible relationships between different sets of content that may have otherwise never surfaced, making its potential worthy of being taken very seriously. But like someone trying to learn a foreign language by just walking down the streets of an unfamiliar country, applying the assumptions of one subject domain to any number of generally unrelated domains is not always the most efficient or reliable way to discover the most obvious causal relationships. Being able to learn and to apply lessons rapidly from a wide range of experiences is key to making such semantic processing work effectively. To some degree these kinds of services must offer "self-learning," that is, the ability of the semantic technology to be trained to recognize automatically when it's made mistakes based on human input and to be tuned rapidly by humans who will understand complex semantic relationships more rapidly than most software.

No doubt HealthBase will benefit from such tuning over time. The expectations of people looking for concrete causal relationships, though, may take more time to adjust. HealthBase is an exciting experiment in technology, which will benefit from more experiments in how to apply these technologies effectively to specific market needs.

By John Blossom - posted at 2:14 PM
Copyright © 1997-2009 Shore Communications Inc. All Rights Reserved