Insights and headlines from Shore analysts on trends in enterprise and media content markets.

Monday, September 07, 2009

The word "semantic" is bandied about quite a bit these days in online publishing. The term is used to label everything from systems that automatically categorize content based on the presence of key terms in its body to more human-assisted forms of content organization. Whatever the particular technology or methodology, though, using the language structure of content and queries to infer more than merely the presence of key terms or concepts can get a little tricky with content on the open Web. An example of the challenges found in inferring meaning from both search queries and related online content surfaced recently with the launch of the new HealthBase online portal.
HealthBase is a showcase for the technologies of NetBase, a Mountain View, CA-based company specializing in using semantic language processing to unearth relationships in content collections not easily revealed by traditional keyword technologies. NetBase claims that HealthBase can help people to sort through Web content to find solutions to medical problems by parsing their queries through natural language semantic filters and then using semantic processing to find content organized by specific aspects of possible causes and solutions for medical problems. While HealthBase attracted some kind words from Search Engine Land, some test queries by Technorati delivered less flattering results. For the search query "aids," for example, a list of possible causes identified in Web content by HealthBase included "Jews," based on HealthBase interpreting the word "aids" as the verb for assisting people rather than as the disease's acronym. The possible cures for this possible cause for "aids" included "salt" and "alcohol."
There can be little doubt that NetBase took an enormous risk by exposing its cutting-edge technology in an open Web service focused on something as critical as healthcare, a field in which services from many well-funded providers have been focused for several years online. With many people doubting the reliability of the Web as a source of medical information, glitches in a new service are not likely to make people feel more comfortable with using online content from unvetted sources to consider courses of treatment. But the real problem is not the NetBase technology so much as the expectations of how well some technologies can deal with a wide array of semantic issues found in subject domains only tangentially related to a field of science.
The idea of exploring sources of content using semantic tools to parse out possible causal relationships can be made to work, but these technologies need a lot of pre-defined context to guide their efforts. For example, semantic analysis tools tend to work well on documents that are highly structured - say, a research paper abstract, or a news article in which a lede paragraph contains key information in a fairly structured pattern. Getting semantic processing to work on more unstructured sources of content such as emails, Web pages and other more open-ended content formats requires a lot of "training data," documents that are typical of successful matches for a given domain of information. Similarly, search engines or databases that use natural language processing to infer a particular kind of topic from a query entered in a text interface may not get enough words in that query to infer the right context for a specific subject.
Keyword-oriented search engines such as Google remain popular in part because they don't try to infer too much semantic knowledge from a given query. Instead, they rely on the human understanding of the semantic context of a given keyword - for example, looking at the number of people visiting or linking to a page that appears to be a match - to help select possible matches for a given keyword. Type "aids" into Google, for example, and you get a lot of documents relevant to the disease AIDS. If you had this type of collection as a starting point and then applied semantic filters to look at causal relationships, then you'd probably be in a better context for applying domain-specific semantic processing tools.
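The two-stage idea in the paragraph above, keyword retrieval ranked by human link signals first and a domain-specific semantic filter second, can be sketched roughly as follows. This is a toy illustration only: the `Doc` record, the `inbound_links` popularity proxy, the sample documents and the regular expression for causal phrases are all assumptions made up for the example, not a description of how Google or HealthBase actually work.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    inbound_links: int  # hypothetical stand-in for human popularity signals


# Hypothetical mini-collection; a real system would work on a crawled index.
DOCS = [
    Doc("HIV causes AIDS, a disease of the immune system.", 950),
    Doc("Hearing aids help people with hearing loss.", 40),
    Doc("Antiretroviral therapy treats AIDS in most patients.", 700),
    Doc("Poverty causes delays in AIDS treatment.", 300),
]

# Naive causal pattern: "<cause> causes <effect>" at the start of the text.
CAUSAL = re.compile(
    r"^(?P<cause>[\w\s]+?)\s+causes\s+(?P<effect>[\w\s]+?)[.,]", re.I
)


def keyword_stage(docs, keyword, top_n):
    """Stage 1: match the keyword as a whole word, rank by link count."""
    word = re.compile(rf"\b{re.escape(keyword)}\b", re.I)
    hits = [d for d in docs if word.search(d.text)]
    return sorted(hits, key=lambda d: d.inbound_links, reverse=True)[:top_n]


def causal_stage(docs):
    """Stage 2: pull (cause, effect) pairs only from the pre-filtered set."""
    pairs = []
    for d in docs:
        m = CAUSAL.search(d.text)
        if m:
            pairs.append((m.group("cause").strip(), m.group("effect").strip()))
    return pairs


top_docs = keyword_stage(DOCS, "aids", top_n=3)
print(causal_stage(top_docs))
```

In this sketch the keyword stage never decides what "aids" means; the low-popularity page about hearing aids simply falls below the cutoff, so the causal filter only ever sees documents that people have already treated as being about the disease.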
Semantic processing applied in the manner of HealthBase can help to expose exciting possible relationships between different sets of content that may have otherwise never surfaced, making its potential worthy of being taken very seriously. But like someone trying to learn a foreign language by just walking down the streets of an unfamiliar country, applying the assumptions of one subject domain to any number of generally unrelated domains is not always the most efficient or reliable way to discover the most obvious causal relationships. Being able to learn and to apply lessons rapidly from a wide range of experiences is key to making such semantic processing work effectively. To some degree these kinds of services must offer "self-learning," that is, the ability of the semantic technology to be trained to recognize automatically when it's made mistakes based on human input and to be tuned rapidly by humans who will understand complex semantic relationships more rapidly than most software.
No doubt HealthBase will benefit from such tuning over time. The expectations of people looking for concrete causal relationships, though, may take more time to adjust. HealthBase is an exciting experiment in technology, which will benefit from more experiments in how to apply these technologies effectively to specific market needs.
Labels: healthbase, natural language processing, netbase, Semantic Web
By John Blossom - posted at 2:14 PM

Thursday, June 26, 2008

It seems like only a few weeks ago that I was blogging about semantic search startup Powerset's soft-launch beta. In fact, it WAS only six weeks ago that we were covering Powerset's soft launch of new semantic search technology. But in those six weeks Barney Pell's crew got in a ton of good PR and a few meetings that have already resulted in a USD 100 million exit into the hands of Microsoft, according to VentureBeat. It wasn't so many years ago that Barney was a part of the bumpy exit of WhizBang Labs and its Web mining technologies. This time around his team was well ahead of the burn rate and blessed with both a good idea and good timing.
With tons of cash still on hand from its war chest for a Yahoo acquisition, Microsoft was ready to vent by spending some large (or, for them, small) at the deals mall to pump up its search for more advertising revenues. Given Powerset's ability to parse natural language questions as well as to provide "factz" topic clusters that could draw in related content, the target for Microsoft has to be the revived Ask.com portal as much as Google's leading search engine. Already Microsoft's Live.com search engine provides rich search results that emulate Ask's more user-friendly approach to search-driven content aggregation, but Ask still manages more meaningful responses based on natural language queries. Better front-end parsing and clustering of results terms from Powerset's technologies would certainly help Live to get more relevant and rich results that could help to build a larger audience, though how Powerset's technology will fare in absorbing Web content lacking the encyclopedic style of its trial Wikipedia content remains to be seen. On most test queries using natural language questions one finds Google to be at least as relevant in its results as other major search engines, if not more so, so even with new semantic technology Microsoft has its work cut out for it.
A better match for Powerset might be found on the enterprise side of Microsoft's offerings, where its recently acquired FAST enterprise search technology may benefit from some extra semantic search and clustering mojo - and find somewhat more structured content sources against which to apply semantic algorithms. That's not to say that Powerset won't succeed with open Web content, but in general semantic search technologies are most easily tuned when they're digesting documents with relatively similar styles. It would seem that this would be easier to tune to an individual enterprise's needs overall than to a world of Web content that could be in any shape at any time.
A better question might be why Microsoft hasn't considered purchasing Answers.com if they are so interested in natural language queries. With millions of pre-formed questions already in its WikiAnswers database, many natural language questions map very neatly to its answer sets. In other words, sometimes the best answer to a full-sentence question comes from a person who understood the question in all of its semantic details and has already provided the answer. This is far from a goof-proof solution to semantic search, but it's an approach worth considering as a valuable supplement to semantic document parsing.
In any event the Powerset set now finds itself in the enviable position of having sold its ship before it ever went down the launching track into the waters. That's certainly more than a few publishing portals can say these days. Congratulations to Barney and all of the other rocket scientists at Powerset - it pays to have a technology that solves a problem that companies with deep pockets are ready to get their hands on.
Labels: Deals Partnerships and Sales, Microsoft, powerset, Semantic Web, Wikianswers
By John Blossom - posted at 8:35 PM

Wednesday, April 16, 2008

Jeffrey Massa, CEO of Yellowbrix, grabbed my ear at the Buying and Selling eContent conference in Scottsdale, AZ, and for good reason, it turned out. Yellowbrix is well known in both online and enterprise markets for its content aggregation, portal development tools and financial information services, but like everyone else in the aggregation game they've been looking at higher value tools to create better opportunities for insight through their content. The Sentiment Performance Indicator tool that Jeff showed me is one such tool - and a killer app at that. The Sentiment Performance Indicator is a set of data and graphics tools built off of a Yellowbrix-developed semantic engine that munches through the content in its media feeds to determine whether stories on a particular company are positive, negative or neutral. This data then drives simple data displays and charts that enable one to get a grasp on likely market sentiment for a company's securities very quickly. This can be especially important before a securities market opens - there's that period before trading begins when analysts and people on sales and trading desks in financial institutions try to get a fix on what market sentiment is overall and for particular investments. Traditionally this is done with phone calls to trusted contacts, browsing through news, morning reports and other more quantitative tools to get a feel for what's going to happen.
The Sentiment Performance Indicator takes this type of activity to a whole new level. Instead of working on largely "seat of the pants" sentiment, the Sentiment Performance Indicator gives data that provides really strong correlations with likely market activity. The chart on the right, reduced a bit but I hope still readable, shows in this instance a graph of the Dow Jones Industrial Average in blue. The Sentiment Performance Indicator in this instance looks at all the news relating to the index and the companies that comprise it and out pops the sentiment data.
Note how the green line, showing positive sentiment, tracks strongly ahead of negative sentiment for today before the market opened. Note the strong correlation with how the market performed after the open. Note also that as positive sentiment begins to drop and close in on negative sentiment, the market levels out. Highly predictive. Jeff walked me through similar displays for individual stocks and the data correlations were truly eye-popping - this coming from someone who stared at Reuters and Quotron screens and wallboards for more than a decade. What I found interesting was not only how closely the sentiment data tracked and predicted subsequent stock performance but also how changes in sentiment correlated closely to typical trading activities. In one case, for instance, Jeff showed a stock where negative sentiment rose sharply and there was a subsequent selloff. However, there was at the same time a steep fall in negative sentiment but no increase in positive sentiment. In other words, once a stock starts falling from bad news the damage is done and it will keep falling until there is countervailing news - to put it another way, once you have bad news, no news is the equivalent of bad news. That is a very, very accurate portrayal of real-life market activity.
Another example of a display in the Sentiment Performance Indicator is a simple tool that shows cumulative sentiment data, the top positive companies and the top negative companies. Very easy to interpret - and very useful not only to securities traders but also for investor relations, management dashboards and other business applications where there is a need to see at a glance the real-time changes to how companies are being perceived in the marketplace. This is certainly not the first sentiment analysis tool on the marketplace and no doubt there are several major investment banks, asset management firms and hedge funds that have cooked up their own custom version of such a tool.
But I must say that to my jaded eyes this was one of the most powerful applications of semantic content analysis that I have seen in a long time. The top positive/bottom negative display should unquestionably be on every desk tracking U.S. markets: it is an invaluable tool that can help financial experts to prepare for their trading day effectively and to get a handle on how trends are likely to unfold during the day. More sophisticated tools are no doubt required for sentiment analysis to trigger low-latency basket trading during the market day, but for moving one's seat-of-the-pants sense of market sentiment into more firmly grounded views of market realities, especially in those critical minutes before markets open, this has to be one of the more powerful human-oriented tools that I have seen since PCs first started providing bar and line charts for securities analysis. No kidding. Thanks, Jeff, for a demonstration of such a simple yet powerful application of value applied to aggregated content.
Labels: financial information, market analysis, market intelligence, market sentiment, Semantic Web, sentiment analysis, web harvesting, yellowbrix
By John Blossom - posted at 9:48 AM

Monday, February 25, 2008

I really love Rafael Sidi's Really Simple Sidi weblog. It's a great compilation of insights into sciences publishing that is easy to read and is in my daily bookmarks of news sources to monitor. Turns out that Rafael is a big fan of ContentBlogger also, so I was pleased to get a preview briefing from him on Elsevier's new Illumin8 product making its debut today. While it's hard to draw major conclusions on the significance of any product on Day One, it appears that Elsevier has enabled Rafael's team to come up with what promises to be a real breakthrough in STM workflow solutions focused on getting the right insights into emerging solutions to scientific problems effectively.
The problem in big-stakes scientific research and development fields is that most search tools are oriented towards topical approaches to research that don't necessarily focus on relating problems and the organizations and people focusing on them with the solutions and benefits that they provide. For example, if one were to look for research, news and Web content relating to the HIV virus, the typical search engine is going to look at a search centered on that term and come up with documents that relate to this topic - but not necessarily focus on the solutions and benefits being provided by specific research studies for available new products. This is a critical factor when trying to select a new line of scientific research or to understand how to position a new product based on that research. How quickly can one define what solutions are in play for specific types of scientific problems by specific companies or universities? Who's delivering the most beneficial solutions? Illumin8 addresses these kinds of questions by adding an important semantic twist to search processing.
Instead of focusing just on nouns to define how content relates to a topic, Illumin8 clusters results based on how they fall into verb categories that align topic groups such as organizations, products, experts and technology with problems and benefits associated with those topics. Using this tool one can very rapidly discover not just recent research, Web postings and news stories but also the real problems being addressed by that research and the real benefits being revealed. Illumin8 has a very simple search interface thus far, a "white box" approach that will either move from topics to problems and benefits mapping automatically or let you define more sophisticated queries using special keywords. You can choose from news, research and Web content or any combination of these via a checkbox interface and adjust your precision/recall balance with a slider bar to get lots of results or just a few of the best matches. Search results come with graph bars and totals to make it easier to see which keywords and clusters of topics, problems and solutions are coming up most frequently in results. While Illumin8 lacks some of the interface sophistication of a more mature product like Collexis, which focuses deeply on helping people navigate expert network relationships, and still needs to address some entity mapping issues, its fundamental power is quite evident even in its early introduced form. More sophisticated analysis of verbs as valuable tools in semantic processing is in part behind the proliferation of "sales triggers" intelligence products such as Generate and InsideView, which enable sales professionals to understand when news and other content sources are pointing towards companies involved in activities that impact their sales processes.
Applying this type of processing to scientific studies and product development is likely to help scientific, medical and technical companies and organizations to get a similar leg up on understanding who's moving towards revenue-impacting insights more quickly. It's an approach that can probably yield tangible benefits for many types of business information as well as consumer information. It would be nice, for example, to see a semantic engine such as Illumin8's applied to product and catalog sites. To some degree many existing search engines factor these kinds of semantic issues into their processing behind the scenes, but Illumin8 demonstrates that when one focuses on the problem-solution relationship from a product standpoint instead of a straight topic approach the benefits can be dramatic. I am often skeptical when new products claim to be "workflow solutions," but Illumin8 seems to be pointing towards a pain point that people in R&D departments encounter often enough, without really effective solutions being offered elsewhere, that it probably qualifies as such a tool. It's another way of saying that there just might be some significant ROI in there if someone can do the research to tease it out from an early adopter community. Hats off to Rafael for a nifty product launch - helps to have that blog - and to the folks at Elsevier for giving Rafael a chance to strut his stuff. Hopefully Illumin8 continues to grow in scope, substance and quality.
Labels: elsevier, rafael sidi, search, Semantic Web, STM
By John Blossom - posted at 11:20 AM

Tuesday, February 12, 2008

Metadata is one of those terms that's likely to get traditional publishers' eyes glazing before you've even finished saying it, but it happens to be the content that's going to determine much of what powers profitability in publishing over the next decade. Broadly speaking, metadata is the categorization and tagging of content that enables it to be referenced easily and to reference other content easily. If the easy money to be made in searching documents on the Web has been made already by Google, the next generation of publishing services will be providing tools that enable more structure to be added to content, both to provide richer content that search engines will like and to provide enough richness that people looking at a metadata-enriched Web page won't have to go hunting via search engines for related content.
Reuters recently launched its Calais open API initiative, which holds great promise for making Reuters a major player in leveraging metadata generation as a tool to put it at the heart of increasingly structured Web content. Calais provides tools that will enable publishers and applications developers to pass their content through a content analysis engine provided by Reuters' ClearForest semantic content processing tool and to get well-structured metadata returned for free. What's the payback for Reuters? To be the first to have this information, of course. With its centuries-old traditions of breaking news and real-time market data, Reuters is far from being a stranger to the value of being the first one obtaining critical information. In helping the Web to gain semantic structure Reuters can become in theory via Calais the one best suited to help people take advantage of that structure. Will this become a reality? While it's not likely to take off quickly, I think that Calais may well enjoy a very comfortable position as a pioneer in open metadata generation for some time.
The more time in which they can build up metadata without much opposition - lots of people will still be in the "old media" mindset of trying to quantify short-term profits for such a move - the more time that they will have to build value-add services that build on both the information's value as a real-time update stream as well as its value as a tool to enable people to make sense of an ever-expanding Web. Metadata also helps search engines and contextual ad services to match content to queries more effectively, so the What's-In-It-For-Me might be very valuable to publishers, especially publishers of social media who don't have the budget to afford their own semantic metadata generation systems. Publishers place a lot of emphasis on copyright, but as the financial market data business has shown through the years, copyright is of little value if you can't get your content to the right people in time for it to make a difference to them. Focusing on metadata will enable Reuters to start indexing the Web in a more organized manner and to use that indexing to develop information products that will become in time at least as valuable as those that it has developed for the financial securities marketplace. It's no accident that Reuters is using a silhouette of a pigeon in the logo for Calais. Julius Reuter made his first stab at electronic publishing by using carrier pigeons to close the gap between telegraph stations carrying stock quotes. Sometimes filling the gaps in content services that others wait to get filled can have profound consequences.
Labels: calais, ClearForest, metadata, Reuters, Semantic Web
By John Blossom - posted at 10:21 AM

Wednesday, November 28, 2007

The annual KM World & Intranets 2007 Conference / Expo in San Jose keeps growing, adding a West Coast version of the successful Enterprise Search Summit (ESS) held in May in New York. The co-location of Taxonomy Bootcamp and Streaming Media West creates a dynamic interplay between different aspects of the information business, from technology to enterprise content. Attendees voiced the value of the range of tracks from strategic management of knowledge to the practical aspects of selecting and living with search software and applications, down to the nitty-gritty of taxonomy implementations. Traffic was good in the vendor booths of the Expo area, as technologists and content managers mingled over receptions, meals and seminars.
The opening keynoter for ESS was Susan Feldman, Research Vice President, Content Technologies, IDC, who described a market in flux with many competing technologies. Search is the missing piece for enterprise software, and large software vendors are entering the market. SaaS options are good solutions due to the complexity of search technology and the need to have the latest version. The keynote was a nice lead into the session that I chaired on "Solving the Multiple Search Engine Problem," which addressed approaches to the proliferation of departmental search vendors within organizations. Rennie Walker, Wells Fargo, described "waking up one morning with the multi-search engine blues", which resulted in the creation of a Search Center of Excellence (COE). Swetswise uses a federating search software, Museglobal, to deliver a subscription delivery product incorporating multiple search indexes. Miles Kehoe, New Idea Engineering, identified the challenges of maintaining distributed search engine indexes - a practicality not addressed by vendors. Security, ediscovery and regulatory compliance were themes in other presentations. Search across multiple repositories brings the thorny problems of access control to the underlying content.
Depending on the application, different levels of security may be necessary, down to the sub-document level. Choices include "early binding" vs. "late binding" options for access. Additional challenges include the changes in the Federal Rules of Civil Procedure of 12/1/2006, which make risk management of the enterprise search environment more critical. Steve Arnold, a highly regarded industry expert on search engines, chaired a keynote panel originally entitled "Giants Do Stumble: Are Google and Microsoft in Decline?" and modified in the final program to "What's Next for the Search Engine Giants". He questioned product managers from Google and Microsoft, who provided little new insight. Both companies are relative newcomers to the enterprise search space, and had vendor booths in the expo, joining traditional vendors. Arnold, in a later session, homed in on Google and his analysis of their patents to predict new directions.
Findability is more than keyword search in full text documents, a message which came through in both the sessions and vendor presentations. Sessions on semantic search indicated progress in actual implementation, which is closely tied to classification and taxonomy systems. Improved navigation, particularly faceted search, is another approach to improving the user experience and findability. Niche software vendors on the exhibit floor demonstrated other approaches to improving findability. Siderean uses a relationship approach, which intuitively fits research and discovery processes, to improve findability. Cognition was demonstrating their linguistic search software, which shows great promise for in-depth research, particularly in scientific and technical literature with its plethora of potential search terms. Deep Web Technologies showed the power of federating search software, as implemented at science.gov and scitopia.org. Enterprise search and management of organizational intellectual capital have become mission-critical.
The challenge is finding the right approaches for the organization, then the technical tools for implementation. Increasingly, behavioral and linguistic aspects are being recognized as essential factors in the process of adding value to the organization. Search is not easy, and delivering answers to people is not straightforward. It's finding the right combination of solutions that challenges the attendees at these conferences: there is no one-size-fits-all!
Labels: enterprise, ITI, Jean Bedord, search, Semantic Web
By Jean Bedord - posted at 1:51 PM

Wednesday, June 13, 2007

While Nexis tinkers with the edges of its market footprint, Dow Jones' Factiva unit is pushing forward with two key enhancements that are designed to change the scope of what business information users are likely to expect from their suppliers. Dow Jones' upgrades to its Synaptica taxonomy management services enable different taxonomies for different user groups - an essential tool for adapting business information into departmental functions - and add enhanced support for the RDF, SKOS and OWL semantic standards that will enable Dow Jones clients to process and interpret a wider range of content types more effectively - including multimedia content. No great surprise, then, that the other announcement from Dow Jones is a deal with EveryZing (recently renamed from PodZinger) to integrate audio and video content from major suppliers such as The Wall Street Journal, NPR, CNN and BBC Radio. EveryZing's already heavily categorized video content includes news from around the world in several major languages, making it a natural for integration into the Factiva set of general news and research content, enhanced all the more by the increased prowess of Dow Jones' semantic tools. Video is certainly all the rage on the Web and gaining steam within the enterprise as network backbones and security infrastructures are tuned to deal with more pervasive video consumption. Dow Jones' aggressive positioning of its integration capabilities combined with timely multimedia content will position it well as a supplier of both content and integration tools as enterprises think more seriously about how to integrate business-ready video into their portals and collaborative tools.
In a sense this gives Dow Jones additional leverage against the increasing penetration of services such as Google's enterprise search appliances, which enable both enterprise content and content from the Web, backed by Google's "Universal Search" capability, to make their way into corporate Webs. But the "catch-up" nature of the EveryZing deal underscores the degree to which Google is developing business-ready content sets far broader than those of Dow Jones and other business information suppliers. Dow Jones, Nexis and others hope to continue to pull trumps on Google with more select licensed sources at their disposal tuned to very specific enterprise audiences. And with sales and support staffs that have been knee-deep in enterprise needs and solutions for years, folks like Dow Jones have some important edges in being able to integrate content effectively into enterprise platforms. Yet one wonders how much longer search-oriented business information suppliers such as Dow Jones are going to be able to leverage their licensed content sets to stave off more direct competition from Web-oriented integration specialists such as Google. Is Dow Jones' Factiva unit a search and taxonomy company with licensed content or a subscription database service with some nifty integration tools? Neither answer may be sufficient as stronger competitors enter the stage with better generic answers to these questions and others with more sector-specific answers. But for now, kudos to Dow Jones for keeping Factiva fresh and relevant.
Labels: Audio, Business Information, Dow Jones, Factiva, Semantic Web, Taxonomies, video
By John Blossom - posted at 12:29 AM