There was some scuttlebutt around last week's duel between the Google I/O developer conference and the All Things D conference to the effect that Google announced its Wave messaging and collaboration technology because it had some intelligence about Ballmer's announcement of the Bing-flavored preview of Microsoft's new Live Search search engine. Somehow that doesn't ring true, given the breadth of the Google Wave announcement, which is a pretty encompassing technology initiative. By contrast, Ballmer didn't have anything nearly as broad to offer the ATD crowd, but at least he had something to put up against I/O to keep people buzzing about Microsoft, most of which was catch-up to counter announcements at Google's earlier Searchology event.
If you can find any significant differences between Bing and the earlier Kumo-labeled version of Microsoft's Live Search preview, you have sharper eyes than I do. That's not necessarily a bad thing; there's a lot to be said for Microsoft's leveraging of their new Powerset technology that helps to dress up search engine results with related content and faceted navigation features. But in several forays into Bing searches, I cannot say that I am finding all that many melds of information that are truly impressive. Yes, it's nice to be able to have comparison shopping data, reviews and related links embedded in searches such as "Samsung LCD TVs," but that's not so different from, say, a search on Google for "JFK to SFO" with the "related searches" option turned on, which puts comparison flight shopping tools in the search results. Bing is good, perhaps even state-of-the-art, but hardly a game-changer for the state of search in general.
What the maturing Bing search results do seem to indicate is that the lines between destination sites and search engines will continue to blur as content providers and search engines both go in search of more valuable and engaging contexts for high-quality content. For search engine providers, being able to increase engagement time on a given page of search results is good for ad revenues and overall user satisfaction and brand value. For online publishers, the melded results offered in Bing, Google's Universal Search and other evolving search portals represent opportunities to engage audiences at the point of demand with solutions that enhance their own brand value while building revenues from advertising alliances with search engine portals. You might say, even, that the Bing/Google Universal Search approach is like dialing up a custom magazine/shopping guide/newspaper, with increasingly slick and well-organized content that begins to mimic the editorial capabilities of traditional specialty publications.
The parallel between traditional media and on-demand publications assembled by search engines is underscored in Bing by the rich and engaging photographs that appear on the home page of the Bing site. Squint a little bit and you can imagine the cover of a National Geographic magazine or other glossy high-quality publications. The visual promise of Bing's home page is that what you're about to experience is really, really good at a visceral level. The guts of this "magazine" don't yet match the cover, but you can tell that over time both Bing and other search engines are headed in the direction of getting search results to be as engaging and visually rewarding as traditional magazine publications, albeit with lots of the Web-savvy functionality that keeps people coming back.
With these evolutions in mind, publishers need to be prepared to make their content brands resonate in the online pages of whatever on-demand context appeals to their audiences - including increasingly sophisticated search engines that are aiming to keep people hanging around their pages as long as possible. Initiatives such as Journalism Online will help to make search engines more profitable aggregation venues for traditional publishers, but publishers need to be ready to accept more willingly the idea that search engines can be great publishing partners that help them to get their content to their audiences in the contexts that those audiences value most. Certainly Bing will help to convince some publishers of this, but it's still early days for publishers recognizing that The New Aggregation is not a mere thought piece but instead a key component in the future of profitable publishing.
While the concept of the content organization features found in the Powerset search application was always compelling, the original content in the demo application set up for the early version of Powerset was not the most powerful presentation of its strengths. Now in the hands of its acquirer Microsoft, the Powerset features appear to be ready to take on a much-improved content set and interface in the guise of an internal project at Microsoft labeled "Kumo." As revealed by Kara Swisher at All Things Digital, an internal Microsoft memo is encouraging staff to play with the prototype search engine to get some initial feedback.
In spite of some scathing negative reviews from the search engine intelligentsia, the screen grabs provided by ATD of the Kumo interface look to be pretty competent. Gone is the over-busy Powerset interface, replaced by an interface that is at once Google-esque and yet unique. The top five web results are followed by results that match different facets of a search term. For example, results for the recording artist Taylor Swift return groupings of content available for her songs, her lyrics, her bio, her music downloads and her albums. On the left are possible searches by related artists and categories, as well as the ability to initiate new searches in video collections, bios and so on.
It's unclear at this point whether Kumo will be just a project name - it's apparently a word that means both "cloud" and "spider" in Japanese - or whether it's just an internal marker that may disappear as its features get absorbed into Microsoft's Live Search engine. For that matter, it's unclear that the features will make their way into production at all, though they are certainly useful enough. What is clear, though, is that Microsoft is going to continue to search for new ways to make alternatives to Google palatable in a way that might appeal to both enterprise and media audiences. I don't think that too many people harbor illusions about the ability to crack Google's dominant market share in search any time soon, but competition is good for the breed, they say.
I suppose the most intriguing aspect of Google's success that challenges the challengers such as Kumo is how Google has attained its success without explicit content categorization features. One can go to dozens of knowledge management and search conferences every year and hear about how important good content categorization features are for the success of search engines - and then look at the nearly naked search results on Google to contemplate just how true that may be. The assumption that categorization specialists have is that having categories makes it easier to browse content collections. Well, that may very well be true if you are in fact interested in browsing relatively finite and well-organized collections of content, but in general search engines have become less about browsing and more about delivering specific answers for most people. The average searcher seems to be trained now to refine their own searches via the "white box" rather than to traverse through browsing categories.
This isn't to say that content categorization isn't useful: it's more a matter of where it turns out to be most useful. Where it does seem to help most is in portal solutions where someone has come to a specific page of content and may want to explore that site or database from different facets. Where people understand that there's a finite, well-curated collection at their disposal, categorization seems to do quite well. Where it's a matter of sifting through billions of pages for the needle in the haystack, most folks are getting used to typing in the best search string that they can think of. With that said, the features in Kumo do provide an interesting and engaging alternative to Google search results, but they'd probably be better off either in specific content portals that need enrichment or in creating an on-demand portal from its results sets, so that it will be a more browsable set of content in its own right - and then, perhaps, attract a higher breed of advertising, if that's the goal. Instead of trying to out-Google Google, perhaps challengers such as Kumo need to think about how to out-aggregate the aggregators to build better revenue margins for smaller search operations. Something to wrestle with, perhaps.
It seems like only a few weeks ago that I was blogging about semantic search startup Powerset's soft-launch beta. In fact, it WAS only six weeks ago that we were covering Powerset's soft launch of new semantic search technology. But in those six weeks Barney Pell's crew got in a ton of good PR and a few meetings that have already resulted in a USD 100 million exit into the hands of Microsoft, according to VentureBeat. It wasn't so many years ago that Barney was a part of the bumpy exit of WhizBang Labs and its Web mining technologies. This time around his team was well ahead of the burn rate and blessed with both a good idea and good timing. With tons of cash on hand from its war chest for a Yahoo acquisition, Microsoft was ready to vent by spending some large (or, for them, small) at the deals mall to pump up its search for more advertising revenues.
Given Powerset's ability to parse natural language questions as well as to provide "factz" topic clusters that could draw in related content, the target for Microsoft has to be the revived Ask.com portal as much as Google's leading search engine. Already Microsoft's Live.com search engine provides rich search results that emulate Ask's more user-friendly approach to search-driven content aggregation, but Ask still manages more meaningful responses based on natural language queries. Better front-end parsing and clustering of results terms from Powerset's technologies would certainly help Live to get more relevant and rich results that could help to build a larger audience, though how Powerset's technology will fare in absorbing Web content lacking the encyclopedic style of its trial Wikipedia content remains to be seen. On most test queries using natural language questions one finds Google to be at least as relevant in its results as other major search engines, if not more so, so even with new semantic technology Microsoft has its work cut out for it.
A better match for Powerset might be found on the enterprise side of Microsoft's offerings, where its recently acquired FAST enterprise search technology may benefit from some extra semantic search and clustering mojo - and find somewhat more structured content sources against which to apply semantic algorithms. That's not to say that Powerset won't succeed with open Web content, but in general semantic search technologies are most easily tuned when they're digesting documents with relatively similar styles. It would seem that this would be easier to tune to an individual enterprise's needs overall than to a world of Web content that could be in any shape at any time.
A better question might be why Microsoft hasn't considered purchasing Answers.com if they are so interested in natural language queries. With millions of pre-formed questions already in its WikiAnswers database, many natural language questions map very neatly to its answer sets. In other words, sometimes the best answer to a full-sentence question comes from a person who understood the question in all of its semantic details and has already provided the answer. This is far from a goof-proof solution to semantic search, but it's an approach worth considering as a valuable supplement to semantic document parsing.
In any event the Powerset set now finds itself in the enviable position of having sold their ship before it ever went down the launching track into the waters. That's certainly more than a few publishing portals can say these days. Congratulations to Barney and all of the other rocket scientists at Powerset - it pays to have a technology that solves a problem that companies with deep pockets are ready to get their hands on.
There are rocket scientists, then there are rocket scientists - and then there's Barney Pell, long-time Silicon Valley startup maven and currently the Founder and Chief Technology Officer at Powerset. Barney is one of those rare people who has been a rocket scientist via both the NASA side of the term and the software industry side, an outlook that has helped him to assemble many teams through the years that have developed advanced search and language processing technologies. Powerset has unveiled its first effort recently at a new technology to provide rich content from semantic searches, an interesting look at how one can completely reshape the face of a content product via enhanced search technologies.
Using Wikipedia as its primary target content, Powerset technology analyzes search phrases to come up with search results that match natural language phrases as well as keywords. This being a very early stage debut of technology, some search targets work better than others, and overall I'd have to say that it's a technology that seems to do best with people and things as opposed to concepts. For example, if you type in "Who is Bill Gates?" you get a screen similar to the top of the above screen grab, which includes a top deck of biographical information from the Freebase reference database followed by Powerset's sets of semantic analysis called "Factz" that focus on what the Wikipedia article says about this prominent figure. One of these sets, for example, tells us that Gates gave testimony, a speech, an address, a demo, a presentation and a deposition. You can click on any of these terms to get more details from the underlying article.
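For readers curious about what a Factz-style set might look like under the hood, here is a minimal sketch in Python. It is not Powerset's implementation, which has not been published in this detail; it simply assumes that an upstream parser has already pulled subject-verb-object statements out of an article, and it shows how such statements could be rolled up into the kind of "Gates gave ..." cluster described above. The triples, the function names and the output format are all hypothetical.

```python
from collections import defaultdict

# Hypothetical triples, hand-written for illustration. A real system would
# extract them from article text with a natural language parser.
TRIPLES = [
    ("Gates", "gave", "testimony"),
    ("Gates", "gave", "a speech"),
    ("Gates", "gave", "an address"),
    ("Gates", "gave", "a demo"),
    ("Gates", "gave", "a presentation"),
    ("Gates", "gave", "a deposition"),
    ("Gates", "founded", "Microsoft"),
]


def group_facts(triples):
    """Cluster objects under each (subject, verb) pair, keeping input order."""
    groups = defaultdict(list)
    for subject, verb, obj in triples:
        # Skip repeats so each object appears once per cluster.
        if obj not in groups[(subject, verb)]:
            groups[(subject, verb)].append(obj)
    return dict(groups)


def format_fact(subject, verb, objects):
    """Render one cluster as a single display line (assumed format)."""
    return f"{subject} {verb}: {', '.join(objects)}"


if __name__ == "__main__":
    for (subject, verb), objects in group_facts(TRIPLES).items():
        print(format_fact(subject, verb, objects))
```

The grouping step is the easy part. The quality of the clusters depends entirely on the extraction that feeds it, and extraction is where a small or narrow content set can lead to odd inferences.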
Below the initial bio and Factz information is a set of search results for the initial query, including the best-match article on Microsoft founder Bill Gates. This is in essence the straight Wikipedia article with links mapped over to Powerset's version of this content, along with a handy visual presentation of the article's outline on the right or another listing of key Factz organized within the article outline. I like some of the inferences that it's come up with in the Wikipedia definition of Content that I contributed a while back: "information provides value; experiences provide value; content provides value." True enough.
I like how Powerset prefixes organic search results with federated content, taking a best stab at results on very focused topics that enable people to obtain knowledge more quickly and effectively. The automatically generated Factz, though, suffer from the same problem that most semantic tools experience when they examine a very small data set: spotty inferences. For example, in the Factz about Bill Gates Powerset inferred that he founded Cher, an inference drawn from the fact that biographer Howard Johns was known for revealing the addresses of these and other celebrities. Hmm. Don't think that I'd put that info down on my "final Jeopardy" slate. I am also not so crazy about the organic search results, which tend to err on the side of word proximity. Again, with a relatively narrow data set such as Wikipedia it's not always easy to tune content analysis well to the capabilities of semantic text analysis in search engines.
The big picture for this early-days release of Powerset is that it is a great demonstration of how one particular source of content can be transformed through search and content federation technologies into an altogether different kind of publication. Oftentimes I talk these days about search technologies being similar to datafeed technologies, but in this instance it's important to recognize that search technologies are also end-publishing technologies in and of themselves that can aggregate, filter and organize content in altogether new ways that enhance the value of one or more core publications. Using free content from Wikipedia and Freebase the Powerset technology does a good job of demonstrating this concept simply, albeit with some early growing pains. Publishers wanting to stay in the forefront of content markets are turning in droves to content federation technologies as a solution to add value to existing product sets, so expect to hear more from technologies such as Powerset that help publishers to add value rapidly.