where content, technology and people meet. (SM) Publishing and content technology executives use Shore to measure and understand their markets and competitors, define marketing strategies and implement successful content products and services using Shore's highly actionable insights into vendors, institutions, individuals and virtual communities.
ContentBlogger is the 2007 SIIA CODiE Award Winner for Best Media Blog
COMMENTARY:

Insights and headlines from Shore analysts on trends in enterprise and media content markets.
Tuesday, May 13, 2008
There are rocket scientists, then there are rocket scientists - and then there's Barney Pell, long-time Silicon Valley startup maven and currently the Founder and Chief Technology Officer at Powerset. Barney is one of those rare people who has been a rocket scientist on both the NASA side of the term and the software industry side, a background that has helped him through the years to assemble many teams that have developed advanced search and language processing technologies. Powerset recently unveiled its first effort at a new technology that provides rich content from semantic searches, an interesting look at how enhanced search technologies can completely reshape the face of a content product.

Using Wikipedia as its primary target content, Powerset technology analyzes search phrases to come up with search results that match natural language phrases as well as keywords. This being a very early-stage debut of the technology, some search targets work better than others, and overall I'd have to say that it's a technology that seems to do best with people and things as opposed to concepts. For example, if you type in "Who is Bill Gates?" you get a screen similar to the top of the above screen grab, which includes a top deck of biographical information from the Freebase reference database followed by Powerset's sets of semantic analysis called "Factz" that focus on what the Wikipedia article says about this prominent figure. One of these sets, for example, tells us that Gates gave testimony, a speech, an address, a demo, a presentation and a deposition. You can click on any of these terms to get more details from the underlying article.
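To make the idea of a Factz set concrete, here is a minimal sketch of how subject-verb-object statements could be grouped for display. It is not Powerset's implementation, which the post does not describe: the triples are hand-written to mirror the Gates example above, and the names `TRIPLES`, `group_facts` and `render` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (subject, verb, object) triples, hand-written to mirror the
# example in the post; a real system would extract these from article text.
TRIPLES = [
    ("Gates", "gave", "testimony"),
    ("Gates", "gave", "speech"),
    ("Gates", "gave", "address"),
    ("Gates", "gave", "demo"),
    ("Gates", "gave", "presentation"),
    ("Gates", "gave", "deposition"),
    ("Gates", "founded", "Microsoft"),
]


def group_facts(triples):
    """Group objects under each (subject, verb) pair, preserving input order."""
    grouped = defaultdict(list)
    for subject, verb, obj in triples:
        # Skip repeated objects so each set lists a term only once.
        if obj not in grouped[(subject, verb)]:
            grouped[(subject, verb)].append(obj)
    return dict(grouped)


def render(grouped):
    """Return one display line per (subject, verb) set."""
    return [
        f"{subject} {verb}: {', '.join(objects)}"
        for (subject, verb), objects in grouped.items()
    ]


if __name__ == "__main__":
    for line in render(group_facts(TRIPLES)):
        print(line)
```

Grouping by subject and verb is only one plausible way to arrive at a set like "Gates gave testimony, a speech, an address..."; the hard part, extracting reliable triples from prose in the first place, is deliberately left out of the sketch.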

Below the initial bio and Factz information is a set of search results for the initial query, including the best-match article on Microsoft founder Bill Gates. This is in essence the straight Wikipedia article with links mapped over to Powerset's version of this content, along with a handy visual presentation of the article's outline on the right or another listing of key Factz organized within the article outline. I like some of the inferences that it's come up with in the Wikipedia definition of Content that I contributed a while back: "information provides value; experiences provide value; content provides value." True enough.

I like how Powerset prefixes organic search results with federated content, taking a best stab at results on very focused topics that enable people to obtain knowledge more quickly and effectively. The automatically generated Factz, though, suffer from the same problem that most semantic tools experience when they examine a very small data set: spotty inferences. For example, in the Factz about Bill Gates, Powerset inferred that he founded Cher, an inference drawn from the fact that biographer Howard Johns was known for revealing the addresses of these and other celebrities. Hmm. Don't think that I'd put that info down on my "Final Jeopardy" slate. I am also not so crazy about the organic search results, which tend to err on the side of word proximity. Again, with a relatively narrow data set such as Wikipedia it's not always easy to tune content analysis well to the capabilities of semantic text analysis in search engines.

The big picture for this early-days release of Powerset is that it is a great demonstration of how one particular source of content can be transformed through search and content federation technologies into an altogether different kind of publication. These days I often talk about search technologies being similar to datafeed technologies, but in this instance it's important to recognize that search technologies are also end-publishing technologies in and of themselves that can aggregate, filter and organize content in altogether new ways that enhance the value of one or more core publications. Using free content from Wikipedia and Freebase, the Powerset technology does a good job of demonstrating this concept simply, albeit with some early growing pains. Publishers wanting to stay in the forefront of content markets are turning in droves to content federation technologies as a solution to add value to existing product sets, so expect to hear more from technologies such as Powerset that help publishers to add value rapidly.

By John Blossom - posted at 11:53 AM
 
Tuesday, January 22, 2008
Stephen Arnold writes a thoughtful post on his Beyond Search blog about the inadequacy of traditional databases and search engines to deal with organizing and delivering content when the Web and many private content collections measure in petabytes and exabytes of information. Steve hints at a "next generation" database management system that can start to leapfrog over these problems, but the greater question is perhaps unasked in his article. Namely, as the problems that people need to solve with content technologies become increasingly complex and increasingly fleeting, why is it that we really need permanent unified databases to solve those problems? There is an important need for data normalization, but if normalization can be achieved "on the fly," as leading content federation services can provide, do people need a database or instead data objects that solve specific problems in the moment?

When data normalization was associated with creating massive databases that would be used for repeated functions such as payroll management, or for publishing functions such as newspapers or directories, permanently structured databases made a lot of sense. But as market advantages gained through content publishing fall increasingly to those who can mine unstructured content, aggregate content from disparate sources and enable people normally confined to consuming content to create it and organize it, the traditional database is being relegated to one of many silos from which advanced content services can develop on-demand content solutions. Search engines, which rely on databases that can be queried in a standard format to provide standard answers, are beginning to fall into this same role of specialized answer tools. If you look at the typical search results page today from major providers you're looking at federated content from multiple sources, logically related to a greater whole but residing in separate storage environments and coming together in the moment as the answer to a specific question or need.

In short, what we have called a database is no longer a storage and indexing device. Rather, the database is now: the content sets that we assemble in a given moment to solve the moment's problem. Its structure is consistent thanks to XML standards, data dictionaries and data mining normalization tools. It can be stored as needed for time series analysis or corporate compliance, and it can be shared with others to develop collaboration services or new forms of content and analysis. But in the next moment our needs may shift, and sources may change structure, become unavailable or be replaced by different sources.
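A rough sketch of what "the database is now" could look like in code: records from two differently structured sources are normalized on the fly into one shape, filtered for the question at hand and returned as a transient content set, with nothing stored. Everything here is an illustrative assumption rather than any vendor's API: the source names, the field maps, the sample records and the functions `normalize` and `federate` are all hypothetical.

```python
from datetime import date

# Hypothetical per-source field mappings: source field name -> common field name.
FIELD_MAPS = {
    "newswire": {"headline": "title", "pub_date": "published", "body": "text"},
    "blog_feed": {"title": "title", "posted": "published", "content": "text"},
}

# Hypothetical sample records, shaped differently per source.
BATCHES = {
    "newswire": [
        {"headline": "Search startup debuts", "pub_date": date(2008, 1, 20),
         "body": "A semantic search service opened to the public."},
    ],
    "blog_feed": [
        {"title": "Notes on federation", "posted": date(2008, 1, 22),
         "content": "Federated search pulls sources together on demand."},
        {"title": "Conference recap", "posted": date(2008, 1, 21),
         "content": "Panels covered pricing."},
    ],
}


def normalize(source, record):
    """Rename one record's fields to the common schema and tag its origin."""
    mapping = FIELD_MAPS[source]
    item = {common: record[raw] for raw, common in mapping.items() if raw in record}
    item["source"] = source
    return item


def federate(batches, keyword):
    """Build a transient content set: normalize every record, keep those whose
    text mentions the keyword, and sort newest first. Nothing is persisted."""
    items = [
        normalize(source, record)
        for source, records in batches.items()
        for record in records
    ]
    needle = keyword.lower()
    hits = [item for item in items if needle in item.get("text", "").lower()]
    return sorted(hits, key=lambda item: item["published"], reverse=True)


if __name__ == "__main__":
    for item in federate(BATCHES, "search"):
        print(item["published"], item["source"], item["title"])
```

If a source changes structure or is swapped for another, only its entry in the field map changes; the assembled set is rebuilt on the next request rather than migrated, which is the point of treating the database as something that exists in the moment.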

Market advantages tend to flow to institutions that can take advantage of content most effectively, and in the markets we can see how this concept already impacts business in a large way. In financial markets profits are shifting from public securities exchanges, whose transactions are built around highly normalized databases and data formats, to private transactions on highly complex financial instruments, whose underlying complex calculations on financial risk and return may apply to only a single transaction at a time. There is structure in such transactions, yes, and lots of normalized data, but the uniqueness of the content's structure at the moment that a deal is executed is far more important than its standard components.

Search engine providers such as Google understand this paradox explicitly and work hard to provide value-add interfaces that enable people to use search engine content as one of many feeds that can power "mashup" consumer and enterprise content applications. The Google search engine may be one of the world's largest databases but if other content in a form that's more usable in a specific context can come along and complement it in the moment, it becomes rather moot beyond a certain point whether or not it's in Google's index or another index. This federated approach to content value becomes at least as important as the quality of the individual sources. In a "the database is now" world, quality is as quality does - and it may mean something else a moment from now.

The implications of this concept for content publishers are enormous. Publishers have long been used to building their standardized databases, but the long-promised New Aggregation is on the verge of becoming the value leader for both enterprise and media publishers. Through the on-demand federation of content sources into aggregated content solutions, the uniqueness of insights for small audiences is becoming a much more important method for creating value in aggregation than the pervasiveness of standardized insights.

Make no mistake, we'll be using today's search engines and databases for a long time as building blocks for federated content services, but we'll be less fixated on owning databases and more focused on owning the contexts in which they provide solutions. This is likely to change the pricing structure of content aggregation services significantly and to force traditional publishers into becoming on-the-fly aggregation services pulling in content agnostically from many sources that may not be under their direct control for more than a few moments. Subscription databases will yield, sometimes gradually and sometimes very rapidly, to subscription contexts: services that can assemble content from anywhere consistently and reliably for workflow and lifestyle applications. Yesterday's email inbox is becoming today's content inbox via feeds and social media; tomorrow's federated inboxes will be even richer and more complex through databases that live in the moment.

Social media and enterprise content federation services have already pressed many of these changes forward, but expect 2008 to be the year in which more than one company will begin to recognize the value of databases in the moment. The database is now - and so is the opportunity for publishers and enterprises to move beyond isolated content solutions.

By John Blossom - posted at 10:14 PM

Copyright © 1997-2009 Shore Communications Inc.  All Rights Reserved - Click Here to Read Terms of Use