where content, technology and people meet. (SM) Publishing and content technology executives use Shore to measure and understand their markets and competitors, define marketing strategies and implement successful content products and services using Shore's highly actionable insights into vendors, institutions, individuals and virtual communities.
ContentBlogger is the 2007 SIIA CODiE Award Winner for Best Media Blog
COMMENTARY:

Insights and headlines from Shore analysts on trends in enterprise and media content markets.
Wednesday, January 14, 2009
The very mention of the word "metadata" is enough to make some weary eyes in the content industry glaze over, perhaps triggering memories of conference speakers who droned on endlessly about precision, recall and taxonomies. Metadata is extremely important stuff, though: it is the "under the bonnet" information that powers the semantic organization of content into useful forms and helps to maximize search engine exposure and the value of categorization tools.

Thomson Reuters takes metadata very seriously and has been working via its Calais initiative to promote the use of its semantic processing capabilities to create more valuable content through metadata generation and source linking. The latest Calais release improves document handling so that a publisher's partners can access the rich metadata and source linking available via Calais by using a simple document code generated by the Calais document parser. Instead of having to settle for just a hyperlink to an online document, these codes allow linking into specific data and information sources related to the metadata in a document. So, for example, Calais can return links to articles in Wikipedia on topics that surfaced from the metadata. That's a free example, but obviously premium content sources can be swapped into the picture for those wanting to develop more sophisticated content products and services.
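
To make the "swappable source" idea concrete, here is a minimal Python sketch of turning topics surfaced from metadata into links. It is not the Calais API: the `SOURCES` table, the function names and the `premium.example.com` address are all assumptions made for illustration. Only the Wikipedia article URL pattern is real.

```python
from urllib.parse import quote

# Link templates keyed by source name. The "premium" entry is a
# hypothetical placeholder for a paid content source.
SOURCES = {
    "wikipedia": "https://en.wikipedia.org/wiki/{topic}",
    "premium": "https://premium.example.com/topics/{topic}",
}


def topic_link(topic, source="wikipedia"):
    """Build a link for one topic from the chosen source template."""
    # Wikipedia-style slugs use underscores for spaces; quote() escapes
    # any remaining characters that are unsafe in a URL path.
    slug = quote(topic.strip().replace(" ", "_"))
    return SOURCES[source].format(topic=slug)


def links_for(topics, source="wikipedia"):
    """Map every topic found in a document's metadata to a link."""
    return {topic: topic_link(topic, source) for topic in topics}


if __name__ == "__main__":
    for topic, url in links_for(["Thomson Reuters", "Metadata"]).items():
        print(topic, "->", url)
```

Swapping the free source for a premium one is then a matter of passing a different `source` name, which is roughly the substitution described above.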

Calais now includes a preview tool that can take any document and parse it out, providing a human-friendly display of the results or an XML-formatted RDF document that provides the original text with the Calais metadata and link tags inserted into the appropriate spots in the document. While the tool is not a demonstration of a production-ready process, it's easy enough to get the picture of how one could apply the Calais document processor in a production environment.
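
As a rough picture of what "tags inserted into the appropriate spots" means, the Python sketch below wraps annotated character spans of a text in link-bearing tags. The `Annotation` record, the `<entity>` tag name and its attributes are hypothetical; the real Calais RDF output uses its own vocabulary, which is not reproduced here.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Annotation:
    start: int  # offset of the first character of the mention
    end: int    # offset one past the last character of the mention
    kind: str   # hypothetical type label, e.g. "Company" or "Person"
    link: str   # where the "hook" for this mention should point


def annotate(text, annotations):
    """Return text with each annotated span wrapped in an <entity> tag."""
    out = []
    cursor = 0
    for ann in sorted(annotations, key=lambda a: a.start):
        if ann.start < cursor:
            continue  # skip spans that overlap one already emitted
        out.append(escape(text[cursor:ann.start]))
        out.append(
            f"<entity type={quoteattr(ann.kind)} href={quoteattr(ann.link)}>"
        )
        out.append(escape(text[ann.start:ann.end]))
        out.append("</entity>")
        cursor = ann.end
    out.append(escape(text[cursor:]))
    return "".join(out)


if __name__ == "__main__":
    sample = "Reuters launched Calais."
    hooks = [Annotation(0, 7, "Company", "https://en.wikipedia.org/wiki/Reuters")]
    print(annotate(sample, hooks))
```

The original text survives unchanged between the tags, so a reader or a downstream application can still treat the result as the same document with extra hooks attached.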

I fed a recent weblog entry into the preview tool to get a flavor of its capabilities. In general, semantic processors don't do a great job finding relevance in short documents - just not enough content to "chew" and weigh - and Calais shows typical limits in some of its concept extraction. Nevertheless, it did an impressive job of pulling out a long list of concept keywords relating to the media topics covered in the post. Its entity extraction also picked up the people and publishers mentioned, along with parent companies that weren't mentioned directly in the blog post. In other words, from a simple text document Calais can take you to a fully metatagged document with "hooks" in it that can pull in financial and company background information, biographies and other valuable content in a flash.

Calais is spreading its wings not only with media-oriented content but also with enterprise-oriented content sources. The release summary mentions enhancements for product identification, competitive intelligence and judicial events, as well as automated document-level categorization for recreation, environment, weather and legal content, which should give you a hint as to the types of organizations that are starting to put Calais through its paces.

While Calais remains a relatively low-profile project at Thomson Reuters, it's clear that the company is developing a sophisticated scheme for profiting from the virtual aggregation of content linked primarily through metadata tools such as Calais. In other words, why own the data when you can own the data relationships that add the most value to a content source? It's a compelling concept, one that has a lot of potential value for enterprise and media content markets and that is likely to grow in importance over time. I recommend stopping by the Calais site to poke around a bit and to get your own ideas as to how applying both metadata and linking capabilities to your own content sources can help to extend their value rapidly. One recalls that the city of Calais on the coast of France was the decoy landing site for the Allies' D-Day invasion of Normandy in 1944; the Calais initiative may not look like a real product in and of itself, but it may serve as a beachhead for a broader product vision before long.

By John Blossom - posted at 5:30 PM
 
Tuesday, February 12, 2008
Metadata is one of those terms that's likely to get traditional publishers' eyes glazing before you've even finished saying it, but it happens to be the content that's going to determine much of what powers profitability in publishing over the next decade. Broadly speaking, metadata is the categorization and tagging of content that enables it to be referenced easily and to reference other content easily. If the easy money to be made in searching documents on the Web has been made already by Google, the next generation of publishing services will provide tools that enable more structure to be added to content, both to supply richer content that search engines will like and to provide enough richness that people looking at a metadata-enriched Web page won't have to go hunting via search engines for related content.

Reuters recently launched its Calais open API initiative, which holds great promise for making the company a major player in leveraging metadata generation as a tool to put it at the heart of increasingly structured Web content. Calais provides tools that will enable publishers and applications developers to pass their content through a content analysis engine provided by Reuters' ClearForest semantic content processing tool and to get well-structured metadata returned for free. What's the payback for Reuters? To be the first to have this information, of course. With more than a century and a half of tradition in breaking news and real-time market data, Reuters is far from being a stranger to the value of being the first one to obtain critical information.

In helping the Web to gain semantic structure, Reuters can in theory become, via Calais, the company best suited to help people take advantage of that structure. Will this become a reality? While it's not likely to take off quickly, I think Calais is likely to enjoy a very comfortable position as a pioneer in open metadata generation for some time. The more time in which Reuters can build up metadata without much opposition - lots of people will still be in the "old media" mindset of trying to quantify short-term profits for such a move - the more time it will have to build value-add services that build on both the information's value as a real-time update stream and its value as a tool to enable people to make sense of an ever-expanding Web. Metadata also helps search engines and contextual ad services to match content to queries more effectively, so the What's-In-It-For-Me might be very valuable to publishers, especially publishers of social media who don't have the budget to afford their own semantic metadata generation systems.

Publishers place a lot of emphasis on copyright, but as the financial market data business has shown through the years, copyright is of little value if you can't get your content to the right people in time for it to make a difference to them. Focusing on metadata will enable Reuters to start indexing the Web in a more organized manner and to use that indexing to develop information products that will become in time at least as valuable as those that it has developed for the financial securities marketplace. It's no accident that Reuters is using a silhouette of a pigeon in the logo for Calais. Julius Reuter made his first stab at electronic publishing by using carrier pigeons to carry stock quotes across the gap between telegraph stations. Sometimes filling the gaps in content services that others wait to get filled can have profound consequences.

By John Blossom - posted at 10:21 AM
 

Copyright © 1997-2009 Shore Communications Inc. All Rights Reserved