Sunday, January 10, 2010

Inadvertently gaming real-time search services

One duly noted facet of real-time search is the critical importance of relevance in a query's result set.  Within that, a criterion that weighs heavily in determining relevance is frequency - how often a post is re-posted by other members of the community and across other platforms.  It's the latter, re-posting across platforms, that draws my ire this morning.

Traditional search systems succeed by indexing resources in a way that correlates directly with time - the longer a document exists online, the better its chances of appearing in search results, because more resources reference it and the more it's discussed/retweeted/copied/indexed by other services/etc.  Obviously, when displaying content that was created mere seconds ago, this isn't the case.

A recurring concern for those developing real-time search applications is unscrupulous spamming of such systems - those people (or 'bots, usually) jumping on the popular trends of the moment to draw attention to their own content.  But how about good-natured, well-meaning folks who simply want to publish their data across multiple platforms?  Does multi-service publishing inadvertently influence relevancy algorithms?

For instance, suppose I author a thought on Twitter having to do with a certain topic - say, I post "I really dig the new CD by Iron Maiden" - and use a service like Ping.fm to replicate my post across other social platforms like Facebook, del.icio.us, LinkedIn, Friendster, MySpace, Identi.ca, FriendFeed, Posterous, Blogger and Jaiku.  In that case, I'd be creating the same data at 11 different endpoints on the Internet, each with its own unique URL.

Real-time services harvesting real-time feeds would pick up the fact that Iron Maiden-related posts had suddenly spiked and, if the spike continued within a given time period, register a trend.  Could those systems that assign relevance to topics based on the number of times a post is referenced therefore be gamed by me just trying to get my stuff out over the maximum number of platforms?
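To make the concern concrete, here's a minimal sketch of how a naive mention counter would see my one thought as 11 separate signals, next to a counter that collapses cross-posted copies.  Everything in it is hypothetical: the post dictionaries, the `example.invalid` URLs, the function names, and especially the assumption that the same author handle can be recognized on every platform.  It isn't how any particular real-time search service actually works.

```python
import re

# The 11 endpoints from the example above: Twitter plus ten replicated services.
SERVICES = [
    "Twitter", "Facebook", "del.icio.us", "LinkedIn", "Friendster", "MySpace",
    "Identi.ca", "FriendFeed", "Posterous", "Blogger", "Jaiku",
]

def normalize(text):
    """Lowercase and collapse punctuation/whitespace so copies compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def naive_mentions(posts, topic):
    """Count every post (i.e. every URL) whose text mentions the topic."""
    return sum(1 for p in posts if topic.lower() in p["text"].lower())

def deduplicated_mentions(posts, topic):
    """Count each (author, normalized text) pair once, however many URLs carry it.

    Assumes (hypothetically) that the author can be matched across platforms.
    """
    seen = set()
    for p in posts:
        if topic.lower() in p["text"].lower():
            seen.add((p["author"], normalize(p["text"])))
    return len(seen)

# One thought, replicated to every service, each copy with its own URL.
posts = [
    {
        "author": "jasonsalas",
        "service": service,
        "url": "https://example.invalid/%s/1" % service.lower(),
        "text": "I really dig the new CD by Iron Maiden",
    }
    for service in SERVICES
]

print(naive_mentions(posts, "Iron Maiden"))         # 11
print(deduplicated_mentions(posts, "Iron Maiden"))  # 1
```

The gap between those two numbers is the inadvertent gaming in question: the first counter sees a spike, the second sees one guy with a lot of accounts.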

At a macro level, would this affect the chance that I could personally influence an entire trend?  Or at the micro level, could I escalate my own thoughts higher within search results?  At the very least, the result from a user experience perspective would be that I'd inadvertently make results pages a whole lot noisier by diluting result sets with the same stuff.  Whether I did so deliberately or accidentally, neither outcome is optimal.
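On the noise problem specifically, one way a results page could cope is to fold identical copies into a single entry and note how many copies exist.  This is only a sketch under assumptions: the result fields (`author`, `text`, `service`, `posted_at`), the function names, and the idea of fingerprinting author plus whitespace-normalized text are all made up for illustration, and a real system would need something fuzzier to catch truncated or reformatted copies.

```python
import hashlib
import re

def fingerprint(author, text):
    """Hash the author plus whitespace-normalized, lowercased text."""
    canonical = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha1(("%s|%s" % (author, canonical)).encode("utf-8")).hexdigest()

def collapse_duplicates(results):
    """Keep the earliest copy of each post and record how many copies were seen."""
    collapsed = {}
    for result in sorted(results, key=lambda r: r["posted_at"]):
        key = fingerprint(result["author"], result["text"])
        if key not in collapsed:
            # First (earliest) sighting: keep it and start the copy count.
            collapsed[key] = dict(result, copies=1)
        else:
            collapsed[key]["copies"] += 1
    return list(collapsed.values())

# Hypothetical result set: three copies of my post plus one unrelated post.
results = [
    {"author": "jasonsalas", "service": "Facebook", "posted_at": 12,
     "text": "I really dig the new CD by Iron Maiden"},
    {"author": "jasonsalas", "service": "Twitter", "posted_at": 10,
     "text": "I really dig the new CD by Iron Maiden"},
    {"author": "jasonsalas", "service": "Jaiku", "posted_at": 15,
     "text": "I really dig the  new CD by Iron Maiden "},
    {"author": "someone_else", "service": "Twitter", "posted_at": 20,
     "text": "Iron Maiden tour dates announced"},
]

for entry in collapse_duplicates(results):
    print(entry["service"], entry["copies"], entry["text"].strip())
```

Keeping the earliest copy is a design choice, on the theory that the first endpoint is the original and the rest are replicas.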

Fortunately, no such apocalypse has befallen us just yet.  Only a few social services publish public firehoses at the moment, Twitter being the canonical example, and several other services have been set up by real-time providers in order to pulse-track them.  MySpace and Facebook are both coming online very soon, and the growing popularity of XMPP within the development community will make it easier for publishers to make their data payloads accessible with low latency.

But such gaming might be something to think about going forward.  Thoughts?

Posted via email from jasonsalas's posterous