Monday, January 04, 2010

Design decisions: real-time apps on mobiles & data lossiness

One of my favorite words in graphic design and networking parlance is "lossiness". For those distinct disciplines, the term denotes, respectively, the amount of degradation in quality an image can sustain, and the practice of dropping packets in transit across a network as a means of faster delivery.


Either way, lossiness implies efficiency as a performance driver toward accomplishing a goal.


I used my New Year's Weekend to really apply some serious thought to the multidimensional problem space of real-time search applications for the web.  I wanted to attack the issue from two specific angles - focusing on the feasibility of low-to-no latency systems for mobiles, and the rate of lost items in a real-time resultset.



Since I was using a late Saturday afternoon to conduct my experiment, I had several available devices and the full bandwidth of my company's Internet connection at my disposal.  My LAN doesn't proxy access, so each device's responses were autonomous. Additionally, the iPhone and netbook got online via WiFi; all other devices were hard-lined to a hub.



On all devices I launched Collecta at the same time, querying “New Years OR NYE”. I find that Collecta's vertically-scrolling collection of asynchronous items is the most logical interface for this type of experience. If items are going to be found as they're being published online, it stands to reason that they would be displayed that way without repetitive user-initiated page refreshing.  It's also one of the more comprehensive real-time search platforms, listing tweets, Flickr images, videos, news articles, and blog posts/comments.  Collecta's also a great practical use of Bidirectional-streams Over Synchronous HTTP (BOSH), the XMPP extension that facilitates real-time push for web apps.
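The basic client-side shape of a BOSH-style connection is a loop of held HTTP requests. Here's a minimal, deliberately synchronous sketch of that loop - the names are mine and purely illustrative, not Collecta's actual code:

```javascript
// Simplified sketch of a BOSH-style long-poll loop. `holdRequest`
// stands in for an XHR that the server holds open until data arrives
// or a timeout expires (returning null on timeout).
function longPollLoop(holdRequest, onData, maxCycles) {
  for (let cycle = 0; cycle < maxCycles; cycle++) {
    const payload = holdRequest(); // held by the server until data/timeout
    if (payload !== null) {
      onData(payload); // hand pushed items to the UI layer
    }
    // The loop immediately re-opens the next request, so the server
    // always holds exactly one open request per client.
  }
}
```

Injecting `holdRequest` keeps the loop transport-agnostic; in a browser it would be an XMLHttpRequest to the connection manager.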


I've always wondered about the reliability of real-time services on wireless devices, not only because of the lack of proper JavaScript support in mobile browsers to handle the repetitive XMLHttpRequest transactions, but also due to the bandwidth requirements for so many inbound payloads. 



While Collecta held up beautifully (rendering seamlessly on the iPhone), the results for each device were noteworthy, indicating that the better the machine's specs, the more voluminous the subset of found items.  Specifically, in the first 11 minutes after accessing the endpoint, my servers reported a resultset containing more than 390 matched items.  My desktop tower had slightly fewer, my laptop fewer than that, and my netbook fewer still, with the iPhone only getting about half of the server's matches - all from running the same URL. It didn't appear as if the lagging devices had some sort of queue to catch up to the larger load - missing posts/images/status updates simply disappeared into the ether.


So that brought to light the question of whether processing power, memory, bandwidth, and/or browser vendor/version makes a difference in the droppage factor - how many items get passed up due to constraints or limitations imposed by those criteria. 
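The droppage factor can be made concrete as the fraction of a reference device's resultset that a slower device never displayed. A quick sketch (function name is mine; the numbers in the test mirror the "iPhone got about half of the server's 390 matches" observation above, rounded for illustration):

```javascript
// Droppage factor: share of the reference device's matched items
// that a given device failed to render. 0 = no loss, 1 = total loss.
function droppageFactor(referenceCount, deviceCount) {
  return 1 - deviceCount / referenceCount;
}
```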


(Notable other discoveries: after running a hugely popular Collecta trend pulse “christmas” for 9 straight hours on a Windows computer, Firefox occupied more than 1.3GB of RAM and eventually crashed.  Collecta also uses a large amount of HTTP conversations, as compared to Google's real-time search product.)




The data gathered implies that the loss rate of real-time search items in apps like Collecta may be directly proportional to the capabilities of the requesting device.  This makes perfect sense - the increased client-side processing cycles necessary for rendering rich, dynamic UIs delivering rapidly changing data streams may impact the ability of a system to reflect the true pulse of messages being filtered down from its source(s).  Thus, devices with reduced memory, computational power and network connectivity may be subject to higher rates of dropped items than machines of greater ability. 


Specifically, mobile devices attempting to run real-time search apps may incur significant performance hits due to the increased processing.  The additional computations from heavy AJAX-based functionality should be considered in applications where data is reported as it happens.


Simply stated: a faster computer means faster results, which means more results, which means a more accurate picture of the Real-Time Web.  But all is not lost - there's a method to this madness.


It seems clear that XMPP over HTTP might not store undelivered messages for future delivery.  This is logical, seeing as how an escalating loss rate over time could create massive queues that would backlog a user's entire experience and be counterproductive to the real-time nature (showing content from hours ago while catching-up).  While not optimal, being lossy is actually safer. 


This may be savvy and deliberate preventative engineering due to the fact that such a large load of messages coming in will, at some point, clog the system; messages unable to make it into the UI are simply ignored to avoid such backlogs.  As a loose analogy, think of the difference between UDP and TCP in terms of managed lossiness.
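That discard-on-overflow behavior can be sketched as a small lossy buffer: when it fills, the oldest pending item is thrown away so the freshest ones survive. This is purely my illustration (names and capacity are invented), not Collecta's implementation:

```javascript
// A lossy bounded buffer: on overflow, the oldest pending item is
// dropped (UDP-style) so fresher results get their chance to display.
class LossyBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
    this.dropped = 0; // count of items lost to overflow
  }
  push(item) {
    if (this.items.length >= this.capacity) {
      this.items.shift(); // silently discard the oldest
      this.dropped++;
    }
    this.items.push(item);
  }
  drain() {
    // Hand everything pending to the UI and reset.
    const batch = this.items;
    this.items = [];
    return batch;
  }
}
```

The key design choice is that overflow never blocks or queues - it loses data, which is exactly the "safer" tradeoff described above.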


I'm still torn on exactly how bandwidth on mobile devices affects real-time apps.  With proper buffering, modern wireless devices are capable of streaming full-motion video from YouTube; conversely, real-time apps based on XMPP are systems pushing bits of hypertext from server to client.  So the assumption would be that mobiles should be able to sufficiently handle such inbound traffic.


The counterargument, especially for apps tracking an infinite number of trends of varying popularity at any given moment, comes under duress: imagine trying to push 35 IMs per second to a smartphone for an entire minute.  That's 2,100 instant messages.  And most hot topics are a lot noisier for a lot longer.  For most commercial forms of connectivity, something's got to give.  Message overflows could be conveniently discarded, making room for fresher items harvested from across the Web to get their chance at being displayed - keeping the UI "inconsistently consistent" with the community's true general pulse around an event, even if it can't account for every single atomic-level response: thought, image, comment or video.
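The arithmetic above generalizes to a simple overflow calculation: at a given message rate and a fixed cap on what the client can realistically display, everything beyond the cap is loss. A trivial sketch (the cap value in the test is an assumption for illustration):

```javascript
// How many items overflow a fixed-size display budget during a burst?
// rate: messages/second, durationSec: burst length, cap: max items kept.
function overflow(rate, durationSec, cap) {
  const total = rate * durationSec;
  return Math.max(0, total - cap);
}
```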


More discussion is clearly needed in this arena.


Keep in mind that mine was a trivial experiment.  For simple searches, the loss of information in this regard is of no great consequence; but scaled up and for more mission-critical applications with sensitive data, the network effects could be significantly more damaging.


So it might be the case with systems like Collecta that lossiness for resultsets is merely a happy little accident.  Regardless, take this as an object lesson in how to do things the right way.  The analogy of the patient dying but the operation being a smashing success applies - intelligently manage throughput to keep pace with the community overall.

Posted via email from jasonsalas's posterous

Hi Jason,

An interesting and thought provoking post. I've a few points:

>> Collecta's also a great practical use of Bidirectional-streams Over Synchronous HTTP (BOSH), the XMPP extension that facilitates real-time push for web apps

If you look up the definition of BOSH it says "emulates a bidirectional stream between two entities". The key word here is "emulates", so your use of "real-time push for web apps" isn't strictly true. Therefore Collecta use a polling solution, not real-time push.

I believe that Comet is key to the real-time web, and it may be that XMPP proves to be the messaging protocol (see later comment) of choice, but that Comet is used for the real-time PUSH notifications.

>> Simply stated: a faster computer means faster results, which means more results, which means a more accurate picture of the Real-Time Web.

This is definitely true to a certain extent. A faster computer will clearly be able to deal with the real-time data quicker and get on to processing the next message or, for the Collecta website, making the next poll request within the browser. See my next point for extra clarification.

>> It seems clear that XMPP over HTTP might not store undelivered messages for future delivery.

Although I'm not presently an XMPP expert, I would assume that XMPP is a messaging protocol and does not define a caching strategy. What you appear to be seeing is that Collecta's implementation keeps a reasonable number of results within their cache at any one time. Therefore, in the time between polling web requests to the Collecta web service, results may have arrived in the cache and, as more results have come in, been discarded. Machines that take longer to consume the polling requests, or are on a higher-latency network, may simply miss results by the time they make their next polling request.
Hi Phil! Thanks much for your comment. Very well-put.

I do think that while BOSH admittedly emulates a bidirectional flow, it uses a quasi-realtime push architecture - meaning low latency, though certainly not up-to-the-millisecond delivery when a bit of data is ready as a response. Comet is a great solution; however, like most things, it's not without its drawbacks.

Have a look at the WebSockets API for HTML 5:

This seeks to alleviate some of the scalability problems inherent in HTTP polling, while capitalizing on the best features of Comet/reverse AJAX. It's really quite an elegant solution. I'm excited to see how it'll pan out.

I don't believe Collecta uses pure polling...I've profiled their HTTP traffic in Firebug (have a look at this video I did on the topic...really interesting), and the frequency of XHR calls seems to be inconsistent/sporadic enough as to imply some event-driven thing instead of a timer or JavaScript's setTimeout() method.

But you're completely right about caching - it's pretty much impossible to cache something that hasn't happened yet.

Thanks for the insight!
Hey Jason,

I did exactly what you show in your Firebug video (it is a great Firefox plugin). I noticed the sporadic requests and didn't dig into the reasoning behind this. I still can't see how this can be classed as push, since the browser is making individual requests using the XMLHttpRequest object (XHR tab in Firebug) to get each piece of data. There doesn't appear to be a single maintained streaming connection, as would be the case with HTTP streaming or a WebSocket, so I can't see how the server is pushing data to the browser. I still think Collecta are using a pull paradigm.

It's possible that with each pull of data the web service ( indicates how much data it is getting for the current search term and therefore updates the frequency of the polling (javascript setTimeout).

I've confirmed that without window.setTimeout Collecta stops working by overriding the window.setTimeout definition using the Console tab in Firebug and running:

window.setTimeout = function(){};

When you do this Collecta no longer updates with new "real-time" results :o)

If you know the guys at Collecta it would be really interesting if you could get a bit of information from them about the technology they are using and how they are using it. Maybe an interview?

I plan to read up on WebSockets along with XMPP and am also excited to see how they will be used. There are also things around cross-domain access in web browsers that need to be ironed out and implemented in all browsers to further encourage web application mashups.


That's good info, Phil. Have a look at some of the XMPP stuff and BOSH...I think it'll show you how the system (and others like it - case in point, apps consuming Twitter's firehose) goes cross-transport by streaming data over HTTP. :)

I don't see why you think BOSH isn't real-time push but Comet is. They use the same underlying technique of HTTP long polling. One is not inherently more real-time than the other.

In both cases, you use more connections in order to minimize latency as much as possible.

Traditional polling has latency equal to the polling interval. Long polling has latency equal to 0.5 * ping time + HTTP request overhead, which you can reason about as being effectively zero compared to traditional polling.
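That latency comparison can be written out as a rough model, using the definitions in this comment (traditional polling waits out the interval; long polling pays half a round trip plus per-request HTTP overhead). The numbers in the test are made up for illustration:

```javascript
// Rough latency models, in milliseconds.
// Traditional polling: a new item waits up to the full polling interval
// before the next request even fires.
function pollingLatency(intervalMs) {
  return intervalMs;
}
// Long polling: the server already holds an open request, so latency is
// half the round-trip time plus the per-request HTTP overhead.
function longPollLatency(pingMs, httpOverheadMs) {
  return 0.5 * pingMs + httpOverheadMs;
}
```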

As for the setTimeout stuff on Collecta: it is inefficient to make UI updates for every incoming piece of data, as there might be a lot. Instead, we batch these up and do updates at intervals. If you stop the interval clock, we can't do much. Similarly, the Strophe library for XMPP in JavaScript has an idle timer which keeps the connection alive; whenever a long poll request returns, a new one must be sent out, and this is done by the idle timer function. If you kill the timer, the long polling will stop, and the connection will be terminated.
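A minimal sketch of that batching pattern - illustrative names only, not the actual Collecta/Strophe code, and `flush()` is invoked manually here rather than from an interval timer so the logic is easy to follow:

```javascript
// Accumulate incoming items cheaply, then render them to the UI once
// per tick instead of re-rendering for every single message.
class UpdateBatcher {
  constructor(render) {
    this.render = render; // called once per flush with the batched items
    this.pending = [];
  }
  add(item) {
    this.pending.push(item); // no UI work per message
  }
  flush() {
    // In the real app this runs from an interval timer - which is why
    // overriding window.setTimeout freezes the results display.
    if (this.pending.length === 0) return;
    this.render(this.pending);
    this.pending = [];
  }
}
```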

If you watch the XHR requests carefully, you will see that one XHR is always open and waiting for data; this is the marker for long polling. The request will be returned once some timeout expires (for BOSH this is usually 60 seconds) or once data has arrived. This means that during complete idle (no data in either direction), one XHR will be sent every 60 seconds which gets returned at the end of 60 seconds. If data ever arrives, the request is returned immediately, and the client receives the new data with essentially zero latency. When the client sends data, a new XHR is used, and the server will immediately return the old one. It will hold the new XHR until data arrives or the timeout expires. The server always holds onto one request.
