Monday, April 26, 2010

Build a better browser

When it comes to matters of global affairs, I'm generally apolitical and let (wo)men smarter than me handle it.  But when it comes to solving technological problems born largely of common sense, the huge step forward and giant leap backwards irks me to no end. 

My good friend Carter has an interesting take on an escalating issue, largely favoring the iTunes App Store's model for building/distributing apps. 

I'm all about the best possible experience for everyone, which includes the developers building the apps and users who enjoy them.  We really started to make some headway in software with the uptake in mid-2000s Web 2.0 space, with AJAX and other JavaScript-based abilities emulating a lot of UI enhancements that were previously only thought to be exclusive to Flash or other esoteric disciplines that everyone doesn't always play nicely with (yes, Apple, I'm talking to you).

My thing isn't denouncing the existence of apps for native mobile operating systems - it's the setback they create, which at the macro level damages how we all stay connected.  A mammoth service like Pandora works great because it's supported across as many as 9 major mobile OSes, each with its own idiosyncratic nuances that are par for the support course for companies at that scale, but hell on earth for smaller shops.

Garage operations simply can't afford to engineer and maintain that many versions of the same core service.  And from the user standpoint, it bleeds into the ever-deadly vendor lock-in (which of course organizations like Apple, Google and RIM secretly pray for)...imagine digging Chad Ochocinco's app so much, but one day hating your iPhone. You couldn't switch over to a different phone because it's only on that platform.  

Coupling is something we started to move positively away from in web development at the turn of the new century...only to come back to it when third-party software developers started writing platform-specific apps.  

So the solution I most favor: keep the innovation on the Web. You build a single service with open, standardized technologies (e.g., HTML(5), JavaScript, CSS, XML, et al.).  Deployment is a snap, as is pushing updates and making bug fixes. Access is both mobile (via multiple connected devices) and portable (from any computer).  But that means the vendors need to build better web browsers, and that's coming.

My $0.02 on the matter. 

You?  :-)

Posted via email from jasonsalas's posterous


Saturday, March 13, 2010

me, as Jigsaw, appealing to Facebook

[ in my best Jigsaw voice ]: "hello, <YOURNAMEHERE>. i want to play a game. for the last year, you've wasted your - and what is infinitely more important, MY time - with senseless, regurgitated Facebook posts. you are now challenged to show me evidence of original thought. you don't have to cure a disease or split the atom...just show me there's more to you than just quizzes about your favorite color, what celeb you think you look like, FarmVille updates ad nauseam, or (mis)quoting song lyrics. you have 90 seconds, lest i bounce you from my social network. in or out...make your choice."



Friday, February 26, 2010

Well this is unfortunate...

From Nick Johnson's excellent blog: "Fortunately, the PubSubHubbub protocol provides for the possibility of hubs doing polling on feeds that do not support PubSubHubbub themselves. The public hub on http://pubsubhubbub.appspot.com doesn't currently have this enabled, but there are plenty of alternatives."

Bummer.



Monday, February 15, 2010

7 Questions for Superfeedr

Superfeedr is the brainchild of French developer Julien Genestoux, a leading authority on the emerging crop of applications and systems based on delivering data with extremely low latency.  The project is the evolution of Notifixious (which I've previously profiled), which sought to help content creators by making the distribution of RSS- and Atom-based feeds currently in use more expedient.

Julien benefited from some very generous and impressive external investment, allowing him to expand his platform to be a service using a variety of real-time technologies including XMPP, PubSubHubbub and SUP to harvest distant-end feeds.

Here's 7 Questions for Julien about Superfeedr:

1. Superfeedr was the driving force (your custom feed-fetching/parsing technology) behind Notifixious, and it's now become the brand of your service.  The service's major notable form of growth is notifications via PubSubHubbub, alongside pure XMPP.  What's the reaction been to expanding your supported real-time protocols?

We love XMPP and we think it's the most powerful technology for the real-time web. Yet, with a little pragmatism, it's obvious that this protocol scares a lot of people. HTTP is something quite well-known by many people: how to scale it, cache, etc.; it's pretty much a required computer science skill. Adopting PubSubHubbub brought us a lot of attention from people who would never otherwise have looked in our direction.

On top of that, PubSubHubbub is a very well thought-out protocol and covers the publish/subscribe mechanism in a much clearer and more precise way (albeit not as rich in terms of features as XMPP PubSub): they both belong to the real-time web world.

 

2. You're perhaps only the second major instance of a deployed PubSubHubbub hub, aside from the initial instance the Google engineers who developed it put out there.  Have you seen a rise in interest in using such a technology stack for enabling web-based pubsub systems?

Yes, definitely. The pubsub pattern was known for years. Yet as I said previously, this implementation of it is very elegant and the massive use of WebHooks makes it very attractive. The emergence at the same time of the real-time web interest was also a major traction factor.

 

3. Superfeedr is based on a very clever business plan.  Describe it.

For subscribers: stop polling, we'll do that for you. Tell us how much it costs you, we'll match that anyway. The assumption is that we fetch the same data for several people. We're selling the same thing several times, when the cost of the n entries is the same as one entry. 

For publishers: we'll make your feeds real-time for free (and we benefit from that for the subscribers, too). If you want cool features (analytics, customization, etc.) then we'll charge you, based on the volume of data which transits through us on your behalf.

 

4. You've received investment from some pretty well-known sources - Mark Cuban among them.  What has the capital infusion allowed you to do in terms of your technology and supporting infrastructure?  And explain the composition of your backend - specific languages, servers and technologies.

This was a small seed round. I'm no "Internet superstar" and I don't host parties. Yet, this investment was made for us to invest in servers (we basically tripled our number of servers since then), as well as - more importantly - key hires. Speed is our only competitive advantage, staying small is important to stay fast, but this money also helped us get a few great engineers who are helping us with new features to be announced!

Our architecture is quite simple: almost like a distributed BotNet. We have many parsers who receive feed URLs from dispatchers and then return the parsed content to them. The dispatchers send the feeds based on a pre-determined 'next-fetch' time. And that's pretty much it.  Everybody is connected via XMPP, which brings stuff like presence, querying and XML - convenient when dealing with feeds!
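(An aside from me, not from Julien: the dispatcher/parser split he describes can be pictured with a small sketch. This is only an illustration and not Superfeedr's code. The `Dispatcher` class, the fixed re-fetch interval and the in-process `parser` callback are all my assumptions; the real system spreads this work across machines over XMPP.)

```python
import heapq
from typing import Callable, List, Tuple


class Dispatcher:
    """Hands out feed URLs in order of their 'next-fetch' time (hypothetical sketch)."""

    def __init__(self, interval: float) -> None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        self.interval = interval  # assumed fixed re-fetch interval, in seconds
        self._queue: List[Tuple[float, str]] = []  # min-heap of (next_fetch, url)

    def add_feed(self, url: str, next_fetch: float) -> None:
        heapq.heappush(self._queue, (next_fetch, url))

    def dispatch_due(self, now: float, parser: Callable[[str], str]) -> List[str]:
        """Send every feed whose next-fetch time has passed to a parser,
        collect what the parser returns, and reschedule the feed."""
        results = []
        while self._queue and self._queue[0][0] <= now:
            _, url = heapq.heappop(self._queue)
            results.append(parser(url))
            heapq.heappush(self._queue, (now + self.interval, url))
        return results


dispatcher = Dispatcher(interval=900)
dispatcher.add_feed("http://example.com/a.xml", next_fetch=10)
dispatcher.add_feed("http://example.com/b.xml", next_fetch=50)
# Only the first feed is due at t=20.
print(dispatcher.dispatch_due(now=20, parser=lambda url: "parsed " + url))
```

A heap keeps the "which feed is due next" question cheap even with many feeds, which is the only design point this sketch is meant to show.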

 

5. I see that some of your more notable clients include Posterous, Tumblr and twitterfeed.  In what ways have they leveraged low-latency push notifications and with which technologies?  And you've also got FriendFeed as a client, which uses SUP (its own protocol).  How is Superfeedr being used for that platform?

We host PubSubHubbub hubs for them. We believe publishers should focus on publishing, and we can help them improve their deliverability, by providing them with a hosted solution where the only things they have to do are (1) add some discovery inside their XML feeds, and (2) ping us whenever they update these. We will deal with the subscription and notification processes. Most of them have some kind of ping mechanism in place; we just make sure we translate these pings into a universal protocol, which is PubSubHubbub.

As for FriendFeed, we also use their Simple Update Protocol (SUP), on the subscriber side to know when a feed has been updated. It's not a ping protocol per se, but we consider it as similar - if a publisher already uses SUP, they can very easily turn on a hub at Superfeedr.
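(Another aside from me: the two publisher-side steps Julien lists, discovery and ping, look roughly like this under the PubSubHubbub draft as I understand it. Discovery is a `<link rel="hub">` element in the feed, and the ping is a form-encoded POST carrying `hub.mode=publish` and `hub.url`. The function names and the example.com URLs are made up for illustration, and the sketch only builds the data rather than sending anything over the network.)

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM_NS = "http://www.w3.org/2005/Atom"


def add_hub_link(feed_xml: str, hub_url: str) -> str:
    """Step 1: advertise the hub inside an Atom feed with <link rel="hub">."""
    ET.register_namespace("", ATOM_NS)  # keep Atom as the default namespace on output
    root = ET.fromstring(feed_xml)
    link = ET.Element("{%s}link" % ATOM_NS, {"rel": "hub", "href": hub_url})
    root.insert(0, link)
    return ET.tostring(root, encoding="unicode")


def build_publish_ping(topic_url: str) -> bytes:
    """Step 2: the form-encoded body a publisher POSTs to the hub after an update."""
    return urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")


feed = '<feed xmlns="http://www.w3.org/2005/Atom"><title>My blog</title></feed>'
print(add_hub_link(feed, "http://hub.example.com/"))
print(build_publish_ping("http://example.com/feed.xml"))
```

In a real publisher the ping body would be sent with a `Content-Type` of `application/x-www-form-urlencoded` to whatever hub URL the feed advertises.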

 

6. The service works great for relatively low-volume RSS/Atom feeds.  But as we move away from that format towards update-intensive stream-based data, how does your system scale under periods of heavy duress with feeds like Digg's 'Popular Stories' or firehoses? 

Well, there's heavy and then there's heavy. I think we could pretty much handle Digg's "Popular Stories" feed, but we couldn't handle Twitter's firehose.  PubSubHubbub, given the fact that it's built on HTTP, can't go seriously and reliably under a few seconds of latency. So feeds like the Twitter firehose couldn't really work (at more than 1 entry/sec, it's not reasonable to expect anything just yet). However, all feeds can be seen as an aggregate of many other feeds. The Twitter feed is nothing more than the sum of all the user feeds, and, except maybe for Robert Scoble (kidding!), a user wouldn't update his Twitter feed more often than once every few seconds.  And then, we're good to go!

 

7. Superfeedr now promises to provide notifications to feeds "within 15 minutes, or it's free".  What continue to be some of the challenges of trying to deliver instantaneous alerts about content updates?

This 15-minute guarantee comes from the fact that we still have to do some polling in the worst-case scenario, and if we have to do polling, going much less than that is hard to do. So our approach is to decrease the average detection time rather than the maximum. If for 95% of our feeds we can guarantee 1 minute, then who cares about the 5% that are guaranteed to be below 15 minutes?

As long as there is still content that isn't pushed anywhere, we will have no way to get it without polling.
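(My own back-of-the-envelope on Julien's point about averages: if 95% of feeds are detected within 1 minute and the remaining 5% within 15 minutes, the mean detection time can be no worse than the weighted sum of those limits. The 95/5 split is his hypothetical, not a published figure, and the function name is mine.)

```python
def mean_detection_bound(tiers):
    """Upper bound on mean detection time, given (share_of_feeds, max_minutes) tiers."""
    total_share = sum(share for share, _ in tiers)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * minutes for share, minutes in tiers)


# 0.95 * 1 + 0.05 * 15 = 0.95 + 0.75
print(round(mean_detection_bound([(0.95, 1), (0.05, 15)]), 2))  # 1.7
```

So even with the slow 5% sitting right at the 15-minute ceiling, the average stays under two minutes, which is why chasing the average makes more sense than chasing the maximum.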

Thanks Julien, and congratulations, again! Good luck in all your work!  :-)



Sunday, February 14, 2010

Breaking into The Worldwide Leader - my ESPN interview

You need only to spend five minutes with me to pick up on my affinity for sports, and a few more moments will inevitably lead to my fancy with ESPN.  As a lifelong sports fan, athlete, broadcaster, writer and trivia buff, it's literally the ultimate gig for people in my line of work.  Recently, the TV sports network considered me for a position; while the process didn't work out in my favor, I'm motivated to document my courting (pun most certainly intended).

While I won't divulge the particularities of the interviewing process, I will use this space to give aspiring and prospective staffers of the self-proclaimed "Worldwide Leader in Sports" an abstract view of what to expect when interviewing.  Hey, I'm not working there and I signed no non-disclosure agreements, so all's fair in love and human resources, right?  :-)

So last week I get an e-mail from an HR staffer letting me know I'm being considered for the organization's vaunted Stats & Information Group.  The process, I was quickly told, involved me securing a timeslot in a published schedule for a quick oral exam over the phone on general sports knowledge.  I booked such an appointment, which wound up being the next morning.  The exam was administered by the HR staffer himself, taking 7 minutes and consisting of 5 questions.  You really can't prepare for the grilling - by the company's own admission, it's trivia. You either know the material or you don't.

I assume the questions change from time to time and candidate to candidate, so I'll say only that the inquiries probed my acumen of baseball stats (dissecting the mathematical formulas for statistics), college and pro football (significant events, all-time leaders), golf and the NBA.  While I flubbed the baseball section, which in my case was the first question, I rebounded well and apparently did well enough to allow me to advance to the second phase - the written assessment.

The next stage was administered the very next day, and was a "speed drill exam" - a series of categorical questions on sports math, assuring accuracy in data from wire reports, general knowledge of players' collegiate associations, player names, terminology and esoteric rules.  The exam is long, and you only get 45 minutes to complete as much of it as you can and then return it to your proctor within the time limit.  It's open book/web, but the more time you spend looking up stuff the less time you're working on the exam.  

I drew from my SAT days and skipped an entire section I was spending a little too much time on to get to stuff I could breeze through faster.  It was tons of fun, and very draining.

I was informed two days later that the conclusion based on my scores was that I wouldn't be proceeding to the next phase, which is actual phone time with a hiring manager.  I was down, but still grateful for the opportunity, and hopeful another might pop up in the near future.

To his credit, the gentleman helping me was incredibly gracious and patient; he answered all my questions honestly and responded quickly.  When it was determined that I didn't have what it took to proceed any further along the interview trail, I was informed right away.  This was actually the second time I'd interviewed for a spot - the first, for a JavaScript developer position in 2005, wasn't as enjoyable, being run by a junior staffer who wasn't into sports at all.  Needless to say, this time I had much more fun and encourage anyone interested in a career in sports at that scale and scope to give it a shot.

If you're up for a challenge, the testing alone is worth the price of admission.  So to speak.



Book review: Professional XMPP

Book review: Professional XMPP Programming with JavaScript and jQuery

Since the Jabber project officially became XMPP, this book is the second official tome of knowledge on the subject, and the first to specifically concentrate on XMPP development within the vein of leveraging the protocol to power low-latency web-based applications.  Author Jack Moffitt does a tremendous job of introducing, working with, and building systems based on the often-confusing but critically important topic of creating online experiences based on real-time. 

The book focuses on using Bidirectional streams Over Synchronous HTTP (BOSH) for empowering real-time communications over the web.  The basic layout of an infrastructure to support such systems over current web technologies is dissected, and the result is one of the better discussions on the topic.  This is helpful given the pushback many web devs have typically expressed about embracing a new technology stack.  

After well-written overviews of XMPP, its lifecycle and requirements, the book is all about using BOSH to build practical, real-world demos.  The examples are based on Strophe, a JavaScript library written by Moffitt, using a surprisingly simple and consistent pattern even beginning developers can pick up and be productive with in their own projects.  It's code you can easily understand and use for your own work.

The book is divided into 14 chapters that won't take you all day to read and follow along with.  Each chapter is about 20-30 pages, intelligently written, logically organized and appropriately enhanced with URLs, illustrations, screengrabs and syntactical explanations to support the subject.  Moffitt's voice is very friendly, and the chapters are long enough to give attention to the topic at hand, but not so drawn out as to be boring.  You can tackle each demo in a single sitting, run the code, and then expand upon it.

The appendices are also extremely helpful, focusing on introduction to the jQuery framework, and working with BOSH connection managers.  Both are very concise and helpful (although I would have appreciated an additional appendix that gets more in-depth on working with Strophe).  

As far as the book's physical qualities, the Wrox binding is sturdy with thick paper, so it'll survive the process of violently flipping back and forth and forcing the book to lie flat as you work through the examples.

In short, Professional XMPP explores just a corner of the full range of services you can create with XMPP within the browser - not merely IM for the web.  That serves as the underlying theme and key message of this book: you can create great, powerful, push-based apps with the universally familiar toolset of HTML, CSS and JavaScript.

It's a great read for anyone wanting to get up and running with XMPP for the web, and will make a very welcome addition to any developmental library.



Saturday, January 23, 2010

mea culpa (well, not really)

my apologies to all of you who've had to endure the sequential and obviously annoying posts over the past 12 hours or so. i'm testing Superfeedr's responsiveness, which lives up to its billing. sorry for the excessive noise. still love me?

besides, people, it was only a few posts. 7 to be exact. :p

Sunday, January 10, 2010

Inadvertently gaming real-time search services

One duly noted facet is the critical importance of relevance in a real-time search query's resultset.  And within that, a criterion with serious weight towards determining such relevance is frequency - if a post is being re-posted by other members of the community and across other platforms.  It's the latter that draws my ire this morning.

Traditional search systems succeed based on their ability to leverage indexing of resources based on the direct correlation of time - the longer a document exists online, the better its ability to appear in a search engine due to other resources referencing it, and the more it's discussed/retweeted/copied/indexed by other services/etc.  Obviously, when displaying content that's just been created mere seconds ago, this isn't the case.

A recurring concern for those developing real-time search applications is unscrupulous spamming of such systems - those people (or 'bots, usually) jumping on the popular trends of the moment to draw attention to their own devices.  But how about good-natured, well-meaning folks who simply want to publish their data across multiple platforms?  Does multi-service publishing inadvertently influence relevancy algorithms?  

For instance, suppose I author a thought on Twitter having to do with a certain topic - say, I post "I really dig the new CD by Iron Maiden" - and use a service like Ping.fm to replicate my post across other social platforms like Facebook, del.icio.us, LinkedIn, Friendster, MySpace, Identi.ca, FriendFeed, Posterous, Blogger and Jaiku.  In that case, I'd be creating the same data at 11 different endpoints on the Internet, each with its own unique URL.  

Real-time services harvesting real-time feeds would pick up the fact that Iron Maiden-related posts had suddenly spiked, and if that continued within a given time period, create a trend.  Therefore, could those systems that assign relevance to topics based on the number of times a post is referenced be gamed by me just trying to get my stuff out over the maximum number of platforms?

At a macro level, would this affect the chance that I could personally influence an entire trend?  Or at the micro level, could I escalate my own thoughts higher within search results?  At the very least, the result from a user experience perspective would be that I'd just be inadvertently making results pages a whole lot noisier by diluting search subsets with the same stuff.  Whether I did so deliberately or accidentally, neither is optimal.

Fortunately, such an apocalypse hasn't befallen us just yet.  Only a few social services publish public firehoses at the moment, Twitter being the canonical example, and several other services have been set up by real-time providers in order to pulse-track them.  MySpace and Facebook are both coming online very soon, and the growing popularity of XMPP within the development community will make it easier for publishers to make their data payloads accessible with low latency.

But such gaming might be something to think about going forward.  Thoughts?



Saturday, January 09, 2010

7 Questions for Collecta

Collecta is the latest creation from Jack Moffitt, a pioneering Albuquerque-based open source developer who has spearheaded the migration of real-time experiences on the Web with XMPP.  The service has carved out a place for "curated search", harvesting items across the Internet, displaying them mere moments after they're published online - without latency.

He's also authored the new book "Professional XMPP Programming with JavaScript and jQuery", and has given numerous talks about some of the opportunities and challenges inherent to managing real-time search.  Here's 7 Questions with Jack.

1. One of the most impressive features of Collecta is its ability to harness updates not only from the usual suspect microblogging platforms like Twitter and Identi.ca, but also from news sources and blog posts/comments, as well as through Flickr for images and video.  Describe how your back-end works. 

The main difference between the back-end at Collecta and those of most other search systems is that Collecta is almost entirely push-based. For example, instead of a crawler (which visits web pages on a periodic basis), a publisher notifies us about content at the same time that they publish it to their own sites.

When Reuters sends a new story out to their own sites, they also send it directly to us, and that content shows up immediately to any matching searches.

Of course, not all publishers are ready for push notifications, but we have also built specialized polling systems that bootstrap those data sources. We've also worked with a number of publishers to develop and deploy their push systems. For example, we helped Wordpress.com develop their XMPP infrastructure, which makes all Wordpress.com-hosted blogs and blog comments available as a publish/subscribe stream.

The primary advantage of this design is that the latency between the time someone publishes content and the time we see it is very nearly zero. We typically measure latencies of a third of a second or less on push sources.

2. What have some of the challenges been in managing so much inbound data?

The biggest challenge is getting access to the data. Since few publishers have working push systems today, we must work with them to develop such systems. Generally they are quite eager to do so, as they know that they need to provide their data ever faster in today's world. We may be the first to be knocking on these doors, but it is easy to see that there will be a lot of others after us wanting this data.

The next challenge is processing the data as fast as possible. In traditional search, relevancy is improved over time. For example, it may take several weeks for a new Wikipedia article to get enough incoming links to start showing up as very relevant. In real-time search, we can't afford to measure relevancy this way. One side-effect of this is that all the algorithms involved must be well tuned so that they don't introduce any delays. There is no buffer zone for processing.

3. I see Collecta being valuable to people in three distinct types of situations: for scheduled events with global importance that people can anticipate and plan for (e.g., Christmas, the Super Bowl, the Emmy Awards); for breaking/developing items that people can't predict or didn't expect (e.g., celebrity deaths, national tragedies, etc.); and also for the general entertainment value in seeing things pop up as they happen.  How have people told you they've used your system?

These categories match up well to several use cases we've heard about. Here are some examples:

One of our developers used Collecta to find information on local road conditions that was unavailable through other means. This got his wife to her destination on time by alerting her to closed roads and flooded areas.

We've heard stories of people researching items they were thinking of purchasing and finding deals and coupon information that made their purchase cheaper. Imagine that you're thinking about buying my book, and you search for it on Collecta. Someone may have just been talking about how they just bought it at some specialty site for a discount. You can then take advantage of the same deal.

My in-laws are farmers in rural Minnesota. They used Collecta to find out about issues affecting the grain markets that weren't covered in their traditional news sources for several days. This allowed them to structure their financial transactions to accommodate this. You could imagine similar stories playing out for any market trader.

One of our internal use cases is watching how people respond to our own announcements and features. When we launched Collecta we were all glued to it, and there was no other way to watch how people reacted to our product. Before we launched, we used Collecta to monitor President Barack Obama's inauguration, and the photo stream in particular was stunning and much more interesting than what you could get from TV.

I think you've left out one big category, though: discovery. We call this thing real-time search, but the name is really inadequate. It's the closest analogy we could find that would give potential users some idea of what to expect. Once you've aggregated all the data, you have some real possibilities that aren't really search-related in the traditional sense. For example, when you visit Collecta.com lately, the page is filled with current, trending topics, all with the most relevant articles attached.

This category is more like the function of newspapers and nightly news. It gives you an overview of what is going on in the world, even if you aren't sure what you're looking for.

We also plan to expand this to determine trending information and relevant summaries within any search you do.

4. One concern constantly popping up about real-time systems in regards to current events is the need to group data based on the authoritativeness, a la Google News, wherein more credible resources are assigned a greater weight based on the assumption that they're more accurate. Seeing as how reporting results as they happen is naturally linear, what are your thoughts?

In general, I think authority is a tricky subject. Most of the authority-based relevance systems are pretty bad; they often apply authority over a broad range of topics when a source may only be authoritative for a small set. This is not a limitation of technology, though. The same situation happens in the real world. Who is the authoritative source for information on healthcare reform in the U.S.? I think you'll find many people can't agree on this.

The way this is dealt with is by applying time. Eventually some source or set of sources will be deemed authoritative and this becomes the history we read about. The time scale here is quite large though, easily decades in some cases.

I don't think you can do a great job on authority alone, and I don't think it applies as broadly as people think. Finally, I don't see how you can resolve authority issues in a fraction of a second, when the only method I know of requires significant amounts of time.

The case may be different for a particular knowledge domain, but I still think even in these cases, authority is relative to a worldview.

5. Seeing as how we're in the embryonic stage of working with real-time information at a consumer level, many of the first-generation services are based on search and filtering of large data sets.  What other types of apps do you predict being popular as this model continues to evolve?

We're very interested in the idea of participation. Once you can inject your own content into the real-time flow, the world becomes a conversation, not a search. An example I often use is that the television network ABC would have a tough time getting all the people in the world to a single place to chat about its show Lost (not to mention getting them there at the same time). However, the Collecta search engine can aggregate every conversation about Lost in a single place. If you imagine a chat room interface where the other participants are made up of incoming search results, you can see where I'm going. You now have an Internet-wide chat room on any topic whatsoever, filtered and customized however you want. 

Data aggregation is not easy, and I think it will enable a host of applications that depend upon access to large amounts of data. For example, currently TweetDeck and Brizzly and the other social media clients all implement the Twitter API, the Facebook API, etc. to get data to their users. You can see this clearly in their UIs as well; each service is boxed by itself. Imagine how easy it might be to make an integrated social media client when access to the data is uniform. This is just a simple example, but there are more complex situations that data aggregation can enable.
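(An aside from me: here's a rough sketch of what "uniform access" buys a client developer. Once every service's payload is mapped to one record type, a single merged timeline is just a sort. Both payload layouts below are invented for the example; they are not the real Twitter or Facebook response formats, and `Item`, `from_service_a` and `from_service_b` are hypothetical names.)

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Item:
    source: str
    author: str
    text: str
    timestamp: int  # seconds since the epoch


# Each adapter hides one service's (invented) payload layout.
def from_service_a(raw: Dict) -> Item:
    return Item("service-a", raw["user"], raw["body"], raw["ts"])


def from_service_b(raw: Dict) -> Item:
    return Item("service-b", raw["from"]["name"], raw["message"], raw["created"])


def unified_timeline(a_items: List[Dict], b_items: List[Dict]) -> List[Item]:
    """Normalize both sources, then show one stream, newest first."""
    items = [from_service_a(r) for r in a_items] + [from_service_b(r) for r in b_items]
    return sorted(items, key=lambda item: item.timestamp, reverse=True)


timeline = unified_timeline(
    [{"user": "jack", "body": "Lost is on tonight", "ts": 100}],
    [{"from": {"name": "jason"}, "message": "Watching Lost", "created": 200}],
)
for item in timeline:
    print(item.source, item.author, item.text)
```

With an aggregator in the middle, the adapters live in one place instead of being rewritten inside every client, which is the point Jack is making.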

6. Like any emerging technology, some enterprising folks will develop frameworks to abstract/automate a lot of the tedious minutiae of working with XMPP over HTTP - you've released Strophe both as a C and JavaScript library.  How do you see this maturing in the near future?

Eventually we will reach a point where web browsers don't need long polling and other similar hacks to do push-based or two-way communication. I don't know if this will be realized in the form of XMPP support in browsers or whether it will be in the form of the HTML 5 WebSockets protocol or something entirely different.  On the client side, libraries like Strophe already make a lot of the tedium of the process invisible.

The server-side is just as important though. XMPP and WebSockets require more than just a simple web server. I think we will see growing sophistication in that space to match what we've seen with web application servers like Django, Ruby on Rails, and others.  Soon we will have XMPP application servers and WebSocket application servers, and the frameworks like Rails and Django will seem as quaint as server-side includes and CGI scripts.

7. It would seem logical that you're going to get a lot of requests for custom app development based on the real-time paradigm for industries like media, politics and sports.  Have you, to this point?  (At the TV news station I work for, I'm planning to have Collecta running non-stop in our lobby on a large plasma HDTV as eye candy for visitors, with posts relevant to the local current events scene.)

Yes. As I said earlier, we talk to many of these publishers about getting them set up to push content instead of just making it available via pull methods. It doesn't take long for these publishers to turn the questions around and start asking us how they can leverage real-time in their own properties.

Sometimes they want easy, aggregated access to their own content, and sometimes they are intrigued by the possibility of integrating a view on everything else on the web.

We're developing several products that will make this easy for these publishers and for anyone else to build what they want. The first such products we launched were the XMPP API and HTTP API for Collecta search results. I can't name any names, but we give out API keys to well-known publishers all the time.

One of the common requests is for "curated search" products. The publishers want to control the queries that happen and just provide an updating view, or they want to restrict the input sources to a specific knowledge domain. You can imagine a sports version of Collecta where both the queries and the data sources were restricted or augmented to facilitate a focused experience. This is pretty much everyone's first application idea with our technology, and it's one we are really excited about facilitating.

You can see one of our early prototypes in this area, which we designed for the Obama inauguration. Our recent launch of MySpace site search is another example. The former is limited to a specific query on the full data set, and the latter is scoped to a specific data set but allows arbitrary queries.

There is a lot of unexplored ground here, which is what makes the work extremely fun.

Thanks Jack!  Great feedback and good luck with your work in the real-time search space! :)


Posted via email from jasonsalas's posterous


Monday, January 04, 2010

Design decisions: real-time apps on mobiles & data lossiness

One of my favorite words in graphic design and networking parlance is "lossiness". For those distinct disciplines, the term cleverly denotes the amount of degradation in quality an image can sustain, and how packets may be dropped while routed across a network as a means of faster delivery, respectively.

 

Either way, lossiness implies efficiency as a performance driver towards accomplishing a goal.

 

I used my New Year's Weekend to really apply some serious thought to the multidimensional problem space of real-time search applications for the web.  I wanted to attack the issue from two specific angles - focusing on the feasibility of low-to-no latency systems for mobiles, and the rate of lost items in a real-time resultset.


 

Method 

Since I was using a late Saturday afternoon to conduct my experiment, I had several available devices and the full bandwidth of my company's Internet connection at my disposal.  My LAN doesn't proxy access, so each device's responses were autonomous. Additionally, the iPhone and netbook got online via WiFi; all other devices were hard-lined to a hub.

 

 

On all devices I launched Collecta at the same time, querying “New Years OR NYE”.  I find that Collecta's vertically-scrolling collection of asynchronous items is the most logical interface for this type of experience. If items are going to be found as they're being published online, it stands to reason that they would be displayed that way without repetitive user-initiated page refreshing.  It's also one of the more comprehensive real-time search platforms, listing tweets, Flickr images, videos, news articles, and blog posts/comments.  Collecta's also a great practical use of Bidirectional-streams Over Synchronous HTTP (BOSH), the XMPP extension that facilitates real-time push for web apps.

 

I've always wondered about the reliability of real-time services on wireless devices, not only because of the lack of proper JavaScript support in mobile browsers to handle the repetitive XMLHttpRequest transactions, but also due to the bandwidth requirements for so many inbound payloads. 

 


Results 

While Collecta held up beautifully (rendering seamlessly on the iPhone), the results for each digital device were noteworthy, indicating that the better the specs on the machine, the more voluminous the subset of found items.  Specifically, in a span of the first 11 minutes after accessing the endpoint, my servers reported a resultset containing more than 390 matched items.  My desktop tower had slightly fewer, my laptop a little less than that, and my netbook a little less, with the iPhone only getting about half of the server's matches - all from running the same URL. It didn't appear as if the lagging devices had some sort of queue to catch up to the larger load - missing posts/images/status updates simply disappeared into the ether.

 

So that brought to light the question of whether processing power, memory, bandwidth, and/or browser vendor/version make a difference in the droppage factor - how many items get passed up due to constraints or limitations imposed by those criteria. 

 

(Notable other discoveries: after running a hugely popular Collecta trend pulse, “christmas”, for 9 straight hours on a Windows computer, Firefox occupied more than 1.3GB of RAM and eventually crashed.  Collecta also uses a large number of HTTP conversations, as compared to Google's real-time search product.)

 

  

Conclusions

The data gathered implies that the loss rate of real-time search items in apps like Collecta may be inversely related to the capabilities of the requesting device.  This makes perfect sense - the increased client-side processing cycles necessary for rendering rich, dynamic UIs delivering rapidly changing data streams may impact the ability of a system to reflect the true pulse of messages being filtered down from its source(s).  Thus, devices with reduced memory, computational power and network connectivity may be subject to higher rates of dropped items than machines of greater ability. 

 

Specifically, mobile devices attempting to run real-time search apps may incur significant performance hits due to the increased processing.  The additional computations from heavy AJAX-based functionality should be considered in applications where data is reported as it happens.

 

Simply stated: a faster computer means faster results, which means more results, which means a more accurate picture of the Real-Time Web.  But all is not lost - there's a method to this madness.

 

It seems that XMPP over HTTP might not store undelivered messages for future delivery.  This is logical, seeing as how an escalating loss rate over time could create massive queues that would backlog a user's entire experience and be counterproductive to the real-time nature (showing content from hours ago while catching up).  While not optimal, being lossy is actually safer. 

 

This may be savvy and deliberate preventative engineering due to the fact that such a large load of messages coming in will, at some point, clog the system; messages unable to make it into the UI are simply ignored to avoid such backlogs.  As a loose analogy, think of the difference between UDP and TCP in terms of managed lossiness.
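To picture the "drop it rather than queue it" idea, here's a minimal sketch in Python of a bounded buffer that throws away its oldest items once it fills up. This is purely illustrative and is not how Collecta or any BOSH client is known to work; the class name `LossyBuffer`, the capacity of 3 and the item labels are all made up for the example.

```python
from collections import deque


class LossyBuffer:
    """Hypothetical bounded buffer: when full, the oldest item is discarded."""

    def __init__(self, capacity):
        # A deque with maxlen evicts from the left once it is full.
        self._items = deque(maxlen=capacity)
        self.dropped = 0  # count of items that never reached the UI

    def push(self, item):
        # If the buffer is already full, this append will evict one item.
        if len(self._items) == self._items.maxlen:
            self.dropped += 1
        self._items.append(item)

    def drain(self):
        """Return whatever is currently buffered and empty the buffer."""
        items = list(self._items)
        self._items.clear()
        return items


# Five items arrive before the (slow) client gets a chance to render any.
buffer = LossyBuffer(capacity=3)
for n in range(5):
    buffer.push(f"item-{n}")

latest = buffer.drain()
print(latest)          # the three freshest items
print(buffer.dropped)  # how many were silently lost
```

The trade-off is the same one described above: the client always shows the freshest items and never falls hours behind, at the cost of never seeing whatever overflowed.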

 

I'm still torn on exactly how bandwidth on mobile devices affects real-time apps.  With proper buffering, modern wireless devices are capable of streaming full-motion video from YouTube; conversely, real-time apps based on XMPP are systems pushing bits of hypertext from server to client.  So the assumption would be that mobiles should be able to sufficiently handle such inbound traffic.

 

The counterargument, especially for apps tracking an unbounded number of trends whose popularity varies from moment to moment, is that connectivity comes under duress: imagine trying to push 35 IMs per second to a smartphone for an entire minute.  That's 2,100 instant messages.  And most hot topics are a lot noisier for a lot longer.  For most commercial forms of connectivity, something's got to give.  Message overflows could be conveniently discarded, making room for fresher items being harvested from across the Web to get their chance at being displayed, keeping the UI "inconsistently consistent" with the true general pulse of the community's response to some event, if not accounting for every single atomic-level response, thought, image, comment or video.

 

More discussion is clearly needed in this arena.

 

Keep in mind that mine was a trivial experiment.  For simple searches, the loss of information in this regard is of no great consequence; but scaled up and for more mission-critical applications with sensitive data, the network effects could be significantly more damaging.

 

So it might be the case with systems like Collecta that lossiness for resultsets is merely a happy little accident.  Regardless, take this as an object lesson in how to do things the right way.  The analogy of the patient dying but the operation being a smashing success applies - intelligently manage throughput to keep pace with the community overall.



Wednesday, December 30, 2009

Only half the real-time web equation has been solved

Typically with any new technology platform, bark outweighs bite.  So often a new shift in thinking gets the online community so jazzed just based on raw potential, everyone buzzes about how cool, efficient and profitable the new paradigm could be.  All we need is a sound case study to reinforce our suspicions and support our theories.

It's apropos that I used the holiday break to track the emergence of Real-Time Web applications.  The relevance came in the form of me being able to track the pulse of "Christmas" as a searchable topic, seeing as how that specific keyword involved a scheduled event of global importance, and had enough intro/outro traffic for a couple of days before and after the actual holiday that it made the pace of updates in services like Collecta and Google reflecting real-time publishing incredibly entertaining to watch.  (Not to mention the breakneck pace of posts on the holiday itself.)

There were literally hundreds of thousands of tweets, Flickr images, news stories and open microblogging platform updates all having to do with the events surrounding December 25 all over the planet.  

The real-time search services performed beautifully, living up to their hype, handling a mammoth amount of work and delivering a user experience that carried high entertainment value, if not some sort of usefulness.  That's significant, because Christmas is naturally something we collectively knew about and could predict.

But that was only half of the equation.  Given its annual nature, everyone saw Christmas coming and mentally prepared to post all sorts of media to the Internet ad infinitum.  What we now need as the second half of the litmus test is for real-time search services to properly handle a major breaking story.

Twitter gained international credibility (and notoriety) by allowing its users to interact with the events unfolding around the plane crash in the Hudson River, the political strife in Iran, and the untimely passing of Michael Jackson. If the new generation of real-time search tools can effectively harvest and report coverage of a news event of similar magnitude, with the publishing load imposed by the worldwide social networking community at an unprecedented scale, we'll have completed the cycle.  I believe they will.

And thus, we will have achieved our case study.



Wednesday, December 23, 2009

The Realtime Paradigm & media shrinkage

Much these days is being written, (re)tweeted, posted and discussed about the crop of new services that make the first iteration of what's being called the Realtime Web possible.  Dealing primarily with search, a crop of enterprising platforms spearheaded by Collecta, Taptu and Google rapidly deliver a user experience that is unique: being able to see information literally within seconds of it being published online.

So while as a developer I salivate at the prospect of being able to harness new emerging platforms and the forthcoming generation of abstract frameworks that will empower me to write my own systems centering on an instantaneous feedback loop, the marketer in me wonders how I can exploit such technologies into valuable commodities.  But those concerns are trumped, perhaps more importantly, by the journalist side of my persona as I question the future of media as we know it.

I'm dubbing this balance "The Realtime Paradigm": the symmetry between those who generate content and those who receive it.

The premise central to making a Web that includes discovery of new information, sans latency, is brevity.  The delicate synergy between content creator and content consumer relies on the pace of the former's ability to generate data and the latter's ability to take it in.  Obviously with the overwhelmingly exponential expansion of the 'Net's major social networks, the demand for knowledge is insatiable.  

It's unfathomable to think that even infinitely resourceful organizations like the mighty CNN would be able to crank out multiple 900-word essays and articles fast enough to satisfy the online audience's appetite.  Likewise, most people have neither the time nor the desire to read a collection of aggregated feature-length compositions. It's too much work.  This also technologically enables more mobile adoption of platforms, working within the confines of screen and bandwidth limitations for handsets and smartphones.

So the next step becomes the impact on traditional media products - a regressive extension, if you will.  How will print media (already in the final stages of its own death throes), radio, television and even existing online platforms adopt this hyperaccelerated production cycle?  But before skipping merrily down that path, consider the shifts towards more condensed packaging that mass media has already seen.

You almost never see a double-feature movie anymore.  Several of Cartoon Network's shows in its Adult Swim programming block are only 11 minutes long.  The FX network features 3-minute recaps of that week's episode of its drama series after they've aired.  Songs on the Top 40 chart are getting shorter.  The average story in a newscast isn't as long as it used to be. Talk radio programs aren't as long as they used to be. Magazines have become more terse with their offerings.  

Clearly, consumer behavior has driven us into the Age of the Short Attention Span; those tasked with developing the information they rely on have to react accordingly if they are to leverage The Realtime Paradigm.

From a content creation standpoint, Twitter has been pretty revolutionary in (de)volving the way we've become accustomed to communicating.  I've said many times over the last year that as a professional broadcaster, anything I write now longer than the canonical 140 characters seems like an epic.  And this is, and will be, key to how realtime systems flourish.  We can keep our stuff of high quality, but in so doing make our material shorter and punchier, delivered in tasty bite-size morsels.

Media's not going flaccid by getting more condensed. It's getting more valuable.  Consider this: it took me 20 minutes to write and post this piece, where I could have generated 10 tweets.  You've made it this far, but which would you have preferred?

So keep this in mind as we head into 2010 and the Realtime Web continues to take off.  The popular 'Net-lexical acronym "KISS" may be in need of revision, from here on to be understood as "keep it short, stupid".  ;)



Monday, December 21, 2009

Treatise on social graph utility

"Ugh. Sorry I was late - I've been catching up with Facebook."

I've given up trying to keep track of the number of times I've heard friends, family and colleagues utter this all-too familiar sentiment in recent history.  It's evidently become an acceptable burden that society bears of weeding through what can be miles upon miles of posts from users within one's social network. 

In my own foray with the Social Web, I've deliberately kept my friend/follower lists small.  For me, the true value of my social graph is its brevity - quality content over inundation.  Don't get me wrong, I like discovering new people and what they have to say, but I'll eventually drop someone from my roster if their stuff isn't doing it for me.  Drowning in empty chatter is just pointless to me.  And that's a skewed indictment on how the way we interact with each other has (d)evolved.

The Web is no longer an interconnected network of hypertext-based documents and media files; it's become a subsystem of the digital projections of human beings.  The somewhat spurious utility a user obtains from their social graph - the collection of users within their social network and the connections between them - led me to question a theory I developed earlier this year.

I proposed an extension to Metcalfe's Law - the fundamental principle defining the value of all networks - to incorporate the added utility generated in a social application.  (Do I know how to party or what?)  Essentially, I concluded, the personal value of your social network increases even further not just with the nodes added (your friends/followers), but also with the extension in the outward users that, by association, you become connected to.  This overall value can be quantified by applying a coefficient to the initial formula n(n-1)/2.  

(That is, if you're into that kind of thing.)
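For anyone who is into that kind of thing, here's a rough Python sketch of the arithmetic. The n(n-1)/2 term is just the count of distinct pairwise links among n nodes; the `coefficient` parameter and the function names are placeholders I'm assuming for illustration, since the right value for the coefficient isn't something this post pins down.

```python
def pairwise_links(n):
    """Number of distinct pairwise connections among n nodes: n(n-1)/2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


def social_value(n, coefficient=1.0):
    """Hypothetical extension: scale the link count by a social coefficient.

    A coefficient above 1.0 models extra value from friends-of-friends;
    a coefficient below 1.0 models a network diluted by noise.
    """
    return coefficient * pairwise_links(n)


print(pairwise_links(10))     # 10 * 9 / 2 = 45
print(social_value(10, 1.5))  # 45 scaled by an assumed coefficient of 1.5
```

Treat the numbers as a toy: the point is only that the base term grows roughly with the square of the network size, and the coefficient nudges it up or down.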

So a thought hit me while paging through friend requests on various apps: for me, the aggregate value of my social network would actually decay were I to add each and every person that solicits my connectivity.  I don't stay logged in all day reading posts, so I choose to experience my friends' activity in short info-bursts.  Were I to allow everyone, I'd be diluting my lifestream, degrading the overall experience and taking away what little of a life I have now.

As I see it, there are three major classifications of social network user on the Internet at the time of this writing:

1 - The Serial Friend-Adder: someone who prowls the safari that is cyberspace, actively hunting for any and all connections, happily rapidly extending her social graph outwards; she derives primary value from the sheer number of friends/followers she's amassed.  Her main source of pride is being able to say "I have [X]-thousands of friends!", caring less about the quality of the content generated thereby.  Her bragging rights are determined by volume.
2 - The Audience Expander: someone who meticulously increases their friendlist, but only as a means of imposing their own will on the world.  What gets this type of user off isn't so much the volume of users, but the quality of those that will receive their stuff and pass it onto others.  Think of this as the crass capitalist or dictator.
3 - The Pragmatic Pessimist: a reserved, happily sheltered, introverted online user, who cautiously rejects more friend requests than she accepts.  Her major utility is extracted from the efficiency resulting from a reduced number of posts through which to filter.

In case you haven't figured it out, I fall into the third classification.  My philosophy is that if you're not generating any information that motivates, educates, entertains, inspires, angers, titillates or otherwise makes my life better, you're wasting my damn time.  And to me, this is significant.  But not all social networks are the same.

Let's look at the mothership for the modern social network: Facebook.  The mighty platform/internet within the Internet/online community/online operating system uses a bidirectional model in which a user who allows a requester to be his friend gets access to that person's stuff, too.  So for every nth user added, you potentially affect your personal system's value by a factor of 2 (either increasing or decreasing it).

Now consider the model employed by Twitter.  The microblogging platform doesn't require a user to follow those who follow her.  So you have a fragmented system of interconnectedness, wherein a celebrity with 2.5 million devout followers hanging on their every word may realistically only follow 35 people - and not necessarily from that subset.
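The difference between the two models is easy to see in a small data structure. Below is a minimal Python sketch, not a description of how Facebook or Twitter actually store their graphs; the `SocialGraph` class, its method names and the user names are all invented for the example.

```python
from collections import defaultdict


class SocialGraph:
    """Hypothetical graph where each user maps to the set of users they follow."""

    def __init__(self):
        self.following = defaultdict(set)

    def follow(self, follower, followee):
        """One-way edge, as in the Twitter-style model."""
        self.following[follower].add(followee)

    def befriend(self, a, b):
        """Two-way edge, as in the Facebook-style model."""
        self.follow(a, b)
        self.follow(b, a)

    def followers_of(self, user):
        """Everyone who has an edge pointing at this user."""
        return {u for u, targets in self.following.items() if user in targets}

    def is_mutual(self, a, b):
        """True only when both users follow each other."""
        return (b in self.following.get(a, set())
                and a in self.following.get(b, set()))


graph = SocialGraph()
graph.befriend("ann", "bob")   # bidirectional: both see each other's stuff
graph.follow("fan", "celeb")   # unidirectional: celeb never follows back

print(graph.is_mutual("ann", "bob"))    # True
print(graph.is_mutual("fan", "celeb"))  # False
print(graph.followers_of("celeb"))      # {'fan'}
```

In the friend model every accepted request adds two edges at once; in the follow model the two directions are independent, which is what lets a celebrity have millions of inbound edges and only a handful of outbound ones.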

But of course, interacting with a social network, in line with society in the real world, is completely subjective.  The signal-to-noise ratio of social applications is something everyone determines for themselves.  I know people who are absolutely delighted to sit and read pages upon pages of posts, just for the entertainment value, albeit petty.  Cool.  Whatever floats your boat.

And of course, the converse applies.  Expanding your followerbase logically increases the chance that you can discover some really cool inbound things, or be able to pass on neat things to others.  So it's a sticky wicket to manage.  Go with what model works best for you.

And realistically...who thinks about quantifying the utility of their social graph?  Shut up and read.  ;)



Saturday, November 28, 2009

7 Questions for TweetBookz

On the day before Thanksgiving, I was using Twitter's web interface (which, coincidentally, I rarely use these days), when I discovered a curious link to TweetBookz in the vaunted right column. Investigating, I found an online service that cleverly archives a user's microblogging history, assembles the data in a printed form and prepares it as a personalized book, suitable as a gift. I instantly found it to be intuitively clever and full of potential, if not overtly innovative.

Showing it off to several friends in the past couple of days IRL and via numerous tweets I've authored, I've found it to be quite polarizing. People either fall head-over-heels for it, or don't get it at all. But people are talking about it.

Here's 7 Questions with Tweetbookz co-founder Jacob Shwirtz:


1. Describe how Tweetbookz came to be and what gave rise to this service.

We spend a lot of time thinking about and being active within social media on behalf of our clients, such as Zagat Survey (check out @ZagatBuzz). As a result, we started imagining what it would be like if Twitter were an “offline” thing and what it would be like to treat Twitter as a book. From there the idea evolved into something more like a book of poetry or inspirational quotes, as opposed to a full archive. We thought it would be very fun to have your favorite tweets in a beautiful book on your coffee table, so that’s the idea we ran with.


2. What's your sales pitch and what counterargument would you lob at someone who says "this is just for people too lazy to print"?

People seldom think about their history on Twitter and go back to see the things they wrote a week, month or year ago. Our books are a fun and nostalgic way to reminisce about those things. It's less about narcissism, as some may think, and more about nostalgia.

We’re giving permanence to something so ephemeral and from that comes a cool, unique and unexpected product.


3. What logistical setup are you using to take orders, bind and ship books to users?

We partnered with a world-class on-demand book binding house in New York and they fulfill all our orders. This way, we focus on our specialty of building and marketing excellent web sites and they focus on their specialty of printing, binding and shipping.

4. Obviously, the operation is very lean, using the Twitter API. Are you planning on similar services with other social networks?

The beauty of Twitter is its 140-character limit. Because of that, we were able to invest in beautifully designed books that have templates which we know will never be exceeded. Obviously there are several additional concerns when printing something like blogs, because the length can vary so wildly. This has to do with our desire to create great-looking, specific products, as opposed to full archives that would look like encyclopedias or “white pages.”

5. The natural primary buying audience seems to be active Twitterers, with people not on that platform being the natural recipients. Have you found this to be so?

Actually, our initial vision was about 50/50 split between two groups. One group was gift-givers (whether or not they themselves are on Twitter) - people who buy gift certificates for active Twitter users in their lives. The other group was people buying the books as keepsakes for themselves. We envision people thinking more about what they tweet now because they know they’ll be able to get it in a book.

For example, imagine tweeting the process of your wife’s pregnancy and then printing a book that reports on that wonderful life-moment. It's really too early to tell exactly what people will do but we’re really curious to see!

6. Can we get books assembled only of original tweets, exclusive of replies?

Right now we will populate the book with a user’s last 200 tweets. Using a basic “delete” feature, users can remove any tweets they don’t want printed. When they delete a tweet we will load additional ones from Twitter in order to maximize the 200 pages of the book. In this way people can completely curate the content of the book. In the near future we plan to unveil other, more advanced, editing tools and filters, such as automatically removing all tweets with hashtags, links, and/or replies.

7. What are some immediate and future plans for Tweetbookz?

We are working hard on additional editing tools to let people create their perfect books. Also, we hope to let users upload their own custom cover designs and maybe some more personalization options around the books.


Thanks Jacob, and best of luck!




Sunday, November 01, 2009

The (d)evolution of the American sportswriter

For any professional communicator, apathy is a fate worse than death. Throughout history, but no more evident than in today's media market, having the ability to evoke some sort of emotion from an audience - in any format and across any topic - is key to survival.

Literacy was always big in the Salas household. I grew up reading Sports Illustrated, and I consider myself privileged to have considered great works by tons of acclaimed people you've likely never heard of. I'd read everything from recollections of the Super Bowl experience, to comments about cricket, to essays on the emerging interest in some new concoction known as free agency. Even before my time, I'd obtain older pieces from columnists as far back as the 1940's, spinning tales of the golden ages of baseball and boxing.

Their words flowed slowly and gracefully, like honey off a wooden spoon.

The styles of the older sportswriters were generally akin, all being engaging, respectful and informative - everything they were taught to be as journalists. Leveraging humor (God forbid) was always tenuous, because with not everyone having that ability, if mismanaged it might depreciate their work. So they played it safe by playing it straight.

Fast forward to today: it's the age of reality television, an overabundance of pornography, and a culture not only completely happy with, but in constant demand of, replete voyeurism. This is spurred on by affordable consumer technology empowering practically anyone with near-realtime multiplatform immediacy and an equally simple ability to become an active reporter themselves.

The profession has morphed, but the demands on a writer to serve in ways that force people to react to their creations remain as strong as ever.

Sportswriters of years past were scribes, true and distinct. They were gentleman scholars. Expert storytellers both of events taking place on the field or court of play, as well as the unseen drama unfolding off it. They were masters of the craft of creating poetry through their retelling of stories of athletics from multiple angles. It's a role that I considered a venerated art form, a modern-day sophist.

Today, sportswriters are largely only as good as their last punchline. It's all about how well you can smacktalk, namedrop, be referential to pop culture, and how many one-liners you can fit into an 800-word contribution. Today's sportswriter functions as more stand-up comedian than reporter. The older generation that's still around and clinging to their time-honored axiology are seen as old hat and irrelevant by today's audiences. Hence, no response.

My favorite writers today all have a signature edginess about their writing that lets them stand out from their peers: Bill Simmons. Rick Reilly. Tom Rinaldi. Jim Rome. Christine Brennan. Max Kellerman. Keith Olbermann. Edwin Pope.

Even the great Tony Kornheiser, of whom I'm a huge fan, has massaged his natural wit to be punchier through biting and topical sarcasm, to stay relevant to a readerbase that expects and demands controversy - if not from the subject matter at hand, then by the people relaying it. Art just reflects society's expectations, because in line with the decline of Western civilization, kids today just don't know any better.

It's the direct result of the death of the newspaper industry and the rise in integrated TV and new media formats and applications. There simply are too many sources now to consider; to be distinct and secure eyeballs on a daily basis, sportswriters have to take an angle as being funny, daring or downright rude that'll lock you into consuming their stuff and formulating an opinion one way or the other that gets you to come back.

And let's not neglect the ESPN Influence, wherein everything the network does is considered the gold standard of sports journalism. Their every move heavily drives what is seen as acceptable for the masses; the ubiquitous catch phrases, clever references and gimmicky running gags pressure their print counterparts to follow suit in that format.

Lest they be cast into the purgatory that is reader indifference.
