The Jason Salas Experience

Guam's Mr. Media - making people think, making people laugh, pissing people off

Monday, June 12, 2006

We all look so bad because Google is so good

One thing I passionately appreciate and painfully decry is how badly my web apps stack up to Google's (duh). It's most visible in web-wide searches: after analyzing the behavior of those who use my site's search tool (a full-text index-based search utility I built), I've noticed that the average user's mean level of comprehension in searching content gets progressively worse over time. I attribute this to the fact that Google's so damn good at what it does.

Google allows us to be lazy, and still works great.

Take spelling, for example. Google's intelligent recommendation feature (i.e., "Did you mean...") helps so many people that users no longer place as high an emphasis on accuracy in WAIS searches as they did a decade ago. I haven't emulated such functionality in my own utilities...which understandably makes them look inferior, given society's acceptance of Google as the gold standard.
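A "Did you mean..." feature can be approximated by comparing a misspelled query word against the words already in the search index and suggesting the closest one. The sketch below is a minimal illustration under that assumption, not how Google actually does it. `edit_distance`, `did_you_mean` and the sample vocabulary are hypothetical; the suggestion is simply the indexed word with the smallest Levenshtein distance (ties broken alphabetically), provided it is within two edits.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance between two words."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character
                            curr[j - 1] + 1,      # insert a character
                            prev[j - 1] + cost))  # substitute a character
        prev = curr
    return prev[-1]


def did_you_mean(word: str, vocabulary: set, max_distance: int = 2) -> Optional[str]:
    """Suggest the closest indexed word, or None if the word is known or nothing is close."""
    word = word.lower()
    if word in vocabulary:
        return None  # spelled like an indexed word, so nothing to suggest
    best = min(vocabulary, key=lambda v: (edit_distance(word, v), v), default=None)
    if best is not None and edit_distance(word, best) <= max_distance:
        return best
    return None


# Hypothetical vocabulary pulled from an index of story bodies.
vocab = {"guam", "governor", "privatization", "authority", "protest"}
print(did_you_mean("privitization", vocab))  # privatization
```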

Here's an excerpt from a support e-mail I exchanged with a user who complained that my site's search facilities weren't serving him well and who needed help. I find it ridiculous that on today's Web I've still got to suggest the following courses of action:
  • Watch your spelling – this is the #1 reason people aren’t able to find things, and you can see in our LiveSearch tags (http://www.kuam.com/search/livesearchresults.aspx) people often misspell a name, place or event. Unfortunately, KUAM Search is very unforgiving in this regard.
  • Search shorter phrases, not entire sentences – better yet, use single words (i.e., searching for ‘port authority of guam protest in which employees were yelling and joe mesa talked about possible privatization and mentioned contracts’ will just dilute your results and won’t get you anywhere)
  • Search for content, not content areas – I’ve seen people trying to search for things like “news from January” or “local news”. This shouldn’t return anything, and if it does, it’ll be useless information having nothing to do with stories we ran
  • Use Boolean keywords – make liberal use of terms like AND, OR and BUT NOT to refine your results.
  • Wrap quotes around your keywords – searching for “Karch Kiraly” as opposed to Karch Kiraly will be more specific
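To make the Boolean tip concrete, here is a rough sketch of how AND, OR and BUT NOT could be evaluated against a simple inverted index (a map from each word to the set of stories containing it). This is an illustration, not KUAM Search's actual implementation: `build_index`, `boolean_search` and the sample stories are hypothetical, operators are applied strictly left to right with no precedence, NOT is treated as "and not", and consecutive bare words are ANDed together.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index


def boolean_search(query, index):
    """Evaluate AND / OR / NOT (or BUT NOT) left to right; assumes the query starts with a word."""
    result = None
    op = "AND"
    for token in query.split():
        upper = token.upper()
        if upper == "BUT":          # "BUT NOT" is read as plain NOT
            continue
        if upper in ("AND", "OR", "NOT"):
            op = upper
            continue
        postings = index.get(token.lower(), set())
        if result is None:
            result = set(postings)  # copy so the index itself is never modified
        elif op == "AND":
            result &= postings
        elif op == "OR":
            result |= postings
        else:                       # NOT: drop documents containing the word
            result -= postings
        op = "AND"                  # bare consecutive words are implicitly ANDed
    return result if result is not None else set()


stories = {
    1: "Port Authority of Guam protest",
    2: "Guam Power Authority rates",
    3: "Port Authority privatization contracts",
}
index = build_index(stories)
print(boolean_search("port AND authority", index))                   # {1, 3}
print(boolean_search("port AND authority BUT NOT protest", index))   # {3}
print(boolean_search("protest OR rates", index))                     # {1, 2}
```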

What's comical is that after offering a collection of helpful hints to get better, quicker, less diluted search resultsets, I finally gave in and ended the e-mail with "And if that still doesn’t get you the results you’re after, try doing a KUAM.com-specific Google search, by visiting this URL and entering your search keyword: http://www.google.com/search?hl=en&q=site%3Akuam.com".
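The site-restricted Google search at the end of that e-mail is just a query string, so it can be generated for any keyword. In this small sketch the helper name `kuam_google_search_url` is hypothetical; `urllib.parse.urlencode` percent-encodes the colon in `site:kuam.com` as `%3A`, matching the URL above, and turns spaces into `+`.

```python
from urllib.parse import urlencode


def kuam_google_search_url(keywords: str) -> str:
    """Build a Google search URL restricted to kuam.com, like the one in the e-mail."""
    params = {"hl": "en", "q": f"site:kuam.com {keywords}"}
    return "http://www.google.com/search?" + urlencode(params)


print(kuam_google_search_url("Karch Kiraly"))
# http://www.google.com/search?hl=en&q=site%3Akuam.com+Karch+Kiraly
```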

16 Comments:

  • At June 13, 2006 6:46 AM, Anonymous jonahu said…

    Jason,
    It seems the search tool has some odd results when you search for any text that also appears in the "today's headlines" section at the bottom of each article's page. For example, if I search for "department of education", and this text is part of the "today's headlines" section, your search tool will return a bunch of articles that don't really contain that search term, though the "today's headlines" section does.

     
  • At June 13, 2006 11:35 AM, Anonymous jonahu said…

    Jason,
    What about putting that excerpt from your support e-mail on the KUAM.com search page? I imagine that could be considered "good to know" information when using your search tool.
    Jonah

     
  • At June 15, 2006 8:44 PM, Blogger Jason Salas said…

    The "Today's Headlines" content doesn't affect the outcome of queries through KUAM Search...they're dynamically inserted through a different process and aren't in the same index as the text stories. Good theory, though.

     
  • At June 15, 2006 8:45 PM, Blogger Jason Salas said…

    Jonah (second comment),

    I was thinking of just that...adding an FAQ section. Stupidly, I had assumed that people had gotten used to searching syntax by now...which kinda led to this blog post. :-)

     
  • At June 16, 2006 8:04 AM, Anonymous jonah said…

    Jason,
    Attached below are a few of the 250+ articles (URL information included) that matched the search query "DOA transfers $3.5M". In this list you'll find that only the first article actually contains the search phrase. Three more theories:
    1. phrase search feature is not functioning properly
    2. something is missing from your tip on searching for phrases: Wrap quotes around your keywords - search for things like (i.e., “Karch Kiraly” as opposed to Karch Kiraly will be more specific)
    3. this is by design and is how you intended for it to work

    STORIES MATCHING YOUR SEARCH IN THE KUAM.COM ARCHIVES
    DOA transfers $3.5M, avoids payless payday - Thursday, June 15, 2006
    [ http://www.kuam.com/news/18179.aspx ]

    $6.5M shortfall at GPSS forces payless payday - Thursday, June 15, 2006
    [ http://www.kuam.com/news/18178.aspx ]

    Dollars and sense: Calvo crunching GPSS numbers - Wednesday, June 14, 2006
    [ http://www.kuam.com/news/18157.aspx ]

    Frustrated rider files complaint against Guam Mass Transit System - Monday, June 12, 2006
    [ http://www.kuam.com/news/18140.aspx ]

    Committee expected to grill GEDCA over 'Havoc' loan - Sunday, June 11, 2006
    [ http://www.kuam.com/news/18127.aspx ]

    DOA assessing mass transit situation - Thursday, June 08, 2006
    [ http://www.kuam.com/news/18092.aspx ]

    ....

    Moylan wants Duenas and Blas to receive paychecks - Thursday, April 15, 2004
    [ http://www.kuam.com/news/9191.aspx ]

     
  • At June 16, 2006 12:25 PM, Blogger Jason Salas said…

    Hi Jonah,

    This is because you're searching for a specific string based on the title of a story, which winds up returning an amalgamation of related stories that "kinda" resemble the search string, based on a weighted algorithm. Basically, the index looks mainly at the body of an article, not the title.

    Initial tests with our KUAM.com Beta Community showed that, in early builds of KUAM Search, including the titles in the search capability diluted the resultset and wound up confusing people too much, so we just base it on full-text searches of the body of articles.
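The body-only design described above can be sketched as an inverted index that reads each story's body and ignores its title. The `Story` class, `build_body_index`, `search` and the two sample stories below are hypothetical; the point is only that a word appearing solely in a headline cannot produce a hit.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Story:
    story_id: int
    title: str
    body: str


def build_body_index(stories):
    """Map each body word to the IDs of stories whose body contains it.
    Titles are intentionally left out of the index."""
    index = defaultdict(set)
    for story in stories:
        for term in re.findall(r"\w+", story.body.lower()):
            index[term].add(story.story_id)
    return index


def search(term, index):
    """Single-word lookup; .get avoids creating empty entries in the defaultdict."""
    return index.get(term.lower(), set())


stories = [
    Story(1, "Payday rescued", "Funds were moved to cover the payroll."),
    Story(2, "Transit woes", "A rider filed a complaint about late buses."),
]
index = build_body_index(stories)
print(search("payroll", index))   # {1}
print(search("rescued", index))   # set(), because the word appears only in a title
```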

    As an example, check out this search on "DVR". You get only a few articles on digital video recorders, and a couple on the Department of Vocational Rehabilitation. That's the type of resultset the people who get the most out of our search services are after.

    But as KUAM's main web guy, I agree...I should make it better. This is something we're aware of and working on all the time. On that note, I'm almost done developing our public API so people can do Web 2.0-ish mashups from KUAM Search, so you can use this in your own projects, regardless of the web platform you're using.

    Should be out soon.

     
  • At June 16, 2006 2:44 PM, Anonymous jonah said…

    Jason,
    From the title of your blog post I think it's safe to say that you consider Google to be the gold standard of search. For those of us used to searching with Google, it is not surprising that doing a Google search on the following phrase comes up with zero results: "google will not find asdfasdf"

    That's because this exact phrase "google will not find asdfasdf" does not exist in any of the gazillion pages Google has managed to get its hands on. This is the way people search for phrases with Google, and it's likely people will use your search tool in similar fashion. I believe this is what you meant when you wrote: I had assumed that people had gotten used to searching syntax by now...which kinda led to this blog post. :-)

    In my opinion, instead of worrying about features like Google's intelligent recommendations, you might want to consider emulating the way Google does phrase searches. No doubt this would be a much smaller technical hurdle to clear, but more importantly it would start returning much more meaningful results. Going back to my previous post, there is no reason a search query for "DOA transfers $3.5M" should have returned this article, for example:
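Exact phrase matching of the kind described here is usually done with a positional index, which records where each word occurs so that a phrase matches only when its words appear consecutively and in order. A minimal sketch, assuming the same tokenizer is applied to stories and queries (here `\w+`, which splits `$3.5M` into `3` and `5m` on both sides); `tokenize`, `build_positional_index`, `phrase_search` and the sample texts are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens; the same rule must be used for stories and queries."""
    return re.findall(r"\w+", text.lower())


def build_positional_index(docs):
    """term -> {doc_id: [positions where the term occurs]}"""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(phrase, index):
    """Return IDs of documents containing the words of `phrase` consecutively, in order."""
    terms = tokenize(phrase)
    if not terms:
        return set()
    # Only documents that contain every word can possibly contain the phrase.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    matches = set()
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            if all(start + offset in index[term][doc_id]
                   for offset, term in enumerate(terms[1:], start=1)):
                matches.add(doc_id)
                break
    return matches


docs = {
    1: "DOA transfers $3.5M, avoids payless payday",
    2: "Frustrated rider files complaint; DOA reviews transit",
}
idx = build_positional_index(docs)
print(phrase_search("DOA transfers $3.5M", idx))                           # {1}
print(phrase_search("President Bush some related story about Iraq", idx))  # set()
```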

    Frustrated rider files complaint against Guam Mass Transit System - Monday, June 12, 2006
    [ http://www.kuam.com/news/18140.aspx ]

    It's not relevant, it's just noise.

     
  • At June 17, 2006 1:09 PM, Blogger Jason Salas said…

    Good advice...and we're emulating some of what Google's been able to accomplish with WAIS a bit more than people will probably immediately recognize. For example, the Google search algorithm mainly looks for an exact phrase AND related entries in its index secondarily.

    While what we've built pales in comparison to Google's search features, it does to a certain degree do the same thing, looking first for the specific phrase and then for any matching individual terms. In the search you did, the word 'DOA' is found in the Guam Mass Transit article.
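The two-pass behavior described here (exact phrase first, then individual terms) can be sketched as tiered ranking: stories containing the whole phrase sort ahead of stories matching only some of its words, which are then ordered by how many query words they share. This is an illustrative assumption, not KUAM Search's code, and the sample stories are hypothetical. The second tier is exactly where a story that merely mentions "DOA" surfaces for the query "DOA transfers $3.5M".

```python
import re


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def contains_phrase(doc_terms, phrase_terms):
    """True if phrase_terms occurs as a contiguous run inside doc_terms."""
    n = len(phrase_terms)
    return any(doc_terms[i:i + n] == phrase_terms for i in range(len(doc_terms) - n + 1))


def tiered_search(query, docs):
    """Tier 0: exact phrase matches. Tier 1: partial word matches, most overlap first."""
    q_terms = tokenize(query)
    q_set = set(q_terms)
    scored = []
    for doc_id, text in docs.items():
        terms = tokenize(text)
        overlap = len(q_set & set(terms))
        if overlap == 0:
            continue  # shares no words with the query, so it is never returned
        tier = 0 if contains_phrase(terms, q_terms) else 1
        scored.append((tier, -overlap, doc_id))
    scored.sort()
    return [doc_id for _, _, doc_id in scored]


docs = {
    1: "DOA transfers $3.5M, avoids payless payday",
    2: "Frustrated rider files complaint; DOA reviews transit",
    3: "Shortfall forces payless payday",
}
print(tiered_search("DOA transfers $3.5M", docs))  # [1, 2]
```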

    But long story short, we're always keeping tabs on how (in)accurate our search facilities are, and we have a threshold against which we measure when it's time to refine the indexing procedures. :)

    But we're also nothing without the comments of our users, so thanks for using our search tools, and for the excellent feedback! I really appreciate it.

     
  • At June 17, 2006 7:24 PM, Anonymous jonah said…

    Ok I'm going to beat the cat just once more :)

    You wrote:
    For example, the Google search algorithm mainly looks for an exact phrase AND related entries in its index secondarily.

    What search algorithm is this exactly? We're talking about "phrase" searches. If what you say is true, wouldn't you expect a Google phrase search for "President Bush some related story about Iraq" - quotes included - to return something "related" to the terms? Instead you will get the following message:
    Your search - "President Bush some related story about Iraq" - did not match any documents.


    Now if I wanted to improve my search to get better results I would probably do a search closer to this:
    "President Bush" AND Iraq
    And as expected I get a response like this: Results 1 - 10 of about 66,700,000 for "President Bush" AND Iraq. (0.12 seconds)
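The refined query above mixes a quoted phrase with a bare keyword. One way to honor that syntax, sketched here under the assumption that AND is implicit and every clause is required, is to split the query into quoted phrases and single words and keep only documents that contain every clause in order. `parse_query`, `matches` and the sample texts are hypothetical.

```python
import re

# A quoted phrase, or else any run of non-space characters.
QUERY_TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(query):
    """Split a query into required clauses: quoted phrases and bare words (AND is dropped)."""
    clauses = []
    for phrase, word in QUERY_TOKEN.findall(query):
        text = phrase if phrase else word
        if not phrase and word.upper() == "AND":
            continue  # AND is implicit between clauses
        clause = tuple(re.findall(r"\w+", text.lower()))
        if clause:
            clauses.append(clause)
    return clauses


def matches(text, clauses):
    """True if every clause appears in the text as a contiguous run of words."""
    terms = re.findall(r"\w+", text.lower())

    def has(seq):
        n = len(seq)
        return any(tuple(terms[i:i + n]) == seq for i in range(len(terms) - n + 1))

    return all(has(clause) for clause in clauses)


clauses = parse_query('"President Bush" AND Iraq')
print(clauses)                                                  # [('president', 'bush'), ('iraq',)]
print(matches("President Bush spoke about Iraq today", clauses))  # True
print(matches("Bush visited Iraq", clauses))                      # False
```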

    Doesn't this make more sense?

     
  • At June 18, 2006 10:52 AM, Blogger Jason Salas said…

    Jonah,

    Here are some links you might find helpful, and then this is the last I'm going to comment on this post:

    Read Google Hacks by O'Reilly in which it discusses the particularities of Google-esque search queries

    Read my cheat sheet on using Google search syntax

    Read about PageRank (and consider how this wouldn't apply to KUAM.com stories)

    Read Sergey & Larry's original research paper on PageRank

     
  • At June 19, 2006 9:13 AM, Anonymous KUAM Blog Administrator (BillyR) said…

    Jonah,

    We routinely monitor all comments made to our blogs, so if we notice any posts that we feel aren't going anywhere or otherwise don't belong online, we have them removed.

    Thanks for understanding.

     
  • At June 20, 2006 9:10 AM, Anonymous Anonymous said…

    thanks for the info about kuam search! i use this all the time to catch up on what's going on at home.

    it works for me, so keep on keeping us here in the middle east informed!

    souljazz in kirkuk

     
  • At June 20, 2006 9:15 AM, Blogger Jason Salas said…

    Hi Souljazz,

    Glad you're able to use our search tools and RSS feeds. My pleasure to be able to keep you guys abreast of what's going on back home.

    Have you tried our podcasts, too?

    Also, what company are you with? I've got some friends in engineering units over there.

     
  • At June 20, 2006 3:10 PM, Anonymous Anonymous said…

    yes! we use your podcasts to get us up to speed everyday. we like the fact that they're FREE!!!!

     
  • At June 20, 2006 3:32 PM, Blogger Jason Salas said…

    Yep - and we're adding more in the coming weeks...stay tuned!

     
  • At August 18, 2006 5:40 PM, Anonymous Anonymous said…

    ahh search is soo much better now! nice job jason!

     
