Tired of scraping data? Just ask!
A constant source of stress for modern-day web programmers is finding a good source of data (or two or three). Hackers constantly bemoan how their favorite sources of information don't have public APIs, web services, RSS feeds or other Web 2.0-style remote access. Thus, they're relegated to considering one of the few options left: web scraping. This can be tough (coding a scraper isn't simple), fickle (if the source changes its HTML, the app breaks), and arguably illegal (a form of stealthy, unauthorized copying). However, Alex Iskold makes a compelling argument against the theft concern.
With more and more local web programmers getting into mashups and remixing, KUAM.com has become a popular candidate as a data source, since we generate a lot of content frequently. But even with the various ways we offer developers access to our data, sometimes it's not enough. I totally understand. Some people have even e-mailed me asking if they can get a view of a database table, an XML file or an Atom feed of some aspect of our data. In some cases, I've accommodated them. (Hey, making this type of stuff available gives us one more thing to show off in our RSS gallery.)
In fact, a project I've started recently seeks to develop a web UI that serves as both a contact form and an end-user license agreement, and that, upon submission, dynamically builds RSS feeds based on a user's specifications and filtering options. So, you could easily request certain information, inherently comply with the legalese, and have a new source of data...instantly!
(Requests for more specific APIs, non-standard formats or complex data types serialized as XML are routed to site administrators, who write the SQL or data shaping needed to deliver the data in the desired format and structure, at the proper frequency.)
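To make the feed-builder idea a little more concrete, here's a rough sketch in Python of how a request's filtering options might be turned into an RSS 2.0 feed. This is not the actual project code: the `STORIES` sample data, the `build_feed` function, its `category`, `keyword` and `limit` options, and the example.com URLs are all made up for illustration, and a real version would pull stories from a database rather than a hard-coded list.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample records; a real site would query its own database.
STORIES = [
    {"title": "Typhoon watch issued", "link": "http://example.com/1", "category": "weather"},
    {"title": "Senate passes budget", "link": "http://example.com/2", "category": "politics"},
    {"title": "Storm cleanup begins", "link": "http://example.com/3", "category": "weather"},
]


def build_feed(stories, feed_title, category=None, keyword=None, limit=10):
    """Return an RSS 2.0 document (as a string) for stories matching the filters."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = "http://example.com/"  # placeholder URL
    ET.SubElement(channel, "description").text = "Custom feed built from user-selected filters"

    matched = 0
    for story in stories:
        # Skip stories outside the requested category, if one was given.
        if category and story["category"] != category:
            continue
        # Skip stories whose title lacks the keyword (case-insensitive), if one was given.
        if keyword and keyword.lower() not in story["title"].lower():
            continue
        # Stop once the requested number of items has been reached.
        if matched >= limit:
            break
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = story["title"]
        ET.SubElement(item, "link").text = story["link"]
        ET.SubElement(item, "category").text = story["category"]
        matched += 1

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Example request: only weather stories, at most five items.
    print(build_feed(STORIES, "Weather stories", category="weather", limit=5))
```

In a setup like this, the form submission would simply supply the filter values, so each accepted request maps to one call with its own parameters.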
So next time you get stuck looking for a good raw source of data, just ask if one can be set up for you. The worst you'll get back is 'no', and it's a safer and more reliable alternative to scraping.