No API? No problem! Fake it with browser automation & web scraping

Neal Shyam
Published in Devpost Hacks
4 min read · Jul 14, 2015

As you can tell from our latest Summer Jam Hackathon, we ♥ APIs at Devpost. They’re like the one ring of programming, enabling you to pull info from different services and perform actions on them. I doubt Slack would be nearly as popular as it is without all those cool API integrations.

Now, considering how popular APIs are these days, it’s frustrating to run into a service or site without one. But it’s actually quite common. Netflix shut down its API years ago. My bank doesn’t have one. Most news sources don’t either. Bottom line, many apps & data aren’t designed for programmatic access. 😭

But don’t let that discourage you from building your next big thing. If you need to collect data or perform an action on the web without access to an API, there are a couple ways you can hack it. 🙌 💪

Web scraping — Collect information from the web

Web scraping is the process of downloading a web page’s source code and parsing it to find particular data. Before the proliferation of APIs and ‘open data’, this was actually state of the art tech. Richard Murby, our hackathon guru, actually began his career by writing scrapers for a travel startup.

The key to web scraping is figuring out how to identify the exact elements you’re looking for. This could be by looking for element types (divs, list items), particular ids or classes, or by doing regex / XPath searches.
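To make those strategies concrete, here's a minimal sketch that uses only Python's standard library. The sample HTML, the `forecast` id, the `temp` class, and the `ElementFinder` / `find_text` helpers are all made up for illustration; a real project would more likely reach for a dedicated scraping library.

```python
import re
from html.parser import HTMLParser

# Hypothetical page source; a real scraper would download this first.
SAMPLE_HTML = """
<div id="forecast">
  <span class="temp">72°F</span>
  <ul>
    <li>Sunny</li>
    <li>Light breeze</li>
  </ul>
</div>
"""


class ElementFinder(HTMLParser):
    """Collects the text of every element matching a tag name and/or attribute.

    Sketch-level limitations: attribute values must match exactly (so
    class="temp big" will not match "temp"), and unclosed void tags such as
    <br> inside a matched element would throw off the depth counter.
    """

    def __init__(self, tag=None, attr=None, value=None):
        super().__init__()
        self.tag, self.attr, self.value = tag, attr, value
        self.depth = 0      # greater than 0 while inside a matching element
        self.results = []

    def _matches(self, tag, attrs):
        if self.tag and tag != self.tag:
            return False
        if self.attr and dict(attrs).get(self.attr) != self.value:
            return False
        return True

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1             # nested tag inside a match
        elif self._matches(tag, attrs):
            self.depth = 1
            self.results.append("")     # start collecting text for this match

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def find_text(html, tag=None, attr=None, value=None):
    finder = ElementFinder(tag, attr, value)
    finder.feed(html)
    return [text.strip() for text in finder.results]


list_items = find_text(SAMPLE_HTML, tag="li")               # by element type
temps = find_text(SAMPLE_HTML, attr="class", value="temp")  # by class
degrees = re.findall(r"(\d+)°F", SAMPLE_HTML)               # by regex

print(list_items, temps, degrees)
```

Regex is the quickest of the three to write, but it's also the most brittle: it breaks as soon as the markup shifts, so it works best on small, predictable snippets.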

No matter what language you’re into, there’s a great scraping library for your project.

You can also use a SaaS scraper like Kimono to create ‘live APIs’ from webpages. It’s pretty neat, so try it out.

Scraping Examples

→ I used BeautifulSoup and Python to create a command line interface to Poncho 😺, my favorite weather service:

Poncho is pretty great and you are too!

The script downloads my personal forecast and searches the HTML for the elements that contain the forecast description, temperature, etc; extracts the information; and then formats it for output to the console.
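The download / search / extract / format flow described above could look roughly like this. It's a sketch, not the actual Poncho script: the `description` and `temperature` class names, the sample markup, and every function name here are assumptions, and it sticks to the standard library rather than BeautifulSoup so it runs without installing anything.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url):
    """Download a page's source code as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class ForecastParser(HTMLParser):
    """Grabs the text inside elements whose class is one of FIELDS."""

    FIELDS = ("description", "temperature")  # hypothetical class names

    def __init__(self):
        super().__init__()
        self.current = None     # field currently being read, if any
        self.forecast = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.FIELDS:
            self.current = css_class
            self.forecast[css_class] = ""

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field.
        self.current = None

    def handle_data(self, data):
        if self.current:
            self.forecast[self.current] += data


def parse_forecast(html):
    """Search the HTML and extract the forecast fields into a dict."""
    parser = ForecastParser()
    parser.feed(html)
    return {field: text.strip() for field, text in parser.forecast.items()}


def format_forecast(forecast):
    """Format the extracted fields for the console."""
    return "{temperature}: {description}".format(**forecast)


# Stand-in for a fetch_html(...) call so the sketch runs offline.
page = '<p class="description">Grab an umbrella.</p><b class="temperature">65°F</b>'
print(format_forecast(parse_forecast(page)))
```

Keeping the download, the parsing, and the formatting in separate functions means that when the site changes its markup, only the parser needs to be touched.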

→ I also used a scraper to build a social media reporting tool for HootSuite. The app pulls all of the HTML from my HootSuite tabs and parses the HTML to create a unified report that I can share with my clients:

HootSuite Extractor makes my job so. much. easier. when Thursday rolls around and it’s time for client reports.

Browser automation — Do stuff on the web

While many APIs are about input/output, some of the best APIs are the ones that perform actions like updating your profile, sending SMS / making phone calls (Twilio), or handling payments (Stripe, Venmo, Braintree).

However, there are tons of services with no public API. For example, you can’t check your Netflix / Hulu history programmatically. That’s where scripting or automating browser actions comes in handy.

Automation Examples

→ The absence of a Netflix API inspired me to create BingeWatcher, a bot that monitors your Netflix activity for binge watching events and then asks you if you want to order snacks for your Mad Men marathon. This project also incorporates web scraping to figure out how many episodes of a TV show I’ve watched.

That image is actually an animated GIF, which, thanks to Twilio, works pretty reliably.

How did I do it? I used PhantomJS (a headless browser, not actually a JavaScript app) and Selenium (a tool for driving browsers, headless or not) to automate browsing actions like logging in, filling out forms, and pressing buttons. A lot of developers use these tools to run extensive QA tests for their apps, but you can use them to do anything a human would do with Chrome / Safari / Firefox / Internet Explorer.
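Here's a rough sketch of what scripting a login looks like with Selenium's Python bindings. A few loud assumptions: the URL, the `#username` / `#password` selectors, and all of the helper names are hypothetical, and it swaps in headless Firefox because PhantomJS is no longer maintained and, as far as I know, current Selenium releases no longer ship a PhantomJS driver. A small recording stand-in is included so you can dry-run the steps without a browser.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def log_in(driver, url, username, password):
    """Open a login page, fill out the form, and press the button."""
    driver.get(url)
    # The selectors below are hypothetical; inspect the real page to find yours.
    driver.find_element(CSS, "#username").send_keys(username)
    driver.find_element(CSS, "#password").send_keys(password)
    driver.find_element(CSS, "button[type=submit]").click()


def make_headless_firefox():
    """Start a real headless browser (needs selenium and Firefox installed)."""
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)


class RecordingElement:
    """Stand-in page element that logs what would be typed or clicked."""

    def __init__(self, actions, selector):
        self.actions, self.selector = actions, selector

    def send_keys(self, text):
        self.actions.append(("type", self.selector, text))

    def click(self):
        self.actions.append(("click", self.selector))


class RecordingDriver:
    """Stand-in driver that records actions instead of opening a browser."""

    def __init__(self):
        self.actions = []

    def get(self, url):
        self.actions.append(("get", url))

    def find_element(self, by, selector):
        return RecordingElement(self.actions, selector)


# Dry run: no browser needed, just a log of the steps the script would take.
dry_run = RecordingDriver()
log_in(dry_run, "https://example.com/login", "me", "secret")
for action in dry_run.actions:
    print(action)
```

To run it for real, create a driver with `make_headless_firefox()`, pass it to `log_in` in place of `dry_run`, and call `driver.quit()` when you're done.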

So, while you can’t programmatically tell Hulu to subscribe to your favorite show — because there’s no API — you can write a script to log in, search for, and queue all your favorite Community episodes.

There are PhantomJS & Selenium wrappers for most languages, but Ruby fans should also consider Mechanize.

→ Into sports? My buddy Devin Mancuso wrote a script that monitors his fantasy lineup every week and starts any benched players in case he forgets to set them himself. Again, there’s no public API for Yahoo Fantasy, but if you know which buttons to press and their CSS ids, you can hack it!

Hack. Scrape. Automate.

Building an API isn’t always easy (we’re still working on our own), but browser automation and web scraping are excellent tools, and you should add them to your digital toolbox! Need help with a scraping or automation project? Tweet me @nealrs.
