Channel: ActiveState - open data

Bentframe: Developed with Komodo, deployed to Stackato

Bent Frame: bike collision data

For years now I've been working on tools that help build and debug Rails applications, but other than the Candidate Buzz demo application, I haven't actually created many myself. I've started many apps as weekend projects over the years. I enjoy building tools to solve common problems I run into when working on these apps, and enjoy using those tools for future projects.

But when it came time to actually launch one, I would ask myself if it was worth the trouble of spending weekends and late nights trying to placate angry or confused users. Shouldn't I spend my off-work time doing something other than... work? So yet another project would be consigned to a remote corner of my repository.

Bentframe: Visualizing Bicycle/Car Collision Data with Google Maps

In early December word got out that the indefatigable David Eaves had managed to get ICBC, the provincial automobile insurance company in British Columbia, to release a dataset of collision reports involving a car and bicycle for the years 2006-2010. Each record in the data contained a year, month, and location as a latitude, longitude pair.

I figured I could whip up a visualization of this data relatively quickly, seeing how one of those web sites I never launched was a Google Map visualization I wrote in late 2007 that showed a database of foreclosed houses for sale in the U.S. I was struck by how large the database was: about 70,000 properties scattered over all fifty states, but didn't consider the economic implications. Had I not been focusing on the technical aspects of how to display tens of thousands of data points in a 600x300px window, and instead dipped into the financial press to learn about credit default swaps, short sales, and predicting the future, I might be writing this from a Caribbean island. But fortunately for fans of visualizations, I had focused on writing efficient clustering engines, and figured I could whip up something useful during the single-day hackathon.

Within the first hour I wrote a quick Ruby program to read in the data, fired up Komodo to create a new Rails app, copied over the relevant bits of the old "Closing Town" foreclosure code, and was back in Rails development (as opposed to deployment) mode.
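
The loader program itself isn't shown here. As a rough sketch of what such a script might look like, assuming a CSV file with a header row and the hypothetical column names year, month, latitude, and longitude (the actual ICBC file may have been laid out differently):

```ruby
require "csv"

# One collision report: when it happened and where.
CollisionRecord = Struct.new(:year, :month, :latitude, :longitude)

# Parse CSV text with a header row into CollisionRecord structs.
# The column names are assumptions for illustration, not the ICBC layout.
def parse_collisions(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    CollisionRecord.new(
      Integer(row["year"], 10),
      Integer(row["month"], 10), # base 10 so a value like "08" parses
      Float(row["latitude"]),
      Float(row["longitude"])
    )
  end
end

# Made-up sample rows, not real ICBC records.
sample = <<~CSV
  year,month,latitude,longitude
  2008,7,49.2621,-123.0779
  2010,11,49.2827,-123.1207
CSV

records = parse_collisions(sample)
puts "Loaded #{records.size} collisions; first is at " \
     "#{records.first.latitude}, #{records.first.longitude}"
```

In a Rails app, each parsed record would presumably be saved through a model rather than kept in a Struct.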

Debugging the Initial App in Komodo

The 4-year-old foreclosure app didn't need much changing to handle the new data set. Change mentions of "properties" to "collisions". Change the default view from central Florida to Vancouver, BC.

By noon, I felt ready to go. I started the server, pointed Firefox at good old http://localhost:3000/spots, and found... an empty Google map of Vancouver. Komodo's DB Explorer showed the database was populated with over 5,000 locations, as expected. Log statements on the server side showed that it was reading the locations from the database, and returning a non-empty JSON packet. On the front end, Firebug was showing that my Ajax callbacks were firing, and creating markers at the correct locations. The problem was, they weren't showing up.

Meanwhile, my colleague for the day, Nathan Griffiths, had taken a different approach, pouring the raw CSV file into a new app at geocommons.com. While I was updating code from Rails 1.2 to 3 and pulling the usual combination of HTML, JavaScript, and CSS together into a coherent whole, Nathan had mapped all the collisions, added bike route data for the city of Vancouver to the mix, and quickly built an impressive working app. The provincial minister for Labour, Citizens' Services and Open Government, Margaret MacDiarmid, had spent most of that Saturday morning at the hackathon talking to people. When she came by to see what we were working on, Nathan's app had locations marked as points, overlays, animations, and heat maps. I didn't think she'd be as impressed that I could debug Rails on the server and JavaScript on the client side at the same time, with a log window open as well. The minister was a family physician before she entered politics, not a retired COBOL programmer. I stayed quiet while Nathan demoed his app, wondering if I was reinventing yet another wheel.

It turned out that the markers weren't showing up because my code was using custom images hosted by the authors of the Google Maps book I had used in late 2007, and those images were no longer there. The browser failed silently, but attempts to wget the images returned 404s. I pulled out the code that was setting the custom images, reran the app, and my map filled with hundreds of the stock Google Map red pin images. Yay. Time to solve a few more problems, while wondering if maybe I should take another look at using Geocommons instead.

But the rest went quickly. I found a source of images with digits so I could display the number of collisions in the marker, and used reverse geocoding to translate latitudes and longitudes back into street addresses.

I showed the app to a journalist, and she asked me what the top 10 spots were. I'm not displaying them yet, but I have the flexibility to add that. Vancouver added a couple of controversial bike lanes downtown during 2009 and 2010. Once I pour in the 2011 data, I hope to provide some charts that show how collision frequencies are dropping near those areas. That's why in 2012 we're still writing our own code -- to create something new without being constrained by existing tools.

Painless Deployment to Stackato

Unlike with the foreclosure data app, word got out about this one, and the aforementioned journalist was asking when my web site would be up. Four years ago there weren't any single-button deployment solutions. Today there are a few, including Stackato by ActiveState. Deploying Rails apps used to be a major pain point, and the subject of many conference sessions. Now it took one evening after work to deploy the app on Stackato.

New features of Rails 3 make deploying on a system like Stackato easier. For example, I added a data_loader.rb file to the config/initializers directory that checked to see if the database was empty. If it was, the single line Rake::Task['db:seed'].invoke would run db/seeds.rb, which was nothing more than the program I used during the hackathon to populate the database from the CSV file. I added the CSV file to git, and the next time the app was restarted it rebuilt the database automatically.
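
The initializer isn't reproduced in full, so the following is only a sketch of the idea. The Collision model name and the seed_if_empty helper are hypothetical; the one line taken from the description above is Rake::Task["db:seed"].invoke. It also assumes the table already exists by the time the initializer runs.

```ruby
# config/initializers/data_loader.rb (sketch, not the original file)

# Runs the seeder only when the given model reports zero rows.
# Returns true if seeding was triggered, false otherwise.
def seed_if_empty(model, seeder)
  return false unless model.count.zero?
  seeder.call
  true
end

# Only wire this up when actually running inside a Rails app.
if defined?(Rails)
  require "rake"
  Rails.application.load_tasks # make db:seed available to Rake::Task
  # Collision is a hypothetical ActiveRecord model name.
  seed_if_empty(Collision, -> { Rake::Task["db:seed"].invoke })
end
```

Keeping the emptiness check in a small helper makes it easy to exercise without a database. One caveat under these assumptions: if the app is itself started from a rake task, loading the tasks a second time may need guarding.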

Now that I was actually deploying the code, I had to make sure the code behaved differently depending on whether I was testing the code in development or test mode, or deploying it. For example, I was juggling two Google Map API keys, one for localhost, and one for the target URL. There was an easy fix for this, in config/environment.rb:

# Pick the Google Maps API key registered for the host serving the app.
GOOGLE_MAPS_KEY = (Rails.env.production? ?
                   "[key for sandbox.activestate.com]" :
                   "[key for localhost]")

I pushed the app to the Stackato sandbox, tested it out manually and found it worked fine, notified the local cycling advocates, and watched the hits come in. The map helped me meet one of my goals, an increase in awareness that the designated routes were not 100% safe. In particular, one intersection, where the 10th Avenue bike route crossed Clark Drive, a 6-lane truck route through the city, was showing 16 collisions over the five-year period, easily one of the top ten intersections. Apparently people at city hall are discussing possible improvements.

Improving Performance

I was happy that I could release the app with almost no deployment work, but was less pleased with the performance. When you zoomed out to a level that contained all of Vancouver, the server was taking up to 20 seconds to respond -- it was spending most of that time converting over 2,000 locations into about 200 locations that Google Maps could quickly render. This hadn't been as much of a factor with the foreclosure data, because there were few blocks that had more than one bank-owned house on them. But the ICBC data was revealing more than one collision on each block along designated routes and main streets. Quick profiling revealed that 10% of that time was spent converting rows from the database into ActiveRecord objects, and most of the balance was spent clustering those items.
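
The clustering algorithm used in the app isn't described here. To illustrate the general idea of collapsing thousands of points into a few hundred markers, here is a deliberately simple grid-binning sketch. It is a stand-in technique, not the Bentframe engine: each point is snapped to a square cell, and each occupied cell becomes one marker carrying a count. The names and the cell size are assumptions.

```ruby
# A cluster marker: mean position of its points and how many it stands for.
Cluster = Struct.new(:latitude, :longitude, :count)

# Group [lat, lng] pairs into square grid cells `cell_size` degrees wide
# and return one Cluster per occupied cell.
def grid_cluster(points, cell_size)
  cells = points.group_by do |lat, lng|
    [(lat / cell_size).floor, (lng / cell_size).floor]
  end
  cells.values.map do |members|
    lat_sum = members.sum { |lat, _| lat }
    lng_sum = members.sum { |_, lng| lng }
    Cluster.new(lat_sum / members.size, lng_sum / members.size, members.size)
  end
end

# Made-up coordinates: two close together, one farther away.
points = [
  [49.2621, -123.0779],
  [49.2623, -123.0775],
  [49.2827, -123.1207]
]
clusters = grid_cluster(points, 0.01)
clusters.each do |c|
  puts format("%.4f, %.4f (%d)", c.latitude, c.longitude, c.count)
end
```

In a scheme like this the cell size would normally shrink as the map zooms in, so that clusters split into individual markers at street level.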

I spent the next Saturday rewriting the clustering code in C. I had worked out the subtleties of the clustering algorithm in Ruby, so this was essentially a rewrite into C. I made sure the core code was independent of any scripting language API, and then wrote an interface layer between C and Ruby (so this library could be easily ported to any other language with a C interface). It went amazingly quickly, with benchmarking showing a drop in processing time of over 90%. The advantage of the C code, besides the speedup in executing machine code over interpreted Ruby bytecode, is that I knew the code's memory requirements in advance, allocated it all in one shot, freed it at the end, and could structure the code to minimize sloshing bits around. With Ruby that was out of my control. And writing a layer of C code to interface with Ruby is actually surprisingly fun.

I built a gem for the C code, added an entry for it in the app's Gemfile, ran bundle package (side note: use bundle install for gems registered with rubygems, bundle package if your app uses private gems), pushed the app to Stackato, and it worked much faster. As I zoomed out, the server was taking less than a second to process a large set of points. You can see the result at http://bentframe.sandbox.activestate.com.

Note that while I developed the C library on OS X, it didn't matter that I was deploying to Linux. Stackato compiled the gems for me during staging.

Handling Cars

Car Collisions

Just in time for Christmas, I received an Open Office Calc .xmlx file from ICBC containing about 110,000 rows of data representing car collisions at intersections in the province for the same years, 2006 through 2010. It took about a day to modify the bike app to handle the slight differences in the car data, and the clustering C module was performing fine. However, the data set was much larger than the 5,500 rows of bike data. If I zoomed out far enough, the server would take over 20 seconds to respond. Benchmarking showed that most of that time was spent simply reading data from the database (using a pure SQL call to bypass Rails' ActiveRecord). But a call to determine how many rows matched a given query took almost no time, so I modified the JavaScript side to show a warning when zooming out too far would return too much data to look at, and left it at that. You can see the car crash map at http://carcrash.sandbox.activestate.com.
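
The warning itself lived in JavaScript, but the idea of running the cheap count before the expensive fetch can be sketched on the server side too. Everything named below (viewport_payload, MAX_ROWS, the shape of the returned hash) is hypothetical, and the two lambdas stand in for the real database queries.

```ruby
# Hypothetical cap on how many rows the server returns for one viewport.
MAX_ROWS = 5_000

# Run the cheap count first, and run the expensive fetch only when the
# result is small enough. Both callables stand in for database queries.
def viewport_payload(count_rows, fetch_rows, limit = MAX_ROWS)
  total = count_rows.call
  if total > limit
    { too_many: true, total: total, points: [] }
  else
    { too_many: false, total: total, points: fetch_rows.call }
  end
end

fetched = false
payload = viewport_payload(-> { 110_000 }, -> { fetched = true; [] })
puts "Too much data (#{payload[:total]} rows); zoom in" if payload[:too_many]
```

The client can then check the too_many flag and show its warning without the server ever reading the full result set.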
