A friend of mine’s work deals a lot with geographical data. He was working on clustering large sets of coordinates into .25 mile groups and sent me over a small sampling to play with. By small, I mean a 8,736 coordinates. His real data set is much, much larger.
He had started off by utilizing GeoKit, but I noticed that the only thing he was using it for was to calculate distance. His was nicely encapsulated, but this is basically what it was doing.
1 2 3
I assumed there was some overhead involved in the library, so I wrote my own implementation of the haversine formula. I did see a slight performance increase, but not much. Geokit seemed to do a pretty good job at staying efficient.
Enter RubyInline. RubyInline is a gem that allows you to write other languages inside your ruby code. It comes with out of the box support for C/C++. I had been wanting to play with RubyInline for awhile, but nothing had come across my plate that needed that level of optimization.
I translated the haversine formula to C++ using RubyInline and the results were amazing. On my machine I was able to reduce the time it took to calculate 1,000,000 distances from 11.247408 seconds to 0.679327 of a second. It only took me a few minutes to rewrite the method and from an API standpoint it is identical to the pure ruby version.
The next time you’re doing work that is easily optimized with C, like this math formula, give a shot and be reap the rewards. The code and benchmarks are on github.