Bugsense spent a year on a joint research initiative with PhD researchers at ISTLab of AUEB, analyzing the root causes of Android app crashes. Panos, Bugsense Co-Founder and CEO, presented the findings at Droidcon Berlin.
More than 10M stack traces were extracted and analyzed, with some very cool findings. We thought they deserved to be shared with conference attendees - and developers everywhere - in an equally cool way, so we decided to demonstrate Android errors and their root causes in real time!
An infographic seemed like a great real-time showcase, so we got busy. Displaying ~200 errors per second in real time while analyzing traffic of ~10,000 req/sec is not an easy task, and the conference was coming up quickly, so we decided to pull it together in a weekend hackathon. Using 1 server, our custom database (LDB), and our big-data crunching experience, our team went to work.
Using an LQL script we created (essentially LISP code with an analytics DSL), we tracked in real time which root exceptions were trending for different Android flavors and devices, kept the results in memory, and saved them to disk every couple of minutes.
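The tracking loop described above can be sketched roughly as follows. This is Python, not LQL, and it is not the script we ran: the `TrendTracker` class, its method names, the snapshot file format, and the 120-second flush interval are all assumptions made for illustration. It only shows the general shape of the idea: count (root exception, OS version, device) combinations in memory, write a snapshot to disk every so often, and serve the top results as JSON.

```python
import json
import time
from collections import Counter
from pathlib import Path


class TrendTracker:
    """Hypothetical in-memory counter of (exception, OS version, device) triples."""

    def __init__(self, snapshot_path, flush_interval=120.0, clock=time.monotonic):
        self.counts = Counter()
        self.snapshot_path = Path(snapshot_path)
        self.flush_interval = flush_interval  # seconds between disk snapshots (assumed)
        self._clock = clock
        self._last_flush = clock()

    def record(self, exception, os_version, device):
        """Count one crash and snapshot to disk if the interval has elapsed."""
        self.counts[(exception, os_version, device)] += 1
        if self._clock() - self._last_flush >= self.flush_interval:
            self.flush()

    def top(self, n=None):
        """Return the n most frequent combinations (all of them if n is None)."""
        return [
            {"exception": e, "os_version": o, "device": d, "count": c}
            for (e, o, d), c in self.counts.most_common(n)
        ]

    def flush(self):
        """Write the full in-memory state to disk as JSON."""
        self.snapshot_path.write_text(json.dumps(self.top()))
        self._last_flush = self._clock()

    def to_json(self, n=10):
        """Serialize the current top-n combinations for an HTTP response."""
        return json.dumps(self.top(n))
```

A caller would invoke `record(...)` for every incoming crash report and return `to_json()` from whatever endpoint feeds the infographic. Checking the flush timer inside `record` keeps the sketch single-threaded; a real service would more likely snapshot from a background task.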
Then, it was just a matter of grabbing the results and returning a JSON response - ALL from the same LQL script! This is the main file that does all the processing (imagine doing the following with Hadoop or Storm):
The data was streaming in, and the infographic was displaying it perfectly. Then, 4 hours before the presentation, we noticed the data was slowing down. We had chosen some arbitrary limits for the number of results, and we were streaming all root exceptions and all OS versions to LDB. We realized we had 800,000 x 800,000 combinations on our hands, and the view was taking up to a minute to respond. Yep, just a minute (long live in-memory databases) - but we wanted something snappier.
Because of our joint analysis with ISTLab, we knew the top 10 root cause exceptions account for > 80% of all Android errors (Pareto principle, anyone?), so we could afford to track far fewer combinations, and we limited the OS versions tracked to the official releases by Google. This reduced the available combinations to < 1000 and voila! We were delivering a snappy real-time infographic! We should give a shout-out to pusher.com for delivering the individual crashes to the World Map via websockets! Using Pusher was a no-brainer: set and forget! Thanks, pusher.com.
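To make the cardinality fix concrete, here is a minimal Python sketch of bounding the key space with whitelists. The exception names and version strings are placeholders, and both the "Other" bucket and the choice to drop unofficial OS builds are assumptions for illustration rather than a description of what our LQL script did.

```python
from collections import Counter

# Placeholder whitelists; real ones would come from the crash analysis
# and from the list of official Android releases.
TOP_EXCEPTIONS = frozenset({
    "NullPointerException",
    "OutOfMemoryError",
    "IllegalStateException",
})
OFFICIAL_OS_VERSIONS = frozenset({"2.3.6", "4.0.4", "4.2.2"})


def normalize(exception, os_version):
    """Map a raw (exception, OS version) pair onto the bounded key space.

    Returns None for OS versions outside the whitelist (dropped entirely);
    exceptions outside the top list are folded into a single "Other" bucket.
    """
    if os_version not in OFFICIAL_OS_VERSIONS:
        return None
    if exception not in TOP_EXCEPTIONS:
        exception = "Other"
    return (exception, os_version)


def aggregate(events):
    """Count events per normalized key, skipping dropped ones."""
    counts = Counter()
    for exception, os_version in events:
        key = normalize(exception, os_version)
        if key is not None:
            counts[key] += 1
    return counts


def max_keys():
    """Upper bound on distinct keys, however varied the input is."""
    return (len(TOP_EXCEPTIONS) + 1) * len(OFFICIAL_OS_VERSIONS)
```

With whitelists of this kind the number of distinct keys has a fixed ceiling: 10 tracked exceptions plus one catch-all bucket, across a few dozen official releases, stays in the hundreds no matter how many custom ROMs or obscure exceptions show up in the stream.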
What started as a 2-day hackathon ended up as a featured Visual.ly infographic, the buzz of Droidcon Berlin, and a great display of LDB’s real-time big-data crunching power!
Feel free to share thoughts or questions at @bugsense, or stop by and see us at Google I/O, May 15-17.