Small tips when you are building Erlang drivers

I can already hear you asking WHY, but there are cases when you just need to combine Erlang’s high availability and message parsing with the raw power and speed of C.

We @BugSense are using Erlang and C to power our stream database called “Lethe”, which can handle and process more than 2.5M rows in under 5 secs. On a single node.

We plan to release most of it as open source in the near future, so keep bugging us!

First of all, there are many ways to build Erlang C drivers (called Ports). Let’s separate them into two categories: the proper way and the lunatic way.

We’ll go the lunatic way. I am calling it lunatic because if your C code has an error, BEAM (Erlang’s VM) will also take a dive. Good times. But this is also the fastest way to connect Erlang and C. These lunatic drivers are called linked-in drivers (for the less brave, check out C nodes, where your driver is just another Erlang node. Slower, but way safer).

There are excellent official tutorials here and here on how to build your driver but let’s talk about a couple of things I’ve seen in my experience.

When you are about to load your shared library (erl_ddll:load_driver), make sure that the library is in the correct path AND that you add the proper checks (otherwise you could spend a good amount of time searching for what is going wrong). My way is this:
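As an illustration only (this is not the original snippet), a check along these lines could look like the sketch below. It assumes a rebar-style layout where the shared library lives in the application’s priv directory; the module name driver_loader and the App and DriverName arguments are hypothetical.

```erlang
-module(driver_loader).
-export([load/2]).

%% App        - hypothetical application name (an atom), used to find priv/
%% DriverName - hypothetical driver name, without the .so/.dll extension
load(App, DriverName) ->
    case code:priv_dir(App) of
        {error, bad_name} ->
            %% The application is not in the code path, so there is
            %% no priv directory to look in.
            {error, {priv_dir_not_found, App}};
        PrivDir ->
            case erl_ddll:load_driver(PrivDir, DriverName) of
                ok ->
                    ok;
                {error, already_loaded} ->
                    %% Treated as success, as the official tutorial does.
                    ok;
                {error, Reason} ->
                    %% format_error/1 turns the error term into readable text,
                    %% which usually says whether the file was missing or
                    %% could not be linked.
                    error_logger:error_msg(
                      "Could not load driver ~p from ~s: ~s~n",
                      [DriverName, PrivDir, erl_ddll:format_error(Reason)]),
                    {error, Reason}
            end
    end.
```

Logging the directory next to the formatted error is the point here: it shows right away whether you are looking in the wrong path or whether the library itself failed to load.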

Trust me, when you are using rebar and you don’t know why your library was not loaded, you will thank me. If you have upgraded to the latest Erlang (R15), there is a small change that will cause problems if you are just copy/pasting examples from the Internet. Make sure that this is your structure:

Watch for MAJOR and MINOR in the last two lines. These are presumably the ERL_DRV_EXTENDED_MAJOR_VERSION and ERL_DRV_EXTENDED_MINOR_VERSION values of the ErlDrvEntry struct, which older examples may leave out.

At this point, let’s talk about Mnesia. Never heard of it? Mnesia is a SUPER fast key/value storage which is trivial to scale, maintain, manage, replicate and it’s included in Erlang. It feels so natural adding and retrieving data from the storage and not caring where your data really are. It even has its own query language, if you want to do more complicated stuff.
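To give a feel for how little code that takes, here is a minimal sketch of a RAM-only key/value table on a single node. The module name kv_demo, the kv record and the store/fetch function names are hypothetical, and dirty operations are used only to keep the example short.

```erlang
-module(kv_demo).
-export([start/0, store/2, fetch/1]).

%% Hypothetical record: the first field is the Mnesia key.
-record(kv, {key, value}).

%% Start Mnesia and create the table if it is not there yet.
%% With no options beyond the attributes, the table is kept in RAM
%% on the local node.
start() ->
    ok = mnesia:start(),
    case mnesia:create_table(kv, [{attributes, record_info(fields, kv)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, kv}} -> ok
    end.

%% Write without a transaction (fast, but no atomicity guarantees).
store(Key, Value) ->
    mnesia:dirty_write(#kv{key = Key, value = Value}).

%% Read without a transaction.
fetch(Key) ->
    case mnesia:dirty_read(kv, Key) of
        [#kv{value = Value}] -> {ok, Value};
        [] -> not_found
    end.
```

After kv_demo:start(), a call to kv_demo:store(answer, 42) followed by kv_demo:fetch(answer) should give back {ok, 42}. For anything that needs consistency across nodes you would wrap the reads and writes in mnesia:transaction/1 instead.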

But what has really blown my mind is the following: I started building a small key/value storage just to play around with Erlang drivers, and it was fast. Really fast (I think it was 0.14 microseconds for each put). According to some rough tests, it was 100 times faster than other key/value databases like Redis or Mongo (of course it was just a plain key/value store: no replication, no backups etc). Then I tried Mnesia (one node) and the performance was 0.20 microseconds per put. And Mnesia has all of the above. It does have a 4GB disk persistence limit as well, BUT you can always connect a different backend and go around that. That means that the guys behind Mnesia have done really excellent work in the optimization of data streams, as well as in the serialization/deserialization of the data, which is the real problem in our example. Or both.
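For reference, a rough per-put number like the ones above can be obtained with timer:tc. This is only a sketch of one way to measure it, not the benchmark that produced those figures; the module name put_bench and the PutFun argument are hypothetical.

```erlang
-module(put_bench).
-export([avg_micros/2]).

%% PutFun - hypothetical fun(Key, Value) that performs one put
%% N      - number of puts to time
%% Returns the average wall-clock time per put, in microseconds.
avg_micros(PutFun, N) when is_function(PutFun, 2), is_integer(N), N > 0 ->
    Keys = lists:seq(1, N),              % build the keys outside the timed part
    {Micros, ok} = timer:tc(fun() ->
        lists:foreach(fun(I) -> PutFun(I, I) end, Keys)
    end),
    Micros / N.
```

Something like put_bench:avg_micros(fun(K, V) -> mnesia:dirty_write({kv, K, V}) end, 1000000) would time a million dirty writes into an existing kv table. Keep in mind that the result includes the cost of the loop and of the fun call itself, so treat it as a rough figure.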

I was thinking about starting a series of posts about “Building the fastest (and simplest) key/value database on earth”, describing everything from hash algorithms to replication strategies and all the nasty details. What do you think?

PS: If you want to learn more about Erlang, you need to check learnyousomeerlang. It is one of the best tutorials that I’ve found online. Do it!

- Jon