Monday, June 7, 2010

Why redis for time series data?

I just saw a contrived benchmark from a gentleman in China about redis vs tokyo db vs mongodb for storing time series data.

http://bit.ly/9GGCLP

I would comment, but I can speak Chinese much better than I can read it and I can't seem to post.

Here's what I would say if I could:

You're completely ignoring the extra functionality you get with redis. The nature of time series data is that, once you store it, you'll be reading it back in sorted order, by window, 100% of the time.

Redis presorts which makes this operation very fast, but costs a bit more when you store it.

Yes, it stores it slower, but it's presorted and will return in sorted order very, very fast (as you have shown) when queried by window.

If this were weighted by importance, "read last 30 days OHLC by symbol" would be like 90%+ of your priority.
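To make the trade-off concrete, here is a rough sketch of the idea in plain Python. It is not how redis is implemented and it is not the benchmark's code; `PresortedSeries` and `last_n_days` are names I made up for illustration. It only shows the shape of the argument: pay for ordering on every write, and the "last 30 days by symbol" read becomes two binary searches and a slice. In redis itself you would reach for a sorted set, with ZADD to store and ZRANGEBYSCORE to read a window.

```python
import bisect
from datetime import date, timedelta


class PresortedSeries:
    """Toy per-symbol store (hypothetical) that keeps bars sorted by date on write."""

    def __init__(self):
        self._dates = []  # sorted list of dates
        self._bars = []   # OHLC tuples, aligned index-for-index with _dates

    def add(self, day, ohlc):
        # Pay the ordering cost at write time: find the slot, then insert there.
        i = bisect.bisect_left(self._dates, day)
        self._dates.insert(i, day)
        self._bars.insert(i, ohlc)

    def window(self, start, end):
        # A read is two binary searches plus a slice; results are already in order.
        lo = bisect.bisect_left(self._dates, start)
        hi = bisect.bisect_right(self._dates, end)
        return list(zip(self._dates[lo:hi], self._bars[lo:hi]))


def last_n_days(series, today, n=30):
    # Inclusive window covering exactly n calendar days ending at `today`.
    return series.window(today - timedelta(days=n - 1), today)


if __name__ == "__main__":
    acme = PresortedSeries()
    today = date(2010, 6, 7)
    # Bars arrive out of order; the store still hands them back sorted.
    for offset in (5, 0, 40, 12):
        acme.add(today - timedelta(days=offset), (10.0, 11.0, 9.5, 10.5))
    for day, bar in last_n_days(acme, today):
        print(day, bar)
```

If writes arrive strictly in time order, the insert always lands at the end and the write penalty mostly disappears, which is the usual situation for telemetry feeds.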

The only case where presort makes less sense is if you are writing so fast that presorting becomes the bottleneck (not likely).

I've been looking for a time series data store as good as redis for 10+ years. We telemetry geeks really like the way it works. It just makes sense.

Having said all that, there is nothing wrong with the other solutions, as they will likely be fast enough for what you want to do with them. NoSQL is wicked fast by nature, but redis is wicked faster and pre-optimized for 90% of your operations.

1 comment:

epaulin said...

hi, I'm the author of the benchmark.

> You're completely ignoring the extra functionality you get with redis

This is not true: I was using a redis list in the benchmark, so sort or weight was not needed.

see: http://github.com/yinhm/nosql-tsd-benchmark/blob/master/redis_list.rb#L19

I think redis is well suited as a state store, but not for time series data, since redis needs all the data to fit in memory, and the performance was not that great, which did surprise me.

From my point of view, bdb or hdf5 are still the best options for TSD if you don't care about scale.