Monday, June 7, 2010

Why redis for time series data?

I just saw a contrived benchmark from a gentleman in China about redis vs tokyo db vs mongodb for storing time series data.

http://bit.ly/9GGCLP

I would comment, but I can speak Chinese much better than I can read it and I can't seem to post.

Here's what I would say if I could:

You're completely ignoring the extra functionality you get with redis. The nature of time series data is that once you've stored it, you'll be reading it back in sorted order, by time window, essentially 100% of the time.

Redis presorts, which makes that read very fast, but it costs a bit more at write time.

Yes, it stores slower, but the data is presorted and comes back in sorted order very, very fast (as you have shown) when queried by window.

If this were weighted by importance, "read the last 30 days of ohlc by symbol" would be 90%+ of your priority.

The only case where presorting makes less sense is if you are writing so fast that the sort-on-insert cost dominates (not likely).
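
To make that concrete, here's a rough sketch of the pattern in Python with redis-py (3.x). The key name ("ohlc:<symbol>") and the JSON bar encoding are just illustrative, not anything from the benchmark:

    import json
    import time

    import redis

    r = redis.Redis(host="localhost", port=6379, db=0)

    def store_bar(symbol, ts, open_, high, low, close):
        # One sorted set per symbol, scored by the bar's timestamp.
        # The sorting happens here, at write time, which is the extra insert cost.
        bar = json.dumps({"t": ts, "o": open_, "h": high, "l": low, "c": close})
        r.zadd("ohlc:%s" % symbol, {bar: ts})

    def last_30_days(symbol):
        # The 90% query: bars from the last 30 days, already in time order.
        now = int(time.time())
        return [json.loads(m)
                for m in r.zrangebyscore("ohlc:%s" % symbol, now - 30 * 86400, now)]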

I've been looking for a time series data store as good as redis for 10+ years. We telemetry geeks really like the way it works. It just makes sense.

Having said all that, there is nothing wrong with the other solutions; they will likely be fast enough for what you want to do with them. NoSQL is wicked fast by nature, but redis is wicked faster and pre-optimized for 90% of your operations.

2 comments:

@yinhm said...

hi, I'm the author of the benchmark.

> You're completely ignoring the extra functionality you get with redis

This is not true: I was using a redis list in the benchmark, so sorting or weighting was not needed.

see: http://github.com/yinhm/nosql-tsd-benchmark/blob/master/redis_list.rb#L19

I think redis is well suited to being a state store, but not to time series data, since redis needs all the data to fit in memory, and the performance was not that great, which surprised me.

From my point of view, bdb or hdf5 are still the best options for TSD if you don't care about scale.
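
The list approach in the benchmark boils down to something like this (a rough sketch, not the actual benchmark code, which is at the link above; the key name and JSON encoding are just for illustration):

    import json

    import redis

    r = redis.Redis()

    def append_bar(symbol, bar):
        # Bars arrive in time order, so RPUSH keeps the list sorted by construction;
        # there is no per-insert sorting cost as with a sorted set.
        r.rpush("bars:%s" % symbol, json.dumps(bar))

    def read_all(symbol):
        # LRANGE 0 -1 returns the whole list in insertion (i.e. time) order.
        return [json.loads(b) for b in r.lrange("bars:%s" % symbol, 0, -1)]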

Unknown said...

I have a question if you could help.

For the last few days I have been looking for the right storage for my data. I have a very big database of ohlc data and I need to dump it into some other storage every second.
I'm doing this in C++, and the db I'm dumping from is proprietary to a 3rd party.

Anyway, I tried using redis and it's by far the fastest db on both write and read.

I have been using sorted sets with a key like company:instrument:resolution, where resolution is the bar size (1m, 1d, etc.).
I have about 100 instruments and 9 resolutions. For each key I only dump 500 items.

So my question is: is there a better way to do it? How can I ensure there is only 1 bar for each score (the score is the timestamp)?

Thanks
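
One common way to keep a single bar per score with a scheme like the one described above is to remove whatever member currently sits at that timestamp before adding the new bar, then trim the set to the last 500 entries. A sketch in Python with redis-py (the key layout follows the comment; the bar encoding and function name are assumptions, not the commenter's code):

    import json

    import redis

    r = redis.Redis()

    def upsert_bar(company, instrument, resolution, ts, bar, max_items=500):
        key = "%s:%s:%s" % (company, instrument, resolution)
        pipe = r.pipeline()  # MULTI/EXEC so the three steps apply together
        pipe.zremrangebyscore(key, ts, ts)               # drop any existing bar at this timestamp
        pipe.zadd(key, {json.dumps(bar): ts})            # add the new bar, scored by its timestamp
        pipe.zremrangebyrank(key, 0, -(max_items + 1))   # keep only the newest max_items bars
        pipe.execute()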