Selecting the Fastest Database Engine for NetFlow Storage
If you are reading this article, you probably know what NetFlow is and how much data it can generate. 100 GB of NetFlow records per day is not unusual in the NetFlow business.
We tried most of the existing commercial and open-source database engines, including Microsoft SQL Server, MongoDB, PostgreSQL, MySQL and even Oracle, and were not happy with the results.
We tried turning off all indexing, implementing daily partitions and not storing some of the less important NetFlow fields. You may even have upgraded your storage to those fancy M.2 SSDs, and still NetFlow reports took minutes to appear.
If your NetFlow rate is somewhere under 1,000 flows per second (fps), you can skip this article: pick any DB and it will produce acceptable results.
Problems become visible when your flow rate reaches 2,000 fps, and at 10,000 fps just doing INSERTs into your tables takes about 70% of the time.
10,000 flows per second produce 600,000 database records every minute, and the INSERT statements alone take around 40 seconds to process, leaving only 20 seconds of each minute for generating reports.
The main problem here is that conventional DB engines are not optimized for storing read-only sequential data (written once, never changed) such as time-based event logs or NetFlow. The best database engine for NetFlow has to be designed with read-only sequential access in mind; no “Delete-Update” functionality is required for NetFlow.
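The per-minute arithmetic above can be checked with a short Python sketch. The function name `insert_budget` and its fields are illustrative, not part of any product; the inputs are the 10,000 fps rate and the roughly 40 seconds of INSERT time quoted in the text.

```python
def insert_budget(flows_per_second, insert_seconds_per_minute):
    """Return the per-minute record count and the time left for reports."""
    records_per_minute = flows_per_second * 60
    return {
        "records_per_minute": records_per_minute,
        # Fraction of every minute spent on INSERTs alone.
        "insert_share": insert_seconds_per_minute / 60,
        # Whatever is left of the minute is all the reports can use.
        "report_seconds": 60 - insert_seconds_per_minute,
        # Sustained insert rate the engine achieves under these numbers.
        "rows_per_insert_second": records_per_minute / insert_seconds_per_minute,
    }


budget = insert_budget(10_000, 40)
print(budget["records_per_minute"])          # 600000
print(round(budget["insert_share"] * 100))   # 67
print(budget["report_seconds"])              # 20
print(budget["rows_per_insert_second"])      # 15000.0
```

Forty seconds out of sixty is about 67%, which matches the "about 70%" figure. Put differently, an engine that sustains only 15,000 inserted rows per second is already two-thirds saturated at 10,000 fps.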
Dropping update and delete support greatly simplifies the DB's internal storage format and processing logic. The second DB feature that suits NetFlow best is reducing the number of indexes to one.
There is absolutely no value in having indexes on Source/Destination IP addresses or Source/Destination ports, as each of those indexes benefits only a single type of NetFlow report, and each index can double your table size on disk. The only index that is used in all reports is the index on the flow timestamp, because every NetFlow report is focused on a very specific time frame.
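To illustrate the single-index idea, here is a minimal sketch that uses Python's built-in `sqlite3` module purely as a stand-in (it is not ClickHouse and says nothing about performance). The table layout, the column names and the `top_talkers` helper are assumptions made for the example. The table is append-only, the only index is on the flow timestamp, and the report narrows by time window first and then aggregates the unindexed fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE flows (
        ts         INTEGER NOT NULL,  -- flow timestamp, Unix seconds
        src_ip     TEXT    NOT NULL,
        dst_ip     TEXT    NOT NULL,
        src_port   INTEGER NOT NULL,
        dst_port   INTEGER NOT NULL,
        byte_count INTEGER NOT NULL
    )
    """
)
# The only index: flow timestamp. No indexes on IPs or ports.
conn.execute("CREATE INDEX idx_flows_ts ON flows (ts)")

# Flows are only ever appended, never updated or deleted.
rows = [
    (1000, "10.0.0.1", "10.0.0.9", 51000, 443, 1500),
    (1010, "10.0.0.2", "10.0.0.9", 51001, 443, 700),
    (1020, "10.0.0.1", "10.0.0.8", 51002, 53, 300),
    (2000, "10.0.0.3", "10.0.0.9", 51003, 80, 9000),
]
conn.executemany("INSERT INTO flows VALUES (?, ?, ?, ?, ?, ?)", rows)


def top_talkers(conn, start_ts, end_ts, limit=10):
    """Sum bytes per source IP inside the half-open window [start_ts, end_ts)."""
    return conn.execute(
        """
        SELECT src_ip, SUM(byte_count) AS total_bytes
        FROM flows
        WHERE ts >= ? AND ts < ?
        GROUP BY src_ip
        ORDER BY total_bytes DESC, src_ip
        LIMIT ?
        """,
        (start_ts, end_ts, limit),
    ).fetchall()


report = top_talkers(conn, 1000, 1100)
print(report)  # [('10.0.0.1', 1800), ('10.0.0.2', 700)]
```

The time filter does the narrowing, so grouping by source IP only has to scan the rows inside the requested window, which is why a per-IP or per-port index adds disk cost without helping most reports.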
Last but not least, the DB feature required for perfect NetFlow storage is hardware optimization. Allocating each DB thread to a dedicated CPU core has been shown to speed up query processing by 10x.
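The article does not say how the thread-to-core allocation is implemented, so the following is only a sketch of the general idea in Python. The helper names `assign_cores` and `pin_current_thread` are hypothetical. `os.sched_setaffinity` is available on Linux only, so the pinning helper reports `False` on platforms that lack it.

```python
import os


def assign_cores(n_workers, allowed_cores):
    """Map worker index -> CPU core, round-robin over the allowed cores."""
    cores = sorted(allowed_cores)
    if not cores:
        raise ValueError("no CPU cores available")
    return {worker: cores[worker % len(cores)] for worker in range(n_workers)}


def pin_current_thread(core):
    """Pin the calling thread to one core; return False if unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g. macOS or Windows
    # On Linux, pid 0 refers to the calling thread.
    os.sched_setaffinity(0, {core})
    return True


# Plan for four workers on a machine where cores 0 and 1 are usable.
plan = assign_cores(4, {0, 1})
print(plan)  # {0: 0, 1: 1, 2: 0, 3: 1}
```

In a real deployment each worker would call `pin_current_thread(plan[worker])` once at startup, ideally with at least as many cores as workers so that no two workers share a core.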
When developing our own NetFlow collector we tried all the well-known DB engines. Some performed slightly better than others, but none of them was able to support the gold standard of 30,000 fps.
That was until we found ClickHouse, an open-source DB engine developed by Yandex. ClickHouse has a long list of limitations, but those limitations exist for a single purpose: to make it the fastest logging DB engine available on the market. With the help of ClickHouse, Nectus can process 50,000 flows per second in a single VM, which is currently a record among all commercially available NetFlow collectors.
Download your 60-day Trial of the best NetFlow collector.