I need to load more than one billion keys into Berkeley DB, so I want to tune it in advance to get better performance. With the standard configuration, loading the first 1,000,000 keys already shows a throughput of only about 15,000 keys per second, which is very slow. For example, is there a proper way to tune Berkeley DB's B+tree (node size, etc.)?
(For comparison: after tuning, Tokyo Cabinet loads 1 billion keys in 25 minutes.)
P.S. I'm looking for tuning tips in the form of code, not parameters to set on the running system (e.g. JVM size, etc.).
I'm curious: when Tokyo Cabinet loads 1 billion keys in 25 minutes, what are the sizes of the keys and values being stored? What I/O system and storage hardware are you using? Do you use the word "load" to mean 1 billion transactionally committed records on permanent, stable storage? That would be ~666,666 inserts per second, which is physically impossible on any I/O system I'm aware of. Multiply that rate by the key and value sizes and you are hopelessly beyond physical limits.
Please read up a bit on I/O systems and how they work at the hardware level, then review your statement. I'm very interested in finding out what exactly Tokyo Cabinet is doing, and what it is not doing. If I had to guess, I'd say that it is committing to the file-system cache in the operating system, but not flushing (fdsync()-ing) those buffers to disk.
Full disclosure: I'm a product manager at Oracle for Oracle Berkeley DB (a direct competitor of Tokyo Cabinet), and I've been working with these databases, and with the best hardware for them, for almost ten years. I am therefore both biased and skeptical.
Berkeley DB has flags you can set on the transaction handle that trade durability (the "D" in ACID) for speed.
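In BDB-JE, for instance, that trade-off is expressed as a `Durability` policy on the transaction configuration. A minimal sketch, assuming an already-opened `Environment` and je.jar on the classpath (not tuned production code):

```java
import com.sleepycat.je.Durability;
import com.sleepycat.je.Environment;
import com.sleepycat.je.Transaction;
import com.sleepycat.je.TransactionConfig;

public class FastCommit {
    // Begin a transaction that gives up the fsync-per-commit guarantee
    // in exchange for speed. Choices, fastest to safest:
    //   COMMIT_NO_SYNC       - log record stays in JE's buffer (least durable)
    //   COMMIT_WRITE_NO_SYNC - log written to the OS, but not fsync'd
    //   COMMIT_SYNC          - full durability, fsync on every commit
    static Transaction beginRelaxedTxn(Environment env) {
        TransactionConfig tc = new TransactionConfig();
        tc.setDurability(Durability.COMMIT_NO_SYNC);
        return env.beginTransaction(null, tc);
    }
}
```

With `COMMIT_NO_SYNC`, a crash can lose the most recent commits, which is exactly the kind of relaxation a pure bulk-load benchmark may silently rely on.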
As far as making the Berkeley DB Java Edition (BDB-JE) as fast as possible, you can try the following:
- Deferred writes: this delays writing the transaction log for as long as possible (data is flushed only when buffers fill up)
- Sort your keys in advance: most B-trees (ours included) load much faster with in-order insertions
- Increase the size of the log files from the default of 10 MiB to something larger, like 100 MiB; this reduces I/O cost

It is very important to be clear about claims of performance with databases.
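In BDB-JE terms, the first and third items above are configuration settings. A sketch, assuming je.jar on the classpath (the parameter values are illustrative, not recommendations):

```java
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.EnvironmentConfig;

public class BulkLoadConfig {
    static EnvironmentConfig envConfig() {
        EnvironmentConfig ec = new EnvironmentConfig();
        ec.setAllowCreate(true);
        // Grow each log file from the 10 MiB default to ~100 MiB,
        // so fewer files are created and I/O is less fragmented.
        ec.setConfigParam(EnvironmentConfig.LOG_FILE_MAX,
                          String.valueOf(100L * 1024 * 1024));
        return ec;
    }

    static DatabaseConfig dbConfig() {
        DatabaseConfig dc = new DatabaseConfig();
        dc.setAllowCreate(true);
        // Deferred write: changes are buffered and flushed lazily;
        // call Database.sync() once at the end of the bulk load.
        dc.setDeferredWrite(true);
        return dc;
    }
}
```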
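The "sort your keys in advance" item needs no library support at all: buffer a batch of keys, sort them in the same unsigned-byte order the B-tree uses, then insert in that order so each put lands near the previous one. A minimal, store-agnostic sketch in plain Java (the commented-out `put` is a stand-in for the real insert call):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class SortedBatchLoader {
    // Sort a batch of byte[] keys in unsigned lexicographic order,
    // which matches the default key ordering of most B-tree stores.
    static void sortBatch(List<byte[]> keys) {
        keys.sort((a, b) -> Arrays.compareUnsigned(a, b));
    }

    static boolean isSorted(List<byte[]> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (Arrays.compareUnsigned(keys.get(i - 1), keys.get(i)) > 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        List<byte[]> batch = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            byte[] k = new byte[8];
            rnd.nextBytes(k);
            batch.add(k);
        }
        sortBatch(batch);
        // In-order inserts mean each put touches the same rightmost
        // leaf path as the previous one, so far fewer pages are revisited.
        for (byte[] k : batch) {
            // db.put(null, new DatabaseEntry(k), value);  // real BDB-JE call
        }
        System.out.println("sorted=" + isSorted(batch));  // prints sorted=true
    }
}
```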
They seem simple, but it turns out to be very tricky to get them right so that they never corrupt data or lose committed transactions.
I hope this helps you a bit.