While working with several different NoSQL databases heavily loaded with write requests, we faced a situation when the hard drive became a bottleneck. Scaling the cluster horizontally could easily solve this kind of problem, but it would also increase the monthly payments. This is why we decided to take a look at other options.
The first thing that comes to mind when a DB starts experiencing HDD performance issues is to combine several virtual drives into a RAID array, but how will it work with Windows Azure virtual infrastructure? To check this, we compared the performance of a single virtual drive and different RAID arrays (types: 0, 1, 4, 5, and 6) using the Bonnie++ tool for hard drive subsystem verification.
Below you will find the test results and step-by-step instructions on how to configure a RAID array on your own.
Test 1: RAID performance under Write/Read/Re-write workloads
In the first test, we measured the performance of different RAID arrays for simple read/write operations:
sudo bonnie++ -d /raid1/ -m 'raid1' -u root -n 100:8192:16384:20 -x10 -s 16g -f > raid1.csv
Bonnie++ was run 10 times (-x10). Each test worked with 100 files of 8-16 KB in size and 20 subdirectories. In total, there were 16 GB of “files” in each iteration. Since a large Windows Azure instance has 7 GB of RAM, we had a chance to avoid caching.
You can see the first test results below. The x-axis stands for megabytes per second, the y-axis indicates repetitions (we ran each test 10 times).
Write test results: