
Distributing load across multiple volumes

When it was time to implement a new online service to store observations for tens of thousands of weather stations and make that data available to Seasonality users, I had a lot to think about with respect to the hardware configuration of the weather servers. The observations service requires a lot of disk I/O (not to mention storage space), but it’s pretty light on processor and memory requirements. I had spare cycles on the current weather servers, so I didn’t see the need to buy all new equipment. However, I wanted to be careful because I didn’t want the increased disk load to slow down the other services running on the servers.

Let’s back up a bit and talk about what kind of setup I currently have on the weather servers for Seasonality. I have a couple of servers in geographically diverse locations, each running VMware ESX with multiple virtual machines. Each virtual machine (VM) handles different types of load. For instance, one VM handles dynamic data like the weather forecasts, while a different VM serves out static data like the websites and map tiles. These VMs are duplicated on each server, so if anything goes down there is always a backup.

One of the servers is a Mac Mini. It had an SSD and a hard drive splitting the load. With the new observations service in the pipeline, I replaced the hard drive with a second SSD to prepare for the upgrade. Since this particular server acts as a backup most of the time, I didn’t have any load issues to worry about there.

The other server is a more purpose-built Dell rack mount, with enterprise hardware and SAS disks, and this is the box I lean on more for performance. Before the observations service, I had two RAID mirrors set up on this server. One RAID was on a couple of 15K RPM disks and handled all the dynamic VMs that needed the extra speed, like the forecast server and the radar/satellite tile generator. The other RAID was on a couple of more typical 7200 RPM disks and hosted VMs for the base map tiles, email, development, etc. There were two more disk bays I could put to use, but I had to decide the best way to fill them.
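
As an aside, it helps to know how busy each existing volume actually is before deciding where new load should go. The sketch below is one rough way to get that picture from inside one of the VMs (this assumes a Linux guest with whole disks named sdX, which is an assumption on my part; esxtop on the host gives a similar view with more detail). It samples /proc/diskstats twice and reports the percentage of time each disk spent doing I/O:

    # diskbusy.py - rough per-disk utilization from /proc/diskstats
    # Assumptions: a Linux guest, whole-disk devices named sdX.
    import time

    def busy_ms():
        """Return ms spent doing I/O per whole disk (field 13 of /proc/diskstats)."""
        stats = {}
        with open("/proc/diskstats") as f:
            for line in f:
                fields = line.split()
                name = fields[2]
                if name.startswith("sd") and not name[-1].isdigit():
                    stats[name] = int(fields[12])  # total ms the device was busy
        return stats

    INTERVAL = 5.0
    before = busy_ms()
    time.sleep(INTERVAL)
    after = busy_ms()
    for dev in sorted(after):
        pct = (after[dev] - before[dev]) / (INTERVAL * 1000.0) * 100.0
        print("%s: %.1f%% busy" % (dev, pct))

Whichever disk sits near 100% busy during peak hours is the one that will feel any additional load first.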

One option was to fill the extra two disk bays with 7200 RPM disks, expanding the slower RAID to be a bit more spacious and probably a reasonable amount faster as well. The other option was to add two disks that didn’t match either of the existing RAIDs, effectively adding a third mirrored RAID to the mix.

I decided on the latter option, because I really wanted to make sure any bottlenecks would be isolated to the observations server. For the price/performance, I settled on 10K RPM disks to get some of the speed of the faster spindles without breaking the bank the way 15K disks or SSDs would. The observations service would run completely on the new RAID, so it wouldn’t interfere with any of the current services running on the other volumes. So far it has worked beautifully, without any hiccups.
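
If you go this route, it’s worth a quick sanity check that a new mirror performs the way you expect before moving a service onto it. Something as crude as the sketch below, timing a 1 GiB sequential write from a VM sitting on the new volume, is enough to catch an obvious misconfiguration (the mount point here is made up, and a proper benchmark tool like fio will tell you far more):

    # writecheck.py - crude sequential write throughput check
    # Assumption: /mnt/observations is a filesystem on the new RAID volume.
    import os, time

    PATH = "/mnt/observations/write_test.tmp"
    BLOCK = b"\0" * (1 << 20)   # 1 MiB of zeroes
    TOTAL_MB = 1024             # write 1 GiB in total

    start = time.time()
    with open(PATH, "wb") as f:
        for _ in range(TOTAL_MB):
            f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())    # make sure the data actually hits the disks
    elapsed = time.time() - start
    os.remove(PATH)
    print("sequential write: %.0f MB/s" % (TOTAL_MB / elapsed))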

My point here is that it’s not always the best idea to create a single big volume and throw all your load at it. Sometimes that setup works well because of its simplicity and the extra speed you might get out of it. However, with most server equipment having enough memory and CPU cycles to run several virtual machines, the first limitation you run into is usually a disk bottleneck. By splitting the load between multiple RAID volumes, you not only make it easier to isolate problem services that might be using more than their fair share, but you also limit the extent of any problems that do arise, while still retaining the benefit of shared hardware.

1 Comment

  1. Hi, thanks for the very clear tutorial on allowing ICMP/tracert through the ASA 5505!

    Enjoying reading your posts.

    Re: “The observations service requires a lot of disk I/O”
    Have you looked into V-locity by Condusiv before? They optimize the writes to reduce the number of IOPS.
