*Coder Blog

Life, Technology, and Meteorology

Category: Storage

Distributing load across multiple volumes

When it was time to implement a new online service to store observations for tens of thousands of weather stations and make that data available to Seasonality users, I had a lot to think about with respect to the hardware configuration of the weather servers. The observations service requires a lot of disk I/O (not to mention storage space), but it’s pretty light on processor and memory requirements. I had spare cycles on the current weather servers, so I didn’t see the need to buy all new equipment. However, I wanted to be careful because I didn’t want the increased disk load to slow down the other services running on the servers.

Let’s back up a bit and talk about what kind of setup I currently have on the weather servers for Seasonality. I have a couple of servers in geographically diverse locations, each running VMware ESX with multiple virtual machines. Each virtual machine (VM) handles different types of load. For instance, one VM handles dynamic data like the weather forecasts, while a different VM serves out static data like the websites and map tiles. These VMs are duplicated on each server, so if anything goes down there is always a backup.

One of the servers is a Mac Mini. It had an SSD and a hard drive splitting the load. With the new observations service in the pipeline, I replaced the hard drive with a second SSD to prepare for the upgrade. With this particular server being marked as a backup most of the time, I didn’t have any load issues to worry about.

The other server is a more purpose-built Dell rack mount, with enterprise hardware and SAS disks, and this is the box that I lean on more for performance. Before the observations server I had two RAID mirrors set up on this server. One RAID was on a couple of 15K RPM disks and handled all the dynamic VMs that needed the extra speed, like the forecast server and the radar/satellite tile generator. The other RAID was on a couple of more typical 7200 RPM disks and hosted VMs for the base map tiles, email, development, etc. There were two more disk bays that I could put to use, but I had to decide the best way to use them.

One option was to fill the extra two disk bays with 7200 RPM disks, expanding the slower RAID to be a bit more spacious and probably a reasonable amount faster as well. The other option was to add two disks that didn’t match any of the existing RAIDs, effectively adding a third mirrored RAID to the mix.

I decided on the latter option, because I really wanted to make sure any bottlenecks would be isolated to the observations server. For the price/performance, I settled on 10K RPM disks to get some of the speed of the faster spindles, while not breaking the bank like 15K drives or SSDs would. The observations service would run completely on the new RAID, so it wouldn’t interfere with any of the current services running on the other volumes. So far it has worked beautifully, without any hiccups.

My point here is that it’s not always the best idea to create a single big volume and throw all your load at it. Sometimes that setup works well because of its simplicity and the extra speed you might get out of it. However, with most server equipment having enough memory and CPU cycles to create several virtual machines, usually the first limitation you will run into is a disk bottleneck. When splitting the load between multiple RAID volumes, you not only make it easier to isolate problem services that might be using more than their fair share, but you also limit the extent of any problems that do arise while still retaining the benefit of shared hardware.

Setting up a Small Desktop RAID System

With mass internal storage disappearing even from the top of the line in the 2013 Mac Pro, a lot more people are going to start looking for external solutions for their storage needs. Many will just buy an external hard drive or two, but others like myself will start to consider larger external storage arrays. One of the best solutions for people who need 5-15TB of storage is a 4 disk RAID 5 system. As I mentioned in a previous post, I went with a Pegasus2, and set it up in a RAID 5. This brings up a lot of questions about individual RAID settings though, so I thought I would put together a primer on the typical RAID settings you should care about when purchasing a Pegasus or comparable desktop RAID system.

Stripe Size
Stripe size is probably the setting with the biggest impact on the performance of your RAID. A lot of people will run a benchmark or two with different stripe sizes, incorrectly conclude that bigger stripe sizes are always faster, and use them. In reality, the best performing stripe size depends highly on your workload.

A quick diversion into RAID theory is required before we can talk about stripe sizing. With RAID 5, each drive is split up into blocks of a certain size called stripes. In a 4 disk RAID 5, 3 disks will have real data in their stripes, and the 4th disk will have parity data in its stripe (in reality, the parity stripes in a RAID 5 alternate between drives, so not all the parity is on the same disk). The parity stripe allows a disk to fail while still keeping your array online. You give up 25% of the space to gain a certain amount of redundancy.

When you read data from the volume, the RAID will determine which disk your data is on, read the stripe and return the requested data. This is pretty straightforward, and the impact of stripe size during reading is minimal.

However, when writing data to the disk, stripe size can make a big performance difference. Here’s what happens every time you change a file on disk:

  1. Your Mac sends the file to the RAID controller to write the change to the volume.
  2. The RAID controller reads the stripe of data off the disk where the data will reside.
  3. The RAID controller updates the contents of the stripe and writes it back to the disk.
  4. The RAID controller then reads the stripes of data in the same set from the other disks in the volume.
  5. The RAID controller recalculates the parity stripe.
  6. The parity stripe is written to the final disk in the volume.

In this 4 disk example, that adds up to 3 stripe reads and 2 stripe writes every time you write even the smallest file to the disk. Most RAIDs will default to a 128KB stripe size, and will typically give you a stripe size range anywhere from 32KB to 1MB. In the example above, assuming a 128KB stripe size, even a change to a 2KB file results in well over half a megabyte of data being read from and written to the disks. If a 1MB stripe size is used instead of 128KB, then 5MB of data would be accessed on the disks just to change that same 2KB file. So as you can see, the stripe size largely determines the amount of disk I/O required to perform even simple operations.

So why not just choose the smallest stripe size? Well, hard drives are really good at reading contiguous blocks of data quickly. If you are reading/writing large files, grouping those accesses into larger stripe sizes will greatly increase the transfer rate.

In general, if you use mostly large files (video, uncompressed audio, large images), then you want a big stripe size (512KB – 1MB). If you have mostly very small files, then you want a small stripe size (32KB – 64KB). If you have a pretty good mix between the two, then 128KB – 256KB is your best bet.
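On a hardware unit like the Pegasus you pick the stripe size in the vendor’s utility when creating the array, but the same trade-off shows up in Linux software RAID as the chunk size. As a rough sketch (the device names below are placeholders, not a recommendation for any particular hardware), creating a 4 disk RAID 5 with a 128KB stripe looks something like this:

# Sketch: 4 disk Linux software RAID 5 with a 128KB chunk (stripe) size.
# /dev/sd[b-e] are placeholder device names; substitute your own disks.
mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=128 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Confirm the chunk size the array is actually using.
mdadm --detail /dev/md0 | grep -i chunk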

Read Ahead Cache
A lot of RAID systems will give you the option of enabling a read ahead cache. Enabling this can dramatically increase your read speeds, but only in certain situations. In other situations, it can increase the load on your hard drives without any benefit.

Let’s talk about what happens in the read cache when read ahead is disabled. The read cache will only store data that you recently requested from the RAID volume. If you request the same data again, then the cache will already have that data ready for you on demand without requiring any disk I/O. Handy.

Now how is it different when read ahead caching is enabled? With read ahead caching, the RAID controller will try to guess what data you’ll want to see next. It does this by reading more data off the disks than you request. For example, if your Mac reads the first part of a bigger file, the RAID controller will read the subsequent bytes of that file into cache, assuming that you might need them soon (to read the next part of that big file, for instance).

This comes in handy in some situations. Like I mentioned earlier, hard drives are good at reading big contiguous blocks of data quickly. So if you are playing a big movie file, for instance, the RAID controller might read the entire movie into cache as soon as the first part of the file is requested. Then as you play the movie, the cache already has the data you need available. The subsequent data is not only available more quickly, but the other disks in your RAID volume are also free to handle other requests.

However, read ahead can result in wasted I/O. A lot of the time, you won’t have any need for the subsequent blocks on the disk. For instance, if you are reading a small file that is entirely contained in a single stripe on the volume, there is no point in reading the next stripe. It just puts more load on the physical disks and takes more space in the cache, without any benefit.

Personally, I enable read ahead caching. It’s not always a win-win, but it can greatly speed up access times when working with bigger files (when the speed is needed most).
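On a desktop RAID this is a checkbox in the vendor utility, but the same idea exists at the operating system level too. On Linux, for example, you can inspect and raise the read-ahead on a block device (a sketch; /dev/md0 is a placeholder and the right value depends on your workload):

# Read-ahead values are in 512-byte sectors; 4096 sectors = 2MB of read-ahead.
blockdev --getra /dev/md0
blockdev --setra 4096 /dev/md0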

Write Back Cache
There are two write cache modes: write through, and write back. Your choice here can have a dramatic impact on the write speed to your RAID. Here’s how each mode works.

Write Through: When writing data to the disk, the cache is not used. Instead, OS X will tell the RAID to write the data to the drive, and the RAID controller waits for the data to be completely written to the drives before letting OS X know the operation was completed successfully.

Write Back: This uses the cache when writing data to the disk. In this case, OS X tells the RAID to write a given block of data to the disk. The RAID controller saves this block quickly in the cache and tells OS X the write was successful immediately. The data is not actually written to the disks until some time later (not too much later, just as soon as the disks can seek to the right location and perform the write operation).

Enabling the write back cache is “less safe” than write through mode. The safety issue comes into play during a power outage. If the power goes out between the time that the RAID told OS X the data was written, and the time when the data is actually on the disks themselves, data corruption could take place.

More expensive RAID systems, like the Pegasus2, have a battery-backed cache. The benefit here is that if a power outage happens as described above, the battery should power the cache until the power goes back on and the RAID controller can finish writing the cache to disks. This effectively overcomes the drawback of enabling write back caching.

Another potential drawback of enabling write back caching is a performance hit to read speed. The reason is that there is less cache available for reading (because some is being used for writes). The hit should be pretty minor though, and only applicable when a lot of write operations are in progress. Otherwise, the amount of data in the write back cache will be minimal.

The big advantage of using a write back cache is speed though.  When write back caching is enabled, OS X doesn’t have to wait for data to be written to the disks, and can move on to other operations.  This performance benefit can be substantial, and gives the RAID controller more flexibility to optimize the order of write operations to the disks based on the locations of data being written.  Personally, I enable write back caching.
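The same concept applies one layer down as well: individual drives have their own small volatile write caches. On a bare Linux box (not the Pegasus, whose controller manages this for you), you could check and toggle a drive’s cache with hdparm; a sketch, with /dev/sda as a placeholder:

hdparm -W /dev/sda      # report the drive's current write-cache setting
hdparm -W1 /dev/sda     # enable the drive's write cache (faster, but no battery backing it)
hdparm -W0 /dev/sda     # disable it (safer, slower)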

Wrap-up

That about covers it.  Small desktop RAID systems are a nice way to get a consolidated block of storage with a little redundancy and a lot more performance than just a stack of disks can provide.  I hope this overview has helped you choose the options that are best for your desktop RAID system. In the end, there is no right answer to the settings everyone should use. Choose the settings that best fit your workload and performance/safety requirements.

Pegasus2 Impressions

With the lack of drive bays in the new Mac Pro, Apple is definitely leaning toward external storage with its future models.  My Mac Pro won’t arrive until next month, but in the meantime I had to figure out what kind of storage system I was going to buy.

As I mentioned in a previous post, I had considered using my local file server as a large storage pool.  After trying it out for the past couple months, I wanted something that was a bit faster and more reliable though.  I decided to look at my direct attached storage (DAS) options.  Specifically, I was looking at Thunderbolt enclosures.

My data storage requirements on my desktop machine are currently between 3-4TB of active data, so single disk options weren’t going to cut it.  I would need at least 2 disks in a striped RAID 0.  I’m not particularly comfortable with RAID 0 setups, because any one of the drives can fail and you would lose data.  However, with good automatic Time Machine backups, that shouldn’t be too much of an issue.  Ideally, though, I wanted something with 3-4 drives that includes a built-in hardware RAID 5 controller.  This way, I would have a little bit of redundancy.  It wouldn’t be a replacement for good backups, but if a disk went offline, I could keep working until a replacement arrived.

The only 3 disk enclosure I found was the Caldigit T3.  This looks like a really slick device, and I was pretty close to ordering one.  The main drawback of the unit is that it doesn’t support RAID 5.  I would have to either have a 2 disk RAID 0 with an extra drive for Time Machine, or a 3 disk RAID 0 (which is pretty risky) to support the amount of storage I need.  I decided this wasn’t going to work for me.

Once you get into the 4 disk enclosures, the prices start to go up.  There are two options I considered here.  First is the Areca ARC-5026.  Areca earned a good reputation by manufacturing top-end RAID cards for enterprise.  The 5026 is a 4 bay RAID enclosure with Thunderbolt and USB 3 ports on the back.  The drawback is that it’s pretty expensive ($799 for just the enclosure), and it doesn’t exactly have a nice look to it.  It reminds me of a beige-box PC, and I wasn’t sure I wanted something like that sitting on my desk.

The other option I looked at was a Promise Pegasus2.  It’s also a 4 disk RAID system (with 6 and 8 disk options).  They offer a diskless version that is less expensive than the Areca.  It doesn’t support USB 3 like the Areca, but it does support Thunderbolt 2 instead of Thunderbolt 1.  And the case is sharp.  Between the faster host interface and the cost savings, I decided to get the Pegasus.

The diskless model took about 2 weeks to arrive.  The outside of the box claimed it was the 8TB R4 model, so Promise isn’t making a separate box for the diskless version.  I suspect that Apple twisted Promise’s arm a little bit to get them to release this model.  Apple knew there was going to be some backlash from Mac Pro upgraders who needed an external replacement for their previous internal drives.  Apple promoted Promise products back when the Xserve RAID was retired, and I imagine Apple asked Promise to return the favor here.  The only place you can buy the diskless R4 is the Apple Store.  It isn’t sold at any other Promise retailers.

Since the enclosure doesn’t include any drives, I decided on Seagate 3TB Barracuda disks.  They are on the Promise supported drive list, and in my experience Seagate makes the most reliable hard drives.  With a RAID 5, I would have about 9TB of usable space.  More than I need right now, but it’s a good amount to grow into.  Installing the hard drives was pretty straightforward: eject each tray, attach each drive with the set of 4 screws, and latch them back in.  Then I plugged it into my Mac with the included 3 foot black Thunderbolt cable and turned it on.

This being the diskless version, the default setup is to mount all four disks as if there were no RAID.  This is counter to the Pegasus models that include drives, where the default configuration is a RAID 5.  This model instead uses a pass-through mode (JBOD), so you can take drives right out of your old computer and use them with the new enclosure.  I had to jump through a few hoops, but getting the RAID set up wasn’t too bad.  I had to download the Promise Utility from their website first.  Once you install the software, you can open up the utility and go into the advanced configuration to set up a new RAID volume.  The default settings for creating a RAID 5 weren’t ideal.  Here’s what you should use for a general case…

Stripe Size:  128KB
Sector Size:  512 bytes
Read Cache Mode:  Read Ahead
Write Cache Mode:  Write Back

The Pegasus2 has 512MB of RAM, which is used for caching.  It’s a battery-backed cache, so using Write Back mode instead of Write Through should be okay for most cases.  Only use Write Through if you really want to be ultra-safe with your data and don’t care about the performance hit.

Once you get the RAID set up, it starts syncing the volume.  The initial sync took about 8 hours to complete.  The RAID controller limits the rebuild speed to 100MB/sec per disk.  This is a good idea in general, because it leaves you some bandwidth to start using the volume right away during the rebuild.  However, it makes me wonder how much time could be saved if there wasn’t a limit (I found no way to disable or increase the limit in their software).
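For comparison, Linux software RAID exposes its rebuild throttle through sysctl, so on an md array you could raise the ceiling yourself (a sketch; values are in KB/sec):

sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
sysctl -w dev.raid.speed_limit_max=500000    # allow up to ~500MB/sec during a resync
cat /proc/mdstat                             # watch the rebuild progress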

Drive noise is low to moderate.  The documentation claims there are two fans, one big one for the drives and one small one for the power supply.  Looking through the power supply vent though, it doesn’t look like there’s actually a fan there.  Maybe it’s further inside and that is just a vent.  The bigger fan spins at around 1100-1200rpm (this is while doing the rebuild, but idle is no lower than 1000rpm).  It’s definitely not loud, but it’s not really quiet either.  Sitting about 2 feet away from the Pegasus, it makes slightly less noise than my old Mac Pro (I keep the tower on my desk about 3 feet away).  The noise from the Pegasus is a bit higher pitched though.  When the new Mac Pro gets here, I’ll have the Pegasus further away from me, so I’ll wait until then to fully judge the amount of noise.

Overall I’m very happy with the system so far.  Initial benchmarks are good.  Since I don’t have the new Mac Pro yet, I’m testing on a 2011 MacBook Air over a Thunderbolt 1 connection.  Using the AJA System Test, I saw rates of around 480MB/sec reads and 550MB/sec writes.  Switching to BlackMagic, the numbers bounced around a lot more, but it came up with results around 475MB/sec reads and 530MB/sec writes.  With RAID 5 having notoriously slow writes because of the parity calculation, I’m a little surprised the Pegasus writes faster than it reads.  The RAID controller must be handling the parity calculation and caching well.  It will be interesting to see if benchmarks improve at all when connected to the new Mac Pro over Thunderbolt 2.
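If you want a rough command-line sanity check alongside the GUI benchmark tools, a simple streaming test with dd gets you in the ballpark.  This is just a sketch: the volume name is a placeholder, and it only measures sequential throughput.

dd if=/dev/zero of=/Volumes/Pegasus/testfile bs=1m count=4096   # write a 4GB test file
sudo purge                                                      # flush the filesystem cache before reading
dd if=/Volumes/Pegasus/testfile of=/dev/null bs=1m              # read it back
rm /Volumes/Pegasus/testfile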

File Server Upgrade

Last month, the RAID card in my file server died.  I tried to replace the card with a newer model, but found that not all PCI Express cards match well with all motherboards.  The motherboard was old enough that the new card simply wouldn’t work with it.  Being that the server components (other than the drives) were almost 10 years old, I decided it was time to rebuild the internal components.

I already had a solid base from the old file server.  The case is a Norco RPC-4020.  It’s a 4U enclosure with 20 drive bays.  The most I’ve ever used was 12 bays, but with the increasing size of modern drives, I am whittling it down to 8.  The drives I have are pretty modern, so this build doesn’t factor in any additional drive cost.  Other than the drives though, the rest of the server’s guts needed a good refresh.  Here’s what I put in there:

Motherboard:  Asus Z87-Pro
I went with this Asus because it had a good balance of performance and economy (and Asus’ reliability).  The board has 8 SATA ports, which is great for a file server when you are trying to stuff a bunch of disks in there.  I also liked how the board used heatsinks instead of fans for cooling.  Fewer moving parts to wear out.  Finally, this board has plenty of PCIe slots in case I want to add RAID/HBA cards for more drives, or a 10GBASE-T Ethernet card down the line.

CPU:  Intel Core i5-4570S
This is one of the low power models in the Haswell (4th generation) line.  TDP is a moderate 65 watts.  I was debating between this chip and the 35 watt Core i3-4330T.  If this server just served files, then I would have bought the Core i3, but I also use the box to host a moderately-sized database and do some server-side development.  The Core i5 chip is a quad core instead of a dual core, and I decided it would be worth it to step up.  You’ll notice that a GPU isn’t included in the list here, and that’s because I’m just using the embedded GPU.  One less component to worry about.

Memory:  2x4GB Crucial Ballistix Sport DDR3-1600
I’ve never been into overclocking, so I just went with whatever memory ran at the CPU’s native 1600MHz.  Crucial is always a safe bet when it comes to memory.  This particular memory has a relatively low CL9 latency.

Power Supply:  Antec EA-550 Platinum 550 watt
The power supply is a make-or-break part of a server, especially when you have a lot of disks.  I wanted something that was very efficient, while also supplying plenty of power.  This power supply is 93% efficient, meaning a lot more energy is making it to the computer components themselves instead of being wasted in the form of heat.  The one drawback of this power supply is that it’s a 4 rail unit and all the Molex/SATA power connectors are on a single rail.  So it’s not quite ideal for servers with a lot of disks (you need enough to cover the power spike as the disks spin up), but it handles 8 drives just fine with some room to grow.

Boot Drive:  USB 3 internal motherboard header and flash drive
I really wanted the OS to stay off the data drives this time around.  The best way I found to do that is to use the USB 3 header built into most modern motherboards.  Typically this header is for cases that have USB 3 ports on the front, but my case only has a single USB 2 port on the front, so this header was going unused.  I found a small Lian Li adapter to convert the 20 pin port on the motherboard to 2 internal USB 3 ports.  Then I picked up a 128GB PNY Turbo USB 3 flash drive on sale.  The motherboard has no problem booting off the USB drive, and while latency is higher, the raw throughput of this particular flash drive is pretty good.

The Lian Li adapter is great because I don’t have to worry about the flash drive coming unplugged from the back of the case.  It’s inside the server, where it won’t be messed with.

Once I had all the components installed, I had to cable everything up.  You use about a million tie-wraps when cleaning up the cabling, but it looks nice in the end.  The cables are nowhere near as elegant as the cabling inside a Mac, but for a PC I think it turned out pretty well.  Here’s a shot of the inside of the server:

The power savings over the old server components were pretty dramatic.  The old system had a standard 550 watt power supply and was using an Athlon X2 CPU.  Typically, the load would hover between 180-240 watts.  This new server idles at 80 watts and will occasionally break 100 watts when it’s being stressed a little bit.  It’s great to get all this extra performance while using less than half the power.

Overall, it turned out to be a great build.  Component cost was less than $600 (not including the case or drives), while still using quality parts.  I’m looking forward to this one lasting another 10 years.

On the New Mac Pro

Apple talked more about the new Mac Pro at its special event today, giving more details on when it will start shipping (December) and how much it will cost ($2999 for the base model). They also covered some additional hardware details that weren’t mentioned previously, and I thought I would offer my 2 cents on the package.

Storage

There have been a lot of complaints about the lack of expansion in the new Mac Pro, particularly when it comes to storage. With the current Mac Pro able to host up to 4 hard drives and 2 DVD drives, the single PCIe SSD slot in the new Mac Pro can be considered positively anemic. This has been the biggest issue in my eyes. Right now in my Mac Pro, I have an SSD for the OS and applications, a 3TB disk with my Home directory on it, and a 3TB disk for Time Machine. That kind of storage just won’t fit in a new Mac Pro.

I believe Apple’s thought here is that big storage doesn’t necessarily belong internally on your Mac anymore. Your internal drives should be able to host the OS, applications, and recently used documents, and that’s about it. Any archival storage should be external, either on an external hard drive, on a file server, or in the cloud. Once you start thinking in this mindset, the lack of hard drive bays in the new Mac Pro starts to make sense.

Personally, if I decide to buy one, I’ll probably start migrating my media to a file server I host here in a rack and see just how much space I need for other documents. I already moved my iTunes library a couple months back (300GB), and if I move my Aperture photo libraries over, that will reduce my local data footprint by another 700-800GB (depending on how many current photo projects I keep locally). That’s an easy terabyte of data that doesn’t need to be on my Mac, as long as it’s available over a quick network connection.

VMware virtual machines are a little tricky, because they can use a lot of small random accesses to the disk, and that can be really slow when done over a network connection with a relatively high latency. The virtual disks can grow to be quite large though (I have a CentOS virtual machine to run weather models that uses almost 200GB). I’ll have to do some testing to see how viable it would be to move these to the file server.

All this assumes that you want to go the network storage route. To me, this is an attractive option because a gigabit network is usually fast enough, and having all your noisy whirring hard drives in another room sounds… well… peaceful. If you really need a lot of fast local storage though, you’ll have to go the route of a Thunderbolt or USB 3 drive array. If you have big storage requirements right now, you most likely have one of these arrays already.

CPU/GPU Configurations

The new Mac Pro comes with a single socket Xeon CPU and dual AMD FirePro GPUs. This is the reverse of the old Mac Pro, which had 2 CPU sockets and a single graphics card (in its standard configuration). The new Mac Pro is certainly being geared more toward the video and scientific professionals who can use the enhanced graphics power.

With 12 cores in a single Xeon, I don’t think the single socket CPU is a big issue. My current Mac Pro has 8 cores across 2 sockets, and other than when I’m compiling or doing video conversion, I have never come close to maxing all the cores out. Typical apps just aren’t there yet. You’re much better off having 4-6 faster cores than 8-12 slower cores. Fortunately, Apple gives you that option in the new Mac Pro. A lot of people have complained about paying for the extra GPU though. FirePro GPUs aren’t cheap, and a lot of people are wondering why there isn’t an option to just have a single GPU to save on cost.

I think the reason for this is the professional nature of the Mac Pro. The new design isn’t really user expandable when it comes to the graphics processors, so Apple decided to include as much GPU power as they thought would be reasonably desired by their pro customers. The new Mac Pro supports up to three 4K displays, or up to six Thunderbolt displays. A lot of professionals use dual displays, and it’s increasingly common to have three or more displays. With dual GPUs this isn’t a problem in the new Mac Pro, while if it were configured with a single GPU the display limit would be comparable to the iMac. Personally, I have 2 graphics cards in my Mac Pro, and have used up to 3 displays. Currently I only use 2 displays though, so I could go either way on this issue. I do like the idea of having each display on its own GPU though, as that will just help everything feel snappier. This is especially true once 4K displays become standard on the desktop. That’s a lot of pixels to push, and the new Mac Pro is ready for it.

External Expansion

I’ve seen people comment on the lack of Firewire in the new Mac Pro. This, in my opinion, is a non-issue. Even Firewire 800 is starting to feel slow when compared to modern USB 3 or Thunderbolt storage. If you have a bunch of Firewire disks, then just buy a $30 dongle to plug into one of the Thunderbolt ports. Otherwise you should be upgrading to Thunderbolt or USB 3 drives. USB 3 enclosures are inexpensive and widely available.

Outside of that, the ports are very similar to the old Mac Pro. One port I would have liked to see in the new Mac Pro is 10G ethernet. The cost per port of 10G is coming down rapidly, and with storage moving out onto the network, it would have been nice to have the extra bandwidth 10G ethernet offers. Apple introduced gigabit ethernet on Macs well before it was a common feature on desktop computers as a whole. Perhaps there will be a Thunderbolt solution to this feature gap sometime down the road.

Power Consumption and Noise

This alone is a good reason to upgrade from a current Mac Pro. The new Mac Pro will only use around 45W of power at idle, which isn’t much more than a Mac Mini and is about half of the idle power consumption of the latest iMacs (granted, the LCD in the iMac uses a lot of that). My 2009 Mac Pro uses about 200W of power at idle. Assuming you keep your Mac Pro on all the time, and are billed a conservative $0.08 per kilowatt hour, you can save about $100/year just by upgrading. That takes some of the sting out of the initial upgrade cost for sure.
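As a quick sanity check of that figure (using the numbers above: a 155W difference, running around the clock, at $0.08 per kWh):

echo "(200 - 45) * 24 * 365 / 1000 * 0.08" | bc -l    # ~108.6 dollars per year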

Using less energy means needing less cooling. The new Mac Pro only has a single fan in it, and it’s reportedly very quiet. Typically the unit only makes about 12dB of noise, compared to around 25dB in the current Mac Pro. With sound power doubling for every 3dB increase, the new Mac Pro puts out roughly 16 times less sound power than the old one. Surely the lack of a spinning HD helps here as well.

Overall

Overall the new Mac Pro is a slick new package, but you already knew that. It isn’t for everybody, but it fits the needs of the professional customer pretty well moving forward. Personally, I haven’t decided if I will buy one yet. My Mac Pro is almost 5 years old at this point, and while it still does a good job as a development machine, I’m starting to feel its age. However, I haven’t decided whether I will replace it with a new Mac Pro, the latest iMac, or even a Retina MacBook Pro in some kind of docked configuration. There are benefits and drawbacks to each, so I’m going to wait until I can get my hands on each machine and take them for a spin.

Packing in the inodes

The new forecast server I’m working on for Seasonality users is using the filesystem hierarchy as a form of database instead of PostgreSQL.  This will slow down the forecast generation code a bit, because I’m writing a ton of small files instead of letting Postgres optimize disk I/O.  However, reading from the database will be lightning fast, because filesystems are very efficient at traversing directory structures.

The problem I ran into was that I was quickly hitting the maximum number of files on the filesystem.  The database I’m working on creates millions of files to store its data in, and I was running out of inodes.
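You can watch this happen with df: the -i flag reports inode usage instead of block usage, and once IUse% hits 100 you can’t create new files no matter how much free space is left.

df -i    # shows inodes total/used/free per filesystem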

Earlier today I installed a fresh copy of Ubuntu on a virtual machine where the final forecast server will reside.  Of course I forgot to increase the number of inodes before installing the OS on the new partition.  Unfortunately, there is no way to add more inodes to a Linux ext4 filesystem without reformatting the volume.  Luckily I caught the problem pretty early and didn’t get too far into the system setup.

To fix the issue, I booted off the Ubuntu install ISO again and chose the repair boot option.  Then I had it start a console without selecting a root partition (if you select a root partition, the installer mounts it, and when I tried to unmount it the partition was busy).  This let me format the partition with an increased number of inodes using the -N flag in mkfs:

mkfs.ext4 -N 100000000 /dev/sda1

That ought to be enough. 🙂  After that, I was able to install Ubuntu on the new partition (just making sure not to tell the installer to format that same partition again, which would wipe out the enlarged inode table).
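To double-check that the install left the inode table alone, you can read it back with tune2fs (assuming the same /dev/sda1 partition as above):

tune2fs -l /dev/sda1 | grep -i 'inode count'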

The forecast server is coming along quite well.  I’m hoping to post more about how it all works in the near future.

Office Network Updates

Over the past several weeks, I’ve been spending a lot of time working on server-side changes. There are two main server tasks that I’ve been focusing on. The first task is a new weather forecast server for Seasonality users. I’ll talk more about this in a later post. The second task is a general rehash of computing resources on the office network.

Last year I bought a new server to replace the 5 year old weather server I was using at the time. This server is being coloed at a local ISP’s datacenter. I ended up with a Dell R710 with a Xeon E5630 quad-core CPU and 12GB of RAM. I have 2 mirrored RAID volumes on the server. The fast storage is handled by 2 300GB 15,000 RPM drives. I also have a slower mirrored RAID using 2 500GB 7200 RPM SAS drives that’s used mostly to store archived weather data. The whole system is running VMware ESXi with 5-6 virtual machines, and has been working great so far.

Adding this new server meant that it was time to bring the old one back to the office. For its time, the old server was a good box, but I was starting to experience reliability issues with it in a production environment (which is why I replaced it to begin with). The thing is, the hardware is still pretty decent (dual core Athlon, 4GB of RAM, 4x 750GB disks), so I decided I would use it as a development server. I mounted it in the office rack and started using it almost immediately.

A development box really doesn’t need a 4 disk RAID though. I currently have a Linux file server in a chassis with 20 drive bays. I can always use more space on the file server, so it made sense to consolidate the storage there. I moved the 4 750GB disks over to the file server (set up as a RAID 5) and installed just a single disk in the development box. This brings the total redundant file server storage up past 4TB.

The next change was with the network infrastructure itself. I have 2 Netgear 8 port gigabit switches to shuffle traffic around the local network. Well, one of them died a few days ago so I had to replace it. I considered just buying another 8 port switch to replace the dead one, but with a constant struggle to find open ports and the desire to tidy my network a bit, I decided to replace both switches with a single 24 port Netgear Smart Switch. The new switch, which is still on its way, will let me set up VLANs to make my network management easier. The new switch also allows for port trunking, which I am anxious to try. Both my Mac Pro and the Linux file server have dual gigabit ethernet ports. It would be great to trunk the two ports on each box for 2 gigabits of aggregate bandwidth between those two hosts.
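On the Linux side, the trunk would look something like the bonding configuration below.  This is just a sketch: it assumes a Debian/Ubuntu-style /etc/network/interfaces with the ifenslave package installed, the interface names and address are placeholders, the switch ports have to be configured for 802.3ad as well, and any single connection will still top out at one link’s speed.

# /etc/network/interfaces excerpt: bond eth0 and eth1 into an 802.3ad (LACP) trunk
auto bond0
iface bond0 inet static
    address 192.168.1.20
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode 802.3ad
    bond-miimon 100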

The last recent network change was the addition of a new wireless access point. I’ve been using a Linksys 802.11g wireless router for the last several years. In recent months, it has started to drop wireless connections randomly every couple of hours. This got to be pretty irritating on devices like laptops and the iPad where a wired network option really wasn’t available. I finally decided to break down and buy a new wireless router. There are a lot of choices in this market, but I decided to take the easy route and just get an Apple Airport Extreme. I was tempted to try an ASUS model with DD-WRT or Tomato Firmware, but in the end I decided I just didn’t have the time to mess with it. So far, I’ve been pretty happy with the Airport Extreme’s 802.11n performance over the slower 802.11g.

Looking forward to finalizing the changes above. I’ll post some photos of the rack once it’s completed.

Building a SoHo File Server: The Hardware

During the past few weeks, I have been working on a solution to help consolidate the storage I’m using for both personal and business files. Weather data is pretty massive, so any time I do some serious weather server work for future Seasonality functionality, I’m almost always using a ton of disk space to do it. On the home side, media files are the dominant storage sucker. With switching to a DSLR camera and shooting RAW, I’ve accumulated over 100GB of photos in the past 6 months alone. I’ve also ripped all my music into iTunes, and have a Mac with an EyeTV Hybrid recording TV shows to watch later. All of this data adds up.

Before starting this project, I had extra hard drives dispersed across multiple computers. My Mac Pro (used as a primary development box) held weather data, a Linux file server held media and some backups, the older Mac had the media files, and my laptop had just the essentials (but as much as I could possibly fit on its 320GB disk).

Requirements

The requirements for my new storage solution are as follows…

  1. Provide a single point of storage for all large files and backups of local computers on the network.
  2. Support more than 10 disks, preferably closer to 20. I have 7 disks ready for this box right now, another 4 coming in the next 6 months, and want to have more room after that to grow.
  3. Offer fast access to the storage to all machines on the network. I have a gigabit network here, and I would like to see close to 100MB/sec bandwidth to the data.
  4. Rackmounted. I recently set up a rack for my network equipment in another room of the house. A file server should go in that rack, and has the added benefit of keeping me from having to listen to a bunch of disks and fans spinning while I work.
  5. Keep a reasonable budget. This setup shouldn’t break the bank. Fiber Channel SANs costing $5k+ need not apply.

There are several ways to attack this problem. Here are just a few of the options that could potentially solve this storage problem.

Option 1: Buy a Drobo

The easiest solution to this problem is to just go out and buy a Drobo. A Drobo will let you pop in some hard drives and it takes care of the rest (redundancy, etc). Unfortunately, being the easiest option comes with some drawbacks. First is cost… A 4 bay Drobo goes for around $400, and the more formidable 8 bay model starts at $1500. With 10+ drives I would need to spend $1200 at a minimum, and that is more than I want to budget. The second disadvantage is speed. I’ve heard from many people who have the 4 bay model that copying files to/from the device takes a long time (maybe only around 20MB/sec of bandwidth). If I’m paying a premium, I want fast access to the storage.

Option 2: Buy an external multi-bay eSATA enclosure

This is an appealing option, especially if you want all the storage to just be available on a single machine. It’s directly attached, so it’s fast. The enclosures can be relatively inexpensive. The main problem with this option for me was that buying an enclosure with space for 10+ disks was more costly, and having that many disks spinning at my desk would be pretty loud and distracting. Furthermore, I would like to have a storage system that is all one piece, instead of having a separate computer and storage box.

Option 3: Buy a NAS

Cheap NAS boxes are a dime-a-dozen these days. I actually already went down this route a couple of years ago when I bought a 1TB NAS made by MicroNet. The biggest drawback was that it was too slow. Transfer rates were usually only around 10MB/sec, which got to be a drag. The better NAS boxes these days offer iSCSI targets, giving you some speed benefits as well as the advantage of your other client computers seeing the disk as DAS (direct attached storage). Again though, check out some of the costs on a rackmount NAS supporting 10+ disks… they can get to be pretty expensive. This time I’m going to try another route.

Option 4: Build out a more advanced Linux file server.

This is the option I chose to go with. With my current Linux file server, I can get around 75MB/sec to a 3 disk RAID 5 using Samba. Rackmount enclosures supporting several disks are fairly inexpensive. All the storage is in one place and, if you know something about Linux, it’s pretty easy to manage.
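For reference, the Samba side of that setup doesn’t need to be anything fancy; a share definition along these lines is enough to get the storage pool mounted on the other machines (a sketch, with the path and user name as placeholders):

# smb.conf excerpt: a single writable share for the storage pool
[storage]
    path = /srv/storage
    valid users = youruser
    read only = no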

The Chassis

I’m using my current Linux file server as a base, because it’s already setup to fit my needs. I needed to find a new enclosure for the Linux server though, because my current case will only hold 4 hard drives. I recently started to rack up my network equipment, so I began looking for a rackmount enclosure (3-4U) that would hold a bunch of disks. I ended up finding the Norco RPC-4020. It’s a 4U chassis with 20 hot-swap disk trays. The disk trays all connect to a backplane, and there are two different versions of the case depending on what kind of backplane you would like to have. The first (RPC-4020) has direct SATA ports on the backplane (20 of them, one for each disk). The second (RPC-4220) has 5 mini-SAS ports on the backplane (one Mini-SAS for 4 disks), which makes cable management a little easier. I went with the cheaper (non-SAS) model in an effort to minimize my file server cost.

The Controllers

After finding this case, the next question I had to answer was what kind of hardware I would need on the host side of these 20 SATA cables. This ended up being a very difficult question to answer, because there are so many different controllers and options available. My motherboard only supports 4 hard drives, so I need controllers for 16 more disks. Disk controllers can get to be pretty expensive, especially when you start adding lots of ports to them. A 4 port SATA PCI-Express controller will run you about $100, and jumping up to 8 ports will put you in the $200-300 range for the cheapest cards. When buying a motherboard, try to find a model with decent on-board graphics. That way, you don’t have to waste a perfectly good 16 lane PCI-Express slot on a graphics card (you’ll need it for a disk controller later).

This is also the point where you will need to decide on hardware or software RAID. If you are going for hardware RAID, there’s just no way around it: you’ll have to spend a boatload of money for a RAID card with a bunch of ports on it. I’ve been using software RAID (levels 0, 1, 5, and 10) on both FreeBSD and Linux here for almost 10 years, and it’s almost always worked beautifully. Every once in a while I have run into a few hiccups, but I’ve never lost any data from it. Software RAID also has the benefit of allowing you to stick with several smaller disk controllers and combining the disks into the RAID only once you get to the OS level. So with that in mind, I chose to stick with software RAID.
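Part of what makes Linux software RAID trustworthy is how easy it is to keep an eye on.  A sketch of the day-to-day checks (the array name and email address are placeholders):

cat /proc/mdstat                  # quick overview of every md array and any rebuilds in progress
mdadm --detail /dev/md0           # per-array status: clean, degraded, or recovering
mdadm --monitor --scan --daemonise --mail=you@example.com   # email alerts when a disk drops out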

Depending on the type of computer you are converting, you might be better off buying a new motherboard with more SATA ports on it. Motherboards with 8 and even 12 SATA ports are readily available, and often are much less expensive than an equivalent RAID card. With more SATA ports on the motherboard, you have more open PCI(-Express/X) slots available for disk controllers, and more capacity overall.

Digression: SAS vs. SATA

There are many benefits to using SAS controllers over SATA controllers. I won’t give a comprehensive comparison between the two interfaces, but I will mention a couple of more important points.

1. SAS controllers work with SATA drives, but not the other way around.
So if you get a SAS controller, then you can use any combination of SAS and/or SATA drives. On the other hand, if you just have a SATA controller, you can only use SATA drives and not SAS drives.

2. SAS is much easier to manage when cabling up several disks.
Mini-SAS ports (SFF-8087) carry signals for up to 4 directly attached disks. Commonly, people will buy Mini-SAS to 4xSATA converter cables (watch which ones you buy: “forward” cables go from Mini-SAS at the controller to SATA disks, “reverse” cables go from SATA controllers/motherboards to Mini-SAS disk backplanes). These cables provide a clean way to split a single Mini-SAS port out to 4 drives. It’s even better if you get a case (like the 4220 above) that has a backplane with Mini-SAS ports already. Then for 20 disks, you just have 5 cables going from the RAID controller to the backplane.

PCI Cards

The least expensive option to adding a few disks to your system is buying a standard PCI controller. These run about $50 for a decent one that supports 4 SATA disks. The major drawback here is the speed, especially when you start adding multiple PCI disk controllers to the same bus. With a max bus bandwidth of just 133MB/sec, a few disks will quickly saturate the bus leaving you with a pretty substantial bottleneck. Still, it’s the cheapest way to go, so if you aren’t looking for top-notch performance, it’s a consideration.

1 Lane PCI-Express Cards

PCI-Express 1x cards are a mid-range price consideration. The models that support only 2 disks start out pretty inexpensive, and prices go all the way up to the couple hundred dollar range. Most of the time, these cards will not support more than 4 disks each because of bandwidth limitations on a 1 lane PCI-Express bus. A 1x slot has about double the bandwidth of an entire PCI bus (250MB/sec). The other advantage here is that when adding more than one card, each card gets that amount of bandwidth to itself, instead of sharing the bus the way multiple cards in PCI slots do.

4 Lane PCI-Express Cards

By far the most popular type of card for 4 or more disks, PCI-Express 4x cards have plenty of bandwidth (1000MB/sec) and are fairly reasonably priced. They start at around $100 for a 4 disk controller, and go on up to the $500 range. What you have to watch here is how many 4x slots your motherboard has. It doesn’t do you any good to have several 4x cards without any slots to put them in. Fortunately, most newer motherboards are coming with multiple 4x or even 8x PCI-Express slots, so if you are buying new hardware you shouldn’t have a problem.

For my file server, I ended up with a RocketRAID 2680 card. This PCI-Express 4x card has 2 Mini-SAS connectors on it, for support of up to 8 disks (without an expander, more on that later). Newegg had an amazing sale on this card, and I was able to pick it up for half price. A nice bonus is its compatibility with Macs, so if I ever change my mind I can always move the card to my Mac Pro and use it there.

Using Expanders

Expanders provide an inexpensive way to connect a large number of disks to a less expensive controller. Assuming your controller provides SAS connections (it’s better if it has a Mini-SAS port on the card), you can get a SAS expander to connect several more disks than the controller card can support out of the box. When considering a SAS expander, you should check to make sure your RAID controller will support it. Most SAS controllers support up to 128 disks using expanders, but not all do.

A typical expander might look something like the Chenbro CK12804 (available for $250-300). Even though this looks like a PCI card, it’s not. The expander is made in this form factor to make it easy for you to mount in any PCI or PCI-Express slot that is available on your computer. There are no leads to make a connection between this card and your motherboard. Because of this, the expander draws power from an extra Molex connector (hopefully you have a spare from your power supply). You simply plug a Mini-SAS cable from your controller to the expander, and then plug several Mini-SAS cables from the expander to your disks. With this particular expander, you can plug 24 hard drives into a controller that originally only supported 4. A very nice way to add more capacity without purchasing expensive controllers.

The drawback is that you are running 24 drives with the same amount of bandwidth as 4. So you are splitting 1200MB/sec among 24 disks. 50MB/sec for a single disk doesn’t seem too unreasonable, but if you are trying to squeeze as much performance out of your system as possible, this might not be the best route.

Power Supplies

When working with this many disks, the power supply you use really comes into play. Each 7200 rpm hard drive uses around 10-15 watts of power, so with 20 drives you are looking at between 200-300 watts of power just for the disks. Throw in the extra controller cards, and that adds up to a hefty power requirement. So make sure your power supply can keep up by getting one that can output at least 500 watts (600-750 watts would be better).

Results

So putting all this together, what did I end up with? Well, I upgraded my motherboard by buying an old one off a friend (they don’t sell new Socket 939 motherboards anymore); this one has 8 SATA ports on it. Then I bought a RocketRAID 2680 for another 8 disks. What about the last 4? Well, for now I’m not going to worry about that. If I need more than 16 disks in this computer, I’ll most likely get another 4 disk (1x PCI-Express) controller and use that for the remaining drive trays. What did it cost me? The chassis, “new” motherboard, RAID card, and some Mini-SAS to SATA cables came in at just over $500. Components that I’m reusing from another computer include the power supply (550 watt), processor, memory, and of course the disks I already have. Pretty reasonable considering the storage capacity (up to 32TB with current 2TB drives, and that’s without using the last 4 drive bays).

Next will come the software setup for this server, which I’ll save for another blog post.

New Disk

Having an application like Seasonality that relies upon online services requires those services to be reliable. This means any server I host has to be online as close to 100% of the time as possible. Website and email services are pretty easy to hand off to a shared hosting provider for around $10-20/month. It’s inexpensive, and you can leave the server management to the hosting provider. For most software companies, this is as far as you need to go.

This also worked okay when Seasonality was simply grabbing some general data from various sources. As soon as I began supporting international locations, I stepped out of the bounds of shared hosting. The international forecasts need to be hosted on a pretty heavy-duty server. It pegs a CPU for about an hour to generate the forecasts, and the server updates the forecasts twice a day. Furthermore, the dataset is pretty large, so a fast disk subsystem is needed.

So I have a colocated server, which I’ve talked about before. It’s worked out pretty well until earlier this week when one of the 4 disks in the RAID died. Usually, when a disk in a RAID dies, the system should remain online and continue working (as long as you aren’t using RAID 0). In this situation, the server crashed though, and I was a bit puzzled as to why this occurred.

After doing some research, I found that the server most likely crashed because of an additional partition on the failed disk—a swap partition. When setting up the server, I configured swap across all four disks, with the hope that if I ever did go into swap a little bit it would be much faster than just killing a single disk with activity. The logic seemed good at the time, but looking back that was a really bad move. In the future, I’ll stick to having swap on just a single disk (probably the same one as the / partition) to reduce the chances of a system crash by 75%.

After getting a new disk overnighted from Newegg, I replaced the failed mechanism and added it back into the RAID, so the system is back up and running again.
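For what it’s worth, on a Linux software RAID the replacement dance looks roughly like this (a sketch; the array and partition names are placeholders, and a hardware controller would handle the same steps through its own utility instead):

mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1   # mark the dead disk failed and pull it from the array
# ...swap the physical drive and partition it to match the others, then:
mdadm /dev/md0 --add /dev/sdc1                       # the array rebuilds onto the new disk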

This brings up the question of how likely something like this is to happen in the future. The server is about 2 and a half years old, so a disk failure at this age is reasonable, especially considering the substantial load on the disks in this server (blinky lights, all day long). At this point, I’m thinking of just replacing the other 3 disks. That way, I will have scheduled downtime instead of unexpected downtime. With the constantly dropping cost of storage, I’ll be able to replace the 300GB disks with 750GB models. It’s not that I actually need the extra space (the current 300s are only about half full), but I need at least 4 mechanisms to get acceptable database performance.

In the future, I will probably look toward getting hot-swappable storage. I’ve had to replace 2 disks now since I built the server, and to have the option of just sliding one disk out and replacing it with a new drive without taking the server offline is very appealing.

MicroNet G-Force MegaDisk NAS Review

If you have been following my Twitter feed, you know that I just ordered a 1TB NAS last week for the office network here. I wanted some no-fuss storage sitting on the network so I could back up my data and store some archive information there instead of burning everything to DVD. (In reality, I’ll still probably burn archive data to DVD just to have a backup.)

Earlier this month, MicroNet released the G-Force MegaDisk NAS (MDN1000). The features were good and the price was right so I bought one. It finally arrived today and I’ve been spending some time getting to know the system and performing some benchmarks.

When opening the box, the first thing that surprised me was the size of the device. It’s really not much bigger than two 3.5″ hard drives stacked on top of each other. The case is pretty sturdy, made out of aluminum, but the stand is a joke. Basically, it’s two metal pieces with rubber pads on them, and you’re supposed to put one on each side to support the case. It’s not very sturdy, and a pain to set up, so I doubt I’ll use them.

I had a few problems reaching the device on my network when I plugged it in. I had to cycle the power a couple of times before I was finally able to pick it up on the network and login to the web interface. I’m guessing future firmware updates will make the setup process easier. It’s running Linux, which is nice. The firmware version is 2.6.1, so I’m guessing that means the kernel is version 2.6 (nmap identifies it as kernel 2.6.11 – 2.6.15). Hopefully it’s only a matter of time before someone’s hacked it with ssh access. MicroNet’s website claims there is an embedded dual-core processor on board, which again sounds pretty cool. The OS requires just under 61MB of space on one of the hard drives. There are two 500GB drives in this unit. Both are Hitachi (HDT725050VLA360) models, which are SATA2 drives that run at 7200 RPM with 16MB of cache. From the web interface, it looks like the disks are mounted at /dev/hdc and /dev/hdd.

Disk management is pretty straightforward. You can select a format for each disk (ext2, ext3, fat32), and there is an option to encrypt the content on the disk. The drives are monitored via the SMART interface, and you can view the reports in detail via the web. By default, the drives come in a striped RAID format, but I was able to remove the RAID and access each disk separately (contrary to the documentation’s claims). Unfortunately, for some reason I was unable to access the second disk over NFS. It looks like you might be able to mess with the web configuration page to get around this limitation though.

Moving on to the RAID configuration, you can choose between RAID 0, RAID 1, and Linear (JBOD). Ext2 and ext3 are your filesystem options. Building a RAID 1 took a very long time (~ 4 hours), which I’m guessing is because the disks require a full sync of all 500GB of data when initializing such a partition.

So let’s bust out the benchmarks! I benchmarked by performing 2 different copies. One copy was a single 400.7MB file (LARGE FILE), and the other was a directory with 4,222 files totaling 68.7MB (SMALL FILES). All tests were performed over a gigabit Ethernet network from my 2.5GHz G5 desktop machine. Transfers were done via the Terminal with the time command, to remove any human error from the equation.
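The timed copies were along these lines (file names and mount point are placeholders for the actual test data):

time cp largefile.dmg /Volumes/nas/                 # LARGE FILE write
time cp /Volumes/nas/largefile.dmg ~/Desktop/       # LARGE FILE read
time cp -R smallfiles /Volumes/nas/                 # SMALL FILES write
time cp -R /Volumes/nas/smallfiles ~/Desktop/       # SMALL FILES read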

A note about testing Samba with SMALL FILES: I started running a write test and let it go for around 8 minutes. At that point, it was still only done copying around a quarter of the files, and the transfer rate averaged less than 20KB/sec. This was absurdly slow, so I didn’t bother waiting for the full test to go through. It’s difficult to say if this is a limitation of the NAS, Samba, Mac OS X or all of the above.

Striped RAID (Standard)    NFS                      Samba
Write LARGE FILE           1:13 (5,544 KB/sec)      0:42 (9,542 KB/sec)
Read LARGE FILE            0:42 (9,769 KB/sec)      0:35 (11,723 KB/sec)
Write SMALL FILES          3:46 (310 KB/sec)        DNF
Read SMALL FILES           0:39 (1,759 KB/sec)      DNF

Mirrored RAID              NFS                      Samba
Write LARGE FILE           1:17 (5,328 KB/sec)      0:47 (8,730 KB/sec)
Read LARGE FILE            0:40 (10,257 KB/sec)     0:41 (10,007 KB/sec)
Write SMALL FILES          3:44 (314 KB/sec)        DNF
Read SMALL FILES           0:43 (1,636 KB/sec)      DNF

Separate Disks             NFS                      Samba
Write LARGE FILE           1:13 (5,620 KB/sec)      0:43 (9,542 KB/sec)
Read LARGE FILE            0:46 (8,919 KB/sec)      0:35 (11,723 KB/sec)
Write SMALL FILES          3:11 (368 KB/sec)        DNF
Read SMALL FILES           0:42 (1,675 KB/sec)      DNF

All of these were using standard mounting, either through the Finder’s browse window, or mount -t nfs with no options on the console. I decided to try tweaking the NFS parameters to see if I could squeeze any more speed out of it. The following results are all using a striped RAID configuration…

Striped RAID (NFS)    no options             wsize=16384, rsize=16384    wsize=16384, rsize=16384, noatime, intr
Write LARGE FILE      1:13 (5,544 KB/sec)    1:00 (6,838 KB/sec)         0:59 (6,954 KB/sec)
Read LARGE FILE       0:42 (9,769 KB/sec)    0:32 (12,822 KB/sec)        0:32 (12,822 KB/sec)
Write SMALL FILES     3:46 (311 KB/sec)      3:47 (310 KB/sec)           3:09 (372 KB/sec)
Read SMALL FILES      0:39 (1,759 KB/sec)    0:42 (1,675 KB/sec)         0:40 (1,758 KB/sec)
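The mounts in the last column correspond to options along these lines (server name and export path are placeholders; exact option support varies between NFS clients):

sudo mount -t nfs -o rsize=16384,wsize=16384,noatime,intr nas.local:/share /Volumes/nas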

In summary, while this NAS isn’t necessarily the fastest out there, it’s certainly fast enough, especially after some tweaking. A RAID configuration doesn’t necessarily improve performance on this device; all of the transfer rates were about the same, regardless of format. You’ll notice slightly slower speeds for a RAID 1, but the difference is minimal. Before tweaking, Samba had a clear lead in transfer rates on large files, but it was completely unusable with smaller files. After modifying the NFS mount parameters, NFS seems to give the best of both worlds.

Update: I researched the Samba performance (or lack thereof) and found that it is not the fault of the NAS. Using a Windows XP box, writing small files went at a reasonable pace (around the same as using NFS above). Then, testing from my MacBook Pro with an OS that shall not be named, performance was similar to the Windows XP machine. I’m going to attribute this to a bug in the Samba code between version 3.0.10 on the G5 and 3.0.25 on the MacBook Pro.

© 2017 *Coder Blog
