Life, Technology, and Meteorology

Month: August 2006

Super Typhoon Ioke

Check out this incredible satellite image taken from Seasonality of Super Typhoon Ioke. Ioke is the strongest typhoon in recorded history to form in the central Pacific Ocean, with sustained winds of 160 mph and gusts up to 185 mph. That yellow dot in the middle of the storm is Wake Island, a territory of the U.S. The storm surge was supposedly going to completely engulf the island, so all 200 people who live there were evacuated and flown out to Hawaii.

It’s rare to see such a large storm with an organized center “eye” like this. Click on the image to get a larger view that shows a reference of where this is taking place. If you are wondering if Hawaii is at risk, don’t, because the storm is heading in the opposite direction. It is projected to weaken to the equivalent of a category 3 hurricane by next Tuesday, still in the middle of the Pacific.

Rio Upgrade

With the additional resource requirement I wrote about a couple of weeks ago, I ended up deciding it was time to upgrade the server here (Rio) with some additional CPU hardware. When building Rio late last year, I wanted to make sure the hardware was fairly upgradable. The easy choice at the time was to go with Athlon 64 processors, since I could start with a pretty basic 2GHz single-core chip, and have the option to upgrade to a dual-core CPU later on. Well, that time is now, and today the new processor arrived. I ended up purchasing an Athlon 64 X2 4600+ processor, which boils down to a 2.4GHz dual-core CPU. Fortunately, with AMD’s price drop just a couple of months ago, I didn’t pay much more for this processor than I did for the original.

One thing I was a bit surprised with was the difference between the new and old heat-sinks. I wasn’t expecting much of a difference between retail CPUs in the same processor line-up, but the new one is of a much higher quality. Here’s a picture…

So now I have to do some real-world benchmarking to find out just how much faster this CPU will go. I suspect the database importing times will improve dramatically, and the server will be much more usable while the update is taking place with the additional core. I’ll probably post again here with some benchmarks when I’ve had a chance to try things out.

With most of the forecast back-end work complete, I’m hoping to release a public beta sometime later this week or maybe next week. I just need to finish tweaking performance for the new CPU and smooth out some database replication issues. It sure will be a relief to have this new forecast system online.

WWDC Keynote Thoughts

Between all the great sessions here at WWDC yesterday and Buzz’s excellent blogger party last night, I’ve had just about 0 time to blog about anything that has been announced here. The typical news sites have been posting all the details on Mac OS X Leopard that Steve talked about yesterday, but I thought I would add a couple of my own comments on Leopard.

First, though I’m under NDA for a lot of the content here, I’ll just say that Leopard adds a lot of nice features for developers. I would not be surprised to see a lot of applications next year requiring Leopard. I’m sure some Tiger/Panther users will feel a bit left out, but the development time can be collapsed greatly, and these apps will be a lot more polished.

64 bit is a big buzzword around here. It is a big deal…even with 64 bit POSIX available at the UNIX layer in Tiger. That was nice, but it meant that only command-line applications that used straight POSIX libraries would have the ability to run 64 bit. As was mentioned in the keynote, Apple has extended 64 bit support all the way up to the Cocoa and Carbon layers…completing the transition to 64 bit for Mac users. I think this will allow some very high-end scientific applications to provide absolutely beautiful visualization displays without having to write a bunch of extra code to handle 64 bit data processing in a different process on the back-end. I haven’t tried building Seasonality for 64 bit yet, but I suspect that it will provide a slight speed improvement on 64 bit machines because the satellite image is highly accelerated in hardware using the Accelerate framework. 64 bit processors may be able to generate a new satellite image up to twice as fast. I’ll update my blog with performance results on this sometime in the future.

Mail changes seem to be aplenty. I haven’t loaded the Leopard preview on my MacBook Pro yet to see just how much has been improved, but already I’m impressed. The templates look to be a good idea, but I can’t see myself using them too often. I’m sure there will be a subset of Mac users that will get a kick out of that though. The notes feature strikes me as a big chunk of bloatware tacked on to Mail. If you need to take notes, there should be another place to do it outside of your inbox. Sure, people spend a lot of time in Mail, and I’m sure a lot of people take notes while reading/responding to email, but that doesn’t mean that notes should be an integrated feature. It seems that a much better solution would be to write a new system-wide notes application that would let you bring up an interface with a hotkey, type something in, and dismiss it.

Apple still hasn’t updated the Finder. I really hope this is one of the “top secret” features they aren’t releasing until the end. The Finder is something Mac users spend a lot of time using, and the amount of legacy code still in there is pretty staggering. At the very least, the Finder needs to use more threading, but really they should start from scratch and try to implement something that is more efficient. They should also revisit usability. When using the Finder on a modern system with several hundred thousand files, it takes a while to navigate to where you want to be (Note: this applies to all the current file-system-exploring applications I’ve used on any platform). Spotlight improves this situation somewhat, but it is still a pretty big problem and will only get worse as hard drive capacities keep skyrocketing the way they have in recent years.

Despite these drawbacks, Leopard as a whole is a big improvement. Time Machine and Spaces are greatly welcomed, Core Animation will be a huge win for the usability of Leopard applications and the iChat improvements seem pretty solid.

Weather Computing Resources

One thing I wasn’t expecting when entering the world of weather software with Seasonality was the sheer amount of computing resources that weather data and processing requires. You hear about the supercomputers that different government organizations purchase for weather forecasting and research, but I never really gave a second thought to it. I just assumed that those computers were running high-end computations in a completely different league than the typical online weather service. This is true to some extent, but I severely underestimated how many resources an online weather service can use itself.

Take for example the fairly simple idea of recording temperatures, wind speeds, pressure, and other conditions for locations worldwide. One commonly used system in place for this is a network of 4,000-5,000 different ICAO stations around the globe. This network of weather stations is the same one used to generate weather graphs in Seasonality. Most ICAO stations are located at airports or military bases, but some of them are at other public facilities. Each of these stations reports its conditions, on average, maybe once an hour. Now that’s a lot of data for one day, but it seems pretty manageable. However, what if the idea comes up to store this information for a long period of time, say, to allow Seasonality users to download several months of data to populate their graphs when using Seasonality for the first time? This raises the required resources to a whole new level.

I’ve been collecting data from these weather stations for the past 9 months or so. Every month, about 4.5 million new data records are added to the PostgreSQL database I set up to keep track of this data. By now, the database is several gigabytes. The problem comes when trying to access this data at a later point in time. Even singling out a single ICAO weather station for one month of data can take 30 seconds to query. And this is on a 2GHz Athlon 64 system with 3GB of RAM and a RAID storage system. What happens when thousands of users are hitting this database at the same time? Literally nothing: the database would be denying more requests than it could fulfill. I’ll continue to work on this functionality, and hope to find a good way to manage this load in the future.

A more recent example presented itself a little over a week ago when Environment Canada ceased to provide forecasts for international weather locations. Seasonality depended on that data to display forecasts outside the U.S., so users who rely on Seasonality for international forecasts are taking a big hit right now. So I attempted to find a new data source for Seasonality to use. Prepared worldwide forecasts are hard to come by, but I have found a suitable replacement for this data in a more raw format. The source I found provides GRIB files containing the gridded output from the GFS (Global Forecast System) model. This is really good data, providing more than 50 different variables (temperature, wind speed, wind direction, pressure, cloud cover, among many others) across several different layers of the atmosphere (from the surface to 1 millibar). The grid resolution is pretty good as well, down to 0.5° latitude x 0.5° longitude blocks.

With all this data I can generate a forecast for any location around the globe with reasonable accuracy, but it comes at a cost. The data is plentiful, and it takes plenty of space. The model produces a forecast every 3 hours out to 180 hours (7.5 days) into the future. That’s 60 data sets total. Each data set is around 26MB to download, which works out to about 1.5GB for a complete forecast. This is just a massive amount of data, especially when a new forecast is generated 4 times a day.
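The numbers above are easy to sanity-check. Here is a quick back-of-the-envelope sketch in Python; the constants are the approximate figures quoted in this post, and the function name is made up purely for illustration.

```python
# Rough download budget for one complete GFS forecast run.
# All constants are the approximate figures quoted in the post.
FORECAST_HOURS = 180     # the forecast extends 180 hours (7.5 days) out
STEP_HOURS = 3           # one data set every 3 hours
MB_PER_DATA_SET = 26     # approximate size of one full data set
RUNS_PER_DAY = 4         # a new forecast is generated 4 times a day


def download_budget(forecast_hours=FORECAST_HOURS, step_hours=STEP_HOURS,
                    mb_per_set=MB_PER_DATA_SET, runs_per_day=RUNS_PER_DAY):
    """Return (data sets per run, MB per run, MB per day)."""
    data_sets = forecast_hours // step_hours
    mb_per_run = data_sets * mb_per_set
    return data_sets, mb_per_run, mb_per_run * runs_per_day


if __name__ == "__main__":
    sets, per_run, per_day = download_budget()
    print(f"{sets} data sets, {per_run} MB per run, {per_day} MB per day")
```

Pulling every run in full would come to a bit over 6GB a day, which is why trimming the list of variables and levels matters so much.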

Fortunately, there is a way to pick and choose which variables and which atmospheric levels you would like to download. At the moment, I’ve narrowed down the data set I would require to about 200MB per forecast. Great, that’s doable…I adapted some Perl code I found with a free license on the internet to fit my needs, and I have a cron job going to download the data I want. Depending on the server speed, the download takes an hour or two.

Next, I need to convert the data from the GRIB binary format into something I can throw into the PostgreSQL database. The database should have a row for each block in the longitude/latitude grid, for each 3-hour time period the forecast is generated. With 0.5° longitude/latitude resolution, that’s just under 260k rows in the database for every 3-hour data set, or about 15.6 million rows for the entire forecast. My Athlon server takes about 2 hours to parse all the GRIB files and throw them into the database, and this is after extensive optimization on my part.
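For the curious, the row counts fall straight out of the grid geometry. The sketch below assumes the grid includes a row at each pole (latitude -90 through +90 inclusive) and that longitude wraps around, so 360° is the same column as 0°. That assumption is mine, but it lands just under 260k rows per data set.

```python
# Row count for a global latitude/longitude grid.
# Assumption: latitude runs from -90 to +90 inclusive (both poles get a row),
# while longitude wraps, so 360 degrees shares a column with 0 degrees.

def grid_rows(resolution_deg=0.5):
    """Number of grid points (database rows) in one data set."""
    lat_points = int(round(180 / resolution_deg)) + 1   # 361 at 0.5 degrees
    lon_points = int(round(360 / resolution_deg))       # 720 at 0.5 degrees
    return lat_points * lon_points


def forecast_rows(resolution_deg=0.5, data_sets=60):
    """Rows needed to hold every 3-hour data set of one forecast."""
    return grid_rows(resolution_deg) * data_sets


if __name__ == "__main__":
    print(grid_rows())       # rows per 3-hour data set
    print(forecast_rows())   # rows for all 60 data sets
```

That gives 259,920 rows per data set and 15,595,200 rows for the full 60-step forecast, right in line with the figures quoted above.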

The data doesn’t do any good if I can’t get to it easily. I need to index the data to help speed up querying. It makes sense to set up indexes based on the longitude and latitude, since this is how I will be querying the data. The indexes I’ve got so far take about 30 minutes to an hour to generate.
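To illustrate the indexing idea, here is a toy sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table layout, column names, and index name are all hypothetical and not the real Seasonality schema. The point is just that a composite index on (lat, lon) lets the database jump straight to one grid block instead of scanning every row.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory forecast table with a (lat, lon) index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE forecast ("
        " lat REAL NOT NULL, lon REAL NOT NULL,"
        " valid_hour INTEGER NOT NULL, temperature REAL)"
    )
    # A small patch of 0.5-degree grid blocks, three forecast hours each.
    rows = [
        (lat / 2, lon / 2, hour, 20.0)
        for lat in range(80, 90)
        for lon in range(0, 10)
        for hour in (0, 3, 6)
    ]
    conn.executemany("INSERT INTO forecast VALUES (?, ?, ?, ?)", rows)
    # Index on the columns the queries filter by.
    conn.execute("CREATE INDEX forecast_lat_lon ON forecast (lat, lon)")
    return conn


QUERY = (
    "SELECT valid_hour, temperature FROM forecast"
    " WHERE lat = ? AND lon = ? ORDER BY valid_hour"
)


def forecast_for(conn, lat, lon):
    """Fetch every forecast hour for a single grid block."""
    return conn.execute(QUERY, (lat, lon)).fetchall()


def query_plan(conn, lat, lon):
    """Ask SQLite how it intends to run the lookup."""
    return conn.execute("EXPLAIN QUERY PLAN " + QUERY, (lat, lon)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(forecast_for(db, 42.5, 3.0))
    for step in query_plan(db, 42.5, 3.0):
        print(step[-1])   # the plan description should mention the index
```

PostgreSQL has its own EXPLAIN command for the same kind of check, though the output looks different.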

When all is said and done, it takes around 4-5 hours to get the forecast back-end ready for Seasonality to query. The data source I’m hitting provides new forecasts every 6 hours, so it really doesn’t make sense for me to refresh that often when it takes so much time to process the data. Maybe updating once or twice a day will be reasonable in the end.

So what about query speed when Seasonality users want to grab a forecast from the server? I think it’s fair for a query to take about 5 seconds to return data…and right now that looks to be doable with my current hardware. With adequate caching, I can probably drop the response time for frequently-used locations even lower. Then there’s redundancy to think about. I want to make sure if something wonky happens with my server or network connection, that there will be a backup somewhere else. I can’t put the CPU load required for parsing the data onto my hosting provider’s servers, but I can use their bandwidth after the data is in a format that can be queried. I’m hoping to replicate the data after it’s been processed and throw it on the hosting server.
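One way caching could work, sketched in Python: snap each requested coordinate to its nearest grid block first, so that nearby locations share a single cache entry. Everything here is hypothetical, including the function names and the placeholder lookup. It also assumes longitudes are stored in the 0-360 range rather than -180 to 180.

```python
from functools import lru_cache

GRID_STEP = 0.5  # grid resolution in degrees


def snap_to_grid(lat, lon, step=GRID_STEP):
    """Round a coordinate to the nearest grid block.

    Assumption: longitudes are stored as 0-360, so western (negative)
    longitudes are wrapped around.
    """
    snapped_lat = round(lat / step) * step
    snapped_lon = (round(lon / step) * step) % 360.0
    return snapped_lat, snapped_lon


def slow_lookup(lat, lon):
    """Hypothetical stand-in for the real (multi-second) database query."""
    return (lat, lon, 20.0)   # (lat, lon, temperature)


@lru_cache(maxsize=1024)
def cached_lookup(lat, lon):
    """Remember recent results, keyed by grid block."""
    return slow_lookup(lat, lon)


def forecast(lat, lon):
    """Look up the forecast for the grid block nearest to (lat, lon)."""
    return cached_lookup(*snap_to_grid(lat, lon))
```

Two users a few miles apart land on the same block, so only the first request pays for the database query. A real version would also need to throw the cache away whenever a new forecast run is loaded.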

All of this for the seemingly simple feature of a forecast. There’s still a lot of work I need to do before this new forecast will be ready, but in the end I think it will be worth it. Seasonality will no longer have a dependency on another weather outfit for forecast data. The GRIB data format is a standard, so it’s easy to add redundant sources for this data. I will also be able to update the code to provide more accurate forecasts without having to release a Seasonality update, because that is all server-side. Initial forecasts will be fairly simple, but I expect that over time I’ll be able to improve forecast detail and accuracy by making use of more data variables.

Some Excellent Blog Postings

There has been a flood of excellent blog postings these past couple of days. Most are related to WWDC, but I threw in a few others as well. Here are the ones that caught my eye…

Brent’s WWDC talk: Brent has posted several entries on his blog, requesting tabbed interface updates (I second this), the open sourcing of Cocoa, UI requests for Leopard, and other random WWDC speculation.

Gus’ WWDC Predictions: Gus predicts resolution independence, VMware for Mac goodness, and other changes in Leopard.

Happy WWDC Guesses: Luis de la Rosa over at HappyApps has a few guesses up his sleeve as well.

Mike McCracken Cards: Mike McCracken won’t be able to make it to WWDC this year, but he’ll be there in spirit thanks to these hilarious cards he put together and posted on his blog today.

My imaginary friend hates your imaginary friend: Logtar talks about religion with respect to the latest political events unfolding in the Middle East. Very insightful.

The Price Is Wrong: Daniel Jalkut at Red Sweater Software talks about pricing shareware applications. Very good read.

Why Apple's virtualization technology in Leopard might not be what you expect…

Since the release of Boot Camp back in April, the rumors have been flying on the future of running Windows on a Mac. When Boot Camp was released, I couldn’t wait to try it, and I wasn’t disappointed with the results. There were some slight drawbacks to installing Boot Camp on my MacBook Pro, but they were mostly hardware related (not having a second mouse button, or a lack of a delete key to control-alt-delete). Fortunately there are software fixes for most of these and I’m now able to use Windows on my MacBook Pro without connecting any additional hardware.

A lot of people believe that Apple will take the next step with virtualization in Leopard, and I tend to agree, but I don’t think the next step Apple takes will be the ability to run Windows apps under Mac OS X natively, or even a Parallels type of virtualization where Windows will be run in a window. Let’s discuss the options, shall we?

Option #1: Boot Camp is included in Leopard verbatim, users can boot into either Windows or Mac OS X.

This is the easiest path for Apple to follow, because most of the work is already done. Boot Camp works, and it works well. All that would be needed is continued driver support in Windows and some tighter integration between Boot Camp and Mac OS X. I don’t expect Apple to take this path however. Apple isn’t about doing things the easy way. The Apple mentality is to do stuff the right way, and I don’t think this is it. Granted Boot Camp was just released, and a lot of work would have to be done before any of the options below would be realized. Apple might just run out of time before the Leopard release and be forced to integrate Boot Camp for the time being. If this happens though, I think they’ll be working overtime to improve virtualization in Leopard++.

Option #2: Apple licenses Parallels/VMware, users run Windows within a window.

I haven’t run Parallels on my MBP yet, but I do run VMware Server on Linux and it is a very good solution for running multiple OSes at the same time on a single machine. The drawback here, of course, is that you lose some processing power to the virtualization layer. Usually this is only 5-10% per virtual machine, but if you are running 3 or 4 OS instances, that could use up to 40% of a CPU just in overhead. In addition, a lot of Mac users, myself included, have one reason to run Windows on their Mac: games. Thus far, neither Parallels nor VMware has been able to support OpenGL/DirectX in hardware, throwing the option of gaming out the window. And once Vista arrives, hardware graphics acceleration support will be that much more important. Apple is in a pretty good position to add accelerated graphics because they have control over the hardware. They know which GPUs they need to write drivers for, and I think that it’s certainly possible for Apple to go this route.

Option #3: Apple builds upon the Wine project, users run Windows applications natively on Mac OS X.

I think a lot of users would like to see Apple go this way with virtualization in Leopard. How cool would it be to double-click on a Windows application and have it launch on your Mac? There are several problems with this. First, consider the precedent. The Wine project has been under development for over 10 years, since 1993. Thus far, Wine only supports a small subset of Windows applications, and oftentimes the user has to jump through hoops just to get a Windows application working correctly under Wine. This kind of usability is very un-Apple-like. Apple, of course, has some very talented engineers in its employ along with something the Wine project does not have–a development relationship with Microsoft. Even so, I would find it hard to imagine Apple pulling this off in only a year or two, given how long it has taken the Wine project to get this far.

Furthermore, this option opens a can of worms when it comes to security. When you run Windows in a separate virtual machine, there is some comfort involved. Most likely, even if you were to get a Windows virus (or 100), your Mac OS X files would be safe. With native execution of Windows applications, you no longer have this comfort; Windows apps would have full access to all your Mac files. Apple could possibly build in some form of jail, so Windows would only have access to certain directories on the filesystem. However, I think this would be contrary to the point of implementing virtualization in this native manner. You want the tight Windows integration into Mac OS X, so files in your home directory can be used with applications built for either OS. With such tight integration, you have to compromise security.

Finally, this option is an interface nightmare, comparable to running X11 on Mac OS X. While I love having the ability to run X11 applications on my Mac, the user experience it provides is mediocre. For one, each X application doesn’t really appear to be completely separate like a standard Mac OS X application. Instead, all X11 apps run under a single application instance. Native Windows virtualization would most likely operate in a similar manner, where an application would load to provide a framework for other Windows applications to run. Another interface problem is that Windows applications often have their menus associated with individual windows. This is contrary to Mac OS X applications, which have a menu bar for the entire application. Mixing these two interface paradigms is messy at best.

In short, I don’t think this is a very good option and this is probably the least-likely route for Apple to follow.

Option #4: Apple releases Boot Camp 2.0 with hardware partitioning, users run any number of OS instances concurrently on their Mac.

This is the option I think Apple should follow. Hardware partitioning has been around for quite a few years (see IBM’s LPAR, for example), but until recently this technology has been restricted to high-end servers and mainframes. With the upcoming releases of 4 and even 8 core Intel processors, hardware partitioning is becoming a much more attractive option in my opinion.

If you aren’t familiar with hardware partitioning, basically it’s a method to split the hardware resources of a machine between different OS instances. Say you have a 4 processor box with 8GB of memory. With hardware partitioning, you can define one partition to use 2 processors and 2GB of memory, while the other uses the remaining 2 processors and 6GB of memory. Likewise, you could have 4 OSes, each with its own dedicated CPU and 2GB of memory. There is often a limit to how much you can split the resources available. For instance, it can be difficult to write software to share a CPU between two OSes, so it makes sense that you can only have as many OS instances running as you have CPUs available. Likewise you can split memory or disks in the same manner. Each partition can run any OS the hardware will support, so in the case of Mac hardware, you could run a partition for Mac OS X, one for Windows, another for Linux, FreeBSD, Solaris, etc…
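To make the bookkeeping concrete, here is a small Python sketch of the kind of check a partition manager would have to do before booting anything. It is purely illustrative: the class and function names are invented, and it assumes the simple model described above, where each OS instance gets whole, dedicated CPUs and its own slice of memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    cpus: int        # whole CPUs dedicated to this OS instance
    memory_gb: int   # memory dedicated to this OS instance


def check_layout(partitions, total_cpus, total_memory_gb):
    """Return a list of problems; an empty list means the layout fits."""
    problems = []
    for p in partitions:
        if p.cpus < 1:
            problems.append(f"{p.name}: needs at least one dedicated CPU")
        if p.memory_gb < 1:
            problems.append(f"{p.name}: needs some dedicated memory")
    used_cpus = sum(p.cpus for p in partitions)
    used_memory = sum(p.memory_gb for p in partitions)
    if used_cpus > total_cpus:
        problems.append(f"{used_cpus} CPUs requested, {total_cpus} available")
    if used_memory > total_memory_gb:
        problems.append(
            f"{used_memory} GB requested, {total_memory_gb} GB available"
        )
    return problems


if __name__ == "__main__":
    # The two-partition split from the example: 4 CPUs and 8 GB in total.
    layout = [Partition("Mac OS X", 2, 2), Partition("Windows", 2, 6)]
    print(check_layout(layout, total_cpus=4, total_memory_gb=8))
```

With one dedicated CPU per instance, a fifth OS on a 4 processor box fails the check, which is exactly the limit described above.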

In more recent years, VMware has been developing a product called VMware ESX Server. This software runs directly on the hardware, an OS of its own if you want to think of it that way. Then virtual OS instances sit on top of VMware ESX. VMware has even tackled the problem of sharing CPUs between virtual machines.

I envision Apple’s Boot Camp being able to do the same thing. Boot Camp would become a miniature operating system in its own right, and then load up the OS instances you have configured. Of course, Apple would come up with an easy way of configuring your virtual machines, and an easy way to switch between them. I envision the OS switching to be a lot like Fast User Switching…maybe they’ll even call it Fast OS Switching. 🙂 Also, since all of your virtual machines are running at the same time, it’s not a problem to use VNC to view your other running OSes in separate windows on Leopard.

We’ll find out Monday morning

All our questions will be answered come Steve Jobs’ keynote Monday morning at WWDC. I’ll be taking a flight out to San Francisco tomorrow in order to attend. Virtualization, of course, is just one small topic that may be brought up at WWDC this year. I’m expecting quite a few announcements coming from Cupertino, and I think it’s going to be a great week. If you are going to be there, drop me a line (mike at gaucho soft dot com) and we can meet up.

Another one bites the dust…

I’m very thankful for RAID 5 at the moment…gotta love that parity thing. I checked my server status emails this morning, only to find these lines in /proc/mdstat:

md1 : active raid5 sda4[0] sdc4[2] sdb4[3](F)
      576283520 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]

Hmm, that little F doesn’t look too promising, and one of the U’s is missing. So I look into this a bit further and find:

root@rio:/rio# mdadm --detail /dev/md1
     Raid Level : raid5
     Array Size : 576283520 (549.59 GiB 590.11 GB)
    Device Size : 288141760 (274.79 GiB 295.06 GB)
   Raid Devices : 3
  Total Devices : 3
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1

    Number   Major   Minor   RaidDevice State
       0       8        4        0      active sync   /dev/sda4
       1       0        0        -      removed
       2       8       36        2      active sync   /dev/sdc4
       3       8       20        -      faulty        /dev/sdb4
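Rather than eyeballing status emails, a degraded array like this one can be spotted with a few lines of Python. This is a minimal sketch, not the script that sends my status emails. It only handles the /proc/mdstat layout shown in this post, and the function name and returned dictionary shape are made up for illustration.

```python
import re

# The two lines from /proc/mdstat quoted earlier in this post.
SAMPLE = """\
md1 : active raid5 sda4[0] sdc4[2] sdb4[3](F)
      576283520 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
"""


def degraded_arrays(mdstat_text):
    """Return details for every md array that is missing a member.

    An array counts as unhealthy if any member is flagged (F) or if the
    [expected/active] counter shows fewer active devices than expected.
    """
    results = {}
    current, failed = None, []
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if header:
            current = header.group(1)
            # Members look like "sdb4[3](F)" when they have failed.
            failed = re.findall(r"(\w+)\[\d+\]\(F\)", header.group(2))
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if failed or active < expected:
                results[current] = {
                    "failed": failed,
                    "expected": expected,
                    "active": active,
                    "status": counts.group(3),
                }
            current = None
    return results


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live system you would feed it the contents of /proc/mdstat instead of the sample string.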

Sure enough, after checking /var/log/messages, last night at around 8pm a disk failed…

kernel: ata2: status=0x25 { DeviceFault CorrectedError Error }
kernel: SCSI error :  return code = 0x8000002
kernel: sdb: Current: sense key: Hardware Error
kernel:     Additional sense: No additional sense information
kernel: end_request: I/O error, dev sdb, sector 18912489
kernel: RAID5 conf printout:
kernel:  --- rd:3 wd:2 fd:1
kernel:  disk 0, o:1, dev:sda4
kernel:  disk 1, o:0, dev:sdb4
kernel:  disk 2, o:1, dev:sdc4
kernel: RAID5 conf printout:
kernel:  --- rd:3 wd:2 fd:1
kernel:  disk 0, o:1, dev:sda4
kernel:  disk 2, o:1, dev:sdc4

I’m a bit surprised because the drives I used for this RAID are manufactured by Seagate, which I’ve had luck with in the past. Fortunately, Seagate offers a 5 year warranty for all of its drives, so this one is going back to the manufacturer to be replaced. In the meantime, I ordered another disk with overnight shipping–I need to take care of this before leaving for WWDC on Saturday. 🙂

Update (8/4): The replacement disk arrived yesterday afternoon and I was able to re-add partitions to the RAID volumes using mdadm <raid volume device> --add <disk device>. Rebuilding went pretty quick–/usr finished rebuilding in less than a minute and the larger volume took just over an hour and a half:

Personalities : [raid5]
md1 : active raid5 sdb4[3] sda4[0] sdc4[2]
      576283520 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
      [>....................]  recovery =  2.7% (8006784/288141760)
      finish=98.3min speed=47479K/sec
md0 : active raid5 sdb1[1] sda1[0] sdc1[2]
      5863552 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
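The numbers in that output hang together nicely, and a couple of lines of Python show how. This is just arithmetic on the figures above, with made-up function names, and it assumes the block counts are in the same 1K units as the K/sec rebuild speed.

```python
def raid5_usable_blocks(device_blocks, raid_devices):
    """RAID 5 gives up one device's worth of space to parity."""
    return device_blocks * (raid_devices - 1)


def minutes_remaining(done_blocks, total_blocks, speed_k_per_sec):
    """Estimate rebuild time left at the current speed."""
    return (total_blocks - done_blocks) / speed_k_per_sec / 60


if __name__ == "__main__":
    # Three 288141760-block partitions make up md1.
    print(raid5_usable_blocks(288141760, 3))
    # 8006784 of 288141760 blocks rebuilt so far, at 47479K/sec.
    print(round(minutes_remaining(8006784, 288141760, 47479), 1))
```

The first line comes out to 576283520 blocks, matching the array size, and the second works out to about 98.3 minutes, matching the finish estimate and the hour and a half or so the rebuild actually took.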

© 2022 *Coder Blog
