One fairly common project for a meteorology student to take on after a few years of coursework is a case study poster presentation for a conference. Having finished my synoptic-scale course series this past spring, now is a good time for me to work on one. What does a case study involve? Synoptic storms are fairly short-lived, typically lasting 4-10 days. In a case study, you take a closer look at what was happening dynamically in the atmosphere during one of those storms, usually over a smaller region.
Picking a storm to look at was easy for me. Four years ago this October, I was visiting family in upstate New York when a very strong storm came through the region. Storms in October usually drop rain, but this one was strong and cold enough to drop snow, and the results were disastrous. In Buffalo, 23 inches of snow fell in 36 hours. Buffalo is used to getting that much snow in the winter, but since the leaves hadn't fallen off the trees yet, far more snow collected on the branches. Thousands of tree limbs fell under the extra weight, knocking out power for hundreds of thousands of people. Some homes didn't have power restored for over a week. When I drove around town the next day, it was like a war zone; I had to dodge tree branches and power lines even on main roads in the city.
So it was easy for me to pick this storm (I even wrote about it back then). Next we (I'm working with my professor and friend, Marty, on this project) needed to pick something about the storm to focus on. I can't just put a bunch of pictures up and say, "Hey, look at all the snow!" There has to be some content. For this case study, Marty thought it might be interesting to look at how different microphysical schemes would affect a forecast over that time period.
This was a really tough event to forecast. Meteorologists could tell it was going to be bad, but with the temperature right on the rain/snow boundary, it was difficult to figure out just how bad it would be and where it would hit the hardest. If temperatures had been a couple of degrees warmer and the event had produced rain instead of snow, it still would have been a bad storm, but it wouldn't have caused the same devastation that the snow did.
Microphysical schemes dictate how a forecast model transitions water between states. A microphysics scheme determines what conditions have to be present in the environment for water vapor to condense into liquid and form clouds, freeze into ice, or collide with other ice/water/vapor to form snowflakes. Some schemes take more properties of the atmosphere and physics into account than others, or weight variables differently when calculating these state changes. If I can figure out which scheme did the best job forecasting this event, meteorologists could potentially run a small model with that same scheme before the next storm hits, giving them a better forecast tool.
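To make that a little more concrete, here is a deliberately oversimplified sketch of the kind of decision a microphysics scheme makes at each grid point. This is not any real WRF scheme (real schemes track many more species and processes); it just checks whether a grid cell is supersaturated and, if so, treats the excess vapor as newly condensed cloud water. The saturation curve is the standard Magnus approximation.

```python
import math

# Deliberately oversimplified sketch of one microphysics decision: does the water
# vapor in a grid cell condense into cloud water?  Not any real WRF scheme.

def saturation_vapor_pressure_hpa(temp_c):
    """Saturation vapor pressure over liquid water (hPa), Magnus approximation."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))

def excess_vapor_hpa(temp_c, vapor_pressure_hpa):
    """Vapor pressure beyond saturation that this toy scheme would condense to cloud water."""
    return max(0.0, vapor_pressure_hpa - saturation_vapor_pressure_hpa(temp_c))

# Example: a cool, moist cell right around the rain/snow boundary
print(excess_vapor_hpa(temp_c=0.5, vapor_pressure_hpa=6.8))  # positive, so cloud water forms
```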
To test these schemes, I have to run the model multiple times (once with each scheme). To do that, I had to get a model installed on my computer. Models take a long time to run (NOAA has a few supercomputers for this purpose). I don't have a supercomputer, but my desktop Mac Pro (8 Xeon cores at 2.26 GHz, 12 GB RAM) is a pretty hefty machine that might just let me run the model in a reasonable amount of time. I'm using the WRF-ARW model with the EMS tools, which is commonly used to model synoptic-scale events in academia. The model is supposed to compile on Mac OS X, but after a week of hacking away at it, I still didn't have any luck. So I decided to install Linux on the Mac and run it there. First I tried Ubuntu on the bare metal. It worked, but it was surprisingly slow. Next I tried CentOS in VMware Fusion, and it actually ran about 20% faster than Ubuntu on the bare machine. The only explanation I can think of is that the libraries the model is compiled against were built with better compiler optimizations in the CentOS distribution. So not only do I get a faster model run, but I can also keep using Mac OS X while it's running. Perfect.
Once the model is installed, I have to set up a control run using parameters generally used in the most popular forecast models. There are several decisions to make at this stage. First, a good model domain needs to be specified. My domain covers a swath of 1720 × 1330 kilometers over most of the Great Lakes area, centered just west of Buffalo. For a storm this large, a 4 km grid spacing is a pretty good compromise between showing enough detail and not taking years for the model to run. For comparison, the National Weather Service uses a 12 km grid spacing over the whole US to run their NAM forecast model 4 times a day. To complete the area selection, we have to decide how many vertical levels to use in the model. Weather doesn't just happen at the earth's surface, so here I set the model to use 45 levels from the surface up through around 50,000 feet. (I say "around" because in meteorology we look at pressure levels, not height specifically, and with constantly changing pressure in the atmosphere the height of a given level can vary. The top boundary the model uses is 100 millibars.)
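If you're curious how 100 millibars relates to a height like 50,000 feet, here's a quick sketch that converts a pressure level to a height using the ICAO standard atmosphere. On any given day the real height will differ, which is exactly why the model thinks in terms of pressure rather than altitude.

```python
import math

# Rough conversion from a pressure level to a height in the ICAO standard atmosphere.
# Real heights vary with the actual state of the atmosphere on a given day.

R = 287.053   # gas constant for dry air, J/(kg*K)
G = 9.80665   # gravity, m/s^2

def standard_height_m(p_hpa):
    """Approximate height (m) of a pressure level in the standard atmosphere."""
    if p_hpa > 226.32:
        # Troposphere: temperature falls 6.5 K/km from 288.15 K at 1013.25 hPa
        return (288.15 / 0.0065) * (1.0 - (p_hpa / 1013.25) ** (R * 0.0065 / G))
    # Lower stratosphere: isothermal at 216.65 K above ~11 km (226.32 hPa)
    return 11000.0 + (R * 216.65 / G) * math.log(226.32 / p_hpa)

print(f"100 mb is roughly {standard_height_m(100.0) * 3.28084:,.0f} ft up")  # ~53,000 ft
```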
In case you didn't notice, this kind of domain is quite large in computing terms. There are a total of 5,676,000 grid points in three dimensions. When the model is running, it steps through time in 22 second increments. The model calculates what happens at each of those grid points over that 22 seconds, and then it starts all over again. Usually, the model only writes out data every hour, and I think it's pretty apparent why. If I configured the model to output all of the data at every time step, there would be more than 44 billion point forecasts saved for the 2 day forecast run. Each of these forecasts describes the weather at a particular location in the domain at a particular time, and each has around 30-50 variables (like temperature, wind speed, vorticity, etc.). If those variables were simple 32 bit floats, the model would output about 6 TB of data (yes, with a T) for a single run. Obviously that is far from reasonable, so we'll stick to outputting data every hour, which results in a 520 MB data file per hour. Even though we're writing out a lot less data, the computer still has to compute all 6 TB of it (and work through the hundreds of equations that derive it), which is quite incredible if you think about it.
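Here's the quick arithmetic behind those numbers, assuming a middle-of-the-road 35 variables per grid point (anywhere in the 30-50 range gives a total in the same ballpark):

```python
# Back-of-the-envelope check of the numbers above: grid points, 22 s time step,
# 2 day run, 32-bit variables.  35 variables per point is an assumed middle value.

grid_points = 5_676_000            # total 3-D grid points in the domain
time_step_s = 22                   # model time step in seconds
run_length_s = 2 * 24 * 3600       # 2-day forecast run
variables = 35                     # roughly 30-50 variables per point
bytes_per_value = 4                # 32-bit float

steps = run_length_s // time_step_s
point_forecasts = grid_points * steps
total_bytes = point_forecasts * variables * bytes_per_value

print(f"time steps:        {steps:,}")                   # 7,854
print(f"point forecasts:   {point_forecasts:,}")          # ~44.6 billion
print(f"if all were saved: {total_bytes / 1e12:.1f} TB")  # ~6 TB
```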
My Mac is executing the control run as I'm writing this. To give you an idea, the run will take about 12 hours to finish with VMware configured to use 8 cores (the model doesn't run as quickly when it's spread across hyperthreaded cores) and 6 GB of RAM. That leaves the remaining logical cores and the other 6 GB of RAM for everything else on the Mac, and so far I haven't noticed much of a slowdown at all, which is great.
So what's next? Well, after getting a good control run, I have to go back and run the model again for each of the microphysics schemes I want to test (there are 5-7 of them), and then dig through the data to see how the forecast changes with each scheme; a rough sketch of what that comparison could look like is below. I'm hoping one of them will clearly produce a forecast that is very close to what happened in real life. After I have some results, I will write up the content to fill a poster and take it with me to the conference at the beginning of October. The conference is in Tucson, which is great because I will have a chance to see some old friends while I'm there.
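Here's that comparison sketch: it scores each run's storm-total snowfall against an observed analysis on the same grid and ranks the schemes by error. The scheme names are typical WRF microphysics options, and the arrays are random stand-ins for data I don't have yet.

```python
import numpy as np

# Minimal sketch of scoring each microphysics run against observations.
# The fields below are random placeholders; in reality they would be storm-total
# snowfall from each run and an observed analysis interpolated to the model grid.

rng = np.random.default_rng(0)
grid_shape = (120, 160)                            # toy grid, not the real domain size
observed = rng.uniform(0, 60, size=grid_shape)     # "observed" snowfall, cm

runs = {name: rng.uniform(0, 60, size=grid_shape)  # one field per microphysics scheme
        for name in ("lin", "wsm5", "wsm6", "thompson", "morrison")}

def rmse(forecast, obs):
    """Root-mean-square error of the forecast field over the whole grid."""
    return float(np.sqrt(np.mean((forecast - obs) ** 2)))

scores = {name: rmse(field, observed) for name, field in runs.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} RMSE = {score:5.1f} cm")
```

RMSE is just one way to boil the comparison down to a single number; in practice I'd also look at where each run put the heaviest snow band and how it handled the rain/snow line.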
What does this mean for Seasonality users? Well, learning how to run this model could help me improve international Seasonality forecasts down the line. I could potentially select small areas around the globe to run more precise models once or twice a day. With the current forecast using 25-50 km grid spacing, running at 12 km spacing would greatly improve forecast accuracy (bringing it much closer to the accuracy shown at US locations). There are a lot of obstacles to overcome first. I would need to find a reasonably sized domain that wouldn't bring down my server while running. Something that finishes in 2-3 hours might be reasonable to run each day in the early morning hours (a rough feel for how the cost scales is sketched below). This would be very long term, but it's certainly something I would like to look into.
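As a rough rule of thumb for judging feasibility: over a fixed area, cutting the grid spacing multiplies the work by roughly the cube of the refinement factor, since the points double in each horizontal direction and the time step has to shrink along with the grid. This is only an estimate; the actual runtime would depend on the domain and the server.

```python
# Rough rule of thumb: for a fixed area, model cost grows roughly with the cube of
# the grid refinement factor (more points in x and y, plus a shorter time step).
# Only a planning estimate, not a benchmark.

def cost_multiplier(dx_old_km, dx_new_km):
    return (dx_old_km / dx_new_km) ** 3

print(f"25 km -> 12 km: ~{cost_multiplier(25, 12):.0f}x the work per unit area")
print(f"50 km -> 12 km: ~{cost_multiplier(50, 12):.0f}x the work per unit area")
```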
Overall it's been a long process, but it's also been a lot of fun, and I'm looking forward not only to sifting through the data, but to attending my first meteorology conference a couple of months from now.