Today I spent some time optimizing the Particle Mode simulation code in Seasonality Core. While taking some measurements, I discovered that quite a bit of time was being spent in GCD code starting new tasks. I use dispatch_apply to iterate through the particles and run the position and color calculations for the next frame. In the tests below, I was simulating approximately 200,000 particles on the Macs and 11,000 particles on the iPad.
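Roughly speaking, the per-particle version worked like this. This is only a minimal sketch; the Particle struct and the update_particle and simulate_frame names are hypothetical stand-ins for the actual Seasonality Core code:

```c
#include <dispatch/dispatch.h>
#include <stddef.h>

// Hypothetical particle state; stand-in for the real data structure.
typedef struct {
    float x, y;      // position
    float r, g, b;   // color
} Particle;

// Placeholder for the real per-particle position/color math.
static void update_particle(Particle *p, float dt) {
    p->x += dt;  // ...position and color calculations for the next frame...
}

// One dispatch_apply iteration per particle: simple, but GCD pays the
// block-scheduling overhead once for every particle, every frame.
static void simulate_frame(Particle *particles, size_t count, float dt)
{
    dispatch_queue_t queue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_apply(count, queue, ^(size_t i) {
        update_particle(&particles[i], dt);
    });
}
```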

I decided to try breaking the work into fewer blocks, running dispatch_apply over groups of around 50 particles instead of invoking it once per particle. After making this change, the simulation ran in up to 59% less CPU time than before.
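The chunked variant might look something like this, again only a sketch reusing the hypothetical names from above:

```c
// Chunked variant: one block per group of ~50 particles, so each
// dispatch amortizes its scheduling cost over the whole group.
enum { kChunkSize = 50 };   // group size from the experiment above

static void simulate_frame_chunked(Particle *particles, size_t count, float dt)
{
    dispatch_queue_t queue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    size_t groups = (count + kChunkSize - 1) / kChunkSize;  // round up
    dispatch_apply(groups, queue, ^(size_t g) {
        size_t start = g * kChunkSize;
        size_t end   = start + kChunkSize;
        if (end > count)
            end = count;                // last group may be short
        for (size_t i = start; i < end; i++)
            update_particle(&particles[i], dt);
    });
}
```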

Here are some informal numbers, just from watching Activity Monitor and roughly estimating:

CPU Usage

Device                                    Before   After   Time Savings
Mac Pro (2009, 8-core 2.26GHz Xeon)         390%    160%            59%
Retina MBP (2012, quad-core 2.6GHz i7)      110%     90%            18%
MacBook Air (2011, dual-core 1.8GHz i7)     130%    110%            15%

iPad 3 (fewer particles)                     85%     85%             0%

As you can see, the benefit of the new code on the Mac Pro is substantial. With my earlier code, I had wondered why the simulation took so many more resources on the Mac Pro than on the laptops. Clearly the per-block dispatch overhead was a lot higher on the older Xeon CPU. Chunking the work brings the Mac Pro’s processing times closer to what the other, more modern processors can accomplish.

Perhaps an even more surprising result is the lack of a speedup on the iPad: across both runs, the two versions averaged about the same CPU usage. With a more formal way to measure the processing time, a small difference might become apparent, but overall the difference was minimal. I’m guessing that Apple has built logic into the A-series CPUs that makes context switching nearly free. Makes you wonder how much quicker something like this would run if Apple built their own desktop-class CPUs.