What is load average? I’ve heard all kinds of vague explanations over the years, and it bothers me to keep hearing flatly wrong descriptions of the term and of what counts as a “high” value for this number. I’ve heard things like “anything higher than 3X your number of CPUs is bad”, or “as long as it’s under 10 everything should be fine.” Not so.
Some of the misconceptions come from other UNIX and Linux OSes, which measure the value differently. So an incorrect definition doesn’t necessarily demonstrate a lack of knowledge, just some ignorance of the way Solaris does it. Linux, for example, also includes in its calculation the threads waiting for I/O, not just threads waiting for CPU.
In previous versions of Solaris (2.3-2.9), load average was a simple calculation. It was the average number of runnable and running threads. In other words, it was the number of threads running on the CPUs, plus the number of threads in the run queue, waiting for CPUs, averaged over time.
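As a rough illustration of that older definition, here is a small Python sketch. It is not Solaris code: the function name `simple_load_average` and the idea of passing in a list of (running, queued) samples are my own assumptions, and it uses a plain average rather than whatever smoothing the kernel applied.

```python
def simple_load_average(samples):
    """Average of (running + queued) threads over a set of samples.

    `samples` is a hypothetical list of (running, queued) tuples, one per
    sampling instant. This is an illustration of the definition only, not
    how the kernel collects or smooths the data.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Each sample contributes the threads on CPU plus those in the run queue.
    totals = [running + queued for running, queued in samples]
    return sum(totals) / len(totals)


# One CPU, always busy, with 0, 2, and 1 threads waiting at each instant.
print(simple_load_average([(1, 0), (1, 2), (1, 1)]))
```

With one thread always on the CPU and an average of one thread waiting, the result is 2.0.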
In Solaris 10, load average is calculated slightly differently than in previous versions.
The calculation is made by summing high-resolution user time, system time, and thread wait time, then processing this total to generate averages with exponential decay.
This calculation is slightly more comprehensive (and complex), because it takes into account CPU latency – the time taken to move a thread from the run queue onto a CPU. However, the older way of calculating this will yield almost identical results, so either definition I’d call “correct”. I still use the older definition because it is just easier to understand.
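To make the “averages with exponential decay” part concrete, here is a minimal Python sketch of an exponentially damped average. It is not the Solaris implementation: the name `decayed_average`, the 5-second sampling interval, and the 60-second period are assumptions I picked for illustration, and the input is simply a list of already-computed load samples rather than summed high-resolution times.

```python
import math


def decayed_average(samples, interval_s=5.0, period_s=60.0):
    """Exponentially damped average of a series of load samples.

    interval_s and period_s are illustrative assumptions, not values taken
    from the Solaris source. Older samples lose weight by a constant factor
    at every step, so recent activity dominates the result.
    """
    decay = math.exp(-interval_s / period_s)
    load = 0.0
    for sample in samples:
        # Keep `decay` of the old average and blend in the new sample.
        load = load * decay + sample * (1.0 - decay)
    return load


# A steady load of 2.0 held long enough converges on 2.0.
print(round(decayed_average([2.0] * 1000), 6))
```

The point of the decay is that a short burst of activity raises the number gradually and fades out gradually, instead of appearing and vanishing abruptly.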
So what is a “high” number for load average? Well, first it depends on how many CPUs you have on your system, since the calculations do not take that into account. If you have one CPU, then a load average of 1.0 would mean you are, on average, consuming exactly 100% of that one CPU over the measurement period. If your number climbs above 1.0, then you have threads in the run queue at some point, waiting for CPU time. Solaris actually handles CPU saturation very well, so this may not mean your performance will degrade; it just means your CPU is well-used.
On the other hand, if you have 8 CPUs and a load average of 32, you may be seeing a performance degradation, as your system is somewhat CPU-bound. Each CPU is, on average, 100% utilized by running threads, and there are, on average, 24 more threads in the run queue. Depending on the application, this may be acceptable – it just depends on the expected response-time or expected processing time for your application.
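The arithmetic in these two examples can be written down as a short Python sketch. The helper name `interpret_load` is hypothetical, and it follows the simpler, older definition (running threads plus queued threads), so treat the output as a rule of thumb rather than an exact measurement.

```python
def interpret_load(load_average, cpu_count):
    """Split a load average into average CPU utilization and queue depth.

    Uses the older, simpler definition: load = running + queued threads.
    Returns (utilization, queued), where utilization is a fraction from
    0.0 to 1.0 and queued is the average number of threads waiting.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    busy = min(load_average, cpu_count)            # threads actually on CPUs
    queued = max(load_average - cpu_count, 0.0)    # threads left waiting
    return busy / cpu_count, queued


print(interpret_load(1.0, 1))   # one CPU fully used, nothing queued
print(interpret_load(32, 8))    # eight CPUs fully used, 24 threads queued
```

On a live system you could feed this from Python's `os.getloadavg()` and `os.cpu_count()`, keeping in mind that the meaning of the number differs between operating systems, as noted earlier.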