Our client has hardware NUMA on their SQL Server box. There are 2 NUMA nodes and each contains 8 CPUs. At the beginning, things will run fine. But after a few days, we start to notice the CPUs belongine to one NUMA node (parent_id 1) are under more pressure
(higher runnable_tasks_count). And the performance of our system will start to suffer.
This is a SQL Server 2005 64 bit environment. It is running SP2. The server has hardware NUMA. No affinity mask is set.It is also a dedicated SQL server box. There is no anti-virus software or OS job running on the box.
It is basically an OLTP system. It has very high traffic during certain hours in a day. What we noticed was that things never really went wrong in the middle of those busy hours. Instead, the system would all of a sudden go into problems during quiet hours.
It then occurred to us we had some jobs scheduled during the quiet hours to move data around or delete obsolete data.
So our guess was that this problem could be caused by those jobs filling up the buffer pool or cache of one of the NUMA nodes (we have MAXDOP set o 1). That CPU then became slower when handling normal tasks because it has to do extra memory processing.
To confirm our guess, we have been flushing buffer and cache after all the jobs are complete in the past 2 weeks. So far the system seems to function properly.
View Complete Post