DBCC LOGINFO on SQL 2005 causes “non-yielding processes on scheduler” errors and server timeouts

If you are using DBCC LOGINFO in the production environment and you are lucky enough to still have an instance of SQL Server 2005 which, suddenly out of the blue, started throwing “non-yielding processes on scheduler” errors into the error log and becomes unresponsive, then you should know that there can be a connection between the two.
It is most probably a bug in the DBCC, reproducible on SQL 2005 instances only.
It’s been fixed for SQL 2008 and up.
Maybe this is one of the reasons why this DBCC is undocumented…

I have the following procedure which loops through all the databases and checks the number of VLFs (virtual log files - logical sections in the transaction log file allocated each time a log growth occurs).
It’s using an undocumented, but very useful for this purpose, DBCC LOGINFO. Unfortunately, there was a database snapshot on the server and the procedure got executed there as well as on other databases (not on purpose, of course, just used sp_msforeachdb procedure without too much thinking):



ALTER PROCEDURE [ResourceMonitoring].[CheckVLFnumber]

AS

BEGIN

CREATE TABLE #vlf(field int, filesize bigint,startoffset bigint,fseqno bigint,

status int,parity int, createlsn varchar(50))

CREATE TABLE #countVLF(dbname varchar(256), VLFcount int, LogSize int, LogGrowth int)

EXEC sp_msforeachdb '

TRUNCATE TABLE #vlf;

DECLARE @LogSize int, @LogGrowth int

SELECT @LogSize = (size * 8.) / 1024.,

@LogGrowth = (growth * 8.) / 1024.

FROM ?.dbo.sysfiles

INSERT INTO #vlf

EXEC(''DBCC LOGINFO (''''?'''') WITH NO_INFOMSGS'');

INSERT INTO #countVLF

SELECT ''?'',count(*) ,@LogSize,@LogGrowth

FROM #vlf; '

IF EXISTS (SELECT * FROM #countVLF WHERE VLFcount > 300)

BEGIN

SELECT *

FROM #countVLF

WHERE VLFcount > 300

ORDER BY VLFcount desc

END

ELSE BEGIN

SELECT 'None'[Output]

END

END



According to my investigation, when you issue DBCC LOGINFO against a database snapshot (which, again, can happen by mistake only because there is no database log file in the database snapshot anyway) the session gets stuck with ‘spinloop’ status in sys.dm_exec_requests and never finishes. As soon as any DDL/update/insert/delete statement is issued against the snapshot source database, you get nice mini-dump together with the following error in the error log (143 is the session_id issuing DBCC)

Process 143:0:0 (0x684) Worker 0x000000008086A1C0 appears to be non-yielding on Scheduler 0. Thread creation time: 12963945475631. Approx Thread CPU Used: kernel 0 ms, user 359 ms. Process Utilization 0%. System Idle 99%. Interval: 70015 ms.

Your server then starts to suffer from log waits on ALL databases:










At this point in time mysterious system ‘TASK MANAGER’ processes then start coming into the picture, taking exclusive lock on system tables sysdbreg and sysobjvalues. Take a look below ( click to enlarge) how the locking picture looks like:




Interesting to know that those system processes are used during flushing changes into the log file, WRITELOG waittype indicates a worker thread that is waiting for the LogWriter to flush log blocks).



All locks stay in place and are never released. All other statements against the source database are suspended. My favorite part here is, when I tried to restart the server, SQL Agent wouldn’t go down properly because it too was impacted by metadata locking and without stopping it the SQL Server Service will not go down as well. Well, no failover. Restart.




In my opinion they shouldn’t allow using this procedure on a database snapshot In the first place. Well, they do it with DBCC LOG





Adding a filter, that ignores database snapshots, can avoid this problem and many other potential issues if you like using sp_msforeachdb for a database loop:

EXEC sp_msforeachdb '

IF EXISTS ( SELECT * FROM sys.databases

WHERE source_database_id is null and name = ''?'' )

BEGIN

TRUNCATE TABLE #vlf;

DECLARE @LogSize int, @LogGrowth int

SELECT @LogSize = (size * 8.) / 1024.,

@LogGrowth = (growth * 8.) / 1024.

FROM ?.dbo.sysfiles

INSERT INTO #vlf

EXEC(''DBCC LOGINFO (''''?'''') WITH NO_INFOMSGS'');

INSERT INTO #countVLF

SELECT ''?'',count(*) ,@LogSize,@LogGrowth

FROM #vlf;

END

'



During server monitoring we usually filter out system processes and look at user connections only. It is important to know that TASK MANAGER system processes are taking active part in user processes when they execute distributed queries, extended stored procedures and CLR routines and also participate in user transactions, taking active role in the transaction logging process.

Comments

Popular posts from this blog

SQL Awesomeness: Finding a QUALIFY query clause in the depths of the database ocean

Look back and realize how far you came

The Greatest Reasons to use or not to use a Centralized Data Access Architecture