09 Apr

ESXi 6.0 PSOD with vSGA & Nvidia hardware

Posted by Jonathan Bastin

Ever since ESXi 6.0 was released there have been reports of a 'purple screen of death' (PSOD) when using an Nvidia GPU with vSGA. We have experienced this ourselves and have read posts from others with the same problem.

We have been working with VMware for 18 months to get them to recognise that there is an issue and, ultimately, to find a resolution. Over the last month we have had a breakthrough: VMware have identified the issue and its cause. It seems that a bug fixed in 5.1u3 is back: xmap is not working correctly, which causes the whole host to 'purple screen'.

VMware are working with Nvidia to provide a full fix for this issue. From our tests, if you stay on the ESXi 6.0 branch there is no workaround other than using vGPU or running without hardware acceleration. Although ESXi 5.5 reaches EOS in September 2018, we would recommend staying on ESXi 5.5 until a fix is released.

https://kb.vmware.com/s/article/53511

The full stack trace is hard to see when not running in debug mode, but you should see a PSOD with "Exception 13" or "Exception 14". Both relate to memory issues, and it isn't until you dig deeper that you see it relates to the Nvidia card. From our own experience, you also don't see the output from the KB article unless you are running a debug version of ESXi.
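
If you have the PSOD text saved somewhere (a screenshot transcription or an extracted dump), a quick first check can be scripted. The sketch below is only an illustration, not a VMware tool: the function name `classify_psod` is made up, and it assumes nothing about the dump format beyond the two exception numbers mentioned above and a case-insensitive search for the word "nvidia". A match only suggests the host is worth a closer look; it does not confirm this particular bug.

```python
import re

# The two exception numbers named in this post. The trailing \b stops
# "Exception 130" from being counted as "Exception 13".
EXCEPTION_RE = re.compile(r"Exception (13|14)\b")


def classify_psod(text: str) -> dict:
    """Rough triage of saved PSOD text (hypothetical helper).

    Assumption: the text contains the literal phrase "Exception 13" or
    "Exception 14", and any reference to the GPU driver includes the
    word "nvidia" in some letter case.
    """
    exceptions = sorted({int(m.group(1)) for m in EXCEPTION_RE.finditer(text)})
    mentions_nvidia = "nvidia" in text.lower()
    return {
        "exceptions": exceptions,
        "mentions_nvidia": mentions_nvidia,
        # Flag only when both signs are present.
        "suspect_vsga": bool(exceptions) and mentions_nvidia,
    }


if __name__ == "__main__":
    # Made-up sample text for illustration only, not real ESXi output.
    sample = "Exception 14 in world 1234\nbacktrace mentions NVIDIA module"
    print(classify_psod(sample))
```

Without a debug build the Nvidia reference may not appear in the text at all, so an empty result here does not rule the problem out.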

UPDATE - Jun-18

After much testing, we found that 6.5 before the December 2017 updates had file locking issues, and we were not recommending 6.5. Following the update we have tested 6.5 again, and the vSGA purple screen of death is not an issue. Although VMware are releasing an update to 6.0 to fix the issue, we would now recommend upgrading because of the additional speed benefits of 6.5.