Troubleshooting: Unresponsive virtual machines during snapshot removal in an ongoing Clumio backup task

Scenario

A virtual machine residing on an NFS datastore becomes unresponsive during the snapshot removal operation of an ongoing Clumio backup task.

You can identify this issue by correlating the timestamps of any observed VM timeouts with the timestamp of the 'Snapshot Remove Task' recorded in vCenter.
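The correlation step can be sketched in a few lines of Python. This is a minimal illustration, not a Clumio or VMware tool: it assumes you have already exported the snapshot removal tasks (VM name, start and end time) from vCenter and collected the VM timeout timestamps from your own logs, and that all timestamps are normalized to the same timezone. The record types `SnapshotRemoveTask` and `VmTimeout`, the `correlate` helper, and the 30-second tolerance are hypothetical choices for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SnapshotRemoveTask:
    """Hypothetical record for a 'Snapshot Remove Task' exported from vCenter."""
    vm_name: str
    start: datetime
    end: datetime


@dataclass
class VmTimeout:
    """Hypothetical record for an observed VM timeout (e.g. from guest or app logs)."""
    vm_name: str
    timestamp: datetime


def correlate(timeouts, tasks, tolerance=timedelta(seconds=30)):
    """Return (timeout, task) pairs where a timeout on a VM falls inside that
    VM's snapshot removal window, widened by `tolerance` on each side."""
    matches = []
    for t in timeouts:
        for task in tasks:
            if task.vm_name != t.vm_name:
                continue  # only compare events on the same VM
            if task.start - tolerance <= t.timestamp <= task.end + tolerance:
                matches.append((t, task))
    return matches


# Example with made-up data: one timeout overlaps the removal window, one does not.
tasks = [SnapshotRemoveTask("app-vm-01", datetime(2024, 1, 15, 12, 0, 0),
                            datetime(2024, 1, 15, 12, 1, 0))]
timeouts = [VmTimeout("app-vm-01", datetime(2024, 1, 15, 12, 0, 40)),
            VmTimeout("app-vm-01", datetime(2024, 1, 15, 13, 0, 0))]
for t, task in correlate(timeouts, tasks):
    print(t.vm_name, t.timestamp.isoformat())  # prints: app-vm-01 2024-01-15T12:00:40
```

A nested loop keeps the sketch easy to read; for large exports, sorting both lists by time would scale better.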


Impacted Environments

  • VMs residing on NFS datastores mounted with the NFSv3 protocol that are backed up using the HotAdd transport method
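If you keep an inventory of VMs with their datastore and backup transport settings, the impacted profile above reduces to a simple filter. The sketch below assumes a hypothetical record type `VmBackupConfig` whose field names and values are illustrative only; populate it from whatever inventory source you use.

```python
from dataclasses import dataclass


@dataclass
class VmBackupConfig:
    """Hypothetical inventory record; field names and values are illustrative."""
    vm_name: str
    datastore_type: str   # e.g. "NFS" or "VMFS"
    nfs_version: str      # e.g. "3" or "4"; empty for non-NFS datastores
    transport_mode: str   # e.g. "hotadd" or "nbd"


def is_impacted(cfg: VmBackupConfig) -> bool:
    """True when the VM matches the impacted profile: NFSv3 datastore + HotAdd transport."""
    return (
        cfg.datastore_type.upper() == "NFS"
        and cfg.nfs_version.strip() == "3"
        and cfg.transport_mode.lower() == "hotadd"
    )


inventory = [
    VmBackupConfig("vm-a", "NFS", "3", "hotadd"),
    VmBackupConfig("vm-b", "NFS", "4", "hotadd"),
    VmBackupConfig("vm-c", "NFS", "3", "nbd"),
    VmBackupConfig("vm-d", "VMFS", "", "hotadd"),
]
print([c.vm_name for c in inventory if is_impacted(c)])  # prints: ['vm-a']
```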

Cause

This behavior is caused by a known VMware issue that occurs when VMs on NFSv3 storage are backed up by backup appliances that use the HotAdd transport method to mount the virtual disks.

Further details are available in the following VMware KB article:

https://kb.vmware.com/s/article/2010953

Resolution

This issue is resolved by mounting the NFS volumes with the NFSv4 protocol.

Workaround

As a workaround, use the NBD transport method instead of HotAdd as the default for all backups in the environment.

Contact Clumio Support if you want to apply the recommended workaround.