Azure Site Recovery Error 153006
Updated: Apr 4, 2019
Azure Site Recovery (ASR) usually provides reasonably detailed error descriptions and remediation guidance in the Azure portal. Recently, however, I encountered an error where the provided guidance had nothing to do with the actual root cause.
So what does this error mean? According to the Azure portal, error 153006 indicates that an app-consistent snapshot failed because of potential network issues.
Going by the error message in the portal, I tried all of the recommended troubleshooting steps, but my Azure VMs had no outbound filtering, firewalls, or NSGs in place that would prevent them from talking to ASR. In addition, because the VMs are hosted in Azure, network latency was entirely outside my control. This left me at a standstill, especially since other VMs in the same VNet and subnet, with exactly the same outbound network configuration, were not having any issues at all.
So the underlying issue was not the networking problem the error points to, but rather an issue with the ASR agent and the Volume Shadow Copy Service (VSS).
A failed "app-consistent" recovery point means that VSS could not get applications to flush their application-specific in-memory data to disk so that it could be included in the snapshot.
VSS does this through VSS writers, which are application-specific and registered by the applications themselves. Windows also includes several built-in VSS writers that enable VSS snapshots of operating system components.
The Azure VMs with this issue were an Exchange Server 2016 hybrid server and a server running an application that uses SQL Server, both of which have their own VSS writers. Sure enough, when I ran vssadmin list writers from an elevated command prompt, it reported writers in a failed or error state (essentially anything other than "Stable").
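On a server with many writers, the vssadmin output can be filtered with a short script instead of read by eye. The Python sketch below is an illustration, not part of ASR or Windows: it assumes English-language vssadmin output in the usual "Writer name / State / Last error" layout, and the helper names (parse_vssadmin_writers, unstable_writers, read_vssadmin_writers) are hypothetical.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WriterStatus:
    name: str
    state_code: int  # number in brackets, e.g. 1 for "[1] Stable"
    state_text: str  # text after the brackets, e.g. "Stable" or "Failed"
    last_error: str  # e.g. "No error" or "Retryable error"


# Assumption: English vssadmin output; localized Windows prints different labels.
_NAME_RE = re.compile(r"^\s*Writer name:\s*'(?P<name>.*)'\s*$")
_STATE_RE = re.compile(r"^\s*State:\s*\[(?P<code>\d+)\]\s*(?P<text>.*?)\s*$")
_ERROR_RE = re.compile(r"^\s*Last error:\s*(?P<error>.*?)\s*$")


def parse_vssadmin_writers(output: str) -> List[WriterStatus]:
    """Turn `vssadmin list writers` text into one WriterStatus per writer."""
    writers: List[WriterStatus] = []
    name: Optional[str] = None
    code: Optional[int] = None
    text = ""
    for line in output.splitlines():
        m = _NAME_RE.match(line)
        if m:
            # A new writer block starts; forget any half-read state.
            name, code, text = m.group("name"), None, ""
            continue
        m = _STATE_RE.match(line)
        if m and name is not None:
            code, text = int(m.group("code")), m.group("text")
            continue
        m = _ERROR_RE.match(line)
        if m and name is not None and code is not None:
            # "Last error" is the last field we need, so the record is complete.
            writers.append(WriterStatus(name, code, text, m.group("error")))
            name, code, text = None, None, ""
    return writers


def unstable_writers(writers: List[WriterStatus]) -> List[WriterStatus]:
    """Return every writer whose state is anything other than [1] Stable."""
    return [w for w in writers if w.state_code != 1]


def read_vssadmin_writers() -> List[WriterStatus]:
    """Run `vssadmin list writers` (Windows only, elevated prompt) and parse it."""
    result = subprocess.run(
        ["vssadmin", "list", "writers"], capture_output=True, text=True, check=True
    )
    return parse_vssadmin_writers(result.stdout)
```

On the affected VM, looping over unstable_writers(read_vssadmin_writers()) and printing each writer's name, state_text and last_error gives the same "not Stable" list described above in one place.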
For each VSS writer that was not in the stable state, I restarted the service that hosts it, which brought the writer back. After doing this, vssadmin list writers showed all writers in a stable state.
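Knowing which service to restart for a given writer is the fiddly part. The sketch below encodes a writer-to-service mapping that is commonly documented for Windows Server; treat the table as an assumption to verify on your own server (it is not exhaustive), and note that WRITER_SERVICES and services_to_restart are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

# Commonly documented writer-to-service mapping (assumption: verify locally).
WRITER_SERVICES: Dict[str, str] = {
    "System Writer": "CryptSvc",  # Cryptographic Services
    "WMI Writer": "Winmgmt",  # Windows Management Instrumentation
    "SqlServerWriter": "SQLWriter",  # SQL Server VSS Writer service
    "Microsoft Exchange Writer": "MSExchangeRepl",  # Exchange Replication service
    "IIS Config Writer": "AppHostSvc",  # Application Host Helper Service
    "IIS Metabase Writer": "IISADMIN",  # IIS Admin Service
    "BITS Writer": "BITS",  # Background Intelligent Transfer Service
    "Registry Writer": "VSS",  # hosted by the Volume Shadow Copy service
    "COM+ REGDB Writer": "VSS",
    "Shadow Copy Optimization Writer": "VSS",
    "ASR Writer": "VSS",  # Automated System Recovery, unrelated to Azure Site Recovery
}


def services_to_restart(failed_writers: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (services to restart, writers with no known mapping).

    Services are de-duplicated and kept in first-seen order, so a service
    that hosts several failed writers is only restarted once.
    """
    services: List[str] = []
    unmapped: List[str] = []
    for writer in failed_writers:
        service = WRITER_SERVICES.get(writer)
        if service is None:
            unmapped.append(writer)
        elif service not in services:
            services.append(service)
    return services, unmapped
```

For example, services_to_restart(["SqlServerWriter", "Registry Writer", "ASR Writer"]) returns (["SQLWriter", "VSS"], []). Each service can then be restarted from services.msc or with net stop followed by net start, and anything in the unmapped list has to be looked up by hand.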
Once all writers were stable, I ran the following from the command prompt to have the ASR agent create an app-consistent recovery point (the quotes are needed because the path contains spaces):
"C:\Program Files (x86)\Microsoft Azure Site Recovery\Agent\vacp" -systemlevel
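If you script this step, passing the arguments as a list avoids the quoting problem entirely. This is a minimal sketch, assuming the agent lives in the default folder used in this post and that the executable file is named vacp.exe; vacp_command and run_vacp are hypothetical helpers, and I have not verified what vacp's exit codes mean, so its console output is printed for inspection.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List

# Default agent folder from this post; adjust if the agent is installed elsewhere.
DEFAULT_AGENT_DIR = r"C:\Program Files (x86)\Microsoft Azure Site Recovery\Agent"


def vacp_command(agent_dir: str = DEFAULT_AGENT_DIR) -> List[str]:
    """Build the argument list for `vacp -systemlevel`.

    As a list, the path with spaces is quoted by subprocess automatically.
    Assumption: the executable is named vacp.exe.
    """
    exe = PureWindowsPath(agent_dir) / "vacp.exe"
    return [str(exe), "-systemlevel"]


def run_vacp(agent_dir: str = DEFAULT_AGENT_DIR) -> int:
    """Run vacp on the protected VM (elevated) and return its raw exit code."""
    result = subprocess.run(vacp_command(agent_dir), capture_output=True, text=True)
    print(result.stdout)
    if result.stderr:
        print(result.stderr)
    return result.returncode
```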
This allowed ASR to complete a successful app-consistent snapshot for the affected VMs, but on my Exchange Server VM the issue returned after a day. So this process is not necessarily a permanent fix, but it can get ASR working again when VSS writer issues are intermittent.
Ultimately, then, the issue I need to troubleshoot is not ASR but why other VSS writers on my VM are not staying in a stable state.
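When a writer only fails now and then, it helps to capture its state repeatedly (for example by saving vssadmin output on a schedule) and see which writer keeps dropping out. The sketch below assumes each capture has already been reduced to a dict of writer name to vssadmin state code; recurring_failures is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

STABLE = 1  # vssadmin prints "[1] Stable" for a healthy writer


def recurring_failures(checks: Iterable[Dict[str, int]]) -> List[Tuple[str, int]]:
    """Count, per writer, how many checks found it outside the Stable state.

    Each check maps writer name -> vssadmin state code for one capture.
    Returns (writer, count) pairs, most frequent first; writers that were
    stable in every check are omitted.
    """
    counts = Counter()
    for check in checks:
        for writer, code in check.items():
            if code != STABLE:
                counts[writer] += 1
    return counts.most_common()
```

The writer at the top of the list is the one whose service, event log entries and application configuration deserve the closest look.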
Working with Microsoft support, I received a modified ASR agent executable that ignores the states of the other VSS writers by "allowing VACP to take another writer replacing the one that is preventing the creation of the app-consistent snapshot" (Microsoft's explanation).
They also indicated that this should not have any adverse effects on the quality of ASR's app-consistent snapshots. So if you cannot figure out why a VSS writer keeps ending up in an error state, contacting Microsoft for this modified agent may be an option.