Paul Bunn, CTO, UltraBac Software
"Instant availability of a backed up physical system by making virtual virtual-disks available for use by a hypervisor, using compressed and/or encrypted backup data as a source."
Background of Invention
Much prior art exists for generating a "backup" of an existing physical computer system. Typically, the "backup data" has been stored on removable magnetic media such as tape or disk. More recently, backing up a user's data over the internet to a backup service provider has become quite common. This is often referred to as "backing up to the cloud" or simply "cloud backup."
When a backup of a physical machine is stored "in the cloud" in this fashion, an expedient restore is quite difficult because the data has to be restored in one of the following ways:
- The backup service provider acquires a replacement hard drive for their end user and restores the data to this disk. This new disk is then sent to the end user via courier or a postal service such as USPS or FedEx. Combined with the process of the restore, which may take several hours, this approach means that the end user is left with an inoperable machine for days until the replacement disk is received.
- The end user obtains a new computer or installs a new disk at their location, then restores their data over the internet (from "the cloud") to their new machine. This is a practical approach only if a small amount of data needs to be restored. Since end users' drives now often hold in excess of 1TB of data, a complete restore could take days or weeks, which makes this approach impractical in most cases.
- More recently, a hybrid approach has been gaining popularity whereby a backup service provider makes available a virtual machine ("VM") built from the end user's backup data. It may take several hours to read the backup data and generate virtual disk files that represent the original disk data. Once the virtual disks have been generated, a P2V ("physical-to-virtual") conversion, a common process in the backup industry, must be applied to make the disks bootable in a virtual environment. Scenario 1 (the first approach above) is then used to send replacement disks back to the user. Prior to receipt of the disks, the end user is able to log in to the virtual machine representing their backed up machine via the standard remote login processes typically offered by the operating system, or the machine can be made available via a "Web portal": a Web page where the end user can, after authorization (by providing login credentials), manipulate their virtual machine as if it were local. When the user later receives their replacement disk, they can insert it into their original computer, replacing the failed disk. A synchronization process is then needed so that changes made while the computer was running as a virtual machine are applied to the now out-of-date disk that has just been received. The process for this synchronization is beyond the scope of this invention.
Summary of Invention
When Scenario 3 (the hybrid approach above) is used, there is a delay (possibly of several hours) during the restore process when it is not possible to boot or use the virtual machine that represents the end user's backup. This invention eliminates this lengthy delay and makes the virtual-disks available within just a few minutes. These virtual-disk files are themselves virtual; they initially contain no data, so they are referred to here as virtual virtual-disks (VVD). The VVDs can be created in any form that can be recognized by the various virtualization technologies available. For example, the intended file format may be VMware Inc.'s "VMDK" format or Microsoft's "VHD" format used by "Hyper-V." This list is not exhaustive, as there are other technologies available such as VirtualBox (Oracle), Xen (Linux Foundation), and others. It is important to note that the data does NOT need to be "restored" at any point in this process, although restoring it is an option if the migration to a virtual system is to be permanent. As such, although the term "restore" is used, what actually takes place is that the backup data is simply made available to the hypervisor, which then uses the compressed/encrypted backup data directly without any intervening "restore" process being performed.
A provided utility program is run on a machine designated as a "translation server" that runs Microsoft's Windows Server operating system. The utility takes as parameters the location of the backup data and the location on an existing filesystem where the "stub" VVDs will be created. These stub files may reside locally on an attached disk device, or remotely on an NFS or SMB share. If desired, a similar program could be written for UNIX-like operating systems such as Linux, and the translation process could be performed there using similar techniques.
The device driver ("UBFD" or UltraBac Filter Driver) on the translation server is provided this information and initialized. It is instructed that it is in "VVD creation mode," in which all writes of data to the stub file are simply discarded, so that the file may be created quickly without passing on the writes that would "zero fill" it. Since zero-filling a large, empty file is a time-consuming process, much time can be saved (many minutes, or even hours) by discarding writes temporarily during the file creation. Once the operating system believes that it has successfully created the file, UBFD is instructed that it is no longer in "VVD creation mode." (CLAIM: A method to decrease the time taken in order to create an empty file). An alternate technique is to create a sparse file, either locally or on a remote server, so that "VVD creation mode" is not needed and the operating system's inherent capability to manage "sparse" files is relied on instead (sparse files are data files that have large allocations but whose empty sections do not occupy actual backing store, so the files grow/shrink to match the size of the actual data they contain).
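The alternate sparse-file technique described above can be illustrated with a minimal sketch. It assumes a filesystem that supports sparse files; the function name `create_stub_vvd` and the file name in the usage note are hypothetical and do not come from the UltraBac utility. On NTFS a file may additionally need to be marked sparse explicitly, which this sketch does not attempt.

```python
import os


def create_stub_vvd(path: str, size_bytes: int) -> None:
    """Create a stub file whose logical length is size_bytes.

    No data is written. On filesystems with sparse-file support the
    unwritten region occupies no backing store, so creation time does
    not depend on the size of the disk being represented.
    """
    with open(path, "wb") as stub:
        # Extending the file with truncate() sets its length without
        # issuing the writes that would otherwise zero-fill it.
        stub.truncate(size_bytes)


def logical_size(path: str) -> int:
    """Return the length the operating system reports for the file."""
    return os.path.getsize(path)
```

For example, `create_stub_vvd("disk0.vvd", 64 * 1024 * 1024)` yields a file that reports a length of 64 MiB and reads back as zeros wherever nothing has been written.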
UBFD on the translation server will then be instructed to perform a virtual mount of each of the drives/partitions that the end user wants to use in the virtual environment. These mounts will not be exposed to the user environment of the translation server, but will be used exclusively by UBFD itself in translating IO directed at the VVDs. Mounting of this type is covered by prior art (this mount capability was developed for UltraBac Warp in 2007). It involves loading many "indexes" and tables into memory so that any offset on the original disk can be mapped quickly to the corresponding compressed data within the backup files.
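The kind of in-memory index described above can be sketched as a sorted table searched by disk offset. This is only an illustration under stated assumptions: the backup is modelled as independently compressed chunks, zlib stands in for whatever compression the real backup format uses, and every name here (`ChunkEntry`, `BackupIndex`, `build_backup`, `read_from_backup`) is hypothetical rather than part of UBFD.

```python
import bisect
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkEntry:
    disk_offset: int     # where the chunk starts on the original disk
    length: int          # uncompressed length of the chunk
    backup_offset: int   # where the compressed bytes start in the backup
    compressed_len: int  # length of the compressed bytes


class BackupIndex:
    """Sorted table mapping original-disk offsets to compressed chunks."""

    def __init__(self, entries):
        self.entries = sorted(entries, key=lambda e: e.disk_offset)
        self.starts = [e.disk_offset for e in self.entries]

    def find(self, offset: int) -> ChunkEntry:
        # Binary search for the last chunk starting at or before offset.
        i = bisect.bisect_right(self.starts, offset) - 1
        if i < 0 or offset >= self.entries[i].disk_offset + self.entries[i].length:
            raise KeyError(f"offset {offset} is not covered by the backup")
        return self.entries[i]


def build_backup(disk: bytes, chunk_size: int):
    """Compress a disk image chunk by chunk; return (backup bytes, index)."""
    blob = bytearray()
    entries = []
    for off in range(0, len(disk), chunk_size):
        raw = disk[off:off + chunk_size]
        comp = zlib.compress(raw)
        entries.append(ChunkEntry(off, len(raw), len(blob), len(comp)))
        blob += comp
    return bytes(blob), BackupIndex(entries)


def read_from_backup(backup: bytes, index: BackupIndex, offset: int, length: int) -> bytes:
    """Return uncompressed disk bytes [offset, offset + length)."""
    out = bytearray()
    pos, end = offset, offset + length
    while pos < end:
        entry = index.find(pos)
        comp = backup[entry.backup_offset:entry.backup_offset + entry.compressed_len]
        raw = zlib.decompress(comp)
        start = pos - entry.disk_offset
        take = min(end - pos, entry.length - start)
        out += raw[start:start + take]
        pos += take
    return bytes(out)
```

A request that spans several chunks is served by looking up each chunk in turn, so only the chunks that overlap the requested range are decompressed.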
Once the disks are mounted, the data held inside the VVDs will have the necessary changes made to the registry and files so that it is converted from a physical to a virtual machine. This process is known as a "P2V" process and is covered by prior art (UltraBac has had this feature for many years).
At this point, the virtualization software (such as VMware's ESX or Microsoft's Hyper-V) may be used to create a new virtual machine that represents the intended end user's actual environment. As part of this, the virtualization software can be pointed to the newly created VVD files as the intended source for the data of these virtual disks. The process described above will only take a few minutes and, once complete, the VM may then be "powered on" to boot from and access all the VVDs, thus "restoring" it.
When the virtualization software issues IO on the VVDs, the UBFD software will automatically intercept all IO requests and, depending on the type of request, handle them as follows (using ESX as an example):
READ: A Read request will be for a given logical address: an offset into the VVD and a length. This will NOT be issued to the underlying filesystem, but rather simply redirected to UBFD, which will be given the offset/length. UBFD will perform all the necessary translations to read from the compressed backup data and deliver the usable uncompressed data back to the requestor (ESX). As an optional performance improvement, this data could also be written into the "real" VVD file, with tables updated to indicate that this range of the data has already been read. If a Read for this part of the disk is received later, the underlying VVD file will be read directly (or UBFD could simply let the IO pass through without modification/redirection). Because there would be no need in this case to decompress the data (a computationally intensive task), this would likely be faster than reading it again from the compressed backup files.
WRITE: In the case of a Write, the data is allowed to be written directly to the VVD file at the intended location. UBFD's in-memory tables are updated to record the offset/length just written, which must in future be retrieved directly from the VVD file. If a Read of this data is requested in the future, it must not be redirected to the compressed backup data, but must be allowed to be read from the underlying VVD file.
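The READ and WRITE handling above can be sketched in user-space Python. This is a simplified model, not the UBFD driver itself: `RangeSet`, `VirtualVirtualDisk`, and the `read_backup` callback (a stand-in for the translation from offset/length to uncompressed backup data) are hypothetical names, and the optional caching of Reads is controlled by an assumed `cache_reads` flag.

```python
class RangeSet:
    """Sorted, non-overlapping [start, end) byte ranges held in the VVD file."""

    def __init__(self):
        self._ranges = []

    def add(self, start: int, end: int) -> None:
        merged = []
        for s, e in self._ranges:
            if e < start or s > end:
                merged.append((s, e))            # disjoint: keep unchanged
            else:
                start, end = min(s, start), max(e, end)  # overlap: absorb
        merged.append((start, end))
        merged.sort()
        self._ranges = merged

    def split(self, start: int, end: int):
        """Yield (s, e, is_local) pieces that cover [start, end) in order."""
        pos = start
        for s, e in self._ranges:
            if e <= pos:
                continue
            if s >= end:
                break
            if s > pos:
                yield (pos, s, False)
                pos = s
            stop = min(e, end)
            yield (pos, stop, True)
            pos = stop
        if pos < end:
            yield (pos, end, False)


class VirtualVirtualDisk:
    def __init__(self, stub, read_backup, cache_reads: bool = False):
        self.stub = stub                # seekable binary file backing the VVD
        self.read_backup = read_backup  # callable(offset, length) -> bytes
        self.cache_reads = cache_reads
        self.local = RangeSet()         # ranges that must come from the stub

    def write(self, offset: int, data: bytes) -> None:
        # Writes go straight to the VVD file and are recorded in the table.
        self.stub.seek(offset)
        self.stub.write(data)
        self.local.add(offset, offset + len(data))

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        # Materialize the pieces first, because caching updates the table.
        for s, e, is_local in list(self.local.split(offset, offset + length)):
            if is_local:
                self.stub.seek(s)
                out += self.stub.read(e - s)
            else:
                data = self.read_backup(s, e - s)
                if self.cache_reads:
                    self.write(s, data)  # later Reads skip decompression
                out += data
        return bytes(out)
```

A Read that straddles written and unwritten regions is assembled piece by piece, so data written by the VM always takes precedence over the contents of the backup.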
Once the end user has finished using the VM, a utility is run to instruct UBFD that the mounts are no longer needed, can be dismounted, and the memory used by the tables can be freed. UBFD will no longer intercept any IO directed to any file that it previously considered a VVD file. These VVD files may then be deleted.