If you are interested in adding GCFOS technology to your application, please fill out the OEM Form.
How Can GCFOS be Implemented and Used?
An easy-to-implement API is called to determine whether a file is common. If a file is determined to be common, only 24 bytes need to be stored instead of the file data. For traditional "file-by-file" backup products, each file is queried, and either the file data or a reference is stored for each file backed up. In the case of "image backup" for either physical or virtual machines, the file system can be enumerated so that any files found to be common are removed from the "used bitmap." This allows the image backup process to skip over all the blocks on disk that are occupied by common files. To enable a subsequent restore or mount, a table must be stored so that the software knows which skipped blocks are represented by which common files. During a restore, the original file is "deleted," freeing the blocks that were marked as used but are not present in the image. The common files are then restored, and reads for skipped blocks are simply redirected to the just-restored files.
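The image backup case above can be illustrated with a minimal sketch. It is not the GCFOS API: the FileExtent structure, the is_common flag (which in practice would come from a GCFOS query), and the build_backup_plan function are all hypothetical names chosen for illustration. The sketch clears the blocks of common files from a copy of the used bitmap and records a skip table so that a later restore or mount can tell which common file supplies each skipped block.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileExtent:
    """A file and the disk blocks it occupies (hypothetical structure)."""
    path: str
    blocks: List[int]
    is_common: bool  # assumed to be the answer from a commonality query


def build_backup_plan(
    used_bitmap: List[bool], files: List[FileExtent]
) -> Tuple[List[bool], Dict[int, Tuple[str, int]]]:
    """Return (bitmap with common-file blocks cleared, skip table).

    The skip table maps each skipped block number to
    (file path, index of that block within the file).
    """
    bitmap = list(used_bitmap)  # work on a copy, leave the input untouched
    skip_table: Dict[int, Tuple[str, int]] = {}
    for f in files:
        if not f.is_common:
            continue  # unique files stay in the bitmap and are backed up
        for index, block in enumerate(f.blocks):
            bitmap[block] = False
            skip_table[block] = (f.path, index)
    return bitmap, skip_table


if __name__ == "__main__":
    used = [True, True, True, True, True, False, True, False]
    files = [
        FileExtent("C:/Windows/notepad.exe", [1, 2, 6], is_common=True),
        FileExtent("C:/Users/a/report.docx", [0, 3, 4], is_common=False),
    ]
    bitmap, skip = build_backup_plan(used, files)
    print([i for i, u in enumerate(bitmap) if u])  # [0, 3, 4]
    print(skip[6])  # ('C:/Windows/notepad.exe', 2)
```

In this sketch only blocks 0, 3, and 4 would be read by the image backup; a read of block 6 during a mount would be redirected to the third block of the restored common file.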
What is Provided by UltraBac Software?
A fully documented API specification will be provided, along with an example generic client that exposes all GCFOS functionality via a command-line interface. An experienced software developer can typically integrate the GCFOS client code into a prospective Partner/OEM application within several days. During this time, UltraBac Software will be available to provide full development support.
Enabling Block-Based Deduplication with GCFOS
When using a local repository, GCFOS can be configured to enable a "block store" that provides block-level deduplication. This allows a hybrid approach in which common files are eliminated from the backup stream first, because this is extremely efficient. Any file data that is not common is put into the block store and automatically deduplicated. If the same data comes from multiple sources (for example, a file-by-file backup, an image backup, and a SQL backup could all deliver exactly the same data), the block store will keep only one copy of the data. Subsequent backups of the same data will not take any additional storage in the block store. Because of this property, subsequent backups effectively become "incremental" backups, storing only the data that differs from the last backup taken. This greatly reduces the storage needed for backups, and reduces backup and restore times.
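The behavior of a block store can be sketched as a small content-addressed store. This is an illustration of the general technique only, not the GCFOS implementation: the BlockStore class, the fixed 4096-byte block size, and the use of SHA-256 are all assumptions, since the text above does not specify how blocks are sized or identified.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4096  # assumed block size; not stated in the source


class BlockStore:
    """Toy content-addressed store: each distinct block is kept once."""

    def __init__(self) -> None:
        self.blocks: Dict[bytes, bytes] = {}

    def put_stream(self, data: bytes) -> List[bytes]:
        """Store data block by block and return its list of block hashes."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            self.blocks.setdefault(digest, block)  # no-op if already stored
            recipe.append(digest)
        return recipe

    def get_stream(self, recipe: List[bytes]) -> bytes:
        """Rebuild the original data from its list of block hashes."""
        return b"".join(self.blocks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


if __name__ == "__main__":
    store = BlockStore()
    first = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
    store.put_stream(first)            # e.g. from a file-by-file backup
    store.put_stream(first)            # same data from an image backup
    print(store.stored_bytes())        # 8192: the second copy cost nothing
    changed = b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE
    store.put_stream(changed)          # only the changed block is new
    print(store.stored_bytes())        # 12288
```

The second run stores nothing new, and the third stores only the one block that changed, which is the sense in which later backups behave like incrementals.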
Overhead and Performance
Advanced caching in the GCFOS client API is simple to use and completely automatic. Hashes are only calculated if the file has changed since the last time its hash was calculated. Responses from the GCFOS server are also cached on each local machine. Once a GCFOS installation has become fully operational, very little overhead is imposed by the GCFOS client API. Typically, only 5-8% of files will need to have any hash calculation performed, and only 20-30% of files will require any communication with the GCFOS server. This means, for example, that a local GCFOS installation can enumerate 240,000 files, obtain their hashes, and query the server in as little as a minute. Querying over the internet, as in the case of a GCFOS UltraBac Cloud deployment, will add some time to this process, but this is more than offset by the backup time saved by eliminating redundant files. Performance can be further enhanced when implementing the GCFOS client by creating a separate thread to query files for commonality. The answer is then known in advance of the physical backup process, which virtually eliminates query response latency.
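The idea of recalculating a hash only when a file has changed can be sketched as follows. This is a generic illustration rather than the GCFOS client cache: the HashCache class is a hypothetical name, and the choices of SHA-256 and of modification time plus size as the change test are assumptions, since the text above does not say how changes are detected.

```python
import hashlib
import os
import tempfile
from typing import Dict, Tuple


class HashCache:
    """Caches file hashes keyed on path, modification time, and size."""

    def __init__(self) -> None:
        # path -> (mtime in nanoseconds, size in bytes, hex digest)
        self._entries: Dict[str, Tuple[int, int, str]] = {}
        self.computed = 0  # number of times a hash was actually calculated

    def get_hash(self, path: str) -> str:
        st = os.stat(path)
        cached = self._entries.get(path)
        if cached and cached[0] == st.st_mtime_ns and cached[1] == st.st_size:
            return cached[2]  # file unchanged: reuse the cached hash
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        digest = h.hexdigest()
        self._entries[path] = (st.st_mtime_ns, st.st_size, digest)
        self.computed += 1
        return digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.bin")
        with open(path, "wb") as f:
            f.write(b"first version")
        cache = HashCache()
        cache.get_hash(path)
        cache.get_hash(path)      # served from the cache
        print(cache.computed)     # 1
```

A real client would also persist the cache between runs; this sketch keeps it in memory only.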
Security
There are no security requirements for the "GCFOS local" type of deployment because the server is run behind a client's firewall and is not exposed to the internet. All clients auto-discover and auto-configure using a simple authentication mechanism.
For internet deployments (Partner/OEM or UltraBac cloud), all clients are authenticated by the GCFOS server using a 256-bit encrypted challenge/response system based on a unique password for each client (all computers within a client's network share the same client/secret authentication details). Any failed authentication attempt, or any unexpected data received from a client, will result in the IP address being blocked for a period of time. This hardens GCFOS against malicious or unauthorized access attempts, and provides some protection against a distributed denial-of-service attack. When clients query a common file, they are provided with a key unique to that client for that file. That key must be provided back to the GCFOS server in order to retrieve the file. This is all handled automatically by the backup client, and prevents malicious use by a client who may somehow know a file's hash but who never physically possessed the file during a GCFOS operation. Without the unique key, it is therefore impossible to retrieve a file even if a correct hash is used. Furthermore, invalid keys provided by a client result in that IP address being blocked from future access for a defined period.
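The pattern described above (challenge/response authentication, a per-client key for each file, and temporary IP blocking) can be sketched in a few lines. This is an illustration of the general pattern under stated assumptions, not the GCFOS protocol: the AuthServer class, the use of HMAC-SHA256 for both the response and the file key, and the 300-second block period are all hypothetical, as the text above does not specify the algorithms or durations used.

```python
import hashlib
import hmac
import secrets
import time
from typing import Dict, Optional

BLOCK_SECONDS = 300  # assumed; the source only says "a period of time"


def respond(secret: bytes, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the secret without sending it."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


class AuthServer:
    def __init__(self, client_secrets: Dict[str, bytes]) -> None:
        self._secrets = client_secrets
        self._blocked_until: Dict[str, float] = {}
        self._server_key = secrets.token_bytes(32)  # used to derive file keys

    def is_blocked(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return self._blocked_until.get(ip, 0.0) > now

    def _fail(self, ip: str, now: float) -> bool:
        self._blocked_until[ip] = now + BLOCK_SECONDS
        return False

    def new_challenge(self) -> bytes:
        return secrets.token_bytes(32)  # 256-bit random challenge

    def verify(self, ip: str, client_id: str, challenge: bytes,
               response: bytes, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if self.is_blocked(ip, now):
            return False
        secret = self._secrets.get(client_id)
        if secret is None:
            return self._fail(ip, now)
        if not hmac.compare_digest(respond(secret, challenge), response):
            return self._fail(ip, now)
        return True

    def file_key(self, client_id: str, file_hash: bytes) -> bytes:
        """Key handed to a client when it queries a common file."""
        msg = client_id.encode() + b"\0" + file_hash
        return hmac.new(self._server_key, msg, hashlib.sha256).digest()

    def may_retrieve(self, ip: str, client_id: str, file_hash: bytes,
                     key: bytes, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if self.is_blocked(ip, now):
            return False
        if not hmac.compare_digest(self.file_key(client_id, file_hash), key):
            return self._fail(ip, now)  # invalid key: block this address
        return True
```

Because the file key in this sketch is derived from both the client identifier and the file hash, a key issued to one client is rejected when presented by another, even if the hash is correct.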
To address additional questions you may have, please visit our FAQ page. To review the four ways that GCFOS can be deployed by Partners and OEMs, please visit the Redundant File Elimination overview page.