Improving backup and restore times for large EC2 instances or EBS volumes

Backup and restore performance

You can improve the performance of your EC2 and EBS backups and restores. When Clumio backs up and restores EC2/EBS data, it uses the Amazon EBS direct APIs, specifically GetSnapshotBlock and PutSnapshotBlock. By default, these APIs have an adjustable (soft) limit of 1,000 requests per second (RPS).

Clumio recommends increasing the limits of these APIs to improve backup and restore performance. Higher limits are particularly helpful if you are protecting either a large number of EC2 instances or a few large EBS volumes with a high rate of data change.
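The following minimal Python sketch shows why the request-rate limit matters. It estimates the shortest time needed to move a given amount of changed data at a given request rate. It assumes that each GetSnapshotBlock or PutSnapshotBlock call transfers one full 512 KiB block and that the request limit is the only bottleneck. The function name `min_transfer_hours` is hypothetical and is not part of any Clumio or AWS API.

```python
import math

# Assumption: every GetSnapshotBlock/PutSnapshotBlock request moves one 512 KiB block.
BLOCK_SIZE_KIB = 512


def min_transfer_hours(changed_gib: float, requests_per_second: int) -> float:
    """Hypothetical helper: lower bound (in hours) on moving changed_gib of data
    when the only constraint is the per-second request limit."""
    changed_kib = changed_gib * 1024 ** 2            # GiB -> KiB
    blocks = math.ceil(changed_kib / BLOCK_SIZE_KIB)  # a partial block still costs one request
    return blocks / requests_per_second / 3600        # seconds -> hours


if __name__ == "__main__":
    # Illustrative input only: 10 TiB (10,240 GiB) of changed data.
    for rps in (1000, 4000):
        print(rps, round(min_transfer_hours(10 * 1024, rps), 2))
```

With this illustrative 10 TiB of changed data, the floor is about 5.8 hours at 1,000 RPS and about 1.5 hours at 4,000 RPS. Real backups also depend on network, parallelism, and other limits, so treat these numbers as lower bounds rather than predictions.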

Increasing GetSnapshotBlock and PutSnapshotBlock API limits

To increase the GetSnapshotBlock and PutSnapshotBlock API limits, raise an AWS support ticket from the account whose limits you want to increase.

You can use the support ticket template below. Replace the placeholder text in {CURLY BRACES} with your own values.

Subject:
Request to raise GetSnapshotBlock and PutSnapshotBlock API Limits

Body:
AWS Support - We are currently using Clumio to back up EC2 instances / EBS volumes. I would like to request an increase to the default GetSnapshotBlock and PutSnapshotBlock API limits to 4,000 RPS for the following regions:

- {REGION NAME}

- {REGION NAME}

Backup operations will have RPOs ranging from 1 to 24 hours, with multiple read/write operations executed in parallel. The daily change rate of our EC2 instances / EBS volumes is expected to be approximately {CHANGE RATE IN GB}. Clumio will spread the API traffic associated with backup and restore operations across a configured backup window.

Calculating TPS to meet your RTO

When designing a disaster recovery plan for EC2, meeting your Recovery Time Objective (RTO) is crucial. One key factor is the Transactions Per Second (TPS) required to restore your EC2 assets in time. TPS is the number of PutSnapshotBlock requests per second. This article helps you calculate the TPS needed to meet your RTO.

Required inputs

  • Total size of the environment, converted to KiB
  • A conversion factor of 1024^x to convert between byte units (for example, x = 3 to convert TiB to KiB)
  • The PutSnapshotBlock block size, a constant 512 KiB
  • Your RTO, converted to seconds
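These inputs combine into one formula. The symbols are introduced here for illustration only: $S$ is the environment size in its original unit, $x$ is the exponent that converts that unit to KiB, $N$ is the number of 512 KiB blocks, and $T$ is the RTO in seconds. Both results are rounded up so that the RTO is still met:

```latex
N = \left\lceil \frac{S \times 1024^{x}}{512} \right\rceil,
\qquad
\text{TPS} = \left\lceil \frac{N}{T} \right\rceil

% Example: S = 500 (TiB), x = 3, T = 24 \times 60 \times 60 = 86\,400
N = \frac{500 \times 1024^{3}}{512} = 1\,048\,576\,000,
\qquad
\text{TPS} = \left\lceil \frac{1\,048\,576\,000}{86\,400} \right\rceil = \lceil 12\,136.3 \rceil = 12\,137
```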

Example calculation

Below is an example for a 500 TiB environment with a 24-hour RTO:

  1. Convert the total size of the environment from TiB to KiB. Because the size is given in TiB, multiply it by 1024 to the power of 3 to convert it to KiB. Adjust the exponent to convert from other byte units (for example, use 1024^2 for GiB).
    1. 500 TiB * 1024^3 = 536,870,912,000 KiB
  2. Convert the RTO to seconds. Because the RTO is 24 hours, multiply by 60 twice: once to convert hours to minutes, and once to convert minutes to seconds.
    1. 24 * 60 * 60 = 86,400 seconds
  3. Calculate the total number of blocks by dividing the total size of the environment by the 512 KiB block size.
    1. 536,870,912,000 KiB / 512 KiB = 1,048,576,000 blocks
  4. Divide the total number of blocks by the RTO in seconds to get the required TPS.
    1. 1,048,576,000 / 86,400 ≈ 12,136.3, which rounds up to 12,137 TPS (roughly 12,140 TPS)
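The four steps above can be expressed as a small Python function. This is a minimal sketch, not a Clumio tool. The function name `required_tps` and the unit table are hypothetical. Results are rounded up on the assumption that a partial block still needs a full request and that the RTO must be met, not merely approached.

```python
import math

# Power of 1024 that converts each unit to KiB (for example, TiB -> KiB is 1024**3).
UNIT_TO_KIB_POWER = {"KiB": 0, "MiB": 1, "GiB": 2, "TiB": 3, "PiB": 4}
BLOCK_SIZE_KIB = 512  # PutSnapshotBlock block size used in this article


def required_tps(size: float, unit: str, rto_hours: float) -> int:
    """Hypothetical helper: PutSnapshotBlock TPS needed to restore `size` `unit` within `rto_hours`."""
    size_kib = size * 1024 ** UNIT_TO_KIB_POWER[unit]  # step 1: convert size to KiB
    rto_seconds = rto_hours * 60 * 60                   # step 2: convert RTO to seconds
    blocks = math.ceil(size_kib / BLOCK_SIZE_KIB)       # step 3: count 512 KiB blocks
    return math.ceil(blocks / rto_seconds)              # step 4: blocks per second, rounded up


print(required_tps(500, "TiB", 24))  # 12137
```

For example, halving the RTO to 12 hours for the same 500 TiB environment roughly doubles the requirement to 24,273 TPS.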

Questions?

Reach out to Clumio support with any questions.