Data Backup

Building robust infrastructure requires regular data backups to ensure continuity and security. Whether there is an unexpected outage or a planned migration (like a migration to OpenSearch, Pulsar, or Kafka), backups are critical components of data infrastructure.

Below we outline four primary ways to back up data, along with their benefits and drawbacks, to help you decide which approach best meets your company’s needs.

Keep in mind that more than one approach can be employed simultaneously.

#1 – FOR MAXIMUM FLEXIBILITY AND LOW STORAGE COSTS: STORE THE ORIGINAL DATA IN FILE FORMAT

Benefits
Flexibility. Newer versions of software sometimes introduce breaking changes that require certain data to be re-indexed. Because the backup holds the original data, it can be transformed and ingested again as needed. If a new type of database is chosen, you run the same transformation on your backups as you do on the real-time data.
Cost to store. At $0.0245 per GB per month with free inbound network traffic, this is the cheapest option. Network costs are incurred only when a restore is performed and data is sent out of storage.

Drawbacks
Cost to restore. If a restore is performed, you will incur the cost of server resources to transform the data. Network charges are around $0.15 per GB restored.
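Time to restore. A restore starts at the beginning of the ingestion process, with the original data, rather than in the middle or later, so the data must be transformed before it can be ingested. Depending on the amount of data to restore and the available server resources, a restore could take a day or more.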
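
The storage and network figures above can be turned into a rough estimate. The following is a minimal sketch in Python that uses only the two rates quoted in this section; the function names and the 1,000 GB example size are hypothetical, and the server cost of transforming the data during a restore is not included.

```python
# Rates quoted in the text above; treat them as approximate.
STORAGE_USD_PER_GB_MONTH = 0.0245
RESTORE_NETWORK_USD_PER_GB = 0.15


def monthly_storage_cost(size_gb: float) -> float:
    """Monthly cost of keeping `size_gb` of original data in file storage."""
    return size_gb * STORAGE_USD_PER_GB_MONTH


def restore_network_cost(size_gb: float) -> float:
    """Network charge for sending `size_gb` out of storage during a restore."""
    return size_gb * RESTORE_NETWORK_USD_PER_GB


# Hypothetical data set size, chosen only for illustration.
example_gb = 1_000
print(f"Storage per month: ${monthly_storage_cost(example_gb):.2f}")
print(f"Network per full restore: ${restore_network_cost(example_gb):.2f}")
```

Substitute your own data size and your provider's current rates to get a figure that applies to your environment.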

#2 – AVOID TRANSFORMATION OF DATA DURING A RESTORE: STORE THE TRANSFORMED, READY-TO-INGEST DATA IN FILE FORMAT

Benefits
Cost to restore. If a restore is performed, the data does not need to be transformed again, which saves money on CPU. Network charges would be higher, however, because the transformed data is larger than the original.

Drawbacks
Flexibility. If a newer version is released with breaking changes related to the ingestion, the backups would need to be updated. If a new type of database is chosen, the backups would need to undergo a one-time transformation.
Cost to store. The transformation adds extra information, so the size of the data would likely increase by around 50%. The same storage and transfer rates as in option #1 apply.
Time to restore. A restore starts at about the middle of the ingestion process, because the transformation step is already done. Depending on the amount of data to restore and the available server resources, a restore could take half a day or more.
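
To see what the roughly 50% size increase means in practice, here is a small Python sketch that compares storing original data with storing transformed data. It assumes the rates quoted for option #1 and treats the 50% figure as an estimate; the function names and the example size are hypothetical.

```python
# Rates quoted for option #1 and the size increase estimated above.
STORAGE_USD_PER_GB_MONTH = 0.0245
RESTORE_NETWORK_USD_PER_GB = 0.15
SIZE_INCREASE = 0.50  # "around 50%"; the real figure depends on the transformation


def transformed_size_gb(original_gb: float) -> float:
    """Estimated size of the data after transformation."""
    return original_gb * (1 + SIZE_INCREASE)


def compare_costs(original_gb: float) -> dict:
    """Monthly storage and per-restore network cost for both file-based options."""
    result = {}
    sizes = {"original": original_gb, "transformed": transformed_size_gb(original_gb)}
    for label, size_gb in sizes.items():
        result[label] = {
            "size_gb": size_gb,
            "storage_per_month": size_gb * STORAGE_USD_PER_GB_MONTH,
            "network_per_restore": size_gb * RESTORE_NETWORK_USD_PER_GB,
        }
    return result


# Hypothetical 1,000 GB of original data.
for label, costs in compare_costs(1_000).items():
    print(
        f"{label}: {costs['size_gb']:.0f} GB, "
        f"${costs['storage_per_month']:.2f}/month, "
        f"${costs['network_per_restore']:.2f} per restore"
    )
```

The comparison covers storage and network only. The CPU savings from skipping the transformation during a restore depend on your pipeline and are left out.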

#3 – KEEP COSTS DOWN AND RESTORE TIME SHORT: STORE A REPLICA OF THE DATABASE OFFLINE

Benefits
Cost to restore. No network fees are charged for the backed-up data, which stays in place. Once the replica database is brought online, the cluster recovers to the point at which the replica was made. Only the data added since the backup was made is charged as outgoing network traffic.
Time to restore. In a few minutes, the cluster will be back online with the data from the backup. In a few more minutes, the cluster will have caught up to the present, depending on server resources and the time elapsed since the backup was made.

Drawbacks
Flexibility. If a new version introduces breaking changes, the database must be restored, the data dumped, the database updated, and a special one-time task run to transform and ingest the data. If a new database type is chosen, the data must be restored, dumped, and transformed to the new format. This adds weeks, sometimes months, to an upgrade or transition.
Cost to store. Storing a replica is more expensive than storing files, typically $0.05 per GB per month. However, because the replica is offline, there is no added cost for server resources.
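
The trade-off for an offline replica can be sketched as a higher monthly storage bill in exchange for a much smaller network charge on restore. The Python sketch below uses the storage rates quoted in this article. It assumes, purely for illustration, that the catch-up traffic is billed at the same $0.15 per GB quoted for option #1; the function names and example sizes are hypothetical.

```python
FILE_STORAGE_USD_PER_GB_MONTH = 0.0245    # file storage rate from option #1
REPLICA_STORAGE_USD_PER_GB_MONTH = 0.05   # typical replica storage rate quoted above
NETWORK_USD_PER_GB = 0.15                 # assumption: same rate as option #1


def replica_monthly_premium(size_gb: float) -> float:
    """Extra monthly storage cost of a replica compared with file storage."""
    return size_gb * (REPLICA_STORAGE_USD_PER_GB_MONTH - FILE_STORAGE_USD_PER_GB_MONTH)


def catch_up_network_cost(missing_gb: float) -> float:
    """Network charge for only the data added since the replica was made."""
    return missing_gb * NETWORK_USD_PER_GB


# Hypothetical example: a 1,000 GB replica that is 20 GB behind at restore time.
print(f"Extra storage per month: ${replica_monthly_premium(1_000):.2f}")
print(f"Network to catch up: ${catch_up_network_cost(20):.2f}")
```

Taking replicas more often keeps the catch-up volume, and therefore the network charge and recovery time, small.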

#4 – WHEN SECONDS MATTER MOST: RUN A SEPARATE, LIVE CLUSTER

Benefits
Time to restore. Restoring the data will only take a few seconds. The load balancer will transfer all traffic to the second cluster without any intervention.
Cost to restore. There are no extra costs for a restore.

Drawbacks
Flexibility. If a new version introduces breaking changes, the data would need to be dumped and transformed, the cluster updated, and the data ingested again. If a new database type is chosen, the data would need to be dumped, transformed, and sent to the new database.
Cost to store. Running a second, live cluster will double the cost of the hosting services.
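
The automatic switch to the second cluster comes down to a simple routing rule: send traffic to the first cluster that passes its health check. The Python sketch below illustrates that rule only. The `Cluster` class and `pick_cluster` function are hypothetical names, and a real load balancer would run its own health checks rather than read a flag.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Cluster:
    name: str
    healthy: bool  # stand-in for the result of a real health check


def pick_cluster(clusters: Sequence[Cluster]) -> Optional[Cluster]:
    """Return the first healthy cluster in priority order, or None if all are down."""
    for cluster in clusters:
        if cluster.healthy:
            return cluster
    return None


# Hypothetical scenario: the primary cluster has failed and the standby is live.
primary = Cluster("primary", healthy=False)
standby = Cluster("standby", healthy=True)

target = pick_cluster([primary, standby])
print(target.name if target else "no healthy cluster")
```

Because the standby is already running with current data, the switch requires no restore step, which is why this option recovers in seconds.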

A FINAL CONSIDERATION: INCREASE REPLICATION WITHIN A CLUSTER

Increasing replication within a cluster increases the amount of data stored and adds failover for when individual servers crash. Additionally, searches can perform better because replication spreads the load across more servers.
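
As a rough illustration of that trade-off, the Python sketch below estimates how much data a cluster stores for a given replica count and how many server failures it could absorb. It assumes each replica is a full copy of the primary data placed on a different server, which may not match every cluster's configuration; the function names and example numbers are hypothetical.

```python
def total_stored_gb(primary_gb: float, replicas: int) -> float:
    """Total data stored: one primary copy plus one full copy per replica."""
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return primary_gb * (1 + replicas)


def tolerated_server_failures(replicas: int) -> int:
    """Servers that can crash without losing data, if every copy is on its own server."""
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return replicas


# Hypothetical 500 GB of primary data at different replica counts.
for replica_count in (0, 1, 2):
    print(
        f"{replica_count} replica(s): "
        f"{total_stored_gb(500, replica_count):.0f} GB stored, "
        f"survives {tolerated_server_failures(replica_count)} server failure(s)"
    )
```

Replication within a cluster protects against individual server crashes, not against the loss of the whole cluster, so it complements the backup approaches above rather than replacing them.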

24x7 Data Engineering Support & Consulting

Visit our OpenSearch page for more details on our support services.
