Updated July 2020
Building robust infrastructure in a modern company means laying the right foundation for data backups, one that meets your needs now and in the future. Below we outline the four primary ways of backing up data, along with the benefits and drawbacks of each, to help you decide which approach best meets your company’s needs.
Keep in mind that more than one approach can be employed simultaneously.
#1 – FOR MAXIMUM FLEXIBILITY AND LOW STORAGE COSTS: STORE THE ORIGINAL DATA IN FILE FORMAT
Benefits
Flexibility. Newer versions of programs sometimes introduce breaking changes that require certain data to be re-indexed. Because the backups hold the original, untransformed data, they can simply be ingested again. If a new type of database is chosen, run the same transformation on your backups as on the real-time data.
Cost to store. At $0.0245 per GB per month with free inbound network traffic, this is the cheapest option. Network costs occur only when a backup is performed and data is sent out.
Drawbacks
Cost to restore. If a restore is performed, you will incur the cost of server resources to transform the data. Network charges are around $0.15 per GB restored.
Time to restore. The restore starts at the very beginning of the ingestion process, because the data must be transformed before it can be loaded. Depending on the amount of data to restore and the server resources available, a restore could take a day or more.
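To make the trade-off concrete, here is a minimal sketch that estimates the monthly storage cost and the one-time network charge of a restore for this option. The rates are the ones quoted above; the function names and the 2,000 GB backup size are hypothetical, and the compute cost of transforming the data is not modeled.

```python
# Rates quoted in this article; check your provider's current pricing.
STORAGE_RATE_PER_GB_MONTH = 0.0245  # file storage, USD
EGRESS_RATE_PER_GB = 0.15           # outbound network on restore, USD


def monthly_storage_cost(size_gb: float) -> float:
    """Cost of keeping the backup in file storage for one month."""
    return size_gb * STORAGE_RATE_PER_GB_MONTH


def restore_network_cost(size_gb: float) -> float:
    """One-time network charge for pulling the whole backup out of storage."""
    return size_gb * EGRESS_RATE_PER_GB


if __name__ == "__main__":
    size_gb = 2_000  # hypothetical backup size
    print(f"storage: ${monthly_storage_cost(size_gb):.2f} per month")
    print(f"restore: ${restore_network_cost(size_gb):.2f} network, plus compute")
```

For a 2,000 GB backup, that works out to about $49 per month to store and about $300 in network charges for a full restore.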
#2 – AVOID TRANSFORMATION OF DATA DURING A RESTORE: STORE THE TRANSFORMED, READY TO INGEST DATA IN FILE FORMAT
Benefits
Cost to restore. If a restore is performed, the data does not need to be transformed again, which saves money on CPU. Network charges will be higher, however, because the transformed data is larger.
Drawbacks
Flexibility. If a newer version is released with breaking changes related to ingestion, the backups would need to be updated. If a new type of database is chosen, the backups would need to undergo a one-time transformation.
Cost to store. The data would likely grow by around 50% because of the extra information added by the transformation. The same storage and transfer rates apply.
Time to restore. The restore starts at about the midpoint of the ingestion process, after the transformation step. Depending on the amount of data to restore and the server resources available, a restore could take half a day or more.
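As a rough illustration of what the size increase costs, the sketch below applies the "around 50%" growth figure to the storage and network rates quoted for file storage. The growth factor will vary with your data and transformation, and the function names and example size are hypothetical.

```python
STORAGE_RATE_PER_GB_MONTH = 0.0245  # file storage, USD (from this article)
EGRESS_RATE_PER_GB = 0.15           # outbound network on restore, USD
SIZE_INCREASE = 0.50                # "around 50%"; measure this on your own data


def transformed_size_gb(raw_gb: float) -> float:
    """Estimated size of the backup after transformation."""
    return raw_gb * (1 + SIZE_INCREASE)


def extra_costs(raw_gb: float) -> dict:
    """Added cost of storing transformed data instead of the raw originals."""
    extra_gb = transformed_size_gb(raw_gb) - raw_gb
    return {
        "extra_storage_per_month": extra_gb * STORAGE_RATE_PER_GB_MONTH,
        "extra_network_per_restore": extra_gb * EGRESS_RATE_PER_GB,
    }


if __name__ == "__main__":
    for label, usd in extra_costs(2_000).items():  # hypothetical 2,000 GB of raw data
        print(f"{label}: ${usd:.2f}")
```

Whether that premium is worthwhile depends on how much CPU time the skipped transformation would have cost and how often you expect to restore.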
#3 – KEEP COSTS DOWN AND RESTORE TIME SHORT: STORE A REPLICA OF THE DATABASE OFFLINE
Benefits
Cost to restore. No network fees are charged for the backed-up data, which stays in place. Once the replica database is brought online, the cluster recovers to the point at which the replica was made. Data added since the backup was made must be sent again and is charged as outgoing network traffic.
Time to restore. In a few minutes, the cluster will be back online with the data from the backup. In a few more minutes, the cluster will have caught up to the present, depending on server resources and the time elapsed since the backup was made.
Drawbacks
Flexibility. If a new version introduces breaking changes, the database must be restored, the data dumped, the database updated, and the data transformed and ingested again as a special one-time task. If a new database type is chosen, the data must be restored, dumped and transformed to the new format. This adds weeks, sometimes months, to an upgrade or transition.
Cost to store. Storing data this way is more expensive, typically $0.05 per GB per month. However, there is no added cost for the server resources themselves.
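The two cost components of this option can be estimated with a short sketch. It uses the $0.05 per GB storage rate quoted above and assumes the catch-up traffic is billed at the same $0.15 per GB outbound rate quoted for file restores, which may not match your provider. The ingest rate, backup age and function names are hypothetical.

```python
REPLICA_STORAGE_RATE_PER_GB_MONTH = 0.05  # offline replica storage, USD (from this article)
EGRESS_RATE_PER_GB = 0.15  # assumption: same outbound rate quoted for file restores


def replica_storage_cost(size_gb: float) -> float:
    """Monthly cost of keeping the offline replica."""
    return size_gb * REPLICA_STORAGE_RATE_PER_GB_MONTH


def catch_up_network_cost(daily_ingest_gb: float, days_since_backup: float) -> float:
    """Network charge for re-sending data written after the replica was made."""
    return daily_ingest_gb * days_since_backup * EGRESS_RATE_PER_GB


if __name__ == "__main__":
    # Hypothetical workload: 2,000 GB replica, 20 GB ingested per day, backup 7 days old.
    print(f"storage:  ${replica_storage_cost(2_000):.2f} per month")
    print(f"catch-up: ${catch_up_network_cost(20, 7):.2f} per restore")
```

The catch-up charge grows with the age of the replica, so refreshing the replica more often keeps both the restore cost and the catch-up time down.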
#4 – WHEN SECONDS MATTER MOST: RUN A SEPARATE, LIVE CLUSTER
Benefits
Time to restore. Restoring the data will only take a few seconds. The load balancer will transfer all traffic to the second cluster without any manual intervention.
Cost to restore. Restoring incurs no extra costs.
Drawbacks
Flexibility. If a new version introduces breaking changes, the data must be dumped and transformed, the cluster updated, and the data ingested again. If a new database type is chosen, the data must be dumped, transformed and sent to the new database.
Cost to store. Running a second, live cluster will double the cost of the hosting services.
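The automatic failover described for this option boils down to routing traffic to the first cluster that passes a health check. The sketch below shows that selection logic only. It is not the configuration of any particular load balancer, and the cluster names and the `is_healthy` callback are hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_cluster(
    clusters: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy cluster in priority order, or None if all are down."""
    for name in clusters:
        if is_healthy(name):
            return name
    return None


if __name__ == "__main__":
    # Simulated health checks: the primary cluster has failed.
    status = {"primary": False, "standby": True}
    target = pick_cluster(["primary", "standby"], lambda name: status[name])
    print(f"routing traffic to: {target}")
```

In practice the load balancer runs this check continuously, which is why the switch takes seconds and needs no one to trigger it.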
A FINAL CONSIDERATION: INCREASE REPLICATION WITHIN A CLUSTER
Increasing replication within a cluster increases the amount of data stored and provides failover when individual servers crash. Additionally, searches can perform better when replication spreads the load across more servers.
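The storage side of that trade-off is simple arithmetic, sketched below under two assumptions: every replica is a full copy of the primary data, and each copy lives on a different server. The function name and example sizes are hypothetical.

```python
def replication_footprint(primary_gb: float, replicas: int) -> dict:
    """Summarize storage and failure tolerance for a given replica count.

    Assumes each replica is a full copy of the primary data and that
    every copy is placed on a different server.
    """
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return {
        "total_storage_gb": primary_gb * (1 + replicas),
        "copies_of_data": 1 + replicas,
        "server_failures_tolerated": replicas,
    }


if __name__ == "__main__":
    for replicas in (0, 1, 2):
        print(replicas, replication_footprint(500, replicas))  # hypothetical 500 GB
```

Going from one replica to two raises storage for 500 GB of primary data from 1,000 GB to 1,500 GB, in exchange for surviving a second server failure and having one more copy available to serve searches.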