Data loss, though unlikely, can happen to even the most established companies. The risk is
reduced at Voltade because we do not store personally identifiable information or the data that
users enter into calculators. We nevertheless treat data loss seriously and have developed a
backup strategy to recover from it.
Data loss can occur due to a range of issues: improper environment isolation, faulty database
migration scripts, erroneous database queries, bugs in application code, zone or region failures
at a cloud provider (AWS, Azure, or others), malicious code execution, or account compromise.
Ensuring service continuity and reliability requires robust backup processes and well-defined
recovery capabilities.
To measure our ability to recover from a disruptive event, Voltade considers four critical
metrics:
Recovery Time Objective (RTO)
Definition: The maximum duration of service downtime we aim to stay within before the system
is recovered.
Voltade's Target: 6 hours in the event of complete downtime.
Recovery Time Capability (RTC)
Definition: The actual measured or guaranteed capability for how quickly we can bring services
back online, based on our infrastructure, automation, and tested procedures.
Voltade's Current Capability: 1 to 3 hours, depending on the complexity of the incident.
Recovery Point Objective (RPO)
Definition: The maximum amount of data, measured as a window of time, that could be lost when
the service is restored (i.e., how far "back in time" we might have to go to recover data).
Voltade's Target: 12 hours of data.
Recovery Point Capability (RPC)
Definition: Our actual, proven capability to restore data to a specific point in time, governed
by how frequently backups are performed and how quickly they can be accessed.
Voltade's Current Capability: 10 minutes to 1 hour of data, owing to our regular point-in-time
snapshots.
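The relationship between the objectives and the capabilities can be illustrated with a short calculation. The Python sketch below is illustrative only: it takes the figures stated above, uses the worst case of each capability range, and reports how much headroom remains under each objective. The names `meets_objective` and `headroom` are hypothetical helpers, not part of any Voltade tooling.

```python
from datetime import timedelta

# Figures from the metrics above. Capabilities use the worst case of each
# stated range (3 hours for RTC, 1 hour for RPC).
RTO = timedelta(hours=6)
RTC_WORST = timedelta(hours=3)
RPO = timedelta(hours=12)
RPC_WORST = timedelta(hours=1)


def meets_objective(objective: timedelta, capability: timedelta) -> bool:
    """A capability satisfies its objective when it does not exceed it."""
    return capability <= objective


def headroom(objective: timedelta, capability: timedelta) -> timedelta:
    """Margin left between the worst-case capability and the objective."""
    return objective - capability


rto_margin = headroom(RTO, RTC_WORST)  # 3 hours of margin on recovery time
rpo_margin = headroom(RPO, RPC_WORST)  # 11 hours of margin on data loss

print(f"RTC within RTO: {meets_objective(RTO, RTC_WORST)}, margin {rto_margin}")
print(f"RPC within RPO: {meets_objective(RPO, RPC_WORST)}, margin {rpo_margin}")
```

Comparing against the worst case of each range is a deliberately conservative choice: if even the slowest expected recovery fits within the objective, the typical case does too.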
To minimise downtime and data loss, we ensure that backup data is easily accessible and can be
restored quickly:
1. Automated Backup Service
- Voltade utilises the automated backup services provided by our database platform,
  including point-in-time recovery (PITR), snapshot-based backups, and read-replica
  failovers.
- Failover to a read-replica is our fastest recovery path and helps keep our RTC at or
  below our RTO target. It also reduces potential data loss, thereby improving our RPC.
2. Offline Backups (Multi-Cloud)
- In a scenario where AWS is completely unavailable or data stored on AWS is
  irrecoverable, Voltade relies on offline backups stored on another cloud service
  (e.g., Google Cloud) as a last resort.
- These offline backups protect us from catastrophic failures and ensure we can
  restore critical data if primary backups become inaccessible, preserving both our
  RPO and RPC goals.
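The layering of these two strategies can be sketched as a simple preference order. The Python example below is a minimal illustration rather than Voltade's actual runbook: the `Source` enum and `choose_source` function are hypothetical, and the relative ranking of PITR ahead of snapshot restores is an assumption. Only the two ends of the order come from the text above: read-replica failover is the fastest path, and offline multi-cloud backups are the last resort.

```python
from enum import Enum
from typing import Iterable, Optional


class Source(Enum):
    READ_REPLICA = "read-replica failover"
    PITR = "point-in-time recovery"
    SNAPSHOT = "snapshot restore"
    OFFLINE = "offline multi-cloud backup"


# Assumed preference order: fastest and least lossy first. The middle of the
# order (PITR before snapshot) is an assumption made for this sketch.
PREFERENCE = [Source.READ_REPLICA, Source.PITR, Source.SNAPSHOT, Source.OFFLINE]


def choose_source(available: Iterable[Source]) -> Optional[Source]:
    """Return the most preferred recovery source that is still available."""
    usable = set(available)
    for source in PREFERENCE:
        if source in usable:
            return source
    return None  # nothing usable: escalate to manual incident handling


# Example: the primary cloud is down, so only the offline copy remains.
print(choose_source([Source.OFFLINE]).value)
```

Keeping the order in a single list makes the fallback behaviour easy to review and to adjust after a recovery test.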
Through these strategies, Voltade continuously refines its RTO, RPO, RTC, and RPC metrics to
ensure rapid recovery and minimal data loss in the event of a disruption. We regularly test
these capabilities to confirm alignment with our objectives and to identify areas for
improvement.