Disaster Recovery
Overview
Disaster recovery (DR) is essential for business continuity during unexpected disruptions. This page summarizes the main concerns and components of DR for production software systems.
What is Disaster Recovery?
Disaster recovery covers the strategies and processes to restore essential software services after events like hardware failures, cyber attacks, natural disasters, or human error. The goal is to recover systems, applications, and data quickly, minimizing downtime and data loss.
Core Components
Effective DR planning includes:
- Regular backups of critical data and configurations
- Redundant infrastructure to avoid single points of failure
- Documented recovery procedures for disaster scenarios
- Frequent testing to ensure readiness
Recovery Objectives
DR plans should define:
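- RTO (Recovery Time Objective): Maximum acceptable time to restore services after a disruption
- RPO (Recovery Point Objective): Maximum acceptable data loss, expressed as the time between the last recoverable copy of the data and the disruption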
Together, these objectives set measurable targets for how quickly services are restored and how much data is protected.
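As a minimal sketch of how these two objectives can be checked after an incident or a DR drill, the following compares measured downtime and the measured data-loss window against the targets. The class and function names, the 4-hour RTO, the 1-hour RPO, and the timestamps are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable time to restore services
    rpo: timedelta  # maximum acceptable window of lost data


def evaluate_recovery(objectives, last_good_copy, outage_start, service_restored):
    """Compare one incident (or DR drill) against the objectives."""
    downtime = service_restored - outage_start
    data_loss_window = outage_start - last_good_copy
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= objectives.rto,
        "rpo_met": data_loss_window <= objectives.rpo,
    }


# Hypothetical targets and timestamps, for illustration only.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
result = evaluate_recovery(
    objectives,
    last_good_copy=datetime(2024, 1, 10, 8, 30),
    outage_start=datetime(2024, 1, 10, 9, 0),
    service_restored=datetime(2024, 1, 10, 14, 0),
)
print(result["rto_met"], result["rpo_met"])  # False True
```

In this example the service was down for 5 hours, which misses the 4-hour RTO, while only 30 minutes of data were at risk, which satisfies the 1-hour RPO.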
Organizations should evaluate their DR needs against industry-standard disaster recovery tiers (Tier 0-7), which provide a framework for understanding different levels of data protection, recovery capabilities, and associated costs.
Beyond Restoration
DR must also ensure:
- Data integrity during and after recovery
- Security controls remain effective
- Compliance with regulations
A strong DR plan reduces risk, protects reputation, and helps meet SLAs.
JFrog Platform Disaster Recovery
The JFrog Platform is critical for software delivery, so high availability is vital.
Effective DR for JFrog combines a robust multi-site deployment with a clear DR playbook for different scenarios.
If you are limited to a single site, consider JFrog SaaS, which provides built-in regional DR and a 99.9% SLA.
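To put the 99.9% figure in perspective, the sketch below converts an availability percentage into a downtime budget for a given period. It is plain arithmetic only: the function name is hypothetical, and the measurement period and any exclusions that actually apply are defined by the service agreement, not by this calculation.

```python
from datetime import timedelta
from decimal import Decimal


def allowed_downtime(availability_percent: str, period: timedelta) -> timedelta:
    """Return the downtime budget implied by an availability target."""
    # Decimal avoids binary floating-point error in values such as 99.9.
    unavailable_fraction = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    seconds = Decimal(period.total_seconds()) * unavailable_fraction
    return timedelta(seconds=float(seconds))


monthly_budget = allowed_downtime("99.9", timedelta(days=30))
yearly_budget = allowed_downtime("99.9", timedelta(days=365))
print(monthly_budget)  # 0:43:12
print(yearly_budget)   # 8:45:36
```

Under these assumptions, 99.9% availability allows roughly 43 minutes of downtime in a 30-day month, or just under 9 hours in a 365-day year.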
A multi-site deployment runs JFrog Platform across multiple locations, connected by access federation (for users, permissions, security) and repository federation (for artifacts and metadata). If one site fails, another can take over, minimizing downtime and data loss.
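The takeover decision in an active-passive setup can be illustrated with a small sketch: given sites listed in priority order and a health flag for each, pick the first healthy one. The site names and the `healthy` flag are hypothetical. How health is actually determined and how traffic is redirected (for example through DNS or a load balancer) depends on your environment and is not shown here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Site:
    name: str
    healthy: bool  # assumed to come from your own monitoring


def select_active_site(sites_by_priority: Sequence[Site]) -> Optional[Site]:
    """Return the highest-priority healthy site, or None if all are down."""
    for site in sites_by_priority:
        if site.healthy:
            return site
    return None


# Hypothetical two-site deployment in which the primary site has failed.
sites = [
    Site("primary-us-east", healthy=False),
    Site("secondary-eu-west", healthy=True),
]
active = select_active_site(sites)
print(active.name if active else "no healthy site")  # secondary-eu-west
```

Keeping the priority order explicit makes the expected failover path easy to document in the DR playbook and to rehearse during testing.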
A DR playbook is essential for guiding recovery from:
- Site/data center outages
- Data corruption or deletion
- Security incidents or ransomware
- Network failures
See the DR playbook for details.
Combining geo-distributed multi-site deployment with a scenario-based DR playbook keeps the JFrog Platform available, secure, and compliant during disruptions.
Architecture
A robust JFrog Platform should use two or more geographically distinct sites with access and repository federation.
Sites can be JFrog SaaS (recommended), self-hosted, or hybrid.
Multi-Site DR Options
- JFrog SaaS: Active-passive, Active-active
- Self-Hosted: Active-passive, Active-active
Single Site DR
JFrog SaaS (Recommended): Single-site SaaS includes regional DR with a 24-hour RTO and a 1-hour RPO.
Self-Hosted: RTO depends on how long a restore from backup takes; RPO depends on how often backups run. Regular backups are therefore required:
- Artifactory Backup: Backup guide
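For a self-hosted single site, the relationship between backups and recovery objectives can be estimated with a rough calculation such as the sketch below. All inputs (backup interval, backup duration, data size, restore throughput, fixed overhead) are hypothetical figures, and the function names are illustrative. Real values should come from your own backup schedule and from timed restore tests.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval: timedelta,
                   backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst-case data loss for backups that start every `backup_interval`.

    If a failure occurs just before a running backup completes, the last
    usable backup is the previous one, so the loss window is the interval
    plus the time a backup takes to finish.
    """
    return backup_interval + backup_duration


def estimated_restore_time(data_size_gb: float,
                           restore_throughput_gb_per_hour: float,
                           fixed_overhead: timedelta = timedelta(0)) -> timedelta:
    """Rough RTO estimate: data transfer time plus fixed setup and validation time."""
    if restore_throughput_gb_per_hour <= 0:
        raise ValueError("restore throughput must be positive")
    transfer = timedelta(hours=data_size_gb / restore_throughput_gb_per_hour)
    return fixed_overhead + transfer


# Hypothetical figures: daily backups that take 2 hours, 2 TB of data,
# 500 GB/hour restore throughput, 1 hour of provisioning and validation.
rpo_estimate = worst_case_rpo(timedelta(hours=24), timedelta(hours=2))
rto_estimate = estimated_restore_time(2000, 500, timedelta(hours=1))

print(rpo_estimate <= timedelta(hours=24))  # False: daily backups cannot guarantee a 24-hour RPO
print(rto_estimate <= timedelta(hours=8))   # True: a 5-hour estimate fits an 8-hour RTO
```

If the estimate exceeds the target, the usual levers are more frequent backups for RPO and faster restore paths or less data to restore for RTO.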
A clear DR playbook is also required. See the DR playbook for details.