Disaster Recovery

Overview

Disaster recovery (DR) is essential for business continuity during unexpected disruptions. This page summarizes the main concerns and components of DR for software production.

What is Disaster Recovery?

Disaster recovery covers the strategies and processes to restore essential software services after events like hardware failures, cyber attacks, natural disasters, or human error. The goal is to recover systems, applications, and data quickly, minimizing downtime and data loss.

Core Components

Effective DR planning includes:

Regular backups of critical data and configurations
Redundant infrastructure to avoid single points of failure
Documented recovery procedures for disaster scenarios
Frequent testing to ensure readiness

Recovery Objectives

DR plans should define:

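RTO (Recovery Time Objective): Maximum acceptable time to restore services after a disruption
RPO (Recovery Point Objective): Maximum acceptable data loss, measured as the time between the last recoverable copy of the data and the disruption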

These objectives set expectations for restoration speed and data protection.
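As a minimal sketch of how these two objectives can be checked against a real or simulated incident, the following Python snippet compares measured downtime and the data-loss window with the targets. The class names, the 4-hour RTO, the 1-hour RPO, and the timestamps are illustrative assumptions, not values from any JFrog product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable window of lost data


@dataclass(frozen=True)
class Incident:
    last_good_copy: datetime    # timestamp of the newest recoverable data
    outage_start: datetime      # when the service became unavailable
    service_restored: datetime  # when the service was usable again


def evaluate(incident: Incident, objectives: RecoveryObjectives) -> dict:
    """Compare measured downtime and data loss with the RTO and RPO."""
    downtime = incident.service_restored - incident.outage_start
    data_loss_window = incident.outage_start - incident.last_good_copy
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= objectives.rto,
        "rpo_met": data_loss_window <= objectives.rpo,
    }


# Hypothetical targets and incident timeline, for illustration only.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
incident = Incident(
    last_good_copy=datetime(2024, 1, 10, 8, 30),
    outage_start=datetime(2024, 1, 10, 9, 15),
    service_restored=datetime(2024, 1, 10, 14, 0),
)
result = evaluate(incident, objectives)
print(result["rto_met"], result["rpo_met"])  # False True
```

In this example the service was down for 4 hours 45 minutes, which misses the 4-hour RTO, while only 45 minutes of data were lost, which is within the 1-hour RPO.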

Organizations should evaluate their DR needs against industry-standard disaster recovery tiers (Tier 0-7), which provide a framework for understanding different levels of data protection, recovery capabilities, and associated costs.

Beyond Restoration

DR must also ensure:

Data integrity during and after recovery
Security controls remain effective
Compliance with regulations

A strong DR plan reduces risk, protects reputation, and helps meet SLAs.

JFrog Platform Disaster Recovery

The JFrog Platform is critical for software delivery, so high availability is vital.

Effective DR for JFrog combines a robust multi-site deployment with a clear DR playbook for different scenarios.

If you are limited to a single site, use JFrog SaaS, which provides built-in regional DR and a 99.9% SLA.
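To put the 99.9% figure in concrete terms, the arithmetic below converts an SLA percentage into a downtime budget. It is a rough sketch that assumes the percentage refers to availability measured over a fixed period; the actual SLA terms may define the measurement window and exclusions differently. The function name is hypothetical.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float) -> float:
    """Return the downtime budget, in minutes, implied by an availability SLA."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


monthly = allowed_downtime_minutes(99.9, 30)   # 30-day month
yearly = allowed_downtime_minutes(99.9, 365)   # non-leap year

print(f"{monthly:.1f} minutes per 30 days")    # 43.2 minutes per 30 days
print(f"{yearly / 60:.2f} hours per year")     # 8.76 hours per year
```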

A multi-site deployment runs JFrog Platform across multiple locations, connected by access federation (for users, permissions, security) and repository federation (for artifacts and metadata). If one site fails, another can take over, minimizing downtime and data loss.
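The takeover behavior described above can be sketched as a simple priority-ordered selection. This is an illustrative model only, not how the JFrog Platform implements failover: the site names are hypothetical, and the health check is passed in as a function so that the example runs without any network access.

```python
from typing import Callable, Optional, Sequence


def choose_active_site(
    sites: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites:
        if is_healthy(site):
            return site
    return None


# Hypothetical site names, listed in priority order (primary first).
SITES = ["site-us-east", "site-eu-west"]

# Simulated health state: the primary site is down.
health = {"site-us-east": False, "site-eu-west": True}

active = choose_active_site(SITES, lambda site: health.get(site, False))
print(active)  # site-eu-west
```

In a real deployment the health check would probe each site, and traffic would be redirected through DNS or a load balancer. Those details depend on the environment and are outside the scope of this sketch.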

A DR playbook is essential for guiding recovery from:

Site/data center outages
Data corruption or deletion
Security incidents or ransomware
Network failures

See the DR playbook for details.

Combining geo-distributed multi-site deployment with a scenario-based DR playbook keeps the JFrog Platform available, secure, and compliant during disruptions.

Architecture

A robust JFrog Platform deployment should use two or more geographically separate sites connected by access federation and repository federation.

Sites can be JFrog SaaS (recommended), self-hosted, or hybrid.

Multi-Site DR Options

JFrog SaaS: Active-passive, Active-active
Self-Hosted: Active-passive, Active-active

Single Site DR

JFrog SaaS (Recommended): Single-site SaaS includes regional DR with a 24-hour RTO and a 1-hour RPO.

Self-Hosted: RTO depends on how long it takes to restore from backup; RPO depends on backup frequency. Regular backups are required:

Artifactory Backup: Backup guide
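The dependence of self-hosted RTO and RPO on restore time and backup frequency can be estimated with a short calculation. The sketch below rests on stated assumptions: a backup captures the state at the moment it starts and is usable only once it completes, and restore time is dominated by data transfer plus a fixed overhead. The function names and all numbers are hypothetical and should be replaced with measurements from restore tests.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Worst-case data-loss window for periodic backups.

    A failure just before a running backup completes forces a restore from
    the previous backup, which captured state one interval plus one backup
    duration earlier.
    """
    return backup_interval + backup_duration


def estimated_rto(
    data_size_gb: float,
    restore_throughput_gb_per_hour: float,
    fixed_overhead: timedelta,
) -> timedelta:
    """Rough restore time: fixed overhead plus data transfer time."""
    transfer = timedelta(hours=data_size_gb / restore_throughput_gb_per_hour)
    return fixed_overhead + transfer


# Hypothetical figures: daily backups that take 2 hours to complete,
# 2000 GB of data restored at 500 GB/hour with 1 hour of setup overhead.
rpo = worst_case_rpo(timedelta(hours=24), timedelta(hours=2))
rto = estimated_rto(2000, 500, timedelta(hours=1))

print(rpo)  # 1 day, 2:00:00
print(rto)  # 5:00:00
```

Under these assumptions, daily backups give a worst-case RPO of about 26 hours, so a tighter RPO requires more frequent backups or replication to a second site.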

A clear DR playbook is also required. See the DR playbook for more.