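Fault tolerance - The capability of a system to continue functioning after some part of it has failed, typically achieved through hardware redundancy.
High availability - keeping systems up and reachable, measured as an uptime percentage. A common target is five nines (99.999%), which allows roughly 5 minutes of downtime per year.
Load balancing - distributes incoming requests across multiple servers so that no single server is overwhelmed. Requests can be routed on criteria such as the client's location.
NIC teaming - Connecting multiple NICs in tandem so they act as one logical interface, which increases bandwidth in small increments (one NIC at a time) and provides failover if a NIC fails.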
Port aggregation - joining multiple network device ports together for increased bandwidth and redundancy.
Clustering - connecting systems together with the intent of delivering network services from the cluster to increase responsiveness and capacity. This solution also increases availability and redundancy.
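To make load balancing and fault tolerance a bit more concrete, here is a minimal sketch in Python. It uses simple round-robin selection rather than the location-based routing mentioned above, and skips any server that has been marked as failed. The class name, method names, and server names are hypothetical and chosen only for illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked as failed."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        # Simulates a failed health check for one server.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # At most one full lap of the ring is needed to find a healthy server.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


# Hypothetical three-server pool.
balancer = RoundRobinBalancer(["srv-a", "srv-b", "srv-c"])
first_pass = [balancer.next_server() for _ in range(3)]

# One server fails; requests keep flowing to the remaining two.
balancer.mark_down("srv-b")
after_failure = [balancer.next_server() for _ in range(4)]
```

The point of the sketch is that the service as a whole stays up when one member fails, which is the same idea behind clustering and redundant NICs or ports.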
Power management
Battery backups/UPS - An uninterruptible power supply (UPS) protects your computer in the event of a power sag or power outage. A UPS essentially contains a big battery that supplies AC power to your computer regardless of the power coming from the AC outlet. It is not meant to keep you working through a long outage; it provides enough runtime to save your work and shut down cleanly, or to bridge the gap until a generator takes over.
Power generators - provide power redundancy by supplying electricity if the utility power fails.
Dual power supplies - two power supply units in one device, so that the secondary supply keeps the device running if the primary fails.
Redundant circuits - specialized electrical hardware, such as rack-mounted AC distribution boxes, that supplies multiple dedicated AC circuits so that demanding setups do not depend on a single circuit.
Recovery
Cold sites (weeks/months to get back up and running)
Building is available, but you may not have any hardware or software in place or configured
You need to buy resources (or ship them in), and then configure/restore the network
Recovery is possible, but slow and time consuming
Warm sites (24 hrs - 7 days to get back up and running)
Building and equipment are available
Software may not be installed and the latest data is not available
Recovery is fairly quick, but not everything from the original site is available for employees
Hot sites (high cost)
Building, equipment, and data are available
Software and hardware are configured
Basically, people can just walk into the new facility and get back to work
Downtime is minimal, with a nearly identical service level maintained
Backups
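Full - a complete backup of all data; the safest and most comprehensive option, but also the most time consuming and costly
Differential - backs up only data changed since the last full backup - a restore needs the last full backup plus the latest differential
Incremental - backs up only data changed since the last backup of any type (full or incremental) - a restore needs the last full backup plus every incremental taken since
Snapshots - read-only copy of data frozen at a point in time (commonly used with VMs)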
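The difference between differential and incremental backups shows up most clearly at restore time. The sketch below works out which backups are needed for a full restore. It assumes each backup cycle uses one scheme only (all differentials or all incrementals after the full backup); the function name and the day labels are hypothetical.

```python
def restore_chain(backups):
    """Return the backups needed to restore to the most recent state.

    `backups` is a chronological list of (label, kind) tuples, where kind is
    "full", "differential", or "incremental".
    """
    full_positions = [i for i, (_, kind) in enumerate(backups) if kind == "full"]
    if not full_positions:
        raise ValueError("a restore needs at least one full backup")

    # Everything before the most recent full backup is irrelevant.
    start = full_positions[-1]
    full, later = backups[start], backups[start + 1:]
    if not later:
        return [full]

    kinds = {kind for _, kind in later}
    if kinds == {"differential"}:
        # Each differential holds all changes since the full backup,
        # so only the newest one is needed.
        return [full, later[-1]]
    if kinds == {"incremental"}:
        # Each incremental holds only changes since the previous backup,
        # so every one of them is needed, in order.
        return [full] + later
    raise ValueError("mixed differential/incremental cycles are not handled here")


differential_week = [("Sun", "full"), ("Mon", "differential"),
                     ("Tue", "differential"), ("Wed", "differential")]
incremental_week = [("Sun", "full"), ("Mon", "incremental"),
                    ("Tue", "incremental"), ("Wed", "incremental")]

diff_restore = restore_chain(differential_week)  # Sunday full + Wednesday differential
incr_restore = restore_chain(incremental_week)   # Sunday full + all three incrementals
```

The trade-off: incrementals are quicker to take and smaller to store, while differentials are quicker to restore because only two backup sets are involved.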
MTTR (mean time to repair) - measures the average time it takes to repair a network device
MTBF (mean time between failures) - measures the average time between failures of a network device
SLA requirements - the service quality, availability, and specific responsibilities agreed upon between the service provider and the service user.
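MTBF and MTTR can be combined into an availability figure, which is often the number an SLA commits to. The sketch below uses the common steady-state formula availability = MTBF / (MTBF + MTTR) and converts an availability target into a yearly downtime budget. The function names and the sample MTBF/MTTR values are hypothetical, and the year is taken as 365 days.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def availability(mtbf_hours, mttr_hours):
    """Fraction of time a device is up, given its MTBF and MTTR in hours."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(availability_fraction):
    """Yearly downtime allowed by a given availability target."""
    return (1 - availability_fraction) * MINUTES_PER_YEAR


# Five nines (99.999%) leaves a budget of about 5.26 minutes per year.
five_nines_budget = downtime_minutes_per_year(0.99999)

# Hypothetical device: fails on average every 9,999 hours, takes 1 hour to repair.
device_availability = availability(mtbf_hours=9_999, mttr_hours=1)  # four nines
device_downtime = downtime_minutes_per_year(device_availability)
```

Raising availability means either failing less often (higher MTBF, for example through redundancy) or recovering faster (lower MTTR, for example through hot sites and good backups).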