February 22, 2024

Think Twice: High-Stakes Change Control at PayPal

PayPal experienced 971 seconds (16:11) of ATB (availability to the business) downtime, leading to over 350,000 failed transactions and $400k in lost revenue. The root cause was determined to be a faulty software change, and inadequate rollback planning prolonged the outage. In response, the company launched a mandatory, recurring training program for software engineers called “Think Twice”. Below are my notes from that program.


Change Control

Failed changes are the largest contributor to downtime, with undocumented changes causing 10x the impact of documented changes.

A good change is:

safe - doesn’t break anything
effective - does what it is intended to do
follows policy - documented, has approvals & isolation procedures
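
Of the three properties, "follows policy" is the one that lends itself most readily to an automated check. Here is a minimal sketch of what such a gate might look like. The `ChangeRequest` record, its field names, and the `policy_gaps` helper are all hypothetical and not anything PayPal described in the training. I also included a rollback plan as a required item, which is my own assumption based on the outage described above.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """Hypothetical change record; field names are illustrative only."""
    description: str = ""
    rollback_plan: str = ""
    isolation_procedure: str = ""
    approvers: list[str] = field(default_factory=list)


def policy_gaps(change: ChangeRequest) -> list[str]:
    """Return the policy items this change is still missing."""
    gaps = []
    if not change.description.strip():
        gaps.append("documentation")
    if not change.approvers:
        gaps.append("approval")
    if not change.isolation_procedure.strip():
        gaps.append("isolation procedure")
    # Assumption: a rollback plan is treated as part of the policy.
    if not change.rollback_plan.strip():
        gaps.append("rollback plan")
    return gaps


if __name__ == "__main__":
    draft = ChangeRequest(description="Raise connection pool size")
    missing = policy_gaps(draft)
    if missing:
        print("Blocked, missing: " + ", ".join(missing))
    else:
        print("Change meets policy")
```

A check like this only covers the paperwork. Whether a change is safe and effective still depends on the review and validation steps.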

Keys needed for successful change control

Change Approver

This is the person who reviews all of the actual code changes & accompanying documentation
They are ultimately on the hook for any issues that occur during or result from the change

Change Planning & Execution

Change Validation
