Out of Tune: Fine-Tuning Foundation Models Leads to Unpredictable Safety Drift

In addition to CDT, this report was authored by Emaan Bilal and Dylan Hadfield-Menell of the Algorithmic Alignment Group at the Massachusetts Institute of Technology (MIT).

General-purpose AI models are increasingly adapted by downstream developers for specialized uses, fro...
