AI Policy & Governance, CDT AI Governance Lab

Out of Tune: Fine-Tuning Foundation Models Leads to Unpredictable Safety Drift

In addition to CDT, this report was authored by Emaan Bilal and Dylan Hadfield-Menell of the Algorithmic Alignment Group at the Massachusetts Institute of Technology (MIT).

General-purpose AI models are increasingly adapted by downstream developers for specialized uses — fro...