AI Policy & Governance, CDT AI Governance Lab
Out of Tune: Fine-Tuning Foundation Models Leads to Unpredictable Safety Drift
This report was co-authored by CDT together with Emaan Bilal and Dylan Hadfield-Menell of the Algorithmic Alignment Group at the Massachusetts Institute of Technology (MIT).
General-purpose AI models are increasingly adapted by downstream developers for specialized uses — fro...
