Running AI evaluations in production is not a one-time script. It is a continuous operational discipline: scheduling compute-intensive jobs, managing concurrency, tracking experiments across model versions, enforcing resource quotas on workloads that compete for scarce accelerators, and surviving cluster restarts without losing state.
If you have operated production machine l...
