OpenAI: support for Reinforcement Fine-tuning available to verified orgs
by justanotheratom on 5/8/2025, 9:03:00 PM
My question for anyone who knows:
Between SFT, DPO, and RFT: when should you use which? And can we mix and match, e.g., first SFT, then DPO?