Hacker News Clone

OpenAI: support for Reinforcement Fine-tuning available to verified orgs

by justanotheratom on 5/8/2025, 9:00:31 PM with 1 comment

by justanotheratom on 5/8/2025, 9:03:00 PM
My question for anyone who knows:
Between SFT, DPO, and RFT, when should you use which? And can we mix and match, e.g., first SFT, then DPO?