Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment