RLHF
ai-ml
Reinforcement Learning from Human Feedback, a training technique used to align language models with human preferences
Pronunciation
Correct
R-L-H-F
/ɑːr ɛl eɪtʃ ɛf/
Spelled out letter by letter as an initialism. There is no accepted word-form pronunciation.
Source: arxiv.org (official spec)