A model-assisted safety pipeline in GPT-4
GPT-4 uses a model-assisted safety pipeline to fine-tune the model's behavior, as in previous GPT models (GPT-3 and GPT-3.5). The pipeline applies reinforcement learning from human feedback (RLHF) to steer the model toward better responses than before. Nevertheless, even with RLHF, GPT-4 can still exhibit unwanted or unsafe behaviors.
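To make the RLHF idea concrete, here is a minimal toy sketch (not OpenAI's implementation): a softmax "policy" chooses among a few canned responses, a stand-in reward function plays the role of a reward model trained on human preference labels, and REINFORCE-style updates shift the policy toward higher-reward outputs. All names and reward values here are illustrative assumptions.

```python
import math
import random

# Hypothetical canned responses and reward-model scores (assumptions);
# in real RLHF the rewards come from a model trained on human preferences.
responses = ["refuses unsafe request", "answers helpfully", "answers unsafely"]
rewards = {
    "refuses unsafe request": 0.5,
    "answers helpfully": 1.0,
    "answers unsafely": -1.0,
}

# Policy parameters: one logit per response, softmax gives probabilities.
logits = [0.0, 0.0, 0.0]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def step(lr=0.1):
    """Sample a response, score it, and do one REINFORCE update."""
    probs = softmax(logits)
    i = random.choices(range(len(responses)), weights=probs)[0]
    r = rewards[responses[i]]
    # Gradient of log pi(i) w.r.t. logits is one_hot(i) - probs.
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += lr * r * grad

random.seed(0)
for _ in range(500):
    step()

probs = softmax(logits)
best = responses[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

After training, the policy concentrates probability on the highest-reward response and suppresses the unsafe one, which is the same pressure RLHF applies at scale, though real systems optimize a full language model with far richer reward signals and constraints.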
2023.03.20