A model-assisted safety pipeline with GPT-4


GPT-4 uses a model-assisted safety pipeline.
As with previous GPT models (GPT-3.5, GPT-3), this pipeline is used to fine-tune the model's behavior.

It uses reinforcement learning from human feedback (RLHF) to fine-tune the model's behavior, resulting in better responses than before.
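As a rough illustration, the sketch below mimics that RLHF loop in miniature: preference comparisons are collected from a (simulated) labeler, a toy reward model is built from them, and the reward is then used to pick better completions. Everything here is a hypothetical stand-in, not OpenAI's implementation: `toy_policy`, `human_preference`, and `reward_model` are toy functions, and best-of-n selection stands in for an actual reinforcement-learning policy update.

```python
# Minimal, illustrative RLHF sketch; all names and rules are hypothetical stand-ins.
import random

def toy_policy(prompt):
    """Stand-in for the language model: returns one of a few canned completions."""
    completions = [
        "Here is a careful, helpful answer.",
        "I refuse to answer anything.",
        "Here is how to commit a crime.",
    ]
    return random.choice(completions)

def human_preference(completion_a, completion_b):
    """Stand-in for a human labeler comparing two completions (toy keyword rule)."""
    def score(c):
        if "crime" in c:
            return 0          # unsafe
        if "refuse" in c:
            return 1          # over-cautious
        return 2              # helpful and safe
    return completion_a if score(completion_a) >= score(completion_b) else completion_b

def reward_model(completion, preferred_counts):
    """Toy reward model: reward is how often labelers preferred this completion."""
    return preferred_counts.get(completion, 0)

# 1) Collect preference comparisons from the simulated labeler.
prompt = "How do I get rich quickly?"
preferred_counts = {}
for _ in range(100):
    a, b = toy_policy(prompt), toy_policy(prompt)
    winner = human_preference(a, b)
    preferred_counts[winner] = preferred_counts.get(winner, 0) + 1

# 2) Use the reward model to pick the highest-reward completion (best-of-n),
#    a simplified stand-in for updating the policy with reinforcement learning.
samples = [toy_policy(prompt) for _ in range(8)]
best = max(samples, key=lambda c: reward_model(c, preferred_counts))
print(best)
```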

Nevertheless, even with reinforcement learning from human feedback (RLHF), GPT-4 can still be vulnerable to unsafe inputs.
It may still exhibit unwanted behavior on both safe and unsafe inputs.

 

Bing Chat, which runs on GPT-4, already uses such 'rule-based reward models'.


This unwanted behavior can arise when the instructions given to labelers are underspecified during the reward-model data-collection part of the RLHF pipeline.

Given unsafe inputs, the model may then generate undesirable content, for example, advice about committing a crime, and users could be unwittingly exposed to it.

Models can also become overly cautious about safe inputs, rejecting harmless requests or over-hedging. 
OpenAI's approach to improving GPT-4's safety consists of two main components.


It consists of an additional set of safety-relevant RLHF training prompts and rule-based reward models (RBRMs).
RBRMs have some limitations, but they can lead to safer and more socially appropriate conversations with AI-enabled chatbots.
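The sketch below shows the basic RBRM idea under simplifying assumptions: a rubric is applied to the model's reply to a (possibly unsafe) prompt, the reply is sorted into a category, and the category is mapped to a reward that can be added during RLHF fine-tuning. The categories, keyword rules, and reward values are illustrative assumptions; a real system would use a classifier-based grader rather than keyword matching.

```python
# Minimal rule-based reward model (RBRM) sketch; rubric and rewards are assumptions.
from dataclasses import dataclass

@dataclass
class RBRMResult:
    category: str
    reward: float

# Hypothetical rubric: keyword checks stand in for a real classifier-based grader.
DISALLOWED_MARKERS = ["step 1: obtain", "how to make a weapon"]
REFUSAL_MARKERS = ["i can't help with that", "i cannot assist"]

def rule_based_reward(prompt_is_unsafe: bool, reply: str) -> RBRMResult:
    text = reply.lower()
    contains_disallowed = any(m in text for m in DISALLOWED_MARKERS)
    is_refusal = any(m in text for m in REFUSAL_MARKERS)

    if prompt_is_unsafe and is_refusal and not contains_disallowed:
        return RBRMResult("refusal_in_desired_style", reward=+1.0)  # reward refusing harmful requests
    if prompt_is_unsafe and contains_disallowed:
        return RBRMResult("disallowed_content", reward=-1.0)        # penalize unsafe completions
    if not prompt_is_unsafe and is_refusal:
        return RBRMResult("over_refusal", reward=-0.5)              # penalize refusing harmless requests
    return RBRMResult("compliant", reward=+0.5)                     # reward normal helpful answers

# Example: the same refusal is rewarded on an unsafe prompt but penalized on a safe one.
print(rule_based_reward(True, "I can't help with that request."))
print(rule_based_reward(False, "I can't help with that request."))
```

Note how this addresses both failure modes mentioned earlier: refusing an unsafe request is rewarded, while refusing a harmless one (over-caution) is penalized.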
