A model-assisted safety pipeline with GPT-4


GPT-4 uses a model-assisted safety pipeline.
As with previous GPT models (GPT-3.5, GPT-3), this pipeline is used to fine-tune the model's behavior.

It uses reinforcement learning from human feedback (RLHF) to fine-tune the model's behavior, resulting in better responses than before.
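As a rough illustration, the sketch below mimics that RLHF loop in miniature: preference comparisons are collected from a (simulated) labeler, a toy reward model is built from them, and the reward is then used to pick better completions. Everything here is a hypothetical stand-in, not OpenAI's implementation: `toy_policy`, `human_preference`, and `reward_model` are toy functions, and best-of-n selection stands in for an actual reinforcement-learning policy update.

```python
# Minimal, illustrative RLHF sketch; all names and rules are hypothetical stand-ins.
import random

def toy_policy(prompt):
    """Stand-in for the language model: returns one of a few canned completions."""
    completions = [
        "Here is a careful, helpful answer.",
        "I refuse to answer anything.",
        "Here is how to commit a crime.",
    ]
    return random.choice(completions)

def human_preference(completion_a, completion_b):
    """Stand-in for a human labeler comparing two completions (toy keyword rule)."""
    def score(c):
        if "crime" in c:
            return 0          # unsafe
        if "refuse" in c:
            return 1          # over-cautious
        return 2              # helpful and safe
    return completion_a if score(completion_a) >= score(completion_b) else completion_b

def reward_model(completion, preferred_counts):
    """Toy reward model: reward is how often labelers preferred this completion."""
    return preferred_counts.get(completion, 0)

# 1) Collect preference comparisons from the simulated labeler.
prompt = "How do I get rich quickly?"
preferred_counts = {}
for _ in range(100):
    a, b = toy_policy(prompt), toy_policy(prompt)
    winner = human_preference(a, b)
    preferred_counts[winner] = preferred_counts.get(winner, 0) + 1

# 2) Use the reward model to pick the highest-reward completion (best-of-n),
#    a simplified stand-in for updating the policy with reinforcement learning.
samples = [toy_policy(prompt) for _ in range(8)]
best = max(samples, key=lambda c: reward_model(c, preferred_counts))
print(best)
```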

Nevertheless, even with reinforcement learning from human feedback (RLHF), GPT-4 can still be vulnerable to unsafe inputs.
It may still exhibit unwanted behavior on both safe and unsafe inputs.

 

Bing Chat, which runs on GPT-4, already uses such 'rule-based reward models'.


This unwanted behavior can arise when the instructions given to labelers are underspecified during the reward-model data-collection part of the RLHF pipeline.

Given unsafe inputs, the model may then generate undesirable content, for example, advice about committing a crime, and users could be unwittingly exposed to it.

Models can also become overly cautious about safe inputs, rejecting harmless requests or over-hedging. 
OpenAI's approach to improving GPT-4's safety consists of two main components.


It consists of an additional set of safety-relevant RLHF training prompts and rule-based reward models (RBRMs).
RBRMs have some limitations, but they can lead to safer and more socially appropriate conversations with AI-enabled chatbots.
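The sketch below shows the basic RBRM idea under simplifying assumptions: a rubric is applied to the model's reply to a (possibly unsafe) prompt, the reply is sorted into a category, and the category is mapped to a reward that can be added during RLHF fine-tuning. The categories, keyword rules, and reward values are illustrative assumptions; a real system would use a classifier-based grader rather than keyword matching.

```python
# Minimal rule-based reward model (RBRM) sketch; rubric and rewards are assumptions.
from dataclasses import dataclass

@dataclass
class RBRMResult:
    category: str
    reward: float

# Hypothetical rubric: keyword checks stand in for a real classifier-based grader.
DISALLOWED_MARKERS = ["step 1: obtain", "how to make a weapon"]
REFUSAL_MARKERS = ["i can't help with that", "i cannot assist"]

def rule_based_reward(prompt_is_unsafe: bool, reply: str) -> RBRMResult:
    text = reply.lower()
    contains_disallowed = any(m in text for m in DISALLOWED_MARKERS)
    is_refusal = any(m in text for m in REFUSAL_MARKERS)

    if prompt_is_unsafe and is_refusal and not contains_disallowed:
        return RBRMResult("refusal_in_desired_style", reward=+1.0)  # reward refusing harmful requests
    if prompt_is_unsafe and contains_disallowed:
        return RBRMResult("disallowed_content", reward=-1.0)        # penalize unsafe completions
    if not prompt_is_unsafe and is_refusal:
        return RBRMResult("over_refusal", reward=-0.5)              # penalize refusing harmless requests
    return RBRMResult("compliant", reward=+0.5)                     # reward normal helpful answers

# Example: the same refusal is rewarded on an unsafe prompt but penalized on a safe one.
print(rule_based_reward(True, "I can't help with that request."))
print(rule_based_reward(False, "I can't help with that request."))
```

Note how this addresses both failure modes mentioned earlier: refusing an unsafe request is rewarded, while refusing a harmless one (over-caution) is penalized.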
