Hacker News
Refusal in Language Models Is Mediated by a Single Direction (arxiv.org)
117 points by fagnerbrack 4 days ago | 45 comments