Comment on Anthropic says some Claude models can now end ‘harmful or abusive’ conversations

LodeMike@lemmy.today 1 week ago

I guarantee you it’s not the model doing that. Maybe it’s a secondary model trained to detect that kind of content, but not the one that’s just generating tokens.
