The dark deep side of DeepSeek: Fine-tuning attacks against the safety alignment of CoT-enabled models.
Submitted 1 year ago by Cat@ponder.cat to technology@lemmy.zip
https://arxiv.org/abs/2502.01225