It was a nice sleight of hand on their part. There's a lot of misleading information about all of it, since they only released pre-training details for the DeepSeek-V3 model, not for DeepSeek-R1. But the media reported on it as if the two were one and the same, without any distinction.
Based on reports, the parent company had access to more GPUs than the number they reported using. Hard to tell if those were actually utilized, though.
rumba@lemmy.zip 1 week ago
Yeah, whatever the case, they were all trained on data from the public. The very least they can do is make the models available to the public.