Comment on Meta asks the US government to block OpenAI’s switch to a for-profit

brucethemoose@lemmy.world 1 week ago

A small startup called Arcee AI actually “distilled” logits from several other models (Llama, Mistral) and used that data to continue training Qwen 2.5 14B (which is itself Apache 2.0). The result, SuperNova Medius, is quite incredible for a 14B model… SOTA as far as I know, even with their meager GPU resources.
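
For the curious, here’s a minimal sketch of what the logit-distillation loss looks like in PyTorch, assuming teacher and student share a vocabulary (Arcee’s real pipeline has to align different tokenizers, which is harder; this is just the core idea, not their code):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # Soften both distributions with a temperature, then minimize KL
    # divergence so the student matches the teacher's full output
    # distribution rather than just its top token.
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * t * t
```

You’d mix this with the usual next-token cross-entropy loss during continued training, so the student learns from both the teacher’s soft targets and the raw text.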

A company called Upstage “expands” models to larger parameter counts, then continues training them. Look up the SOLAR series.
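
The rough idea of that depth up-scaling trick, sketched with Hugging Face transformers: stitch together two offset copies of the base model’s layer stack, then continue pretraining. The base model and the 8-layer overlap here are illustrative assumptions, not necessarily Upstage’s exact recipe:

```python
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
layers = model.model.layers        # 32 decoder layers in this base model
n, overlap = len(layers), 8        # drop `overlap` layers from each copy

# First copy keeps the bottom (n - overlap) layers, second copy keeps the
# top (n - overlap) layers; concatenating them gives 2n - 2*overlap layers
# (48 here), with the middle layers duplicated.
bottom = [layers[i] for i in range(n - overlap)]
top = [copy.deepcopy(layers[i]) for i in range(overlap, n)]

model.model.layers = nn.ModuleList(bottom + top)
model.config.num_hidden_layers = len(model.model.layers)

# Keep per-layer KV-cache indices consistent after restitching (attribute
# present in recent transformers versions).
for i, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = i

# The expanded model now goes through continued pretraining to heal the seam.
```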

And quite notably, Nvidia continued training Llama 3.1 70B and published the weights as Nemotron 70B. It was the best 70B model for a while, and may still be in some areas.

And some companies, like Cohere, continually train the same model and offer it over API, occasionally publishing the weights as promotion.
