Comment on Nvidia unveils new GPU designed for long-context inference

brucethemoose@lemmy.world 2 days ago

Jamba (hybrid transformer/state-space) is a killer model folks are sleeping on. It's actually coherent at long context, fast, has good world knowledge, stays even and grounded, and is good at RAG. It's like a straight-up better Cohere model IMO, and a no-brainer to try for many long-context calls.
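For anyone who wants to try it, here's roughly what a long-context Jamba call looks like through Hugging Face transformers. A minimal sketch only: the model id is from memory (check the ai21labs org on HF for current checkpoints), and the input file path is just a placeholder.

```python
# Minimal sketch: long-context generation with Jamba via transformers.
# Model id is an assumption -- verify against the ai21labs org on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/AI21-Jamba-1.5-Mini"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers pick bf16/fp16 where supported
    device_map="auto",    # spread layers across available GPUs
)

# Stuff a long document into the prompt -- the SSM layers keep memory
# roughly linear in context length, unlike pure attention.
long_doc = open("report.txt").read()  # hypothetical input file
prompt = f"Summarize the following document:\n\n{long_doc}\n\nSummary:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```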

TBH I didn't try Falcon H1 much after it seemed to break at long context for me. I think most folks (at least publicly) are sleeping on hybrid SSMs because support in llama.cpp is not great. For instance, context caching does not work.

…Not sure about the other hybrids; toy models aside, there really aren't too many to try.
