brucethemoose
@brucethemoose@lemmy.world
- Comment on Nvidia unveils new GPU designed for long-context inference 2 days ago:
Jamba (hybrid transformer/state-space) is a killer model folks are sleeping on. It's actually coherent at long context, fast, has good world knowledge, stays even-keeled/grounded, and is good at RAG. It's like a straight-up better Cohere model IMO, and a no-brainer to try for many long-context calls.
TBH I didn't try Falcon H1 much after it seemed to break at long context for me. I think most folks (at least publicly) are sleeping on hybrid SSMs because support in llama.cpp is not great. For instance, context caching does not work.
…Not sure about many others, toy models aside. There really aren’t too many to try.
- Comment on Nvidia unveils new GPU designed for long-context inference 2 days ago:
Doubling down on flash attention (my interpretation of this) is quite risky, as there are more efficient attention mechanisms seeping into bigger and bigger models.
DeepSeek's MLA is a start. Jamba is already doing hybrid GQA/Mamba attention, and a Qwen3 update is rumored to be using something exotic as well.
In plain English: they're selling the idea that the software architecture won't change much, when that doesn't seem to be the case.
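To make the "more efficient attention" point concrete, here's a toy grouped-query attention pass in Python (PyTorch, made-up shapes, not any particular model's code): several query heads share one KV head, so the KV cache, the thing long-context hardware is sized around, shrinks by n_q_heads / n_kv_heads.

```python
import torch
import torch.nn.functional as F

tokens, n_q_heads, n_kv_heads, head_dim = 16, 8, 2, 32
group = n_q_heads // n_kv_heads  # 4 query heads per KV head

q = torch.randn(n_q_heads, tokens, head_dim)
k = torch.randn(n_kv_heads, tokens, head_dim)  # KV cache is 4x smaller
v = torch.randn(n_kv_heads, tokens, head_dim)

# broadcast each KV head to its group of query heads, then attend as usual
k = k.repeat_interleave(group, dim=0)
v = v.repeat_interleave(group, dim=0)
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([8, 16, 32])
```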
- Comment on Bethesda planning a Starfield space gameplay revamp to make it more rewarding 2 weeks ago:
Oh you must mod the stink out of FO4.
Is there even much of a Starfield modding scene?
- Comment on Zuckerberg's Huge AI Push Is Already Crumbling Into Chaos 3 weeks ago:
But they are putting the cart before the horse. These APIs and models are unsexy commodities, and Meta doesn't have anything close to something they can charge for. Even OpenAI and Anthropic can barely justify it these days.
Others building on top of Llama get them there, though. Which all the Chinese companies recognize now and are emulating: open the model, let it snowball with communal development to wipe out closed competitors, then offer products on top of it.
What's conspicuous is that (at least some people) at Meta recognized this. But Zuck is so fickle he won't stick with any good idea.
- Comment on Zuckerberg's Huge AI Push Is Already Crumbling Into Chaos 3 weeks ago:
Yeah, the article is pretty bad… But the missing context is that Zuckerberg let a lot of devs go, and the lab that actually built something neat (Llama 1-4) has all but been dismantled.
The new hires reek of tech-bro culture and big egos butting heads, especially the (alleged) talk of closed-sourcing their next models. 'TBD Lab' is supposedly tasked with the next Llama release, but I am not holding my breath.
- Comment on Civilization 7's latest update has "hit mods harder than usual", but for a good reason 3 weeks ago:
Aside:
“We wanted to acknowledge that this update hit mods harder than usual,” community manager Sarah Engel wrote on the game’s Discord.
I despise Discord. Every single niche I love is now locked behind a bunch of unsearchable banter in closed, often invite-only apps.
- Comment on Civilization 7's latest update has "hit mods harder than usual", but for a good reason 3 weeks ago:
Yeah, I think early access is a great model. Certainly better than "release it and (maybe) fix it later."
- Comment on AI experts return from China stunned: The U.S. grid is so weak, the race may already be over 3 weeks ago:
I mean, I'm a local AI evangelist and have made a living off it. The "energy use of AI" panic is total nonsense, as much as Lemmy doesn't like to hear it.
I keep a 32B or 49B loaded pretty much all the time.
You are right about the theft vs. social media thing too, even if you put it a little abrasively. Why people are so worked up about AI while machines like Facebook and Google grind away is mind-boggling.
…But AI is a freaking bubble, too.
Look at company valuations vs how shit isn’t working, and how much it costs.
Look around the ML research community. They all know Altman and his infinite-scaling-to-AGI pitch is just a big fat tech bro lie. AI is going to move forward as a useful tool by getting smaller and more efficient, but transformer LLMs with randomized sampling are not just going to turn into real artificial intelligence if enough investors throw money at these closed-off enterprises.
- Comment on AI experts return from China stunned: The U.S. grid is so weak, the race may already be over 3 weeks ago:
The irony is Zuck shuttered the absolute best asset they have: the Llama LLM team.
Cuz, you know, he’s a fickle coward who would say and do anything to hide his insecurity.
- Comment on Tencent doesn’t care if it can buy American GPUs again – it already has all the chips it needs 3 weeks ago:
I think the underlying message is making/serving AI isn’t a mythical goldmine: it’s becoming a dirt cheap commodity, and a tool for companies to use.
- Comment on AI experts return from China stunned: The U.S. grid is so weak, the race may already be over 3 weeks ago:
This is all based on the assumption that AI will need exponential power.

It will not.

- AI is a bubble.
- Even if it isn't, fab capacity is limited.
- The actual 'AI' market is racing to the bottom with smaller, task-focused models.
- A bunch of reproduced papers (like BitNet) that drastically cut power draw are just waiting for someone to try a larger test (see the sketch below).
- Altogether… inference moves to smartphones and PCs.

This is just the finance crowd parroting Altman. Not that the US doesn't need a better energy grid like China's, but the justification is built on claims that just aren't going to pan out.
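For the curious, here's roughly what the BitNet b1.58 trick looks like (a hedged sketch of the paper's "absmean" quantization, written from memory): weights collapse to {-1, 0, +1} plus one scale, so the big matmuls turn into adds and subtracts.

```python
import torch

def absmean_ternary(w: torch.Tensor):
    # BitNet b1.58-style quantization: scale by the mean |weight|,
    # then round and clip into the ternary set {-1, 0, +1}
    scale = w.abs().mean().clamp(min=1e-5)
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale

w = torch.randn(256, 256)
w_q, scale = absmean_ternary(w)
print(w_q.unique())                    # tensor([-1., 0., 1.])
print((w - w_q * scale).abs().mean())  # average quantization error
```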
- Comment on The New Yorker Asks: Is the A.I. Boom Turning Into an A.I. Bubble? 4 weeks ago:
because there’s seemingly not enough power infrastructure
This is overblown. I mean, even if you estimate TSMC's entire capacity and assume every data center GPU they make runs at full TDP 100% of the time (which is not true), the net consumption isn't that high. The local power/cooling infrastructure stories are more about corpo cost-cutting.
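Back-of-envelope version, with loudly made-up inputs (my own ballpark shipment and TDP figures, not sourced data):

```python
gpus_per_year = 5_000_000   # assumed annual data-center GPU output
tdp_watts     = 1_000       # assumed worst-case draw per GPU, running 24/7
us_avg_load_w = 470e9       # ~470 GW average US electric load (rough figure)

ai_draw_w = gpus_per_year * tdp_watts  # 5 GW even if every GPU runs flat out
print(f"{ai_draw_w / 1e9:.1f} GW = {100 * ai_draw_w / us_avg_load_w:.1f}% of US load")
```

Even with maximally pessimistic inputs, one year of output lands around 1% of average US load.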
Altman’s preaching that power use will be exponential is a lie that’s already crumbling.
But there is absolutely precedent for underused hardware flooding the used market, or getting cheap on cloud providers. Honestly this would be incredible for the local inference community, as it would give tinkerers (like me) actually affordable hardware to experiment with.
- Comment on Please don't promote Wayland 4 weeks ago:
Yeah, not to speak of stuff that doesn't work (or doesn't work well) in X, and its bizarre quirks.
- Comment on The New Yorker Asks: Is the A.I. Boom Turning Into an A.I. Bubble? 4 weeks ago:
Not a lot? The quirk is they've hyper-specialized nodes around AI.
The GPU boxes are useful for some other things, but they will be massively oversupplied, and they mostly aren’t networked like supercomputer clusters.
- Comment on The New Yorker Asks: Is the A.I. Boom Turning Into an A.I. Bubble? 4 weeks ago:
I mean, hardware prices will fall if there’s a crash, like they did with crypto GPU mining.
I am salivating over this. Bring out the firesale A100s.
- Comment on The New Yorker Asks: Is the A.I. Boom Turning Into an A.I. Bubble? 4 weeks ago:
Ohhh yes. Altman's promotion for it was the Death Star coming up from behind a planet.
Maybe something on the corporate side, like big players not seeing a return on their investment.
Ohhh, it is. The big corporate hosts aren't making much money and are burning cash, and it's not getting any better as specialized open models eat them from the bottom up.
- Comment on GPT-5: Overdue, overhyped and underwhelming. And that’s not the worst of it. 4 weeks ago:
Nah, I tried them. For the size, they suck, mostly because there's a high chance they will randomly refuse anything you ask unless it's STEM or code.
…And there are better models if all you need is STEM and code.
- Comment on GPT-5: Overdue, overhyped and underwhelming. And that’s not the worst of it. 4 weeks ago:
Meanwhile, the Chinese and other open models are killing it. GLM 4.5 is sick. Jamba 1.7 is a great sleeper model for stuff outside coding and STEM. The 32Bs we have like EXAONE and Qwen3 (and finetuned experiments) are mad for 20GB files, and are crowding out APIs. There are great little MCP models like Jan too.
Are they AGI? Of course not. They are tools, and that’s what was promised; but the improvements are real.
- Comment on The Media's Pivot to AI Is Not Real and Not Going to Work 1 month ago:
, especially since something like a Mixture of Experts model could be split down to base models and loaded/unloaded as necessary.
It doesn't work that way. All MoE experts are 'interleaved' and you need all of them loaded at once, for every token. Some API servers can hot-swap whole models, but it's not fast, and it's rarely done since LLMs are pretty 'generalized' and tend to serve requests in parallel on API servers.
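A toy sketch of why (generic Python/PyTorch, not any real model's code): the router picks different experts per token, per layer, so you can never know ahead of time which experts are safe to unload.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # per-token expert choice
        out = torch.zeros_like(x)
        for t in range(x.size(0)):
            # every token can hit a different pair of experts, so all
            # n_experts weight matrices must stay resident in memory
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[int(e)](x[t])
        return out

print(TinyMoE()(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```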
The closest to what you're thinking of is LoRAX, which basically hot-swaps LoRAs efficiently. But it needs an extremely specialized runtime derived from its associated paper, so people tend not to use it, since it also lacks quantization and some other features: github.com/predibase/lorax
There is a good case for pure data processing, yeah… But it has little integration with LLMs themselves, especially with the API servers generally handling tokenizers/prompt formatting.
But, all of its components need to be localized
They already are! Local LLM tooling and engines are great and super powerful compared to ChatGPT (which offers no caching, no raw completion, primitive sampling, hidden thinking, and so on).
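As a taste, here's hitting llama.cpp's llama-server raw completion endpoint (endpoint and fields per the llama.cpp README; the port and prompt are just examples):

```python
import requests

resp = requests.post("http://localhost:8080/completion", json={
    "prompt": "The nice thing about local inference is",
    "n_predict": 64,       # raw completion, no chat template forced on you
    "temperature": 0.7,    # full sampler control...
    "min_p": 0.05,         # ...including samplers hosted APIs rarely expose
    "cache_prompt": True,  # reuse the KV cache across calls
})
print(resp.json()["content"])
```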
- Comment on The Media's Pivot to AI Is Not Real and Not Going to Work 1 month ago:
SGLang is partially a scripting language for prompt building that leverages its caching/logprobs output, for doing stuff like filling in fields or branching choices, so it's probably best done in that. It also requires pretty beefy hardware for the model size (as opposed to backends like exllama or llama.cpp that focus more on tight quantization and unbatched performance), so I suppose there's not a lot of interest from local tinkerers?
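For flavor, a hedged sketch of what an SGLang frontend script looks like (API as I remember it from the SGLang docs; the endpoint URL and field names are made up for illustration):

```python
import sglang as sgl

@sgl.function
def review(s, product):
    s += f"Write a one-line review of {product}: "
    s += sgl.gen("review", max_tokens=32)                # fill in a field
    s += "\nVerdict: "
    s += sgl.select("verdict", choices=["buy", "skip"])  # constrained branch

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = review.run(product="a mechanical keyboard")
print(state["review"], state["verdict"])
```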
It would be cool, I guess, but ComfyUI does feel more geared for diffusion. Image/video generation is more multi-model and benefits from dynamically loading/unloading/swapping all sorts of little submodels, LoRAs and masks, applying them, piping them into each other and such.
LLM running is more monolithic: you have one big model, maybe a text-embeddings model as part of the same server, and everything else is just processing strings to build the prompts, which one does linearly in Python or whatever. Stuff like CFG and LoRAs does exist, but isn't used much.
- Comment on The Media's Pivot to AI Is Not Real and Not Going to Work 1 month ago:
Not specifically. Ultimately, ComfyUI would build prompts/API calls, which I tend to do in Python scripts.
I tend to use Mikupad or Open Web UI for more general testing.
There are some neat tools with 'lower level' integration into LLM engines, like SGLang (which leverages caching and constrained decoding), to do things one can't do over standard APIs: docs.sglang.ai/frontend/frontend.html
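The constrained decoding bit is the fun part: sgl.gen can take a regex, and the engine masks logits so the output must match it (the regex and names here are mine, just for illustration):

```python
import sglang as sgl

@sgl.function
def extract_year(s, text):
    s += f"Text: {text}\nThe year mentioned, as four digits: "
    s += sgl.gen("year", regex=r"[0-9]{4}")  # output is forced to match
```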
- Comment on The Media's Pivot to AI Is Not Real and Not Going to Work 1 month ago:
I mean, I run Nemotron and Qwen every day, you are preaching to the choir here :P
- Comment on The Media's Pivot to AI Is Not Real and Not Going to Work 1 month ago:
AI is a tool (sorry)
This should be a bumper sticker. Also, thanks for this, bookmarking 404, wish I had the means to subscribe.
My hope is that the “AI” craze culminates in a race to the bottom where we end up in a less terrible state: local models on people’s phones, reaching out to reputable websites for queries and redirection.
And this would be way better for places like 404, as they’d have to grab traffic individually and redirect users there.
- Comment on Confirmed - China bans NVIDIA chips and accelerates its total independence from US technology 2 months ago:
Yeah honestly the Nvidia ban was stupid.
Everyone in the AI research space was saying it, but no, our old policymakers are captured by Altman, Musk, and tech bros who would burn anything for a couple years of anticompetitive advantage.
The running joke is that the Nvidia ban was the best thing to ever happen to Chinese research, as it made them thrifty, while big US companies are lazily burning huge GPU farms scaling up and… not improving anything.
- Comment on X's new 'encrypted' XChat feature seems no more secure than the failure that came before it 3 months ago:
I have to wonder who this appeals to?
Most are already trapped in something established like Discord, WeChat, FB Messenger. As said, security isn’t a strong point, and there’s no engagement angle.
I guess if you already spend tons of time on X it’s kinda convenient?
- Comment on EA never grasped Dragon Age's value as an RPG, says Inquisition writer 3 months ago:
Side note, but even with all their troubles/turnover, I still love RPS’s hint of bite in their news writing (outside the columns).
- Comment on Tiny Corp heralds world's first AMD GPU driven via USB3 — eGPUs tested on Apple Silicon, with Linux and Windows also supported 3 months ago:
Tinycorp generates these headlines every once in a while, but as far as I can tell no one uses it. At least not in the tinkerer spaces I can see.
It'd be cool if they could eat away at PyTorch, XLA and whatever else… Some day…
- Comment on Researchers unveil LegoGPT, an AI model that designs physically stable Lego structures from text prompts and currently supports eight standard brick types 3 months ago:
A pretty long time.
Niche models are tons of fun though.
- Comment on Consumers make their voices heard as Microsoft's huge venture flatlines in popularity 4 months ago:
Comment from the source:
Microsoft poisoned their own well with all the changes they have been forcing on users lately. The update nagging, resetting the default browser to Edge, the ads in Windows features, and integrating Bing into the start menu have all trained users that when Microsoft starts pushing something new, it probably isn't great and should just be ignored, like ads in phone apps.
That ^. So much that.
Also, the Copilot LLM itself sucks. Local models are neat within their limitations, and they'd be even better if Microsoft made them trainable/customizable, did better RAG, or whatever, but they just shoved a bad thing down users' throats, and now they've poisoned another well.
- Comment on Microsoft removes Windows 11 24H2 official support on 8th 9th 10th Gen Intel CPUs 6 months ago:
As crazy as that is, playing devil’s advocate, Comet Lake is basically the aging Skylake architecture.
Ice Lake though? WTF.