brucethemoose
@brucethemoose@lemmy.world
- Comment on Opera wants you to pay $19.90 per month for its new AI browser 4 days ago:
They probably don’t know.
- Comment on Opera wants you to pay $19.90 per month for its new AI browser 4 days ago:
Yep.
Vivaldi is basically the real Opera now, including some of its devs IIRC.
- Comment on DeepSeek-V3.2 released 1 week ago:
…Or are they an LLM? I mean, the handle is BroBot, and the emoji makes me suspicious, lol.
- Comment on DeepSeek-V3.2 released 1 week ago:
Deepseek is only bad via the chat app, and whatever prefilter (or finetune?) they censor it with.
The model itself (via API or run locally) isn’t too bad. Obviously there are CCP-mandated gaps, but it’s not as tankie as you’d think.
- Comment on DeepSeek-V3.2 released 1 week ago:
With sparse attention, very interesting. It seems GQA is a thing of the past.
GLM 4.6 is reportedly about to drop too.
- Comment on OpenMW 0.50.0 for Morrowind has a first Release Candidate with gamepad support and a gamepad UI 1 week ago:
The git repo appears to be abandoned, with the newest progress being in small forks:
- Comment on Megrez2: 21B latent, 7.5B on VRAM, 3B active—MoE on single 8GB card 1 week ago:
To be fair, MoE is not new, and we already have a couple of good ~20Bs like Baidu Ernie and GPT-OSS (which they seem to have specifically excluded from comparisons).
You can fit much larger models onto 8GB with the experts on the CPU and the ‘dense’ parts like attention on GPU. Even GLM 4.5 Air (120B) will run fairly fast if your RAM is decent.
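Rough napkin math on why that works: a MoE only reads its *active* parameters for each token, so system-RAM bandwidth bounds the offloaded experts while the GPU handles attention. A toy sketch with loosely assumed numbers (~12B active params in the GLM 4.5 Air ballpark, 4-bit quant, dual-channel DDR5; all figures illustrative, not specs):

```python
def tokens_per_sec(active_params_b, bits_per_weight, ram_gb_per_s):
    """Upper bound on CPU-offloaded MoE decode speed: each token must
    stream the active expert weights out of system RAM once."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return ram_gb_per_s * 1e9 / bytes_per_token

# ~12B active experts, 4-bit quant, ~60 GB/s dual-channel DDR5
print(round(tokens_per_sec(12, 4, 60), 1))  # → 10.0 tokens/s
```

A dense 120B at the same quant would need 10× the traffic per token, which is why only the MoE layout stays usable with experts in RAM.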
- Comment on Nvidia unveils new GPU designed for long-context inference 4 weeks ago:
Jamba (hybrid transformer/state space) is a killer model folks are sleeping on. It’s actually coherent at long context, fast, has good world knowledge, is even/grounded, and is good at RAG. It’s like a straight-up better Cohere model, IMO, and a no-brainer to try for many long-context calls.
TBH I didn’t try Falcon H1 much, since it seemed to break at long context for me. I think most folks (at least publicly) are sleeping on hybrid SSMs because support in llama.cpp is not great. For instance, context caching does not work.
…Not sure about many others, toy models aside. There really aren’t too many to try.
- Comment on Nvidia unveils new GPU designed for long-context inference 4 weeks ago:
Doubling down on flash attention (my interpretation of this) is quite risky, as there are more efficient attention mechanisms seeping into bigger and bigger models.
Deepseek’s MLA is a start. Jamba is already doing hybrid GQA/Mamba attention, and a Qwen3 update is rumored to be using something exotic as well.
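For anyone unfamiliar, the GQA half of that hybrid is easy to sketch: several query heads share one K/V head, which is exactly what shrinks the KV cache versus vanilla multi-head attention. A toy numpy version (shapes and names invented for illustration, not any model’s real code):

```python
import numpy as np

def gqa_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Toy grouped-query attention: n_q_heads query heads share only
    n_kv_heads K/V heads, cutting KV-cache size by their ratio."""
    seq, d_model = x.shape
    head_dim = d_model // n_q_heads
    group = n_q_heads // n_kv_heads        # query heads per shared KV head

    q = (x @ wq).reshape(seq, n_q_heads, head_dim)
    k = (x @ wk).reshape(seq, n_kv_heads, head_dim)  # fewer KV heads cached
    v = (x @ wv).reshape(seq, n_kv_heads, head_dim)

    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                    # map query head -> shared KV head
        scores = q[:, h] @ k[:, kv].T / np.sqrt(head_dim)
        scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn = scores / scores.sum(axis=-1, keepdims=True)
        out[:, h] = attn @ v[:, kv]
    return out.reshape(seq, d_model), k.nbytes + v.nbytes
```

With 8 query heads over 2 KV heads, the cache is 4× smaller than MHA; MLA and the Mamba layers go further still, which is the risk for hardware betting on giant flash-attention KV caches.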
In English, this seems like they’re selling the idea of the software architecture not changing much, when that doesn’t seem to be the case.
- Comment on Bethesda planning a Starfield space gameplay revamp to make it more rewarding 1 month ago:
Oh you must mod the stink out of FO4.
Is there even much of a Starfield modding scene?
- Comment on Zuckerberg's Huge AI Push Is Already Crumbling Into Chaos 1 month ago:
But they are putting the cart before the horse. These APIs and models are unsexy commodities, and Meta doesn’t have anything close to something they can charge for. Even OpenAI and Anthropic can barely justify it these days.
Others building on top of Llama get them there, though. Which all the Chinese companies recognize now and are emulating: they can open the model, let it snowball with communal development to wipe out closed competitors, then offer products on top of it.
What’s conspicuous is that (at least some) in Meta recognized this. But Zuck is so fickle he won’t stick with any good idea.
- Comment on Zuckerberg's Huge AI Push Is Already Crumbling Into Chaos 1 month ago:
Yeah the article is pretty bad… But the missing context is Zuckerberg let a lot of devs go, and the lab that actually built something neat (Llama 1-4) has all but been dismantled.
The new hires reek of tech bros and big egos butting heads, especially the (alleged) talk of closed-sourcing their next models. ‘TBD Lab’ is supposedly tasked with the next Llama release, but I am not holding my breath.
- Comment on Civilization 7's latest update has "hit mods harder than usual", but for a good reason 1 month ago:
Aside:
“We wanted to acknowledge that this update hit mods harder than usual,” community manager Sarah Engel wrote on the game’s Discord.
I despise Discord. Every single niche I love is now locked behind a bunch of unsearchable banter in closed, often invite-only apps.
- Comment on Civilization 7's latest update has "hit mods harder than usual", but for a good reason 1 month ago:
Yeah, I think early access is a great model. Certainly better than “release it and (maybe) fix it later.”
- Comment on AI experts return from China stunned: The U.S. grid is so weak, the race may already be over 1 month ago:
I mean, I’m a local AI evangelist and have made a living off it. The energy use of AI thing is total nonsense, as much as Lemmy doesn’t like to hear it.
I keep a 32B or 49B loaded pretty much all the time.
You are right about the theft vs social media thing too, even if you put it a little abrasively. Why people are so worked up about it in the face of machines like Facebook and Google is mind-boggling.
…But AI is a freaking bubble, too.
Look at company valuations vs how shit isn’t working, and how much it costs.
Look around the ML research community. They all know Altman and his infinite-scaling-to-AGI pitch is just a big fat tech bro lie. AI is going to move forward as a useful tool by getting smaller and more efficient, but transformer LLMs with randomized sampling are not just going to turn into real artificial intelligence because enough investors throw money at these closed-off enterprises.
- Comment on AI experts return from China stunned: The U.S. grid is so weak, the race may already be over 1 month ago:
The irony is Zuck shuttered the absolute best asset they have: the Llama LLM team.
Cuz, you know, he’s a fickle coward who would say and do anything to hide his insecurity.
- Comment on Tencent doesn’t care if it can buy American GPUs again – it already has all the chips it needs 1 month ago:
I think the underlying message is making/serving AI isn’t a mythical goldmine: it’s becoming a dirt cheap commodity, and a tool for companies to use.
- Comment on AI experts return from China stunned: The U.S. grid is so weak, the race may already be over 1 month ago:
This is all based on the assumption that AI will need exponential power. It will not.
- AI is a bubble.
- Even if it isn’t, fab capacity is limited.
- The actual ‘AI’ market is racing to the bottom with smaller, task-focused models.
- A bunch of reproduced papers (like bitnet) that dramatically reduce power use are just waiting for someone to try a larger test.
- Altogether… inference moves to smartphones and PCs.
This is just the finance crowd parroting Altman. Not that the US doesn’t need a better energy grid like China’s, but the justification is built on lies that just aren’t going to pan out.
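For reference, the bitnet-style trick is simple to sketch: scale a weight tensor by its mean absolute value, then round every weight to -1, 0, or +1, so matmuls collapse into adds and subtracts. A toy numpy version of the quantizer (illustrative only, not the paper’s full training recipe):

```python
import numpy as np

def absmean_ternary(w):
    """BitNet-b1.58-style quantization sketch: one scale per tensor,
    every weight rounded into {-1, 0, +1}."""
    scale = np.abs(w).mean() + 1e-8        # absmean scale (epsilon avoids /0)
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale                        # dequantized weight ≈ q * scale
```

Ternary weights mean no multiplies in the hot loop, which is where the big power-reduction claims come from.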
- Comment on The New Yorker Asks: Is the A.I. Boom Turning Into an A.I. Bubble? 1 month ago:
> because there’s seemingly not enough power infrastructure
This is overblown. I mean, if you estimate TSMC’s entire capacity and assume every data center GPU they make is full TDP 100% of the time (which is not true), the net consumption isn’t that high. The local power/cooling infrastructure things are more about corpo cost cutting.
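To make that concrete, here’s a napkin estimate with deliberately round, made-up numbers (not real shipment or capacity data), just to show the order of magnitude:

```python
# All three figures are loose assumptions for illustration:
gpus_per_year = 5_000_000   # assumed annual datacenter-GPU output
watts_each = 1_000          # TDP plus cooling/networking overhead
us_capacity_w = 1_200e9     # ~1.2 TW of US generating capacity (rough)

# Worst case: every GPU ever shipped in a year running flat out.
ai_draw_w = gpus_per_year * watts_each   # 5 GW
print(f"{ai_draw_w / us_capacity_w:.2%}")  # → 0.42%
```

Even the flat-out worst case lands under half a percent of grid capacity, which is why the "exponential power" framing doesn’t hold up.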
Altman’s preaching that power use will be exponential is a lie that’s already crumbling.
But there is absolutely precedent for underused hardware flooding the used markets, or getting cheap on cloud providers. Honestly this would be incredible for the local inference community, as it would give tinkerers (like me) actually affordable access to experiment with.
- Comment on Please don't promote Wayland 1 month ago:
Yeah, not to speak of stuff that doesn’t work (or work well) in X, and its bizarre quirks.
- Comment on The New Yorker Asks: Is the A.I. Boom Turning Into an A.I. Bubble? 1 month ago:
Not a lot? The quirk is they’ve hyper specialized nodes around AI.
The GPU boxes are useful for some other things, but they will be massively oversupplied, and they mostly aren’t networked like supercomputer clusters.
- Comment on The New Yorker Asks: Is the A.I. Boom Turning Into an A.I. Bubble? 1 month ago:
I mean, hardware prices will fall if there’s a crash, like they did with crypto GPU mining.
I am salivating over this. Bring out the firesale A100s.
- Comment on The New Yorker Asks: Is the A.I. Boom Turning Into an A.I. Bubble? 1 month ago:
Ohhh yes. Altman’s promotion for it was the Death Star rising from behind a planet.
> Maybe something on the corporate side, like big players not seeing a return on their investment.
Ohhh, it is. The big corporate hosters aren’t making much money and are burning cash, and it’s not getting any better as specialized open models eat them from the bottom up.
- Comment on GPT-5: Overdue, overhyped and underwhelming. And that’s not the worst of it. 1 month ago:
Nah, I tried them. For the size, they suck, mostly because there’s a high chance they will randomly refuse anything you ask unless it’s STEM or code.
…And there are better models if all you need is STEM and code.
- Comment on GPT-5: Overdue, overhyped and underwhelming. And that’s not the worst of it. 1 month ago:
Meanwhile, the Chinese and other open models are killing it. GLM 4.5 is sick. Jamba 1.7 is a great sleeper model for stuff outside coding and STEM. The 32Bs we have like EXAONE and Qwen3 (and finetuned experiments) are mad for 20GB files, and crowding out APIs. There are great little MCP models like Jan too.
Are they AGI? Of course not. They are tools, and that’s what was promised; but the improvements are real.
- Comment on The Media's Pivot to AI Is Not Real and Not Going to Work 2 months ago:
> …especially since something like a Mixture of Experts model could be split down to base models and loaded/unloaded as necessary.
It doesn’t work that way. All MoE experts are ‘interleaved’, and you need all of them loaded at once for every token. Some API servers can hot-swap whole models, but it’s not fast, and it’s rarely done, since LLMs are pretty ‘generalized’ and tend to serve requests in parallel on API servers.
The closest thing to what you’re thinking of is LoRAX, which basically hot-swaps LoRAs efficiently. But it needs an extremely specialized runtime derived from its associated paper, so people tend not to use it, since it also doesn’t support quantization and some other features: github.com/predibase/lorax
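A toy router makes the "all experts stay loaded" point concrete: the gate picks different experts token by token, so over any real batch every expert gets touched and none can be paged out. A minimal numpy sketch (names and shapes invented for illustration):

```python
import numpy as np

def moe_layer(x, gate_w, experts, top_k=2):
    """Toy MoE feed-forward: a learned gate routes each token to its
    top-k experts. Routing varies per token, so all experts must be
    resident even though only top_k run for any one token."""
    logits = x @ gate_w                          # (tokens, n_experts)
    picked = np.argsort(-logits, axis=-1)[:, :top_k]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        w = np.exp(logits[t, picked[t]])
        w /= w.sum()                             # softmax over chosen experts
        for weight, e in zip(w, picked[t]):
            out[t] += weight * (x[t] @ experts[e])
    return out, picked
```

Run it over a few hundred tokens and the union of `picked` covers every expert, which is exactly why "split it down to base models and unload some" doesn’t work.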
There is a good case for pure data processing, yeah… But it has little integration with LLMs themselves, especially with the API servers generally handling tokenizers/prompt formatting.
> But, all of its components need to be localized
They already are! Local LLM tooling and engines are great and super powerful compared to ChatGPT (which offers no caching, no raw completion, primitive sampling, hidden thinking, and so on).
- Comment on The Media's Pivot to AI Is Not Real and Not Going to Work 2 months ago:
SGLang is partially a scripting language for prompt building that leverages its caching/logprobs output, for doing stuff like filling in fields or branching choices, so that kind of workflow is probably best done in it. It also requires pretty beefy hardware for the model size (as opposed to backends like exllama or llama.cpp that focus more on tight quantization and unbatched performance), so I suppose there’s not a lot of interest from local tinkerers?
It would be cool, I guess, but ComfyUI does feel more geared for diffusion. Image/video generation is more multi-model and benefits from dynamically loading/unloading/swapping all sorts of little submodels, LoRAs, and masks, applying them, and piping them into each other.
LLM running is more monolithic: you have the one big model, maybe a text-embeddings model as part of the same server, and everything else is just string processing to build the prompts, which one does linearly in Python or whatever. Stuff like CFG and LoRAs does exist, but isn’t used much.
- Comment on The Media's Pivot to AI Is Not Real and Not Going to Work 2 months ago:
Not specifically. Ultimately, ComfyUI would build prompts/API calls, which I tend to do in Python scripts.
I tend to use Mikupad or Open Web UI for more general testing.
There are some neat tools with ‘lower level’ integration into LLM engines, like SGLang (which leverages caching and constrained decoding), to do things one can’t do over standard APIs: docs.sglang.ai/frontend/frontend.html
- Comment on The Media's Pivot to AI Is Not Real and Not Going to Work 2 months ago:
I mean, I run Nemotron and Qwen every day, you are preaching to the choir here :P
- Comment on The Media's Pivot to AI Is Not Real and Not Going to Work 2 months ago:
> AI is a tool (sorry)
This should be a bumper sticker. Also, thanks for this, bookmarking 404, wish I had the means to subscribe.
My hope is that the “AI” craze culminates in a race to the bottom where we end up in a less terrible state: local models on people’s phones, reaching out to reputable websites for queries and redirection.
And this would be way better for places like 404, as they’d have to grab traffic individually and redirect users there.