The AI State is a Surveillance State.
Submitted 1 week ago by Tea@programming.dev to technology@lemmy.zip
https://www.techpolicy.press/the-ai-state-is-a-surveillance-state/
Comments
possiblylinux127@lemmy.zip 1 week ago
I think the opposite is much more true: a surveillance state is an AI state. Surveillance predates AI by quite a bit.
chicken@lemmy.dbzer0.com 1 week ago
Any AI model trained on government data is a central repository of data, and the more information consolidated within the model, the greater a violation of privacy rights it becomes. Prediction errors and statistical mistakes threaten our daily lives, as mirages in the desert of abstract data, so-called “hallucinations” can create false justifications for selective or targeted enforcement.
That does seem bad, though I guess I kind of assumed the government already had such consolidated searchable information on people, given all the spying they’ve been known for doing. It makes sense that you shouldn’t trust them with any information because there’s no telling what it will be used for.
obbeel@lemmy.eco.br 1 week ago
Not only do the big players extract data from the common citizen, they also force information upon them. AI will make people exchange knowledge with each other less, concentrating all the “talk” and information in the hands of a few. I think this is a big problem, especially as we near the quantum computation era. How can individuals and smaller organizations possibly compete on AI quality in that scenario? But maybe hardware power won’t be the greatest force in Artificial Intelligence.
pcalau12i@lemmygrad.ml 1 week ago
Eh, individuals can’t compete with corpos not just because corpos have access to more data, but because making progress in AI requires a large team of well-educated researchers and sufficient capital to experiment with technology at scale. It’s a bit like expecting an individual or small business to compete with smartphone manufacturers: it isn’t feasible, not simply because smartphone manufacturers use dirty practices, but because producing smartphones requires an enormous amount of labor and capital and simply cannot be physically carried out by an individual.
This criticism might be more applicable to a medium-sized business like DeepSeek: not really “small” (and definitely not a single individual), but smaller than the others while still big enough to compete. And as we have seen, DeepSeek could compete just fine despite the current situation.
The truth is that both the USA and China recognize all purely AI-generated work as de facto public domain. That means anything ChatGPT or whatever spits out, no matter what the licensing says, is absolutely free to use however you wish, and you will win in court if they try to stop you. There is a common myth that training AI on synthetic data is always harmful. In fact that is only sometimes true, when you train an AI on its own synthetic data; through a process called “distillation”, you can train a less intelligent AI on synthetic data from a more intelligent AI and actually improve its performance.
That means any AI made by a big company can be distilled into any other AI to improve the latter’s performance, because you effectively have access to all the data the big company had, indirectly, through the synthetic data its AI can produce. For example, if for some reason the training data had been curated so the AI never encountered the concept of a dog, it simply wouldn’t know what a dog is. If it encountered dogs a lot, it would know what a dog is and could explain it if you asked. Hence, that information is effectively accessible indirectly, by simply asking the AI for it.
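To make the “just ask the AI for its data” idea concrete, here is a minimal sketch of the harvesting step, assuming an OpenAI-compatible API; the model name, prompts, and output file are my own placeholders, not anything from the article or the thread:

```python
# Minimal sketch: harvesting synthetic training data from a larger "teacher"
# model via an OpenAI-compatible API. Model name, prompts, and output path
# are placeholder assumptions.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    "Explain what a dog is in two sentences.",
    "Summarize the causes of the French Revolution.",
    # ... thousands more prompts covering the knowledge you want to extract
]

with open("synthetic_data.jsonl", "w") as f:
    for prompt in prompts:
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder teacher model
            messages=[{"role": "user", "content": prompt}],
        )
        # Store prompt/response pairs; this becomes the student's training set.
        record = {"prompt": prompt, "response": reply.choices[0].message.content}
        f.write(json.dumps(record) + "\n")
```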
If you use distillation, you can effectively make your own clones of any big company’s AI model, and it’s perfectly legal. Not only that, but you can make improvements as well: you aren’t just cloning models, you also have the power to modify them during the distillation process.
Imagine the initial model was trained with a technique that is now rather outdated, and you believe you’ve invented a new method that, used for re-training, would produce a smarter AI, but you simply lack access to the original data. What you can do instead is generate a ton of synthetic data from the existing AI and then train your new AI on that synthetic data using the new method. Your new AI will have access to most of the same information but will now be trained with the superior technique.
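And a minimal sketch of that re-training step, assuming the prompt/response pairs were saved as JSONL as in the earlier sketch; the student model, paths, and training settings are illustrative guesses, not a recipe from the thread:

```python
# Minimal sketch of the "train your own model on the teacher's outputs" step,
# using Hugging Face transformers. Student model, paths, and hyperparameters
# are placeholder assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

student_name = "Qwen/Qwen2.5-0.5B"  # placeholder: any small base model works
tokenizer = AutoTokenizer.from_pretrained(student_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(student_name)

# The JSONL of prompt/response pairs harvested from the teacher.
data = load_dataset("json", data_files="synthetic_data.jsonl")["train"]

def tokenize(example):
    # Concatenate prompt and teacher response into one training sequence.
    text = example["prompt"] + "\n" + example["response"]
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="student", num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False gives plain next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the student absorbs the teacher's knowledge second-hand
```

This is sequence-level distillation: the student only ever sees the teacher’s outputs, never the teacher’s original training data.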
We have already seen some smaller companies take pre-existing models and use distillation to improve them; for example, DeepSeek took the Qwen models and distilled R1’s reasoning techniques into them.
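For reference, the distilled checkpoints mentioned here are publicly downloadable; a minimal sketch of loading one (the model ID is as DeepSeek published it on Hugging Face, the prompt is just an example):

```python
# Loading one of DeepSeek's R1-distilled Qwen checkpoints.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("What is a dog?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```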
obbeel@lemmy.eco.br 1 week ago
I think it’s important to come up with other ways of generating synthetic data that don’t rely on distilling other models. Translating documents, OCRing old documents, and using digital twins to train visual models come to mind. I’ve never successfully trained a text-related model, but I think the quality of the original text should be critical to how the model performs.
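A minimal sketch of the OCR route suggested here, assuming the Tesseract engine and the pytesseract/Pillow packages are installed; paths and the quality threshold are placeholders:

```python
# Minimal sketch: building a text corpus by OCRing scanned documents,
# with a crude quality filter since source-text quality matters.
from pathlib import Path

import pytesseract
from PIL import Image

corpus = []
for scan in sorted(Path("old_scans").glob("*.png")):
    text = pytesseract.image_to_string(Image.open(scan))
    # Drop pages where OCR recovered almost nothing.
    if len(text.split()) > 50:
        corpus.append(text)

Path("ocr_corpus.txt").write_text("\n\n".join(corpus))
```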