“AI” models are, essentially, solvers for mathematical system that we, humans, cannot describe and create solvers for ourselves.
For example, a calculator for pure numbers is a pretty simple device all the logic of which can be designed by a human directly. A language, thought? Or an image classifier? That is not possible to create by hand.
With “AI” instead of designing all the logic manually, we create a system which can end up in a number of finite, yet still near infinite states, each of which defines behavior different from the other. By slowly tuning the model using existing data and checking its performance we (ideally) end up with a solver for some incredibly complex system.
If we were to try to make a regular calculator that way and all we were giving the model was “2+2=4” it would memorize the equation without understanding it. That’s called “overfitting” and that’s something people being AI are trying their best to prevent from happening. It happens if the training data contains too many repeats of the same thing.
However, if there is no repetition in the training set, the model is forced to actually learn the patterns in the data, instead of data itself.
Essentially: if you’re training a model on single copyrighted work, you’re making a copy of that work via overfitting. If you’re using terabytes of diverse data, overfitting is minimized. Instead, the resulting model has actual understanding of the system you’re training it on.
Corporations have been trying to control more and more of what users do and how they do it for longer than AI has been a “threat”. I wouldn’t say AI changes anything. At most, maybe, it might accelerate things a little. But if I had to guess, the corpos are already moving as fast as they can with locking everything down for the benefit of no one, but them.