Model collapse is what happens when generative AI models train on AI-generated content rather than content created by humans: the models deteriorate, forgetting the human-generated data they originally learned from and recycling patterns they have already seen.
In an email to Cosmos, machine learning researcher Ilia Shumailov, who co-authored a paper on the topic, describes the process: with each generation of synthetic data, outliers disappear and outputs reflect reality less and less accurately, until what's left is nonsense.
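To make that "outliers disappear" idea concrete, here is a toy simulation. It is my own sketch, not code from Shumailov's paper, and the token names and sample sizes are invented for illustration: each generation re-estimates a token distribution from the previous generation's synthetic output, and a rare token that fails to show up in one sample is gone from every generation after it.

```python
# Toy illustration (an assumption-laden sketch, not the paper's method) of how
# recursive training on synthetic data erases rare events. Each "generation"
# estimates token frequencies from the previous generation's samples, then
# generates a fresh synthetic corpus from those estimates.
import numpy as np

rng = np.random.default_rng(0)

tokens = ["common", "frequent", "occasional", "rare"]   # hypothetical vocabulary
probs = np.array([0.61, 0.30, 0.08, 0.01])              # generation 0: "human" data

for generation in range(1, 21):
    # Draw a small synthetic corpus from the current model's distribution.
    sample = rng.choice(len(tokens), size=100, p=probs)
    counts = np.bincount(sample, minlength=len(tokens))
    # The next model's "training" is just re-estimating frequencies from that corpus.
    probs = counts / counts.sum()
    print(f"gen {generation:2d}:",
          {t: round(float(p), 3) for t, p in zip(tokens, probs)})

# With most random seeds, the "rare" token hits zero within a few generations
# and can never return, since the next model assigns it probability zero:
# the distribution's tails vanish, which is the collapse described above.
```

The same dynamic applies, far more slowly and subtly, to the long tail of facts, styles, and minority viewpoints in a language model's training data.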
The obvious fix is simple: don't train AI on AI-generated content. Except that content is already proliferating across the internet.
“Just as we’ve strewn the oceans with plastic trash and filled the atmosphere with carbon dioxide, so we’re about to fill the Internet with blah,” Ross Anderson, a security expert and co-author of Shumailov’s paper, wrote.
Any model that started spitting out utter nonsense would probably be shut down by whatever tech company put it out there in the first place.
However, Aditi Raghunathan, a computer scientist at Carnegie Mellon University, told The Atlantic that the real danger lies in less obvious flaws, such as biases that creep in as AI caters to the majority.
So, you know, just one more thing to worry about when you're not worrying about the singularity.