Model collapse occurs when generative AI models are trained on AI-generated content (rather than content created by humans) and deteriorate: they forget the human-generated data they originally learned from and begin recycling patterns they have already seen.
In an email to Cosmos, machine learning researcher Ilia Shumailov, who co-authored a paper on the topic, uses the following analogy:
- A model receives a data set containing 90 yellow objects and 10 blue ones.
- Because there are more yellow objects, it begins to turn the blue objects greenish.
- Over time, it forgets the blue objects exist.
With each generation of synthetic data, outliers disappear and outputs less accurately reflect reality until what’s left is nonsense.
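To make the mechanism concrete, here is a minimal, hypothetical simulation (not taken from Shumailov's paper; the counts and number of generations are arbitrary choices for illustration). A trivial "model" that simply memorises category frequencies is repeatedly retrained on samples drawn from its own previous version, and the rare blue category tends to vanish within a few generations.

```python
# Toy sketch of model collapse via repeated resampling.
# Assumptions: a "model" is just an estimated category distribution,
# and each generation trains only on 100 samples from the previous model.
import random
from collections import Counter

random.seed(0)

def train_on(samples):
    """'Train' a trivial model: estimate category probabilities from the data."""
    counts = Counter(samples)
    total = len(samples)
    return {category: n / total for category, n in counts.items()}

# Generation 0: the real data set, 90 yellow objects and 10 blue ones.
data = ["yellow"] * 90 + ["blue"] * 10
model = train_on(data)

for generation in range(1, 11):
    # Each new generation sees only synthetic data sampled from the previous model.
    categories = list(model)
    weights = [model[c] for c in categories]
    synthetic = random.choices(categories, weights=weights, k=100)
    model = train_on(synthetic)
    print(f"generation {generation}: {model}")
```

Run this and, after a handful of generations, "blue" typically drops out of the distribution entirely: sampling noise erodes the minority category until the model forgets it ever existed, which is the tail-loss dynamic the analogy describes.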
A solution might sound simple at first
Don’t train AI on AI-generated content, sure. Except AI-generated content is already proliferating across the internet.
“Just as we’ve strewn the oceans with plastic trash and filled the atmosphere with carbon dioxide, so we’re about to fill the Internet with blah,” Ross Anderson, a security expert and co-author of Shumailov’s paper, wrote.
So will AI inevitably go haywire?
Any model that started spitting out utter nonsense would probably be shut down by whatever tech company put it out there in the first place.
However, Aditi Raghunathan, a computer scientist at Carnegie Mellon University, told The Atlantic that the real danger lies in less obvious flaws, such as biases that creep in as AI caters to the majority.
So, you know, just one more thing to worry about when you’re not worrying about the singularity.