Generative AI systems are typically trained on content that automated crawlers scrape from the web. Apple lets publishers opt out of having their content used this way, and a new report says that many of the biggest websites have specifically opted out of Apple Intelligence training.
This includes both Facebook and Instagram, as well as many high-profile news and media sites like The New York Times and The Atlantic …
Large language models, such as those behind ChatGPT, are trained on vast quantities of text, often billions of words, drawn from sources ranging from news stories to user comments.
In Apple’s case, the company has for years used its web crawler, Applebot, to gather data that powers features such as Siri and Spotlight Suggestions. More recently, Apple has also been using data collected by Applebot to train the models behind Apple Intelligence.
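Apple’s crawler documentation describes the opt-out as a robots.txt rule aimed at a separate user-agent token, Applebot-Extended: disallowing it signals that a site’s content should not be used to train Apple’s models, while the regular Applebot can still crawl the site for search features. The sketch below checks a robots.txt file for that setup using Python’s standard `urllib.robotparser`. The sample file, the `example.com` URL, and the helper names are hypothetical, and this is a simplified check, not Apple’s own logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Applebot-Extended (AI training use)
# while allowing every other crawler, including Applebot, everywhere.
SAMPLE_ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def _parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def training_opt_out(robots_txt: str, url: str) -> bool:
    """True if the rules bar the Applebot-Extended token from this URL."""
    return not _parser(robots_txt).can_fetch("Applebot-Extended", url)


def search_crawl_allowed(robots_txt: str, url: str) -> bool:
    """True if the regular Applebot crawler may still fetch this URL."""
    return _parser(robots_txt).can_fetch("Applebot", url)


if __name__ == "__main__":
    url = "https://example.com/2024/article"
    print("opted out of training:", training_opt_out(SAMPLE_ROBOTS_TXT, url))
    print("search crawling allowed:", search_crawl_allowed(SAMPLE_ROBOTS_TXT, url))
```

One caveat: Python’s parser matches user-agent names by substring, so a group for plain `Applebot` listed before the `Applebot-Extended` group would also match the extended token. A stricter checker would compare agent names exactly.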
The practice is controversial because AI models are effectively using copyrighted material to generate their own versions of it. On niche topics, where source material is scarce, they have even been found to regurgitate entire …