Tumblr and WordPress users may soon find their data being used to train artificial intelligence (AI) models, according to a report. The blogging platforms’ parent company, Automattic, is said to have struck deals with OpenAI and Midjourney to sell user-generated content that will reportedly be used to train their AI models. While the details of the deals and the data-sharing practices remain unclear, the report has raised questions about data privacy and the ethics of companies sharing their users’ data with third parties.
Internal communications from Automattic employees, reviewed by 404 Media, reportedly confirm the deals with the AI companies and reveal details of those practices. According to the publication, Automattic’s deals with OpenAI and Midjourney could be announced soon, and data collection for the AI firms appears to have already begun. An internal post by product manager Cyle Gage suggests that all public Tumblr posts published between 2014 and 2023 have been compiled.
The report also highlighted a message suggesting that private and deleted user content was automatically compiled along with the public data. It is not clear whether this data set has already been shared with the AI firms. Because such an incident could put the personal information of the entire user base at risk, it also raises questions about the company’s ethics policies and data safety infrastructure.
Automattic said in a statement on Tuesday, “AI is rapidly transforming almost every aspect of our world, including the way we create and consume content. At Automattic, we’ve always believed in a free and open network and individual choice. Like other technology companies, we are closely monitoring these developments, including how to work with AI companies in a way that respects the preferences of our users.”
The post lists several measures the company is taking for its users, including blocking AI platform bots, a setting that discourages search engines from indexing WordPress and Tumblr sites, and an opt-out guarantee for users who do not want their data shared with third parties. “There is currently no law that requires robots to follow these preferences,” the post said.
The mechanism for opting out of data sharing is also unclear. While the company said in the post that AI firms will honor opt-out settings and even remove past content from users who recently opted out, the report claims the reality is more complicated.
The report cites an internal document dated February 23 in which an employee asked whether the company had any confidence that its data partners would honor users’ opt-out decisions. Andrew Spittle, Automattic’s head of AI, reportedly responded: “We will request that the content be deleted and removed from any future training. I believe the partners will respect that based on our conversations with them so far. I don’t think they gain much overall by keeping it.”
According to the report, the response was vague and did not confirm whether Automattic had a formal agreement requiring the partners to do so. The reasoning also appears to rest on the assumption that AI firms would not gain much from keeping user data. The practice of sharing data with third parties is not new, and most social media platforms claim broad rights to public user-generated content on their services. However, striking such deals without disclosing them to users could expose private information to companies that use the data to train AI systems.
https://www.gadgets360.com/ai/news/tumblr-wordpress-sell-user-data-train-openai-midjourney-ai-models-5144417#rss-gadgets-all