OpenAI and Midjourney using Tumblr and WordPress data

Discover how your Tumblr & WordPress content could shape the future of AI. OpenAI & Midjourney may soon learn from your posts. What does this mean for privacy and control?

Javier Rodriguez
  • Published February 28, 2024

Emerging partnerships in AI data training

Recent reports indicate that Automattic, the parent company of Tumblr and WordPress.com, is close to finalizing agreements with OpenAI and Midjourney. The deals would supply user-generated content to help train the AI firms’ models. AI Media Cafe has learned that the specifics of the data to be shared remain unclear, but there are concerns that Automattic may have initially planned to include sensitive information not intended for such transactions.

Concerns over data privacy

According to an internal memo from Tumblr product manager Cyle Gage, data was being prepared for transfer that should have remained confidential, including private interactions and content from premium partners. Automattic’s engineers are reportedly compiling a list of post IDs that should be excluded from the data transfer, although it has not been confirmed whether any data has already been shared with the AI companies.

In response to inquiries, Automattic has publicly stated that it intends to share only publicly available content from users who have not opted out. However, the company also acknowledged that current legal frameworks do not oblige AI web crawlers to respect such preferences.

Automattic’s commitment to user preferences

Automattic has emphasized its commitment to user privacy, stating that any partnerships will honor opt-out settings. The company has also announced plans to introduce a tool that lets users prevent third parties, including AI firms, from using their data for machine learning. This tool, as reviewed by 404 Media, would place users on a disallowed list that blocks data crawlers, and it includes provisions to notify partners of new opt-outs and to request that the affected content be removed from past and future AI training.

The wording, which suggests Automattic will “ask” AI companies to delete the data, raises questions about whether such requests can be enforced. Automattic’s AI lead, Andrew Spittle, has said the company will keep advocating for the exclusion of content based on current user preferences and expressed confidence that AI partners will respect these wishes.
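
For readers curious how a crawler "disallowed list" typically works in practice, here is a minimal sketch. It assumes the opt-out is expressed as robots.txt rules, which the reports do not confirm; the rules, the `may_fetch` helper, and the example URL are all hypothetical. `GPTBot` is the user agent OpenAI documents for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a blog whose owner has opted out.
# The rules are illustrative, not Automattic's actual configuration.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    post = "https://example.tumblr.com/post/123"  # placeholder URL
    print("GPTBot allowed:", may_fetch("GPTBot", post))
    print("Other crawler allowed:", may_fetch("SomeOtherBot", post))
```

Note that the check runs on the crawler's side. A robots.txt rule is a request, and nothing in the protocol forces a crawler to perform this check or to obey the result, which is why enforceability remains an open question.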

The broader context of AI data deals

Deals for AI training data are becoming increasingly valuable for online platforms seeking to navigate a challenging digital publishing environment. For instance, Google recently entered into an agreement with Reddit to use its extensive repository of user-generated content, coinciding with Reddit’s preparations for an initial public offering. Similarly, OpenAI has been actively forming partnerships to gather datasets from various sources to improve its AI models.

As the landscape of AI and data privacy continues to evolve, companies like Automattic find themselves at the intersection of technological advancement and user trust. Tools and partnerships that respect user preferences will be critical to maintaining that trust while contributing to the growth of AI capabilities.

Written by Javier Rodriguez

Javier Rodriguez is a distinguished Spanish journalist renowned for his profound interest in technology and artificial intelligence. With a career spanning several years, Rodriguez has established himself as a leading voice in the tech journalism landscape.