OpenAI and Midjourney using Tumblr and WordPress data
Discover how your Tumblr and WordPress content could shape the future of AI. OpenAI and Midjourney may soon train their models on your posts. What does this mean for privacy and control?
Emerging partnerships in AI data training
Recent reports indicate that Automattic, the parent company of Tumblr and WordPress, is close to finalizing agreements with AI companies OpenAI and Midjourney. The deals would give the AI firms access to a large body of user-generated content to help train their models. AI Media Cafe has learned that exactly which data will be shared remains unclear, and there are concerns that Automattic may initially have planned to include sensitive information that was never meant to be part of such deals.
Concerns over data privacy
According to an internal memo from Tumblr product manager Cyle Gage, the company had prepared to send data that should have remained confidential, including private interactions and content from premium partners. Automattic's engineers are reportedly compiling a list of post IDs to exclude from the data transfer, although it has not been confirmed whether any data has already been shared with the AI companies.
In response to inquiries, Automattic publicly stated that it intends to share only publicly available content from users who have not opted out. However, the company also acknowledged that current law does not require AI web crawlers to respect such preferences.
Automattic’s commitment to user preferences
Automattic has emphasized its commitment to user privacy, stating that any partnerships will honor opt-out settings. The company has also announced plans for a new tool that would let users prevent third parties, including AI firms, from using their data for machine learning. According to 404 Media, which reviewed the tool, it would place opted-out users on a disallowed list that blocks data crawlers, notify partners of new opt-outs, and ask them to remove that content from both past and future AI training.
The wording, which says Automattic will "ask" AI companies to delete the data, raises questions about whether such requests can be enforced. Automattic's AI lead, Andrew Spittle, has said the company will keep pushing for content to be excluded according to users' current preferences, and expressed confidence that its AI partners will honor those wishes.
The broader context of AI data deals
AI training data deals are becoming an increasingly valuable source of revenue for online platforms navigating a difficult digital publishing environment. For instance, Google recently reached an agreement with Reddit to use its extensive archive of user-generated content, a deal that coincided with Reddit's preparations for an initial public offering. Similarly, OpenAI has been actively forming partnerships to gather datasets from a range of sources to improve its AI models.
As the landscape of AI and data privacy continues to evolve, companies like Automattic are finding themselves at the intersection of technological advancement and user trust. The development of tools and partnerships that respect user preferences will be critical in maintaining that trust while contributing to the growth of AI capabilities.