Reddit says it’s made $203M so far by licensing its data

Reddit’s bid to list on the stock market is more closely tied to its relationships with AI vendors like OpenAI than one might expect.

In its IPO prospectus, filed today with the U.S. Securities and Exchange Commission, Reddit repeatedly emphasizes how much it stands to gain from data licensing agreements with companies that train AI models on its more than one billion posts and more than 16 billion comments, and how much it has already gained.

“In January 2024, we entered into certain data licensing arrangements with an aggregate contract value of $203.0 million and terms of two to three years,” the prospectus reads. “We expect to generate a minimum of $66.4 million in revenue during the year ending December 31, 2024 with the balance thereafter.”

For now, it’s a mystery which AI vendors are licensing data from Reddit. Earlier this week, Bloomberg and Reuters reported that a “large unnamed AI company” (possibly Google) had signed a licensing agreement worth about $60 million on an annualized basis. But OpenAI would be an unsurprising customer, especially given that OpenAI CEO Sam Altman owns an 8.7% stake in Reddit (making him the third-largest shareholder) and was at one time a member of the company’s board of directors.

Why is Reddit data valuable? As Reddit explains, AI models “learn” to write essays, code, emails, articles, and more from examples, and vendors like OpenAI scour the web for millions to billions of these examples to add to their training sets. Some examples are in the public domain. Others are not, or, in the case of Reddit content, fall under restrictive licenses that require citations or specific forms of compensation.

Reddit previously did not restrict access to its data for AI training purposes. But last year it reversed course, arguing that its data shouldn’t be, in the words of CEO Steve Huffman, “[given] to some of the biggest companies in the world, for free.”

“Reddit data is a fundamental part of current AI technology and the building of many large language models,” the prospectus continues. “We believe that Reddit’s vast storehouse of conversation data and knowledge will continue to play a role in training and improving large language models. As our content refreshes and grows daily, we hope models will want to reflect these new ideas and update their training using Reddit data.”

Content creators, from stock media libraries to news publishers, are increasingly turning to data licensing agreements with AI vendors as chatbots like OpenAI’s ChatGPT threaten to drain their traffic. A recent model from The Atlantic found that, if a search engine like Google integrates AI into search, it will answer a user’s query 75% of the time without requiring a click-through to its website.

Vendors, in turn, have been motivated to pursue licensing agreements as they face a flood of lawsuits alleging that they have no legal basis to train their models on data without permission or payment. Recently, The New York Times accused OpenAI of effectively using its work to create competitors to news publishers, causing harm to its business.

OpenAI has agreements with Shutterstock as well as publishers including Axel Springer, the owner of Politico and Business Insider. The licensing deals are reportedly quite small, however: more than $5 million per year.
