OpenAI’s content deal with the FT is an attempt to avoid more legal challenges – and an AI ‘data apocalypse’

OpenAI’s new “strategic partnership” and licensing agreement with the Financial Times (FT) follows similar deals between the US tech company and publishers such as Associated Press, German media giant Axel Springer and French newspaper Le Monde.

OpenAI will license the FT’s content to use as training data for its products, including successors to its AI chatbot ChatGPT. The AI systems developed by OpenAI are exposed to this data to help them improve their use of language, their grasp of context and their accuracy. The FT will receive an undisclosed payment as part of the deal.

This is happening against a global backdrop of legal challenges by media companies alleging copyright infringement over the use of their content to train AI products. The most high-profile of these is a case brought by the New York Times against OpenAI. There is also a fear among tech companies that, as they build more and more advanced products, the internet will no longer have enough high-quality data to train these AI tools.

So, what will this deal mean for the FT? There’s still a lack of detail on partnerships like this one, apart from the fact the FT will be paid for its content. However, there are hints of other potential benefits.

In a statement, the FT Group’s chief executive, John Ridding, emphasised that the paper was committed to “human journalism”. But he also acknowledged that the news business can’t stay still: “We’re keen to explore the practical outcomes regarding news sources and AI through this partnership … We value the opportunity to be inside the development loop as people discover content in new ways.”

The FT has previously said it would “experiment responsibly” with AI tools, and train journalists to use generative AI for “story discovery”.

OpenAI is probably keen to announce this partnership because it hopes it will help solve the most acute problems facing its flagship products. The first is that these generative AI tools sometimes make things up, a phenomenon known as hallucination. Using reliable content from the FT and other trusted sources should help with that.

The second problem is the legal scrutiny that OpenAI faces, which the deal could help offset. Signing official deals with news sources provides the tech company with some reputational damage control, as it shows OpenAI trying to make good with the world of journalism. It also potentially provides more legal security going forward.

The licensed content from the FT – and other media sources – could provide ChatGPT and the upcoming GPT-5 with more specific, referenced responses to users. Gemini, Google’s ChatGPT competitor, already attempts to do this by providing Google searches that support the claims it makes. Getting results directly from the source means OpenAI has more reliable evidence to search through and be trained on.

This appears to follow the trend of “retrieval-augmented generation” (RAG) that is becoming more popular in the AI world. RAG is a technique whereby a large language model (the technology that sits behind AI chatbots such as ChatGPT) can be provided with a database of knowledge which can be searched to support what the chatbot already knows. This is a bit like taking an exam with a textbook open in front of you.

This helps reduce the risk of hallucination, where the AI authoritatively produces a response that looks real but is actually made up. Having access to a database of trusted journalism helps offset the reliability problems that AI products inherit from being trained on the open internet.
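To make the open-textbook idea concrete, here is a minimal sketch of the retrieval step in RAG. It is an illustration under stated assumptions, not a description of how OpenAI or the FT implement anything: the archive snippets are invented placeholders, the function names are hypothetical, and simple word overlap stands in for the vector search a production system would more likely use. No language model is called; the sketch only shows how retrieved passages might be placed in front of the model alongside the user's question.

```python
import re
from collections import Counter

# Hypothetical stand-in for a licensed archive. These snippets are invented
# placeholders, not real articles.
ARCHIVE = [
    {"id": "doc-1", "text": "The central bank held interest rates steady on Thursday."},
    {"id": "doc-2", "text": "Steel output fell as energy costs rose across Europe."},
    {"id": "doc-3", "text": "A new solar park opened near the coast this spring."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, passage):
    """Count the words shared by the query and the passage.

    Assumption: plain word overlap is used here only to keep the sketch
    short; it is a crude substitute for semantic search.
    """
    q = Counter(tokenize(query))
    p = Counter(tokenize(passage))
    return sum(min(q[w], p[w]) for w in q)


def retrieve(query, archive, k=2):
    """Return up to k passages with the highest non-zero overlap score."""
    ranked = sorted(archive, key=lambda d: score(query, d["text"]), reverse=True)
    return [d for d in ranked[:k] if score(query, d["text"]) > 0]


def build_prompt(query, passages):
    """Assemble the text a language model would receive: sources, then question."""
    sources = "\n".join(f"[{d['id']}] {d['text']}" for d in passages)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"Sources:\n{sources}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    question = "What happened to interest rates?"
    print(build_prompt(question, retrieve(question, ARCHIVE)))
```

The design point is that the model is asked to answer from the supplied sources rather than from memory alone, which is why the quality of the archive matters so much.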

Partnership programme

There’s a subtext to this global media partnerships programme that isn’t about the law or ethics. OpenAI needs more and more data as time goes on to keep delivering big improvements through upgrades to its AI products. Yet these products are running out of high-quality training data from the open internet.

This is, at least in part, because there is now a proliferation of content made by AI on the web. This potentially undermines OpenAI’s continual need to prove to its partners, governments and investors that it can deliver big improvements to its flagship products.

The New York Times lawsuit maintains that products such as ChatGPT threaten the business of media companies. Whatever the outcome of this case, it is in OpenAI’s interests to keep its sources of training data, including media companies, productive and economically viable. The success of ChatGPT, at least for now, is very much tied to the success of the people and organisations producing the data that makes it useful.

PR from the AI industry has done much to foster the idea of inevitability: that AI, in the form of products such as ChatGPT, will transform industries – and people’s lives in general. Yet technology fails all the time. The FT deal highlights the dynamic tension that exists between AI and the industries it is changing. ChatGPT now needs the trustworthy journalism that its own generative capabilities and training methods have helped to undermine.

The idea that generative AI has poisoned the internet is nothing new. Some AI researchers have likened the spread of AI-generated junk on the internet to how radioactive contamination of metals forced steel manufacturers in the 1950s to go diving for steel from wrecked ships that had been manufactured before the nuclear age. This pre-nuclear steel was needed for certain uses, such as in particle accelerators and Geiger counters.

In a similar way, for OpenAI and companies like it, training their products on data “scraps” does not seem like a viable way forward.
