The generative AI revolution has a dirty little secret: its cost, both in monetary terms and in its impact on the environment, is huge.
OpenAI CEO Sam Altman has previously said that GPT-4, the large language model (LLM) underpinning the premium version of ChatGPT, cost $100 million to train. The training data for OpenAI’s LLMs comes from a massive slice of the internet that the language model has to pore through to form the basis of its “knowledge.” Training GPT-3, the previous iteration of OpenAI’s LLM, reportedly produced more than 550 tonnes of carbon dioxide equivalent. Get into a 20-question conversation with ChatGPT, and you’ve wasted 500 milliliters of water—equivalent to a standard bottle.
Despite challenges from competitors to train open-source alternatives, generative AI remains a brute force game: Throw as much computing power and data at the problem as possible, and see what happens.
That approach is, of course, terrible for the environment, and it limits access to the most powerful LLMs to those who can afford to pay the astronomical amounts involved. So researchers have tried to develop a series of workarounds, including shortcuts designed to train and validate LLMs more efficiently. The aim is to reduce the time, effort, and money spent on training AI models before use. Such workarounds could be vital for individuals or organizations who want to develop their own specialized LLMs outside those from the main big tech companies.
The problem? Those workarounds might reduce the computing power required to train an AI model from scratch, dropping the price of developing your own model and reducing the environmental risks associated with creating it, but the resulting model becomes much less effective, according to new research. “Training models to get to even reasonable performances is usually very expensive,” says Jean Kaddour, a PhD student at University College London and one of the authors behind the research.
The researchers decided to compare the different shortcut methods—which fall broadly into three categories: ignoring some training data, skipping irrelevant data, and more efficiently optimizing data—to see if they actually work.
The different methods of making training more efficient were tested on T5 and BERT, two popular LLMs. Both T5 and BERT use the transformer architecture, an approach to designing AI models that was first put forward in 2017, and which other LLMs, such as ChatGPT’s GPT-3 and GPT-4, also use. The T5 and BERT models were trained using the shortcut methods, which were assessed after six, 12, and 24 hours of training.
“The punchline is that in most cases, these methods—which are often quite a bit more complicated and require more implementation efforts—in our experiments didn’t really result in a significant improvement,” says Oscar Key, one of Kaddour’s coauthors and an academic at University College London. The detailed results differed depending on the type of shortcut deployed, but while one of the methods resulted in a small improvement over the standard (computationally heavier) approach, many supposed shortcuts resulted in significantly worse performance.
“I think this puts a big dent in the claims that we’ll have lots of specifically trained LLMs to do specialized work within smaller organizations and for individuals,” says Catherine Flick, a researcher at De Montfort University in the U.K. who was not involved in the study. “For organizations wanting to limit their climate impact, they’re not going to be able to get the performance they need without the costs in energy use to train new models.”
Flick believes that the findings also highlight a more significant problem. “It shows the need to start regulating this area in terms of energy use too, as the big AI companies will not be stopping with their existing models either—and each new iteration will be trained on more and more data,” she says.
The ability of large language models to produce intelligible, relevant results is already a hot topic of discussion among those who use them most frequently, with regular complaints that ChatGPT is producing less relevant answers to questions posed by users. It’s unlikely that many people who want to train their own AI models will opt for a computationally lighter approach if doing so comes at the expense of the model’s effectiveness.
Users may therefore want to rely on pre-trained models, suggests Kaddour, such as Meta’s recently released Llama 2. While Llama 2 itself was computationally expensive to build, with up to 70 billion parameters and two trillion tokens in its pretraining (40% more data than the first version of the LLM), it does concentrate the training into a single effort whose result can be used widely by others.
“For the first time it’s commercially licensed, so even businesses can use these models now for their commercial applications,” says Kaddour. “Therefore, although the pretraining was very expensive, it can now be amortized across lots of different individuals.”