Meta’s Challenge to OpenAI—Give Away a Massive Language Model

Meta is giving away some of the family jewels: That’s the gist of an announcement from the company formerly known as Facebook this week. In a blog post on the Meta AI site, the company’s researchers announced that they’ve created a massive and powerful language AI system and are making it available free to all researchers in the artificial-intelligence community. Meta describes the move as an effort to democratize access to a powerful kind of AI—but some argue that not very many researchers will actually benefit from this largesse. And even as these models become more accessible to researchers, many questions remain about the path to commercial use.

Large language models are one of the hottest things in AI right now. Models like OpenAI’s GPT-3 can generate remarkably fluid and coherent text in just about any format or style: They can write convincing news articles, legal summaries, poems, and advertising copy, or hold up their end of a conversation as customer-service chatbots or video-game characters. GPT-3, which broke the mold with its 175 billion parameters, is available to academic and commercial entities only via OpenAI’s application and vetting process.

Meta’s Open Pretrained Transformer (known as OPT-175B) matches GPT-3 with 175 billion parameters of its own. Meta is offering the research community not only the model itself, but also its codebase and extensive notes and logbooks about the training process. The model was trained on 800 gigabytes of data from five publicly available data sets, which are described in the “data card” that accompanies a technical paper posted by the Meta researchers to the arXiv online preprint server.

Joelle Pineau, director of Meta AI Research Labs, tells IEEE Spectrum that she expects researchers to make use of this treasure trove in several ways. “The first thing I expect [researchers] to do is to use it to build other types of language-based systems, whether it’s machine translation, a chatbot, something that completes text—all of these require this kind of state-of-the-art language model,” she says. Rather than training their own language models from scratch, Pineau says, they can build applications and run them “on a relatively modest compute budget.”
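To make that concrete, here is a minimal sketch of what text completion on top of a pretrained OPT checkpoint could look like. It assumes the smaller checkpoints are distributed in a form loadable by the Hugging Face transformers library; the model identifier and prompt below are illustrative assumptions, not instructions from Meta.

```python
# Minimal sketch: text completion with a smaller OPT checkpoint.
# Assumes the checkpoint is loadable via Hugging Face transformers;
# the "facebook/opt-1.3b" identifier here is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # a smaller sibling of OPT-175B
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "A customer asks when their order will arrive. The support agent replies:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For the smaller variants, a script like this can run on a single GPU, since the expensive pretraining step has already been done.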

The second thing she expects researchers to do, Pineau says, is “pull it apart” to examine its flaws and limitations. Large language models like GPT-3 are famously capable of generating toxic language full of stereotypes and harmful bias; that troubling tendency is a result of training data that includes hateful language found in Reddit forums and the like. In their technical paper, Meta’s researchers describe how they evaluated the model on benchmarks related to hate speech, stereotypes, and toxic-content generation, but Pineau says “there’s so much more to be done.” She adds that the scrutiny should be done “by community researchers, not inside closed research labs.”

The paper states that “we still believe this technology is premature for commercial deployment,” and says that by releasing the model with a noncommercial license, Meta hopes to facilitate the development of guidelines for responsible use of large language models “before broader commercial deployment occurs.”

Within Meta, Pineau acknowledges that there’s a lot of interest in using OPT-175B commercially. “We have a lot of groups that deal with text,” she notes, that might want to build a specialized application on top of the language model. It’s easy to imagine product teams salivating over the technology: It could power content-moderation tools or text translation, could help suggest relevant content, or could generate text for the creatures of the metaverse, should it truly come to pass.

There have been other efforts to make an open-source language model, most notably from EleutherAI, an association that released a 20-billion-parameter model in February. Connor Leahy, one of the founders of EleutherAI and founder of an AI startup called Conjecture, calls Meta’s move a good step for open science. “Especially the release of their logbook is unprecedented (to my knowledge) and very welcome,” he tells Spectrum in an email. But he notes that Meta’s conditional release, making the model available only on request and with a noncommercial license, “falls short of truly open.” EleutherAI doesn’t comment on its plans, but Leahy says the group will continue working on its own language AI, and adds that OPT-175B will be helpful for some of its research. “Open research is synergistic in that way,” he says.

EleutherAI is something of an outlier in AI research in that it’s a self-organizing group of volunteers. Much of today’s cutting-edge AI work is done within the R&D departments of big players like Meta, Google, OpenAI, Microsoft, Nvidia, and other deep-pocketed companies. That’s because training big AI systems takes an enormous amount of energy and compute infrastructure.

Meta claims that training OPT-175B required one-seventh the carbon footprint of training GPT-3, yet as Meta’s paper notes, that’s still a significant energy expenditure. The paper says that OPT-175B was trained on 992 80-gigabyte Nvidia A100 GPUs, with a carbon-emissions footprint of 75 tons, compared with an estimated 500 tons for GPT-3 (a figure that OpenAI has not confirmed).

Meta’s hope is that by offering up this “foundation model” for other entities to build on, it will at least reduce the need to train huge models from scratch. Deploying the model, Meta says in its blog post, requires only 16 Nvidia 32GB V100 GPUs. The company is also releasing smaller-scale versions of OPT-175B for researchers who don’t need the full-scale model, or who are investigating the behavior of language models at different scales.
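For researchers working with those smaller releases, comparing behavior across scales could look something like the sketch below. It again assumes the checkpoints can be loaded through the Hugging Face transformers library; the facebook/opt-* names are assumptions for illustration.

```python
# Illustrative sketch: compare completions from OPT variants of
# different sizes on the same prompt (identifiers assumed).
from transformers import pipeline

prompt = "The new guidelines for releasing large language models say that"
for name in ["facebook/opt-125m", "facebook/opt-350m", "facebook/opt-1.3b"]:
    generator = pipeline("text-generation", model=name)
    result = generator(prompt, max_new_tokens=30, do_sample=False)
    print(name, "->", result[0]["generated_text"])
```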

Maarten Sap, a researcher at the Allen Institute for Artificial Intelligence (AI2) and an incoming assistant professor at Carnegie Mellon University’s Language Technologies Institute, studies large language models and has worked on methods to detoxify them. In other words, he’s exactly the kind of researcher that Meta is hoping to attract. Sap says that he’d “love to use OPT-175B,” but “the biggest issue is that few research labs actually have the infrastructure to run this model.” If it were easier to run, he says, he’d use it to study toxic-language risks and social intelligence within language models.

While Sap applauds Meta for opening up the model to the community, he thinks it could go a step further. “Ideally, having a demo of the system and an API with much more control/access than [OpenAI’s API for GPT-3] would be great for actual accessibility,” he says. However, he notes that Meta’s release of smaller versions is a good “second-best option.”

Whether models like OPT-175B will ever become as safe and accessible as other kinds of enterprise software is still an open question, and there are different ideas about the path forward. EleutherAI’s Leahy says that preventing broad commercial use of these models won’t solve the problems with them. “Security through obscurity is not security, as the saying in the computer-security world goes,” says Leahy, “and studying these models and finding ways to integrate their existence into our world is the only feasible path forward.”

Meanwhile, Sap argues that AI regulation is needed to “prevent researchers, people, or companies from using AI to impersonate people, generate propaganda or fake news, or other harms.” But he notes that “it’s pretty clear that Meta is against regulation in many ways.”

Sameer Singh, an associate professor at the University of California, Irvine, and a research fellow at AI2 who works on language models, praises Meta for releasing the training notes and logbooks, saying that process information may end up being more useful to researchers than the model itself. Singh says he hopes that such openness will become the norm. He also says he supports providing commercial access to at least the smaller models, since such access can be useful for understanding models’ practical limitations.

“Disallowing commercial access completely or putting it behind a paywall may be the only way to justify, from a business perspective, why these companies should build and release LLMs in the first place,” Singh says. “I suspect these restrictions have less to do with potential damage than claimed.”