This article is part of our coverage of the latest in AI research.

In early May, Meta released the Open Pretrained Transformer (OPT-175B), a large language model (LLM) that can perform a variety of tasks. Large language models have become one of the hottest areas of research in artificial intelligence in the past few years.

OPT-175B is the latest entrant in the LLM arms race triggered by OpenAI’s GPT-3, a deep neural network with 175 billion parameters. GPT-3 showed that LLMs can perform many tasks without further training, after seeing only a few examples (zero- or few-shot learning). Microsoft later integrated GPT-3 into several of its products, showcasing not only the scientific but also the commercial promise of LLMs.
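To make the few-shot idea concrete, here is a minimal sketch of how such a prompt might be assembled. The function name, the "Input:/Output:" layout, and the translation examples are illustrative assumptions, not a format prescribed by OpenAI or Meta. The point is only that the "training" examples live in the prompt text and the model's weights are never updated.

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a plain-text prompt: task description, solved examples, then the new input."""
    lines = [task, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The model is expected to continue the text after the final "Output:".
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical examples; an empty list would give a zero-shot prompt.
examples = [("cheese", "fromage"), ("house", "maison")]
prompt = build_few_shot_prompt("Translate English to French.", examples, "book")
print(prompt)
```

The resulting string would be sent to the language model as-is, and the model's continuation is read as the answer.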

What makes OPT-175B unique is Meta’s commitment to “openness,” as the model’s name suggests. Meta has made the model available to the public (with some caveats). It has also published many details about the training and development process. In a post published on the Meta AI blog, the company described its release of OPT-175B as “Democratizing access to large-scale language models.”

Meta’s move toward transparency is commendable. However, the competition over large language models has reached a point where the field can no longer be democratized.

Meta’s release of OPT-175B has some key features. It includes both pretrained models and the code needed to train and use the LLM. Pretrained models are especially useful for organizations that do not have the computational resources to train the model (training neural networks is much more resource-intensive than running them). It will also help reduce the massive carbon footprint caused by the computational resources needed to train large neural networks.

Like GPT-3, OPT comes in a variety of sizes, ranging from 125 million to 175 billion parameters (models with more parameters have more capacity for learning). At the time of this writing, all models up to OPT-30B are available for download. The full 175-billion-parameter model will be made available to select researchers and institutions that fill out a request form.

According to the Meta AI blog, “To maintain integrity and prevent misuse, we are releasing our model under a noncommercial license to focus on research use cases. Access to the model will be granted to academic researchers; those affiliated with organizations in government, civil society, and academia; along with industry research laboratories around the world.”

In addition to the models, Meta has released a full logbook that provides a detailed technical timeline of the development and training process of large language models. Published papers usually include only information about the final model. The logbook provides valuable information on “how much compute was used to train OPT-175B and the human overhead required when underlying infrastructure or the training process itself becomes unstable at scale,” according to Meta.

In its blog post, Meta says that large language models are mostly accessible through “paid APIs” and that restricted access to LLMs has “limited researchers’ ability to understand how and why these large language models work, hindering progress on efforts to improve their robustness and mitigate known issues such as bias and toxicity.”

This is a jab at OpenAI (and, by extension, Microsoft), which released GPT-3 as a black-box API service instead of making its model’s weights and source code available to the public. Among the reasons OpenAI gave for not making GPT-3 public was controlling misuse and the development of harmful applications.

Meta believes that by making the models available to a wider audience, it will be in a better position to study and prevent any harm they may cause.

Here is how Meta describes the effort: “We hope that OPT-175B will bring more voices to the frontier of large language model creation, help the community collectively design responsible release strategies, and add an unprecedented level of transparency and openness to the development of large language models in the field.”

However, it is worth noting that “transparency and openness” is not the equivalent of “democratizing large language models.” The costs of training, configuring, and running large language models remain prohibitive and are likely to grow in the future.

According to Meta’s blog post, its researchers have managed to considerably reduce the cost of training large language models. The company says the model’s carbon footprint has been reduced to one-seventh that of GPT-3. Experts I had previously spoken to estimated the cost of training GPT-3 at up to $27.6 million.

This means that training OPT-175B will still cost several million dollars. Fortunately, the pretrained model removes the need to train the model from scratch, and Meta says it will provide the codebase used to train and deploy the full model “using only 16 NVIDIA V100 GPUs.” This is the equivalent of an Nvidia DGX-2, which costs about $400,000, not a small sum for a cash-constrained research lab or an individual researcher. (According to a paper that provides more details on OPT-175B, Meta trained its own model with 992 80GB A100 GPUs, which are significantly faster than the V100.)
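The arithmetic behind these figures can be sketched in a few lines. This is a rough, back-of-the-envelope calculation, not Meta’s accounting: it assumes that training cost scales with carbon footprint, that the weights are stored in 16-bit precision, and that each V100 is the 32 GB variant. None of these assumptions comes from Meta’s blog post.

```python
# Figures marked "article" come from the text above;
# everything else is an assumption made for illustration.

GPT3_COST_UPPER_USD = 27.6e6   # article: upper estimate for training GPT-3
FOOTPRINT_RATIO = 1 / 7        # article: OPT-175B carbon footprint relative to GPT-3

# Assumption: training cost scales roughly with carbon footprint.
opt_cost_estimate = GPT3_COST_UPPER_USD * FOOTPRINT_RATIO

PARAMS = 175e9                 # article: parameter count of the full model
BYTES_PER_PARAM = 2            # assumption: 16-bit weights
GPU_COUNT = 16                 # article: number of V100 GPUs
GPU_MEMORY_GB = 32             # assumption: 32 GB V100 variant

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
total_gpu_memory_gb = GPU_COUNT * GPU_MEMORY_GB

print(f"Estimated OPT-175B training cost: ${opt_cost_estimate / 1e6:.1f} million")
print(f"Weights alone: {weights_gb:.0f} GB of {total_gpu_memory_gb} GB total GPU memory")
```

Under these assumptions, the weights alone fill most of the combined memory of 16 GPUs, which suggests why even running the full model calls for a machine of this class.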

The Meta AI logbook further confirms that training large language models is a very complicated task. The timeline of OPT-175B is filled with server crashes, hardware failures, and other complications that require highly technical staff. The researchers also had to restart the training process several times, tweak hyperparameters, and change loss functions. All of these incur extra costs that small labs cannot afford.

Language models such as OPT and GPT are based on the transformer architecture. One of the key features of transformers is their ability to process large sequential data (e.g., text) in parallel and at scale.

In recent years, researchers have shown that by adding more layers and parameters to transformer models, they can improve their performance on language tasks. Some researchers believe that reaching higher levels of intelligence is only a matter of scale. Accordingly, cash-rich research labs such as Meta AI, DeepMind (owned by Alphabet), and OpenAI (backed by Microsoft) are moving toward creating bigger and bigger neural networks.
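To give a sense of how layers and width translate into parameter counts, here is a rough sketch based on the common approximation of about 12 × layers × d_model² weights for a decoder-only transformer, plus token embeddings. It ignores biases, layer norms, and positional embeddings. The layer counts, hidden sizes, and vocabulary size below are assumptions based on the publicly released OPT configurations, not figures from Meta’s blog post.

```python
def approx_params(n_layers, d_model, vocab_size=0):
    """Rough decoder-only transformer size: 12 * layers * d_model^2, plus token embeddings."""
    attention = 4 * d_model * d_model       # query, key, value, and output projections
    feed_forward = 8 * d_model * d_model    # two linear layers with a 4x hidden width
    return n_layers * (attention + feed_forward) + vocab_size * d_model


# Assumed (layers, d_model) pairs for three OPT sizes.
configs = {
    "OPT-125M": (12, 768),
    "OPT-30B": (48, 7168),
    "OPT-175B": (96, 12288),
}
VOCAB_SIZE = 50272  # assumed vocabulary size

for name, (n_layers, d_model) in configs.items():
    total = approx_params(n_layers, d_model, VOCAB_SIZE)
    print(f"{name}: ~{total / 1e9:.2f} billion parameters")
```

Because the count grows with the square of the hidden size, doubling the model’s width roughly quadruples the weights per layer, which is one reason why scaling up quickly becomes expensive.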

Last year, Microsoft and Nvidia created a 530-billion-parameter language model called Megatron-Turing (MT-NLG). Last month, Google introduced the Pathways Language Model (PaLM), an LLM with 540 billion parameters. And there are rumors that OpenAI will release GPT-4 in the next few months.

However, larger neural networks also require greater financial and technical resources. And while larger language models will have new bells and whistles (and new failures), they will inevitably centralize power in the hands of a few wealthy companies, making it even more difficult for smaller research laboratories and independent researchers to work on large language models.

On the commercial side, big tech companies will have an even greater advantage. Running large language models is very expensive and challenging. Companies like Google and Microsoft have special servers and processors that allow them to run these models at scale and profitably. For smaller companies, the overhead of running their own version of an LLM such as GPT-3 is prohibitive. Just as most businesses use cloud hosting services instead of setting up their own servers and data centers, out-of-the-box systems like the GPT-3 API will gain more traction as large language models become more popular.

This in turn will further centralize AI in the hands of big tech companies. More AI research labs will have to enter partnerships with big tech to fund their research. And this will give big tech more power to decide the future directions of AI research (which will probably be aligned with its financial interests). This can come at the cost of research areas that do not have a short-term return on investment.

The bottom line is that as we celebrate Meta’s move to bring transparency to LLMs, let’s not forget that the very nature of large language models is undemocratic and favors the very companies that are publishing them.

This article was originally written by Ben Dickson and published on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But it also discusses the downside of technology, the darker implications of new tech, and what we need to look out for. You can read the original article here.
