Original author(s) | OpenAI[1] |
---|---|
Initial release | June 11, 2020 (beta) |
Predecessor | GPT-2 |
Successor | GPT-4 |
Type | Autoregressive transformer language model |
Website | openai |
Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model released in 2020 that uses deep learning to produce human-like text. Given an initial text as prompt, it will produce text that continues the prompt.
The architecture is a decoder-only transformer network with a 2048-token-long context and then-unprecedented size of 175 billion parameters, requiring 800GB to store. The model was trained using generative pre-training; it is trained to predict what the next token is based on previous tokens. The model demonstrated strong zero-shot and few-shot learning on many tasks.[2]
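The next-token objective and the fixed context length described above can be illustrated with a deliberately tiny stand-in. The sketch below is not GPT-3's architecture: it replaces the transformer with a simple bigram counter, and the names `train_bigram`, `predict_next` and `generate` are hypothetical. It only shows the autoregressive loop, in which each predicted token is appended to the sequence and fed back in, with the visible context truncated to a window.

```python
from collections import Counter, defaultdict

CONTEXT_WINDOW = 2048  # GPT-3's context length in tokens, as stated in the text


def train_bigram(tokens):
    """Count which token follows which: a toy stand-in for a trained model."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, context):
    """Return the most frequent successor of the last context token, or None."""
    followers = counts.get(context[-1])
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, prompt, max_new_tokens, window=CONTEXT_WINDOW):
    """Autoregressive loop: every predicted token is appended and fed back in."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        context = tokens[-window:]  # only the most recent tokens are visible
        nxt = predict_next(counts, context)
        if nxt is None:  # no known continuation for this token
            break
        tokens.append(nxt)
    return tokens


# Toy corpus (an assumption for illustration, not GPT-3 training data).
corpus = "the cat sat on the mat and the cat sat on the rug".split()
model = train_bigram(corpus)
continuation = generate(model, ["the"], max_new_tokens=3)
print(" ".join(continuation))  # the cat sat on
```

A real model conditions on the entire window rather than only the last token, and typically samples from a probability distribution instead of always taking the most frequent continuation.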
It is the third-generation language prediction model in the GPT series and the successor to GPT-2, created by OpenAI, a San Francisco-based artificial intelligence research laboratory.[3] GPT-3, which was introduced in May 2020 and was in beta testing as of July 2020,[4] is part of a trend in natural language processing (NLP) systems toward pre-trained language representations.[1]
The quality of the text generated by GPT-3 is so high that it can be difficult to determine whether or not it was written by a human, which has both benefits and risks.[5] Thirty-one OpenAI researchers and engineers presented the original May 28, 2020 paper introducing GPT-3. In their paper, they warned of GPT-3's potential dangers and called for research to mitigate risk.[1]: 34 David Chalmers, an Australian philosopher, described GPT-3 as "one of the most interesting and important AI systems ever produced."[6] An April 2022 review in The New York Times described GPT-3's capabilities as being able to write original prose with fluency equivalent to that of a human.[7]
Microsoft announced on September 22, 2020, that it had licensed "exclusive" use of GPT-3; others can still use the public API to receive output, but only Microsoft has access to GPT-3's underlying model.[8]
Further information: GPT-2 § Background
According to The Economist, improved algorithms, powerful computers, and an increase in digitized data have fueled a revolution in machine learning, with new techniques in the 2010s resulting in "rapid improvements in tasks" including manipulating language.[9] Software models are trained to learn by using thousands or millions of examples in a "structure ... loosely based on the neural architecture of the brain".[9] One architecture used in natural language processing (NLP) is the transformer, a neural network based on a deep learning model that was first introduced in 2017.[10] GPT-n models are transformer-based deep learning neural network architectures. There are a number of NLP systems capable of processing, mining, organizing, connecting and contrasting textual input, as well as correctly answering questions.[11]
On June 11, 2018, OpenAI researchers and engineers posted their original paper on generative language models, artificial intelligence systems that could be pre-trained with an enormous and diverse corpus of text, in a process they called generative pre-training (GP).[2] The authors described how language understanding performance in natural language processing (NLP) was improved in GPT-n through a process of "generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task." This eliminated the need for human supervision and for time-intensive hand-labeling.[2]
In February 2020, Microsoft introduced its Turing Natural Language Generation (T-NLG), which was claimed to be the "largest language model ever published at 17 billion parameters."[12] It performed better than any other language model at a variety of tasks which included summarizing texts and answering questions.
The construct of “learning styles” is problematic because it fails to account for the processes through which learning styles are shaped. Some students might develop a particular learning style because they have had particular experiences. Others might develop a particular learning style by trying to accommodate to a learning environment that was not well suited to their learning needs. Ultimately, we need to understand the interactions among learning styles and environmental and personal factors, and how these shape how we learn and the kinds of learning we experience.
– Text generated by Mike Sharples[13]
On May 28, 2020, an arXiv preprint by a group of 31 engineers and researchers at OpenAI described the development of GPT-3, a third-generation "state-of-the-art language model".[1][5] The team increased the capacity of GPT-3 by over two orders of magnitude from that of its predecessor, GPT-2,[14] making GPT-3 the largest non-sparse language model to date.[1]: 14 [3] Because GPT-3 is structurally similar to its predecessors,[1] its greater accuracy is attributed to its increased capacity and greater number of parameters.[15] GPT-3's capacity is ten times larger than that of Microsoft's Turing NLG, the next largest NLP model known at the time.[5]
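The size comparisons in this paragraph can be checked with a few lines of arithmetic. The GPT-3 and Turing NLG parameter counts come from the text; the 1.5 billion figure for the largest GPT-2 model is not stated in this section and is included here as an outside assumption.

```python
import math

GPT3_PARAMS = 175e9        # from the text
TURING_NLG_PARAMS = 17e9   # from the text
GPT2_PARAMS = 1.5e9        # assumption: largest GPT-2 model, not stated in this section

ratio_vs_gpt2 = GPT3_PARAMS / GPT2_PARAMS        # roughly 117x
ratio_vs_tnlg = GPT3_PARAMS / TURING_NLG_PARAMS  # roughly 10x
orders_of_magnitude = math.log10(ratio_vs_gpt2)  # just over 2

print(f"GPT-3 vs GPT-2:      {ratio_vs_gpt2:.1f}x ({orders_of_magnitude:.2f} orders of magnitude)")
print(f"GPT-3 vs Turing NLG: {ratio_vs_tnlg:.1f}x")
```

Under these figures, "over two orders of magnitude" and "ten times larger" are both consistent with the parameter counts.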
Lambdalabs estimated a hypothetical cost of around US$4.6 million and 355 years to train GPT-3 on a single GPU in 2020,[16] with lower actual training time achieved by using more GPUs in parallel.
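The relationship between the single-GPU estimate and parallel training time can be sketched as follows. This is a back-of-the-envelope illustration only: the function `wall_clock_days` and its `efficiency` parameter are hypothetical, perfect linear scaling is an idealization, and the implied hourly price assumes the quoted cost covers GPU time alone.

```python
SINGLE_GPU_YEARS = 355       # from the estimate quoted in the text
ESTIMATED_COST_USD = 4.6e6   # from the estimate quoted in the text


def wall_clock_days(num_gpus, gpu_years=SINGLE_GPU_YEARS, efficiency=1.0):
    """Days of wall-clock time if the work is spread over `num_gpus` GPUs.

    `efficiency` (0 < efficiency <= 1) is a hypothetical scaling factor;
    1.0 means perfect linear scaling, which real clusters do not reach.
    """
    if num_gpus < 1 or not 0 < efficiency <= 1:
        raise ValueError("num_gpus must be >= 1 and efficiency in (0, 1]")
    return gpu_years * 365.0 / (num_gpus * efficiency)


# Hourly GPU price implied by the two quoted numbers, assuming the cost is GPU time only.
implied_cost_per_gpu_hour = ESTIMATED_COST_USD / (SINGLE_GPU_YEARS * 365 * 24)

for gpus in (1, 256, 1024):
    print(f"{gpus:>5} GPUs: {wall_clock_days(gpus):10.1f} days")
print(f"Implied price: ${implied_cost_per_gpu_hour:.2f} per GPU-hour")
```

The GPU counts in the loop are arbitrary examples, not figures reported for the actual training run.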
Sixty percent of the weighted pre-training dataset for GPT-3 comes from a filtered version of Common Crawl consisting of 410 billion byte-pair-encoded tokens.[1]: 9 Other sources are 19 billion tokens from WebText2 representing 22% of the weighted total, 12 billion tokens from Books1 representing 8%, 55 billion tokens from Books2 representing 8%, and 3 billion tokens from Wikipedia representing 3%.[1]: 9 GPT-3 was trained on hundreds of billions of words and is also capable of coding in CSS, JSX, and Python, among others.[4]
Dataset | # tokens | Proportion within training |
---|---|---|
Common Crawl | 410 billion | 60% |
WebText2 | 19 billion | 22% |
Books1 | 12 billion | 8% |
Books2 | 55 billion | 8% |
Wikipedia | 3 billion | 3% |
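The table distinguishes the size of each dataset from its weight in the training mix, and the two differ considerably. The sketch below uses only the figures from the table to compare each source's share of the raw tokens with its training weight; the variable names are illustrative.

```python
# (tokens in billions, weight in training mix in percent), taken from the table
DATASETS = {
    "Common Crawl": (410, 60),
    "WebText2": (19, 22),
    "Books1": (12, 8),
    "Books2": (55, 8),
    "Wikipedia": (3, 3),
}

total_tokens = sum(tokens for tokens, _ in DATASETS.values())  # 499 billion
total_weight = sum(weight for _, weight in DATASETS.values())  # 101, due to rounding

# Ratio > 1 means the source is sampled more often than its raw size alone would suggest.
relative_sampling = {}
for name, (tokens, weight) in DATASETS.items():
    raw_share = 100 * tokens / total_tokens
    relative_sampling[name] = weight / raw_share
    print(f"{name:<13} raw share {raw_share:5.1f}%  weight {weight:2d}%  "
          f"ratio {relative_sampling[name]:.2f}")
```

Common Crawl supplies about 82% of the raw tokens but only 60% of the training weight, while WebText2, Books1 and Wikipedia are weighted well above their raw share. The listed percentages sum to 101% because each is rounded.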
Since GPT-3's training data was all-encompassing, it does not require further training for distinct language tasks.[4] The training data contains occasional toxic language, and GPT-3 occasionally generates toxic language as a result of mimicking its training data. A study from the University of Washington found that GPT-3 produced toxic language at a level comparable to that of the similar natural language processing models GPT-2 and CTRL. OpenAI has implemented several strategies to limit the amount of toxic language generated by GPT-3. As a result, GPT-3 produced less toxic language than its predecessor model, GPT-1, although it produced both more toxic generations and more highly toxic language than CTRL Wiki, a language model trained entirely on Wikipedia data.[17]
On June 11, 2020, OpenAI announced that users could request access to its user-friendly GPT-3 API, a "machine learning toolset", to help OpenAI "explore the strengths and limits" of this new technology.[18][19] The invitation described how this API had a general-purpose "text in, text out" interface that can complete almost "any English language task", instead of the usual single use-case.[18] According to one user, who had access to a private early release of the OpenAI GPT-3 API, GPT-3 was "eerily good" at writing "amazingly coherent text" with only a few simple prompts.[20] In an initial experiment, 80 US subjects were asked to judge whether short articles of about 200 words were written by humans or by GPT-3. The participants judged correctly 52% of the time, only slightly better than random guessing.[1]
On November 18, 2021, OpenAI announced that enough safeguards had been implemented that access to its API would be unrestricted.[21] OpenAI provided developers with a content moderation tool that helps them abide by OpenAI's content policy.[22] On January 27, 2022, OpenAI announced that its newest GPT-3 language models, collectively referred to as InstructGPT, were now the default language models used on its API. According to OpenAI, InstructGPT produced content that was better aligned to user intentions by following instructions better, generating fewer made-up facts, and producing somewhat less toxic content.[23]
Because GPT-3 can "generate news articles which human evaluators have difficulty distinguishing from articles written by humans,"[5] GPT-3 has the "potential to advance both the beneficial and harmful applications of language models."[1]: 34 In their May 28, 2020 paper, the researchers described in detail the potential "harmful effects of GPT-3"[5] which include "misinformation, spam, phishing, abuse of legal and governmental processes, fraudulent academic essay writing and social engineering pretexting".[1] The authors draw attention to these dangers to call for research on risk mitigation.[1]: 34
GPT-3 is capable of performing zero-shot and few-shot learning (including one-shot).[1]
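In this setting, "shots" are worked examples placed in the prompt itself; the model's weights are not updated. The sketch below builds zero-shot, one-shot and few-shot prompts as plain text. The helper `build_prompt` and the `=>` separator are illustrative assumptions, not a format required by any API.

```python
def build_prompt(task_description, examples, query):
    """Assemble a prompt: task description, worked examples, then the open query.

    With no examples this is zero-shot, with one it is one-shot,
    and with several it is few-shot.
    """
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # left open for the model to complete
    return "\n".join(lines)


task = "Translate English to French:"
demos = [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")]

zero_shot = build_prompt(task, [], "cheese")
one_shot = build_prompt(task, demos[:1], "cheese")
few_shot = build_prompt(task, demos, "cheese")

print(few_shot)
```

The resulting string would be sent to the model as ordinary input text, and the continuation it produces after the final `=>` is read as the answer.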
In June 2022, Almira Osmanovic Thunström wrote that GPT-3 was the primary author on an article on itself, that they had submitted it for publication,[24] and that it had been pre-published while waiting for completion of its review.[25]
On March 15, 2022, OpenAI made available new versions of GPT-3 and Codex in its API with edit and insert capabilities under the names "text-davinci-002" and "code-davinci-002".[26] These models were described as more capable than previous versions and were trained on data up to June 2021.[27] On November 30, 2022, OpenAI began referring to these models as belonging to the "GPT-3.5" series,[27] and released ChatGPT, which was fine-tuned from a model in the GPT-3.5 series.[28]
GPT-3's builder, OpenAI, was initially founded as a non-profit in 2015.[51] In 2019, OpenAI broke from its previous open-source practices by declining to publicly release GPT-3's precursor model, citing concerns that the model would perpetuate fake news. OpenAI eventually released a version of GPT-2 that was 8% of the original model's size.[52] In the same year, OpenAI restructured to be a for-profit company.[53] In 2020, Microsoft announced that it had exclusive licensing of GPT-3 for Microsoft's products and services following a multi-billion dollar investment in OpenAI. The agreement permits OpenAI to offer a public-facing API such that users can send text to GPT-3 to receive the model's output, but only Microsoft will have access to GPT-3's source code.[8]
Large language models, such as GPT-3, have come under criticism from a few of Google's AI ethics researchers for the environmental impact of training and storing the models, detailed in a paper co-authored by Timnit Gebru and Emily M. Bender in 2021.[54]
The growing use of automated writing technologies based on GPT-3 and other language generators has raised concerns regarding academic integrity[55] and raised the stakes of how universities and schools will gauge what constitutes academic misconduct such as plagiarism.[56]
GPT-3 was built with data from the Common Crawl dataset, a conglomerate of copyrighted articles, internet posts, web pages, and books scraped from 60 million domains over a period of 12 years. TechCrunch reports this training data includes copyrighted material from the BBC, The New York Times, Reddit, the full text of online books, and more.[57] In its response to a 2019 Request for Comments on Intellectual Property Protection for Artificial Intelligence Innovation from the United States Patent and Trademark Office (USPTO), OpenAI argued that "Under current law, training AI systems [such as its GPT models] constitutes fair use," but that "given the lack of case law on point, OpenAI and other AI developers like us face substantial legal uncertainty and compliance costs."[58]