GPT-J
GPT-J, also known as GPT-J-6B, is an open-source large language model developed by EleutherAI in 2021. As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to its 6 billion parameters. The model is available on GitHub, but the web interface no longer communicates with it, and development stopped in 2021.
Architecture
GPT-J is a GPT-3-like model with 6 billion parameters. Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing tasks by predicting how a piece of text will continue. Its architecture differs from GPT-3 in three main ways:
- The attention and feedforward layers are computed in parallel rather than sequentially, allowing for greater efficiency during training.
- The GPT-J model uses rotary position embeddings (RoPE), which have been found to be a superior method of injecting positional information into transformers.
- GPT-J uses dense attention instead of the efficient sparse attention used in GPT-3.
It was trained on the Pile dataset, using the Mesh Transformer JAX library, written in JAX, to handle the parallelization scheme.
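The following simplified sketch, written with jax.numpy, illustrates these three choices in a single-head, single-block form. It is not the Mesh Transformer JAX implementation: the parameter names (w_q, w_ff1, and so on), the rotate-half formulation of the rotary embedding, and the omission of multi-head splitting and other details are illustrative assumptions.

```python
import jax.numpy as jnp
from jax.nn import gelu, softmax


def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / jnp.sqrt(var + eps)


def rotary_embedding(x, base=10000):
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Pairs of channels are rotated by a position-dependent angle, injecting
    positional information directly into the query/key vectors.
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = 1.0 / (base ** (jnp.arange(half) / half))        # (half,)
    angles = jnp.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, half)
    cos, sin = jnp.cos(angles), jnp.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return jnp.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


def parallel_block(x, params):
    """One simplified GPT-J-style block.

    The attention and feedforward paths are computed from the same
    layer-normalised input and summed into the residual stream, instead of
    being applied one after the other as in GPT-3. `params` is an
    illustrative dict of weight matrices.
    """
    h = layer_norm(x)                                  # shared input to both paths

    # --- dense (non-sparse) self-attention with rotary position embeddings ---
    q = rotary_embedding(h @ params["w_q"])
    k = rotary_embedding(h @ params["w_k"])
    v = h @ params["w_v"]
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    causal_mask = jnp.tril(jnp.ones_like(scores))      # autoregressive: no peeking ahead
    scores = jnp.where(causal_mask == 1, scores, -1e9)
    attn_out = softmax(scores, axis=-1) @ v @ params["w_o"]

    # --- feedforward path, computed in parallel with attention ---
    ffn_out = gelu(h @ params["w_ff1"]) @ params["w_ff2"]

    return x + attn_out + ffn_out                      # both paths join the residual
```

Because the attention and feedforward paths both read from the same normalised input and only meet again in the residual sum, the two computations are independent of each other, which is the source of the efficiency gain noted above.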
Performance
GPT-J was designed to generate English text from a prompt. It was not designed for translation, for generating text in other languages, or for performing specific tasks without first being fine-tuned. Even so, when neither model is fine-tuned, GPT-J-6B performs almost as well as the 6.7-billion-parameter GPT-3 on a variety of tasks. It even outperforms the 175-billion-parameter GPT-3 on code generation tasks. With fine-tuning, it outperforms an untuned GPT-3 on a number of tasks.
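The following sketch shows what prompt continuation with the untuned model can look like using the Hugging Face transformers library. The checkpoint name EleutherAI/gpt-j-6B, the prompt, and the sampling settings are assumptions made for the example, and loading the 6 billion parameters requires substantial memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face Hub checkpoint for the untuned model.
model_name = "EleutherAI/gpt-j-6B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The Pile is a large, diverse dataset that"
inputs = tokenizer(prompt, return_tensors="pt")

# Autoregressive decoding: the model repeatedly predicts the next token.
output_ids = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,       # sample from the predicted distribution
    temperature=0.8,      # illustrative sampling settings
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```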
Like all large language models, GPT-J is not programmed to give factually accurate information, only to generate text based on the probabilities learned from its training data.
Applications
The untuned GPT-J is available on EleutherAI's website, NVIDIA's Triton Inference Server, and NLP Cloud's website. Cerebras and Amazon Web Services offer services to fine-tune the GPT-J model for company-specific tasks. Graphcore offers both fine-tuning and hosting services for the untuned GPT-J, as well as hosting for the fine-tuned models after they are produced. CoreWeave offers hosting services for both the untuned GPT-J and fine-tuned variants.

In March 2023, Databricks released Dolly, an Apache-licensed, instruction-following model created by fine-tuning GPT-J on the Stanford Alpaca dataset. NovelAI's Sigurd and Genji-JP 6B models are both fine-tuned versions of GPT-J, and NovelAI offers further fine-tuning services to produce and host custom models.
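A minimal sketch of how such instruction fine-tuning might look with the Hugging Face transformers and datasets libraries is shown below. The dataset identifier, prompt template, and training hyperparameters are illustrative assumptions rather than the recipe used for Dolly or NovelAI's models, and fine-tuning a 6-billion-parameter model in practice requires multi-GPU or memory-saving setups.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "EleutherAI/gpt-j-6B"           # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token    # GPT-J's tokenizer has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumed instruction-following dataset with "instruction" and "output" columns.
dataset = load_dataset("tatsu-lab/alpaca", split="train")

def to_features(example):
    # Concatenate instruction and response into a single training sequence.
    text = f"Instruction: {example['instruction']}\nResponse: {example['output']}"
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gpt-j-instruct",         # illustrative hyperparameters
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=1e-5,
    ),
    train_dataset=tokenized,
    # Causal language modelling: labels are the input tokens shifted by one.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```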
EleutherAI has received praise from Cerebras, GPT-3 Demo, NLP Cloud, and Databricks for making the model open-source, and its open-source status is often cited as a major advantage when choosing which model to use.