Stata
Stata is a general-purpose statistical software package developed by StataCorp for data manipulation, visualization, statistics, and automated reporting. It is used by researchers in many fields, including biomedicine, economics, epidemiology, and sociology.
Stata was initially developed by Computing Resource Center in California, and the first version was released in 1985. In 1993, the company moved to College Station, Texas, and was renamed Stata Corporation, now known as StataCorp. A major release in 2003 included a new graphics system and dialog boxes for all commands. Since then, a new version has been released once every two years. The current version is Stata 19, released in April 2025.
Technical overview and terminology
User interface
Since its creation, Stata has employed an integrated command-line interface. Starting with version 8.0, Stata has included a graphical user interface that uses menus and dialog boxes to give access to many built-in commands. The dataset can be viewed or edited in spreadsheet format. From version 11 on, other commands can be executed while the data browser or editor is open.
Data structure and storage
Until the release of version 16, Stata could open only a single dataset at any one time. Stata allows flexibility in assigning data types to data. Its compress command automatically reassigns data to data types that take up less memory without loss of information. Stata offers integer storage types that occupy only one or two bytes rather than four, and single precision rather than double precision is the default for floating-point numbers.
Stata's proprietary output language is known as SMCL, which stands for Stata Markup and Control Language and is pronounced "smickle".
Stata's data format is always tabular. Stata refers to the columns of tabular data as variables.
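As a rough illustration of the storage types discussed in this section, the following Python sketch picks the narrowest integer type for a list of values and checks whether a number survives storage in single precision. It is not Stata code and not how compress is implemented; the function names are hypothetical, the type ranges are those documented for Stata's byte, int, and long types, and the fallback to double is a simplifying assumption. The real command handles more cases, such as strings.

```python
import struct

# Documented non-missing ranges of Stata's integer storage types, with sizes in bytes.
# The top of each range is reserved for missing-value codes.
INTEGER_TYPES = [
    ("byte", 1, -127, 100),
    ("int", 2, -32_767, 32_740),
    ("long", 4, -2_147_483_647, 2_147_483_620),
]


def smallest_integer_type(values):
    """Return (name, bytes) of the narrowest integer type that holds every value.

    Hypothetical helper: assumes a non-empty list of integers and falls back
    to ("double", 8) when a value does not fit in a long.
    """
    lo, hi = min(values), max(values)
    for name, size, type_min, type_max in INTEGER_TYPES:
        if type_min <= lo and hi <= type_max:
            return name, size
    return "double", 8


def survives_single_precision(x):
    """Return True if x is unchanged after a round trip through a 4-byte float."""
    return struct.unpack("<f", struct.pack("<f", x))[0] == x


print(smallest_integer_type([0, 1, 1, 0]))    # a 0/1 indicator fits in one byte
print(smallest_integer_type([1985, 2025]))    # calendar years need two bytes
print(smallest_integer_type([40_000, -5]))    # larger values need four bytes
print(survives_single_precision(0.5))         # exactly representable
print(survives_single_precision(0.1))         # rounded when stored in four bytes
```

The last two calls show why the single-precision default matters: a value such as 0.1 is rounded when stored in four bytes, so comparing it with a double-precision literal can fail unexpectedly.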
Data format compatibility
Stata can import data in a variety of formats, including ASCII data formats and spreadsheet formats.
Stata's proprietary file formats have changed over time, although not every Stata release includes a new dataset format. Every version of Stata can read all older dataset formats, and can write both the current format and, using the saveold command, the most recent previous dataset format. Thus, the current Stata release can always open datasets that were created with older versions, but older versions cannot read newer format datasets.
Stata can read and write SAS XPORT format datasets natively, using the fdause and fdasave commands.
Some other econometric applications, including gretl, can directly import Stata file formats.
History
The development of Stata began in 1984, initially by William Gould and later by Sean Becketti. The software was intended to compete with statistical programs for personal computers such as SYSTAT and MicroTSP. Written in the C programming language, Stata was released for MS-DOS in 1985 with 44 commands. Since then, versions of Stata have been released for systems running Unix variants like Linux distributions, Windows, and macOS. All Stata files are platform-independent.
| append | dir | infile | plot | spool |
| beep | do | input | query | summarize |
| by | drop | label | regress | tabulate |
| capture | erase | list | rename | test |
| confirm | exit | macro | replace | type |
| convert | expand | merge | run | use |
| correlate | format | modify | save | |
| count | generate | more | set | |
| describe | help | outfile | sort | |
There have been 19 major releases of Stata between 1985 and 2025 and additional code and documentation updates between major releases. In its early years, extra sets of Stata programs were sometimes sold as "kits" or distributed as Support Disks. With the release of Stata 6 in 1999, updates began to be delivered to users via the web.
Hundreds of commands have been added to Stata since its first release. Certain developments have proved to be particularly important and continue to shape the user experience today, including extensibility, platform independence, and the active user community.
Extensibility
The program command was implemented in Stata 1.2, giving users the ability to add their own commands. ado-files followed in Stata 2.1, allowing a user-written program to be automatically loaded into memory. Many user-written ado-files are submitted to the Statistical Software Components Archive hosted by Boston College. StataCorp added an ssc command to allow community-contributed programs to be added directly within Stata. More recent editions of Stata allow users to call Python scripts using commands, and allow Python IDEs like Jupyter Notebooks to import Stata commands. Although Stata does not support R natively, there are user-written extensions to use R scripts in Stata.
User community
A number of important developments were initiated by Stata's active user community. The Stata Technical Bulletin, which often contained user-created commands, was introduced in 1991 and issued six times a year. It was relaunched in 2001 as the peer-reviewed Stata Journal, a quarterly publication containing descriptions of community-contributed commands and tips for the effective use of Stata. In 1994, a listserv began as a hub for users to collaboratively solve coding and technical issues; in 2014, it was converted into a web forum. In 1995, StataCorp began organizing user and developer conferences that meet annually. Only the annual Stata Conference held in the United States is hosted by StataCorp. Other user group meetings are held annually in the United States, the UK, Germany, and Italy, and less frequently in several other countries. Local Stata distributors host User Group meetings in their own countries.
| Version | Release date | Select new or enhanced features |
| 1.0 | January 1985 | |
| 1.1 | February 1985 | |
| 1.2 | May 1985 | keep |
| 1.3 | August 1985 | program |
| 1.4 | August 1986 | infile |
| 1.5 | February 1987 | anova; logit, probit |
| 2.0 | June 1988 | |
| 2.1 | September 1990 | reshape |
| 3.0 | March 1992 | logistic, ologit, oprobit, clogit, mlogit; tobit, cnreg, rreg, qreg, weibull, ereg; epitab; pweights |
| 3.1 | August 1993 | mvreg, sureg, heckman, nlreg, areg, canon; nbreg; ml; codebook |
| 4.0 | January 1995 | xtreg; glm |
| 5.0 | October 1996 | xtgee, xtprobit; prais, newey, intreg; fracpoly; st extended |
| 6.0 | January 1999 | ml; arima, arch; st rewritten |
| 7.0 | December 2000 | frailty; xtabond; nlogit; roc |
| 8.0 | January 2003 | manova |
| 8.1 | July 2003 | ml |
| 8.2 | October 2003 | |
| 9.0 | April 2005 | |
| 9.1 | September 2005 | |
| 9.2 | April 2006 | |
| 10.0 | June 2007 | |
| 10.1 | August 2008 | |
| 11.0 | July 2009 | margins postestimation command |
| 11.1 | June 2010 | |
| 11.2 | March 2011 | |
| 12.0 | July 2011 | |
| 12.1 | January 2012 | |
| 13.0 | June 2013 | |
| 13.1 | October 2013 | |
| 14.0 | April 2015 | |
| 14.1 | October 2015 | |
| 14.2 | September 2016 | |
| 15.0 | June 2017 | |
| 15.1 | November 2017 | |
| 16.0 | June 2019 | |
| 16.1 | February 2020 | |
| 17.0 | April 2021 | tables command |
| 18.0 | April 2023 | |