Arquivo.pt


Arquivo.pt, formerly known as the Portuguese Web Archive, is a web archive that preserves Web content dating back to 1996. It is a service of the Fundação para a Ciência e Tecnologia (FCT) and was founded at the Fundação para a Computação Científica Nacional (FCCN) on 8 November 2007.
Arquivo.pt regularly collects the websites that make up the Portuguese Web, that is, all websites under the .pt top-level domain, as well as websites of national interest. The preserved content becomes available to any user on the Arquivo.pt website one year after its collection.
As of March 2025, Arquivo.pt stores over 21 billion webpages from 47 million websites, totaling 1.4 petabytes of data.

History

The idea of archiving the Portuguese Web originated in 2001 with a project developed by the XLDB research group at the Faculty of Sciences of the University of Lisbon and supported by FCCN. That project collected about 57 million pieces of content, mainly textual.
On 8 November 2007, the Portuguese Web Archive project was created at FCCN, drawing on the resources and skills acquired in the previous project. The project was led by Daniel Coelho Gomes from 2007 to 2025. At the beginning of 2008, the project team performed its first crawl of .pt websites. The project ran for two years and has since become a permanent service of FCT.

Services

Search and access

Arquivo.pt provides a tool for finding archived web pages by URL. It lets users access versions of the same page captured on different dates. Full-text search is also supported.
On 24 March 2021, Arquivo.pt introduced an image search feature, known as Dionisius. This tool allows users to search for images archived from the web, dating back to 1996. Users can find images that are no longer available on the live web and can also locate the original web pages where these images were published.
Archived pages can also be accessed programmatically through APIs, which were introduced in 2012.
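As a rough illustration of programmatic access, the following Python sketch builds a full-text search request URL. The endpoint path and the parameter names (q, from, to, maxItems) are assumptions made for this example and should be checked against the Arquivo.pt API documentation. The sketch only constructs the URL and does not contact the service.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the Arquivo.pt API documentation.
SEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_search_url(query, start=None, end=None, max_items=10):
    """Build a full-text search URL (parameter names are assumptions)."""
    params = {"q": query, "maxItems": max_items}
    if start is not None:
        params["from"] = start  # assumed timestamp format: YYYYMMDDhhmmss
    if end is not None:
        params["to"] = end
    # urlencode percent-encodes values and joins them with "&"
    return SEARCH_ENDPOINT + "?" + urlencode(params)


url = build_search_url("Expo 98", start="19960101000000", end="19991231235959")
print(url)
```

The resulting URL could then be fetched with any HTTP client. The structure of the response is not covered here.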

ArchivePageNow

In 2022, Arquivo.pt launched ArchivePageNow. This feature lets users archive a web page on demand. The archived pages subsequently remain available for search.

Arquivo404

In 2022, Arquivo.pt developed Arquivo404, an algorithm that lets web pages returning a 404 error include a hyperlink to the preserved version of the page at Arquivo.pt.
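The general idea can be sketched in Python as follows. This is not the Arquivo404 implementation. It is a minimal illustration that assumes archived versions are reachable under a replay prefix of the form https://arquivo.pt/wayback/ followed by the original URL. The prefix, the function name archived_link and the link label are hypothetical, and the sketch does not check whether an archived version actually exists.

```python
from html import escape
from urllib.parse import quote

# Assumed replay prefix; not confirmed by this article.
ARCHIVE_PREFIX = "https://arquivo.pt/wayback/"


def archived_link(missing_url, label="View an archived version at Arquivo.pt"):
    """Return an HTML anchor that a 404 page could show for missing_url."""
    # Keep URL delimiters intact, percent-encode spaces and other unsafe characters.
    target = ARCHIVE_PREFIX + quote(missing_url, safe=":/?&=%#")
    # Escape for safe embedding in an HTML attribute and in element text.
    return '<a href="{}">{}</a>'.format(escape(target, quote=True), escape(label))


print(archived_link("http://example.pt/old page.html"))
```

A site's 404 handler could call such a helper with the requested URL and embed the returned anchor in its error page.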

Others

  • - extracts the links contained in documents and archives the corresponding pages
  • - makes available data about the information archived from the web
  • - a collection of websites that are no longer on the live web

Arquivo.pt Awards

The Arquivo.pt Awards have been held since 2018 under the patronage of the President of Portugal and in partnership with the Público newspaper. They recognize the best research works that make use of the features of Arquivo.pt.

Awards and recognitions