List of web archiving initiatives
This article contains a list of web archiving initiatives worldwide. For easier reading, the information is divided into three tables: web archiving initiatives, archived data, and access methods.
These initiatives may use one or more standard web archiving file formats, their own proprietary file formats, or both.
This Wikipedia page was originally generated from the results obtained for the research paper A survey on web archiving initiatives, published by the team at the time.
Archived data
| Name | Archived Contents (million files) | Disk Space Occupied (TB) | Archive Format | TLD/Broad Crawls | Selective Crawls | Comments |
| EU Web Archive | | | WARC | .EU | Y | .EU: 250 websites in the europa.eu domain and subdomains, crawled once per quarter, plus ad hoc crawls at the request of website owners. Status as of February 2019. |
| Australia's Web Archive | 11000 | 600 | WARC | .AU | Y | .AU crawls: 10.15 billion files. Selective crawls: 755 million files. AGWA: 525 million files. |
| Our digital island, a Tasmanian Web Archive | 0.336 | HTTrack | Y | Preserves online content related to Tasmania. ODI has operated since its inception under the assumption that web sites fall within the definition of 'Book' in the Tasmanian Library Act 1984. Thus, no permission to capture from publishers is required. | ||
| Webarchive Austria | 4095 | 164 | ARC | .AT, .wien, .tirol | Y | A copy of the data is stored in a high security data storage unit. |
| Deutsche Nationalbibliothek | | | WARC | .DE | Y | Only one experimental TLD crawl. |
| DILIMAG | 0.03 | 0.996 | ARC | | | The project ran from 2007-03-01 until 2010-12-23. DILIMAG was a project for collecting, describing and archiving digital German literary magazines. |
| Bibliothèque et Archives nationales du Québec | 167 | 31 | ARC/WARC | Y | Harvesting began in 2009. Selective crawls of Quebec websites. | |
| Government of Canada Web Archive | 1750 | 70 | ARC/WARC | .GC.CA | Y | Web archiving at Library and Archives Canada began in 2005 and concentrated on collecting the federal government web presence and capturing the federal elections, the Olympics, and Canadian commemorative events. Thematic web collections of Canadiana research interest have been curated as an ongoing program activity since 2009. |
| Web Information Collection and Preservation - WICP | | | | .GOV.CN | Y | Harvests web pages about events that have a great influence on society, the economy and so on, as well as the sites in the 'gov.cn' domain. |
| Croatian Web Archive | 231 | 13 | Mirror, WARC | .HR | Y | Selective harvesting of over 5000 web resources since 2004. Annual harvesting of the national .hr domain, as well as thematic harvesting, since 2011. All archived content is publicly available via the HAW website. |
| Webarchiv | 9412 | 350 | ARC/WARC | .CZ | Y | Harvesting began in 2001. |
| The Danish web archive | 36000 | 634 | ARC/WARC | .DK | Y | More than 36 billion objects. |
| Estonian Web Archive | 874 | 56 | ARC/WARC | .EE | Y | The archive consists of selective, event and topical crawls since 2010. Whole national domain crawls have been done yearly since 2015. Besides the .ee TLD, Estonia-related web content is harvested from other TLDs such as .eu, .org and .com. |
| Finnish Web Archive | 4300 | 300 | ARC/WARC / .json / .mp4 | .FI, .AX | Y | Also crawls content hosted on machines physically located in Finland, independently from their domain. |
| BnF - Web Legal Deposit | 48 000 | 1 800 | ARC/WARC | .FR + all sites hosted in France | Y | BnF is making copies of all sites in the .FR TLD, as well as all sites hosted and produced in France, ignoring both the Robots exclusion standard and the licenses of the documents. |
| BnL Web-Archive | 543 | 41 | WARC | .LU | Y | The BnL conducts 2 domain crawls per year, as well as event-based and selective crawls. |
| Ina (Institut National de l'Audiovisuel) | 105800 | 2359 | DAFF | Y | As of 2021-03-08 DAFF handles full content deduplication, so the size on disk takes into account compression and deduplication; the equivalent disk storage in compressed ARC format would be approximately 10 PB | |
| E-diaspora (Télécom ParisTech, FMSH) | 1030 | 13 | DAFF | Y | DAFF handles full content deduplication, so the size on disk takes into account compression and deduplication; the equivalent disk storage in compressed ARC format would be approximately 51 TB | |
| Internet Memory Foundation | 180 | WARC | Can be done by partners | Y | Formerly European Archive. Collaborates with Internet Memory Research, which provides the ArchiveTheNet service. Selective crawls and domain crawls; expected to grow to 1 PB in 2012. New datacenter and new crawler in 2012. | |
| Bibliotheksservice-Zentrum Baden-Württemberg | 9 | WARC | Y | Websites of about 20 cities, municipalities and districts, their associated corporations, and state libraries are collected by BSZ on commission within various Archive-It collections. Public access. Data storage: San Francisco, with a backup on Baden-Württemberg storage infrastructure. | ||
| Web archive of the German Bundestag | | | | | Y | German Federal Parliament. Selective. Snapshots of www.bundestag.de and other web presences of the German Bundestag are made at regular intervals or on certain events. These are available in the web archive. |
| Iceland | ||||||
| Palestine Web Archive | | | ARC/WARC | .PS | Y | .PS crawls: pilot crawls. Selective crawls. |
| Web Archiving Project, The National Diet Library, Japan | 12670 | 1313 | WARC | Y | As of March 2023. 15 TB of selective crawls based on permission. Started archiving official institution sites based on legislation from April 2010. | |
| National Library of Korea - OASIS | 24 | Y | Requires consent before archiving. Targets 56,401 websites. Web archiving is managed under digital resource management systems. In 2011 the web archiving system will be rebuilt. | |||
| Koninklijke Bibliotheek | 407 | 36 | WARC | Y | Selective crawls of ca. 20,400 sites | |
| New Zealand Web Archive | 4300 | 260 | ARC/WARC | .NZ | Y | .NZ crawls: 4+ billion URLs. Selective crawls: 33,500 websites. Legal deposit covers born-digital material. |
| The National Library of Norway | ||||||
| Arquivo.pt | 21 118 | 1 455 | ARC/WARC | Focused on .PT but also other domains | Y | .PT domain crawls and integration of external collections since 2007, and daily crawls of a selection of online publications since 2010. Selective crawls related to national events such as elections, or to international science-related content such as websites about Research & Development projects funded by the European Union. |
| Web archive of Cacak | 0.255 | 0.013 | HTTrack | Y | Selective crawls of 130 sites related to the city of Cacak. Collaboration with the Webarchiv team from the National Library of the Czech Republic. | |
| Web Archive Singapore | WARC | .SG | Y | Selective crawls of Singapore-related sites and .SG domain archiving. | ||
| Digital Resources | 1 921 | 89 | WARC | .SK + other TLDs with Slovak content | Y | Harvesting of the Slovak web started in 2015. Since then ULB has performed six full-domain harvests, multiple selective crawls and thematic crawls. |
| Slovenian Web Archive | 30 | WARC | Selective crawls since 2007, national domain crawls since 2014. | |||
| Archivo de la Web Española | 2539 | 117 | WARC | .ES | Y | .ES domain crawls: 2,421 million files, in collaboration with Internet Archive. Selective crawls: 119 million files. About 30 news media sites crawled every day. Not launched publicly yet. |
| PADICAT: The Web Archive of Catalonia | 620 | 32.5 | ARC/WARC | .CAT | Y | In accordance with the general trend, the archive model is a hybrid system consisting of: mass compilation of open-access digital resources published on the Internet; systematic archiving of the website output of Catalan organizations; and fostering of lines of research through themed integration of the digital resources pertaining to specific events in Catalan public life. |
| 21 | 0.8 | ARC | Y | |||
| Sweden | 5700 | 360 | Multipart MIME | .se, Swedish .nu, and geolocation for other TLDs | Y | Bulk crawls approximately twice a year. Selective crawls of about 140 newspapers every day. |
| Aleph Archives | >10000000 | >25 | Native HTML, WARC, WARC2, ARC and HTTrack to WARC migration tools | Y | Enterprise-grade automatic web archiving platform for online capture and preservation. Supports eDiscovery. Aimed at corporations, institutions and agencies seeking to capture, preserve and leverage their web content (dynamic websites, wikis, social media, forums, comments, disclaimers, and ads) for compliance, marketing or pure preservation purposes. | |
| Web Archive Switzerland | 80 | ARC, WARC | Y | Mainly selected .ch crawls | ||
| NTU Web Archiving System, NTUWAS | 200 | 14 | Y | |||
| Web Archive Taiwan | ||||||
| The UK Web Archive | 20.6 | WARC | Y | Selective crawls with previous permission. Now also conducting wholesale UK domain-scale crawls under Non-Print Legal Deposit legislation, enacted April 2013. This content will only be available on premises controlled by one of the six legal deposit libraries. The UKWA is a spin-off from the UK Web Archiving Consortium that ended in 2007. | ||
| Hanzo Archives | 7 | WARC | Y | Commercial web archiving services and appliances, for government and corporations whose compliance or legal obligations / needs extend to their websites, intranet, and social media. Many 'dark' archives across Europe and USA. | ||
| UK Government Web Archive | 1000+ | 150 | ARC; WARC after July 2017 | | | From 2003 to 2005 the Internet Archive undertook the technical side of web archiving on behalf of the UK Government Web Archive. From 2005 to July 2017 the technical side of the web archiving service was contracted out to the Internet Memory Foundation. In July 2017 MirrorWeb took over the contract and moved the entire archive to the cloud. The UK Government Web Archive was part of the UK Web Archiving Consortium from 2004 to 2009. |
| Internet Archive | 690000 | 21000 | | Worldwide | Y | Provides the Archive-It service and leads the Archive-access project. The collection is mirrored at the Bibliotheca Alexandrina in Egypt. |
| Columbia University Libraries Web Resources Collection Program | 723 | 50.4 | ARC/WARC | Y | Selective crawls with permission or notification. Thematic collections in: Human rights; New York City built environment; New York City religions; Resistance. Also capture Columbia University web domain. | |
| North Carolina State Government Web Site Archives | 51.5 | 3.8 | WARC | Y | ||
| Latin American Web Archiving Project | Y | |||||
| Web Archiving Project for the Pacific Islands | 5.5 | ARC/WARC | Y | Includes sites of 18 countries. | ||
| Library of Congress Web Archives | 7741 | 420 | ARC/WARC | Y | Formerly MINERVA. Selective crawls with notification and permission; primarily event and thematic collections. | |
| Harvard University Library: the Web Archive Collection Service | 19 | 0.661 | ARC | Y | Selective crawls with no previous authorization. | |
| Web Archiving Service from California Digital Library | 216 | 25.2 | ARC/WARC | Can be done by partners | Y | Provides Web Archiving Service to partners worldwide. Was developed at the California Digital Library. |
| Bentley Historical Library Web Archives | 34.5 | 2.6 | ARC/WARC | Y | WAS service since 2010. | |
| University of Texas at San Antonio Web Archives | 26 | 1.135 | ARC/WARC | Y | University administration, faculty and student sites; as well as selective captures on San Antonio and South Texas subject areas, including San Antonio organizations; San Antonio Online Journals and Blogs; Tejano and Conjunto music; Gay, Lesbian, Bisexual, Transgender and Queer Related Web sites in Texas, San Antonio and the Rio Grande Valley; Immigration/Borderlands; Mexican Cooking Blogs; San Antonio Restaurants; Renewable Energy in Texas; Rio Grande Valley Organizations; and Rio Grande Watershed and Texas Water Issues. | |
| AUEB Web Archive | 3 | WARC | aueb.gr | N | The amount of data crawled from the domain aueb.gr ranges between 10 GB and 14.9 GB. The data is stored on disk compressed and requires between 8.8 GB and 9.7 GB, resulting in space savings between 12% and 35%. For a new crawl, only the web pages that changed since the previous crawl need to be stored on disk. For example, 13.1 GB was crawled from the domain aueb.gr but only 1.6 GB was stored on disk, resulting in space savings of 88%. | |
| World Bank Web Archives | 0.143 | HTTrack | no, so far | Y | 450 sites with historical or research value have been harvested since 2007, each archived before being taken offline or before a major upgrade. | |
| University of North Texas CyberCemetery | 0.887 | WARC | .gov | Y | ||
| Bibliotheca Alexandrina's Internet Archive | 80000 | 1000 | ARC/WARC | Egyptian news and politics | Y | |
| York University Digital Library | 0.435 | WARC | yorku.ca + faculty requests | Y | ||
| Netherlands Institute for Sound and Vision web archive | | | ARC/WARC | | Y | Among other audiovisual heritage, Sound and Vision is tasked with archiving programmes broadcast by Dutch public broadcasters. An important part of the web archive therefore consists of public broadcasters' websites related to these programmes. Websites that have no direct link to the collection but are of broader media-historical interest, such as websites of commercial broadcasters, are also archived. |
| Kentucky Department for Libraries and Archives | 3 | 0.3007 | WARC | Y | ||
| University of California, San Francisco Library | 12.5 | 0.587 | ARC/WARC | Y | Websites requested by staff and faculty, plus a growing list that attempts to capture all UCSF websites as comprehensively as possible. | |
| Ivy Plus Libraries Confederation | 347 | 16 | ARC/WARC | Y | Selective crawls with notification. Thematic collections in politics and political protests, architecture, composers, design, gaming, geology, webcomics, documentary films, art, religion, sexuality, climate change, and more. | |
| Malaysian Government Web Archive | 10 | WARC | .GOV.MY | Y | Crawls Malaysian public sector websites only. Viewing is by subject, i.e. administration, economy, security, and social. | |
| National Library of Medicine | 122 | 9.1 | WARC | Y | - | |
| Smithsonian Libraries and Archives | 10 | WARC | Y | |||
| Common Crawl | 300 000 | 10 000 | ARC/WARC | worldwide | Y | Additional data products such as a graph of the web, and parquet indexes of urls and hosts. |
| 6700 | 1120 | WARC, FFV1, FLAC, JSONL | Multiple | Y | Global archive spanning user-generated content, obsolete web platforms, and interface artifacts. Indexes include defunct CMS exports, blog comment trees, forum structures, and visual UI states. Selective crawls emphasize digital ephemera recovery and platform shutdown captures. Data verified across five mirrored nodes. Status: Active. |
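The space savings quoted in the AUEB Web Archive row above follow from simple arithmetic: savings = 1 - stored size / crawled size. The sketch below reproduces those percentages. The function name `space_savings` is hypothetical, and pairing the smallest crawl (10 GB) with the smallest stored size (8.8 GB), and the largest with the largest, is an assumption about how the quoted ranges relate.

```python
def space_savings(crawled_gb: float, stored_gb: float) -> float:
    """Return the fraction of disk space saved relative to the crawled size."""
    if crawled_gb <= 0:
        raise ValueError("crawled size must be positive")
    return 1.0 - stored_gb / crawled_gb


# Figures taken from the AUEB Web Archive row; the pairing of the
# range endpoints is an assumption, not stated in the table.
examples = [
    ("compression, smallest crawl", 10.0, 8.8),
    ("compression, largest crawl", 14.9, 9.7),
    ("storing only changed pages", 13.1, 1.6),
]

for label, crawled, stored in examples:
    # Format as a whole-number percentage, as in the table comment.
    print(f"{label}: {space_savings(crawled, stored):.0%}")
```

Under these assumptions the three cases come out at roughly 12%, 35% and 88%, matching the figures in the row.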
Access methods
| Name | URL history | Meta-data search | Full-text search | Memento Compliance | Comments |
| EU Web Archive | Y | Y | Y | Freely accessible to all via | |
| Australia's Web Archive | Y | Y | Y | No | Selected sites are publicly available through a directory structure. Domain harvests are not. The PANDORA Archive is indexed and searchable through the NLA's single search service Trove. The Australian Domain Harvests are full-text indexed but are not currently publicly available. The Australian Government Web Archive is searchable by URL and full-text indexes through its portal. |
| Our digital island, a Tasmanian Web Archive | Y | Y | N | No | Presents thumbnails generated through Html To Image supplemented in HTTrack. Information is organized in directory: A-Z Subject listing, A-Z Title listing. |
| Webarchive Austria | Y | N | Y | No | Versions can be searched either by URL or in full text. The websites are only accessible on special terminals at the Austrian National Library. A bookmarking feature allows users to save versions online and recall them at the library's web archive terminals. |
| Deutsche Nationalbibliothek | Y | Y | Y | No | Only accessible in the reading rooms of the German National Library. The metadata is included in the publicly accessible library catalogue. |
| DILIMAG | Y | Y | N | No | Metadata are publicly available; access to the archived versions is free or restricted depending on the rights holders' agreement. Full-text search is implemented in the new version. |
| Bibliothèque et Archives nationales du Québec | Y | N | N | No | Provides access according to partner policy. |
| Government of Canada Web Archive | Y | Y | Y | Proxy | Library and Archives Canada makes its federal government web archives publicly accessible. Indices are available for discovering Canadian federal web resources alphabetically by authoring organization and by URL. Full text indexing is based on Lucene. |
| Web Information Collection and Preservation - WICP | Y | No | Archive content is only available on the intranet of the National Library of China. Some collections are publicly available, with metadata search, and are browsable by collection. | ||
| Croatian Web Archive | Y | Y | Y | Proxy | Full open access. |
| Webarchiv | Y | N | N | N | Due to copyright restrictions, only a limited number of archived websites for which agreements were signed with the publishers is available online. For other resources you can find out whether a given website was archived and the number of harvested versions. Unlimited access to all resources in Webarchiv is available from public terminals in the National Library. |
| Netarkivet.dk | Y | N | Y | No | Online access granted only to researchers through a Citrix login to free text search based on Solr and a proxy solution that accesses an archive through the Wayback. It has established a framework for running batch jobs with the possibility of data mining. |
| Estonian Web Archive | Y | Y | N | No | Public access to archived content is allowed only with the permission of the copyright owner. The full archive is accessible only to the web archive personnel. |
| Finnish Web Archive | Y | N | 15% of material. | No | URL search but on-site access to content. Full-text search is available to 15% of material. |
| BnF - Web Legal Deposit | Y | N | 15% of the collection | No | Accessible to authorized users through the reading rooms of the BnF Research Library located in Paris and Avignon and in partner libraries in regions and overseas territories. Wayback was customized and interface was translated to French. Full Text search only available on specific collections. Builds special collection galleries based on a selection from the archive on a given topic. |
| Ina (Institut National de l'Audiovisuel) | Y | Y | Y | No | Full-text indexing is based on Lucene. To accommodate results from frequent crawls, clustering is used to handle similar versions of pages. |
| E-diaspora | Y | N | N | No | 1381 sites are currently crawled to build an archive on migrants usage of the web, social studies researchers have launched a long run project based on this archive is handling crawls and storage |
| Internet Memory Foundation | Y | Y | Y | No | Provides access and search services according to partner policy. |
| Bibliotheksservice-Zentrum Baden-Württemberg | Y | Y | Y | Native | Archived websites accessible via Archive-It; integrated in the SWB union catalog. Full open access for major part of snapshots, some restricted by IP. |
| Web archive of the German Bundestag | Y | N | N | No | The web archive itself consists of snapshots of www.bundestag.de and other websites. Navigation is possible by clicking on the years. |
| Iceland | |||||
| Palestine Web Archive | N | Y | N | No | Still in development and pilot phase. |
| Web Archiving Project, The National Diet Library, Japan | Y | Y | Y | Native | All the archived websites are available on the premises. 85% of them are also accessible on the Internet with the permission of webmasters. |
| National Library of Korea - OASIS | Y | Y | Y | No | 100% of the archive is indexed. Enables search by topic classification. Search available. |
| Koninklijke Bibliotheek | Y | N | N | No | The web archive is accessible on terminals in the KB reading rooms to full members. |
| New Zealand Web Archive | Y | Y | Y | Native | Domain harvests: available to selected staff using Pywb and limited to URL searches. Selective harvests: each website is described in the catalogue and can be viewed by the public via the Internet by clicking on the link to the archived copy. A small subset of the selective harvests are accessible using full-text search. |
| The National Library of Norway | N | Y | No | Sites are integrated in the Catalog. Left bar enables facet navigation with drill-down. | |
| Arquivo.pt - the Portuguese web-archive | Y | Y | Y | A . is also supported. Archived data can be mined through an Hadoop platform or . | |
| Web archive of Cacak | N | N | N | No | Plans to develop a search engine in the future. One drawback of HTTrack is that it renames files during archiving, so the original structure of the website is lost, as well as the file names. |
| Y | Y | Y | No | The collection is viewable at the National Library, Singapore with selected content cleared by copyright owners available online. | |
| Digital Resources | Y | Y | N | No | It is possible to find out whether a website was archived and how many harvested versions exist. Due to the copyright restrictions only a limited number of archived websites is publicly available. The access to other archived resources is available locally in the University Library in Bratislava. |
| Slovenian Web Archive | Y | N | Y | No | The archive of selective crawls is publicly accessible. Use is possible by browsing and full-text search. National domain crawls are not accessible yet but will be in the future. |
| Archivo de la Web Española | Y | Y | Y | No | Plan to provide access on-site in the short-medium term. |
| PADICAT: The Web Archive of Catalonia | Y | Y | Y | No | Full open access. |
| Basque Digital Heritage Archive | Y | Y | Y | No | |
| Sweden | Y | N | N | No | Public access through dedicated machines in the library building. |
| Aleph Archives | Y | Y | Y | No | Enterprise-grade automatic web archiving platform for online capture and preservation. Supports eDiscovery. Aimed at corporations, institutions and agencies seeking to capture, preserve and leverage their web content (dynamic websites, wikis, social media, forums, comments, disclaimers, and ads) for compliance, marketing or pure preservation purposes. |
| Web Archive Switzerland | Y | Y | Y | No | Web Archive Switzerland is the collection of the Swiss National Library containing websites with a bearing on Switzerland. It has been integrated into e-Helvetica, the access system of the Swiss National Library, which gives access to the entire digital collection. Full-text search is therefore possible for part of the Web Archive. The archived versions of websites can only be viewed in the reading rooms of the Swiss National Library and of the partner libraries that help build the collection of Swiss websites, but the metadata of the archived versions can be viewed from anywhere. |
| NTU Web Archiving System, NTUWAS | Y | Y | Y | No | Presents page thumbnails, archived pages mapped to geographical locations. |
| Web Archive Taiwan | Y | Y | Y | No | |
| PageFreezer | Y | Y | Y | No | Enterprise Class On Demand service to archive and replay websites, blogs, Ajax, Flash, video, audio & social media for litigation protection, eDiscovery and regulatory compliance with FDA, FINRA, FSA, SEC, SOX, Federal Rules of Evidence and records management laws. Used by government agencies and public listed corporations in Pharmaceutical, Food, Finance, Healthcare and Retail industry. |
| The UK Web Archive | Y | Y | N | ||
| Hanzo Archives | Y | Y | Y | No | Commercial web archiving services and appliances. Access includes full-text search, annotations, redaction, URL/History, archive policy and temporal browsing, and configurable metadata schema for advanced e-discovery applications. Used in government and corporations whose compliance or legal obligations / needs extend to their websites, intranet, and social media. Many 'dark' archives across Europe and USA. |
| UK Government Web Archive | Y | Y | Y | Full text search is operational on the UK Government Web Archive. Users can browse the collection using a full A-Z list of all sites | |
| Y | Y | Y | Full text search is operational on the EU Exit Web Archive | ||
| Internet Archive | Y | Y | Y | URL history is available for all archived data. Metadata and full-text search are available only for selected crawls. Until 2002 it had a mining platform for research composed of Alexa Shell Perl Tools av_tools and the p2 platform for parallel processing. It was replaced by a simpler, direct access method that enables automatic access to files but provides no platform for processing. | |
| Columbia University Libraries Web Resources Collection Program | Y | Y | Y | No | Accessible through the Archive-It service. |
| North Carolina State Government Web Site Archives | Y | Y | Y | No | Accessible through the Archive-It service. |
| Latin American Web Archiving Project | Y | Y | Y | No | Content can be accessed via full-text search, or by browsing by country or by specialized sample collection. |
| Web Archiving Project for the Pacific Islands | Y | Y | Y | No | Supported by the Archive-It service. |
| Library of Congress Web Archives | Y | Y | N | Proxy | Access provided via . Records in MODS format. |
| Harvard University Library: the Web Archive Collection Service | Y | Y | Y | No | |
| Web Archiving Service from California Digital Library | Y | Y | Y | No | Access for private study, scholarship and research. Most archives built with WAS have not yet been published because it is up to the partners to decide whether to provide access. There are 16 partners using the service; they have created over 80 web archives, of which only 30 are publicly accessible. NutchWAX performance did not permit full archive search. An upcoming transition to Solr will permit both full-archive and collection-specific full-text search. |
| Bentley Historical Library Web Archives | Y | Y | Y | No | Powered by the WAS from the California Digital Library. Access is public but usage is restricted for private study, scholarship and research. |
| University of Texas at San Antonio Web Archives | Y | Y | Y | Native | Accessible through the Archive-It service and the Texas Archival Repositories Online database. |
| AUEB Web Archive | Y | Y | Y | No | |
| World Bank Web Archives | Y | Y | Y | No | URL history provided via open access to collection via standard web browser. Full text search is only available within each individual site. Search on metadata is available via advanced search within Web Archives collection. |
| University of North Texas CyberCemetery | N | Y | Y | No | |
| Tamiment Library and Robert F. Wagner Labor Archives at New York University | Y | Y | Y | No | Access is provided through the WAS service as well as through finding aids that are searchable through NYU's finding aids portal. |
| York University Digital Library | Y | Y | Y | ||
| Netherlands Institute for Sound and Vision web archive | Y | Y | N | Selected sites for which agreements have been made are publicly available. Full text indexing is done with Elasticsearch, the front-end is built in Drupal. | |
| Kentucky Department for Libraries and Archives | Y | Y | Y | No | Full open access |
| University of California, San Francisco Library | Y | Y | Y | Native | Both capture of and access to archived content are provided by the Archive-It service, so all capabilities are the same as for Archive-It. |
| Ivy Plus Libraries | Y | Y | Y | No | Accessible through Archive-It service. |
| Malaysian Government Web Archive | Y | Y | Y | No | Open Access |
| National Library of Medicine | Y | Y | Y | Access is provided through Archive-It | |
| Smithsonian Libraries and Archives | Y | Y | Y | Access is provided through Archive-It | |
| Common Crawl | Y | Y | N | No | In addition to direct download, most of the archive is also available in the Internet Archive Wayback. |
| Y | Y | Y | Native | Full-text index across legacy markup, archived code fragments, and emulated interface states. Supports URL history reconstruction and metadata-based query expansion. Public search tools include URL timeline view and UI emulator access. Complies with the Decentralized Archival Ethics Accord. |