WP:WEBARCHIVES

This is an engineering document for those building bots needing help with URL formats, etc. See Help:Archiving a source for info about using these archive services.

List of known web archive services in-use on English Wikipedia. Sorted roughly by number of uses from most to least. The Wayback Machine is about 80% of the total. Data initially compiled by User:GreenC as of March 2017. Updates and corrections welcome.

Archive services[edit]

Internet Archive Wayback Machine

Article: Wayback Machine
Domain: archive.org, waybackmachine.org
Launched: 2001
Date range: 1996-
Hostname: <none>, web, wayback, liveweb, www, www.web, classic-web, web-beta, replay, replay.web, web.wayback
Path: <none>, web
Timestamp: Number 1 digit; 4–14 digits. Or "*". Or "?". Or combination. May also contain trailing chars like "re_" for (?), "if_" for frames and "im_" for images. If timestamp missing returns best available page.
Examples:

Oldest:

Newest:

http://web.archive.org/web/2/http..

Index:

http://web.archive.org/web/*/http..

Submit page:

Interval between manual captures: 45 minutes.
Prefix searching (except after the question mark in query strings)

Archive.Today

Article: archive.today
Domain: .today, .is, .fo, .li, .vn, .md, .ph
Launched: 2012
Hostname: <none>, www
Path: <none>
Timestamp: 4–14 digits; or digits + characters (see example)
Examples:

http://archive.is/20130101/http..
http://archive.is/2013.04.17-12:08:20/http..
http://archive.is/http.. (index page)

Submit page: http://archive.today/?run=1&url=... (requires JavaScript)
Interval between manual captures: 1 week.
Prefix searching with preview screenshots
Page history with preview screenshots

Archive.Today represents captured pages as a static snapshot, rendered by the Archive.Today server, and uses a fixed-width layout. Page resources such as JavaScript and CSS files are not retained separately. For example, styling from a separate CSS file is converted to inline CSS styling, embedded in the HTML source code.

Archived pages are initially served through their short URL format, an identifier with five case-sensitive alphanumerical characters and four characters on early captures from 2012. To obtain the long URL format with time stamp and the source URL, click "share" in the top menu or append "/share" to the URL. The full URL is listed in the window.

If a redirect page is saved, Archive.Today stores both the URL of the redirect page and the URL of the redirect target. The archived page can be found by entering either URL.

Additional restrictions

As of 2023, copies of Archive.org pages can only be saved once. This restriction applies as well to the digital library (archive.org/details/) which is subject to change, not only the Wayback Machine where pages are, besides infrequent exclusions, not subject to change anyway.

If a "Welcome to nginx!" page appears, it apparently either means the user has hit a rate limit or the site is doing maintenance work.

Web Citation (WebCite)

Article: WebCite
Domain: webcitation.org

Deprecated—no longer accepting new archive requests

Hostname: <none>, www
Path: base62ID, query, cache, getfile.php, <number>
Timestamp: None. Uses &date=2012-06-01+21:40:03 in query?url ; the short ID is base62 which converts to unix time
Examples:

National Archives UK

Domain: nationalarchives.gov.uk
Hostname: webarchive, yourarchives
Path: <none>
Timestamp: 4–14 digits
Examples:

Australian Web Archive

Article: Australian Web Archive (Trove)
Domain: nla.gov.au
Hostname: webarchive
Path: see examples. First-level path can contain "awa" or "wayback". Second-level path might contain a pandora.nla.gov.au URL with a third-level destination URL which may or may not have a scheme (http://). Or the second-level path might be the final destination URL. A special archive index page URL is also available as "/tep/"
Timestamp: Two types (20120727-0512, 20120326012340)
Examples:

Note: The Australian Web Archive incorporates the Pandora archive as well as the Australian Government Web Archive and the National Library of Australia's archive of the .au domain.

Note: No memento access

NLA Australia (deprecated)

Deprecated. Integrated into Australian Web Archive (Trove) above.

Domain: nla.gov.au
Hostname: pandora, trove, tep, webarchive, content.webarchive
Path: see examples. The /pan/ regex should be /pan/[0-9]{4,7}/
Timestamp: Three types (20120727-0512, S2000-Dec-5, 20120326012340)
Examples:

Note: Not to be confused with non-webarchive URLs that appear similar:

Note: No memento access

Ghost Archive

Domain: ghostarchive.org
Launched: ~2021
Hostname: [none]
Path: archive, varchive/<YouTube_video_ID>, iarchive/<Instagram_username>[/<Instagram_post_ID (optional)>]
Timestamp: 4 to 14 digits
Examples:

Note: to convert short-form to long:

For regular web pages:

https://ghostarchive.org/longurl/o9UcZ -> https://ghostarchive.org/archive/20210728022510/https://rms-support-letter.github.io/

For video pages, e.g. YouTube:

http://ghostarchive.org/vlongurl/UhCiGY75wVw -> https://ghostarchive.org/varchive/youtube/20100711020000/UhCiGY75wVw

To find the earliest and latest archive available use a timestamp of "1990" or "3000" e.g.

https://ghostarchive.org/archive/1990/https://rms-support-letter.github.io/ will find the earliest archived copy of that webpage, while
https://ghostarchive.org/archive/3000/https://rms-support-letter.github.io/ will find the latest.

The /archive/ path long form will work for all types of archives, for example https://ghostarchive.org/archive/20210728022510/https://youtube.com/watch?v=UhCiGY75wVw will redirect to the video, and https://ghostarchive.org/archive/20210728022510/https://www.instagram.com/p/BMUccRPg4Ow/ will redirect to the image.

Prefix searching with preview screenshots
Page history with preview screenshots

Ghost Archive uses the WARC ("webarchive") format to store saved pages, meaning the verbatim content of the page resources can be recreated. When opened, Ghost Archive uses the Webrecorder system to simulate the page as realistically as possible. Alternatively, the page can be viewed in "noscript", meaning as static HTML in its finished rendered state. This mode does not require JavaScript and is compatible with older browsers and loads faster, however some page features that rely on JavaScript such as pagination and collapsible menus are not available.

Due to Instagram's strict rate limiting, archival of Instagram profiles might fail and result in a blank page. If the archival of a YouTube video failed, an "Archiving error" page is displayed and the archival of the same video can not be retried.

Along with YouTube videos, their metadata is saved: publication date, descripption, and the URL of the channel in the /@ or /c format, whichever is available.

If the archived page redirects to a different URL, only the target URL is displayed. This means the archived page can not be opened by entering the URL of the redirecting page.

Megalodon.jp

Site name: web gyotaku
Date range: 2007-
Article: Megalodon (website)
Domain: megalodon.jp
Examples: https://megalodon.jp/2023-0522-0234-30/https://gstreamer.freedesktop.org:443/download/
No prefix searching

Similar to Archive.Today, Megalodon.jp represents archived pages as a static HTML snapshot. However, pictures are converted into BASE64 data: URLs inside the resulting HTML data, and there is no fixed width like Archive.Today .

Megalodon lets the user decide whether to save the desktop or mobile version of a page, meaning the version that appears to desktop computer and laptop users, or to smartphone users.

Using https://megalodon.jp/(full URL) (example: https://megalodon.jp/https://gstreamer.freedesktop.org:443/download/ ) can check if Megalodon archived any copy of a particular URL. http and https are treated separately.

If the archived page is a redirect to a different URL, only the URL prior to the redirect is saved. In that case, the archived page can not be opened by entering the target URL (example).

FreezePage

Domain: freezepage.com
Hostname: <none>, www
Path: <none>
Timestamp: <none> (only available via web scrape)
Examples:

Note: If the account ID which created the snapshot expires for lack of activity (no login to freezepage), the snapshot is deleted from freezepage.com

Note: No memento access

Library of Congress

Domain: loc.gov
Hostname: webarchive
Path: all, lcwa####
Timestamp: 4–14 digits
Examples:

Arquivo.pt (Portugal)

Domain: arquivo.pt
Hostname: <none>
Path: wayback, wayback/wayback, noFrame/replay
Timestamp: 4–14 digits ... might contain "mp_" see example
Examples:

Stanford University web archive

Domain: stanford.edu
Hostname: swap, sul-swap-prod
Path: <none>
Timestamp: 4–14 digits
Examples:

https://sul-swap-prod.stanford.edu/19940102000000/http://slacvm.slac.stanford.edu/FIND/slac.html

Archive-It

Domain: archive-it.org
Hostname: wayback
Path: "all", a 3–5 digit number; "org-" followed by a 3–4 digit number
Timestamp: 4–14 digits; "0" or "1" for oldest; "2" for newest; "*" for index
Examples:

Oldest:

Newest:

Index:

BibAlex

Domain: bibalex.org:80
Hostname: web.archive, web.petabox
Path: web
Timestamp: 4–14 digits
Examples:

WikiWix

Domain: wikiwix.com
Hostname: archive
Path: cache
Timestamp: 4–14 digits
Examples:

Note: Does not support Memento

Note: API access added in March 2018. By appending &apiresponse=1 to the end of the URL. (https://archive.wikiwix.com/cache/?url=http://www.linterweb.fr&apiresponse=1). This may require encoding of any other & in the url= section

Note: Supports &title argument at end of URL not part of the source URL (similar to &apiresponse). Gives the name of the Wikipedia article the link is being used in (optional).

National Archives US

Domain: webharvest.gov
Hostname: <none>
Path: <variable>
Timestamp: 4–14 digits
Examples:

http://webharvest.gov/peth04/20041022004143/http://www.ftc.gov/os/statutes/textile/alerts/dryclean

National Archives Iceland

Domain: vefsafn.is
Hostname: wayback
Path: wayback
Timestamp: 4–14 digits
Examples:

http://wayback.vefsafn.is/wayback/20110318105639/http://www.twitter.com/yagirldwoods

Europa Archives, Ireland (deprecated)

Deceased. In May 2018 all archives were moved to collections.internetmemory.org then as of September 2018, all archives were moved again to Archive-It [1]

Domain: europarchive.org
Hostname: collection
Path: nli
Timestamp: 4–14 digits
Move example:

Perma CC

Domain: perma-archives.org, perma.cc
Hostname: <none>
Path: <none>, warc
Timestamp: 4–14 digits for perma-archives.org, or snapshot ID
Examples:

Proni Web Archives (deprecated)

Deceased. As of October 2018, all archives moved to Archive-It [2]

Domain: proni.gov.uk
Hostname: webarchive
Path: <none>
Timestamp: 4–14 digits
Examples:

http://webarchive.proni.gov.uk/20111213123846/http

Move example:

Original: http://webarchive.proni.gov.uk/20100218151844/http://www.berr.gov.uk/
New: http://wayback.archive-it.org/11112/20100218151844/http://www.berr.gov.uk/

Parliament UK

Domain: parliament.uk
Hostname: webarchive
Path: <none>
Timestamp: 4–14 digits
Examples:

http://webarchive.parliament.uk/20160204060058tf_/http://www.parliament.uk/about/living-heritage/building/palace/big-ben/

UK Web Archive (British Library)

Domain: webarchive.org.uk
Hostname: www
Path: wayback/archive
Timestamp: 4–14 digits with possibility of "mp_" at end
Examples:

Libraries and Archives Canada (deprecated)

Deceased. As of May 2018 all archives moved to webarchive.bac-lac.gc.ca [3]

Domain: collectionscanada.gc.ca
Hostname: www
Path: archivesweb, webarchives
Timestamp: 4–14 digits
Examples:

http://www.collectionscanada.gc.ca/webarchives/20061104084225/http://broadband.gc.ca/maps/province.html?prov=48
http://www.collectionscanada.gc.ca/archivesweb/20060209004933/http

Note: Not to be confused with other close URL variants. Only capture "/webarchives/" or "/archivesweb/"
Move example:

Original: http://www.collectionscanada.gc.ca/webarchives/20061104084225/http://broadband.gc.ca/maps/province.html?prov=48
New: http://webarchive.bac-lac.gc.ca:8080/wayback/20061104084225/http://broadband.gc.ca/maps/province.html?prov=48

Libraries and Archives Canada (www.bac-lac.gc.ca)

As of September 2022 the new website is https://library-archives.canada.ca/eng - most of the www.bac-lac.gc.ca may no longer be available but some links still work see: [4]

Domain: bac-lac.gc.ca:8080
Hostname: webarchive, www
Path: wayback
Timestamp: 4–14 digits
Examples:

http://webarchive.bac-lac.gc.ca:8080/wayback/20051228174058/http://nationalatlas.gov/

Note: Formerly collectionscanada.gc.ca see above. As of September 2022, links from collectioncanada.gc.ca were entirely removed by LAC.[5]

Catalonian Archive

Domain: padi.cat(:8080)?
Hostname: www, (none)
Path: wayback
Timestamp: 4–14 digits
Examples:

Web Archives Singapore

Domain: nlb.gov.sg
Hostname: eresources
Path: webarchives/wayback
Timestamp: 4–14 digits
Examples:

Note: Not to be confused with other close URL variants. Only capture "/webarchives/wayback/"

Slovenian Archives (Spletni)

Domain: nuk.uni-lj.si:8080
Hostname: nukrobi2 (may change)
Path: wayback
Timestamp: 4–14 digits
Examples:

http://nukrobi2.nuk.uni-lj.si:8080/wayback/20160203130917/http

Estonia Archives

Domain: digar.ee
Hostname: veebiarhiiv
Path: a
Timestamp: 4–14 digits
Examples:

http://veebiarhiiv.digar.ee/a/20131014091520/http://rakvere.kovtp.ee/en_GB/twin-cities

Bavarian Archives

Domain: bib-bvb.de
Hostname: langzeitarchivierung
Path: wayback
Timestamp: 4–14 digits
Examples:

http://langzeitarchivierung.bib-bvb.de/wayback/20121004142737/http://www.schwabenkrieg.historicum-archiv.net/

York University Digital Library

Domain: yorku.ca
Hostname: digital.library
Path: wayback
Timestamp: 4–14 digits
Examples:

https://digital.library.yorku.ca/wayback/20160129214328/http://en.cijnews.com/?p%3D10033

National Library of Israel

Domain: wayback.nli.org.il
Path: <none>
Timestamp: 14 digits
Format: http://wayback.nli.org.il/{14_digits}/{URL}

Appears to have poor coverage.

Other[edit]

Memento Web

https://timetravel.mementoweb.org/memento/2010/http://www.muslimdirectory.co.uk/displayresults.php?PHPSESSID=f0fb8b41d8758983e7d43cddb556b9df&businesstype=1&orgtype=&country=UK&city=Cardiff

Note: Redirects to an external archive service based on cached data in the Memento database which can fluctuate and/or be inaccurate due to the cache going out of sync with the client service.

Google Cache (ephemeral)

http://webcache.googleusercontent.com/search?q=cache:http://www.gapan.org/ruth-documents/Masters%2520Medal%2520%2520Press%2520Release.pdf

Note: Links quickly expire.

Note: Cannot be accessed with Memento.

Bing Cache (ephemeral)

Only accessible through search results, not manually through an URL or search prefix.

Undocumented[edit]

https://cachedview.nl/ ^[1]
https://cachedview.com/ ^[2]
http://www.cachedpages.com/ ^[2]
https://commoncrawl.org/ ^[3]
https://www.bravenewtech.org/ "Video Vault" ^[4]^[5]
https://conifer.rhizome.org/ ^[6]
https://archive.st/ ^[7]

References[edit]

^ "Wayback Archives". Tracetools. Retrieved 27 June 2022.
^ ^a ^b Bederov, Igor S. (8 May 2022). "Web Archive as an OSINT Tool". Medium. Retrieved 27 June 2022.
^ "Examples using Common Crawl Data". Common Crawl. Retrieved 27 June 2022.
^ "RightsLab". rightslab.org. Retrieved 27 June 2022.
^ "rightslab". Twitter. Retrieved 27 June 2022.
^ "Webrecorder". webrecorder.net. Retrieved 27 June 2022.
^ "wiki.Archiveteam". archiveteam.org. Retrieved 27 June 2022.

Archive services[edit]

Internet Archive Wayback Machine

Archive.Today

Web Citation (WebCite)

National Archives UK

Australian Web Archive

NLA Australia (deprecated)

Ghost Archive

Megalodon.jp

FreezePage

Library of Congress

Arquivo.pt (Portugal)

Stanford University web archive

Archive-It

BibAlex

WikiWix

National Archives US

National Archives Iceland

Europa Archives, Ireland (deprecated)

Perma CC

Proni Web Archives (deprecated)

Parliament UK

UK Web Archive (British Library)

Libraries and Archives Canada (deprecated)

Libraries and Archives Canada (www.bac-lac.gc.ca)

Catalonian Archive

Web Archives Singapore

Slovenian Archives (Spletni)

Estonia Archives

Bavarian Archives

York University Digital Library

National Library of Israel

Other[edit]

Undocumented[edit]

See also[edit]

References[edit]