How to archive YouTube videos on Wikipedia articles

There are several articles on Wikipedia that use YouTube videos as references. Since YouTube videos are frequently blocked and/or deleted, I tried to archive some of them using the Wayback Machine and archive.today, but the Wayback Machine doesn't let me view the video on the archived page, and archive.today fails to archive a YouTube video every time. Is there any other way to archive YouTube videos to prevent link rot in Wikipedia articles? -- Preceding unsigned comment added by Tech2009Girl (talk • contribs) 10:56, 29 January 2022 (UTC)

@Tech2009Girl: If Wayback doesn't work, try https://ghostarchive.org Rlink2 (talk) 14:15, 29 January 2022 (UTC)
"Wayback doesn't let me view the video": can you give an example? -- GreenC 16:34, 29 January 2022 (UTC)
@Rlink2: Thank you for the help -- User:Tech2009Girl
@GreenC: "Doesn't let me view" means that I'm unable to play the video. I'm not saying that the video doesn't even appear. -- User:Tech2009Girl
User:Tech2009Girl, can you provide an example? I can report it to IA and they will try to fix it. It's a fairly new feature and they need feedback, such as example links that are not working. -- GreenC 16:50, 30 January 2022 (UTC)

Alexa.com

Hello, there is a note on alexa.com that it will be retiring on 1 May 2022. See https://support.alexa.com/hc/en-us/articles/4410503838999. We currently have just over 800 links to the site. Keith D (talk) 12:45, 12 March 2022 (UTC)

RfC: Wikipedia:Reliable_sources/Noticeboard#RfC:_Alexa_Internet -- GreenC 15:32, 12 March 2022 (UTC)

Finding all articles linking to a dead site?

I found a bunch of references to one dead site, but after poking around, I've found out that all the content is still there, just under a slightly tweaked website name. It's even retained the exact same URL structure as before; it's literally just the precise details of the website's name that have changed -- update that, and the links spring back to life. Is there any way to collect the articles which still cite the old URL, so I can correct them en masse using AWB? Searching the regular way shows me there are about 2.6k articles still using it (although some may have valid archives, which I'd leave untouched), but there's no easy way to convert that into one grand list for inputting into AWB. Buttons to Push Buttons (talk | contribs) 17:15, 18 May 2022 (UTC)

Does advanced search using insource: help in some way? For example, something like this. --Kompik (talk) 21:45, 6 January 2023 (UTC)
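Once the list of affected pages is in hand (Special:LinkSearch can list pages linking to a given domain), the replacement step itself is simple. Below is a minimal sketch of it, assuming the page wikitext has already been fetched; `old.example.org`, `new.example.org` and `swap_host` are hypothetical placeholders. It only touches URLs inside simple, non-nested templates and, as asked above, skips any citation that already has a non-empty `|archive-url=`.

```python
import re

# Hypothetical hostnames; substitute the dead site's old and new names.
OLD_HOST = "old.example.org"
NEW_HOST = "new.example.org"

# A single non-nested template, e.g. {{cite web |url=... |title=...}}.
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")
# An |archive-url= parameter whose value is not empty.
HAS_ARCHIVE_RE = re.compile(r"\|\s*archive-url\s*=\s*[^|\s}]")


def swap_host(wikitext: str, old: str = OLD_HOST, new: str = NEW_HOST) -> str:
    """Point old-host URLs at the new host, leaving archived citations alone."""
    # The host must be followed by a path, whitespace, a pipe or the end of
    # the template, so e.g. old.example.org.mirror.net is not rewritten.
    url_re = re.compile(r"(https?://)" + re.escape(old) + r"(?=[/\s|}]|$)")

    def fix(match: re.Match) -> str:
        template = match.group(0)
        if HAS_ARCHIVE_RE.search(template):
            return template  # citation already has an archive: do not touch
        return url_re.sub(lambda m: m.group(1) + new, template)

    return TEMPLATE_RE.sub(fix, wikitext)
```

In AWB the same logic would typically be expressed as a regex find-and-replace rule; the Python version just makes the "skip archived citations" condition explicit.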

CiteSeerX links

I am not sure what would be the right place to ask about this, so I have tried here. As far as I can tell, CiteSeerX changed the identifier scheme they're using. For example, Template:CiteSeerX offers 10.1.1.34.2426 as an example. The link is dead, but looking in the Wayback Machine, we can see that it was the paper William D. Harvey, Matthew L. Ginsberg: Limited Discrepancy Search. In the new scheme, pid/efa56b710ff3c6d8b2666971d07c311eeb6c5b40 or pid/d8b76a9af36448b775997ef0a960e4b0fa585beb seem like the most likely candidates. Is there any way to fix the old links other than checking them one by one and replacing them with the new links manually? Is anybody aware of an announcement from CiteSeerX with details about how the old identifiers map to the new ones? --Kompik (talk) 10:27, 6 January 2023 (UTC)
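Without an official mapping, one way to speed up the one-by-one check is to list every old-style identifier on a page together with a Wayback Machine lookup URL, so the title of each paper can be read from the archived record. This sketch assumes the legacy record URL had the form `citeseerx.ist.psu.edu/viewdoc/summary?doi=<id>` (worth confirming against one archived link first); the function name and the `2020` partial timestamp are illustrative choices, not part of any CiteSeerX tool.

```python
import re

# Old-scheme identifiers look like 10.1.1.34.2426, used either as
# {{CiteSeerX|...}} or as the |citeseerx= parameter of a citation template.
OLD_ID_RE = re.compile(
    r"(?:\{\{\s*CiteSeerX\s*\|\s*|\|\s*citeseerx\s*=\s*)(10\.1\.1\.\d+\.\d+)",
    re.IGNORECASE,
)

# Assumed legacy record URL pattern (an assumption, check before relying on it).
LEGACY_URL = "http://citeseerx.ist.psu.edu/viewdoc/summary?doi={}"


def wayback_lookups(wikitext: str) -> list[tuple[str, str]]:
    """Pair each old-style ID with a Wayback URL for its legacy record page."""
    pairs = []
    for old_id in OLD_ID_RE.findall(wikitext):
        legacy = LEGACY_URL.format(old_id)
        # A partial timestamp (here just a year) makes the Wayback Machine
        # redirect to the snapshot closest to that date.
        pairs.append((old_id, "https://web.archive.org/web/2020/" + legacy))
    return pairs
```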

Clicking archive.org link loads a blank page

At 2019 Military World Games, clicking the archive link in the following citation loads a blank archive.org page. Bad snapshot maybe?

<ref>{{Cite web |url=https://results.wuhan2019mwg.cn/index.htm#/organisation |title=Archived copy |access-date=2019-10-19 |archive-url=https://web.archive.org/web/20191028234735/https://results.wuhan2019mwg.cn/index.htm#/organisation |archive-date=2019-10-28 |url-status=dead }}</ref>

What should be done? Is there a way to repair this? Should it be deleted? Marked with something? Thanks. –Novem Linguae (talk) 21:54, 4 April 2023 (UTC)

@Novem Linguae there's a hash (#) fragment in the link, and the Wayback Machine does not support URL fragments. Notrealname1234 (talk) 00:31, 29 May 2023 (UTC)
The only way (I think) to repair this is to save the link with another web archiving service (like archive.is). Notrealname1234 (talk) 00:32, 29 May 2023 (UTC)
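Everything after `#` is a fragment identifier, which the browser keeps to itself and does not send in the HTTP request. For a page like the one above, the server (and therefore a crawler) most likely only ever sees `index.htm`, while JavaScript on the page uses the `#/organisation` part to decide what to display. A small sketch of that split using the standard library's `urldefrag`; the hash-routing check is a rough heuristic of our own, not how the Wayback Machine decides anything.

```python
from urllib.parse import urldefrag


def split_fragment(url: str) -> tuple[str, str]:
    """Return (part the server receives, fragment kept by the browser)."""
    result = urldefrag(url)
    return result.url, result.fragment


def looks_hash_routed(url: str) -> bool:
    """Heuristic: '#/path' or '#!/path' usually means JavaScript on the page
    picks the content, so a crawler fetching the base URL never sees it."""
    _, fragment = split_fragment(url)
    return fragment.startswith(("/", "!/"))
```

A plain `#section` anchor, by contrast, only scrolls within content the server already sent, so archives of such links usually work.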

Archiving hundreds of healthy (live) sources

After asking at the wrong place, I ask here: is there any need for this type of edit? Shouldn't we just archive dead or unfit sources? Unlike the former, this edit actually makes sense because it did rescue sources. What I usually do is rescue sources manually; this prevents outdated archives, i.e. archived pages that present (very) old information compared to live sources (e.g. a page showing information from 2023 and another showing information from 2015). SLBedit (talk) 16:13, 7 June 2023 (UTC)

Archiving at the time of usage guarantees (in most cases) that we have an archive of the source from when it was used and cited. Once it's been archived once, Archive.org will probably continue to archive it. The worst-case scenario is a source that is used but not archived, and then disappears before it can be archived. Mackensen (talk) 16:44, 7 June 2023 (UTC)
The worst case scenario I've seen is where the earliest archive of a source postdates the URL being redirected / usurped. Then there's the appearance of a source with an archive, when in fact there is neither.
I ran IABot on Monica Macovei yesterday, after spending hours manually finding archives for dead links that had been damaged in script-assisted editing, just to make sure I had found them all. It tagged zero as dead, so it's the same category of edit as the one User:SLBedit linked above. On articles with hundreds of references to web sources, it's tedious to go through and check each one manually, and there's no guarantee Internet Archive bot will find any dead URLs, but I always instruct it to archive sources just in case there's no archive yet. Folly Mox (talk) 16:55, 7 June 2023 (UTC)
  • The WaybackMachine (archive.org) is dynamic, not a static database. Archives move and disappear, for many reasons. Because of this I wrote a bot called WP:WAYBACKMEDIC, which verifies that archive URLs still work. Example: Special:Diff/952770913/972737706. The problem with this bot is that it's very resource-intensive (on the WaybackMachine), so it runs slowly and is semi-manual; thus I don't run it very often. My opinion is that archives should only be added to Wikipedia when the link is dead. As for adding archives to the WaybackMachine, that is already done automatically by a back-end process (not IABot). A script monitors every edit on all 300+ language wikis (including Enwiki), and when it detects a URL it makes sure this URL is added to the WaybackMachine. Nothing needs to be done for this part of the process; it's already being taken care of (mostly) by invisible bots. It's a massive load on WaybackMachine disk space and bandwidth, which is being donated free of charge to Wikipedia by the Internet Archive. -- GreenC 20:06, 7 June 2023 (UTC)

Worst-case scenario in adding archives automatically: A source is archived several times throughout the years; in 2000, 2010, and 2020. A user adds that source to an article in 2023 and a bot adds a link to the 2000 archive. Source becomes dead in 2024 and the article points the reader to an outdated source. Conclusion: After an automatic archive, someone needs to make sure the article links to the proper archive, as there may be different versions of the original source. SLBedit (talk) 21:08, 8 June 2023 (UTC)

In my experience, the archival bots usually add the most recent archive version, which is actually frequently less reliable than earlier archives, since in many cases the target site will have restructured over the years and the newer archives point to empty content.
The ideal situation would probably be if bots could read the access-date= parameter and add the most recent archived version that does not postdate the access-date. Folly Mox (talk) 21:38, 8 June 2023 (UTC)
My understanding is that bots do read the access-date and do as you suggest, although I'm not entirely certain of that. Dhtwiki (talk) 23:17, 11 June 2023 (UTC)
I'm someone who reverts the addition of archive links en masse, especially when the byte count is high (>10k), when the original links are live, and when the editor has shown no interest in curating the page otherwise; and I receive plenty of questions and pushback from even the most experienced editors. GreenC has explained why adding such links doesn't actually make the archiving happen. When IABot runs on its own, it only adds links to archives *when the original is determined to have died*. That is sensible, and I don't know why editors shouldn't be encouraged to check that option (assuming that it is an available option). Also, when links die, it's a good time to check for website reorganization and reset the original link, as well as check for citation relevance (e.g. has scientific, economic, or census data been superseded?). I keep thinking that there should be an RfC on limiting the addition of archive links, but, as I said, I've received too much pushback from editors who think they're doing good to expect such an RfC to pass easily. Dhtwiki (talk) 23:29, 11 June 2023 (UTC) (edited 00:48, 22 September 2023 (UTC))

Is there automation for adding archive links that I have found to citations?

I frequently find myself manually finding archived content from a citation in order to verify a claim, but I don't always have time to update the citation accordingly to aid future readers. Is there a tool that automates adding the necessary three(?) parameters to a citation if I already have the archive link in hand? I would prefer this over using a bot, for the reasons given above. Orange Suede Sofa (talk) 00:20, 8 August 2023 (UTC)

I think that activating the WP:IABOT would do what you are looking for. According to the documentation, you can run the bot on demand, but a low-effort approach would be to mark a link as dead, as the highest priority task appears to be to look for dead-tagged links and replace them with archived links. There are also instructions on activating the bot here and here. Personally, I prefer to do the replacement manually as I try to take into account the url access date when selecting a particular archived version to use as the accessible URL, but that is certainly not a necessary thing and not for someone who wants to get 'er done quickly. --User:Ceyockey (talk to me) 00:43, 8 August 2023 (UTC)
To your point about preferring the manual approach, that's exactly what I'm trying to account for: I've usually already done the work of selecting the appropriate archive URL, and now I just need the citation updated. If I understand the docs right, running the bot wholesale does everything from scratch; I'm looking for a way to automate the step of adding the appropriate links to the citation if I already have the archive link. In short, I've done the first part manually (finding the best archive URL) and want to automate the second half (updating the citation). Orange Suede Sofa (talk) 00:58, 8 August 2023 (UTC)
Yes, the bot does everything from scratch. Hmm, you might try Wikipedia:AutoWikiBrowser. I have not used it for this particular use case, but it might be applicable. This would be a "semi-manual" approach, but I think you could run through dozens of edits faster than via the main editing interface. Take a look and see what you think. One tricky thing: you will need to create a bot account with a bot password for yourself; the documentation notes this, but it's not obvious on a quick read, I think. --User:Ceyockey (talk to me) 01:24, 8 August 2023 (UTC)
Thanks; I didn't think of that. I've had AWB perms in the past, so maybe once I have a plan I can whip up something quick and apply. Regards, Orange Suede Sofa (talk) 01:35, 8 August 2023 (UTC)
I use AutoHotKey. You can set it up so that whenever you type the letters "3archive", it automatically replaces them with "|archive-url= |archive-date= |url-status=". I also use it to generate empty cite webs, books, journals, and various things like that; it saves a lot of typing. -- GreenC 01:49, 8 August 2023 (UTC)
Client-side scripting is a good approach too; I don't have Windows so I can't use AutoHotKey, but that page has given me enough pointers to go dig around. Thanks! Orange Suede Sofa (talk) 01:59, 8 August 2023 (UTC)

To wrap up the question for myself, and in case this helps anyone else, I created an AppleScript that takes a Wayback URL from the clipboard, parses the URL for the date, and then automatically adds the |archive-url= and |archive-date= parameters to an existing citation, pre-filled and with no additional typing needed. More info here. Orange Suede Sofa (talk) 02:40, 15 August 2023 (UTC)
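The AppleScript itself isn't reproduced here. As a rough illustration of the same two steps (read the snapshot date out of the Wayback URL, then append the two parameters), here is a hypothetical Python equivalent. It assumes the citation is a single template ending in `}}` and that ISO dates are acceptable; an article using a different date style would need the date reformatted.

```python
import re
from datetime import datetime

# e.g. https://web.archive.org/web/20220526152815/https://example.org/a
# (also tolerates modifiers such as "id_" after the timestamp)
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{8})\d*(?:[a-z]{2}_)?/")


def add_archive(citation: str, wayback_url: str) -> str:
    """Insert |archive-url= and |archive-date= before a citation's closing }}."""
    match = WAYBACK_RE.match(wayback_url)
    if match is None:
        raise ValueError("not a Wayback Machine snapshot URL")
    if re.search(r"\|\s*archive-url\s*=", citation):
        raise ValueError("citation already has |archive-url=")
    body = citation.rstrip()
    if not body.endswith("}}"):
        raise ValueError("expected a single template ending in }}")
    # The first eight digits of the timestamp are the snapshot date (YYYYMMDD).
    archived = datetime.strptime(match.group(1), "%Y%m%d").date().isoformat()
    return (body[:-2].rstrip()
            + f" |archive-url={wayback_url} |archive-date={archived}" + "}}")
```

Whether to also add `|url-status=` is left to the editor, since it depends on whether the original link is still live.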

Mass additions of archive links for live sites

This discussion emerges from those on Billjones94's talk page, my talk page, and other previous discussions (1 again @ Billjones94, 2 again @ Billjones94, 3 @ Wikipedia:Bots, 4 @ Wikipedia talk:Link rot, 5 @ Village pump). Tags: Billjones94, Rhododendrites, Scyrme, Novem Linguae, ActivelyDisinterested, Izno, Kuzma, GreenC, Folly Mox, Dhtwiki, DMacks, and Cyberpower678. Please tag others.

I propose an addition to this page as follows:

After the words "in general, do not" in the second paragraph of the lede, insert "(with automated tools or otherwise) add archive links for live websites or".

The paragraph would then read In general, do not (with automated tools or otherwise) add archive links for live websites or delete cited information solely because the URL to the source does not work any longer.

My understanding, for the record, is that links cited on the English Wikipedia are automatically archived. Hitting the check mark in the IA Bot to add those archived links for live sites does not archive anything. It does not actually archive those pages nor does it update those archives. It just adds the links themselves to the article text. Moreover, archive links are automatically substituted for links that become dead.

What these archive links for live websites do is profoundly clutter the editor, which makes the source text very difficult for humans to parse. An example of this is the old version of Julius Caesar. This was a single citation (for half of a sentence about Caesar's wife) therein:

<ref>Suetonius, ''Julius'' [https://penelope.uchicago.edu/Thayer/E/Roman/Texts/Suetonius/12Caesars/Julius*.html#1 1] {{Webarchive|url=https://archive.today/20120530163202/http://penelope.uchicago.edu/Thayer/E/Roman/Texts/Suetonius/12Caesars/Julius*.html#1 |date=30 May 2012 }}; Plutarch, ''Caesar'' [https://penelope.uchicago.edu/Thayer/E/Roman/Texts/Plutarch/Lives/Caesar*.html#1 1] {{Webarchive|url=http://webarchive.loc.gov/all/20180213130122/http://penelope.uchicago.edu/Thayer/e/roman/texts/plutarch/lives/caesar%2A.html#1 |date=13 February 2018 }}; Velleius Paterculus, ''Roman History'' [https://penelope.uchicago.edu/Thayer/E/Roman/Texts/Velleius_Paterculus/2B*.html#41 2.41] {{Webarchive|url=https://web.archive.org/web/20220731043323/https://penelope.uchicago.edu/Thayer/E/Roman/Texts/Velleius_Paterculus/2B%2A.html#41 |date=31 July 2022 }}</ref>

When I removed these archive links en masse, the page shortened by over 28,000 characters (probably upward of 35,000 after including all of my edits). Again, these additions are not necessary to preserve the text of the cited source. This is a live website. And if it became dead the archive URL would be automatically inserted. The costs are, however, substantial for active editors. Just finding real article text, as opposed to background mark up, in articles packed with these archive links becomes difficult.

Moreover, removing these archive links is significantly more difficult than adding them. It is almost trivial for someone to add unnecessary archive URLs. Not to pick on Billjones94 (the selection is merely because this series of discussions emerges from an edit on Roman Republic),[a] the following edits were all done within a single hour:

Billjones94 contribs log, excerpt

03:01, 16 April 2022 +5,778 Mohammedan SC (Dhaka) (Rescuing 28 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:58, 16 April 2022 +2,468 Churchill Brothers FC Goa (Rescuing 14 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:54, 16 April 2022 +15,029 Pune FC (Rescuing 78 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:49, 16 April 2022 +8,917 Salgaocar FC (Rescuing 48 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:46, 16 April 2022 +425 Sreenidi Deccan FC (Rescuing 2 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:45, 16 April 2022 +1,984 Moinuddin Khan (footballer) (Rescuing 9 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:44, 16 April 2022 +1,246 Punjab FC (Rescuing 6 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:41, 16 April 2022 +1,937 Sudeva Delhi FC (Rescuing 10 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:39, 16 April 2022 +8,071 Sporting Clube de Goa (Rescuing 42 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:36, 16 April 2022 +1,841 FC Kerala (Rescuing 9 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:32, 16 April 2022 +3,159 Mohammed Rahmatullah (Rescuing 16 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:30, 16 April 2022 +7,003 Kerala United FC (Rescuing 36 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:27, 16 April 2022 +5,427 Peerless SC (Rescuing 26 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:25, 16 April 2022 +3,337 NEROCA FC (Rescuing 16 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:21, 16 April 2022 +2,285 Aizawl FC (Rescuing 12 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:19, 16 April 2022 +1,720 TRAU FC (Rescuing 9 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:16, 16 April 2022 +1,783 FC Kochin (Rescuing 9 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:14, 16 April 2022 +11,674 Dempo SC (Rescuing 63 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:10, 16 April 2022 +4,811 Mahindra United FC (Rescuing 27 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:07, 16 April 2022 +3,616 South United FC (Rescuing 19 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:05, 16 April 2022 +3,532 Hindustan Aeronautics Limited SC (Rescuing 18 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

02:02, 16 April 2022 +4,435 ONGC FC (Rescuing 25 sources and tagging 0 as dead.) #IABot (v2.0.8.7) Tag: IABotManagementConsole [1.2]

Not a single source was tagged as dead. The average edit added 4,567 bytes of text, and in total this single hour of triggering IA Bot added 100,478 bytes to Wikipedia's servers.[b] I also firmly believe that these edits fall within the scope of WP:MEATBOT and WP:FAITACCOMPLI. Undoing them one by one after intervening edits is extremely difficult; Billjones94 has been repeatedly informed and tagged about how controversial these mass additions are, with absolutely no response beyond "Thanks" on talk page edits. Nor do I believe for a second that anyone can review 11,674 bytes of additions (Dempo SC; around 3,000 bytes reviewed per minute) in the four minutes that elapsed since the previous edit.
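The totals quoted here and in note [b] can be checked with Python's `statistics` module, using the 22 byte counts from the contributions excerpt. This is simply a recomputation of the listed numbers; it also confirms that the mean works out to about 4,567 bytes per edit and that the Dempo SC edit implies a review rate of roughly 2,900 bytes per minute over four minutes.

```python
import statistics

# Byte counts of the 22 IABot edits in the excerpt above, newest first.
sizes = [5778, 2468, 15029, 8917, 425, 1984, 1246, 1937, 8071, 1841, 3159,
         7003, 5427, 3337, 2285, 1720, 1783, 11674, 4811, 3616, 3532, 4435]

total = sum(sizes)
mean = round(statistics.mean(sizes), 2)
median = statistics.median(sizes)  # average of the 11th and 12th sorted values
dempo_rate = 11674 / 4             # Dempo SC bytes spread over four minutes

print(len(sizes), total, mean, median, max(sizes), min(sizes))
# prints: 22 100478 4567.18 3434.5 15029 425
print(dempo_rate)
# prints: 2918.5
```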

Moreover, the archive links are also generated for paywalled sources hosted on JSTOR (other services like Cambridge Core or Oxford Academic suffer similarly). For example, at Roman Republic:

<ref>{{Cite journal |last=Steel |first=Catherine |date=2014 |title=The Roman senate and the post-Sullan "res publica" |url=https://www.jstor.org/stable/24432812 |journal=Historia: Zeitschrift für Alte Geschichte |volume=63 |issue=3 |pages=323–339 |doi=10.25162/historia-2014-0018 |jstor=24432812 |s2cid=151289863 |issn=0018-2311 |access-date=26 May 2022 |archive-date=26 May 2022 |archive-url=https://web.archive.org/web/20220526152815/https://www.jstor.org/stable/24432812 |url-status=live }}</ref>

In those cases, the archive links do not preserve anything at all. Going to the archive URL loads only the first page of the article. On my computer even the image of that page does not load, leaving a blank page with the citation on the right. Given the stability of JSTOR, these paywalled archive links provide essentially no benefit. The cost to the editability of these articles remains, however. Inasmuch as nothing is added for readers, editing ought to take priority.
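As noted above, removing these links is much harder than adding them. If there were consensus to remove them, part of the work could be automated along these lines: strip `|archive-url=`, `|archive-date=` and `|url-status=` from citation templates whose `|url=` is on a listed paywalled host and whose `|url-status=` is `live`. This is a hypothetical sketch. The host tuple and function name are assumptions, only simple non-nested templates are handled, and any output would still need human review per WP:MEATBOT.

```python
import re

# Hypothetical host list, for illustration only.
PAYWALLED_HOSTS = ("www.jstor.org", "academic.oup.com", "www.cambridge.org")

TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")  # one non-nested template
URL_HOST_RE = re.compile(r"\|\s*url\s*=\s*https?://([^/\s|}]+)")
LIVE_RE = re.compile(r"\|\s*url-status\s*=\s*live\b")
# One archive-related parameter, with the whitespace before its pipe.
ARCHIVE_PARAM_RE = re.compile(
    r"\s*\|\s*(?:archive-url|archive-date|url-status)\s*=[^|}]*")


def strip_paywalled_archives(wikitext: str) -> str:
    """Drop archive parameters from live citations of listed paywalled hosts."""
    def fix(match: re.Match) -> str:
        template = match.group(0)
        host = URL_HOST_RE.search(template)
        if host and host.group(1) in PAYWALLED_HOSTS and LIVE_RE.search(template):
            return ARCHIVE_PARAM_RE.sub("", template)
        return template  # anything else is left exactly as it was

    return TEMPLATE_RE.sub(fix, wikitext)
```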

Concluding, I want to emphasise three things. First, link rot is a semi-solved problem in which these WP:MEATBOT-esque additions do not help. Second, the enormous volume and rapidity of these WP:FAITACCOMPLI additions make them both harmful to actual content contribution and difficult to remove. Third, these archive links often add nothing, either because the website is still live or because the archives of paywalled sources are still paywalled. We should edit the guidelines to reflect these facts and require adding archive links for live URLs to be justified instead of accepted by default.[c][d] Ifly6 (talk) 16:11, 21 September 2023 (UTC)

Notes

  1. The edit in question was this one. It tagged zero sources as dead and added 6,000 bytes to the article. Triggering it probably took about 15 seconds; doing nothing and removing it later would have taken hours.
  2. [OP] Descriptive statistics of those edits: n is 22, mean is 4567.18, median is 3434.5, max is 15029, min is 425.
  3. To be clear, I [Ifly6] am not against archive links for live URLs in all cases. I think the following are examples of reasonable justifications: a reasonable expectation that the source will soon become dead, actual evidence that the archive bypasses paywalls or the GDPR, or actual evidence that the source has changed.
  4. My [Ifly6] interpretation of In general, do not (with automated tools or otherwise) add archive links for live websites is that it should not be done without justification, so adding archive links for live websites would become a burden requiring affirmative justification.
I don't think it follows that we should have an RfC to add an archive link to every URL on Wikipedia (with some exceptions); that would probably fail. The feature in question here, for selected articles, has some utility. The question is whether we should continue to have this feature on enwiki and, if so, under what conditions: anyone can run it anytime, only certain users, only x times a day, etc. What are the guidelines for this feature? Right now there are none, other than that it has to be initiated manually, which slows the user down somewhat. -- GreenC 16:50, 21 September 2023 (UTC)

Paywalled landing pages should not be archived

Links to some previous discussions

The above discussion (Mass additions of archive links for live sites) discusses why indiscriminate archiving of live links is a problem and a broader mechanism – don't add archive links for live websites unless you have an actual and specific reason – for resolving them.

Per Folly Mox's minimalist framing of the question at Talk:Citation bot, I propose the following (with appropriate wording to be determined):

Archive links of paywalled landing pages are not useful: they do not provide full text. They are currently being added if you hit the check mark in the IA Bot management console. The resulting links are largely blank. They do not archive anything, or trigger anything to be archived, while introducing extremely large amounts of markup with no value. Ifly6 (talk) 05:21, 2 October 2023 (UTC)

external links: URLs that were broken due to editing errors

(crosspost from Wikipedia:Teahouse#external links: URLs that were broken due to editing errors)

Hello maintainers. I made a list of ~10,000 broken URLs, User:ⵓ/Worklist brocken URLs, with Quarry:query/78127 (feel free to fork it).

The SQL query filters for nonexistent top-level domains, so all URLs in this list are broken. The domain in the list is in reversed order (el_to_domain_index).

Most of the cases are easy to fix (e.g. removing a whitespace character). In some cases I needed a URL decoder (meyerweb.com/eric/tools/dencoder/). More difficult cases can only be solved with the help of the version history, web archives, or a Google search (e.g. https://en.wikipedia.org/w/index.php?title=2018%E2%80%9319_Ukrainian_First_League&diff=prev&oldid=1185545810).

I fixed these kinds of errors on the German Wikipedia, so Quarry:query/77794 is clean, but I am not able to do this on the English Wikipedia. (talk) 17:57, 17 November 2023 (UTC)

I am looking for users who specialize in external link / link rot maintenance (talk) 07:27, 18 November 2023 (UTC)[reply]
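A rough sketch of the checks described in this thread, assuming the el_to_domain_index values look like `https://com.example.www.` (reversed host labels with a trailing dot). The helper names are hypothetical, `KNOWN_TLDS` is a deliberately tiny stand-in for the full list of top-level domains, and this is not the code behind Quarry:query/78127.

```python
# Hypothetical subset for illustration; a real check would load the full
# IANA list of top-level domains.
KNOWN_TLDS = {"com", "org", "net", "de", "uk", "ua"}


def host_from_index(index_value: str) -> str:
    """Undo the reversed-host form, e.g. 'https://com.example.www.'
    becomes 'https://www.example.com'."""
    scheme, _, reversed_host = index_value.partition("://")
    labels = reversed_host.rstrip(".").split(".")
    return f"{scheme}://{'.'.join(reversed(labels))}"


def tld_exists(host: str) -> bool:
    """True when the last label of the host is a known top-level domain."""
    return host.rstrip(".").rsplit(".", 1)[-1].lower() in KNOWN_TLDS


def strip_whitespace(url: str) -> str:
    """The most common easy fix: remove stray spaces or newlines in a URL."""
    return "".join(url.split())
```

A host such as `www.example.comhttp` (two URLs run together) fails `tld_exists`, which is the kind of breakage a nonexistent-TLD filter surfaces.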

Pew research study on link rot

I thought that this might be of interest to this community.

Peaceray (talk) 04:24, 20 May 2024 (UTC)

This Pew report is surprisingly weak, given their reputation for authority. The term "soft-404" appears nowhere in the document, yet this is one of the hardest problems in link rot detection and accounts for a sizeable portion of all link rot. It looks like they simply checked for 404 links. They consider redirects, but these are often correct and not a problem. Many 404s can be made live again by replacing them with a new URL (work done at WP:URLREQ). They discuss Wikipedia but don't mention archive URLs, so it's unknown whether they count links as dead even when they have a live archive URL. There are many devils in the details that they pass over, so I'm not sure how useful this report is beyond "many links die", which has been known for 30 years. I hope folks on Wikipedia understand this is an existential problem for our project: it's easy to imagine a wasteland in a few decades where most things are unverifiable, and a massive content deletion project begins to "clean up" per WP:V. -- GreenC 16:10, 20 May 2024 (UTC)
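For readers unfamiliar with the term: a soft 404 is a page that returns HTTP 200 but actually shows "not found" or other placeholder content, so a plain status-code check misses it. One common heuristic, sketched here with Python's `difflib`, is to compare the page with what the same site returns for a URL that cannot exist. The function and the 0.9 threshold are assumptions for illustration, not how Pew, IABot or WP:URLREQ detect dead links.

```python
from difflib import SequenceMatcher


def looks_like_soft_404(status: int, body: str, bogus_body: str,
                        threshold: float = 0.9) -> bool:
    """Heuristic soft-404 check.

    status, body: HTTP status and text returned for the cited URL.
    bogus_body: text the same site returns for a deliberately nonexistent
    URL (e.g. a random path), i.e. what its "not found" page looks like.
    """
    if status != 200:
        return False  # hard errors and redirects are handled elsewhere
    # A 200 page that is nearly identical to the site's error page is
    # probably an error page served with the wrong status code.
    return SequenceMatcher(None, body, bogus_body).ratio() >= threshold
```

Real pages often contain dynamic elements (dates, ads, session tokens), so in practice the text would usually be normalised before comparison, and borderline scores would still need a human look.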