Wikipedia:Link rot/URL change requests/Archives/2024/October

From Wikipedia, the free encyclopedia


timesonline.co.uk

Old URLs for The Times don't work. While some of these have new URLs at thetimes.com, they can't be easily converted. For example, this is now here for Adele. Unfortunately, I think all of these links and the subdomains (entertainment.timesonline.co.uk, business.timesonline.co.uk, etc.) will need archives. It might be easier to do the subdomains first. Some articles already have archived links added, like at Premier League. 15,000+ articles altogether. Thank you! MrLinkinPark333 (talk) 19:34, 12 September 2024 (UTC)

This is a difficult project due to a large number of soft-404s within archives:

soft404 rules for archives
  if url ~ "(the)?(sunday)?(times(plus|online)?)[.]co[.]uk":
    if url ~ "login=false":
      return "Check 6.131"
    if url ~ "(the)?(sunday)?(times(plus|online)?)[.]co[.]uk/[st][to][lo]/[?]CMP=":
      return "Check 6.132"
    if url ~ "(the)?(sunday)?(times(plus|online)?)[.]co[.]uk/[st][to][lo]/news/?([?](token=null|id=[a-zA-Z0-9]{2,10}$))?":  
      return "Check 6.137"
    if url ~ "(the)?(sunday)?(times(plus|online)?)[.]co[.]uk/[st][to][lo]/(news|news/world|tv-radio|business|travel|arts|arts/(film/reviews|tv-radio))/?$":
      return "Check 6.135"         
    if url ~ "the-tls[.]co[.]uk/tls/?$":
      return "Check 6.136"
    gsubs("://", "__T__", url)
    if url ~ "//":      
      return "Check 6.133"
    gsubs("__T__", "://", url)
    if url ~ "obituaries/?$":
      return "Check 6.134"               

...where "url" is the redirected URL the page was saved from, as indicated on the archive page, i.e. not the URL on wiki or the live redirect (if any).
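For readers who want to try the rules above, here is a minimal Python sketch of them. It is not the bot's actual code: the function name `soft404_check` and the `RULES` table are made up for illustration, the `gsubs` swap is replaced by a plain string replacement, and the patterns are copied unchanged, with the same nesting under the host test.

```python
import re

# Host test from the first rule: timesonline.co.uk, thetimes.co.uk,
# thesundaytimes.co.uk, timesplus.co.uk and similar.
TIMES = r"(the)?(sunday)?(times(plus|online)?)[.]co[.]uk"

# (pattern, label) pairs, tried in the same order as the rules above.
RULES = [
    (r"login=false", "Check 6.131"),
    (TIMES + r"/[st][to][lo]/[?]CMP=", "Check 6.132"),
    (TIMES + r"/[st][to][lo]/news/?([?](token=null|id=[a-zA-Z0-9]{2,10}$))?",
     "Check 6.137"),
    (TIMES + r"/[st][to][lo]/(news|news/world|tv-radio|business|travel|arts"
             r"|arts/(film/reviews|tv-radio))/?$", "Check 6.135"),
    (r"the-tls[.]co[.]uk/tls/?$", "Check 6.136"),
]


def soft404_check(url):
    """Return the check label for a suspected soft-404, or None.

    `url` is the redirected URL shown on the archive page.
    """
    if not re.search(TIMES, url):
        return None
    for pattern, label in RULES:
        if re.search(pattern, url):
            return label
    # Hide the scheme separator, then look for a stray double slash.
    if "//" in url.replace("://", "__T__"):
        return "Check 6.133"
    if re.search(r"obituaries/?$", url):
        return "Check 6.134"
    return None
```

A URL that matches none of the rules returns `None`, meaning the archive is not flagged by these particular checks.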

Enwiki

  • Checked 15,686 pages and edited 13,589 pages. Moved 275 links to a new URL. Resolved 20,115 soft-404s. Removed 4 {{dead link}}. Added 6,721 {{dead link}}. Switched 28 |url-status=dead to live. Switched 1,736 |url-status=live to dead. Added 8,624 archive URLs (7,156 Wayback). Changed 593 citation metadata.
Explanation: the bot analyzed about 20,000 URLs, all dead and presenting as soft-404s. For about 17,000 of those, the bot added an archive URL, added a dead link template, or switched url-status to dead. The other 3,000 are uncertain but probably already have an archive URL and url-status=dead, i.e. nothing to do. The large number of {{dead link}} additions (6,721) is unfortunate; it reflects the problem noted above of archives containing soft-404s. -- GreenC 19:21, 26 September 2024 (UTC)
That's too bad about the large amount of dead links. If the new URLs were easy to convert, we could have swapped them over. Thank you for working on this! MrLinkinPark333 (talk) 19:25, 26 September 2024 (UTC)
Yeah this domain needed help because it was marked "Subscription" in the IABot DB (ie. skip processing), so most of them were dead with no archives. Normally I would "done" at this point, but I want to try a new experimental method for finding the live URL (it has a low probability of success) - I won't be able to start until next week. -- GreenC 13:23, 27 September 2024 (UTC)
Experimental method not working. -- GreenC 16:25, 30 September 2024 (UTC)

IABot DB

  • Checked and edited about 28,000 links which will propagate to 300+ wikis

 Done -- GreenC 16:25, 30 September 2024 (UTC)

Can this be run in Tewiki?

@User:GreenC, In Tewiki, we have more than 10,400 pages in the category CS1 errors: archive-url. Almost 99% of these are "timestamp mismatch" errors. Can you please run WaybackMedic_2.5 to correct the error in these pages? Thank you. __ Chaduvari (talk) 15:59, 31 July 2024 (UTC)

Ahh. I'd like to, but I am not set up for other wikis; it is very difficult. The CS1 error: archive-url is across most wikis. Let me think about it because it's a growing problem. It might be that I can process, but only some English-language templates like {{cite web}} that use English-language parameters like |archive-url=. GreenC 19:25, 31 July 2024 (UTC)
Hi GreenC, in tewiki, this template, like many others, uses English parameters and templates only. This policy was kept to ensure future compatibility. Thanks. __ Chaduvari (talk) 09:29, 12 August 2024 (UTC)
User:Chaduvari, I could try some tests for Telugu Wiki. Can you help me get bot flag permissions for User:GreenC bot? I don't know where to start to ask permission. -- GreenC 18:19, 12 August 2024 (UTC)
@GreenC, you can raise the request at te:వికీపీడియా:Bot/Requests for approvals.__ Chaduvari (talk) 23:40, 12 August 2024 (UTC)
I made a request for approval. -- GreenC 02:30, 13 August 2024 (UTC)
User:Chaduvari, I have not forgotten about this. I have many other projects. Can you tell me what kinds of date formats might exist (date month year, periods or slashes, etc.) and what the Telugu-language month names are? Some examples would help. -- GreenC 16:56, 26 September 2024 (UTC)
@GreenC, you have been quick in responding to our request. In fact, we delayed in giving the bot flag.
The date formats conform to those in enwiki. 2024-09-27 and 27 September 2024 are the most widely used ones. The month names are:
January జనవరి
February ఫిబ్రవరి
March మార్చి
April ఏప్రిల్
May మే
June జూన్
July జూలై
August ఆగస్టు
September సెప్టెంబరు
October అక్టోబరు
November నవంబరు
December డిసెంబరు
Please look for ref: "Ayodhyaverdict" at page:te:అయోధ్య వివాదంపై 2019 సుప్రీంకోర్టు తీర్పు. The archive date was incorrect in this citation. In the error message, the given Suggestion has the month name in Telugu. (Please look for the text -"మత సామరస్యాన్ని కాపాడాలని ప్రధాన మంత్రి బహిరంగ అభ్యర్థన చేసారు." I am referring to the first citation [10] after this sentence).
Thank you __ Chaduvari (talk) 00:26, 27 September 2024 (UTC)
OK. I can't see the red error message in the Wikitext, but it should be possible to scrape it from the HTML. Will investigate. Thank you. -- GreenC 01:14, 27 September 2024 (UTC)
The easiest way for me is to convert to ISO eg. |archive-date=2024-09-24. Most of the problems will probably be archive.today and webcitation.org (if any) so I would check every citation template with one of these archives and then reset the archive-date to ISO format, based on the value in the URL. -- GreenC 16:56, 26 September 2024 (UTC)
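The approach described above (resetting |archive-date= to ISO format from the value in the URL) can be sketched roughly as follows. This is only an illustration, not the bot's code: the function name `archive_date_iso` is hypothetical, and it assumes the archive URL carries a 14-digit YYYYMMDDhhmmss snapshot timestamp right after the host, as Wayback Machine URLs do and as long-form archive.today URLs are assumed to do here. Short-form archive.today links and webcitation.org IDs are not handled.

```python
import re
from datetime import datetime

# Assumption: the snapshot timestamp is the first path segment after
# web.archive.org/web or an archive.today host (14 digits).
SNAPSHOT = re.compile(
    r"^https?://(?:web\.archive\.org/web|archive\.(?:today|ph|is))/(\d{14})"
)


def archive_date_iso(archive_url):
    """Return the snapshot date as YYYY-MM-DD, or None if it cannot be read."""
    match = SNAPSHOT.match(archive_url)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    except ValueError:
        # Fourteen digits that do not form a real date and time.
        return None
    return stamp.date().isoformat()
```

The returned string can be compared with the existing |archive-date= value; when they differ, the ISO value from the URL is the one to write back.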

User:Chaduvari, the tracking category was reduced from 10,400 to 664, a 94% reduction. The bot I wrote only fixes mismatches in dates. There are other types of errors tracked in that category that the bot does not fix, for example citations with an |archive-date= but no |archive-url= (or the other way around), or citations with |archive-url= but no |url=. These are more complex to fix automatically. -- GreenC 04:03, 2 October 2024 (UTC)

Wow! Fantastic! @GreenC, thanks for eliminating so many errors.
Now that the errors are brought down by 94% (My estimate fell short by 5% :-)), we will take care of the |archive-url= and other errors manually.
Thank you very much. __ Chaduvari (talk) 04:53, 2 October 2024 (UTC)
In fact the number is brought down to 596! __ Chaduvari (talk) 04:54, 2 October 2024 (UTC)
User:Chaduvari: You are welcome. It can run automatically, every month or so, to keep the category in check. If you see problems it missed, that it should have caught, let me know. -- GreenC 05:17, 2 October 2024 (UTC)
Sure, GreenC ! Chaduvari (talk) 05:25, 2 October 2024 (UTC)
OK it will run each month, on the 2nd day. -- GreenC 02:35, 3 October 2024 (UTC)

foxnews.com/story

Old URLs for foxnews.com with numeric IDs either redirect to new URLs, redirect to the wrong page or are broken. Working URLs are mainly at www.foxnews.com/story/article-name

  • URL Changes:
    • With the above links, the numeric value is changed to the article title. Any punctuation marks are removed from the URL and all letters are lowercase.
    • For redirects that do not point to articles using /story/, I request trying to convert them using /story/article-name first. If that doesn't work, then I recommend archive URLs.

~3,200 articles.

Thank you! MrLinkinPark333 (talk) 20:48, 12 September 2024 (UTC)
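A rough Python sketch of the conversion rule described above (lowercase, punctuation removed). The function name `story_url` is hypothetical, and joining words with hyphens is an assumption inferred from the "/story/article-name" pattern; the request does not specify how the article title is obtained, so it is taken here as an input.

```python
import re


def story_url(title):
    """Build a candidate /story/ URL from an article title.

    Assumption: words are joined with hyphens, as "/story/article-name"
    suggests. The result is only a candidate and still needs a live check.
    """
    lowered = title.lower()
    # Drop punctuation; keep letters, digits, whitespace and hyphens.
    cleaned = re.sub(r"[^a-z0-9\s-]", "", lowered)
    words = cleaned.replace("-", " ").split()
    return "https://www.foxnews.com/story/" + "-".join(words)
```

If the candidate URL does not resolve, the request above recommends falling back to an archive URL.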

Enwiki

  • Checked 3,248 pages and edited 2,346 pages. Moved 2,601 links to a new URL. Resolved 66 ghost redirects. Resolved 233 soft-404s. Removed 4 {{dead link}}. Added 6 {{dead link}}. Switched 900 |url-status=dead to live. Switched 10 |url-status=live to dead. Added 240 archive URLs (198 Wayback). Changed 175 citation metadata.
Analysis: converted about 3,500 to live URLs per the above rules (2,601 + 900). Another 250 or so added archive URLs. -- GreenC 18:07, 30 September 2024 (UTC)
Not bad at all! How successful were fixing the redirects to wrong pages? MrLinkinPark333 (talk) 18:10, 30 September 2024 (UTC)
It seems successful. A spot check of Disappearance of Natalee Holloway saw some. -- GreenC 21:25, 30 September 2024 (UTC)

IABot DB

  • Checked and updated about 5,700 links that propagate to 300+ wikis.

 Done -- GreenC 04:25, 2 October 2024 (UTC)

dnd.wizards.com

https://dnd.wizards.com now mostly redirects to https://www.dndbeyond.com; website was used as a primary source for various D&D articles. It looks like links that start with https://dnd.wizards.com/news/, https://dnd.wizards.com/articles/, https://dnd.wizards.com/dndstudioblog, https://dnd.wizards.com/dungeons-and-dragons, etc redirect to the D&D Beyond home page or change log. Some (like https://dnd.wizards.com/products/) redirect to similar pages on D&D Beyond but the D&D Beyond page often contains less information (such as not having the ISBN, author credits or other production info) so I think the whole lot should be marked as dead. Thanks! Sariel Xilo (talk) 22:29, 20 September 2024 (UTC)

159 pages -- GreenC 04:01, 21 September 2024 (UTC)

Enwiki

  • Checked 172 pages and edited 150 pages. Added 3 {{dead link}}. Switched 65 |url-status=live to dead. Added 169 archive URLs (159 Wayback). Changed 413 citation metadata.

IABot DB

  • Checked and fixed about 500 links which propagate to 300+ wikis

 Done -- GreenC 01:37, 7 October 2024 (UTC)

location.teamname.mlb.com

Each of the 30 MLB teams has a dead subdomain of the form <location>.<teamname>.mlb.com that should be archived, for example losangeles.angels.mlb.com. These now redirect to sites of the form mlb.com/<teamname>, and all content in the subdomains seems to be dead.

I combined the searches into 6 batches of 5 teams each, as combining all teams into one regex expression timed out the search and I didn't want to individually list the results for all 30 teams. I hope it isn't too difficult to process 30 different subdomains?

(Also, for some reason the searches counted a few pages where the text happened to contain <teamname>|mlb.com instead of <teamname>.mlb.com.)

(a regex "." means match any character thus it matched on "|" or whatever character; to search on a literal dot use "[.]" or "\." to escape the regex meaning of dot) -- GreenC 00:18, 3 October 2024 (UTC)
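The difference between an unescaped and an escaped dot can be seen in a small Python sketch. The helper name `find_hosts` and the sample strings are made up for illustration.

```python
import re


def find_hosts(text, team, escape_dots=True):
    """Find '<team>.mlb.com' in text.

    With escape_dots=False the dots are left as regex wildcards, which
    reproduces the stray matches described above.
    """
    dot = "[.]" if escape_dots else "."
    pattern = re.escape(team) + dot + "mlb" + dot + "com"
    return re.findall(pattern, text)


# An unescaped dot matches any character, including "|".
loose = find_hosts("see angels|mlb.com", "angels", escape_dots=False)
# A literal dot does not.
strict = find_hosts("see angels|mlb.com", "angels")
```

Here `loose` contains the stray `angels|mlb.com` match while `strict` is empty.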

diamondbacks, braves, orioles, redsox, cubs: 1,305 pages.

whitesox, reds, indians, rockies, tigers: 1,181 pages.

astros, royals, angels, dodgers, marlins: 1,134 pages.

brewers, twins, mets, yankees, athletics: 1,118 pages.

phillies, pirates, padres, giants, mariners: 1,304 pages.

cardinals, rays, devilrays (both are subdomains for the same team), rangers, bluejays, nationals: 1,260 pages. Helpful Raccoon (talk) 05:16, 14 September 2024 (UTC)

Should be OK to combine into a single project since they use the same root domain, problems like soft-404s will be the same. Thanks for creating the separate searches. I saw one for "m.cubs.mlb.com" which is the mobile link for the Cubs. It is a soft-404, so looks like "*.cubs.mlb.com" need to be checked. -- GreenC 15:54, 14 September 2024 (UTC)

Enwiki

  • Checked 5,505 pages and edited 4,080 pages. Moved 4 links to a new URL. Added 4,124 {{dead link}}. Switched 1,160 |url-status=live to dead. Added 5,495 archive URLs (5,431 Wayback). Changed 721 citation metadata.
Comment: high number of {{dead link}} -- GreenC 21:27, 3 October 2024 (UTC)
Looks like Wayback Machine performance has been poor, creating timeouts that result in false negatives, thus the high number of {{dead link}}. I am beginning to reprocess those at a slower pace. -- GreenC 15:35, 5 October 2024 (UTC)
  • Round 2: Checked 1,921 pages and edited 1,426 pages. Added 2,388 archive URLs (2,388 Wayback).
Reprocessed the "Added 4,124 {{dead link}}" from above, due to Wayback Machine timeouts. Converted 2,388 {{dead link}} to archive URLs. -- GreenC 17:59, 6 October 2024 (UTC)

IABot DB

  • Checked and updated about 30,000 links which propagate to 300+ wikis

 Done -- GreenC 14:14, 8 October 2024 (UTC)

Some Vietnamese newspapers

RFI Vietnamese, VTC News and Zing News changed their domain names:

  • vi.rfi.fr and viet.rfi.fr -> rfi.fr/vi
  • vtc.vn -> vtcnews.vn
  • news.zing.vn and zingnews.vn -> znews.vn

Billboard Vietnam website (billboardvn.vn) has been closed. Cherry Cotton Candy (talk) 09:05, 22 September 2024 (UTC)

vi.rfi.fr

12 pages — Preceding unsigned comment added by GreenC (talkcontribs)

Tried this to that; it doesn't work. -- GreenC 01:41, 7 October 2024 (UTC)
@GreenC Can you skip the above link and continue with the others? For example, http://vi.rfi.fr/viet-nam/20191111-nhung-nguoi-linh-viet-nam-hy-sinh-vi-nuoc-phap-trong-the-chien-i -> https://www.rfi.fr/vi/viet-nam/20191111-nhung-nguoi-linh-viet-nam-hy-sinh-vi-nuoc-phap-trong-the-chien-i Cherry Cotton Candy (talk) 13:09, 7 October 2024 (UTC)
Cherry, there are only 12. Could you do this manually? It will be less work than me programming the bot and working through the issues. -- GreenC 15:31, 7 October 2024 (UTC)
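For anyone doing the 12 by hand, the example conversion given above can be sketched in Python. The function name `rfi_new_url` is hypothetical, and it assumes the path carries over unchanged, which the discussion above shows is not true for every link, so each result still needs a live check.

```python
import re


def rfi_new_url(url):
    """Rewrite vi.rfi.fr / viet.rfi.fr links to the rfi.fr/vi form.

    Assumption: the path is kept as-is; verify each result manually.
    """
    return re.sub(r"^https?://vi(?:et)?\.rfi\.fr/", "https://www.rfi.fr/vi/", url)
```

URLs on other hosts are returned unchanged.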

vtc.vn

197 pages — Preceding unsigned comment added by GreenC (talkcontribs)

zingnews.vn

246 pages — Preceding unsigned comment added by GreenC (talkcontribs)

billboardvn.vn and thanhniennews.com

Billboard 130 pages — Preceding unsigned comment added by GreenC (talkcontribs)

Thanhniennews 261 pages. These websites have been closed. Cherry Cotton Candy (talk) 03:59, 23 September 2024 (UTC)

tuoitre.com.vn

41 pages. Some articles can be found manually on tuoitre.vn, for example:

Cherry Cotton Candy (talk) 03:59, 23 September 2024 (UTC)

Unable to do by bot. -- GreenC 23:58, 7 October 2024 (UTC)

thanhnien.com.vn

124 pages. Some articles can be found manually on thanhnien.vn, for example:

Cherry Cotton Candy (talk) 03:59, 23 September 2024 (UTC)

Unable by bot. -- GreenC 23:58, 7 October 2024 (UTC)

laodong.com.vn

49 pages. Few articles can be found manually on laodong.vn, for example:

Cherry Cotton Candy (talk) 03:59, 23 September 2024 (UTC)

Unable by bot. -- GreenC 23:58, 7 October 2024 (UTC)

 Done -- GreenC 18:19, 8 October 2024 (UTC)

aviation-safety.net

These (currently) 299 results ought to have "/operator/airline.php?var=" replaced by "/operators/". Updating the redirected domain "aviation-safety.net" to "asn.flightsafety.org" could be done along the way as well. 1234qwer1234qwer4 16:02, 24 September 2024 (UTC)

User:1234qwer1234qwer4, given http://aviation-safety.net/database/operator/airline.php?var=6345 can you tell me the new URL? -- GreenC 16:07, 24 September 2024 (UTC)
http://aviation-safety.net/database/operators/6345 works, though it is a redirect to https://asn.flightsafety.org/database/operators/6345. 1234qwer1234qwer4 16:13, 24 September 2024 (UTC)
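The two changes discussed above (path rewrite, then domain update) could be sketched in Python like this. The function name `update_asn_url` is hypothetical, and accepting an optional "www." prefix is an assumption not stated in the request.

```python
import re


def update_asn_url(url):
    """Apply the operator-path rewrite and move to asn.flightsafety.org."""
    # "/operator/airline.php?var=6345" becomes "/operators/6345".
    url = url.replace("/operator/airline.php?var=", "/operators/")
    # Assumption: an optional "www." prefix is treated the same way.
    return re.sub(
        r"^https?://(?:www\.)?aviation-safety\.net",
        "https://asn.flightsafety.org",
        url,
    )
```

Applied to the example above, this yields https://asn.flightsafety.org/database/operators/6345, the redirect target reported in the reply.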

Enwiki

  • Checked 298 pages and edited 298 pages. Moved 1,073 links to a new URL. Resolved 8 ghost redirects. Switched 7 |url-status=dead to live. Switched 2 |url-status=live to dead. Added 22 archive URLs (21 Wayback).

IABot DB

  • Checked and fixed about 800 links which propagate across 300+ wikis.

 Done -- GreenC 22:52, 8 October 2024 (UTC)

planespotters.net

260 pages that should have "planespotters.net/Airline/" changed to "planespotters.net/airline/". 1234qwer1234qwer4 17:16, 24 September 2024 (UTC)

  • Checked 241 pages and edited 231 pages. Moved 251 links to a new URL. Removed 1 {{dead link}}. Added 1 {{dead link}}. Switched 99 |url-status=dead to live. Added 22 archive URLs (13 Wayback).

 Done -- GreenC 23:13, 8 October 2024 (UTC)