The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section.

Operator: Mets501

Automatic or Manually Assisted: Automatic

Programming Language(s): AWB for this task

Function Summary: Tagging talk pages with no associated articles.

Edit period(s) (e.g. Continuous, daily, one time run): One time run until all pages tagged are dealt with, and then after that possibly more runs in the future.

Edit rate requested: 3 edits per minute on low traffic days only

Already has a bot flag (Y/N): Y

Function Details: The bot will go through all talk pages that do not have an associated article and tag them with a template such as this:

This page may meet Wikipedia's speedy deletion criteria, as it is a talk page of a page which does not exist (CSD G8).

This notice was added by a bot. There are three reasons why a talk page may exist without an article, which are:

  1. Suggestions for a future article
  2. Important deletion discussion only available at this page
  3. A non-vandalism subpage

Only delete this page if it clearly does not meet any of those three criteria.

This will add pages to a new category, such as Category:Talk pages with no main page, not to C:CSD, so as to avoid flooding. It will get these pages by list-comparing a database dump: it will compare the list of pages in the main space to the list of pages in the talk namespace, and all talk namespace pages that do not have an associated main space page will be tagged.
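The list comparison described above can be sketched as a set difference. This is only a minimal illustration in Python, not the bot's actual AWB workflow: the function name `find_orphaned_talk_pages` and the sample titles are hypothetical, and it assumes the dump has already been reduced to two lists of titles with the namespace prefixes stripped.

```python
def find_orphaned_talk_pages(main_titles, talk_titles):
    """Return talk-page titles that have no same-named main-space page, sorted."""
    existing = set(main_titles)
    return sorted(t for t in set(talk_titles) if t not in existing)


# Hypothetical sample data standing in for titles extracted from a dump.
main_titles = ["I Like Cake", "Apple"]
talk_titles = ["I Like Cake", "Apple", "Planned article", "I Like Cake/Archive 1"]

orphans = find_orphaned_talk_pages(main_titles, talk_titles)
print(orphans)
```

Note that a plain title comparison like this also reports subpages such as archives, because their full titles never match a main-space page.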

Once the bot has tagged all of these talk pages (in all talk namespaces except user talk), admins (including me) can go through Category:Talk pages with no main page and either delete the pages, or change the template from ((db-botnomain)) (or some other name) to ((db-botnomainreviewed)) (or something like that), which will exempt those pages from being added to Category:Talk pages with no main page in the future.
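Extending the comparison to all talk namespaces except user talk might look like the sketch below. It relies on the MediaWiki convention that talk namespaces have odd numbers and that the matching subject namespace is the talk number minus one (for example 1 is Talk and 0 is main, 3 is User talk). The function names and the `(namespace, title)` input shape are assumptions made for illustration, not part of the bot.

```python
USER_TALK = 3  # MediaWiki namespace number for "User talk"


def is_talk_namespace(ns):
    """Talk namespaces have odd, positive numbers in MediaWiki."""
    return ns > 0 and ns % 2 == 1


def orphaned_talk_pages(pages):
    """pages: iterable of (namespace_number, title) pairs taken from a dump.

    Returns the talk pages, excluding user talk, whose subject page
    (same title, namespace number minus one) is absent from `pages`.
    """
    existing = set(pages)
    return sorted(
        (ns, title)
        for ns, title in existing
        if is_talk_namespace(ns)
        and ns != USER_TALK
        and (ns - 1, title) not in existing
    )


# Hypothetical sample data.
pages = [
    (0, "Apple"), (1, "Apple"),
    (1, "Planned article"),
    (3, "Example user"),
    (5, "Old proposal"),
]
result = orphaned_talk_pages(pages)
print(result)
```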

Discussion


A particularly sneaky vandal could create talk subpages for articles that don't exist. Any way for the bot to identify that? I'm guessing that if it is a subpage most admins will assume it is fine and won't bother checking to make sure the article it is a subpage of actually exists; having the bot identify such pages would be easier. VegaDark 22:24, 22 September 2006 (UTC)

Yes, it will identify subpages. I will alter the template above to say a non-vandalism subpage. —Mets501 (talk) 22:38, 22 September 2006 (UTC)
I guess I don't see the advantage of hundreds/thousands of page edits, versus a database dump that compiles a single list in project space. It's also easier to annotate (valid, already checked) in a list, vs. a category, and you can update several at a go, again obviating the need for multiple edits on our well-taxed servers.
Mind you, I think that excluding pages with Category:Wikipedia articles requested through talk page creation (for starters) is a good purpose, but that category already exists, albeit not very well populated as yet. -- nae'blis 22:41, 22 September 2006 (UTC)
I see what you're saying. However, I do think it's much better with a category because it's easier when pages that are dealt with are automatically removed. If a page is deleted, it's removed from the category, and if a page is not deleted, when the tag is changed to ((db-botnomainreviewed)), the page will be removed from the category. A few hundred edits spaced out on a low traffic day (like Tuesday, Wednesday, or Thursday) is not a huge server load either. —Mets501 (talk) 22:56, 22 September 2006 (UTC)
Task approved. Voice-of-All 22:28, 24 September 2006 (UTC)
Thanks! —Mets501 (talk) 22:38, 24 September 2006 (UTC)

I've been asked on my talk page to create a plugin to do this, which I'm happy to do - it's a simple job. One thing though: the request said to add ((db-botnomain)) if the main article exists - shouldn't it be if the main article doesn't exist? --kingboyk 10:51, 26 September 2006 (UTC)

Oh, yeah. Sorry, that was a typo :-) —Mets501 (talk) 10:56, 26 September 2006 (UTC)

This is clearly not a sensible job for a bot, in fact offline reports are already produced, and save a massive amount of edits/bandwidth etc. over using a bot. Martin 14:09, 27 September 2006 (UTC)

Let me first point out that my writing a plugin was simply responding to a request from a fellow Wikipedian, and not an endorsement of this application as a member of the approvals group. That said, I think this idea has merit (whether or not a bot is used). In my experience, admins are over-zealous when it comes to speedy deleting "orphan" talk pages, regularly zapping talk pages which contain notes about planned articles. Also, some WikiProjects are experimenting with a "Needed-class", where talk pages are marked as "an article is needed here" (see Wikipedia talk:Version_1.0_Editorial_Team/Index#The_.22Needed.22_class). If such pages could be spared the chop by using ((db-botnomainreviewed)) I think it would be advantageous. Again, whether or not it's a bot job (and I don't currently see why it "clearly" isn't) is open to discussion. --kingboyk 14:16, 27 September 2006 (UTC)
It clearly isn't because the offline reports produced previously worked fine. How would a category make admins less zealous than a list? We could still have a category for talk pages that shouldn't be deleted. Martin 15:14, 27 September 2006 (UTC)
Progress is made by people trying new solutions; the "it ain't broke don't fix it" approach is sensible but doesn't lead to many breakthroughs :) The lists are static, out of date, and large. I think categorisation is a much better way of doing this. One point though: where does the data come from for the bot? A database dump? Or the page we're discussing? --kingboyk 15:17, 27 September 2006 (UTC)
The lists of mainspace article talk pages were cleared within days last time; no one can be bothered to filter through the other stuff, as most of it is legit. I don't see how making thousands of edits could be anything other than a waste of time and resources. Martin 15:23, 27 September 2006 (UTC)
Just a note that I'll wait to continue bot operations until we're in agreement here. I do believe that a category is better, however, because it is dynamic, and reviewed pages can easily be removed from the category by changing the template instead of having to be removed from a static list, and new pages can easily be added by tagging them with a template instead of adding to a list. —Mets501 (talk) 21:35, 27 September 2006 (UTC)
Looking through that list, the ones for articles seem to be done (at least those that were listed there) so they likely are not up to date, and the "other namespaces" one has pages that do have a main page. I'd advise running this bot, and then using Jude's automagic delete or something on the category. Voice-of-All 17:53, 29 September 2006 (UTC)
Agreed, those reports date back to April and aren't really helpful anymore, and when I asked if they were going to be updated anytime soon the creator referred me to this bot (making an update unnecessary), so apparently even he agrees this is a good idea. VegaDark 21:43, 29 September 2006 (UTC)
More out of interest than anything, but I'll repeat my question: how are the lists for this bot being made? --kingboyk 21:56, 29 September 2006 (UTC)
Sorry for not answering: simply a list of talk pages from the most recent database dump. I'm going to continue operations of the bot now. —Mets501 (talk) 22:01, 29 September 2006 (UTC)
...but there might be an approvals group issue here, as I see it's tagging talk archive pages. Is that as designed or an oversight? Wouldn't it be best to filter out /Archive pages from the list before the run, or (getting more complicated) extend the plugin to check if the top level article exists? (e.g. if the page is Talk:I Like Cake/Archive 1 you check I Like Cake, not I Like Cake/Archive 1). To do such a thing you'd either need to create a new instance of the AWB webcontrol and check the page in the background, or add the article to check to AWB's list, save state, and reprocess when AWB sends the article. Hint: Filtering the list would be easier ;) I dunno, just an observation; if you intend to tag talk archives and are confident somebody will check these pages rather than zap them automatically, that's OK. --kingboyk 22:02, 29 September 2006 (UTC) (edit conflict)
Sorry about the tagging of talk archive pages. I didn't filter the list at first, but after checking the bot's first few edits and seeing how many archives were tagged, I'm now filtering "archive" from page titles. —Mets501 (talk) 22:33, 29 September 2006 (UTC)
Good man (/woman/creature). --kingboyk 22:42, 29 September 2006 (UTC)
I'm a male Homo sapiens :-) —Mets501 (talk) 23:23, 29 September 2006 (UTC)
Actually, every talk archive it tagged was improperly named. None followed the proper naming convention of Talk:I Like Cake/Archive 1; everything it tagged so far has been of the form Talk:I Like Cake (Archive 1). Hence it should continue to tag those types so someone can move them to the proper name and then delete the old page. VegaDark 06:27, 30 September 2006 (UTC)
I don't think that necessitates tagging; that should happen preferably at the list filtering stage, or a couple of lines of code can be added to the plugin to log potentially misnamed archives, or a renaming template added to the page (if there are too many for Mets to fix by himself). This template is specifically for talk pages which are deletion candidates, not renaming candidates. --kingboyk 10:43, 30 September 2006 (UTC)
Yes, what I meant was they should still be tagged, but not with this message. They should get something saying they need to be moved. VegaDark 22:27, 30 September 2006 (UTC)
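The filtering ideas raised in this thread (checking the top-level article for subpages, skipping archives, and logging misnamed archives for renaming instead of deletion tagging) could be combined roughly as follows. This is a hedged sketch in Python rather than AWB plugin code; the function names, the return labels, and the regular expression for titles like "I Like Cake (Archive 1)" are all assumptions made for illustration.

```python
import re

# Assumed pattern for misnamed archives such as "I Like Cake (Archive 1)".
MISNAMED_ARCHIVE = re.compile(r".+\(archive[^)]*\)\s*$", re.IGNORECASE)


def root_title(title):
    """Return the top-level page of a subpage title (text before the first '/')."""
    return title.split("/", 1)[0]


def classify(title, main_titles):
    """Decide what to do with a talk page title (namespace prefix removed).

    Returns one of the hypothetical labels:
      'keep'   - the article, or the subpage's top-level article, exists
      'rename' - looks like a misnamed archive; log it for moving
      'skip'   - other archive page; filtered out before the run
      'tag'    - candidate for the deletion-review template
    """
    if title in main_titles or root_title(title) in main_titles:
        return "keep"
    if MISNAMED_ARCHIVE.match(title):
        return "rename"
    if "archive" in title.lower():
        return "skip"
    return "tag"


# Hypothetical sample data.
main_titles = {"I Like Cake"}
for t in ["I Like Cake/Archive 1", "I Like Cake (Archive 1)",
          "No Such Page/Archive 2", "No Such Page/Notes"]:
    print(t, "->", classify(t, main_titles))
```

Checking the exact title before the root keeps main-space articles whose names contain a slash from being treated as subpages of a missing parent.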

How will the bot know which talk pages don't have articles? (I'm just curious, as I did this manually a few months ago before the toolserver died, see User:Rory096/orphanedtalks, and I would have done it again if I had a way of getting a dump of orphaned talk pages). Also, will your bot be filtering out archives? That was the major problem with my dump, even after anything with the word "archive" was filtered out. --Rory096 17:56, 5 October 2006 (UTC)

It's just getting a list of all talks and seeing if the main page exists. I don't know if this project is going to go anywhere, however: like you said, basically all of the orphaned talks are legitimate for some reason or another. — Mets501 (talk) 18:12, 5 October 2006 (UTC)
The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.