The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Operator: Cyberpower678 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 02:04, Thursday June 27, 2013 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): PHP

Source code available: No

Function overview: Tag all pages containing links blacklisted on MediaWiki:Spam-blacklist or meta:Spam blacklist with ((Spam-links))

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Unreliable_source_bot

Edit period(s): Daily

Estimated number of pages affected: Unknown. Probably hundreds or thousands at first

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: This bot scans the above-mentioned lists and tags any article-namespace page containing a blacklisted link with ((Spam-links)).
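
(Illustrative sketch: the bot's source is not published, so every name below is hypothetical. It shows only the core decision, checking a page's external links against the combined blacklist regexes and building a tag for the top of the article when any of them match.)

    <?php
    // Hypothetical sketch only; not the bot's actual code.
    // Given a page's external links and the combined blacklist regexes,
    // return the tag to place at the top of the article, or null if clean.
    function buildSpamLinksTag( array $externalLinks, array $blacklistRegexes ) {
        $flagged = array();
        foreach( $externalLinks as $link ) {
            foreach( $blacklistRegexes as $regex ) {
                if( preg_match($regex, $link) ) {
                    $flagged[] = $link;
                    break; // one blacklist match is enough for this link
                }
            }
        }
        if( empty($flagged) ) return null; // nothing to tag
        // The parameter format of ((Spam-links)) shown here is assumed.
        return '{{Spam-links|' . implode('|', array_unique($flagged)) . '}}';
    }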

Discussion[edit]

Since the sites on those lists have been determined to be spam, would it be better to simply remove those links? Would your bot only consider external links, or also references? Thanks! GoingBatty (talk) 02:33, 27 June 2013 (UTC)[reply]

I believe it would be better to simply tag them instead of removing them. Removing them could end up breaking something. I can have my bot remove them instead if that is what is preferred, or if the MediaWiki software turns out to inhibit the bot. As for your questions, it would handle any link matched in article space.—cyberpower ChatOnline 02:40, 27 June 2013 (UTC)[reply]
External links are not ((unreliable source)), they're external links. Also, you should probably skip links listed at MediaWiki:Spam-whitelist. And note there are also links on the blacklist that aren't there because anything using that link is unreliable, e.g. any url shortener is there because the target should be linked directly rather than via a shortener. Anomie 11:24, 27 June 2013 (UTC)[reply]
Thanks for the input. I could remove external links while tagging refs with ((unreliable source)).—cyberpower ChatOffline 12:32, 27 June 2013 (UTC)[reply]

Wait, you're tagging external links that are listed on the spam blacklist with ((unreliable source))? Unless I'm missing something here this won't work. When the bot tries to save the page, it will hit the blacklist and won't save. --Chris 13:55, 27 June 2013 (UTC)[reply]

I have considered that possibility, which is why my alternative is to simply remove the link and refs altogether.—cyberpower ChatOnline 14:51, 27 June 2013 (UTC)[reply]
In that case, I think this is something better dealt with by a human. Simply removing external links will probably lead to a bit of "brokenness" in the article where the link was, and would need human intervention to clean up after the bot. Also, if the article does have blacklisted links it in, chances are it probably has other problems (e.g. the entire article could be spam), so it would be preferable to have a human view the article and take action. I think if you want to continue with this task, the best thing to do would be for the bot to create a list of pages that contain blacklisted links, and post that for users to manually review. --Chris 15:18, 27 June 2013 (UTC)[reply]
I'm not certain if the software will block the bot's edits if the spam link is already there. I was thinking more along the lines that the tag places the page in a category that humans can then review. If it can't tag it next to the link, maybe it can tag the page instead and place it in the same category. What do you think?—cyberpower ChatOnline 15:30, 27 June 2013 (UTC)[reply]
As I understand it, if the spam link is already on the page, the software will block the edit anyway. --Chris 16:00, 27 June 2013 (UTC)[reply]
Hmmm. I'm looking at the extension that is responsible. If the software blocks any edit that has the link in there already, that would likely cause a lot of problems on-wiki. But I'll have more info later tonight.—cyberpower ChatOffline 16:57, 27 June 2013 (UTC)[reply]
((BAGAssistanceNeeded)) I have tested the spam filter extensively on the Peachy wiki. Tagging blacklisted links will not trip the filter, nor will removing them or re-adding a link that already exists on the page. Modifying the link, or adding it to a page where the link is not yet present, will trip the filter.—cyberpower ChatOffline
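
(For context: the SpamBlacklist extension only rejects an edit that introduces a blacklisted link not already present on the page, which is consistent with the test results above. Below is a hedged sketch of how a bot might pre-check a planned edit against that behaviour; the crude URL extraction is illustrative and is not the extension's own parser.)

    <?php
    // Illustrative only; not the bot's or the extension's actual code.
    function extractExternalLinks( $wikitext ) {
        // Very rough URL extraction, for illustration.
        preg_match_all( '!https?://[^\s\]\|<>"]+!i', $wikitext, $matches );
        return array_unique( $matches[0] );
    }

    function editWouldTripFilter( $oldText, $newText, array $blacklistRegexes ) {
        // Only links that the edit newly introduces are checked.
        $added = array_diff( extractExternalLinks($newText),
                             extractExternalLinks($oldText) );
        foreach( $added as $link ) {
            foreach( $blacklistRegexes as $regex ) {
                if( preg_match($regex, $link) ) return true; // save would be blocked
            }
        }
        return false; // tagging or removing existing links is allowed through
    }
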
Ok, I stand corrected. I'd like to review the source code for this bot. --Chris 12:40, 3 July 2013 (UTC)[reply]
Also can you give a bit more detail on exactly how the bot will operate? Will it only be tagging references, or will it remove external links as mentioned above? How will the bot deal with any false positives? Will it skip links listed on MediaWiki:Spam-whitelist? Will it be possible to whitelist other links (e.g. url shorteners as mentioned by Anomie), that shouldn't be tagged as unreliable? --Chris 12:47, 3 July 2013 (UTC)[reply]
The bot code is not yet fully complete as of this writing. I seem to be hitting resource barriers. Because it processes an enormous number of external links, I am working on conserving memory usage. The regex scan is also quite a resource hog, and I am trying to improve its efficiency. Yes, it will obey the whitelist. Because there is a risk of breaking things when removing the link, and tagging references can lead to false positives, I thought about placing a tag at the top of the page listing the links that it found. False positives can be reported to me, or to an admin, who will modify a .js page in my userspace with an exception to be added or removed, which the bot will read before it edits the pages.—cyberpower ChatOnline 14:00, 3 July 2013 (UTC)[reply]
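
(Illustrative sketch of the exceptions mechanism described above: fetch and parse an on-wiki exceptions page before an editing run. The page title and the key=value line format are assumptions, suggested only by the "page=" fragment quoted in the review below; this is not the bot's actual code.)

    <?php
    // Hypothetical sketch; page title and entry format are assumed.
    function fetchExceptions( $apiUrl = 'https://en.wikipedia.org/w/api.php' ) {
        $params = array(
            'action' => 'query',
            'prop'   => 'revisions',
            'rvprop' => 'content',
            'titles' => 'User:Cyberpower678/Exceptions.js', // hypothetical title
            'format' => 'json',
        );
        $json  = file_get_contents( $apiUrl . '?' . http_build_query($params) );
        $data  = json_decode( $json, true );
        $pages = $data['query']['pages'];
        $page  = reset( $pages );
        $text  = isset($page['revisions'][0]['*']) ? $page['revisions'][0]['*'] : '';

        $exceptions = array( 'pages' => array(), 'namespaces' => array() );
        foreach( explode("\n", $text) as $line ) {
            $line = trim( $line );
            if( $line === '' || $line[0] === '#' ) continue; // skip blanks/comments
            if( strpos($line, 'page=') === 0 ) {
                $exceptions['pages'][] = substr( $line, strlen('page=') );
            } elseif( strpos($line, 'namespace=') === 0 ) {
                $exceptions['namespaces'][] = (int)substr( $line, strlen('namespace=') );
            }
        }
        return $exceptions;
    }
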
Although this page states that blacklisted external links will be tagged with ((unreliable source)), Wikipedia:Village_pump_(miscellaneous)#New_Bot states that they will be tagged with ((spam-links)). Could you please clarify? Thanks! GoingBatty (talk) 23:38, 24 July 2013 (UTC)[reply]
Some changes were made since the filing of this BRFA. I have now amended the above.—cyberpower ChatOffline 05:43, 25 July 2013 (UTC)[reply]

Review:

    if( empty($blacklistregexarray) ) goto theeasystuff;
    else $blacklistregex = buildSafeRegexes($blacklistregexarray);

You could have written:

    if( !empty($blacklistregexarray) ) {
        $blacklistregex = buildSafeRegexes($blacklistregexarray);
        <LINES 89 - 112>
    }

 Done All labels removed.—cyberpower ChatOnline 12:24, 25 July 2013 (UTC)[reply]

 Already done The break command was a remnant from the debugging period. It's removed now.—cyberpower ChatOnline 12:24, 25 July 2013 (UTC)[reply]

substr($exception[0],strlen("page="))

 Done Missed this one.—cyberpower ChatOnline 12:44, 25 July 2013 (UTC)[reply]

 Already done You reminded me that I programmed that safeguard into the Peachy framework already. :p—cyberpower ChatOnline 12:24, 25 July 2013 (UTC)[reply]

        if( preg_match($regex, $link) ) {
            foreach( $whitelistregex as $wregex ) {
                if( preg_match($wregex, $link) ) return false; 
                else return true;
            }
        }

versus the corrected version, which only reports a link as blacklisted after every whitelist entry has been checked:

        if( preg_match($regex, $link) ) {
            foreach( $whitelistregex as $wregex ) {
                if( preg_match($wregex, $link) ) 
                     return false; 
            }
            return true;
        }

 Fixed—cyberpower ChatOnline 12:24, 25 July 2013 (UTC)[reply]

 Already done Framework has throttle.—cyberpower ChatOnline 12:24, 25 July 2013 (UTC)[reply]

 Done—cyberpower ChatOnline 12:24, 25 July 2013 (UTC) --Chris 09:29, 25 July 2013 (UTC)[reply]

Trial[edit]

Ok, we'll start with a small trial to make sure everything runs smoothly, and then we can move onto a much wider trial. Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. --Chris 10:57, 2 August 2013 (UTC)[reply]

It started out ok, but then something went horribly wrong and it started tagging pages with empty tags. I have terminated the bot at the moment and will be looking into what caused the problems.—cyberpower ChatOnline 12:46, 10 August 2013 (UTC)[reply]
Bug found. Bot restarted.—cyberpower ChatOnline 19:58, 10 August 2013 (UTC)[reply]
Trial complete. I haven't looked at the edits yet as it's currently the middle of the night.—cyberpower ChatOffline 00:32, 12 August 2013 (UTC)[reply]

Even after the restart, 2 pages had blank tags added (1, 2). Also, maybe non-article pages should be skipped (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17), unless there is some reason that these should have the links removed:Jay8g [VTE] 01:18, 13 August 2013 (UTC)[reply]

Thank you. I am already looking into the bug and am working on excluding namespaces.
The bugs have been fixed. The exceptions list now supports entire namespaces.—cyberpower ChatOnline 12:14, 15 August 2013 (UTC)[reply]

Approved for extended trial (1000 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Although I would ask that you do them in batches (maybe 100 or 200 edits at a time) --Chris 10:16, 24 August 2013 (UTC)[reply]

Those links are not blacklisted; I just managed to save a page with one of the links ... --Dirk Beetstra T C 21:20, 28 August 2013 (UTC)[reply]
If they exist already, they won't be blocked. Also, I have found that the filter only partially enforces the regex list. Sometimes it blocks links with "petition" in them and other times it doesn't. The regex generator is the same as MediaWiki's extension. The validation process of these links is identical. If it's really a bug, then it's a bug with PHP.—cyberpower ChatOnline 22:03, 28 August 2013 (UTC)[reply]
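
(For reference, the blacklist pages are lists of regex fragments, one per line, with "#" starting a comment, and the extension combines them into a few large alternation patterns. The sketch below is a rough illustration of that idea, not a verbatim copy of the extension's builder; the exact anchoring, escaping, and flags used by SpamBlacklist may differ.)

    <?php
    // Rough illustration of combining blacklist lines into PCRE batches.
    // Not a verbatim copy of the SpamBlacklist extension's regex builder.
    function buildBlacklistRegexes( array $lines, $batchSize = 1000 ) {
        $clean = array();
        foreach( $lines as $line ) {
            $line = preg_replace( '/#.*$/', '', $line ); // strip comments
            $line = trim( $line );
            if( $line !== '' ) $clean[] = str_replace( '/', '\/', $line );
        }
        $regexes = array();
        // Batch entries into a few large alternations for speed; the real
        // extension also validates each batch and recovers from bad entries.
        foreach( array_chunk($clean, $batchSize) as $chunk ) {
            $regexes[] = '/https?:\/\/+[a-z0-9_\-.]*(?:' . implode('|', $chunk) . ')/Si';
        }
        return $regexes;
    }
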
See diff --Dirk Beetstra T C 21:23, 28 August 2013 (UTC)[reply]
It is also tagging links to Google Books with search strings containing "forbidden words" like "petitions" in them.[2] That's not a spam link, and it doesn't trigger the spam filter either, because it isn't forbidden in that context. I'll also second that the box is too large and overwhelming. Slp1 (talk) 01:53, 29 August 2013 (UTC)[reply]

Cyberpower678, can you stop the trial until we sort out the above? --Chris 02:00, 29 August 2013 (UTC)[reply]

Per my talk page discussion, it has been brought to my attention that these issues are indeed a bot issue: not a bug in my code, but outdated code. I seem to have downloaded an outdated version of the extension. I'll make the modifications in the next few days. My bot has been shut down for quite some time now. I still recommend that the petition regex be removed; there's no need for it. As for the spam links template, that can be fixed later, as it's not crucial to the bot's operation.—cyberpower ChatOnline 02:05, 29 August 2013 (UTC)[reply]
I am glad that you have found the problem, but I can't say I agree that the template is not crucial to the bot's operation or that it can be fixed later. This is an encyclopedia and the template (rather than the code) is what our readers and editors see and use. The prior template was inappropriately large and overwhelming; it talks about "spam links" which is not the case and uses the term "external links" in a way that is not consistent with WP's definition at external links. At least on the William Wilberforce page, the promised list of "problematic links" was not shown, which meant we had to dig to try and figure out what the (non)problem was. Given the fact that the bot is in a trial stage and is making mistakes, it seems to me that it would be better to provide information about where to report errors in tagging, rather than the current formulation which suggest that the bot can do no wrong, that the article and its editors (or the blacklist) are at fault, and that they need to figure out the problem and act on it or the bot will be back. That's not the case, and feedback needs to be given, received and then acted upon with a positive spirit. Slp1 (talk) 11:54, 29 August 2013 (UTC)[reply]
I agree with you completely. The problem was that I was convinced there was no issue with the code: the regex is generated by the blacklist extension itself, and the bot simply validates links against that regex. I was right on every count. Where I was wrong was that I was running an out-of-date version of the extension; the newer version has a more refined regex generator. The template layout should be decided by the community. I merely created something for the bot. I'm not going to force this template on the community if they don't want it.—cyberpower ChatOffline 12:47, 29 August 2013 (UTC)[reply]
To expand on what I said, I meant that it is not crucial to fixing bugs in the bot that may cause mistags. I do agree that the template will need to be fixed and adjusted to reflect consensus, but right now I am only concerned with making sure the bot operates correctly.—cyberpower ChatOnline 14:21, 29 August 2013 (UTC)[reply]
Yeah, but the bot operating correctly comprises not just the technical aspects of a bot but also the surface and interface aspects of the template, as well as the giving and receiving of feedback between the community and the bot operator. I understand that you did not think that there was a mistake with the bot, but the fact is that there was a problem. It isn't the end of the world and it isn't a question of fault, because everybody makes mistakes, and mistakes are a good opportunity for learning and growth. But what I am a little concerned with is that you might be a little bit too interested in getting the code right and not enough interested in the interactional component of bots with the community. A bunch of people told you it was making mistakes and you just said "no, it is doing what it is supposed to do", when it wasn't. You even said that to one person after you'd found there was a problem.[3] A number of articles are still tagged inappropriately. How about cleaning them up? And it doesn't seem like you are taking seriously the comments about the template, because you want to do it "later" and are even apparently planning another run without working on this. What happens if there is another bug that you didn't realize was there? I realize you don't think there is, but that's what you thought the last time, wasn't it? Personally I don't think you should add any more tags until the erroneous ones are removed and the concerns about the template addressed. A bot is a whole package, not just the coding. Slp1 (talk) 17:43, 29 August 2013 (UTC)[reply]
No offense, but I think you are blowing things out of proportion. There are people who will claim a link is not blacklisted when, in fact, it is. The bot is supposed to keep tagging pages until it no longer sees them as blacklisted. No, I do not run the bot if I see a bug surfacing; that's why it's been off. The next run is supposed to test the removal of tags containing links that are not blacklisted. I have indeed fixed the regex scanner. I am aware that the tag is important, but it's not my say how it's supposed to look but the community's, so I won't take the initiative. Feel free to modify the tag so that it still works but looks "better". If there is a visible bug, the bot gets shut down immediately. The petition problem was not because the code was buggy, but because the regex generator was running on the wrong version. I take comments very seriously, and I find that remark mildly offensive, to say I don't. You have essentially just called me incompetent.—cyberpower ChatOffline 18:33, 29 August 2013 (UTC)[reply]
Cyberpower, you say that you take comments very seriously. You created and are the major contributor to the template, which was apparently specifically designed with this bot in mind. You took the initiative in creating it and decided how it was going to look; now it would be great if you would follow up by considering the community's good-faith comments above (and on the template talk) and modifying it. That would show that you do indeed take comments seriously, as opposed to passing the responsibility on "to the community". Removing the incorrect tags from articles asap (manually if need be) would also show that you take your responsibilities as a bot operator seriously. I am sure that you are not incompetent, but it seems that you may not understand the degree of frustration and discouragement such bots can cause to good-faith (and perhaps technically clueless) content editors when articles are incorrectly tagged (sometimes repeatedly). This talk page posting from another editor makes the point well. Anyway, with this I leave you and the rest of the bot experts to your work. Slp1 (talk) 19:06, 29 August 2013 (UTC)[reply]

I am going on Wikibreak for awhile. As I will not be around to monitor the trial, please stop the trial, and wait for another BAGer to approve and monitor it. --Chris 11:25, 2 September 2013 (UTC)[reply]

More blank tags were added (1, 2), both in namespaces where no tags should have been added:Jay8g [VTE] 03:04, 3 September 2013 (UTC)[reply]

I saw it, and already fixed the issue. Thank you.—cyberpower ChatOnline 03:17, 3 September 2013 (UTC)[reply]
I will commence the last ~50 edits sometime tomorrow. This run will test to see if the good links are no longer tagged. This will verify that the exceptions list, whitelist, and the new regex scanner are working correctly.—cyberpower ChatOnline 16:47, 3 September 2013 (UTC)[reply]
Cyberpower. Did you not see that Chris G has told you to stop the trial for now? See his post of 11:25, 2 September 2013 (UTC). Slp1 (talk) 17:27, 3 September 2013 (UTC)[reply]
Oops. I forgot to mention that addshore has volunteered to take over this trial and has allowed me to resume it.
It is probably best to ask addshore to come here and do so officially, don't you think? Slp1 (talk) 18:10, 3 September 2013 (UTC)[reply]
I do. I'll ask him to post here before proceeding.—cyberpower ChatLimited Access 18:17, 3 September 2013 (UTC)[reply]
I see no harm in completing the last 50 edits! :) ·addshore· talk to me! 11:17, 10 September 2013 (UTC)[reply]
Trial complete. I accidentally went a bit over. I will post my results and fixes shortly.—cyberpower ChatOnline 14:55, 11 September 2013 (UTC)[reply]

If possible, it might be a good idea to un-exclude "Wikipedia talk:Articles for creation/" prefix pages, as the intention is that they would become articles:Jay8g [VTE] 02:11, 12 September 2013 (UTC)[reply]

Puuuh. That's going to be tricky. I'd have to exclude the Wikipedia talk namespace and add each Wikipedia talk page manually.—cyberpower ChatLimited Access 11:17, 12 September 2013 (UTC)[reply]
On second thought, there shouldn't be any blacklisted links on those AfC pages, since the filter should be filtering them out.—cyberpower ChatOffline 13:52, 12 September 2013 (UTC)[reply]

Post-Trial[edit]

I am currently generating the post-trial report for a BAGger and everyone else to review.—cyberpower ChatOffline 13:53, 12 September 2013 (UTC)[reply]

The following are the bugs and issues brought up during the extended trial and the status of each:

  1. ((Spam-links)) should be renamed to a more neutral name:  Done Renamed to ((Blacklisted-links)). The bot change will go into effect upon approval or a new trial.
  2. Tag template used incorrect terminology:  Fixed
  3. ((Blacklisted-links)) is too large and disruptive to readers:  Resolved The template is now collapsed by default with a one-line notice.
  4. ((Blacklisted-links)) is, in some cases, not showing the list of blacklisted URLs despite it being present in the wikimarkup:  Fixed
  5. Bot has been tagging links that shouldn't be tagged:  Fixed The regex scanner was not up to date.
  6. Bot is tagging wrong namespaces:  Resolved Any administrator can alter the settings on the exceptions page.

If I have missed any error reports that should have been addressed, please let me know.—cyberpower ChatOnline 14:50, 12 September 2013 (UTC)[reply]

Approved for extended trial (100 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Quick trial to see any more issues. Article namespace. —  HELLKNOWZ  ▎TALK 19:55, 17 September 2013 (UTC)[reply]

Second Post-Trial[edit]

The contributions from the trial can be found here.

Number of edits performed: 141

Bugs found: 0

Issues found: 1

  1. When the bot went through to de-tag invalid tags, several pages were not returned by the API because the tag had been renamed. This left invalid tags in place.
    Solution: I have fixed this issue by writing a quick script to remove all tags under the old name. This should not be an issue for future runs, as they will all be transcluded under the new name.

Per this and this, I can confirm the cleanup script did its job and that only valid tags are present, in the article namespace only. The bot successfully detected and removed several "petition" false positives now that the regex scanner has been updated. I also recommend that the ((Spam-links)) tag be deleted at this point.—cyberpower ChatOnline 01:57, 22 September 2013 (UTC)[reply]
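
(A cleanup of this kind can be verified by walking the old template's remaining transclusions via the API; the sketch below is illustrative only and is not the script that was actually run.)

    <?php
    // Illustrative check: list any article-space pages that still transclude
    // the old Template:Spam-links, i.e. anything the cleanup script missed.
    $api = 'https://en.wikipedia.org/w/api.php';
    $continue = '';
    do {
        $params = array(
            'action'      => 'query',
            'list'        => 'embeddedin',
            'eititle'     => 'Template:Spam-links',
            'einamespace' => 0,     // article namespace only
            'eilimit'     => 'max',
            'format'      => 'json',
        );
        if( $continue !== '' ) $params['eicontinue'] = $continue;
        $json = file_get_contents( $api . '?' . http_build_query($params) );
        $data = json_decode( $json, true );
        if( isset($data['query']['embeddedin']) ) {
            foreach( $data['query']['embeddedin'] as $page ) {
                echo $page['title'] . "\n"; // leftover tag to clean up
            }
        }
        $continue = isset($data['continue']['eicontinue'])
            ? $data['continue']['eicontinue'] : '';
    } while( $continue !== '' );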

Thinking about this and looking at the edits: what happens when the offending link is removed from the page? Would the bot come back and remove the tag if it wasn't removed, like here? Or if the link is no longer blacklisted? Otherwise, the edits look good. —  HELLKNOWZ  ▎TALK 14:06, 22 September 2013 (UTC)[reply]

The bot will remove the tag if the link in question is whitelisted, removed from the blacklist, added to the exceptions list, or removed from the page.—cyberpower ChatOnline 17:33, 22 September 2013 (UTC)[reply]
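
(Spelling that removal logic out in code form, as a hedged sketch; the function and parameter names are hypothetical and this is not the bot's published code.)

    <?php
    // Hypothetical sketch of the de-tagging decision described above.
    function tagShouldBeRemoved( $link, $pageText, array $blacklistRegexes,
                                 array $whitelistRegexes, array $exceptions ) {
        // 1. The link has been removed from the page.
        if( strpos($pageText, $link) === false ) return true;

        // 2. The link is whitelisted on MediaWiki:Spam-whitelist.
        foreach( $whitelistRegexes as $regex ) {
            if( preg_match($regex, $link) ) return true;
        }

        // 3. The link is on the bot's exceptions list (assumed substring match).
        foreach( $exceptions as $exception ) {
            if( stripos($link, $exception) !== false ) return true;
        }

        // 4. The link has been removed from the blacklist.
        foreach( $blacklistRegexes as $regex ) {
            if( preg_match($regex, $link) ) return false; // still blacklisted, keep tag
        }
        return true;
    }
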
Right, I should've looked more carefully. —  HELLKNOWZ  ▎TALK 17:38, 22 September 2013 (UTC)[reply]

 Approved. The trials look good, no problems that I see, and all issues seem to be reasonably resolved. The BRFA has been open for quite a while, with lots of edits and no further comments received. It would be good if you made a page explaining what to do after a page is tagged and linked to it from the edit summary. —  HELLKNOWZ  ▎TALK 17:38, 22 September 2013 (UTC)[reply]

The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.