Please exclude, from Wikipedia:Database reports/Stubs included directly in stub categories, any article transcluding Project:Contributor copyright investigations/Darius Dhlomo/Notice. These pages aren't the problem of WikiProject Stub Sorting; and when their problem is solved, these pages will either have been deleted or reverted to a version which probably doesn't have direct use of stub categories. עוד מישהו Od Mishehu 08:10, 21 October 2010 (UTC)
User:WhatamIdoing has expressed a desire for data regarding popular WikiProjects, and those data interest me also. User talk:Mr.Z-man/Archive 12#Popular WikiProjects? has a record of the request. I would like to see a monthly report (of page views or edits or watchers; preferably one report for each of those three variables). I have added this talk page to my watchlist, and I will watch for a reply (or replies) here.
—Wavelength (talk) 19:25, 27 October 2010 (UTC)
Earlier this year the community introduced a new form of deletion for new unreferenced BLPs; there's a concern that we aren't tagging as many as we could. Please could we have the following regular report:
Many thanks ϢereSpielChequers 17:41, 28 October 2010 (UTC)
In light of this discussion, would it be possible to generate a report listing red-linked categories (i.e., deleted or never created) which have significant incoming links (i.e., from any namespace except User: and Wikipedia:)? Thank you, -- Black Falcon (talk) 17:31, 14 November 2010 (UTC)
So you want red-linked categories (with or without members) with incoming links from namespaces other than Wikipedia (4) or User (2)? What should the report title be? How often should it update? The more detail you can provide, the less thinking I have to do (and thus it's much easier for me to knock this out in a few minutes). :-) --MZMcBride (talk) 04:58, 23 February 2012 (UTC)
(unindent) Perhaps having a "Members" column would be good? That way you could easily sort the report by ones that are empty (0 members) and ones that aren't. Thoughts? --MZMcBride (talk) 22:00, 17 March 2012 (UTC)
P.S. Just for my reference, this report takes 45 minutes to run currently on willow.
Hi, is it possible to get a list of (Wikipedia) articles which have the largest 'Discussion' content? On these 'Talk:' pages you can also see the number of archives; for example, the article about Barack Obama has 71 archives and will probably end up high in the list. Thanks in advance. Kyra —Preceding unsigned comment added by 82.171.75.225 (talk) 10:55, 17 November 2010 (UTC)
I've taken this to MfD, as it's quickly been misinterpreted. There's nothing problematic about having long talk histories so long as the archives are orderly, so it's not clear what purpose this particular report fulfils. Chris Cunningham (user:thumperward: not at work) - talk 13:01, 21 November 2010 (UTC)
Originally I had posted here on MZMcBride's talkpage but he has brought my request to this discussion board instead. Could someone construct two database reports for redirects in the File namespace with one or fewer incoming links, one for those that match Titleblacklist and another for those that don't? :| TelCoNaSpVe :| 05:23, 24 November 2010 (UTC)
On this version of the report, there is an entry for Head of the River (Victoria), with the entry noting 8 files in use. Yesterday, I removed [1] more than 400 uses of non-free files from this page. The report is showing the number of files, not the number of uses. This particular article has gone largely unnoticed for three years because "8" uses is not a high number. Yet this article is certainly one of the highest, if not the highest, abusers of non-free content we've ever had on this project.
I'd like to see this report modified, or a new report created, to show how many non-free file uses there are, rather than the number of non-free files used. --Hammersoft (talk) 14:18, 20 October 2010 (UTC)
The Talk pages by size MfD raised a lot of good issues and ideas. A main problem is that the database reports have very little information about them on the page, which has led to confusion (as brought out in the MfD). Most of the database reports pages are not categorized. In response to the MfD, I modified the Wikipedia:Database reports/Talk pages by size page text using posts from the MfD. I also created Category:Wikipedia database reports as a place to categorize all the pages listed at Wikipedia:Database reports and other database reports pages wherever they may be. Initially, I thought of using ((infopage)) on the top of the database pages, but that template didn't fit because the database report pages do not describe communal consensus on some aspect of Wikipedia norms and practices. I added a generic one to Talk pages by size instead. In view of the talk page tagging with the admonishment "Perhaps this will motivate greater efficiency in the use of kilobytes," you may want to create a top template to distinguish a utility report from a statistical report. I added text to the top of Wikipedia:Database reports to address this. You may further want to expand the text of Wikipedia:Database reports to provide some Database reports consensus control over the use of utility reports and statistical reports. -- Uzma Gamal (talk) 16:54, 25 November 2010 (UTC)
Wikipedia:Database reports/Long pages should be renamed Wikipedia:Database reports/Talk pages by length consistent with Wikipedia:Database reports/Talk pages by size. "Long" is a subjective conclusion whereas Talk pages by length offers no opinion to characterize the page. -- Uzma Gamal (talk) 16:54, 25 November 2010 (UTC)
Would it be possible to generate a report of "/doc" subpages in the Template namespace for which the main template page does not exist or is a redirect? Thanks, -- Black Falcon (talk) 05:13, 29 November 2010 (UTC)
/* Orphaned doc pages */
SELECT CONCAT("[[Template:", doc.page_title, "]]"),
IF(rd_title IS NOT NULL, IF(trg.page_id, "Occupied", ""), "Delete") AS "Move",
(SELECT COUNT(*) FROM templatelinks WHERE tl_namespace=doc.page_namespace AND tl_title=doc.page_title) AS "Transclusions",
rd_title
FROM page AS doc
LEFT JOIN page AS tpl ON tpl.page_namespace=doc.page_namespace
AND tpl.page_title=TRIM(TRAILING "/doc" FROM doc.page_title)
LEFT JOIN redirect ON rd_from=tpl.page_id
LEFT JOIN page AS trg ON trg.page_namespace=rd_namespace
AND trg.page_title=CONCAT(rd_title, "/doc")
WHERE doc.page_namespace=10
AND doc.page_title LIKE "%/doc"
AND doc.page_is_redirect=0
AND (tpl.page_is_redirect=1 OR tpl.page_id IS NULL)
And we have 209 results, many of which need to be merged/redirected to the new /doc page. — Dispenser 18:31, 5 December 2010 (UTC)
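For anyone porting the query's title handling out of SQL: MySQL's TRIM(TRAILING '/doc' FROM doc.page_title) strips the literal '/doc' suffix, which a script can mirror like this (illustrative sketch, not part of the report code):

```python
def base_title(doc_title, suffix="/doc"):
    # Mirror TRIM(TRAILING '/doc' FROM ...): drop the literal suffix if present.
    if doc_title.endswith(suffix):
        return doc_title[:-len(suffix)]
    return doc_title

print(base_title("Infobox_person/doc"))  # Infobox_person
print(base_title("Infobox_person"))      # Infobox_person
```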
I was wondering if Wikipedia:Database reports/Unused templates could possibly be changed so that templates which transclude ((substituted)) within them or within their /doc are either not included, or are moved into a separate section entitled "Substituted Templates" or something similar? This would make it much easier to see which templates intentionally have no transclusions (because they are supposed to be substituted) and those which are just unused. Thanks! Mhiji (talk) 18:23, 12 December 2010 (UTC)
((transclusionless)); I'll write some code to exclude pages tagged with that template. --MZMcBride (talk) 18:52, 12 December 2010 (UTC)
((transclusionless)) redirects to ((substituted)) anyway? And ((substituted)) is used much more than its redirect. Mhiji (talk) 23:38, 12 December 2010 (UTC)
((transclusionless)) or ((substituted)). Either will work. --MZMcBride (talk) 02:15, 13 December 2010 (UTC)

I started an article recently and then found there were already 20+ other versions on other-language Wikipedias. This got me wondering what other articles exist in multiple other languages but don't appear in English. Has anyone tried to make a list of topics which have not been translated into English, sorted by the number of other-language Wikipedias each appears in? Or is anyone willing to try making such a list? Thanks in advance. —Pengo 01:41, 25 December 2010 (UTC) (Crossposted from Wikipedia:Village_pump_(idea_lab)#Every_language_but_English.3F). If you need more specifics I'd be happy to flesh out the idea.
Tim1357's broken query
SELECT ff.page_title, count(ff.page_title)
FROM (SELECT z.page_title
FROM dewiki_p.page AS z
WHERE z.page_namespace = 0
UNION ALL
SELECT y.page_title
FROM frwiki_p.page AS y
WHERE y.page_namespace = 0
UNION ALL
SELECT x.page_title
FROM plwiki_p.page AS x
WHERE x.page_namespace = 0
UNION ALL
SELECT w.page_title
FROM itwiki_p.page AS w
WHERE w.page_namespace = 0
UNION ALL
SELECT v.page_title
FROM jawiki_p.page AS v
WHERE v.page_namespace = 0
UNION ALL
SELECT u.page_title
FROM eswiki_p.page AS u
WHERE u.page_namespace = 0
UNION ALL
SELECT t.page_title
FROM ptwiki_p.page AS t
WHERE t.page_namespace = 0
UNION ALL
SELECT s.page_title
FROM nlwiki_p.page AS s
WHERE s.page_namespace = 0
UNION ALL
SELECT r.page_title
FROM ruwiki_p.page AS r
WHERE r.page_namespace = 0
UNION ALL
SELECT q.page_title
FROM svwiki_p.page AS q
WHERE q.page_namespace = 0
UNION ALL
SELECT p.page_title
FROM zhwiki_p.page AS p
WHERE p.page_namespace = 0
UNION ALL
SELECT o.page_title
FROM cawiki_p.page AS o
WHERE o.page_namespace = 0
UNION ALL
SELECT n.page_title
FROM nowiki_p.page AS n
WHERE n.page_namespace = 0
UNION ALL
SELECT m.page_title
FROM fiwiki_p.page AS m
WHERE m.page_namespace = 0
UNION ALL
SELECT l.page_title
FROM ukwiki_p.page AS l
WHERE l.page_namespace = 0) AS ff
WHERE NOT EXISTS (SELECT 1
FROM enwiki_p.page AS aa
WHERE aa.page_namespace = 0
AND aa.page_title = ff.page_title)
GROUP BY ff.page_title
LIMIT 1;
I have created the report based on data from dumps. See Wikipedia:Dump reports/Missing articles. User<Svick>.Talk(); 14:51, 11 June 2011 (UTC)
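For anyone reimplementing the idea outside SQL, the core is a count over all non-English titles followed by an anti-join against enwiki; a toy Python sketch (the title sets are made-up stand-ins for the per-wiki page tables):

```python
from collections import Counter

# Hypothetical per-language title sets; in practice these would come from
# the page tables of dewiki, frwiki, plwiki, etc.
other_wikis = [
    {"Berlin", "Goethe", "Zugspitze"},   # de
    {"Berlin", "Goethe", "Camembert"},   # fr
    {"Goethe", "Zugspitze"},             # pl
]
enwiki = {"Berlin", "Camembert"}

counts = Counter()
for titles in other_wikis:
    counts.update(titles)

# Keep only titles with no English article, most widespread first.
missing = sorted(
    ((t, n) for t, n in counts.items() if t not in enwiki),
    key=lambda tn: (-tn[1], tn[0]),
)
print(missing)  # [('Goethe', 3), ('Zugspitze', 2)]
```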
Would it be possible to have a 'report' based version of http://en.wikipedia.org/wiki/User:MGA73/No_license which updated daily?
Sfan00 IMG (talk) 13:52, 5 January 2011 (UTC)
Is it possible to have a report that searches for media tagged ((mtc)) (and related) but which don't have a recognisable license, or one that's incompatible with Commons (such as fair-use)?
The report could run on a daily or weekly basis depending on the volume of media concerned.
Sfan00 IMG (talk) 13:56, 5 January 2011 (UTC)
Per this conversation at the Village pump, run a query very similar to tools:DBQ-87 to find editors eligible for the Autopatrol privilege.
Is it possible to run a query like this? Thanks very much. - Hydroxonium (talk) 00:37, 14 January 2011 (UTC)
Query
SELECT rev_user_text
FROM revision
JOIN page
ON page_id = rev_page
AND page_namespace = 0
AND page_is_redirect = 0
LEFT JOIN user_groups
ON ug_group IN ( 'sysop', 'autoreviewer' )
AND rev_user = ug_user
WHERE rev_user!=0
AND Isnull(ug_user)
AND rev_timestamp > Date_format(DATE_SUB(NOW(), INTERVAL 6 MONTH),
'%Y%m%d%H%i%s')
AND rev_parent_id = 0
AND rev_deleted = 0
GROUP BY rev_user
HAVING COUNT(*) > 50
LIMIT 1;
Thanks very much, Tim. I'm not familiar with SQL, so I will let others chime in. Thanks again. - Hydroxonium (talk) 06:13, 18 January 2011 (UTC)
Could we add 'bot' to the ug_group list so it looks like this?

Query
SELECT rev_user_text
FROM revision
JOIN page
ON page_id = rev_page
AND page_namespace = 0
AND page_is_redirect = 0
LEFT JOIN user_groups
ON ug_group IN ( 'sysop', 'autoreviewer', 'bot' )
AND rev_user = ug_user
WHERE rev_user!=0
AND Isnull(ug_user)
AND rev_timestamp > Date_format(DATE_SUB(NOW(), INTERVAL 6 MONTH),
'%Y%m%d%H%i%s')
AND rev_parent_id = 0
AND rev_deleted = 0
GROUP BY rev_user
HAVING COUNT(*) > 50
LIMIT 1;
I have been running into bots on the list from the jira:DBQ-87 query. I don't think bots are allowed to create articles except in special cases, but I've been seeing them. Thanks again. - Hydroxonium (H3O+) 23:50, 18 January 2011 (UTC)
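Putting the criteria these queries encode in one place (registered account, no sysop/autoreviewer/bot flag, more than 50 article creations within the window), a hedged Python sketch of the eligibility filter; the record fields here are illustrative, not the real database schema:

```python
from datetime import datetime, timedelta

def eligible(user, now=None, window_days=182, min_creations=50):
    """Rough autopatrol-eligibility check mirroring the queries' intent:
    registered, not already sysop/autoreviewer, not a bot, and more than
    `min_creations` article creations within the window."""
    now = now or datetime.utcnow()
    if user["id"] == 0:                                   # anonymous edits excluded
        return False
    if {"sysop", "autoreviewer", "bot"} & set(user["groups"]):
        return False                                      # already flagged (or a bot)
    cutoff = now - timedelta(days=window_days)
    recent = [t for t in user["article_creations"] if t > cutoff]
    return len(recent) > min_creations
```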
Another version...
SELECT
p2.page_creator,
usrtmp.rev_timestamp AS first_edit,
r2.rev_timestamp AS last_edit,
COUNT(*)
FROM u_mzmcbride_enwiki_page_creators_p.page AS p1
JOIN enwiki_p.page AS p2
ON p1.page_id = p2.page_id
JOIN (SELECT
user_id,
user_name,
user_editcount,
rev_timestamp,
GROUP_CONCAT(ug_group) AS groups
FROM user
LEFT JOIN user_groups
ON ug_user = user_id
JOIN revision
ON rev_user = user_id
AND user_id NOT IN (SELECT
ug_user
FROM user_groups
WHERE ug_group IN ('sysop', 'autoreviewer'))
AND rev_timestamp = (SELECT
MIN(rev_timestamp)
FROM revision
WHERE rev_user = user_id)
AND rev_timestamp < DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 6 MONTH),'%Y%m%d%H%i%s')
GROUP BY user_id) AS usrtmp
ON usrtmp.user_name = p1.page_creator
JOIN revision AS r2
ON p2.page_id = r2.rev_page
WHERE p2.page_namespace = 0
AND p2.page_is_redirect = 0
AND r2.rev_timestamp = (SELECT
MAX(rev_timestamp)
FROM revision
WHERE rev_user = usrtmp.user_id)
AND r2.rev_timestamp > DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 1 MONTH),'%Y%m%d%H%i%s')
GROUP BY p1.page_creator
HAVING COUNT(*) > 49
ORDER BY COUNT(*) DESC;
I maintain a database that contains page_id, page_creator pairs. So rather than querying each page, you can just look up the stored info. It's at u_mzmcbride_enwiki_page_creators_p. This query uses that database, but it got interrupted today. This really is too obnoxious to put into one query. It needs to be split out into multiple parts using some sort of sane logic. I may do this tomorrow if I get a chance. --MZMcBride (talk) 07:22, 19 January 2011 (UTC)
CREATE INDEX user_creations ON u_mzmcbride_enwiki_page_creators_p.page (page_creator);
or something would do the trick. Tim1357 talk 04:01, 20 January 2011 (UTC)
CREATE DATABASE u_mzmcbride_enwiki_page_creators_p;
CREATE TABLE page (
page_id INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
page_creator varchar(255) binary NOT NULL default ''
);
CREATE INDEX page_creator ON page (page_creator);
Is user_touched a field in the database, and could that be used to check for recent activity? Thanks again for the help. - Hydroxonium (H3O+) 08:58, 19 January 2011 (UTC)

You could add AND MAX(rev_timestamp)>DATE_FORMAT(DATE_SUB(NOW(),INTERVAL 90 DAY),'%Y%m%d%H%i%s') to the HAVING statement. Tim1357 talk 04:17, 20 January 2011 (UTC)
Hi guys. Thanks very much for working on this report. Could this be run monthly? It would help the guys assigning permissions if the workload came in smaller batches more frequently rather than one big batch. Thanks again. - Hydroxonium (H3O+) 18:11, 20 January 2011 (UTC)
Thanks very much for working on this report, everybody. Svick has created this report at Wikipedia:Database reports/Editors eligible for Autopatrol privilege. Thanks Svick. - Hydroxonium (H3O+) 03:41, 24 January 2011 (UTC)
A few comments about this new report:
~~~~
(it looks like this was already fixed.) Other than that, looks good. It'd be nice to get a multi-maintainer project for these reports going; I'll look into that at some point. --MZMcBride (talk) 05:33, 24 January 2011 (UTC)
And you need to add ns_id = 6 to the SQL, so that it actually uses the File namespace. Svick (talk) 19:26, 26 January 2011 (UTC)
left joins). This way, each duplicate pair will show up only once. Svick (talk) 13:52, 28 January 2011 (UTC)

There are many users that have been indefinitely blocked for spamming, for using Wikipedia for promotional purposes, or for having a promotional username. Quite often these users have subpages in their userspace that contain promotional material that hasn't been deleted. This material can show up in Google searches - which is probably why spammers put it there. Would it be technically possible to have a database report that lists these pages? If so, would others find this useful enough to be worth the effort? Peacock (talk) 17:46, 23 June 2010 (UTC)
The Non-Functioning Query
SELECT Concat('User:', page_title)
FROM page
JOIN logging
ON log_title = Substring_index(page_title, '/', 1)
AND log_namespace = 2
WHERE page_namespace = 2
AND page_title LIKE '%/%'
AND log_action = 'block'
AND log_comment LIKE '%spam%'
LIMIT 1;
I assumed PCock wanted current blocks, not all block actions, so I checked against the ipblocks table. It's an insanely slow query, though. Might be better to do it in sets in a programming language. I imagine you'd want to check the ipb_reason field for a few strings like "usernamehardblocked", "spam", etc. --MZMcBride (talk) 02:10, 26 June 2010 (UTC)
Working, but very slow, query
SELECT
page_namespace,
page_title,
ipb_reason
FROM page
JOIN ipblocks
ON ipb_address = TRIM(SUBSTRING_INDEX(REPLACE(page_title, '_', ' '), '/', 1))
AND page_namespace IN (2,3)
WHERE ipb_expiry = 'infinity'
AND page_title LIKE '%/%'
LIMIT 1;
Hmm. I think this is one query that is not going to be done gracefully. I ran a new query that is set to save here when it finishes. (Note each page title lacks the 'User:' prefix). I'll check back tomorrow to see if the query had any success, otherwise I am out of ideas. Tim1357 talk 02:36, 26 June 2010 (UTC)
New Query
CREATE TEMPORARY TABLE u_tim1357.blocked_users
(
USER VARBINARY(255)
);
INSERT INTO u_tim1357.blocked_users
SELECT DISTINCT log_title
FROM logging
WHERE log_action = 'block'
AND log_comment LIKE '%spam%'
ORDER BY log_timestamp DESC;
SELECT page_title
FROM page
JOIN u_tim1357.blocked_users
ON page_title = Substring_index(USER, '/', 1)
WHERE page_title LIKE '%/%'
AND page_namespace = 2;
You should have used the logging_ts_alternative table instead of logging when you were running those queries on the Toolserver. — Dispenser 14:56, 6 July 2010 (UTC)
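All of the queries in this thread join pages to blocks or log entries via the base username taken from the subpage title; SUBSTRING_INDEX(page_title, '/', 1) corresponds to this Python helper (sketch only; the underscore-to-space step matters when matching ipb_address):

```python
def base_user(page_title, for_ipblocks=False):
    # Equivalent of SUBSTRING_INDEX(page_title, '/', 1): everything before
    # the first slash, or the whole title if there is no slash.
    base = page_title.split("/", 1)[0]
    if for_ipblocks:
        # ipb_address stores names with spaces, while page titles use underscores.
        base = base.replace("_", " ")
    return base

print(base_user("Spam_Corp/Promo"))                     # Spam_Corp
print(base_user("Spam_Corp/Promo", for_ipblocks=True))  # Spam Corp
```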
To find AFD discussions not properly transcluded. Maybe limit to recently created to eliminate noise. –xenotalk 15:38, 1 August 2010 (UTC)
At some point...
--MZMcBride (talk) 22:15, 29 September 2010 (UTC)
--MZMcBride (talk) 19:41, 4 March 2011 (UTC)
A biweekly (or weekly if it seems useful) report I'd like to see is a list of most-linked redirects. For example, the bot would count the number of times John Blain (football player) is linked, showing the top 500 or so redirects. While some would be useful and should be left alone, others could be found that were the result of page moves that should have been fixed a long time ago. Wizardman Operation Big Bear 21:23, 9 October 2010 (UTC)
Take a look at Most-linked redirects (configuration). The initial limit was 30 incoming links. I've now raised it to 200 incoming links and it's running currently. We'll see what it outputs, how useful it is, and what (if any) adjustments are needed once it finishes.
The idea of creating another report for "Templates containing links to redirects" seems to be a good one. --MZMcBride (talk) 23:57, 11 October 2010 (UTC)
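The counting step behind the report is a plain aggregation over link targets restricted to redirect titles; a toy version in Python (hypothetical data, not the real pagelinks rows):

```python
from collections import Counter

redirects = {"John Blain (football player)", "USA"}
# (source, target) pairs standing in for pagelinks rows.
links = [
    ("Article A", "John Blain (football player)"),
    ("Article B", "John Blain (football player)"),
    ("Article C", "USA"),
]

# Count only links whose target is a redirect, then take the top N.
counts = Counter(target for _, target in links if target in redirects)
top = counts.most_common(500)
print(top)  # [('John Blain (football player)', 2), ('USA', 1)]
```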
The page has two edits ([5], [6]) in the history that probably have to be implemented in the bot. At the same time, the 140K threshold could be lowered to 130K. --Kslotte (talk) 11:14, 5 December 2010 (UTC)
I'd like to see a report that has information about:
Can a query be done for this? Thanks, :| TelCoNaSpVe :| 00:16, 26 January 2011 (UTC)
I have been tackling some of the entries in this report but it is a bit difficult to see the wood for the trees sometimes as there are a lot of such categories which only contain pages in user space (e.g. Category:Capitalist Wikipedians and the like). Would it be possible to split the report (there are already 3 subpages but these seem to be just for size reasons) into say
Thanks. Boissière (talk) 22:36, 27 January 2011 (UTC)
Would be nice to be able to keep this at 500-1,000 items. Currently it has fewer than 100 entries that could still be stubs after removing over 500 !stubs. Vegaswikian (talk) 07:30, 3 February 2011 (UTC)
For the Cats listed at Wikipedia:Database reports/Deleted red-linked categories, would it be possible either to change the members column (showing how many pages are in the deleted cat) to use PAGESINCAT, or, if the number logged for the database report is how many member pages there were at the time of the scan, to add a new column with a "current" heading? This would probably increase the page count, as I believe you can only call 500 parser functions at a time (as opposed to the current 800), but I think it would help to process and clean up the entries there. Avicennasis @ 11:08, 30 Shevat 5771 / 4 February 2011 (UTC)
Two things, first of all in Wikipedia:Database reports/Polluted categories/Configuration, AND tl_title = 'Pollutedcat' should be updated to AND tl_title = 'Polluted_category' since the template was renamed.
Thanks. -- WOSlinker (talk) 11:16, 6 February 2011 (UTC)
And secondly, a new report that checks all the categories that transclude ((Template category)) to see if any of them contain article pages would be nice.
I've done some SQL below but am not certain whether it will work or needs some tweaking.
SELECT page_title
FROM page AS pg1
JOIN templatelinks AS tl
ON pg1.page_id = tl.tl_from
WHERE pg1.page_namespace = 14
AND tl.tl_namespace = 10
AND tl.tl_title = 'Template_category'
AND EXISTS (SELECT 1
FROM page AS pg2
JOIN categorylinks AS cl
ON pg2.page_id = cl.cl_from
WHERE pg2.page_namespace = 0
AND pg1.page_title = cl.cl_to)
LIMIT 250;
I've selectively archived this talk page again. If you want a new report or a tweak to an existing report, post (or re-post) here and someone will take care of it, eventually. --MZMcBride (talk) 17:30, 6 February 2011 (UTC)
The new report Template categories containing articles (configuration) is good for some wikignoming but I was wondering if a report on the converse situation, i.e. where an article category contains a template, would also be possible (and useful)? Boissière (talk) 20:59, 17 February 2011 (UTC)
Here's a report I've made: Page moves grouped by user. Updated hourly, it groups all moves by user in the last 24 hours, week, or month. It's been very useful in catching pagemove sockmasters such as User:Tobias Conradi. --JaGatalk 18:58, 21 February 2011 (UTC)
I'm not sure how useful this would be, but a report like "talk pages that are redirects when parent article is not" may be helpful, particularly in finding poor moves or copy-paste articles. Avicennasis @ 02:49, 20 Adar I 5771 / 24 February 2011 (UTC)
MZ, is there a way to make the "Comments" field do some sort of word-wrapping? If not, I forgive you. Killiondude (talk) 21:51, 25 February 2011 (UTC)
How often is the list of Editors eligible for Autopatrol privilege updated?
Would it be possible to filter it to exclude those who haven't created an article in the last month? ϢereSpielChequers 18:11, 2 March 2011 (UTC)
I should mention that we are still working on the original list from January. I am looking at automating more of the task and think breaking it into separate parts may help. I started a thread at Wikipedia:Village pump (proposals)#Automating submissions for Autopatrol right. Everybody is encouraged to comment. Thanks. - Hydroxonium (talk) 13:27, 12 March 2011 (UTC)
A new version of the report, containing only those who created an article in the last 30 days, is finally up. User<Svick>.Talk(); 11:09, 24 April 2011 (UTC)
When someone moves an article to a new name, they may not notice that there was sometimes a /to do subpage off the talk page, which then ends up orphaned since it is not moved with the page and talk page. So I'm just wondering whether a query listing pages that are redirects but also have an existing /to do subpage would make a useful report? -- WOSlinker (talk) 20:54, 2 April 2011 (UTC)
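WOSlinker's check can be prototyped with two title sets (illustrative data, not the real tables): keep each /to do subpage whose parent talk page is a redirect.

```python
talk_redirects = {"Talk:Old_name"}                        # talk pages that are redirects
todo_subpages = {"Talk:Old_name/to_do", "Talk:Other/to_do"}

# A subpage is orphaned when its parent (everything before the last slash)
# is a redirect left behind by a page move.
orphaned = sorted(
    sub for sub in todo_subpages
    if sub.rsplit("/", 1)[0] in talk_redirects
)
print(orphaned)  # ['Talk:Old_name/to_do']
```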
and not exists ( select * from pagelinks where pl_from=sub.page_id and pl_title<>'Database_reports/Talk_subpages_with_redirect_parent' )
select page_id
from page
where page_title = 'Database_reports/Talk_subpages_with_redirect_parent')
Could someone run a query intersecting Commons:Category:Category redirects with the interwiki or iwlinks table? The table should have columns for from (Wikipedia page title), to (Commons category), and fix (Commons category redirect target). The table could be read by a bot to resolve Wikipedia:BOTR#Check_commons_links_and_fix_cases_where_pages.2Fcategories_have_been_moved... Not sure how often it should run. Smallman12q (talk) 11:51, 3 April 2011 (UTC)
Could someone write a report for articles with red links, and templates with red links sorted by number of red links?Smallman12q (talk) 12:18, 7 April 2011 (UTC)
For example. Is this feasible? --Closedmouth (talk) 05:41, 29 April 2011 (UTC)
^\s*#REDIRECT:?.*?\s*\[\[([^|[\]]*?)(?:\|[^[\]]+)?\]\]
is the regex I used for parsing redirects; I suggest you use it. — Dispenser 22:39, 5 May 2011 (UTC)

As this project approaches 100 reports, there are some structural/design/maintainability issues that should be addressed at some point. I put some notes at Wikipedia:Database reports/Redesign. If anyone has any comments/suggestions/etc., throw them on that page or this talk page or wherever. --MZMcBride (talk) 21:42, 30 April 2011 (UTC)
Any chance of an update for Wikipedia:Database reports/Articles containing links to the user space, couple of weeks since the last one--Jac16888 Talk 23:40, 21 May 2011 (UTC)
distinct part in memory. (I'm assuming the pagelinks query is the slow one, not the templatelinks one.) User<Svick>.Talk(); 20:12, 30 May 2011 (UTC)

/* Gadget statistics - run time 1.5 hrs SLOW_OK */
SELECT up_property, COUNT(*)
FROM user_properties_anonym
WHERE up_property LIKE "gadget-%"
AND up_value=1
GROUP BY up_property
With the same record-keeping purpose as Page count by namespace, I propose the creation of an installed-gadgets report (see previous: WT:Gadget#Usage-Stats). Interesting properties the table includes are: gender, imagesize, diffonly, blank edit summary warning, and the recent change/watchlist flags bot and minor. We may also want to add derived columns such as user_active and a binned edit count. — Dispenser 00:02, 26 May 2011 (UTC)
Could we get a list of pages where a ((citation needed)) is tagged itself with ((citation needed)) as in Facebook_features#Networks.2C_Groups.2C_and_Like_Pages. Smallman12q (talk) 23:17, 1 June 2011 (UTC)
((citation needed|reason=foo))((cn|date=2010-04))
((cn|date=2009-02|reason=((PAGENAME)) sucks)) ((citation needed))
Just so I don't forget, what's really needed here is a generic function that can take wikitext input and output pairs of "grabbed templates" and their respective positions within the wikitext. It needs to be able to catch odd cases like ((foo|bar=((baz)))), but once it reasonably can, it should be trivial to take each position, subtract it from the subsequent one, and then output only templates that are really close to each other (matching on template name too, of course). A generic template_grabber function would resolve this report and be very helpful for other projects (including the next step, which is taking the grabbed templates and then parsing their parameters). This isn't very complicated, but it will require a few hours of dedicated development time to do properly. --MZMcBride (talk) 01:33, 2 June 2011 (UTC)
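A minimal sketch of the generic grabber described above: a single pass with a stack of open-brace offsets, yielding (position, template text) pairs so that nested cases like ((foo|bar=((baz)))) come out as one outer template plus its inner one. (Note the talk page renders template braces as parentheses; real wikitext uses curly braces, which is what the sketch matches.)

```python
def grab_templates(wikitext):
    """Return sorted (start_offset, template_text) pairs for every template
    in `wikitext`. A stack of open offsets tracks nesting depth, so
    {{foo|bar={{baz}}}} is reported as one outer template and the nested
    {{baz}} is reported separately with its own offset."""
    results = []
    stack = []  # start offsets of currently open templates
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            stack.append(i)
            i += 2
        elif pair == "}}" and stack:
            start = stack.pop()
            results.append((start, wikitext[start:i + 2]))
            i += 2
        else:
            i += 1
    return sorted(results)

print(grab_templates("a {{foo|bar={{baz}}}} b {{qux}}"))
# [(2, '{{foo|bar={{baz}}}}'), (12, '{{baz}}'), (24, '{{qux}}')]
```

With each grabbed template carrying its offset, the "templates really close to each other" check becomes a simple subtraction over adjacent offsets, as the comment above suggests.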
(((?>[^\{\}]+|\{(?<DEPTH>)|\}(?<-DEPTH>))*(?(DEPTH)(?!))))
to match top-level templates, that can have other templates nested inside them. Although I never understood how it works, and it's probably .Net-specific. User<Svick>.Talk(); 22:07, 2 June 2011 (UTC)
Template Parser
spaces = ' \t\n\r'
chars = ['a', 'b', 'c', 'd', 'e', 'f', 'g', '_', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

class TemplateSyntaxError(Exception):
    '''Raised when the parser cannot find the template's boundaries.'''
    pass

class Template():
    def __init__(self, text=None):
        '''The parameter "text" is a string that starts with "{{". The parser will automatically find the end of the template.'''
        self.start = '{{'
        self.end = '\n}}'
        self.wrapper = lambda g, p=None: '\n | %s = %s' % (p, g) if p != None else '\n | %s' % g
        if text != None:
            self.text = text
        try:
            self.text = self.text.strip()
        except:
            raise TypeError('The Class: "Template" requires text be passed to it.')
        self.r = None
        self.cursor = 0
        cursor = 0
        m = len(self.text) - 1
        while 1:
            if '|' == self.text[cursor]:
                self.standard()
                if self.r != None:
                    self.r = iter(self.r)
                    te = []
                    for p, g in self.r:
                        if p != None: p = p.strip()
                        te.append((p, g.strip()))
                    self.r = te
                    del te
                break
            elif self.text[cursor:cursor + 2] == '}}':
                break
            cursor += 1
            if cursor >= m:
                raise TemplateSyntaxError('Could not find template start')
        self.name = self.text[:cursor + 1].strip('{}|')

    def standard(self):
        max = len(self.text)
        cursor = 0
        r = []
        try:
            while self.text[cursor] != '|': cursor += 1
        except IndexError:
            self.r = None
            return
        cursor += 1
        last = cursor
        while cursor < max:
            n, cursor = self.checkname(cursor=cursor)
            last = cursor
            while True:
                if self.text[cursor:cursor + 2] == '{{':
                    cursor = self.untill(cursor=cursor + 1)
                elif self.text[cursor:cursor + 2] == '[[':
                    cursor = self.untill_img(cursor=cursor + 1)
                elif self.text[cursor:cursor + 2] == '}}':
                    r.append((n, self.text[last:cursor]))
                    self.r = r
                    self.text = self.text[:cursor + 2]
                    return
                elif self.text[cursor] == '|':
                    r.append((n, self.text[last:cursor]))
                    cursor = cursor + 1
                    last = cursor
                    break
                cursor += 1
                if cursor >= max: break
        self.cursor = cursor
        raise TemplateSyntaxError('No ending brackets!')

    def checkname(self, text=None, cursor=0):
        if text == None: text = self.text
        n = cursor
        l = cursor
        while text[cursor] in spaces: cursor += 1
        while text[cursor] in chars: cursor += 1
        n = text[n:cursor]
        while text[cursor] in spaces: cursor += 1
        if text[cursor] == '=':
            cursor += 1
            return (n, cursor)
        return (None, l)

    def untill_table(self, text=None, cursor=0):
        if text == None: text = self.text
        st = 1
        while True:
            if text[cursor:cursor + 2] == '|}':
                st -= 1
            elif text[cursor:cursor + 2] == '{|':
                st += 1
            elif text[cursor:cursor + 2] == '{{':
                cursor = self.untill(cursor=cursor + 1)
            elif text[cursor:cursor + 2] == '[[':
                cursor = self.untill_img(cursor=cursor + 1)
            cursor += 1
            if st == 0:
                return cursor
            if cursor >= len(text):
                return cursor

    def untill_img(self, text=None, cursor=0):
        if text == None: text = self.text
        st = 1
        while True:
            if text[cursor:cursor + 2] == ']]':
                st -= 1
            elif text[cursor:cursor + 2] == '[[':
                st += 1
            elif text[cursor:cursor + 2] == '{{':
                cursor = self.untill(cursor=cursor + 1)
            elif text[cursor:cursor + 2] == '{|':
                cursor = self.untill_table(cursor=cursor + 1)
            cursor += 1
            if st == 0:
                return cursor
            if cursor >= len(text):
                return cursor

    def untill(self, text=None, cursor=0):
        if text == None: text = self.text
        st = 1
        while True:
            if text[cursor:cursor + 2] == '}}':
                st -= 1
            elif text[cursor:cursor + 2] == '{{':
                st += 1
            elif text[cursor:cursor + 2] == '[[':
                cursor = self.untill_img(cursor=cursor + 1)
            elif text[cursor:cursor + 2] == '{|':
                cursor = self.untill_table(cursor=cursor + 1)
            cursor += 1
            if st == 0: return cursor
            if cursor >= len(text): return cursor

    def old(self):
        return self.text[:self.cursor + 2]

    def __dict__(self):
        return dict(self.r)

    def __getitem__(self, key):
        a = dict([(p.lower() if p is not None else None, g) for p, g in self.r])
        assert a.has_key(key.lower()), KeyError('\'%s\'' % key)
        return a[key.lower()]

    def __delitem__(self, key):
        '''Usage: del TemplateObject['key']'''
        for k, v in self.r:
            if k is None: continue
            if k.lower() == key.lower():
                self.r.remove((k, v))
                return
        raise KeyError('\'%s\'' % key)

    def __iter__(self):
        return iter(self.r)

    def __str__(self):
        if self.r == None:
            return self.text
        i = self.start + self.name
        for p, g in self.r:
            i += self.wrapper(g, p)
        if getattr(self, 'image', None) is not None:
            i += self.wrapper(self.image, 'image')
        i += self.end
        return i
Can someone please determine why this is no longer updating each month? Wikipedia:Database reports/Stubs included directly in stub categories. I'm guessing maybe an error in cron syntax, but I don't see the problem. Dawynn (talk) 13:29, 16 June 2011 (UTC)
Your "cron" job on willow
PYTHONPATH=$HOME/scripts python $HOME/scripts/database-reports/directstubs.py > /dev/null
produced the following output:
Traceback (most recent call last):
File "/home/mzmcbride/scripts/database-reports/directstubs.py", line 72, in <module>
''')
File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/cursors.py", line 173, in execute
self.errorhandler(self, exc, value)
File "/opt/ts/python/2.7/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
_mysql_exceptions.OperationalError: (1317, 'Query execution was interrupted')
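Error 1317 means the server killed the query, usually because it ran too long; the durable fix is to make the query cheaper, but a simple retry can paper over transient kills. A generic sketch (the exception class below is a stand-in for MySQLdb's OperationalError, so this runs without a database):

```python
import time

class QueryInterrupted(Exception):
    """Stand-in for OperationalError 1317 ('Query execution was interrupted')."""

def run_with_retries(query_fn, attempts=3, delay=0.0):
    """Call query_fn, re-running it up to `attempts` times if the
    query is interrupted; re-raise after the final failure."""
    for attempt in range(1, attempts + 1):
        try:
            return query_fn()
        except QueryInterrupted:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

This is a sketch only; on the Toolserver, a repeatedly interrupted query almost always needs rewriting rather than retrying.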
Suggestion: redirects to highly visible templates which aren't protected, perhaps sortable by number of transclusions. Is that possible? Par exemple: Template:Infobox Organization --Closedmouth (talk) 16:31, 17 June 2011 (UTC)
Hi Guys, I made a SQLite3 database that indexes all the infoboxes used in the article space. I was finding that I was doing quite a bit of work with the API just to find a parameter in an infobox, so I decided to create a database to save time and bandwidth. I was wondering whether you think the database (about 60/90 MB with/without indexes, uncompressed) would be helpful here. If you think it might come in handy, I'll upload it to the MySQL database. Cheers. Tim1357 talk 18:44, 25 June 2011 (UTC)
sqlite> .schema
CREATE TABLE InfoboxParameters (Infobox INT, Arg Varbinary, Value Varbinary, Article Varbinary);
CREATE TABLE Infoboxes (Name Varbinary, ID int unsigned, PRIMARY KEY(ID));
CREATE INDEX Infobox_Arg_Index on InfoboxParameters (Infobox, Arg);
Could we add a report on the largest WikiProjects by the number of articles tagged with a project's banner? The data for each project is available on the toolserver when "All" is clicked but there is no option to sort the projects by number of articles. Once each month or two would probably be frequent enough for this task. This report would be useful for size comparisons and descriptions in the Signpost's WikiProject Report. -Mabeenot (talk) 23:56, 26 June 2011 (UTC)
SELECT p_project, p_count, p_qcount,
       ((p_qcount + .0) / p_count) * 100 AS percent_qualityAssessed,
       p_icount,
       ((p_icount + .0) / p_count) * 100 AS percent_importanceAssessed
FROM u_titoxd_wp10_p.projects
ORDER BY p_count DESC;
The output (after I formatted it a bit) is here. The report is very lightweight and should be no problem to run monthly. Tim1357 talk 04:33, 13 July 2011 (UTC)
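The same aggregation can be tried locally against a toy copy of the projects table (sample rows invented; sqlite3 stands in for the Toolserver's MySQL):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE projects "
            "(p_project TEXT, p_count INT, p_qcount INT, p_icount INT)")

# Invented sample rows standing in for u_titoxd_wp10_p.projects.
cur.executemany("INSERT INTO projects VALUES (?, ?, ?, ?)",
                [('Military history', 1000, 900, 800),
                 ('Chicago', 400, 100, 50)])

# Same shape as the query above: adding 0.0 forces floating-point
# division so the percentages aren't truncated to integers.
cur.execute("""
    SELECT p_project, p_count, p_qcount,
           ((p_qcount + 0.0) / p_count) * 100 AS percent_qualityAssessed,
           p_icount,
           ((p_icount + 0.0) / p_count) * 100 AS percent_importanceAssessed
    FROM projects
    ORDER BY p_count DESC
""")
rows = cur.fetchall()
```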
u_enwp10
. I don't have read-access to that database right now, but I asked CBM (the maintainer of the project) for access. Tim1357 talk 14:46, 14 July 2011 (UTC)
Please consider adding {{Non-free image data}} and {{tl|Non-free image rationale}} images to the exclusions list for this report, if not already done. Sfan00 IMG (talk) 10:48, 27 June 2011 (UTC)
The Wikipedia:Database reports/Redirects containing red links report is flooded with talk pages that contain templates based on the {{WPBannerMeta}}
meta-template, which automatically creates links to "/Comments" subpages whether the subpage exists or not. In other words, Talk:TOPIC ends up in the report because it "contains" a red link to Talk:TOPIC/Comments, although the red link actually exists only in the template code and wouldn't be visible to readers even if the talk page were not a redirect. Suggested tweak: exclude from the report any link to a page in namespace 1 whose title ends in "/Comments". --R'n'B (call me Russ) 14:06, 29 June 2011 (UTC)
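The suggested tweak is essentially a one-line filter; a sketch, assuming red-link rows arrive as (namespace, title) pairs:

```python
def exclude_comments_subpages(redlinks):
    """Drop red links to namespace-1 (Talk:) '/Comments' subpages,
    which WPBannerMeta generates automatically.

    redlinks: iterable of (namespace, title) pairs; the row format
    here is hypothetical, not the report's real schema.
    """
    return [(ns, title) for ns, title in redlinks
            if not (ns == 1 and title.endswith('/Comments'))]
```

For example, `(1, 'TOPIC/Comments')` would be dropped while `(0, 'TOPIC')` and `(1, 'TOPIC')` survive.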
Just what it sounds like. Maybe monthly? Some new users move their user pages (and associated talk pages) to the mainspace when they are done drafting an article in the userspace. However, that means that automated tools like Twinkle, which leave a note at the user's talk page, actually end up posting to the article talk page via the redirect. These user pages need to be un-redirected, and having a list of them should make that easy. –Drilnoth (T/C) 21:48, 20 July 2011 (UTC)
Is there any reason why some usergroups are listed twice? For example:
No. | User | Last edit | User groups |
---|---|---|---|
863 | LawBot | 20101017062645 | bot,bot |
1491 | Tb | 20110115193417 | rollbacker,reviewer,rollbacker,reviewer,rollbacker,reviewer,rollbacker,reviewer |
Is this a bug, or an undocumented feature? :-) Avicennasis @ 06:31, 20 Tamuz 5771 / 22 July 2011 (UTC)
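If the duplication comes from a join fanning out one row per matching record, the report could collapse the group list before display; an order-preserving dedupe sketch:

```python
def dedupe_groups(groups):
    """Collapse repeated user-group names while keeping first-seen
    order, e.g. ['rollbacker', 'reviewer', 'rollbacker', 'reviewer']
    becomes ['rollbacker', 'reviewer']."""
    seen = set()
    out = []
    for g in groups:
        if g not in seen:
            seen.add(g)
            out.append(g)
    return out
```

(Whether the underlying duplication is a join artifact or genuinely duplicated rows in user_groups is a separate question for the query itself.)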
Maybe this is not the best place to ask, but is it possible to generate a list of the 1000 pages with the most views in the last year? Bulwersator (talk) 09:23, 22 July 2011 (UTC)
Now that Pending Changes has ended, shouldn't we remove Wikipedia:Database reports/Potential reviewer candidates? — Train2104 (talk • contribs • count) 15:41, 23 July 2011 (UTC)
I would like a list of filenames -- both on en.wikipedia and on Commons - that have more than one image attached to them due to accidental overwriting, so as to figure out which ones need splitting into separate images.
This will probably generate a huge number of false positives; these can be reduced by excluding images modified by authorized image-modification bots like RotateBot, and by excluding later versions uploaded by the original uploader. It will still, however, require human attention to tag them for splitting. DS (talk) 01:08, 30 July 2011 (UTC)
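The two filters suggested above can be sketched as follows, assuming each file's upload history is available as (uploader, timestamp) tuples in chronological order (the bot allow-list is hypothetical):

```python
BOT_UPLOADERS = {'RotateBot'}  # hypothetical allow-list of image-modification bots

def needs_split(versions, bots=BOT_UPLOADERS):
    """Return True if a file looks accidentally overwritten.

    versions: (uploader, timestamp) tuples, oldest first. Re-uploads
    by authorized bots and by the original uploader are ignored, per
    the false-positive filters suggested above.
    """
    if not versions:
        return False
    original = versions[0][0]
    others = {u for u, _ in versions[1:] if u not in bots and u != original}
    return bool(others)
```

A file re-uploaded only by RotateBot or by its own uploader would not be flagged; one overwritten by a different user would be.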
I'm not sure if this is possible, but what would be helpful would be a list of semi-protected pages by how long they have been semi-protected. It might be useful to see if those which have been semi-protected the longest could afford to be unprotected. –MuZemike 16:31, 2 August 2011 (UTC)
Would it be possible to add {{Has-NFUR}} to the list of templates that cause the report to skip an image?
Also, {{Standard rationale}} and {{short rationale}} were removed a while back and so should no longer be used to skip images.
Sfan00 IMG (talk) 10:28, 22 August 2011 (UTC)
Such as rationales written directly as text.
Sfan00 IMG (talk) 10:50, 24 August 2011 (UTC)
Further to Wikipedia talk:Bot Approvals Group/Archive 7#Bots & long term inactivity, could we have a report updated once monthly or quarterly that lists accounts with the 'bot' flag that have no edits or logged actions in over 1 year, sorted by activity with longest inactive at the top? (Am trying to think if there is an easy way to identify read-only bots that should not have any edits or logged actions, will let you know if I come up with anything.)
If possible can you identify the owner and list their last edit or logged action in the table? –xenotalk 13:29, 2 September 2011 (UTC)
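A sketch of the requested filter, assuming (name, owner, last_activity) tuples are already in hand (the field names are mine, not real database columns):

```python
from datetime import datetime, timedelta

def inactive_bots(bots, now, cutoff_days=365):
    """Return bot accounts with no edits or logged actions in over
    cutoff_days, sorted longest-inactive first, as requested above.

    bots: iterable of (name, owner, last_activity) tuples, where
    last_activity is a datetime of the latest edit or logged action.
    """
    cutoff = now - timedelta(days=cutoff_days)
    stale = [b for b in bots if b[2] < cutoff]
    return sorted(stale, key=lambda b: b[2])  # oldest activity first
```

Read-only bots with no edits or logged actions at all would need separate handling, since they have no last_activity to compare.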
I notice that Wikipedia:Database reports/Articles containing overlapping coordinates has not been updated in over nine days. Does anyone know why? —Stepheng3 (talk) 18:44, 17 September 2011 (UTC)
The status of this backlog has changed radically in the past few weeks, thanks to my colleague Redrose64. I'd like this report to become more frequent now. Perhaps twice a week? —Stepheng3 (talk) 00:41, 9 November 2011 (UTC)
I've been going through the following list on Toolserver: http://toolserver.org/~magnus/image_pages_without_image.php?language=en&project=wikipedia and very slowly weeding out local media description pages that meet the F2 criteria.
Would it be possible to have a report that compiles a list of media files to which F2 applies?
Having such a report would speed up what I am attempting to do from the Toolserver list, by not including media description pages which do need to be local.
Sfan00 IMG (talk) 21:54, 18 September 2011 (UTC)
To be run on a weekly basis: find media files with very short filenames.
See: User:Tom_Morris/todo#File_name_length_query
Sfan00 IMG (talk) 14:04, 28 September 2011 (UTC)
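A sketch of the filename-length test, with a guessed threshold (the actual cutoff in Tom Morris's query may differ):

```python
import os

def very_short_filenames(filenames, max_stem=3):
    """Flag media files whose name, minus its extension, is at most
    max_stem characters long; the threshold here is an assumption."""
    flagged = []
    for name in filenames:
        stem, _ext = os.path.splitext(name)
        if len(stem) <= max_stem:
            flagged.append(name)
    return flagged
```

For example, `A.jpg` and `Cat.png` would be flagged, while `Long filename.jpg` would not.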
I noticed that some monthly reports show the last updates in April and July. Looks like something is slipping through the cracks. Also, the long stubs report shows as having been updated a few days ago, but all I get on the wiki is the October 21 version. Vegaswikian (talk) 19:20, 31 October 2011 (UTC)
Could a list be made of articles more than 4 months old in which editors who have edited only that article make up more than 1/3 of the article's total editors? Smallman12q (talk) 01:53, 12 December 2011 (UTC)
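The 1/3 criterion can be sketched as follows, assuming a mapping from article to its set of editors is available (the data structure and names here are invented for illustration):

```python
def single_article_editor_fraction(article, editors_by_article):
    """Fraction of an article's editors whose only edits are to that
    article. editors_by_article maps article -> set of editor names;
    we invert it to recover each editor's full article set."""
    articles_by_editor = {}
    for art, editors in editors_by_article.items():
        for e in editors:
            articles_by_editor.setdefault(e, set()).add(art)
    editors = editors_by_article[article]
    only_here = [e for e in editors if articles_by_editor[e] == {article}]
    return len(only_here) / float(len(editors))
```

The report would then list articles (older than 4 months) for which this fraction exceeds 1/3.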
Could a list of files which have 2 or more "Licensing" sections be made? Smallman12q (talk) 20:56, 15 December 2011 (UTC)
Wikipedia:Database_reports/Files_with_conflicting_categorization
I'm finding that a number of these are
Would it be possible to drop images with the relevant templates present from the report, as they have already been identified and put in process?
Sfan00 IMG (talk) 12:27, 22 December 2011 (UTC)
Would it be possible to add a file-date sort option to : Wikipedia:Database reports/Non-free files missing a rationale so that older uploads can be handled in a more sensitive manner? Sfan00 IMG (talk) 00:24, 23 December 2011 (UTC)
I have recently learned about WP:DBR. I saw Wikipedia:List of Wikipedians by article count, which is not currently included at DBR, and wondered if it would be worth producing similar reports for template space, image space, category space and project space. Also, since the report ignores redirects (and there are more redirects than non-redirects in mainspace), maybe a redirect list could be produced too. I think these reports could be produced weekly and listed at DBR. I have tried asking about this at WP:VPT and WP:HELPDESK.--TonyTheTiger (T/C/BIO/WP:CHICAGO/WP:FOUR) 15:09, 23 December 2011 (UTC)