The following discussion is an archived record of a request for comment. Please do not modify it. No further edits should be made to this discussion. A summary of the debate may be found at the bottom of the discussion.

This is a request for comment (RFC) regarding turning on the Memento extension.

This is a preliminary RFC to assess community interest among English Wikipedia users for this functionality. No significant commitment of Wikimedia Foundation engineering resources has been made yet. An early pilot would likely run on the English Wikipedia; hence the initial poll is taking place here.

What is Memento?

When searching for information on the Web, you cannot navigate into the past. A link typically takes you to the current version of a resource. Memento – a project funded by the Library of Congress and run by Los Alamos National Laboratory in collaboration with Old Dominion University – aims to "make the history of the internet accessible" and bridge the gap between a current resource and its prior versions. At the moment, the closest analogy is the Internet Archive's "Wayback Machine", which allows you to view versions of websites as they were at certain points in time. While it is useful for broad coverage of the history of the web, the Wayback Machine has certain disadvantages:

  1. it's limited to those sites the Internet Archive is able to access;
  2. it's limited to as many versions as the IA's servers can cache; specifically for Wikipedia, the Internet Archive will only have very spotty coverage of article versions, compared to the full article revision history accessible in Wikipedia.

Memento solves these problems by developing a standard way for individual websites to expose their own revision histories and for clients to negotiate these histories. MediaWiki, of course, already provides access to old article revisions via page histories and via its API, but it does so without using a standardized protocol that other sites also use. Supporting Memento will allow readers to negotiate Wikipedia's contents (and those of any other Memento-compliant website) to return the revision of a given article matching a specified time or time range (for example: what did the 2011–2012 Egyptian revolution article look like 24 hours after the protest started? Or how did the Michael Jackson article change before and after his death?). It will also allow bots, web services and applications to perform time series analysis by extracting information from the article (text mining, data extraction, etc.) and to retrieve and integrate time-dependent information from different data sources. As such, it will contribute to building a key piece of infrastructure for the W3C linked data initiative.

How does it work?

Memento adds support for datetime negotiation (a variation on content negotiation), and new Relation Types for the HTTP "Link" header aimed at interlinking resources with their archival/version resources. The Memento team have developed a MediaWiki extension that would allow Wikipedia and other Wikimedia projects to support this protocol. A working browser plugin for Firefox is also available.
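As a rough illustration of the two pieces described above, the Python sketch below builds a datetime-negotiation request header and reads relation types out of a "Link" response header. The header name Accept-Datetime and the relation names (original, timemap, memento) are taken from the Memento protocol as later published in RFC 7089, not from this discussion; the example.org URLs and the function names are hypothetical, and the parser only handles simple, well-formed headers.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when):
    """Build the request header for datetime negotiation.

    `when` must be a timezone-aware datetime; it is sent as an HTTP date in GMT.
    """
    return {"Accept-Datetime": format_datetime(when.astimezone(timezone.utc), usegmt=True)}


# One link looks like: <url>; name="value"; name2=value2
_LINK = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+=(?:"[^"]*"|[^;,\s]+))*)')
_PARAM = re.compile(r'([\w-]+)=(?:"([^"]*)"|([^;,\s]+))')


def links_by_rel(header_value):
    """Group the URLs in a Link header by relation type.

    Returns {rel: [(url, datetime_or_None), ...]}. A rel attribute may list
    several space-separated relation types, so each one gets its own entry.
    """
    found = {}
    for url, raw_params in _LINK.findall(header_value):
        params = {k: (quoted or bare) for k, quoted, bare in _PARAM.findall(raw_params)}
        for rel in params.get("rel", "").split():
            found.setdefault(rel, []).append((url, params.get("datetime")))
    return found


# Hypothetical response header, for illustration only.
EXAMPLE_LINK = (
    '<http://example.org/wiki/Foo>; rel="original", '
    '<http://example.org/timemap/Foo>; rel="timemap", '
    '<http://example.org/w/index.php?title=Foo&oldid=123>; '
    'rel="memento"; datetime="Wed, 26 Jan 2011 11:58:00 GMT"'
)
```

A client would send the dictionary from accept_datetime_header with its request and then follow the "memento" link it finds in the response; how the server chooses that link is up to the site.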

How does it impact the editing community?

Memento makes no visible difference for editors. We're certainly not moving away from page histories or revisions. The difference is going to be that browsers that support Memento will be able to search our content using a format standardised with other parts of the Internet.

Why support it?

Wikipedia and the Wikimedia movement projects are leaders in the field of open data, offering unmatched transparency and open licensed content. By making it easier for Wikipedia to be syndicated by humans and machines alike we help spearhead the linked data vision of an interoperable ecosystem of open licensed, structured information. Memento also helps tackle the problem of linkrot, where sites archive their content or delete it from public view creating dead links and obsolete references. A standardized method to pull up what a page used to look like at a given point in time makes referencing easier and more reliable. By supporting this protocol we pave the road for other projects and organizations to do the same. Some major players already have, including:

FAQ

This feature has been removed from the extension now and hence no deleted revisions will be accessed.

Support turning it on

Note that all of that is possible within the existing Wikipedia system without adding to the concerns noted below. --Nouniquenames (talk) 14:26, 3 September 2012 (UTC)
Not at all; there's currently no interaction between our versioning system and the wider semantic web. Okeyes (WMF) (talk) 15:36, 3 September 2012 (UTC)
Indeed. It is not possible from outside of the system. My point was that this (viewing historic versions...) is currently possible from the Wikipedia site. --Nouniquenames (talk) 23:51, 3 September 2012 (UTC)

Oppose turning it on

I won't say I oppose it, but is it really such a great idea? Do we really want to make past versions of our pages (with all the vandalism, libel and unwanted personal details they may contain) any easier for the world to access than we do already? Victor Yus (talk) 19:32, 29 August 2012 (UTC)

Is there a mechanism that would prevent viewing of pages that have been rev-deleted? "....We are all Kosh...."  <-Babylon-5-> 19:38, 29 August 2012 (UTC)
From what I understood, implementing this proposal will only make the already public information available in a format conforming to the Memento specification. Keφr (talk) 19:41, 29 August 2012 (UTC)
Correct. It's a standards based way to get to pages in history, it doesn't make anything accessible that would not be otherwise. It is a very different question whether or not past pages should be removed completely. azaroth42 (spec editor)
If someone decides to make a copy of a page which is later revdeleted, Wikipedia can't do anything about it. This wouldn't change with the new extension; so yes, there is a chance revdeleted content could be accessible if other websites choose to make a local backup of Wikipedia. This would mean, however, that each of those other websites would be responsible for any copyright violations and claims of libel, which are the primary reasons for revdeletion. —JmaJeremy 22:29, 29 August 2012 (UTC)[reply]
This extension would definitely have to honor your user privileges on Wikipedia: if you can't access a deleted revision, you won't be able to negotiate it via this extension either. JmaJeremy is totally right about third-party reuse of Wikipedia data, check out this paper if you are interested in the survival of revdeleted content. --DarTar (talk) 22:48, 29 August 2012 (UTC)[reply]
I get the argument that anyone could be mirroring later-deleted content now. The concern that I and others share is whether this would lead to the creation of much easier ways for the public to view deleted content. For example, currently we have an informal gentleman's agreement with Google that they will very promptly remove deleted content from their caches. I'd be willing to support turning it on for now, but if it results in much easier access to deleted content, I think we'd want it off again. Gigs (talk) 15:35, 30 August 2012 (UTC)[reply]
(Re Shawn, Nouniquenames) The extension only makes pages that are already available accessible, via a standards based mechanism as well as the existing history pages mechanism. One of the main strengths of Wikipedia is its openness and transparency about the editing process and history. Being able to see old revisions of a page is an important aspect of the credibility of the site. The Memento protocol and extension do not take any standpoint on what should be accessible, and if a revision should not be accessible, then there are existing mechanisms to deal with that. The extension would simply make it easier for editors to find the older pages, in order to make the determination as to whether to restore previous text or not. Azaroth42 (talk) 22:22, 30 August 2012 (UTC)[reply]
Whoa, wait. So deleted revisions will be shown, it's just the content that will remain hidden? - jc37 00:13, 31 August 2012 (UTC)
My (potentially flawed) understanding is that visibility will be the same as for any current user without special rights. Any revdel might show up that a revision was deleted, but it would be impossible to see what that revision was or what it contained. If I'm wrong, someone please correct me. --Nouniquenames (talk) 05:41, 31 August 2012 (UTC)[reply]
I'm not against the ability to view past revisions. I'm against making it easier for any random passer-by to grab an old, potentially inaccurate version of a page without doing so via specific, deliberate, intentional, locally controlled steps. (I hope I stressed that enough.) The intelligence requirement for reading content from this site is not particularly high, and that is a good thing. That said, it takes an extra few clicks to see an old version of a page for a reason. If we wanted everyone to see the old version, we wouldn't have changed it. If someone wants to see the old version, it is possible, but the (minimal) extra time and effort help to weed out those who might inadvertently stumble across an old article (possibly vandalized or incomplete) and think it the current. To enable this, vandalism response would almost be required to include a revdel in essentially every case lest a vandalized page be what people see. Further, if we are simply enabling a standardized API (as I understand it), we lose that control over how easily one might accidentally see an old version of a page as the current (causing or reinforcing the revdel requirement). Absent giving everyone the ability to delete pages (at least from Memento), which would likely give quick rise to new, inventive forms of vandalism we generally don't have to deal with now, I cannot see this as a good thing. --Nouniquenames (talk) 05:25, 31 August 2012 (UTC)[reply]
It is exactly because [e]arly versions of articles may be unreliable, biased, wrong, spammy, flawed in any number of ways that the Memento extension is so important. When taking a historical view on any matter, it is crucial to be aware of the fact that Wikipedia at the time may have had an article about it that was very different from what it is at present. See more of my reasons for supporting this request above. There I also explain why no random passer-by will accidentally grab an old, potentially inaccurate version of a page. --Thüringer ☼ (talk) 08:02, 3 September 2012 (UTC)[reply]
It takes time and resources (of servers and developers) which could be put toward other issues, for one. --Nouniquenames (talk) 14:26, 1 September 2012 (UTC)
No, the extension has already been developed. It's done. We're talking about how to turn it on. Yes, it'll take some server cycles - but this isn't going to be enabled unless Ops confirm that it scales. Ironholds (talk) 14:41, 1 September 2012 (UTC)
What leads you to believe - no need to answer, because the question is rhetorical - that it wins any hearts and minds for the Support side to rebut every single Oppose voter, any more than it's the case anywhere else on Wikipedia? I stated my position. I am not minded to change it just because you think this extension is Wicked Cool. If you want to debate it, take it down to the section clearly marked "Discussion." Ravenswing 17:47, 1 September 2012 (UTC)[reply]
I'm rebutting one oppose vote :). And I've not explained that I think this extension is Wicked Cool; I've explained that your one reason for opposing it is somewhat weak. Ironholds (talk) 23:57, 1 September 2012 (UTC)
And I agree that it should totally be discussed. Can I suggest you look at the discussion section, particularly the bit about server resources? Of particular interest is the line "The performance hit for a TimeGate request is significantly less than generating a history page, as it doesn't need to build the list, just find the version closest in time. As such this would be an advantage, performance wise, if people were to use it". Ironholds (talk) 23:58, 1 September 2012 (UTC)[reply]
It will never be done. To say otherwise is to misunderstand software development. By the same logic, we could have stopped at the first functional version of MediaWiki. There will always be bugs, bugfixes, new features, and testing against bloody everything that is added or tweaked later. Further, not only does it not help us, it duplicates an existing functionality. Also, it apparently has not even begun. Please read the intro: This is a preliminary RFC to assess community interest among English Wikipedia users for this functionality. No significant commitment of Wikimedia Foundation engineering resources has been made yet. --Nouniquenames (talk) 05:30, 2 September 2012 (UTC)[reply]
No, it hasn't been evaluated by Ops yet. The extension has been fully developed by the MementoWeb developers, and evaluated to make sure it's compatible. There seems to be a misunderstanding about how MediaWiki extension development works; the WMF doesn't write all of them (or even most of them); our volunteer developer community is responsible for quite a few. Ironholds (talk) 10:49, 2 September 2012 (UTC)[reply]

Discussion

How well would it work for Category:Virginia cities for 22 August 2006? This category contains a template which had been modified significantly (see here what {{cfd}} looked like at the time), and then later moved. עוד מישהו Od Mishehu 20:05, 29 August 2012 (UTC)

Addressed here: http://www.mediawiki.org/wiki/Extension:Memento#Templates TL;DR: It can work if a small patch is also included into the core parser. azaroth42 (spec editor)
With that patch it would be amazing. It's always been frustrating that we can't easily go back and see what heavily templated pages (such as Main Page) looked like in the past. If it's really that simple, and wouldn't cause any problems performance-wise, I'm all for it! the wub "?!" 23:05, 29 August 2012 (UTC)[reply]
How well would it work for the actual category listing for Category:Virginia cities for 22 August 2006? Would it somehow show the pages that were in the category then, or the pages that are in there now? Anomie 01:55, 30 August 2012 (UTC)[reply]
It would retrieve the old version of the page exactly as in the history for the category. The links would not be rewritten to point directly into other history pages, but clients such as the MementoFox browser add-on, take care of this for you. Thus, if you set your datetime preference to be August 22, 2006 and clicked on a link in a page, it would take you to the version of the new page closest in time to August 22, 2006. If you install the MementoFox browser add-on, you will see how it works via a (slow and computationally expensive) proxy based solution. Azaroth42 (talk) 22:34, 30 August 2012 (UTC)[reply]
If you were replying to my question, I think you misunderstood it. If not, feel free to delete this comment. Anomie 23:55, 30 August 2012 (UTC)

A couple concerns

I support the idea but I do have some concerns about it in addition to the ones above.

  1. Does it include all namespaces (especially non-article ones like File, User, Special, MediaWiki, etc.)? If so, how will it reflect images or articles that have been deleted due to copyright?
    1. It does not include namespaces for which there is no history, such as Special. So it works for User (eg User:Azaroth42) and User_talk but not Special (eg not Special:Preferences).
  2. If an article or its content is deleted due to Copyright via how is that relayed through Memento?
    1. As above, it is not retrievable via Memento if it is not retrievable via the History tab.
  3. Is it for English Wikipedia only? What about the other languages, commons or sister projects like Wiktionary and Wikinews? Each will have its own issues with this.
    1. The extension is a generic MediaWiki extension. Sister projects would have to enable it themselves, based on their own discussions, one imagines. I defer to Wikipedia folk as to different languages, but assume that it would.
  4. Will this cause any performance problems with the servers?
    1. The performance hit for a TimeGate request is significantly less than generating a history page, as it doesn't need to build the list, just find the version closest in time. As such this would be an advantage, performance wise, if people were to use it. The performance of the TimeMap request is almost identical to that of generating a page of History; they both need to know 500 revision ids, however the TimeMap does not include diffs or anything else, just the links. Furthermore, TimeMaps are able to be cached, which would reduce the load on the database.

That's all I can think of at the moment. Kumioko (talk) 00:39, 30 August 2012 (UTC)

Thanks for the comments Kumioko :) Azaroth42 (talk) 22:45, 30 August 2012 (UTC)
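To make the performance point in answer 4 concrete, here is a minimal Python sketch of the TimeGate step as described there: given a page's revision timestamps in order, find the one closest in time to the requested datetime without building a full history listing. The function name and the (timestamp, revision id) tuple layout are assumptions for illustration and are not taken from the extension's code.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def closest_revision(revisions, target):
    """Return the (timestamp, rev_id) pair nearest in time to `target`.

    `revisions` must be sorted by timestamp, oldest first. The lookup is a
    binary search, so only a handful of entries are compared.
    """
    if not revisions:
        raise ValueError("page has no revisions")
    times = [ts for ts, _ in revisions]
    i = bisect_left(times, target)
    if i == 0:
        return revisions[0]       # target is before the first revision
    if i == len(revisions):
        return revisions[-1]      # target is after the latest revision
    before, after = revisions[i - 1], revisions[i]
    # Ties go to the earlier revision.
    return before if target - before[0] <= after[0] - target else after


# Hypothetical revision list, for illustration only.
UTC = timezone.utc
EXAMPLE_REVISIONS = [
    (datetime(2006, 8, 1, tzinfo=UTC), 101),
    (datetime(2006, 8, 20, tzinfo=UTC), 102),
    (datetime(2006, 9, 15, tzinfo=UTC), 103),
]
```

With the example list, a request for 22 August 2006 resolves to revision 102. A real TimeGate might instead prefer the most recent revision at or before the requested time; that is a policy choice this sketch does not settle.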

Thanks for the quick reply Azaroth. I still support the idea but I have some trouble with the User namespace being a part of it. Sometimes people put personal info on the User page, some with the understanding that it applies primarily to WP because most mirror sites don't pull that data in, so I imagine some users are going to have some heartburn about that. Kumioko (talk) 00:14, 1 September 2012 (UTC)

Past revisions = pre-deleted revisions?

So will this include the revisions of a page for the month of its existence prior to speedy deletion as a BLP attack page? Being the encyclopedia that anyone can edit, means there are a lot of edits which occur which may be considered problematic to say the least.

And this doesn't even get into edit warring or patent nonsense or privacy.

And what about robots.txt? Will all those pages' revisions be included? Will all talk pages? - jc37 00:41, 30 August 2012 (UTC)

See above. "It's a standards based way to get to pages in history, it doesn't make anything accessible that would not be otherwise." Not sure what you mean about robots.txt. the wub "?!" 09:42, 30 August 2012 (UTC)
I believe he means that tools such as the Internet Wayback Machine typically respect robots.txt (the English Wikipedia's robots.txt file is here). From our article, "Robots.txt is used as part of the Robots Exclusion Standard, a voluntary protocol the Internet Archive respects that disallows bots from indexing certain pages delineated by the creator as off-limits." --MZMcBride (talk) 20:05, 30 August 2012 (UTC)[reply]
Thanks MZM. And nod, though also whether such revisions will now suddenly be open to be mirrored through bypassing robots.txt. I don't know enough about Memento to know how this will affect things.
I've read over the extension several times. And get the idea that deleted revisions, while standardised through Memento, will not be viewable except by those with the ability to view deleted. Same with oversight, etc. (And does that mean we will be even more vulnerable to a compromised admin account/admin tools gained on the sly just to robot-copy everything?)
I don't understand how all this will work, and maybe it's because I don't quite understand the extension.
Right now, at the Internet Archive (and other such places) I can look at a previous version of a page, which has since been deleted. Will this extension allow that from Wikipedia directly? And further, will this extension make it easier for other sites to save deleted contributions? In other words, even though a page is deleted, through the Memento standardisation, will the bots now be able to copy any edits (deleted and otherwise) and install them at their own wiki, and now they can undelete at their site, etc.? This has privacy ramifications etc.
(I'm hoping the response is: "Chuckle, and no, you don't understand what this will actually do, let me more clearly explain..." : ) - jc37 23:25, 30 August 2012 (UTC)
1) Access to crawlers is guided by robots.txt file. All the old revisions in wikipedia are in the /w/ path and the robots.txt file for en.wikipedia.org reads:
User-agent: *
Disallow: /w/
Hence, no bots have access to these old revisions and memento does not change anything about this.
2) Deleted revisions will not be accessible using this extension. Please refer to the FAQ for more information. --Hariharshankar (talk) 14:36, 1 September 2012 (UTC)
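For anyone who wants to check how a well-behaved crawler would read the two robots.txt lines quoted in point 1, the following Python sketch feeds them to the standard library's urllib.robotparser. The bot name and the article title in the URLs are made up for illustration, and this only models crawlers that choose to honour robots.txt, which is a voluntary protocol.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above, parsed locally (no network access needed).
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /w/",
])

# Hypothetical URLs: an old revision under /w/ and a current article under /wiki/.
old_revision = "https://en.wikipedia.org/w/index.php?title=Example&oldid=123"
current_page = "https://en.wikipedia.org/wiki/Example"

old_allowed = rules.can_fetch("ExampleBot", old_revision)      # path starts with /w/
current_allowed = rules.can_fetch("ExampleBot", current_page)  # /wiki/ is not under /w/

print("old revision allowed:", old_allowed)
print("current page allowed:", current_allowed)
```

Under these two rules a compliant crawler is refused the old-revision URL and allowed the current article, which matches the claim in point 1.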
1.) Thanks for the clarifications concerning robots. Though I'll note that we've long seen that there are bots which ignore the exclusions in robots.txt.
2.) As I have already said, I've read the extension which notes that that is the intention. But a quote about a certain road being paved with good intentions comes to mind. Hence why I am asking these questions : )
Would it be possible to create a temporary wiki, port a few hundred edits of varying types to it, and show exactly how this would work? This was done prior to the implementation of the filter, and I think it would help deal with concerns about this. - jc37 23:03, 2 September 2012 (UTC)
That would be awesome. We'd have to stick it on test.wikimedia or prototype.wikimedia before deployment anyway - one of those, maybe? Okeyes (WMF) (talk) 01:04, 3 September 2012 (UTC)
Setting up a Memento-powered MediaWiki instance on Labs sounds like a no-brainer. --DarTar (talk) 17:45, 5 September 2012 (UTC)

WMF legal advice

Have you consulted the Wikimedia Foundation for confirmation that their legal staff have no objection to the proposal?

You have made clear that hidden revisions would not be exposed by this new interface, so it does not amount to publishing any more information than is already available via the History tab. Also, users would presumably be well aware that they were not viewing the most recent versions of articles (and transcluded templates), and hence would appreciate that they might be more likely to see inaccurate or potentially libellous information than if they were reading the current website.

On the other hand, the interface is intended to make accessing non-revdeleted revisions easier, and inaccuracies are often simply reverted rather than revdeleted, so remain visible in the history page and under the proposed interface. On balance, I don't think that is objectionable, but it would be good to know whether WMF share this view.

Richardguk (talk) 10:33, 3 September 2012 (UTC)

Well, the RfC was started by a pair of staffers in their professional capacity - but you raise an excellent point. I don't see a problem myself, but I don't work for Legal; I'll check in with them today (if I can find them. It's Labour Day, apparently, which means the dang 'merkins get the day off). Okeyes (WMF) (talk) 10:43, 3 September 2012 (UTC)[reply]
FWIW, we already note that "This is an old revision of this page ... it may differ significantly from the current revision." when you look at a history version. (I would love it if this box was more emphatic!) Readers using a complex opt-in web history tool are probably more likely than casual browsers to be aware of this, but I think it would be quite reasonable to have a (click-to-dismiss?) banner across the top of all pages reminding them of the Wikipedia-specific risks of older content.
(That said, I suspect many people will be reading it this way to look for those inaccuracies and oddities - "what was being reported about X on this day, before we knew about Y"?) Andrew Gray (talk) 11:59, 4 September 2012 (UTC)[reply]
I've spoken to Michelle Paulson over at legal; she has no objection as long as it doesn't make visible anything that isn't already visible (which it shouldn't). Okeyes (WMF) (talk) 12:55, 5 September 2012 (UTC)[reply]

Behaviour of Memento

My vague memory of having Memento described to me in a web-archive context is that it would allow date preferences to carry through to linked pages (where supported) - you'd read the article on United States, as of 1/1/08, which would say that George W. Bush was president, and then click through to that article, where it would retain the date and give you a version as of 1/1/08, etc.

Does the MediaWiki installation work like this? If so, it'd make a more compelling case for its usefulness than the examples above, which are all one-page scenarios. Andrew Gray (talk) 16:09, 4 September 2012 (UTC)

This is exactly how it works. If you have your datetime preference set to 1/1/08 and you click from one article to the next, you'll end up at the version of the clicked-on article from that same date. Azaroth42 (talk) 16:25, 4 September 2012 (UTC)
Thanks. I suspected this was the case, but I was having trouble persuading Firefox to play nicely with a test MediaWiki installation to confirm it! Andrew Gray (talk) 16:29, 4 September 2012 (UTC)

On the fence

I will either strongly oppose or strongly support this, but I haven't decided which yet :-) The implications are complex and I would urge people not to make reflex judgements on something like this.

Those are my principles; if you don't like them, I have others. bobrayner (talk) 11:07, 18 September 2012 (UTC)

Mind if I join you on the fence? : )
I agree with the above, and the last point is a big deal breaker for me. But as I don't yet know what's going on with deleted revisions, I'd like to see the test version first. - jc37 01:12, 26 September 2012 (UTC)
My understanding is the extension would not make any data public that is not already. However, it may make it easier to stumble across reverted revisions without trawling through long page histories looking for them. Dcoetzee 07:20, 27 September 2012 (UTC)[reply]
So, in response to your not caring about whether it makes life easier or harder for third parties: the point of Memento is that it makes it easier for readers to access Wikipedia content as it was in the past. As for old edits that should have been revdelled: they are still accessible now. This isn't a good criticism of Memento: it's like opposing a wheelchair ramp being added to the local school because it'll allow paedophiles in wheelchairs to break in. Making access to old edits is a good thing: they have substantial educational value for people wanting to understand the history and culture of Wikipedia and the history of the subjects we cover. See the heavy metal umlaut video and the history of the Iraq War through Wikipedia edits. —Tom Morris (talk) 18:19, 2 October 2012 (UTC)[reply]

Summary

There is consensus for a pilot test run of Memento on the English Wikipedia, provided that it does not give Memento users access to content that is not already available through the History tab. MBisanz talk 22:58, 4 October 2012 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. No further edits should be made to this discussion.