Student edits as "civic engagement"; how Wikipedia readers interact with images: And other new research findings
The Signpost

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

Students' contributions on Wikipedia as civic engagement

Reviewed by Bri

In this paper, "Civic Engagement Meets Service Learning: Improving Wikipedia's Coverage of State Government Officials",[1] the author argues that students' contributions to Wikipedia serve as civic engagement within the educational approach known as service learning. The paper cites other academic work highlighting Wikipedia's value as a teaching platform because of its ease of entry and its ability to "boost students' writing, information literacy, creativity, and critical-thinking skills" while students are motivated to create content that "matters to the world". Background research also showed that basic biographical information about political representatives is often hard to find, becoming "a costly and semiprecious commodity".

For the study, students edited the Wikipedia biography of "a state or local representative who lacked a substantial Wikipedia presence", i.e. creating a new article or improving an existing low-quality one. They then wrote self-reflective essays and completed "Small-N surveys" concerning the subjective outcomes.

The outcomes were generally positive, except that a number of the new articles were deleted for failing Wikipedia's notability standards. The survey found that "students left the course better able to understand government, more attentive to government actions, more likely to discuss government, and more confident that their vote matters".

"A large scale study of reader interactions with images on Wikipedia"

Reviewed by Tilman Bayer

This paper[2] presents a wealth of results from the "first large-scale analysis of how interactions with images happen on Wikipedia".

The authors first note that, excluding images that appear as icons, only a minority of articles are illustrated:

"Out of the 6.2M articles, 2.7M (44%) contained at least one image, for a total of 5M unique images across all English Wikipedia articles. The vast majority of the articles (91%) contain two images or less, while only 1.5% has more than eight images [..]. Around 84% of images is unique to the article where it appears."

Using a machine learning based topic model, they find that "Geographic articles are the most illustrated, containing 1/4 of the images in our dataset. Biographies, making up 30% of the articles on Wikipedia, also contain around 15% of the images. Topics such as entertainment (movies, plays, books), visual arts, transportation, military, biology, and sports follow, covering together another third of the images in English Wikipedia."

Examining the length of image captions, the study finds a "large fraction of the images without a description and the majority of existing captions centered around ten words." Regarding the position of images in the article, "only 36% of the images in our dataset is generally placed in infoboxes, while only 16% can be found in galleries, and that the majority of inline images are generally placed at the top of the article".

The analysis of reader interactions with these images is based on internal web log data from March 2021, recording three types of interaction: image views (opening an image in the Media Viewer), pageviews (of articles with images), and page previews (on the desktop version of the Wikipedia website). These were grouped into reading sessions using the (somewhat imperfect) heuristic that a reader can be uniquely identified by the combination of IP address and user agent. A main finding (highlighted in the abstract) is "that one in 29 pageviews results in a click on at least one image, one order of magnitude higher than interactions with other types of article content", or in more detail:
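The session-grouping heuristic and the click-through metric can be sketched roughly as follows. This is a minimal illustration only: the record fields and values are invented, and the paper's actual log schema and pipeline are certainly more involved.

```python
from collections import defaultdict

# Hypothetical log records; the field names are illustrative,
# not the paper's actual schema.
logs = [
    {"ip": "203.0.113.5", "ua": "Mozilla/5.0", "event": "pageview"},
    {"ip": "203.0.113.5", "ua": "Mozilla/5.0", "event": "image_view"},
    {"ip": "198.51.100.7", "ua": "Mozilla/5.0", "event": "pageview"},
    {"ip": "203.0.113.5", "ua": "Opera/9.80", "event": "pageview"},
]

def group_sessions(records):
    """Group events into reading sessions keyed by (IP, user agent).

    This mirrors the paper's (admittedly imperfect) heuristic: requests
    sharing an IP address and user-agent string are treated as one reader.
    """
    sessions = defaultdict(list)
    for rec in records:
        sessions[(rec["ip"], rec["ua"])].append(rec["event"])
    return sessions

def global_ctr(sessions):
    """Share of page-viewing sessions that include at least one image click.

    A simplified stand-in for the paper's click-through rate.
    """
    with_pageview = [s for s in sessions.values() if "pageview" in s]
    with_click = [s for s in with_pageview if "image_view" in s]
    return len(with_click) / len(with_pageview)

sessions = group_sessions(logs)
print(len(sessions))         # 3 distinct (IP, user agent) "readers"
print(global_ctr(sessions))  # 1 of 3 page-viewing sessions clicked an image
```

As the reviewers note, the heuristic is imperfect: shared IPs (e.g. institutional networks) merge distinct readers, and dynamic IPs split one reader into several sessions.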

We find that the [global click-through rate] across all pages in English Wikipedia with at least one image is 3.5%, meaning that around 3.5 out of 100 times readers visit a page, they also click on an image. This metric is higher for desktop (5.0%) and lower for mobile web users (2.6%), probably due to differences in the way readers navigate Wikipedia on the two devices and the better Media Viewer experience on desktop. Over time, the behavior also changes depending on the device used. For example, on desktop, readers tend to click more often on images during weekdays (Monday to Friday), with an increase of 5.5% over weekends ...

Figure 9 from the paper: Modeling image clickthrough rates by article topics (left) and various variables describing the image (right), via a regression analysis

Images in articles about "topics such as transportation, visual arts, geography, and military" were found to have higher engagement, whereas "clicks on images are less likely in education, sports, and entertainment articles". Furthermore,

we observe that the most important negative predictor is the text offset, i.e. the relative position of the image with respect to the length of the article, meaning that images are more clicked if placed in the upper part of an article. Regarding the visual content, we observe a strong positive effect of outdoor settings, consistently with the positive coefficients of transportation and geography, topics in which a large portion of images display outdoor scenes. Regarding the image position on the page, we find that images in galleries show a high level of engagement, as well as images in the infobox, even though with a moderate effect.
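As a rough illustration of the kind of regression behind these findings, the sketch below fits a logistic regression of simulated click outcomes on three of the predictors the paper names (text offset, outdoor scene, gallery placement). All data and coefficient values here are synthetic assumptions chosen to match the reported signs; they are not the paper's data or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Synthetic image features, named after predictors described in the paper.
text_offset = rng.uniform(0, 1, n)   # relative position of image in article
outdoor = rng.integers(0, 2, n)      # 1 if image shows an outdoor scene
in_gallery = rng.integers(0, 2, n)   # 1 if image sits in a gallery

# Simulate clicks with the signs the paper reports:
# higher text offset hurts, outdoor scenes and gallery placement help.
logit = -2.0 - 3.0 * text_offset + 0.8 * outdoor + 0.6 * in_gallery
p_click = 1 / (1 + np.exp(-logit))
clicked = rng.random(n) < p_click

X = np.column_stack([text_offset, outdoor, in_gallery])
model = LogisticRegression().fit(X, clicked)

# The fitted coefficients should recover the simulated signs:
# negative for text offset, positive for outdoor and gallery.
print(dict(zip(["text_offset", "outdoor", "in_gallery"], model.coef_[0])))
```

In the paper's actual analysis the outcome is the image-specific click-through rate and the feature set is much richer (article topics, visual content, page placement); this sketch only shows the regression setup.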

The researchers also investigated how reader engagement was associated with page popularity and image quality (using an automated rating of image quality, based on a machine learning model trained on a balanced dataset of community rated "quality images" on Commons):

From the paper: "Examples of high and low image-specific CTR images by page popularity (left) and image quality (right). We ranked images by iCTR, popularity and quality, and picked examples from the top-100 (“high”) and bottom-100 (“low”) for each dimension"

The paper proceeds to study more involved questions, finding for example that "the tendency to click on images with faces varies depending on page popularity. On pages with less that 1000 monthly pageviews, the presence of faces induces higher level of interactions, with a difference of 0.1%, whereas, after 1000 pageviews, we observe the opposite behavior, with a difference of 0.06%", and concluding that "Faces engage us, but only if unfamiliar".

Another high-level conclusion is that "Images serve a cognitive purpose" on Wikipedia, based on "a negative relation between article length and iCTR. This suggests that [...] images might be used by readers to complement missing information in the article".

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer

Eye-tracking Wikipedia readers

From the abstract and paper:[3]

"We present an Attention Feedback (AF) approach for Wikipedia readers. The fundamental idea of the proposed approach comprises the implicit capture of gaze-based feedback of Wikipedia readers using a commodity gaze tracker. The developed AF mechanism aims at overcoming the main limitation of the currently used “pageview” and “survey” based feedback approaches, i.e., data inaccuracy."
"For each reading session, along with the gaze density heat map, we also provide a set of sentences where a user-focused while reading along with the time for which each sentence was focused. [...] After processing the sentences, we arrange them in the order they are read along with their gaze quotient. By gaze quotient, we mean the time duration (in seconds) for which a sentence is being gazed at or read. [...] the proposed AF framework also captures some additional information listed below: (1) Wikilink clicks [...] (2) Eye blinks [...] (3) Scroll events [...]"
"Extensive experiments demonstrated [this setup's] efficiency compared to other feedback approaches used by the Wikipedia research community [...] Moreover, incorporating a single-camera image processing-based gaze tracker into a web application framework makes the overall system costefficient and portable. This study’s outcomes are currently being discussed in the Wikimedia Foundation for developing specialized tools to capture readers’ implicit feedback."

(See also meta:Research:Which parts of an article do readers read for an overview of related work)

"The Wikipedia Contribution to Social Resilience During Terrorist Attacks"

From the abstract:[4]

"We have conducted an ethnographic analysis of several [of the French] Wikipedia's terrorist attacks pages as well as interviews with regular Wikipedia's contributors. We document how Wikipedia is used during crisis by readers and contributors. Doing so, we identify a specific pace of contributions which provides reliable information to readers. [...] we highlight how historical sources (i.e. traditional media and authorities) support this pace. Our analyses demonstrate that citizens are engaging very quickly in processes of resilience and should be, therefore, considered as relevant partners by authorities when engaging a response to the crisis."

How Wikipedia is "reducing uncertainty in times of crisis"

From the abstract:[5]

"... we analyse contributions on Wikipedia and Twitter during major crises in France through online ethnographies and semi-structured interviews to investigate their roles in building and sharing information. Wikipedia has often been analysed as a collaborative tool but this approach has underestimated its use in reducing uncertainty in times of crisis. We demonstrate that despite their distinct pace and designs, Twitter and Wikipedia are used with seriousness by citizens in their dissemination of information."


  1. ^ Norell, Elizabeth (2022). "Civic Engagement Meets Service Learning: Improving Wikipedia's Coverage of State Government Officials". PS: Political Science & Politics. 55 (2): 445–449. doi:10.1017/S1049096521001451.
  2. ^ Rama, Daniele; Piccardi, Tiziano; Redi, Miriam; Schifanella, Rossano (2022-01-03). "A large scale study of reader interactions with images on Wikipedia". EPJ Data Science. 11 (1): 1–29. doi:10.1140/epjds/s13688-021-00312-8. ISSN 2193-1127. (see also research project page on Meta-wiki)
  3. ^ Dubey, Neeru; Verma, Amit Arjun; Iyengar, S. R. S.; Setia, Simran (2021-09-15). "Implicit Visual Attention Feedback System for Wikipedia Users". 17th International Symposium on Open Collaboration. New York, NY, USA: Association for Computing Machinery. pp. 1–11. ISBN 9781450385008.
  4. ^ Bubendorff, Sandrine; Rizza, Caroline (May 2020). "The Wikipedia Contribution to Social Resilience During Terrorist Attacks". ISCRAM 2020 Conference Proceedings – 17th International Conference on Information Systems for Crisis Response and Management. Bringing Disaster Resilience into Focus. Blacksburg, Virginia, United States.
  5. ^ Bubendorff, Sandrine; Rizza, Caroline; Prieur, Christophe (2021). "Construction and dissemination of information veracity on French social media during crises: Comparison of Twitter and Wikipedia". Journal of Contingencies and Crisis Management. 29 (2): 204–216. doi:10.1111/1468-5973.12351. ISSN 1468-5973.

  • It is disappointing that the researchers examining "quality" images on Commons chose to use Commons:Quality images as their benchmark of community consensus over what makes a high quality image. A simple query to the Commons community would have informed them that this was a useless measure of the quality of images as they appear as thumbnails in Wikipedia articles. The reasons are:
    • The Quality Image badge is only available for images generated by a user on Wikimedia Commons. The majority of images on Commons, and no doubt the majority used on Wikipedia, are not user generated. They may come from government sources, be old historical images, or otherwise scraped from another site like Flickr. Many images are also merely reproductions of an artwork.
    • The Quality Image criteria are not at all concerned with how great an image is from an artistic point of view. It need not have any "wow" at all. A straightforward view of some suburban railway station, with an overcast sky and a messy collection of commuters, will get QI if accurately focused and exposed.
    • The technical requirements of QI are concerned with pixel-peeping the full-resolution image, not analysing the little thumbnail on Wikipedia. Many images that are somewhat out of focus or very noisy look fine in thumb.
    • There is a minimum resolution requirement for QI which is way above that necessary to produce a nice thumbnail.
    • Even for those images that are user-generated, it isn't like all of them have been judged, as only those who participate at QI tend to nominate their own images.
    • QI only requires the approval of a single judge (though there is scope to contest a vote). It is hard to say that a promotion represents community consensus vs the opinion of one random individual.

As a consequence, QI is more a forum to encourage Commons photographers to take and upload technically fine images taken with high quality equipment. It is in no way an attempt to categorise the body of images as being of high quality. -- Colin°Talk 07:43, 13 May 2022 (UTC)[reply]

@Colin: Interesting points. Part of the justification of this approach in the paper includes the claim that "Only a few images make it to the “image quality” category: there is, therefore, a large consensus on the quality of the images in that category", which does seem to be a bit in tension with the process as you describe it. That said, your first bullet point might pose a bigger threat to the construct validity of the resulting image quality measure as used in the paper; at least I don't see an easy way to rule out the possibility that the underlying classifier overfits on, say, the image being a photo from a contemporary digital camera and other aspects that may be over-represented among images created by Commons users themselves.
CCing two of the paper's authors (those whose wiki accounts I was able to find via the research project page on Meta-wiki) in case they want to comment: @Miriam (WMF) and Daniram3:
Regards, HaeB (talk) 14:28, 29 May 2022 (UTC)[reply]
User:HaeB, I'm having difficulty processing your reply, working out which sentences you agree with me and which you don't or that you only partly agree. Could you rephrase it in more straightforward language and shorter sentences?
I clicked on Random Article a bunch of times and recorded the first 10 photographs that led each article that had one. They are:
Of the three user generated images, one is only 0.7MP so not valid for QI. Another is 2.08MP so barely valid. All three would not pass QI, even though all three serve a useful illustrative purpose as thumbnails in their articles. Mostly, being a useful illustration of the subject, and good-enough at thumb, is all that Wikipedia needs. The other seven images would not be valid at QI no matter how great they were. -- Colin°Talk 18:41, 29 May 2022 (UTC)[reply]