Note: LinkBot has been superseded by the Can We Link It link-suggesting web tool.

"Can We Link It" has the following benefits over LinkBot, or feature parity with it:

However, there are some downsides compared to LinkBot:


What is the Link Suggester, and what is LinkBot?

The Link Suggester is a bit of software that takes the text of a Wikipedia article and looks for links that could be made to other articles but have not been made yet. LinkBot is the bit of software that takes these suggestions and presents them in a way that people can easily read on the Wikipedia. Both bits of software are written and operated by Nickj.

How to use the Link Suggester?

Can you give an example?

Yes, using a real-world example. Consider the following snippet (wiki codes are shown):

Newtown lies partly in the electorate of [[Grayndler]], currently represented by Anthony Albanese of the [[Australian Labor Party|ALP]].

The output of the link suggester might look like this:

What happens with this suggested link information?

The plan is for it to be added to the article's talk page.

So the article itself would not be modified?

No. The best judges of what are good links are humans, not software. Software does not understand context or meaning, whereas a human does. Therefore, the link suggestions would be added to the article's talk page, and then a human editor can add the links to the article, or disregard or delete them if they are not appropriate.

So if humans are better at determining appropriate links, why suggest links?

Four reasons:

  1. Humans miss things. In the previous example, "Anthony Albanese" was a good link, but it was simply missed.
  2. Checking for links is tedious. The only way to know if the Wikipedia contains articles that are relevant is to search for them. Most people don't bother other than for a subset of the possible links.
  3. Links are content. [1]
  4. The Wikipedia is expanding, and while there mightn't have been an appropriate link when the article was written, there may be one now.

The best approach is likely to be a fusion that combines the strengths of humans and the strengths of software: software can perform the tedious and repetitive process of finding missing links, and a human is best able to take that information and apply it appropriately to the article.

Has anyone discussed this idea before?

Yes - look here for a discussion of auto-linking.

Note that most of the arguments against assumed that:

An approach is being used that specifically tries to address these problems - please see below for more information on good links versus bad links.

Surely we don't want every possible link? That would be lots of links!

Exactly, and this is where it gets interesting. Making every single possible link is simply not conducive to the flow of an article. For example, consider the following real-world example wiki sentence:

A reorganisation of local government boundaries in 1968 saw part of Newtown placed under [[Marrickville]] council.

The link suggester, when showing every possible link, would show the following for this sentence:

  1. Can link local: A reorganisation of local government boundaries in 1968 saw p...
  2. Can link government: A reorganisation of local government boundaries in 1968 saw part of Newt...
  3. Can link boundaries: A reorganisation of local government boundaries in 1968 saw part of Newtown placed

This is a pretty exhaustive list, but these are probably bad things to link on. To avoid this, the Link Suggester applies some simple rules-of-thumb to try and improve the signal-to-noise ratio (i.e. keep the 'good links' and eliminate the 'bad links').

So what makes a good link, and what makes a bad link?

A good link is usually either:

Then there are things that are sometimes worthwhile linking on:

Things that are usually bad links:

Additionally, a link should only be suggested once per article, as per the Wikipedia style.

Also, an article shouldn't link to itself (even by linking to a page that redirects to the original article).
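The last two rules can be sketched in a few lines. This is only an illustration, not the Link Suggester's actual code: the function names, the `redirects` mapping (redirect title to target title), and the choice to de-duplicate on the resolved target rather than on the matched text are all assumptions made for the example.

```python
def resolve(title, redirects):
    """Follow a redirect chain to its final target, guarding against loops."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title


def filter_suggestions(article, candidates, redirects):
    """Keep each target once, and drop anything that resolves to the article itself.

    `candidates` is a list of article titles matched in the text (hypothetical input).
    """
    own = resolve(article, redirects)
    kept, used = [], set()
    for phrase in candidates:
        target = resolve(phrase, redirects)
        if target == own or target in used:
            continue  # self-link (possibly via a redirect) or already suggested
        used.add(target)
        kept.append((phrase, target))
    return kept


redirects = {"ALP": "Australian Labor Party"}
result = filter_suggestions(
    "Australian Labor Party",
    ["ALP", "Anthony Albanese", "Anthony Albanese"],
    redirects,
)
print(result)
```

Here "ALP" is dropped because it redirects back to the article being processed, and the second "Anthony Albanese" is dropped as a repeat.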

So what exactly will it link on?

The link suggester will suggest links that meet these criteria:

What remains are generally quite safe links to suggest, with a good signal-to-noise ratio.

What is the current project status?

Would you like it to be run on the whole wiki in the future?

The Link Suggester has been run on a local copy of the whole English Wikipedia already for testing purposes, but thus far these results have not been uploaded to the Wikipedia. The aim (if the feedback is suitably positive) is eventually for the LinkBot to upload the suggestions for every article in the English Wikipedia.

Are there any spin-off projects that flow out of this?

Yes, there are two:

  1. When processing the page, the link suggester needs to look at the page's wikicode (so that it doesn't suggest linking something that is already a link, for example). When doing this it is easy to detect pages with broken wikicode (e.g. unclosed wiki-links, unopened wiki-links, malformed section headers, and so forth). This spin-off project evolved to become the Wiki Syntax Project.
  2. The Missing Redirects Project, which suggests new useful redirects and disambiguation pages, from the data that we already have in the Wikipedia.

Will it suggest links multiple times? e.g. Will it try to link the same proper noun two or more times?

No.

Will it suggest links to articles with titles similar to, but not identical to, the text in my article?

No (apart from capitalization differences).

Why did you make this?

I kept finding that there were articles that I hadn't linked to yet, simply because I didn't know that they existed. I decided that there should be an automated way of suggesting links, based on the text of the article.

Can I get the source code?

Source code for LinkBot is now available. It's a bit of a mess, sorry!

Does it ever make bad suggestions?

Yes. Please take all the suggestions with a grain of salt. Bad suggestions are almost always because the same combination of words is used to mean different things to different people. I have endeavoured to eliminate bad suggestions, whilst keeping good suggestions, but getting a perfect automated link suggester is probably impossible without genuine Artificial Intelligence.

Also, some suggestions can be tangential - that is, they're not inherently wrong suggestions, they're just not appropriate to link to in the context in which they're suggested. For this reason the suggestions are provided for your review, so that people can make the links they like and disregard the rest.

Will it ever modify the original article?

No, never. If the original article ever gets modified by the software, then that's a bug.

Is this an official Wikipedia project?

No - at the moment this is a personal project.

How does it work?

  1. The most recent available copy of the enwiki database is downloaded and stored.
  2. A large index of the names of every article, and the target (if it's a redirect) is built in memory (for speed).
  3. The text of every article is retrieved from the local database, and then processed to look for unlinked text that matches the title of an existing article. Those results are then filtered based on the rules of what makes a good or bad link, and the wiki syntax of the article is checked.
  4. The results are then saved to the local database.
  5. Once the above has finished, the saved suggestions can then be uploaded to the Wikipedia by the LinkBot.
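Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration in Python (the real tool is written in PHP), and everything in it is an assumption made for the example: the function names, the `pages` input shape, the four-word phrase limit, and the longest-match-first strategy. The good-link and bad-link filtering and the wiki syntax check are left out.

```python
import re


def build_index(pages):
    """Map every title to its target; `pages` maps title -> redirect target or None."""
    return {title: (target or title) for title, target in pages.items()}


def suggest_links(wikitext, index, max_words=4):
    """Return (phrase, target) pairs for unlinked text that matches an article title."""
    # Replace existing [[...]] links with a separator so linked text is never suggested.
    plain = re.sub(r"\[\[.*?\]\]", " | ", wikitext)
    words = re.findall(r"[\w']+|\|", plain)
    suggestions = []
    i = 0
    while i < len(words):
        # Try the longest phrase first so multi-word titles win over single words.
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if "|" not in phrase and phrase in index:
                suggestions.append((phrase, index[phrase]))
                i += n
                break
        else:
            i += 1
    return suggestions


index = build_index({
    "Anthony Albanese": None,
    "Grayndler": None,
    "ALP": "Australian Labor Party",
})
text = ("Newtown lies partly in the electorate of [[Grayndler]], currently "
        "represented by Anthony Albanese of the [[Australian Labor Party|ALP]].")
found = suggest_links(text, index)
print(found)
```

Note that this sketch ignores punctuation between words, so a phrase could match across a comma; a real implementation would need to be more careful about phrase boundaries.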

How long does it take to suggest links for the whole of the English Wikipedia?

On the old Pentium-3 800 MHz which I am using for this:

So total time taken = 7 + 0 (rounding down) + 51 = 58 hours.

Note: These times do not include any uploading of suggestions to the Wikipedia by the LinkBot. This is purely the time taken to generate the suggestions, which the LinkBot can then upload.

Note: These times are proportional to the number of articles in the Wikipedia (and were accurate as of 13-Dec-2004) - so as the Wikipedia continues to grow, the times taken to process the data will increase too.

What programming language is it written in?

In PHP, running on Debian Linux 3.0.

How long roughly might it take to upload the data to the Wikipedia?

Currently the main speed-limiting factor is political, and not technical. This is because there is a maximum allowed limit of 6 transactions per minute for bots (although there is some discussion on allowing exceptions to this rule where there is consensus on it).

At the 6 transactions per minute rate, uploading all the suggestions to the Wikipedia would take 24 days, which is a very long time (and runs a high risk of the suggestions becoming out of date with the article if it has been edited in the intervening period).

The fastest the LinkBot could probably go is around 30 or 40 transactions per minute, at which rate uploading suggestions would take around 5 days (which is far more realistic). So, if the trial phase goes well, and if it still seems a good idea to run it on the whole Wikipedia, then I will see if there is consensus on raising the transactions per minute limit for the LinkBot.
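The upload-time figures can be checked with a little arithmetic. The sketch below assumes one transaction per page with suggestions, and uses the count of 210931 such pages quoted elsewhere in this FAQ; that pairing is an assumption, although it does reproduce the 24-day figure.

```python
PAGES_WITH_SUGGESTIONS = 210_931  # assumed: one upload (transaction) per page


def upload_days(pages, transactions_per_minute):
    """Days needed to upload `pages` edits at a constant rate."""
    return pages / transactions_per_minute / 60 / 24


for rate in (6, 30, 40):
    days = upload_days(PAGES_WITH_SUGGESTIONS, rate)
    print(f"{rate:>2} per minute -> {days:.1f} days")
# Prints:
#  6 per minute -> 24.4 days
# 30 per minute -> 4.9 days
# 40 per minute -> 3.7 days
```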

Are there any examples I can see?

Yes - for current examples here is the edit log for LinkBot.

There are also some much older examples (these are from Phase 1, done with manual cutting and pasting, around late October 2004):

Can I leave feedback, and if so where?

Absolutely - you can leave both positive feedback and negative feedback (if you're not sure which, pick whichever you think is most appropriate). You can also let me know about suggested links that probably should never be suggested.

Why were no links suggested pointing to my article?

If no links were suggested, then the reason was one of the following:

What are some of the reasons for manual exclusions?

Manual exclusions (never suggesting a link to a page, even though it qualifies as a "good link") are added because of one or more of the following reasons:

Why add the suggestions to a talk page, and not as some kind of big list?

The idea of adding suggestions to the talk pages is that:

  1. They're suggestions - a user may not like some of them - and so the actual article should never be automatically modified. People feel strongly about this, and I agree with them.
  2. By adding suggestions to the talk page, those suggestions are automatically associated with the page, and visible to anyone watching the page, all without the problems inherent in modifying the page.
  3. The link suggester shows suggested links from a page, but it will also show suggested links to a page (but only if there are outgoing links as well).

The problems with listing them separately as a big series of list pages are that:

  1. Point 2 above is lost.
  2. It would be a huge number of pages, much more than 100 or 200. With 210931 pages with suggestions, at around 3.8 suggestions on average per page, that's over 800,000 suggestions in total. Experience with this kind of process of making lists of changes (with creating the data for the Wiki Syntax Project) has shown that 140 suggestions per page (includes link to page, word that can be linked, and context of the change) is the perfect size to get the page size just under the 32K suggested maximum page size. 800000 / 140 means there would be 5715 pages.
  3. Then people have to process those 5715 pages. Experience with asking people for their help in processing changes listed on pages (as part of the Wiki Syntax Project), with a lot of effort by a lot of people, has shown that the processing rate is around 35 such pages per week. At that rate of progress, with the same consistent and concerted effort (which would probably be very hard to do over such a long-haul project), it would take around 165 weeks to process all of the suggestions, which equals 3.2 years.
  4. Point 3 above is lost unless you have twice as many pages (a list to, and a list from), which would be 11430 pages.
  5. Over such a long period of time, with the rapid pace of change in the Wikipedia, the data would age very rapidly, and quickly become irrelevant to the content that was actually on the pages at the time a human got around to looking at the suggestion.
  6. In other words, the most viable approach appears to be to distribute the problem of processing the links out to the page authors / maintainers by putting those suggestions on the talk pages (and this act of distributing the problem is one of the most fundamental reasons why the Wikipedia is so successful). Of course, page authors can ignore the suggestions (personally, I wouldn't, if I thought that they were good suggestions, and if I cared about the page) - and with the tests, sometimes people ignore the suggestions, and sometimes they use them. Of course if you ignore the suggestions, then you're no worse off than you were before.
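The list-page arithmetic above can be reproduced as a quick check. All inputs are the figures quoted in the list; the variable names, and the use of 52 weeks per year, are assumptions for the example.

```python
import math

pages_with_suggestions = 210_931
avg_suggestions_per_page = 3.8
suggestions_per_list_page = 140   # keeps a list page just under 32K
list_pages_processed_per_week = 35

total_suggestions = pages_with_suggestions * avg_suggestions_per_page
# The text rounds the total down to 800,000 before dividing.
list_pages = math.ceil(800_000 / suggestions_per_list_page)
both_directions = 2 * list_pages  # a "links to" list and a "links from" list
weeks = list_pages / list_pages_processed_per_week
years = weeks / 52

print(f"total suggestions: about {total_suggestions:,.0f}")
print(f"list pages: {list_pages} (or {both_directions} for both directions)")
print(f"time to process: about {weeks:.0f} weeks, or {years:.1f} years")
```

This gives 5715 list pages (11430 for both directions) and a little over 163 weeks, in line with the rounded figure of around 165 weeks and just over three years.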

Can I submit some text to look for linkify-ing?

I am looking for pages to contribute to. A friend suggested writing my own personal bio, then submitting it to this linkify bot as a way to find pages I might be interested in editing. dfrankow 17:02, 12 October 2005 (UTC)

What is the best way to annotate the suggestions?

If I create some of the links as suggested by the link bot, is it acceptable to delete that part of the link bot text on the talk page, or is that considered as bad as deleting the comments of a real person? I'd rather not have to put a comment after every suggestion saying that it is now taken care of, but if the text just stays, other people will waste time investigating every suggestion again. --ragesoss 15:24, 11 January 2006 (UTC)

It's no problem to delete them, and no offense whatsoever will be taken by deleting them. Alternatively, you can always strike the suggestions out to indicate that they have been done. -- All the best, Nickj (t) 04:14, 12 January 2006 (UTC)