wikEd diff is a free JavaScript visual diff library for inline text comparisons. It is the only available JavaScript diff library that detects and highlights block moves and that works on the word and character level. While wikEd diff has been developed and optimized for comparing Wikipedia source texts, it works great for any type of text, including program code. The library is customizable, has Unicode and multilingual support, is fully commented and documented, and is free (public domain). The script is used by the Wikipedia/MediaWiki in-browser editor wikEd and by the gadget wikEdDiff. You can test the library by checking any of these gadgets in your English Wikipedia preferences. For easy text comparisons by copy-and-pasting you can also use the wikEd diff online tool and demo. The library has also been ported to PHP in the form of a MediaWiki extension.
Features
Visual inline style, changes are shown in a single output text
The library is fully customizable. The following settings can be changed on your User:YourUsername/common.js page. Please check the top of the diff.js program code for additional settings.Please see wikEdDiff for wikEdDiff-specific settings.
The regular expressions for splitting the texts are optimized for natural language containing wiki markup. While this default also works great for other text types such as source code, it can be customized:
This method is word-based and uses unique words as anchor points to identify matching text and moved blocks.
wikEd diff has additional code for resolving unmatched islands that are caused by common tokens at the border of sequences of common tokens or by sequence crossing-overs. From these matching data, the program extracts moved blocks and compiles a new text with added insertion, deletion, and moved blocks, and block marks.
Details
Matching tokens (Heckel method):
A list of real words and chunks is compiled from the old and the new text for later "unlinking" step wordParse()
Old and new text versions are split into tokens ("symbols", "words"), creating two token lists splitText().tokens
The number of token occurrences in each version are counted in the symbol tables calculateDiff()symbol
Tokens unique in both texts serve as anchors or seed crystals and are connected ("linked", "matched") between old and new version calculateDiff().tokens.link
Starting from these connected pairs, adjacent identical pairs are connected outwards
In addition to the classical Heckel method, the following improvements have been implemented:
The process is repeated ("levels") while decreasing the size of unmatched tokens from paragraphs to lines, chunks (i.e. html and link markup), words, and single characters
Words are split into characters only in corresponding unmatched regions ("gaps") of identical token number and strong similarity in all tokens: splitRefineChars()
Identical tokens (i.e. space separators) will be linked, resulting in word-wise character-level linking
One token became connected or separated by space or dash (or any token) (axb → a b)
Addition or deletion of flanking strings in tokens (xabx → ab)
Addition or deletion of internal string in tokens (axb → ab)
Same length and at least 50 % identity (abc → abx)
Same start or end, same text longer than different text (abcd → abcxx)
Same length and at least 50 % identity (abcd → abcxx)
Gaps caused by "crossing-overs" are resolved by repeating the procedure at each level with a fresh symbol table for unlinked tokens
Corresponding gaps caused by non-unique tokens on both sides of the gap borders are resolved by repeating the procedure at each level for each gap pair recursively
Ambiguous gaps with the same front and back tokens are aligned with newlines or word borders if possible ("sliding") slideGaps()
Extracting block information:detectBlocks()
Unchanged text blocks ('same' blocks, sequences of matched tokens) are added to the blocks table in new text order getSameBlocks()blocks
To set blocks as fixed vs. moved, the longest sequences of blocks in increasing old text order are identified (longest increasing subsequence problem for moved blocks):
Independent block sections with no block moves outside the sections are identified getSections()sections
Continuous blocks in old text order are identified as groups getGroups()groups
The longest subsequence of groups in sections in old text order is identified using a cached recursive method findMaxPath() and set fixed setFixed()
Blocks outside sections are set fixed
Preventing fragmented outputs caused by false positive matchings in texts with many changes:unlinkBlocks()
Unchanged blocks are converted into insertion/deletion pairs in token lists ("unlinking"):
Unlink whole group if no block is shorter than three real words blockMinLength and contains no word unique to the whole text
Unlink group blocks stepwise from the front and the end until a block contains more than one real word or a word unique to the whole text
After unlinking, the whole process of extracting block information from the token lists has to be repeated
Unlinking is repeated as long as possible
Deletion blocks ('del' blocks, unmatched sequences of text blocks from the old text token list) are added to the blocks tablegetDelBlocks()
Deletion blocks are sorted-in in between the unchanged blocks at the correct position in new text order:positionDelBlocks()
Deletion blocks that are next to a fixed neighbor block in old text order are positioned next to that block
Deletion blocks without a fixed neighbor in old text order are positioned next to a neighbor if that block is not the start or end of a group
Alternatively, the deletion block is positioned next to the closest fixed neighbor to the left
Deletion blocks around unchanged blocks will be arranged in order of their old text positions
Insertion blocks ('ins' blocks, unmatched text blocks from the new text token list) are added to the blocks tablegetInsBlocks()
Insertion block group numbers and fixed status are set:setInsGroups()
If the insertion is inside a group, that group number and fixed status is used
For insertions outside existing groups a new single-block group is created
Adding marks at the original position of moved groups ('mark' blocks) to the blocks table:insertMarks()
This is similar to positioning deletion blocks
Marks of moved goups that are next to a fixed group in old text order are positioned next to that group
Alternatively, the mark is positioned next to the closest fixed group to the left
Mark blocks and deletion blocks around unchanged blocks will be arranged in order of their old text positions
Moved group colors are set in order of the marks in the new text
Collect diff fragment list for markup (abstraction layer for customized diffs):getDiffFragments()
Groups are processed in new text order
If block moves are not displayed as per configuration showBlockMoves, then this text is inserted as a deletion
Otherwise, a mark is added with this text as a popup title
Clip unchanged sections from unmoved block text:clipDiffFragments()
Unmoved block text is clipped at the closest feature that is detected from the borders (lines, heading, paragraph, line break, blank, fixed number of chars)
Create html formatted diff code from:getDiffHtml()
See also
wikEd, a full-featured MediaWiki-integrated text editor that adds enhanced text processing functions to edit pages. wikEd provides syntax highlighting, search and replace functions, and on-page previews and diff views
wikEdDiff, uses this library to provides an improved and easier to read diff view for comparing article versions