
The right-to-left mark (RLM) is a non-printing character used in the computerized typesetting of bidirectional text that mixes left-to-right scripts (such as Latin and Cyrillic) with right-to-left scripts (such as Persian, Arabic, Urdu, Syriac and Hebrew).

RLM is used to change the way adjacent characters are grouped with respect to text direction. However, for Arabic script, the Arabic letter mark (ALM, U+061C) may be a better choice.

Unicode

In Unicode, the RLM character is encoded at U+200F RIGHT-TO-LEFT MARK (HTML &#8207; · &rlm;). In UTF-8 it is E2 80 8F. Usage is prescribed in the Unicode Bidirectional Algorithm.[1]
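
The encoding details above can be checked with a short Python sketch that uses only the standard library (`unicodedata` and `html`). The variable names are illustrative and not part of any standard.

```python
import html
import unicodedata

RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK

# Official character name and general category (Cf = format character).
name = unicodedata.name(RLM)
category = unicodedata.category(RLM)

# Bidi class "R": the mark behaves like a strong right-to-left character.
bidi_class = unicodedata.bidirectional(RLM)

# UTF-8 encoding rendered as space-separated hex bytes.
utf8_hex = RLM.encode("utf-8").hex(" ").upper()

# The numeric and the named HTML references decode to the same code point.
from_numeric = html.unescape("&#8207;")
from_named = html.unescape("&rlm;")

print(name, category, bidi_class, utf8_hex)
print(from_numeric == RLM, from_named == RLM)
```

Because the character is invisible, comparing code points in this way is usually more reliable than inspecting rendered text.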

Example of use in HTML

Suppose the writer wishes to insert a run of Arabic or Hebrew (i.e. right-to-left) text into an English paragraph, with an exclamation point at the end of the run, on its left-hand side: "I enjoyed staying -- really! -- at his house." With the "really!" in Hebrew, the sentence renders as follows:

I enjoyed staying -- באמת! -- at his house.

(Note that in a computer's memory, the order of the Hebrew characters is ‭ב,א,מ,ת‬.)

With an RLM added after the exclamation mark, it renders as follows:

I enjoyed staying -- באמת!‏ -- at his house.

(Standards-compliant browsers will render the exclamation mark on the right in the first example, and on the left in the second.)

This happens because the browser recognizes that the paragraph is in a left-to-right script (Latin). Punctuation is directionally neutral, so an exclamation mark that sits between right-to-left and left-to-right text is placed according to the paragraph's left-to-right direction. With the RLM added, the punctuation is surrounded only by right-to-left characters (the Hebrew text and the RLM) and is therefore positioned as if it were in right-to-left text, i.e., to the left of the preceding text.
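
The effect of the mark on the neutral exclamation point can be illustrated with a small Python sketch. It is a deliberate simplification, not an implementation of the Unicode Bidirectional Algorithm: the hypothetical helper `strong_neighbors` only reports the bidi class of the nearest strong character (L, R, or AL) on each side of a given position, which is the information the algorithm uses when resolving a neutral character.

```python
import unicodedata

RLM = "\u200f"
HEBREW = "\u05d1\u05d0\u05de\u05ea"  # the Hebrew word from the example, in memory order

STRONG = {"L", "R", "AL"}

def strong_neighbors(text, index):
    """Return the bidi classes of the nearest strong characters before
    and after text[index]; None where no strong character exists."""
    def scan(positions):
        for i in positions:
            cls = unicodedata.bidirectional(text[i])
            if cls in STRONG:
                return cls
        return None
    before = scan(range(index - 1, -1, -1))
    after = scan(range(index + 1, len(text)))
    return before, after

plain = "I enjoyed staying -- " + HEBREW + "! -- at his house."
marked = "I enjoyed staying -- " + HEBREW + "!" + RLM + " -- at his house."

# Without the mark, "!" sits between Hebrew (R) and Latin (L) text.
print(strong_neighbors(plain, plain.index("!")))
# With the mark, "!" has a right-to-left character on both sides.
print(strong_neighbors(marked, marked.index("!")))
```

In the first string the neutral character has mixed neighbours and falls back to the paragraph direction; in the second it is enclosed by right-to-left characters and joins the right-to-left run.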

Security

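When inserted into a filename, an RTL mark can make an executable file appear to be a different type of file. When inserted before "exe." in the filename "abcdexe.fghijk.doc", it makes the file appear to be a .doc file when it is really a .exe file. Note that RLM by itself does not reorder left-to-right letters; visual reversal of Latin text of this kind relies on the related right-to-left override character (RLO, U+202E).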

Additionally, an RTL mark can be used in source code comments or string literals to make the compiler produce a binary different from the one the source code appears to describe. Effectively, sections of the program are syntactically valid anagrams of the source code as it reads without the RTL marks.[2][3][4][5][6]

Visual Studio Code has highlighted RTL characters since version 1.62, released in October 2021.[7]
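
A similar check can be approximated outside an editor. The following Python sketch scans text for directional formatting characters and reports where they occur. It is an illustration only and does not reflect how Visual Studio Code implements its highlighting; the set `BIDI_CONTROLS` and the function `find_bidi_controls` are hypothetical names chosen for this example.

```python
import unicodedata

# Marks, embeddings, overrides, and isolates that affect bidirectional layout.
BIDI_CONTROLS = {
    "\u061c",  # ARABIC LETTER MARK
    "\u200e",  # LEFT-TO-RIGHT MARK
    "\u200f",  # RIGHT-TO-LEFT MARK
    *map(chr, range(0x202A, 0x202F)),  # LRE, RLE, PDF, LRO, RLO
    *map(chr, range(0x2066, 0x206A)),  # LRI, RLI, FSI, PDI
}

def find_bidi_controls(text):
    """Return (line, column, code point, name) for every directional
    formatting character in text; lines and columns are 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return hits

# A two-line sample with an RLM hidden inside a string literal.
sample = 'x = 1\nname = "abc\u200f"  # hidden mark inside a string literal\n'
for hit in find_bidi_controls(sample):
    print(hit)
```

A check of this kind could be run over source files before review, since the characters it reports are invisible in most displays.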

References

  1. ^ The Unicode Standard, Version 12.0, http://www.unicode.org/versions/Unicode12.0.0/UnicodeStandard-12.0.pdf, p. 880
  2. ^ "Trojan Source: Invisible Vulnerabilities" (University of Cambridge Computer Laboratory)
  3. ^ "Trojan Source: Invisible Source Code Vulnerabilities"
  4. ^ "'Trojan Source' Bug Threatens the Security of All Code" (Krebs on Security)
  5. ^ "'Trojan Source' Bug Threatens the Security of All Code" (Soylent News)
  6. ^ "'Trojan Source' Bug Threatens the Security of All Code" (Slashdot)
  7. ^ "Visual Studio Code October 2021". code.visualstudio.com. Retrieved 11 November 2021.