This page is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
It seems to me that novices to regexps are likely to misinterpret "regular" to mean "occurring at fixed intervals", "evenly spaced", or similar concepts expressing repetitive similarity or even repetitive identity. Another misunderstanding could well be that "regular" means "commonplace" or "ordinary". Recalling the Spanish word "reglas" for "rules", it seems to me that "regular" in the context of this article means something more like "according to specific, defined rules". I'd like to see a brief comment in the article explaining what seems to be this atypical use of the word "regular". Regards, Nikevich (talk) 21:05, 25 December 2010 (UTC)
Does anyone know where the usage of ^ for start of string and $ for end of string in regular expressions comes from? Did syntax using these symbols for start and end already exist before regular expressions? I know some syntax for newly invented things, such as CSS selectors, also uses these symbols. I'm wondering where the usage of these symbols came from because it seems like a very weird, arbitrary choice today, especially with $ looking like the S of "start" but meaning end instead, and there simply being no logical connection between the two symbols at all, unlike for example '>' and '<'. Maybe this could be an interesting historical note. 84.253.55.210 (talk) 22:26, 12 January 2011 (UTC)
A quote from the intro:
As an example of the syntax, the regular expression \bex can be used to search for all instances of the string "ex" that occur after "word boundaries" (signified by the \b). Thus \bex will find the matching string "ex" in two possible locations, (1) at the beginning of words, and (2) between two characters in a string, where one is a word character and the other is not a word character.
Isn't (1) a correct description of what \bex will match (i.e. ex at the beginning of words), and isn't (2) slightly incorrect, since the first of the two characters that the string is between must be a non-word character and the second of the two characters can be anything? — Preceding unsigned comment added by 184.158.71.3 (talk) 05:32, 4 July 2011 (UTC)
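For anyone who wants to check the behaviour of \bex directly, here is a minimal sketch using Python's re module. The sample text is hypothetical and chosen only for illustration, and other regex flavours may define word characters slightly differently.

```python
import re

# Hypothetical sample text, chosen only to illustrate the pattern.
text = "ex-wife, extra, text, regex, an example"

# \b is a zero-width word boundary, so "ex" must begin a run of word characters.
# \w* is added here only to show which word each match belongs to.
words = re.findall(r"\bex\w*", text)

# "ex" in the middle or at the end of a word is not matched.
inside_word = re.search(r"\bex", "text")
```

With this input, the matches come from "ex-wife", "extra" and "example", while the "ex" inside "text" and "regex" is skipped.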
Shouldn't the section on Fuzzy be left out? If they're fuzzy, then they're not regular. This info is well covered in Approximate string matching. I'm thinking of removing this whole section and moving any missed info to Approximate string matching. What do others think? Have I misunderstood something? 02:57, 11 July 2011 (UTC)
Hi! Search "flags" at mozilla ... RegExp. I was spending too much time adapting en:user:Lunchboxhero/externISBN.js to my needs. Regards ·לערי ריינהארט·T·m:Th·T·email me· 18:51, 5 October 2011 (UTC)
I think negation should at least be mentioned. As I have read in other sources, it is not possible to simply invert the result (like grep -cv or something like that), so it should be mentioned so that people start to look for other solutions.
P6v53as (talk) 14:22, 8 November 2011 (UTC)
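As an illustration of the two usual workarounds, here is a minimal sketch in Python. The sample lines and the word "error" are hypothetical. The first approach inverts the result in the host language, much as grep -v does. The second uses a negative lookahead, which is an extension offered by many regex engines and not part of the formal definition of regular expressions.

```python
import re

# Hypothetical log lines, used only for illustration.
lines = ["error: disk full", "ok", "warning: low battery", "error: timeout"]

# Approach 1: match normally and invert the result outside the regex.
pattern = re.compile(r"error")
without_error = [ln for ln in lines if not pattern.search(ln)]

# Approach 2: a negative lookahead rejects any line that contains "error".
negated = re.compile(r"^(?!.*error).*$")
also_without_error = [ln for ln in lines if negated.match(ln)]
```

Both lists end up containing only the lines without "error".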
For readers having active minds, but with less or no formal education in computer science, formal logic, or higher math., the term "regular" might seem to be the antonym of "irregular". With a modest knowledge of Spanish, I came to realize that reglas, "rules" in Spanish, offered a very helpful insight. In this context, "Regular" implies "according to rules"; a naïve independent scholar might not realize this for a while. Perhaps a short paragraph explaining this might be helpful for reducing confusion and misunderstanding. Regards, Nikevich (talk) 08:20, 19 November 2011 (UTC)
None of the examples show replacing a string with another - which at least some implementations allow. Should we show at least one? Martin Packer (talk) 00:28, 28 January 2012 (UTC)
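A replacement example could look something like the following sketch, which uses Python's re.sub. The input text and the spelling rule are hypothetical, and replacement syntax (here \1 for the first capture group) differs between implementations.

```python
import re

# Hypothetical input: rewrite "-our" word endings as "-or".
text = "colour and flavour"

# (\w+) captures the stem; \b ensures "our" is at the end of the word.
# In the replacement, \1 refers back to the captured stem.
result = re.sub(r"(\w+)our\b", r"\1or", text)
```

Here result is "color and flavor"; the word "and" is left untouched because it does not end in "our".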
I think that the formal definition should be moved to the end of the article. Most of the people accessing this article want to see some examples. Therefore the article should present the examples first and the formal definition later. — Preceding unsigned comment added by 89.23.239.59 (talk) 18:19, 25 March 2012 (UTC)
It is said in the article that: "For yet other languages, such as Object Pascal(Delphi) and C and C++, non-core libraries are available"...
But regex does actually exist as a C core library; "man regex" gives me that on BSD OS:
REGEX(3) BSD Library Functions Manual REGEX(3)
NAME
regcomp, regerror, regexec, regfree -- regular-expression library
LIBRARY
Standard C Library (libc, -lc)
69.80.96.81 (talk) 04:48, 28 May 2012 (UTC)
One of the examples given is:
* the word "car" when not preceded by the word "motor"
Then it says "These examples are simple."
I don't think that's a simple example - in fact I can't think of a way to match that. Either I'm being dim (quite possible) or that example should be removed. Jj Banana (talk) 15:12, 13 February 2012 (UTC)
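One possible way to express this, in engines that support lookbehind, is a negative lookbehind assertion. The sketch below uses Python and assumes that "preceded by the word motor" means "motor" followed by a single space; that reading is an assumption, not something the article states. Lookbehind is an extension and not part of formal regular expressions, which may be why the example does not feel simple.

```python
import re

# (?<!motor ) fails the match if the six characters before "car" are "motor ".
# \b on both sides keeps words such as "carpet" from matching.
pattern = re.compile(r"(?<!motor )\bcar\b")

def find_car(text):
    """Return the start offsets of every standalone 'car' not preceded by 'motor '."""
    return [m.start() for m in pattern.finditer(text)]
```

For instance, find_car("a car") finds one match, while find_car("motor car") and find_car("carpet") find none.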
The third sentence of the article is flagged {citation needed}, but I don't know how to fix this within the article.
I would like to suggest this reference:
"Mastering Regular Expressions, 2nd Edition from O'Reilly, by Jeffrey E. F. Friedl; Chapter 3 "Overview of Regular Expression Features and Flavors", page 85, under the heading "The Origins of Regular Expressions", third and fourth paragraphs:
"Although there is evidence of earlier work, the first published computational use of regular expressions I have actually been able to find is Ken Thompson's 1968 article Regular Expression Search Algorithm in which he describes a regular-expression compiler that produced IBM 7094 object code. This led to his work on qed, an editor that formed the basis for the Unix editor ed.
ed's regular expressions were not as advanced as those in qed, but they were the first to gain widespread use in non-technical fields. ed had a command to display lines of the edited file that matched a given regular expression. The command, "g/Regular Expression/p", was read "Global Regular Expression Print." This particular function was so useful that it was made into its own utility, grep (after which egrep--extended grep--was later modeled)."
--hope this helps leeeoooooo [002012-06-05] — Preceding unsigned comment added by Leeeoooooo (talk • contribs) 01:12, 6 June 2012 (UTC)
The document http://genius.cat-v.org/brian-kernighan/articles/beautiful suggests the same:
"Regular expressions first appeared in a program setting in Ken Thompson's version of the QED text editor in the mid-1960's. In 1967, Ken applied for a patent on a mechanism for rapid text matching based on regular expressions; it was granted in 1971, one of the very first software patents [US Patent 3,568,156, Text Matching Algorithm, March 2, 1971]. [...] Regular expressions moved from QED to the Unix editor ed, and then to the quintessential Unix tool, grep, which Ken created by performing radical surgery on ed."
Also, roughly the same text appears in the book "Beautiful Code: Leading Programmers Explain How They Think" (the above link is a draft).
70.82.120.78 (talk) 19:23, 15 August 2012 (UTC)
What is the pattern matching algorithm that actually works at the ground level? Is it 1. the KMP matching technique, 2. Rabin-Karp...? The question arises because the text is converted internally into a char array and the pattern is matched over that.... — Preceding unsigned comment added by 106.51.151.241 (talk) 15:16, 2 December 2012 (UTC)
Read above (-: I've edited it already, please respond when I'm wrong.
Topics stay on the talk-pages until someone (or something) clears away old topics into separate pages called archives (this page has two archives linked in the header area). TEDickey (talk) 17:17, 7 April 2013 (UTC)
You mean, like some kind of disease? In which case it would be "-itis" (and probably not even hyphenated, like such: "backslashitis"). --Jerome Potts (talk) 16:03, 1 June 2013 (UTC)
The article assumes that regular expressions apply only to strings. At some point we will have to generalize its content to include graph-based regular expression patterns that can find paths in graphs and entire subgraphs. See the paper: Alkhateeb, Faisal, Jean-François Baget, and Jérôme Euzenat. "Extending SPARQL with regular expression patterns (for querying RDF)." Web Semantics: Science, Services and Agents on the World Wide Web 7, no. 2 (2009). Gmelli (talk) 18:52, 4 June 2013 (UTC)
The fuzzy matching section of the main article now asks for citations. I believe topics like Levenshtein automata contain a bunch of relevant research. Algorithms relevant to error-tolerant traversal of a finite-state network without using composition are described in e.g. Oflazer's error tolerant matching with finite-state automata; I might be able to write something up later. --Flammie (talk) 04:25, 17 July 2013 (UTC)
With all due respect to previous editors, I think the present state of this article is quite poor. I remember coming to this article several years ago, before I knew regexes, and leaving even more confused. It could almost do with a total rewrite in many places. Rather than trying to cover all possible aspects of regex syntax, it should present more information about regexes and defer to (say) Wikibooks (to which some of this content should be moved, if appropriate).
I might have a go at some stage, but the door is wide open for anyone keen for a spot of pruning and rewriting here. — This, that and the other (talk) 11:12, 19 July 2013 (UTC)
When did Ken Thompson build Kleene's notation into the editor QED? In other words, when were Regular Expressions first used in software? The QED page is more vague about that. Sam Tomato (talk) 20:33, 30 September 2013 (UTC)
Hermel (talk) 19:45, 9 October 2013 (UTC)
The file Thomson-kleene-star.svg is wrong. There is a transition missing between the state to the right of q and the state to the left of f. Also, this could be written as a deterministic finite automaton with two states. Using an NFA is unnecessary and confusing to new readers.
Here is the offending file:
Rekahsoft (talk) 05:41, 30 May 2014 (UTC)
I didn't see a section about regular expression tools. I recently found this interesting Windows program that will allow a person to click on fields to create an expression. http://www.ultrapico.com/Expresso.htm Does anyone know of other similar tools? • Sbmeirow • Talk • 16:12, 1 July 2014 (UTC)
Please see Talk:Perl Compatible Regular Expressions#pcre syntax highlighting lost. John Vandenberg (chat) 06:48, 18 July 2015 (UTC)
Why are all the examples in PHP? Shouldn't they be in pseudocode or in a language with a more common syntax? I find that the $ in the variable names can be confused with the regex syntax. 186.136.108.233 (talk) 13:52, 11 February 2014 (UTC)
I'm adding links to search online regular expression testers and to one specific example, which are an excellent way to explore regular expressions with sufficiently equipped browsers, but require Wikipedia:EL#Rich_media extensions. - Tatzelbrumm (talk) 10:52, 22 May 2014 (UTC)
The image example is awful, as it is showing an exact opposite of a regular expression. Positive look-behinds are NOT regular. --141.89.226.146 (talk) 23:30, 13 November 2015 (UTC)
Hello,
In the example at the top of the page the lookaround "(?<=)" and "(?=)" groups are used but not explained in the text. I've added a link at the bottom to a Quick Start guide with at least a description. Can someone add the operators to some list in the page? The page indeed needs cleanup. As a computer scientist I can say that the page is written from the point of view of a theoretical computer scientist, not the point of view of an average Wikipedia user.
Thanks!
Jgamleus (talk) 17:41, 17 July 2015 (UTC)
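For readers unfamiliar with these groups, here is a minimal sketch in Python of what a lookbehind (?<=...) and a lookahead (?=...) do. The pattern and the sample text are hypothetical and are not the ones used in the article's image. Both groups are zero-width: they check the surrounding text without including it in the match.

```python
import re

# Match digits that are preceded by "$" and followed by " USD".
# Neither the "$" nor the " USD" becomes part of the match itself.
pattern = re.compile(r"(?<=\$)\d+(?= USD)")

# Hypothetical sample text.
text = "Price: $42 USD, code 42, fee $7 EUR"
found = pattern.findall(text)
```

Only the first "42" qualifies: the second "42" has no "$" before it, and "7" is not followed by " USD".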
I'm not an expert on regular expressions so don't want to edit myself. In the table of metacharacters there are two instances of "?". Should the second be "=" ? SolarMcPanel (talk) 11:12, 2 August 2016 (UTC)
? Matches the preceding pattern element zero or one time.
? Modifies the *, +, ? or {M,N}'d regex that comes before to match as few times as possible.
H.?e as an example regex where the ? makes the . optional (zero or one occurrences matches). The second regex uses l.+?o where the ? modifies what the preceding + means. On its own, + matches the previous item one or more times, as many as possible. In combination, +? matches the previous item one or more times, as few as possible. Johnuniq (talk) 11:41, 2 August 2016 (UTC)
I've always heard and said rejex (I don't speak phonetics) and I'd be very surprised if anyone used a hard g, although it could be argued as logical. Andthepharaohs (talk) 19:12, 9 September 2016 (UTC)
Where did you first hear that? I've always heard/said it with a hard 'g', which as you point out is logical. FusionDude (talk) 20:33, 28 October 2016 (UTC)
Maybe it's a locale thing? In the UK, rejex was the norm (but I did retire 10 years ago and it may have suddenly changed). We might ask Ken Thompson, who wrote the original Unix code, but the usage is in the public domain now. A Google search reveals no consensus. So I would go with Larry Wall, the inventor of Perl, who reckoned, "There's always more than one way to do it". Javalava101 (talk) 12:58, 14 December 2016 (UTC)
Here in Germany in a multilingual work environment I've never heard "rejex" at all. It still sounds most peculiar to me ;-) --Alfe (talk) 13:53, 15 March 2017 (UTC)
The blurb about DTDs is way off base. DTDs are not (in general) regular; they're more like CFGs. DanConnolly (talk) 02:43, 3 October 2017 (UTC)
As a PhD in molecular biology, I know about explaining tech stuff (trust me). This article lacks a simple example at the start, and lacks a clear sentence in the intro. Please add simple stuff (if the proverbial mom or dad can't get it, it ain't simple enough).
Something like: "Regex refers both to a theory about how to find certain patterns, and to programs that look for certain patterns. E.g., suppose we want to look for the word "serialize" and some common misspellings, and find "serialize" when it is between 1 and 20 characters from the word "journal"; we would use... blah blah", or something like that. — Preceding unsigned comment added by 64.130.228.122 (talk) 18:06, 29 December 2017 (UTC)
However, this does not ensure that not the whole sentence is matched in some contexts. The question-mark operator does not change the meaning of the dot operator, so this still can match the quotes in the input. A pattern like ".*?" EOF will still match the whole input if this is the string "Ganymede," he continued, "is the largest moon in the Solar System." EOF
"However, this does not ensure that not the whole sentence is matched" is either incomprehensible or very poorly phrased (double negation). In any case, ".*" EOF
will match this part:
Ganymede," he continued, "is the largest moon in the Solar System.
Whereas ".*?" EOF will match the same thing (lazy/minimal/reluctant matching makes no difference here because there's only one possible match). That is to say, NOT the whole input. Urhixidur (talk) 19:40, 22 January 2018 (UTC)
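The comparison can be tried out with the sketch below, written in Python. The capture group around .* is an addition made here so that the portion consumed by the dot-star can be inspected separately from the overall match; it is not part of the patterns quoted from the article.

```python
import re

# The input string quoted in the discussion above.
text = '"Ganymede," he continued, "is the largest moon in the Solar System." EOF'

# Greedy and lazy variants; the parentheses are added only for inspection.
greedy = re.search(r'"(.*)" EOF', text)
lazy = re.search(r'"(.*?)" EOF', text)

# Compare the overall matches and the text consumed by the dot-star.
same_match = greedy.group(0) == lazy.group(0)
same_inner = greedy.group(1) == lazy.group(1)
```

In this engine both variants produce the same overall match and the same captured text. The match must start at the first quotation mark and end at the only place where a quotation mark is followed by " EOF", so laziness has nothing to shorten.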
I came here to get some regex examples and these are too ambiguous https://en.wikipedia.org/wiki/Regular_expression#Formal_definition
Examples:
a|b* denotes {ε, "a", "b", "bb", "bbb", …}
(a|b)* denotes the set of all strings with no symbols other than "a" and "b", including the empty string: {ε, "a", "b", "aa", "ab", "ba", "bb", "aaa", …}
Using short strings and not actual words isn't too comprehensible. If I were a computer, examples of how to search binary numbers might be fine, but these examples don't teach the concept. — Preceding unsigned comment added by Jawz101 (talk • contribs) 16:54, 2 April 2018 (UTC)
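One way to make the two formal examples more concrete is to enumerate short strings and keep the ones each expression accepts. The sketch below does this in Python; the helper name language and the length limit of 3 are hypothetical choices for illustration, and Python's regex syntax is assumed to coincide with the formal notation for these two simple expressions.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=3):
    """List the strings over `alphabet`, up to `max_len` long, that `pattern` matches in full."""
    accepted = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            # fullmatch requires the whole string to match, as in the formal definition.
            if re.fullmatch(pattern, candidate):
                accepted.append(candidate)
    return accepted

first = language(r"a|b*")     # either a single "a", or any number of "b"s
second = language(r"(a|b)*")  # any sequence of "a"s and "b"s
```

Here the empty string "" plays the role of ε. The first list contains only "", "a", "b", "bb" and "bbb", while the second contains every string over the two letters up to the length limit.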
I tried to test the regex for binary multiples of three, but it seems to only provide either 0 or binary numbers that have a 1 at the beginning and the end. This can not be right, since e.g. 1100 is binary for 12 and does not end in a 1. — Preceding unsigned comment added by 2A02:8108:1BF:704E:58DB:9A23:AB51:9406 (talk) 22:28, 3 June 2018 (UTC)
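A brute-force check is one way to settle questions like this. The sketch below, in Python, assumes the regex under discussion is (0|(1(01*0)*1))* with the trailing asterisk, and compares it against ordinary arithmetic for a range of numbers. The upper bound of 1024 is an arbitrary choice.

```python
import re

# The regex under discussion, including the trailing asterisk.
pattern = re.compile(r"(0|(1(01*0)*1))*")

def matches_binary(n):
    """True if the binary representation of n is matched in full by the pattern."""
    return pattern.fullmatch(format(n, "b")) is not None

# Collect every number where the regex disagrees with n % 3 == 0.
mismatches = [n for n in range(1024) if matches_binary(n) != (n % 3 == 0)]
```

If the list of mismatches is empty, the regex agrees with divisibility by three for every number tested, including 12, whose binary form 1100 ends in 0.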
(0|(1(01*0)*1))* (note the asterisk at the end), and that certainly matches "1100": the leading "11" is matched by the 1(01*0)*1 part of the alternative, and the zeroes are matched by the 0 part of the alternative, twice. – Tea2min (talk) 06:30, 4 June 2018 (UTC)
(Kleene star) R* denotes the smallest superset of set described by R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating any finite number (including zero) of strings from set described by R. 11:03, 17 October 2018 (UTC)
a set described by R, no? Mczuba (talk) 11:03, 17 October 2018 (UTC)
the". Thanks for noticing! - Jochen Burghardt (talk) 15:09, 17 October 2018 (UTC)
The example image's regex uses lookaheads/lookbehinds without them being defined anywhere in the article!
I realise this is a result of edits to the image's original caption over time, but the image should probably be removed, or definitions added. — Preceding unsigned comment added by Swith22 (talk • contribs) 22:16, 2 May 2018 (UTC)
Regex "[^"]*+" in possessive matching does not give a different result than "[^"]*" in lazy matching. Tejasvi Singh Tomar (talk) 13:40, 4 June 2019 (UTC)
The text suggests that "regular expressions" in the modern software sense aren't actual "regular expressions" in the mathematical sense (this is demonstrably true) and that they are actually "regexes". This is misleading. There are those who use the term "regex" to mean "modern software regular-expression-like engines" (this was first clearly articulated in the early 2000s as far as I know, by Larry Wall when developing Perl 6 Rules and has gained some traction). But it's trivial to find counter-examples in the literature.
Here is a researcher referring to mathematical regular expressions as "regexes":
And here is a Google patent that refers to software regular expressions with seemingly arbitrary alternation between the two terms:
You can see that there's just no consensus in Google Scholar search (results above from page 2) and Wikipedia really should not be used to try to assert one where it does not exist...
-Miskaton (talk) 20:32, 7 August 2019 (UTC)
See the WP:MOS for guidance on See Also. This is a topic on regular expressions, which is not the same as a list of applications which implement regular expressions. TEDickey (talk) 11:10, 28 November 2019 (UTC)
The Patterns section says standard textual syntax. I know that there are definitions of regular expressions within the POSIX standard but that is for within POSIX. Where is the authority saying that the definition within the POSIX standard applies outside of POSIX? Sam Tomato (talk) 21:40, 19 March 2020 (UTC)
@Irontitan76: Your edit (diff) inserted "of" in the following:
That totally changes the meaning. Are you sure about that? The original seems much more likely to me. Johnuniq (talk) 06:33, 29 October 2020 (UTC)
@Johnuniq: You're completely right and I completely made a mistake. I reverted the change. Irontitan76 (talk) 14:13, 29 October 2020 (UTC)
There is no discussion here of the 'not' operator. It is fleetingly shown in the mention of assertions. I would expect it to be in the metacharacter list as well. Neils51 (talk) 01:23, 26 November 2020 (UTC)
I think it's worth touching on the pronunciation of the word, as it seems that some people pronounce it "redge-ex" and others pronounce it "regg-ex". The former seems to be linguistically more natural since there is no glottal stop in the middle of the word, and seems to be used more commonly; however, some sources suggest it is pronounced the way indicated in the latter example.
Thoughts? Nabeel_co (talk) 05:15, 25 February 2021 (UTC)
The text states, without citation: "In most formalisms, if there exists at least one regular expression that matches a particular set then there exists an infinite number of other regular expressions that also match it—the specification is not unique." Well yes, X|X matches the same set of strings as X, as does X|X|X etc. But is this worth saying, does it matter, and if it is worth saying, shouldn't there be a citation? 82.152.109.221 (talk) 10:35, 31 January 2022 (UTC)
Find and replace is a related, but quite different, concept. It should be a separate article, discussing the usage of find and replace in UIs and text editors. Caleb Stanford (talk) 00:57, 9 December 2021 (UTC)
@Tacsipacsi: Re Special:Diff/1092712430/1092785488 -- makes sense to me! Thanks Caleb Stanford (talk) 22:02, 12 June 2022 (UTC)
I was *really* surprised that the list of programming languages supporting regular expressions does not include Perl, considering several of the languages listed use PCRE (Perl Compatible Regular Expressions) under the hood and that most regex packages these days follow the regex syntax conventions pioneered by Larry Wall in Perl. Perl is one of the pioneers of regexes in programming languages (along with Awk) and the 800 pound gorilla in the regex supporting programming language. It is really really strange to see it not listed explicitly. I am going to add it. — Preceding unsigned comment added by 121.7.90.69 (talk) 04:07, 4 February 2023 (UTC)