This article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
It's a bit long now. Perhaps there should be a separate article about LR(0) parsers that deals with the LR(0) algorithm. -- Jan Hidders 10:01 Aug 18, 2002 (PDT)
Two key notions used in the article, "reduction" and "derivation", are not explained. Without these notions the article is useless for a beginner. —Preceding unsigned comment added by 82.240.122.196 (talk) 19:08, 25 April 2009 (UTC)
This article is one of the better explanations of LR parsing I have seen. My only complaint is the dual use of the "*" character as (1) the current position in the LR(0) item, and (2) as the multiplication operator. That leads to items like "E * * B" (an actual example from the text), which is confusing. Perhaps we could just use a period? --Doradus 22:35, 23 Aug 2003 (UTC)
Looks like the "note on SLR and LALR" overlaps a bit with the section on conflicts. Perhaps they could be combined. --Doradus 01:38, 26 Oct 2003 (UTC)
It's a really minor thing, but could we maybe change the grammars which use nonterminals "E" and "B" to use some other letters? They often look awfully similar on my screen. Marnanel 05:53, 4 Feb 2004 (UTC)
I'm looking for a reference for the claim that Perl and C cannot be parsed by an LR parser. Does somebody know one? -- Jan Hidders 14:12, 21 Jul 2004 (UTC)
This is, indeed, a great explanation of the LR(0) algorithm. Nevertheless, I'd like to suggest a few changes that may make it better:
a) Change the term "begin state" to "start state", as it seems to be the most correct form to reference the, err, start state of a grammar.
b) the section "Conflicts in the constructed tables" needs some editing, as it seems. For example, it begins with the phrase "The automaton that is constructed is by the way it is constructed always a deterministic automaton", which, although correct, sounds confusing (at least to me). A better wording might be: "The steps presented so far always construct a deterministic automaton." or "The automaton constructed by looking for transitions between item sets will always be a deterministic one."
c) Also (still in the section referred to above), we find several references to the grammar not being LL(0). Because we're discussing LR(0), it looks like a typo. It would be better if an explanation of how the non-LL(0)-ness of the grammar affects the LR(0) parser were given.
d) Finally (and maybe this is just my system), I guess that using • instead of · to represent the dot would make it more visible.
Great article. --Branco 13:39, 23 October 2005 (UTC)
Although LR(0) item set construction is important because it is the basis for the SLR(1) parser construction algorithm as well as most LALR(1) construction algorithms, it should be pointed out that very few parsers are based on LR(0) itself. Indeed, LALR(1) parsers such as those produced by Yacc and GNU bison seem to be the most commonly used LR(k)-based parsers. Discussing LR(1) item set construction would not only describe the LR(k) methodology when lookahead is a component of an item (as it always is except for the trivial case of k = 0), it would also provide the basis for understanding the LALR(1) construction method and why an LR(1) grammar may not be LALR(1) or why an LALR(1) grammar may not be SLR(1).
It might also be pointed out that although a grammar might be LR(k) for some k > 1, there will always be LR(1), LALR(1), and SLR(1) grammars for the same language. --Raaronson 16:58, 27 December 2005 (UTC)
It might be worthwhile to have a section about what sort of grammars work best with LR parsers: yes, they can parse all LR grammars which is a strictly-defined set, but some work better than others. E.g.:
list ::= item ',' list
list ::= item '.'
is an LR(1) grammar that would need to shift the entire input onto the stack before reducing it, whereas:
list ::= list-items '.'
list-items ::= item
list-items ::= list-items ',' item
is a grammar for the same language that allows the parser to reduce after each item. I just added a note to left recursion that mentions this, although not in as much detail. The same reference [2] is probably useful here. JulesH 08:06, 4 August 2006 (UTC)
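To make the stack-depth difference concrete, here is a small illustrative simulation (not from the article; all names are made up) that shift-reduce parses both grammars above and tracks the maximum parse-stack depth reached:

```python
# Illustrative simulation (all names made up): shift-reduce parse the two
# list grammars on input "item , item , ... item ." and report the maximum
# parse-stack depth reached.

def parse_right_recursive(n):
    """list ::= item ',' list | item '.'
    No reduction is possible until the final '.' has been shifted,
    so the whole input ends up on the stack first."""
    tokens = ["item", ","] * (n - 1) + ["item", "."]
    stack, max_depth = [], 0
    for tok in tokens:                    # shift phase: the entire input
        stack.append(tok)
        max_depth = max(max_depth, len(stack))
    stack[-2:] = ["list"]                 # reduce: item '.' -> list
    while len(stack) > 1:
        stack[-3:] = ["list"]             # reduce: item ',' list -> list
    return max_depth

def parse_left_recursive(n):
    """list ::= list-items '.'
       list-items ::= item | list-items ',' item
    A reduction fires after every item, so the stack stays shallow."""
    tokens = ["item", ","] * (n - 1) + ["item", "."]
    stack, max_depth = [], 0
    for tok in tokens:
        stack.append(tok)
        max_depth = max(max_depth, len(stack))
        if stack == ["item"]:
            stack[-1:] = ["list-items"]   # reduce: item -> list-items
        elif stack[-3:] == ["list-items", ",", "item"]:
            stack[-3:] = ["list-items"]   # reduce: list-items ',' item -> list-items
    stack[-2:] = ["list"]                 # reduce: list-items '.' -> list
    return max_depth

print(parse_right_recursive(100))  # 200: proportional to input length
print(parse_left_recursive(100))   # 3: constant, independent of n
```

With 100 items the right-recursive grammar needs a stack of 200 entries, while the left-recursive one never holds more than 3.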
I removed from section 1.2:
In practice the parser will usually try to recover from a syntax error by ignoring unexpected symbols and/or inserting missing symbols, and continue parsing, but this is outside the scope of this article.
This paragraph is like taken from a tutorial. Well, I imagine it's only because of the "outside of scope" thing. Whatever. --euyyn 22:14, 4 September 2006 (UTC)
These are what I found when searching on the Internet for my project on implementing an LR(1) yacc:
- Practical LR(k) Parser Construction [3]. By David R. Tribble, 12/12/2004. This is a long and complete introduction, although many places are labeled as "+INCOMPLETE".
- The Honalee LR(k) Algorithm [4]. By David R. Tribble, 4/27/2006.
BTW, in "E → E + B", maybe B can be changed to T so it looks different from E. "Item" is called "Configuration" in some early papers in the 1970s.
-- oxc 10:04, 7 December 2006 (UTC)
I am not convinced that the example is entirely consistent with the given definition of closure.
In particular, in construction of the item sets the rule "S → E •" is added to Item set 3, and two others are added to complete the closure. However this rule has empty follow set, so closure as defined (as I understand) will not add a rule such as "E → E • * B" to the item set.
It is entirely possible that I have misunderstood some part of this -- if so perhaps this is an indicator that the explanation needs clearing up a little (or that I should pay attention in lectures :p ), but a comment either way by someone more knowledgeable than I would be appreciated. JoeKearney 19:52, 6 April 2007 (UTC)
If we're at the state of "S → E •" then we want (i.e. the closure includes) all rules to parse anything in the follow-set of E.
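For what it's worth, the closure operation as usually defined can be sketched in a few lines of Python (illustrative code using the article's E/B grammar; the names are mine). Plain LR(0) closure never consults follow sets; and a completed item like "S → E •" closes to just itself, while the companion items "E → E • * B" etc. in item set 3 arrive via the goto kernel, not via closure:

```python
# Illustrative sketch (names mine) of LR(0) closure for the article's grammar.
GRAMMAR = {
    "S": [("E",)],                       # augmented start rule S -> E
    "E": [("E", "*", "B"), ("E", "+", "B"), ("B",)],
    "B": [("0",), ("1",)],
}

def closure(items):
    """items: set of (lhs, rhs, dot). For every nonterminal X directly
    after a dot, add every item X -> . gamma. Plain LR(0) closure never
    looks at follow sets."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result

# Item set 0 is the closure of the start item S -> . E: six items in all.
print(len(closure({("S", ("E",), 0)})))       # 6
# A completed item S -> E . closes to just itself:
print(closure({("S", ("E",), 1)}))
```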
The article states that "The parser is a finite state machine". This statement is false, as the parser has a stack, which means it has a potentially infinite number of states. Perhaps the parser is a pushdown automaton, but I am not sure enough to change it. Someone who does know for sure, please change this. 62.131.189.89 07:55, 22 May 2007 (UTC)
Prefaced the description of the stack machine with a remark about the alternative, functional, recursive ascent, view on LR parsing, and added a corresponding reference 81.5.0.80 18:45, 11 July 2007 (UTC).
I made a few corrections. A parser does not deal with a grammar. It reads computer languages. It's the parser generator that "deals" with a grammar.
About the "dangling else" problem ... A BNF grammar that contains the dangling else problem is ambiguous and NO parser can handle it, unless the ambiguity is resolved !!! An LR parser can and does handle the dangling else problem just fine, because the ambiguity (or conflict) gets resolved at parser generation time by choosing to shift the "else" onto the parse stack instead of making a reduction before accepting the "else". So, it is a misconception to say that an LR parser cannot handle the dangling else problem.
About C++ and typedef ... A simple or pure LR parser cannot handle the context-sensitive problems with C++ and the typedef in C. However, a smart LR parser which uses the symbol table to classify <identifier>s as "typedef" or just <identifier> can successfully parse C++ and C typedef situations. So, again, one should not generalize that LR parsers cannot handle C++, because in practice it is possible. Theoretical computer scientists might disagree, but then changing the definition of LR parser can decide the argument if you want to get mathematical. I just don't like to put limitations on LR parsing; besides, what other techniques are you going to use -- recursive descent (what a nightmare)?
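As an aside, the symbol-table feedback described above (often called the "lexer hack") can be sketched like this; the code is purely illustrative and every name in it is hypothetical:

```python
# Illustrative sketch of symbol-table feedback ("the lexer hack"):
# the lexer consults a table of known typedef names to decide which
# token kind an identifier should become. All names are hypothetical.

typedef_names = set()   # the parser adds to this when it reduces a typedef

def classify(identifier):
    """Token kind the lexer hands to the parser for this identifier."""
    return "TYPE_NAME" if identifier in typedef_names else "IDENTIFIER"

# Before any typedef is seen, 'size_t' is an ordinary identifier:
print(classify("size_t"))            # IDENTIFIER
# After the parser reduces "typedef unsigned long size_t;" it records:
typedef_names.add("size_t")
print(classify("size_t"))            # TYPE_NAME
```

The grammar stays context-free; the context sensitivity is pushed into the token stream.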
Paul B Mann —Preceding undated comment added 00:36, 10 September 2009 (UTC).
This article is sorely lacking a History section. Does anyone know enough to add one? --Doradus (talk) 16:46, 3 February 2010 (UTC)
I believe Knuth invented LR parsers. At least that information should be in the article, along with a reference to the original publication. 89.153.132.2 (talk) 23:22, 8 May 2011 (UTC)
Article says "Every LR(k) grammar for k > 1 can be mechanically transformed into an LR(1) grammar", then later it says "most programming languages can be expressed with LR(k) grammars, where k is a small constant (usually 1)." Surely, given the first quote, 'usually 1' makes no sense? —Preceding unsigned comment added by 80.0.167.213 (talk) 17:27, 3 August 2010 (UTC)
“LR parsing can be generalized as arbitrary context-free language parsing without a performance penalty, even for LR(k) grammars.” What? This sentence does not make sense, I thought LR(k) ⊂ CFL, and in the following sentences a performance penalty is mentioned. --Chricho (talk) 12:07, 11 September 2010 (UTC)
The example parse for '1 + 1' is too trivial to give any insight into bottom-up versus top-down parses, or the handling of delimiters and non-infix operators, or necessary recursion, or the meaning of individual LR(0) states. DBSand (talk) 06:57, 3 May 2012 (UTC)
Second thoughts. The example '1 + 1' is rich enough to show bottom-up versus top-down parsers, if the parse tree is graphically shown in a diagram (not yet present). It is rich enough to show the meaning of LR(0) states. (Showing the core items in the state chart would help.) And it does show one important use of grammar recursion. Showing use of non-operator delimiters or prefix unary operators would be nice, in a separate article about grammars, but would needlessly complicate this short article about LR.
The '1 + 1' example and grammar has three disadvantages for this article:
1) Its grammar is only LR(0) so the parse table has no examples where lookahead is needed to pick between shift action and reduce action. That's very important for explaining SLR(1) and LALR(1) parsing as used in practice; there are very few practical grammars that are merely LR(0). The existing article handled the shift&reduce topic by later working with a separate 2-rule grammar. But the actual table for that second example is not shown. In the alternate combined grammar examples I've considered, the cases needing lookahead had two separate precedence levels for products and sums. Is there some simple 1-level expression grammar that does need lookahead?
2) The example '1 + 1' has no multiple-symbol reductions other than the final one at the end of the entire input. The article should give an example where the parse stack is clearly acting like a stack. Also, with reductions reaching back to a state that was not already on the top of the stack. This now occurs at the end, but that can give the impression that this happens only then. It is somewhat helpful if there is some multiple-symbol reduction before reaching the end of the entire input. That could be done with example "1 + 1 + 1" or "1*1 + 1" or "-1 + 1".
3) Having both * and + at the same precedence level is counter to actual common practice. This is minor, and could be fixed by using - instead of *.
DBSand (talk) 15:50, 13 May 2012 (UTC)
The example Action/Goto table should show the meaning of each state (ie its unclosed item set), not just the state number.
The example parsing steps chart should have the stack (symbols & state#s) as left column, the decoded top state (ie its unclosed item set) as second column, the lookahead and unscanned input as third column, and parser action as fourth column. The action column should show the actual grammar rule being applied, not just a rule number. The 'output stream' column should be discarded; it is just the column of applied rules.
DBSand (talk) 06:57, 3 May 2012 (UTC)
This Prolog implementation belongs in some separate specialty article for Prolog fans, not here. The backtracking pattern-matching mechanism underlying Prolog is much more complicated, slower, and opaque than LR's own deterministic, linear pattern matching methods. Using Prolog to explain LR does not explain why LR is fast, or why LR grammars must be deterministic. Other editors will not know how to correctly update the Prolog code if the article's main example changes. And finally, Prolog is unfamiliar to most readers. I doubt very much that 99% of readers of this article will get any understanding from a Prolog description of the LR tables and parser algorithm.
An imperative description of the parser's inner loop, in some widely understood language, with helpful names, could be helpful. DBSand (talk) 06:57, 3 May 2012 (UTC)
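Along the lines of the request above, here is one possible sketch of that inner loop in Python, using the small grammar S → E, E → E * B | E + B | B, B → 0 | 1. The ACTION/GOTO entries below are an LR(0) table I worked out for that grammar; the state numbering is mine and need not match any table in the article:

```python
# Hedged sketch of a table-driven LR parser loop (state numbering is mine).
RULES = [           # (lhs, rhs length) for each numbered rule
    ("S", 1),       # 0: S -> E
    ("E", 3),       # 1: E -> E * B
    ("E", 3),       # 2: E -> E + B
    ("E", 1),       # 3: E -> B
    ("B", 1),       # 4: B -> 0
    ("B", 1),       # 5: B -> 1
]

ACTION = {          # entries: ('s', next_state), ('r', rule), or 'acc'
    (0, "0"): ("s", 1), (0, "1"): ("s", 2),
    (1, "*"): ("r", 4), (1, "+"): ("r", 4), (1, "$"): ("r", 4),
    (2, "*"): ("r", 5), (2, "+"): ("r", 5), (2, "$"): ("r", 5),
    (3, "*"): ("s", 5), (3, "+"): ("s", 6), (3, "$"): "acc",
    (4, "*"): ("r", 3), (4, "+"): ("r", 3), (4, "$"): ("r", 3),
    (5, "0"): ("s", 1), (5, "1"): ("s", 2),
    (6, "0"): ("s", 1), (6, "1"): ("s", 2),
    (7, "*"): ("r", 1), (7, "+"): ("r", 1), (7, "$"): ("r", 1),
    (8, "*"): ("r", 2), (8, "+"): ("r", 2), (8, "$"): ("r", 2),
}
GOTO = {(0, "E"): 3, (0, "B"): 4, (5, "B"): 7, (6, "B"): 8}

def lr_parse(tokens):
    """Return the list of rule numbers applied (a rightmost derivation
    in reverse), or raise SyntaxError."""
    stack = [0]                          # stack of state numbers
    tokens = list(tokens) + ["$"]
    pos, output = 0, []
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r}")
        if act == "acc":
            return output
        kind, arg = act
        if kind == "s":                  # shift: push next state, consume
            stack.append(arg)
            pos += 1
        else:                            # reduce by rule number `arg`
            lhs, rhs_len = RULES[arg]
            del stack[len(stack) - rhs_len:]
            stack.append(GOTO[(stack[-1], lhs)])
            output.append(arg)

print(lr_parse("1+1"))   # [5, 3, 5, 2]
```

The loop itself is the same for SLR(1) and LALR(1); only the table contents change.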
The goal of this major revision was to make LR parsers better understood by programmers who are users of LR parser generators, and give an intuitive understanding of the generator's work and parser's work without going into the mathematical details and formalisms of a parsing theory textbook. Where possible, the parser's actions are explained in terms of the basic shift-reduce framework, or by showing a concrete realistic (but small) example. The progression from concrete actions, to tables, to state generation, to theory, will likely feel backwards to some. I hope this ordering is helpful to beginners. The example input and example grammar are bigger than before. None of the existing commentary based on the old examples was retained as is. The Prolog implementation was similarly dropped.
I welcome feedback from previous editors. I wonder if the old and new articles should both remain visible for a while?
DBSand (talk) 06:00, 9 May 2012 (UTC)
I reverted this major change and restored the main LR parser article back to its original form. The proposed replacement article is instead at a temporarily-separate article User:DBSand/LR parser/Rewrite. This separate location allows the original article to continue evolving incrementally for a while without trying to share one Wikipedia change history before a final merge. Please review the reworked article for correctness, omissions, etc and suggest ways to merge the two articles together. Thanks! DBSand (talk) 19:16, 13 May 2012 (UTC)
My plan to support separate edits of the original and replacement articles has hit a snag. Some Wikipedia admins unconditionally toss all new articles that look like forks, into userspace, and suggest that further work occur there in some individual's userspace until the article is stable and no-one resists it replacing the original. But regular links from mainspace articles to personal space are immediately deleted. Other admins strongly recommend that any multi-author articles get edited only in Wikipedia main space, so that Wikipedia has a complete legal record of who had personal copyrights when the article text entered main space and was revised. So collaborative editing in user space is impractical. DBSand (talk) 01:06, 14 May 2012 (UTC)
The new plan is to put the new article online and visible, with the example contents of the original article retained as another section in the combined article. That will be a starting point for subsequent edits, removing overlaps, using a common style, etc. DBSand (talk) 17:36, 14 May 2012 (UTC)
Please don't change the article. It is well written and understandable to non-experts also. Having two examples is important to reinforce the understanding of the subject, as one example is not enough. The introduction to the article was also excellent. Do not worry about the length; it is very well manageable. Vettukal (talk) 13:51, 8 February 2014 (UTC)
Within minutes of my creating a separate article LR parser/Rewrite in the main Wiki namespace, two Wikipedia admins moved it into a personal namespace User:DBSand. And no links from articles in the main namespace to personal namespaces will persist longer than a day or two. Wikipedia has defense mechanisms against what it perceives as permanent forking of articles with similar topic coverage. I don't know how to proceed. I don't want to lock up the original article and prevent it from getting incremental changes as part of a slow merge. DBSand (talk) 19:49, 13 May 2012 (UTC)
The article lacks inline citations and is not structured in accordance with the WP:MOS. These would appear to be major problems. Jezhotwells (talk) 19:00, 13 May 2012 (UTC)
A combined version of the original article and a major rewrite is now online at the LR parser article's main public page. The merging process is incomplete. The intro, theory, and reference sections are similar to before. All other content from the original article is now section "Additional Examples"; this section needs further adjustments.
The new section LR Tutorial is aimed at beginner readers and at programmers who are users of LALR generators but are not concerned with the details of LR construction. The section LR Generator Analysis is aimed at users who need understanding of grammar conflict complaints from their generator. Both sections are built around a single example. The example and grammar are bigger than those of the original article, so that 1) the table has LR(0) conflicts as in most SLR(1) and LALR(1) grammars, and 2) some multi-symbol parse stack reductions occur before the end of input. DBSand (talk) 20:30, 14 May 2012 (UTC)
In the section "Architecture of LR Parsers", now removed,
In the section "General Case", now removed,
The diagram "Architecture of a table-based bottom-up parser" could be improved by showing the symbols & parse subtrees on the stack, not just the state numbers of the 1+1 example; and by showing the stack growing rightwards, not upwards; and by omitting the output line.
The metacode description of the LR parser loop should move upwards into the main tutorial section.
DBSand (talk) 22:50, 20 May 2012 (UTC)
The original and new sections use different styles for notation and terminology.
These notational & terminology differences should eventually be eliminated.
DBSand (talk) 18:20, 22 May 2012 (UTC)
The use of "you" in the article is a violation of WP:TONE. We should remove all these "you"s. I propose we also remove the paragraph starting with "When using an LR parser within some larger program, ...". — Kennyluck (talk) 10:17, 5 January 2013 (UTC)
What are the pink dots denoting?! — Preceding unsigned comment added by 89.134.223.11 (talk) 06:53, 1 July 2013 (UTC)
The item E → E • + B, for example, indicates that the parser has recognized a string corresponding with E on the input stream and now expects to read a '+' followed by another string corresponding with B.
Is there any specific reason for using pink (or any other color)? I was having a hard time discerning whether it was a speck on my LED display, while reading this page in non-optimal lighting conditions. I imagine it would be a "bad" color from a usability POV. I will watch here, and unless I see any arguments I will probably change it to at least a darker color. --Lasse Hillerøe Petersen (talk) 22:18, 5 November 2014 (UTC)
The third section of the article in its current form references an example grammar that is explicitly labeled as LR(0), but whose language isn't prefix-free. You can easily create "1 + 1" (E → E + B → B + B → 1 + 1), which is, of course, a prefix of "1 + 1 + 1" (E → E + B → E + B + B → B + B + B → 1 + 1 + 1). Yet that should not be possible, since LR(0) grammars are supposed to be prefix-free. [1]
Or am I missing something here?
03:29, 19 January 2014 (UTC)
Yeah, it's "1+1$" and "1+1+1$". The termination symbol is what makes it prefix-free. Basically, this is a confusion arising from the fact that it is possible to convert an arbitrary formal language on N letters to a prefix-free formal language on N+1 letters by adding the termination symbol. In this article, this is called "augmenting the grammar" which is why there is an extra production whose only job is to accept the start symbol, which in this case is E. So in your productions, the initial step is missing, so you would write "1+1$" (__start__ → E + "$" → E + B + "$" → ...). Hope this helps. 173.239.78.54 (talk) 21:07, 16 February 2014 (UTC)
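The point about the end marker can be checked mechanically; a quick illustrative sketch (the helper name is made up):

```python
# Hypothetical helper: check a finite set of strings for prefix-freeness.
def is_prefix_free(language):
    """True iff no string in `language` is a proper prefix of another."""
    return not any(
        a != b and b.startswith(a) for a in language for b in language
    )

plain = {"1+1", "1+1+1"}
print(is_prefix_free(plain))                     # False
print(is_prefix_free({s + "$" for s in plain}))  # True: end marker fixes it
```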
References
Currently the introduction ends with the sentences "LR is also better at error reporting. It detects syntax errors as early in the input stream as possible.", comparing LR to LL parsers. However, LL(1) parsers have the immediate error detection property. Strong LL(1) parsers only have the valid prefix property by default, but a small change to the algorithm can restore the IED property. The literature strongly implies that LL(k) has the IED property for all k, though I haven't found anywhere that states that outright. I think the statement should be removed, especially in light of the fact that the LL parser page only shows a pure LL(1) parser, which is likely to result in anyone reading both pages to think that the example parser has worse error detection properties than an LR parser, which is not correct. Klkblake (talk) 11:05, 24 February 2016 (UTC)
According to the linked articles, CYK algorithm#lead, Earley parser#lead, and GLR parser#Advantages, they all have O(n³) worst-case complexity. So they can't need exponential time. Conversely, an algorithm that does need exponential time due to bad guesses doesn't deserve any of the three algorithm names. - Jochen Burghardt (talk) 13:51, 1 April 2016 (UTC)
Some registered user might want to further explain - or define somewhere, that whenever the article talks about 'symbols', tokens and not necessarily single chars are meant. This might confuse some beginners. --2A02:8070:9288:6B00:6515:51E4:6551:5F5A (talk) 22:14, 27 February 2017 (UTC)
Hello fellow Wikipedians,
I have just modified one external link on LR parser. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.
This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}}
(last update: 5 June 2024).
Cheers.—InternetArchiveBot (Report bug) 15:12, 14 December 2017 (UTC)
In the article it is claimed that "LR is also better at error reporting. It detects syntax errors as early in the input stream as possible.", a claim that is disputed in this Stack Overflow answer. I'm not an expert on the topic, but I'm more inclined to believe a well-voted-up SO answer than an unsourced claim on wikipedia. Should these two sentences be removed?
The article immediately begins talking about `Product` and `Sum` productions without ever having defined the grammar whose sentences we are parsing. This makes it extremely confusing to follow the diagrams and explanation of the algorithm. It would be useful to precisely and clearly state the grammar that is being parsed at the outset. — Preceding unsigned comment added by 173.177.218.122 (talk) 13:58, 11 March 2022 (UTC)