{"id":45,"date":"2006-09-08T22:46:16","date_gmt":"2006-09-08T20:46:16","guid":{"rendered":"http:\/\/ken.friislarsen.net\/blog\/2006\/09\/08\/recursive-decent-parsers-in-c\/"},"modified":"2010-07-16T09:43:24","modified_gmt":"2010-07-16T07:43:24","slug":"recursive-descent-parsers-in-c","status":"publish","type":"post","link":"http:\/\/ken.friislarsen.net\/blog\/2006\/09\/08\/recursive-descent-parsers-in-c\/","title":{"rendered":"Recursive Descent Parsers in C#"},"content":{"rendered":"<p><a href=\"http:\/\/www.dina.kvl.dk\/~sestoft\">Peter Sestoft<\/a> and I have written a <a href=\"http:\/\/www.itu.dk\/people\/kfl\/parsernotes.pdf\">note about how to write scanners and parsers in C#<\/a>. The note is based on earlier versions for SML and Java.<\/p>\n<p>The note contains an thorough introduction to grammars on Backus&#8211;Naur form (BNF).  This includes a description of properties your grammar should have so that it can be mechanically translated to a program.  And also some prescriptions about how to transform your grammar so that it has the desired properties.  In technical terms, the note describe how you can check that your grammar is an LL(1) grammar, and if your grammar is not an LL(1) grammar, we give your some tricks that will usually transform the grammar into an LL(1) grammar.<\/p>\n<p>The parsers you write using our method are recursive descent parsers.  For the scanners, however, we just use an add-hoc method.  Both parsers and scanners makes good use of the .NET framework.  For instance, the scanners creates a token stream from a <code>TextReader<\/code>.  Hence, the scanners can be used to scan both files and strings.  Likewise, a token stream is represented as <code>IEnumerable&lt;Token&gt;<\/code> and scanners uses <code>yield<\/code> to create this token stream.  Thus, creating the token stream lazily.<\/p>\n<p>To give an example, the simplest scanner presented in the note is the following scanner:<\/p>\n<pre>\r\nusing TokenStream = System.Collections.Generic.IEnumerator&lt;Token&gt;;\r\n\r\nclass ZeroOneScan : IScanner {\r\n  public TokenStream Scan(TextReader reader) {\r\n    while ( reader.Peek() != -1 ) {\r\n      if ( Char.IsWhiteSpace((char) reader.Peek()) )\r\n        reader.Read();\r\n      else\r\n        switch(reader.Read()) {\r\n        case '-': yield return Token.FromKind(Kind.SUB); break;\r\n        case '0': yield return Token.FromKind(Kind.ZERO); break;\r\n        case '1': yield return Token.FromKind(Kind.ONE); break;\r\n        default: throw new ApplicationException(\"Illegal character\");\r\n        }\r\n    }\r\n    yield return Token.FromKind(Kind.EOF);\r\n  }\r\n}\r\n<\/pre>\n<p>I find this use of <code>yield<\/code> quite elegant.<\/p>\n<p>Working with this note and getting my name on it has special meaning to me.  A precursor note for SML was actually the note Peter used when he taught me for the first time many years ago (on my second semester at university). Over the years, I have returned to the note many times when I have needed to parse a small language that did not warrant the use of a parser generator, or when a generated parser would have been inconvenient to use because the text to be scanned did not come from a file stream (modern parser generators will not generate parsers with this problem).<\/p>\n<p>The note ends a bit to early, in my opinion.  I would like extend the note to cover Extended BNF.  And I would also like to cover parser combinators.  Well, one day when time permits&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Peter Sestoft and I have written a note about how to write scanners and parsers in C#. The note is based on earlier versions for SML and Java. The note contains an thorough introduction to grammars on Backus&#8211;Naur form (BNF). This includes a description of properties your grammar should have so that it can be <a class=\"read-more\" href=\"http:\/\/ken.friislarsen.net\/blog\/2006\/09\/08\/recursive-descent-parsers-in-c\/\">[&hellip;]<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,4,3],"tags":[],"class_list":["post-45","post","type-post","status-publish","format-standard","hentry","category-csharp","category-coding","category-general"],"_links":{"self":[{"href":"http:\/\/ken.friislarsen.net\/blog\/wp-json\/wp\/v2\/posts\/45","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/ken.friislarsen.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/ken.friislarsen.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/ken.friislarsen.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/ken.friislarsen.net\/blog\/wp-json\/wp\/v2\/comments?post=45"}],"version-history":[{"count":2,"href":"http:\/\/ken.friislarsen.net\/blog\/wp-json\/wp\/v2\/posts\/45\/revisions"}],"predecessor-version":[{"id":90,"href":"http:\/\/ken.friislarsen.net\/blog\/wp-json\/wp\/v2\/posts\/45\/revisions\/90"}],"wp:attachment":[{"href":"http:\/\/ken.friislarsen.net\/blog\/wp-json\/wp\/v2\/media?parent=45"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/ken.friislarsen.net\/blog\/wp-json\/wp\/v2\/categories?post=45"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/ken.friislarsen.net\/blog\/wp-json\/wp\/v2\/tags?post=45"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}