SILex & lexing

posted in Programmology
Published January 14, 2009
Advertisement
I've been playing around with SILex recently. It's a very sweet lexical analyzer, essentially a port of Lex to Scheme. Now, I'm no experienced lexer (never even used Lex), so I'm not going to compare Lex and SILex. But I do want to write about something cool SILex let me do.

I need to write an FFI (foreign function interface) for each C library I want to use from Scheme. Possibly the most important one is OpenGL. To give you a taste of the FFI I'm working with in Gambit-C Scheme, here's an example:

(define glRotatef    (c-lambda (GLfloat GLfloat GLfloat GLfloat) void "glRotatef"))


This lets me call `glRotatef' from Scheme. Now, if you think writing this for every single C function would be monotonous, repetitive, and error-prone... you're right! There's no reason why we can't automatically generate this. Well, there are some difficulties with FFI's in general, but we can get close.

The most straight-forward way is to parse the OpenGL header, gl.h, and generate something of worth from it. Technically, we will first "lexically analyze" the text in the file, and then "parse" the tokens into something we can use.

Lexical Analysis



Lets use SILex for some lexing. There's a postscript manual for SILex in the download if you want to know more about it. From what I can tell, however, it's remarkably like Lex in the way it works, so you'll recognize this if you're familiar with Lex. Lex's manual describes the basic process.

You give SILex a specification file which contains a mapping of regular expressions to tokens. It then uses this to generate a "lexer" program, or something that takes a program as input and produces a set of tokens.

In order to focus more on lexing/parsing, lets simply try to parse all of the preprocessor constants defined in the OpenGL header gl.h. You should be able to parse all of the function definitions too, but that involves a few tricky nuances that would just be distracting.

I knew that Chicken Scheme already has a general FFI generator, and I noticed that it's using SILex as well. So I grabbed it's SILex specification file which tokenizes C and parts of C++/Objective-C and started there. Here it is:

; easyffi.l -*- Scheme -*-;; Copyright (c) 2000-2004, Felix L. Winkelmann; All rights reserved.;; Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following; conditions are met:;;   Redistributions of source code must retain the above copyright notice, this list of conditions and the following;     disclaimer. ;   Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following;     disclaimer in the documentation and/or other materials provided with the distribution. ;   Neither the name of the author nor the names of its contributors may be used to endorse or promote;     products derived from this software without specific prior written permission. ;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS; OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY; AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR; CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR; CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR; SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY; THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR; OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE; POSSIBILITY OF SUCH DAMAGE.;; Send bugs, suggestions and ideas to: ;; felix@call-with-current-continuation.org;; Felix L. Winkelmann; Unter den Gleichen 1; 37130 Gleichen; Germanyletter    [a-zA-Z]digit     [0-9]digit16   [0-9a-fA-F]digit8    [0-7]space     [ ]%%\\[ ]                              (yycontinue)\\?\13?\n                          (if pp-mode                                       (begin                                          (set! pp-mode #f) 'pp-end)                                       (yycontinue) ){space}+                           (yycontinue)\9+                                (yycontinue)\13+                               (yycontinue)\12+                               (yycontinue)"//"                               (let loop ()                                     (let ([c (yygetc)])                                       (if (or (eq? 'eof c) (char=? #\newline c))                                           (begin                                             (if pp-mode                                                 (begin                                                   (set! pp-mode #f)                                                   'pp-end)                                                 (yycontinue) ) )                                           (loop) ) ) )"/*"                               (let loop ([c (yygetc)])                                     (cond [(eq? 'eof c) (parsing-error "unexpected end of comment")]                                           [(char=? #\newline c) (loop (yygetc))]                                           [(char=? c #\*)                                            (let ([c2 (yygetc)])                                              (if (eq? #\/ c2)                                                  (yycontinue)                                                  (loop c2) ) ) ]                                           [else (loop (yygetc))] ) )"enum"                             'enum"typedef"                          'typedef"extern"                           'extern"static"                           'static"___fixnum"                         'fixnum"___number"                         'number"___symbol"                         'symbol"___bool"                           'bool"___pointer"                       'pointer"___u32"                           'u32"___s32"                           's32"___s64"                           's64"int64_t"                          's64"__int64"                          's64"bool"                             'bool"___safe"                           'callback"___declare"                        'declare"___scheme_value"                   'scheme-value"___scheme_pointer"                 'scheme-pointer"___byte_vector"                    'byte-vector"C_word"                            'scheme-value"___abstract"                       'abstract"___specialize"                     'specialize"___byte"                           'byte"___discard"                        'discard"___in"                            'in"___out"                           'out"___inout"                         'inout"___mutable"                       'mutable"___length"                        'length"size_t"                           'size_t"int"                              'int"unsigned"                         'unsigned"signed"                           'signed"float"                            'float"double"                           'double"short"                            'short"long"                             'long"char"                             'char"void"                             'void"struct"                           'struct"union"                            'union"const"                            'const"class"                            'class"public"                           'public"protected"                        'protected"private"                          'private"volatile"                         'volatile"namespace"                        'namespace"virtual"                          'virtual"explicit"                         'explicit"inline"                           'inline"using"                            'using"@interface"                       'interface"@implementation"                  'implementation"@end"                             'end"@class"                           'objc-class"@protocol"                        'protocol"@public"                          'objc-public"@protected"                       'objc-protected"@private"                         'objc-private"@encode"                          (list 'id "@encode")"@defs"                            (list 'id "@defs")"@selector"                        (list 'id "@selector")"..."                              'dots^[ \t]*#[ ]*define                        (begin (set! pp-mode 'define) 'pp-define)^[ \t]*#[ ]*include                       (begin (set! pp-mode 'include) 'pp-include)^[ \t]*#[ ]*import                        (begin (set! pp-mode 'import) 'pp-import)^[ \t]*#[ ]*ifdef                         (begin (set! pp-mode #t) 'pp-ifdef)^[ \t]*#[ ]*ifndef                        (begin (set! pp-mode #t) 'pp-ifndef)^[ \t]*#[ ]*elif                          (begin (set! pp-mode #t) 'pp-elif)^[ \t]*#[ ]*if                            (begin (set! pp-mode #t) 'pp-if)^[ \t]*#[ ]*else                          (begin (set! pp-mode #t) 'pp-else)^[ \t]*#[ ]*pragma                        (begin (set! pp-mode #t) 'pp-pragma)^[ \t]*#[ ]*endif                         (begin (set! pp-mode #t) 'pp-endif)^[ \t]*#[ ]*error                         (begin (set! pp-mode #t) 'pp-error)^[ \t]*#[ ]*undef                         (begin (set! pp-mode #t) 'pp-undef)#                                  '(op "#")"if"                               'if"else"                             'else@?\"                               (let loop ([cs '()])                                     (let ([c (yygetc)])                                       (cond [(eq? 'eof c)                                              (parsing-error "unexpected end of string constant")]                                             [(char=? c #\\) (loop (cons (yygetc) cs))]                                             [(char=? c #\")                                              (list 'string (list->string (reverse cs))) ]                                             [else (loop (cons c cs))] ) ) )\'\\{digit}{digit}{digit}\'        (list 'char (string->number (substring yytext 3 5) 8))\'\\0\'                            '(char #\nul)\'\\a\'                            '(char #\alarm)\'\\b\'                            '(char #\backspace)\'\\f\'                            '(char #\page)\'\\n\'                            '(char #\newline)\'\\r\'                            '(char #\return)\'\\t\'                            '(char #\tab)\'\\v\'                            '(char #\vtab)\'\\.\'                            (list 'char (string-ref yytext 2))\'.\'                              (list 'char (string-ref yytext 1))({letter}|_)({letter}|_|{digit})*  (list 'id yytext)0(x|X){digit16}+                   (list 'num (string->number (substring yytext 2 (string-length yytext)) 16))0{digit8}+                         (list 'num (string->number (substring yytext 1 (string-length yytext)) 8))[-+]?{digit}+(\.{digit}*)?([eE][-+]?{digit}+)?                                     (list 'num (string->number yytext))"<"                                (if (eq? pp-mode 'include)                                        (let loop ()<br>                                          (let ([c (yygetc)])<br>                                           (cond [(eq? 'eof c) (parsing-error <span class="cpp-literal">"unexpected end of include file name"</span>)]<br>                                                 [(<span class="cpp-keyword">char</span>=? #\> c)<br>                                                  (set! pp-mode #f)<br>                                                  `(i-string ,(list->string (reverse s))) ]<br>                                                 [<span class="cpp-keyword">else</span> (loop (cons c s))] ) ) ) <br>                                        `(op <span class="cpp-literal">"<"</span>) )<br><span class="cpp-literal">"("</span>                                'open-paren<br><span class="cpp-literal">")"</span>                                'close-paren<br><span class="cpp-literal">"["</span>                                'open-bracket<br><span class="cpp-literal">"]"</span>                                'close-bracket<br><span class="cpp-literal">"{"</span>                                'open-curly<br><span class="cpp-literal">"}"</span>                                'close-curly<br><span class="cpp-literal">","</span>                                'comma<br><span class="cpp-literal">";"</span>                                'semicolon<br><span class="cpp-literal">"*"</span>                                'star<br><span class="cpp-literal">"."</span>|<span class="cpp-literal">"+="</span>|<span class="cpp-literal">"-="</span>|<span class="cpp-literal">">>="</span>|<span class="cpp-literal">"<<="</span>|<span class="cpp-literal">"*="</span>|<span class="cpp-literal">"/="</span>|<span class="cpp-literal">"%="</span>|<span class="cpp-literal">"%"</span>|<span class="cpp-literal">"&="</span>|<span class="cpp-literal">"|="</span>|<span class="cpp-literal">"^="</span>|<span class="cpp-literal">"+"</span>|<span class="cpp-literal">"-"</span>|<span class="cpp-literal">"/"</span>|<span class="cpp-literal">">="</span>|<span class="cpp-literal">"<="</span>|<span class="cpp-literal">"=="</span>|<span class="cpp-literal">"<<"</span>|<span class="cpp-literal">">>"</span>|<span class="cpp-literal">"&&"</span>|<span class="cpp-literal">"||"</span>|<span class="cpp-literal">"&"</span>|<span class="cpp-literal">"|"</span>|<span class="cpp-literal">">"</span>|<span class="cpp-literal">"<"</span>|<span class="cpp-literal">"^"</span>|<span class="cpp-literal">"~"</span>|<span class="cpp-literal">"?"</span>|<span class="cpp-literal">"::"</span>|<span class="cpp-literal">":"</span>|<span class="cpp-literal">"="</span>|<span class="cpp-literal">"!="</span>|<span class="cpp-literal">"!"</span><br>                                     (list 'op yytext)<br><<EOF>>                            (begin (set! pp-mode #f) 'stop)<br><<ERROR>>                          (lexer-error (yygetc))<br><br><br><br><br></pre></div><!–ENDSCRIPT–><br><br>The chunks of code may be a bit confusing.  The second statement of each entry is a piece of Scheme code which, when evaluated at a specific point in the text, will produce the appropriate token.<br><br>So now we're ready to lex!  First, we need to generate our lexing program.<br><br><pre><br>Gambit v4.3.2<br><br>> (include "silex.scm")<br>> (lex "autoffi.l" "autoffi.scm")<br>#t<br></pre><br><br>This took in the specification file "autoffi.l" and generated "autoffi.scm".  Now we can use this program for lexing C code, such as gl.h!<br><br>Here's a simple program using our lexer:<br><br><!–STARTSCRIPT–><!–source lang="cpp"–><div class="source"><pre><br>(include <span class="cpp-literal">"autoffi.scm"</span>)<br><br>;; Token generators depend on these definitions<br>(define pp-mode #t)<br>(define (lexer-error c)<br>  (display <span class="cpp-literal">"*** ERROR *** invalid token: "</span>)<br>  (write c)<br>  (newline)<br>  (exit <span class="cpp-number">1</span>))<br><br>(define (lex-gl output-port)<br>  (lexer-init 'port (open-file <span class="cpp-literal">"gl.h"</span>))<br>  (let loop ()<br>    (let ((tok (lexer)))<br>      (write tok output-port)<br>      (newline)<br>      (<span class="cpp-keyword">if</span> (not (eq? tok 'stop))<br>          (loop)))))<br><br>(lex-gl (current-output-port))<br><br><br><br><br><br></pre></div><!–ENDSCRIPT–><br><br>It basically opens the file "gl.h" and passes it off to the C lexer.  Then we read all of the tokens and write them to standard output.  The output is this:<br><br><!–STARTSCRIPT–><!–source lang="cpp"–><div class="source"><pre><br>pp-ifndef<br>(id <span class="cpp-literal">"__gl_h_"</span>)<br>pp-define<br>(id <span class="cpp-literal">"__gl_h_"</span>)<br>pp-ifdef<br>(id <span class="cpp-literal">"__cplusplus"</span>)<br><span class="cpp-keyword">extern</span><br>(string <span class="cpp-literal">"C"</span>)<br>open-curly<br>pp-endif<br>pp-end<br><span class="cpp-keyword">typedef</span><br><span class="cpp-keyword">unsigned</span><br><span class="cpp-keyword">int</span><br>(id <span class="cpp-literal">"GLenum"</span>)<br>semicolon<br><span class="cpp-keyword">typedef</span><br><span class="cpp-keyword">unsigned</span><br><span class="cpp-keyword">char</span><br>(id <span class="cpp-literal">"GLboolean"</span>)<br>semicolon<br><span class="cpp-keyword">typedef</span><br><span class="cpp-keyword">unsigned</span><br><span class="cpp-keyword">int</span><br><br>… snip …<br><br>pp-define<br>(id <span class="cpp-literal">"GL_POLYGON_MODE"</span>)<br>(num <span class="cpp-number">2880</span>)<br>pp-define<br>(id <span class="cpp-literal">"GL_POLYGON_SMOOTH"</span>)<br>(num <span class="cpp-number">2881</span>)<br>pp-define<br>(id <span class="cpp-literal">"GL_POLYGON_STIPPLE"</span>)<br>(num <span class="cpp-number">2882</span>)<br>pp-define<br><br>… snip …<br><br>(id <span class="cpp-literal">"glGetTexParameteriv"</span>)<br>open-paren<br>(id <span class="cpp-literal">"GLenum"</span>)<br>(id <span class="cpp-literal">"target"</span>)<br>comma<br>(id <span class="cpp-literal">"GLenum"</span>)<br>(id <span class="cpp-literal">"pname"</span>)<br>comma<br>(id <span class="cpp-literal">"GLint"</span>)<br>star<br>(id <span class="cpp-literal">"params"</span>)<br>close-paren<br>semicolon<br><span class="cpp-keyword">extern</span><br><span class="cpp-keyword">void</span><br>(id <span class="cpp-literal">"glHint"</span>)<br>open-paren<br>(id <span class="cpp-literal">"GLenum"</span>)<br>(id <span class="cpp-literal">"target"</span>)<br>comma<br>(id <span class="cpp-literal">"GLenum"</span>)<br>(id <span class="cpp-literal">"mode"</span>)<br>close-paren<br>semicolon<br><br>… (thousands of more lines, of course)<br><br><br><br><br><br></pre></div><!–ENDSCRIPT–><br><br>That's great!  We have "lexed" the OpenGL header gl.h (I just facepalmed myself for using the word "lexed").<br><br><h2>Parsing</h2><br><br>Now we need to write a parser which will take the tokens and construct somewhat of an AST for us.  Since we're only interested in the preprocessor constants, it's pretty easy.  Here's what I came up with.  This program will pick out most preprocessor statements (and typedefs too, just for fun):<br><br><!–STARTSCRIPT–><!–source lang="cpp"–><div class="source"><pre><br><br>(define (parser-error err)<br>  (display err)<br>  (exit <span class="cpp-number">1</span>))<br><br>(define (parse input-port output-port)<br>  (define (writer node)<br>    (write node output-port)<br>    (newline))<br>  (let loop ((mode #f) (tokens '()))<br>    (let ((t (read input-port)))<br>      (<span class="cpp-keyword">case</span> t<br>        ((pp-end)<br>         (<span class="cpp-keyword">if</span> (pair? tokens)<br>             (writer (reverse tokens))<br>             (parser-error <span class="cpp-literal">"invalid preprocessor statement: pp-end"</span>))<br>         (loop #f '()))<br>        ((pp-define pp-include pp-<span class="cpp-keyword">if</span><br>                    pp-ifdef pp-ifndef<br>                    pp-<span class="cpp-keyword">else</span> pp-endif<br>                    pp-undef pp-import<br>                    pp-pragma pp-error)<br>         (loop 'pp (list t)))<br>        ((<span class="cpp-keyword">typedef</span>)<br>         (loop '<span class="cpp-keyword">typedef</span> (list t)))<br>        ((semicolon)<br>         (<span class="cpp-keyword">if</span> (not (null? tokens))<br>             (writer (reverse tokens)))<br>         (loop #f '()))<br>        ((stop)<br>         #t)<br>        (<span class="cpp-keyword">else</span><br>         (loop mode (<span class="cpp-keyword">if</span> mode<br>                        (cons t tokens)<br>                        tokens)))))))<br><br>(parse (current-input-port)<br>       (current-output-port))<br><br><br><br><br></pre></div><!–ENDSCRIPT–><br><br>Running this program, given the input of our lexing program, spits out:<br><br><!–STARTSCRIPT–><!–source lang="cpp"–><div class="source"><pre><br>(pp-ifndef (id <span class="cpp-literal">"__gl_h_"</span>))<br>(pp-define (id <span class="cpp-literal">"__gl_h_"</span>))<br>(pp-ifdef (id <span class="cpp-literal">"__cplusplus"</span>))<br>(pp-endif)<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">unsigned</span> <span class="cpp-keyword">int</span> (id <span class="cpp-literal">"GLenum"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">unsigned</span> <span class="cpp-keyword">char</span> (id <span class="cpp-literal">"GLboolean"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">unsigned</span> <span class="cpp-keyword">int</span> (id <span class="cpp-literal">"GLbitfield"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">signed</span> <span class="cpp-keyword">char</span> (id <span class="cpp-literal">"GLbyte"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">short</span> (id <span class="cpp-literal">"GLshort"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">int</span> (id <span class="cpp-literal">"GLint"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">int</span> (id <span class="cpp-literal">"GLsizei"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">unsigned</span> <span class="cpp-keyword">char</span> (id <span class="cpp-literal">"GLubyte"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">unsigned</span> <span class="cpp-keyword">short</span> (id <span class="cpp-literal">"GLushort"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">unsigned</span> <span class="cpp-keyword">int</span> (id <span class="cpp-literal">"GLuint"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">float</span> (id <span class="cpp-literal">"GLfloat"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">float</span> (id <span class="cpp-literal">"GLclampf"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">double</span> (id <span class="cpp-literal">"GLdouble"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">double</span> (id <span class="cpp-literal">"GLclampd"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">void</span> (id <span class="cpp-literal">"GLvoid"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">long</span> (id <span class="cpp-literal">"GLintptr"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">long</span> (id <span class="cpp-literal">"GLsizeiptr"</span>))<br>(pp-ifndef (id <span class="cpp-literal">"GL_TYPEDEFS_2_0"</span>))<br>(pp-define (id <span class="cpp-literal">"GL_TYPEDEFS_2_0"</span>))<br>(<span class="cpp-keyword">typedef</span> <span class="cpp-keyword">char</span> (id <span class="cpp-literal">"GLchar"</span>))<br>(pp-endif)<br>(pp-ifndef (id <span class="cpp-literal">"GL_GLEXT_LEGACY"</span>))<br>(pp-endif)<br>(pp-define (id <span class="cpp-literal">"GL_LOGIC_OP"</span>) (id <span class="cpp-literal">"GL_INDEX_LOGIC_OP"</span>))<br>(pp-define (id <span class="cpp-literal">"GL_TEXTURE_COMPONENTS"</span>) (id <span class="cpp-literal">"GL_TEXTURE_INTERNAL_FORMAT"</span>))<br>(pp-define (id <span class="cpp-literal">"GL_VERSION_1_1"</span>) (num <span class="cpp-number">1</span>))<br>(pp-define (id <span class="cpp-literal">"GL_VERSION_1_2"</span>) (num <span class="cpp-number">1</span>))<br>(pp-define (id <span class="cpp-literal">"GL_VERSION_1_3"</span>) (num <span class="cpp-number">1</span>))<br>(pp-define (id <span class="cpp-literal">"GL_VERSION_1_4"</span>) (num <span class="cpp-number">1</span>))<br>(pp-define (id <span class="cpp-literal">"GL_VERSION_1_5"</span>) (num <span class="cpp-number">1</span>))<br>(pp-define (id <span class="cpp-literal">"GL_VERSION_2_0"</span>) (num <span class="cpp-number">1</span>))<br>(pp-define (id <span class="cpp-literal">"GL_VERSION_2_1"</span>) (num <span class="cpp-number">1</span>))<br>(pp-define (id <span class="cpp-literal">"GL_ACCUM"</span>) (num <span class="cpp-number">256</span>))<br>(pp-define (id <span class="cpp-literal">"GL_LOAD"</span>) (num <span class="cpp-number">257</span>))<br>(pp-define (id <span class="cpp-literal">"GL_RETURN"</span>) (num <span class="cpp-number">258</span>))<br>(pp-define (id <span class="cpp-literal">"GL_MULT"</span>) (num <span class="cpp-number">259</span>))<br>(pp-define (id <span class="cpp-literal">"GL_ADD"</span>) (num <span class="cpp-number">260</span>))<br>(pp-define (id <span class="cpp-literal">"GL_NEVER"</span>) (num <span class="cpp-number">512</span>))<br>(pp-define (id <span class="cpp-literal">"GL_LESS"</span>) (num <span class="cpp-number">513</span>))<br>(pp-define (id <span class="cpp-literal">"GL_EQUAL"</span>) (num <span class="cpp-number">514</span>))<br>(pp-define (id <span class="cpp-literal">"GL_LEQUAL"</span>) (num <span class="cpp-number">515</span>))<br>(pp-define (id <span class="cpp-literal">"GL_GREATER"</span>) (num <span class="cpp-number">516</span>))<br>(pp-define (id <span class="cpp-literal">"GL_NOTEQUAL"</span>) (num <span class="cpp-number">517</span>))<br>(pp-define (id <span class="cpp-literal">"GL_GEQUAL"</span>) (num <span class="cpp-number">518</span>))<br><br>… snip …<br><br><br><br><br><br></pre></div><!–ENDSCRIPT–><br><br>Now we're getting somewhere.  This isn't quite an AST, but lets call it that.  Now we need to analyze the AST and generate our preprocessor constants interface.  We'll need a program that identifies tokens and their sub-parts, and is able to pull data out of each token.  Here's this program:<br><br><!–STARTSCRIPT–><!–source lang="cpp"–><div class="source"><pre><br><br>(define (parser-error . args)<br>  (<span class="cpp-keyword">for</span>-each<br>   (lambda (x)<br>     (<span class="cpp-keyword">if</span> (not (string? x))<br>         (write x)<br>         (display x)))<br>   args)<br>  (newline)<br>  #f)<br><br>(define (num-token? token)<br>  (and (pair? token)<br>       (eq? (car token) 'num)<br>       (eq? (length token) <span class="cpp-number">2</span>)))<br><br>(define (make-num token)<br>  (cadr token))<br><br>(define (id-token? token)<br>  (and (pair? token)<br>       (eq? (car token) 'id)<br>       (eq? (length token) <span class="cpp-number">2</span>)))<br> <br>(define (make-id token)<br>  (string->symbol (cadr token)))<br><br>(define (constant-token? token)<br>  (and (pair? token)<br>       (eq? (car token) 'pp-define)<br>       (eq? (length token) <span class="cpp-number">3</span>)))<br><br>(define (make-constant-expr token)<br>  (let ((id-token (cadr token))<br>        (val-token (caddr token)))<br>    (<span class="cpp-keyword">if</span> (not (id-token? id-token))<br>        (parser-error <span class="cpp-literal">"invalid id: "</span> id-token)<br>        (let ((id (make-id id-token))<br>              (val (cond<br>                    ((id-token? val-token) (make-id val-token))<br>                    ((num-token? val-token) (make-num val-token))<br>                    (<span class="cpp-keyword">else</span> (parser-error<br>                           <span class="cpp-literal">"invalid constant value: "</span><br>                           val-token)))))<br>          (and val<br>               `(define ,id ,(<span class="cpp-keyword">if</span> (symbol? val)<br>                                 `(lambda () ,val)<br>                                 val)))))))<br><br>(define (parse input-port output-port)<br>  (let loop ()<br>    (let ((token (read input-port)))<br>      (<span class="cpp-keyword">if</span> (constant-token? token)<br>          (begin<br>            (write (make-constant-expr token) output-port)<br>            (newline)))<br>      (<span class="cpp-keyword">if</span> (not (eq? token #!eof))<br>          (loop)))))<br><br>(parse (current-input-port)<br>       (current-output-port))<br><br><br><br><br></pre></div><!–ENDSCRIPT–><br><br>Now, give the output of our parser to this, and this generates:<br><br><!–STARTSCRIPT–><!–source lang="cpp"–><div class="source"><pre><br>(define GL_LOGIC_OP (lambda () GL_INDEX_LOGIC_OP))<br>(define GL_TEXTURE_COMPONENTS (lambda () GL_TEXTURE_INTERNAL_FORMAT))<br>(define GL_VERSION_1_1 <span class="cpp-number">1</span>)<br>(define GL_VERSION_1_2 <span class="cpp-number">1</span>)<br>(define GL_VERSION_1_3 <span class="cpp-number">1</span>)<br>(define GL_VERSION_1_4 <span class="cpp-number">1</span>)<br>(define GL_VERSION_1_5 <span class="cpp-number">1</span>)<br>(define GL_VERSION_2_0 <span class="cpp-number">1</span>)<br>(define GL_VERSION_2_1 <span class="cpp-number">1</span>)<br>(define GL_ACCUM <span class="cpp-number">256</span>)<br>(define GL_LOAD <span class="cpp-number">257</span>)<br>(define GL_RETURN <span class="cpp-number">258</span>)<br>(define GL_MULT <span class="cpp-number">259</span>)<br>(define GL_ADD <span class="cpp-number">260</span>)<br>(define GL_NEVER <span class="cpp-number">512</span>)<br>(define GL_LESS <span class="cpp-number">513</span>)<br>(define GL_EQUAL <span class="cpp-number">514</span>)<br>(define GL_LEQUAL <span class="cpp-number">515</span>)<br>(define GL_GREATER <span class="cpp-number">516</span>)<br>(define GL_NOTEQUAL <span class="cpp-number">517</span>)<br>(define GL_GEQUAL <span class="cpp-number">518</span>)<br>(define GL_ALWAYS <span class="cpp-number">519</span>)<br><br>… snip …<br><br><br><br><br><br></pre></div><!–ENDSCRIPT–><br><br>And we have a list of OpenGL preprocessor constants available to us in Scheme!  Notice how we aren't actually using any of the FFI mechanisms.  Since all boundaries between Scheme and C land are typed, we can't really pull out preprocessor constants from C.  We aren't trying to make this FFI compatible across versions of header files either.  The point of automatically generating these interfaces is so that it's easier to re-generate them against any version/platform.<br><br>I hope someone else found this as interesting as I did.  I will be working on parsing typedef's and function declarations as well, which is a natural extension to the above programs.  Having such technology will make it almost painless to be working in Scheme, where I'm usually cut off from all of the math/graphics/etc. C libraries out there.<div>


</div>
Previous Entry typing & building
Next Entry A Sad Day
0 likes 0 comments

Comments

Nobody has left a comment. You can be the first!
You must log in to join the conversation.
Don't have a GameDev.net account? Sign up!
Advertisement
Advertisement