December 2, 2016

Entering the World of Macros

In Rebol languages family, macros have been a topic every now and then, and despite one implementation offered, they never became mainstream. Rebol language, being homoiconic, comes already well-equipped for code transformations at run-time with minimal effort. For example, this short script:
    code: [print "hello"]
    code: reduce [code]
    insert code [loop 2]
    probe code
    do code
when evaluated will output:
    [loop 2 [print "hello"]]
    hello
    hello
For a more sophisticated example, see our GUI live-coding in just a few lines.

Moreover, Rebol being interpreted, the cost of transformations is still paid at run-time.

But with Red, it is a different story. Being compiled, in addition to being interpreted, weighs heavily in favor of a preprocessing facility, in order to leverage compile-time code transformations. It is available now (in the master branch), in the form of a preprocessor with macro capabilities.

Let me restate it for the readers not familiar with macros and source preprocessing:

    The goal is to shift some data and code transformations from run-time to compile-time.

This is the key point of the new features in Red, described below.

Design and implementation

The preprocessing (including macro-time) happens between load and compile phases. It has been designed as a separate phase, with a separate execution context (to the extent allowed by current Red semantics, until we get the module! type implemented). Among other benefits, this enforces hygiene in Red macros. The preprocessor directives syntax relies on the familiar `#` prefixed forms (issue! values), meant to visually stand out from regular code, enforcing the idea of a separate layer of processing.

As Red strives to keep the source code accepted by the compiler and interpreter as interchangeable as possible, the preprocessor can also be run by the interpreter, between load and eval phases, supporting exactly the same features as the compiler version. Actually, they even share the same implementation, a unique file, with a double Rebol/Red header, run by Rebol2 for the toolchain and by Red for the interpreter. Though the preprocessor has to preprocess itself and a few core files used by the compiler setup sequence. In order to solve this chicken and egg problem, some directives have been hardcoded in the preprocessor instead of been implemented as macros.

The preprocessing (including macro expansion) is applied to the whole source fed to the compiler (or interpreter) after the load phase, so it is applied on a parse tree (similar to an AST). This means that no information about objects or other contexts built at run-time is available, it is just a big tree of Red values. Further research is planned in future releases to improve that, if possible. An alternative approach relying on interleaving macro expansion with compile/eval phases (late-time expansion) was considered, but not adopted, due to much higher implementation costs and potentially confusing inter-mixed behaviors, making it hard to grasp and be properly used by a majority of users. As we value simplicity utmostly, we will stick to the current simpler model for now, to ensure the broadest possible usage.

Simple macros

Documentation about preprocessing directives like #if, #either, #switch, #case, ... is available, and should feel familiar to Rebol SDK users, so this article will just focus on the real novel part: macro support.

Macros are basically functions, taking source code references as arguments in order to transform that source code. Most of the time, the returned value from the macro is replacing the macro name and argument at the call site in the source code. Though, this default mode can be overriden, and more control can be given to the macro through pattern-matching macros, explained later below.

For sake of brevity, the Red header is omitted from following code examples. You can copy/paste those code snippets in the console, wrapping them in a do expand [...] call or compile them (just add a Red [] header) using a `red -c -s <script.red>` line.

Simple macros can be defined in a similar way to regular functions, though they need to be preceeded by a #macro directive:
    #macro as-KB: func [n][n * 1024]
    print as-KB 64
When the macro is expanded by the preprocessor, the above source code will result in:
    print 65536
This kind of simple macro is called a named macro in Red's jargon. Once the preprocessor has finished his work, no macro definition or call, nor any preprocessor directive remains in the source code (compiler directives defined as issue! values, are still there though).

Remember that macros take source code values as arguments, without evaluating them, so passing unquoted words or paths is fine:
    #macro capitalize: function [value][
        s: uppercase/part form value 1
        either path? value [to-lit-path s][to-lit-word s]
    ]
    print capitalize hello
    print capitalize hello/macro/world
will result in:
    print 'Hello
    print 'Hello/macro/world
As you can see both func and function constructors are accepted for declaring a macro.

One possible use of macros can then be word-aliasing, for example translating words at compile-time (using French words here):
    #macro si:      func []['if]
    #macro affiche: func []['print]
    #macro vrai:    func []['true]

    si vrai [affiche "Vive Red !"]
will result in:
    if true [print "Vive Red !"]
Now, let's get back to the first example and expand (no pun intended) on it:
    #macro make-KB: func [n][n * 1024]
    #macro make-MB: func [n][make-KB make-KB n]
    print make-MB 1
will result in:
    print 1048576
So macros can freely call other macros, the same way functions would. This is possible because macros are running in a special (hidden) context, which can be accessed and extended (indirectly) by the user using the #do directive:
    #do [kilo: 1024]
    #macro make-KB: func [n][n * kilo]
The #do directive accepts arbitrary code, so local words and functions can be defined there freely, and accessed from macros. This gives a pretty powerful programming layer for the preprocessor, though, it is possible to go even further.

Pattern-matching macros

For Red users, the logical next step once having macros (as defined above), is to be able to combine them with one of the Rebol world's jewels: the Parse DSL. Well that is what pattern-matching macros are. ;-) Instead of matching a word as trigger for the macro, you can replace it with a valid Parse rule (which can call sub-rules defined in #do expressions), defining a pattern that will trigger the macro.

We can now improve our `make-KB` macro with a nicer looking form:
    #macro [number! 'KB] func [s e][to-integer s/1 * 1024]
    print 64 KB
    print 2.5 KB
would result in:
    print 65536
    print 2560
In that example, the macro gets called each time a number! followed by the word! KB is encountered in the source code. They will then both get replaced by the macro returned value. The arguments of pattern-matching macros are always the same: one reference to the starting position (s) of the matched pattern, and one reference to the ending position (e).

Here are a few more examples, showing the true power of Red macros:

Variable-arguments macro
    #macro ['max some [integer!]] func [s e][
        first maximum-of copy/part next s e
    ]
    print max 4 2 3 8 1
would result in:
    print 8

Loop macro

A simple loop macro, extending the existing loop function:
    #macro ['loop [integer! | block!] block!] function [[manual] s e][
        set [spec body] next s
        if integer? spec [return e]    ;-- return position past the pattern
        low: high: none
        step: 1

        unless parse spec [
            word! 
            opt '= set low integer!
            opt [opt 'to set high integer!]
            opt [opt 'step set step integer!]
        ][
            print ["*** LOOP syntax error:" spec]
        ]
        new: reduce either high [
            set-var: to-set-word var: spec/1
            cond: compose [(var) <= (high)]
            repend body [set-var var '+ step]
            [set-var low 'while cond body]
        ][
            ['repeat spec/1 low body]
        ]      
        change/part s new e
    ]

    loop 2 [print "x"]
    loop [i 3][print i]
    loop [i 5 8][print i]
    loop [i = 5 to 10 step 2][print i]
would result in:
    loop 2 [print "x"]
    repeat i 3 [print i]
    i: 5 while [i <= 8] [print i i: i + 1]
    i: 5 while [i <= 10] [print i i: i + 2]
This pattern-matching macro relies on the manual mode ([manual] attribute), where the replacement is done by user code and the returned value needs to be the position in the source code where the expansion process resumes. In this case, loop is also an existing function, so when the argument is an integer, the source is left untouched. With a normal macro, the resuming point would have been the beginning of the pattern, resulting in an infinite loop (again, no pun intended). ;-)


Math expressions folding macro

A powerful way to pre-calculate constant math expressions from your source code could be to use a macro like this one:
    #do [
        p:     none
        op:    ['+ | '- | '* | '** | slash]
        paren: [p: paren! :p into expr]
        fun:   [['sine | 'cosine | 'square-root] expr]
        expr:  [[number! | paren] op expr | number! | paren | fun]
    ]

    #macro [expr] func [[manual] s e][
        if all [e = next s number? s/1][return e]  ;-- skip single number
        change/part s do (copy/part s e) e
        s
    ]

    a: 3 + 2 - 8
    print (3 + 4) * 6
    edge: 100 * cosine 60
    hypotenuse: square-root (3 ** 2) + (4 ** 2)
would result in:
    a: -3
    print 42
    edge: 50.0
    hypotenuse: 5.0


HTML validating macro

HTML tags are a first-class datatype in Red so they can be embedded and manipulated by the core language (like string values). The Red lexer will check the syntax but wouldn't it be nice to also have a minimal check the semantics at compile-time? Let's roll a macro for that:
    #do [
        error: function [pos [block! paren!] msg [block!]][
            print [
                "*** HTML error:" reduce msg lf
                "*** at:" copy/part pos 4
            ]
            halt
        ]
    ]

    #macro tag! function [[manual] s e][
        stack: []
        tag: s/1

        either slash = tag/1 [
            last?: (name: next tag) = last stack
            all [
                not last?
                find stack name
                error s ["overlapping tag" tag]
            ]
            if any [empty? stack not last?][
                error s ["no opening tag for" tag]
            ]
            take/last stack
        ][
            if slash <> last tag [    
                if pos: find tag " " [tag: copy/part tag pos]

                unless find s head insert copy tag slash [
                    error s ["no closing tag for" tag]
                ]            
                append stack tag
            ]
        ]
        next s
    ]

    data: [<html><br/><b>msg<b></html>]
    msg: "Red rocks!"
    print data
The embedded HTML above has an error, the <b> tag is not closed. This macro will catch that error and report it properly during the compilation.


A DSL compiler

Red is already very well-suited for DSL creation, though, the cost of interpreting or compiling DSL has always been paid at run-time so far. With macros, it can be moved to compile-time, when suitable. Here is a simple subset of BASIC language, partially compiled to Red code using a single macro:
    #macro do-basic: function [src /local math][
        output: clear []
        lines:  clear []
        value!: [integer! | string! | word!]
        op:     ['+ | '- | '* | slash]
        comp:   ['= | '<> | '< | '> | '<= | '>=]
        line:   [p: integer! (repend lines [p/1 index? tail output])]
        cmd:    [
            token: 'let word! '= value! opt [copy math [op value!]] (
                emit reduce [to-set-word token/2 token/4]
                if math [emit/part math 2]
            )
            | 'if value! comp value! 'then (
                emit/part token 4
                emit/only make block! 1
                parent: output
                output: last parent
            ) cmd (output: parent)
            
            | 'print value! (emit/part token 2)
            | 'goto integer! (
                line: select/skip lines token/2 2
                emit compose [jump: (line)]
            )
        ]
        emit: function [value /only /part n [integer!]][
            if part [value: copy/part value n]            
            either only [append/only output value][append output value]
        ]
        
        unless parse src [some [line cmd]][
            print ["*** BASIC Syntax error at:" mold token]
        ]
        compose/deep [
            (to-set-word 'eval-basic) function [pc [block!]][
                bind pc 'pc
                while [not tail? pc][
                    do/next pc 'pc
                    if jump [
                        pc: at head pc jump
                        jump: none
                    ]
                ]
            ]
            eval-basic [(output)]
        ]
    ]

    do-basic [
        10 LET A = "hi!"
        20 LET N = 3
        30 PRINT A
        40 LET N = N - 1
        50 IF N > 0 THEN GOTO 30
    ]
will result in:
    eval-basic: function [pc [block!]][
        bind pc 'pc
        while [not tail? pc][
            do/next pc 'pc
            if jump [
                pc: at head pc jump
                jump: none
            ]
        ]
    ]
    eval-basic [
        A: "hi!"
        N: 3
        PRINT A
        N: N - 1
        IF N > 0 [jump: 5]
    ]
Note: I tried to keep this example short, so it ends up as a BASIC source code compiler to Red, but fed to a custom interpreter. Fully compiling that DSL to Red code would be possible, but would require more complex constructs, in order to deal with a GOTO able to arbitrarily jump anywhere, and that is beyond the scope of this article.

Final thoughts

The preprocessor and macros bring great new possibilities to the Red compiler, while still being able to run the same code with the interpreter. Though, as the saying goes, with great power comes great responsibility, so keeping in mind some drawbacks would be wise:
  • As there is no difference in Red between code and data, both can be transformed by the same macros, which is not always desirable. Some mechanisms for limiting the application scope exist in the preprocessor, though there is no guarantee they can cover all use-cases. Stay alert, especially with pattern-matching macros.
  • It is not always easy to reason about and debug macros, unless you have some existing experience. I would suggest not using them until you have a good grasp of Red's toolchain, semantics and fundamental concepts (like homoiconicity).
  • As the Red toolchain is currently run by a Rebol2 interpreter, Rebol is running the compile-time macros, so keep that in mind when writing them. If you want them to run equally well on the interpreter, you need to use only the common subset between Rebol2 and Red. Sooner or later we should move compile-time preprocessing to use a Red engine (thanks to libRed), so this concern is temporary.

Last but not least, for those wondering about syntactic macros (aka readers macros) inclusion in Red, as for AST-macros, they are not strictly necessary, as the Parse DSL already provides us a powerful tool to implement pre-load-time processing. Though they could still bring some extra benefits (like embedding the processing logic within the source code), but could also go against the Red-as-data-format principle, or wreak havoc in IDE support (like messing up syntax coloring and step by step debugging). We need more time to go through each concern and see how to deal with them before adding such a feature.

I hope this long article was useful for those of you who had no past experience with macros and entertaining for those who have. Have fun with this brand new toy and let us know what you think about it on Gitter. Cheers!

See comments on /r/programming.

5 comments:

  1. Great work! Can't wait to explore this area further. Kudos for continuing to innovate and push the boundaries of what the original rebol hinted at.

    ReplyDelete
  2. I continue to be amazed at the design effort, keeping the big picture in mind, that goes into every Red feature. Great stuff.

    ReplyDelete
  3. Rudolf Meijer (meijeru)December 4, 2016 at 11:04 PM

    Congratulations! The power of Red is steadily increasing. Can't wait till 0.6.2. (with libRed) is out...

    ReplyDelete
  4. I began only two days ago, and I have done a simple stuff, and really is awesome, I could do cross compilation from windows to Linux and run !!!
    I am going to study this wonderful language.!
    Thanks !

    ReplyDelete

Fork me on GitHub