When creating a new BEAM language, you have many options for what to compile to before letting the Erlang compiler take over. Two of them are Erlang Abstract Form (EAF) and Core Erlang: Elixir compiles to EAF, while Lisp Flavoured Erlang compiles to Core Erlang. The Core Erlang specification dates from 2004 and seems a bit too complex. To me, EAF is much simpler, because it's a tree made from standard Erlang terms like tuples, lists, and atoms.
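You can see that tree structure by asking Erlang's own front end to parse a function. The sketch below, run from Elixir, uses the standard :erl_scan.string/1 and :erl_parse.parse_form/1 functions. give_me_one is simply the example function used later in this article.

```elixir
# Tokenize and parse one Erlang function definition into its abstract form.
# Erlang's scanner expects a charlist, hence the ~c sigil.
{:ok, tokens, _end_location} = :erl_scan.string(~c"give_me_one() -> 1.")
{:ok, form} = :erl_parse.parse_form(tokens)

# The result is a plain nested tuple:
# {:function, 1, :give_me_one, 0, [{:clause, 1, [], [], [{:integer, 1, 1}]}]}
IO.inspect(form)
```

Every node is an ordinary tuple tagged with an atom and a line number, which is exactly what our compiler will have to build by hand.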

Before getting started on the compiler, spend some time learning leex and yecc, Erlang's lexical analyzer generator and LALR-1 parser generator. The fantastic articles linked under References helped me, and you should read them for more context on this article.

src/scanner.xrl

This is a simple leex file, but it can already tokenize a basic lisp. It’s not good enough for Erie, but it’s a good base.

It recognizes parentheses, integers, and floats, and treats any run of letters and underscores as an atom; any other character is reported as a scan error. Note how it treats commas as whitespace (stolen from Clojure).

Definitions.

Digit      = [0-9]
Alpha      = [A-Za-z_]
Whitespace = [\n\t\s,]

Rules.

\(                 : {token, {'(', TokenLine}}.
\)                 : {token, {')', TokenLine}}.
{Digit}+\.{Digit}+ : {token, {float, TokenLine, list_to_float(TokenChars)}}.
{Digit}+           : {token, {int, TokenLine, list_to_integer(TokenChars)}}.
{Alpha}+           : {token, {atom, TokenLine, list_to_atom(TokenChars)}}.
{Whitespace}+      : skip_token.

Erlang code.

Mix compiles src/scanner.xrl into an Erlang module named scanner, which we can call from Elixir. It identifies each token and its category (float/int/atom). Note the call to String.to_charlist/1: our scanner module is Erlang, and leex-generated scanners expect an Erlang string (a charlist), not an Elixir binary string.

iex(1)> code = """
...(1)> (defmodule Core)
...(1)>
...(1)> (def give_me_one 1)
...(1)> """
iex(2)> {:ok, tokens, _} = code |> String.to_charlist() |> :scanner.string()
{:ok,
 [
   {:"(", 1},
   {:atom, 1, :defmodule},
   {:atom, 1, :Core},
   {:")", 1},
   {:"(", 3},
   {:atom, 3, :def},
   {:atom, 3, :give_me_one},
   {:int, 3, 1},
   {:")", 3}
 ], 4}
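Inside a Mix project, Mix's built-in leex compiler turns src/scanner.xrl into the scanner module for you. To experiment outside a project, a minimal sketch like the one below writes the grammar to a temporary file and builds it with :leex.file/1, :compile.file/2, and :code.load_binary/3. ScannerDemo and build/0 are hypothetical names for illustration. The embedded grammar converts floats with list_to_float/1 so float tokens carry numbers, and the sample input shows the comma being skipped as whitespace.

```elixir
defmodule ScannerDemo do
  # Hypothetical helper (not part of Erie): builds the scanner from this
  # article at runtime, for use outside a Mix project.
  # ~S keeps the backslashes in the leex grammar literal.
  @xrl ~S"""
  Definitions.

  Digit      = [0-9]
  Alpha      = [A-Za-z_]
  Whitespace = [\n\t\s,]

  Rules.

  \(                 : {token, {'(', TokenLine}}.
  \)                 : {token, {')', TokenLine}}.
  {Digit}+\.{Digit}+ : {token, {float, TokenLine, list_to_float(TokenChars)}}.
  {Digit}+           : {token, {int, TokenLine, list_to_integer(TokenChars)}}.
  {Alpha}+           : {token, {atom, TokenLine, list_to_atom(TokenChars)}}.
  {Whitespace}+      : skip_token.

  Erlang code.
  """

  def build do
    xrl_path = Path.join(System.tmp_dir!(), "scanner.xrl")
    File.write!(xrl_path, @xrl)

    # leex writes scanner.erl next to the .xrl file and returns its path.
    {:ok, erl_path} = :leex.file(String.to_charlist(xrl_path))

    # Compile the generated Erlang source to an in-memory binary, then load it.
    {:ok, mod, bin} = :compile.file(erl_path, [:binary])
    {:module, ^mod} = :code.load_binary(mod, ~c"nofile", bin)
    mod
  end
end

scanner = ScannerDemo.build()
{:ok, tokens, _end_line} = scanner.string(~c"(def pi, 3.14)")
IO.inspect(tokens)
# [{:"(", 1}, {:atom, 1, :def}, {:atom, 1, :pi}, {:float, 1, 3.14}, {:")", 1}]
```

Input the grammar doesn't cover, such as (+ 1 2), makes scanner.string/1 return an {:error, error_info, end_line} tuple instead of a token list.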

src/parser.yrl

After tokenization, we use yecc to parse the tokens into more meaningful forms. Because our lisp is so simple, this parser doesn't have much to do other than build nested lists that mirror the parentheses.

Note that each item in the Terminals list must match the first element (the token category) of a tuple produced by our scanner module.

Nonterminals list elems elem group.
Terminals '(' ')' int float atom.
Rootsymbol group.

group -> list       : ['$1'].
group -> list group : ['$1'|'$2'].

list -> '(' ')'       : [].
list -> '(' elems ')' : '$2'.

elems -> elem       : ['$1'].
elems -> elem elems : ['$1'|'$2'].

elem -> int   : '$1'.
elem -> float : '$1'.
elem -> atom  : '$1'.
elem -> list  : '$1'.

Erlang code.

The output of our parser module looks very similar to our original code.

iex(3)> {:ok, list} = :parser.parse(tokens)
{:ok,
 [
   [{:atom, 1, :defmodule}, {:atom, 1, :Core}],
   [{:atom, 3, :def}, {:atom, 3, :give_me_one}, {:int, 3, 1}]
 ]}
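The parser can be built the same way outside Mix. This sketch, again with hypothetical names (ParserDemo and build/0), compiles the grammar above with :yecc.file/1 and feeds it hand-written tokens, so it runs without the scanner. The result should match the parser output shown above.

```elixir
defmodule ParserDemo do
  # Hypothetical helper (not part of Erie): builds the parser from this
  # article at runtime with yecc, for use outside a Mix project.
  @yrl ~S"""
  Nonterminals list elems elem group.
  Terminals '(' ')' int float atom.
  Rootsymbol group.

  group -> list       : ['$1'].
  group -> list group : ['$1'|'$2'].

  list -> '(' ')'       : [].
  list -> '(' elems ')' : '$2'.

  elems -> elem       : ['$1'].
  elems -> elem elems : ['$1'|'$2'].

  elem -> int   : '$1'.
  elem -> float : '$1'.
  elem -> atom  : '$1'.
  elem -> list  : '$1'.

  Erlang code.
  """

  def build do
    yrl_path = Path.join(System.tmp_dir!(), "parser.yrl")
    File.write!(yrl_path, @yrl)

    # yecc writes parser.erl next to the .yrl file and returns its path.
    {:ok, erl_path} = :yecc.file(String.to_charlist(yrl_path))

    # Compile the generated Erlang source to an in-memory binary, then load it.
    {:ok, mod, bin} = :compile.file(erl_path, [:binary])
    {:module, ^mod} = :code.load_binary(mod, ~c"nofile", bin)
    mod
  end
end

parser = ParserDemo.build()

# Tokens written by hand, in the shape the scanner produces.
tokens = [
  {:"(", 1}, {:atom, 1, :defmodule}, {:atom, 1, :Core}, {:")", 1},
  {:"(", 3}, {:atom, 3, :def}, {:atom, 3, :give_me_one}, {:int, 3, 1}, {:")", 3}
]

{:ok, parsed} = parser.parse(tokens)
IO.inspect(parsed)
```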

We still don’t have Erlang Abstract Form, but we’re ready to convert the output of the parser module into EAF.

lib/erie.ex

The translate/2 function below is tailored to our two-line example and won't handle any other code, but it produces enough EAF to call our lisp function from IEx.

Note the :export attribute. It's hardcoded, but it makes give_me_one public so we can call it from outside the module.

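defmodule Erie do
  # Base case: all forms translated; return the (reversed) accumulator.
  def translate([], ret), do: ret

  def translate([form | tail], ret) do
    case form do
      # (defmodule Name): module attribute plus a hardcoded export list.
      [{:atom, _l1, :defmodule}, {:atom, l2, mod}] ->
        translate(tail, [
          {:attribute, l2, :export, [{:give_me_one, 0}]},
          {:attribute, l2, :module, mod} | ret
        ])

      # (def name integer): a zero-arity function returning the integer.
      [{:atom, l1, :def}, {:atom, _l2, name}, {:int, l3, val}] ->
        translate(tail, [
          {:function, l1, name, 0,
           [{:clause, l3, [], [], [{:integer, l3, val}]}]}
          | ret
        ])

      # Any other form is ignored.
      _ ->
        translate(tail, ret)
    end
  end
end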
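Hand-built EAF is easy to get subtly wrong, so it can help to print forms back as Erlang source. The sketch below feeds forms shaped like translate/2's reversed output to the standard :erl_pp.form/1 function, which returns iodata. The forms are written out literally so the snippet runs without the Erie module.

```elixir
forms = [
  {:attribute, 1, :module, :Core},
  {:attribute, 1, :export, [{:give_me_one, 0}]},
  {:function, 3, :give_me_one, 0, [{:clause, 3, [], [], [{:integer, 3, 1}]}]}
]

# :erl_pp.form/1 returns iodata for one form; join them into a single string.
source =
  forms
  |> Enum.map(&:erl_pp.form/1)
  |> IO.iodata_to_binary()

IO.puts(source)
```

The printed text reads like a small hand-written Erlang module, which makes a missing clause or a wrong arity much easier to spot than in raw tuples.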

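Our final step is to reverse the forms (translate/2 builds its list back to front), compile them, and load the resulting binary. We're using the built-in Erlang functions :compile.forms/1 and :code.load_binary/3. Note the 'nofile' charlist (equivalently ~c"nofile"): we're back to calling Erlang from Elixir, and :code.load_binary/3 expects its filename argument as an Erlang string rather than an Elixir binary. Since our module has no source file, the name is only a placeholder.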

iex(4)> forms = list |> Erie.translate([]) |> Enum.reverse()
[
  {:attribute, 1, :module, :Core},
  {:attribute, 1, :export, [give_me_one: 0]},
  {:function, 3, :give_me_one, 0, [{:clause, 3, [], [], [{:integer, 3, 1}]}]}
]
iex(5)> {:ok, mod, bin} = :compile.forms(forms)
{:ok, :Core, <<70, 79, 82, 49, 0, 0, 1, 188, 66, ...>>}
iex(6)> :code.load_binary(mod, 'nofile', bin)
{:module, :Core}

Now that our module is loaded, we can invoke our function.

iex(7)> :Core.give_me_one()
1
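In the session above, {:ok, mod, bin} = :compile.forms(forms) raises a MatchError whenever the forms don't compile, because :compile.forms/1 returns a bare error atom in that case. While developing the translator, a helper that keeps the errors can be handier. The sketch below is one way to do that, assuming a hypothetical EafLoader.load_forms/1; it passes the standard :return_errors option to :compile.forms/2 so failures come back as {:error, errors, warnings}.

```elixir
defmodule EafLoader do
  # Hypothetical helper (not part of Erie): compiles EAF forms and loads the
  # resulting module, returning compiler errors as data instead of crashing.
  def load_forms(forms) do
    case :compile.forms(forms, [:return_errors]) do
      {:ok, mod, bin} ->
        {:module, ^mod} = :code.load_binary(mod, ~c"nofile", bin)
        {:ok, mod}

      {:error, errors, _warnings} ->
        {:error, errors}
    end
  end
end

forms = [
  {:attribute, 1, :module, :Core},
  {:attribute, 1, :export, [{:give_me_one, 0}]},
  {:function, 3, :give_me_one, 0, [{:clause, 3, [], [], [{:integer, 3, 1}]}]}
]

{:ok, mod} = EafLoader.load_forms(forms)
IO.inspect(apply(mod, :give_me_one, []))

# Exporting a function that is never defined is a compile error.
bad_forms = [
  {:attribute, 1, :module, :Broken},
  {:attribute, 1, :export, [{:missing, 0}]}
]

{:error, errors} = EafLoader.load_forms(bad_forms)
IO.inspect(errors)
```

The compiler's lint pass rejects the export of the undefined missing/0, so the second call returns {:error, errors} rather than loading a broken module.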

And with that, we have successfully tokenized, parsed, and compiled a brand-new language on the BEAM.

References