REPL Start

Code for this post is available at 2734fc099d.

A basic REPL will go a long way toward making Erie feel like a true Lisp. It doesn’t need to be fancy: no helper functions, no tab completion, not even the “up arrow” to redo the last expression. All we want (for now) is to read an expression, evaluate it on the fly, print the result, and repeat.

I was able to put together something that reads input and starts a new Task to compile and evaluate it. But this isn’t terribly useful because Erie is missing a standard library. The only functions that can be called are Elixir or Erlang functions (but they work!).

Core(1)> (Elixir.String.split "abc def ghi")
["abc", "def", "ghi"]
Core(2)> ('erlang.atom_to_binary 'derp 'unicode)
"derp"
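A loop like this can be sketched in a few lines of Elixir. Everything here is illustrative rather than Erie’s actual REPL code; in particular, `eval_line/1` stands in for Erie’s compile-and-evaluate step and simply delegates to `Code.eval_string/1` so the sketch runs on its own.

```elixir
defmodule Repl.Sketch do
  # Stand-in for Erie's real compile-and-evaluate step (hypothetical):
  # here we simply evaluate the line as Elixir code.
  def eval_line(line) do
    {result, _bindings} = Code.eval_string(line)
    result
  end

  # Read an expression, evaluate it in a fresh Task, print, repeat.
  def loop(counter \\ 1) do
    line = IO.gets("Core(#{counter})> ")

    Task.async(fn -> eval_line(line) end)
    |> Task.await()
    |> IO.inspect()

    loop(counter + 1)
  end
end
```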

This is a nice start but defining our own functions in the REPL and calling them would be better.

In Elixir’s REPL, you can do this, but not using def … do … end (unless it’s wrapped in defmodule … do … end). That’s a bit obnoxious, so I want my REPL to handle (def …) gracefully. I do that by stealing a feature of Clojure’s REPL where you’re effectively “in” a module while in the REPL, so any function defined is a part of that module.

I want to make as much use of my Translator module as possible, but as it’s currently written, it requires all function definitions to be inside a module (now I see why Eilixr may have made that decision). To get around this, I wrap each call to (def …) in a new module thats name is based on a counter. For example, this defintion of x in the REPL

Core(1)> (def x [] 1)

would be compiled as if I wrote

(defmodule Core1) (def x [] 1)

And this defintion of y

Core(2)> (def y [] 2)

would be compiled as if I wrote

(defmodule Core2) (def y [] 2)

This works great, but now calling x the intended way produces an error because it’s not defined in the Core module.

Core(1)> (def x [] 1)
Core(2)> (Core.x)

20:27:31.177 [error] Task #PID<0.143.0> started from Erie.Repl terminating
** (UndefinedFunctionError) function :Core.x/0 is undefined (module :Core is not available)

To solve this, we can keep track of all functions defined in the REPL and their associated modules. If one of them is invoked under the Core module, we can intercept the call and translate it to the correct module.
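A sketch of that bookkeeping (the module and function names here are hypothetical, not the actual Erie REPL code): keep a map from function name to the counter-based module that really defines it, and consult the map before dispatching anything aimed at Core.

```elixir
defmodule Repl.Registry do
  # After (def x [] 1) in the REPL we'd record: register(registry, :x, :Core1)
  def register(registry, name, module), do: Map.put(registry, name, module)

  # A call aimed at Core is rerouted to the module that defines the function;
  # unknown names fall through to Core itself.
  def resolve(registry, name), do: Map.get(registry, name, :Core)

  def call(registry, name, args), do: apply(resolve(registry, name), name, args)
end
```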

The REPL is by no means complete, but it feels good to be able to write some Erie and see it execute in real time.

Macro and Other Updates

Code for this post is available at 0646b91480.

Erie can now handle some basic macros. Very basic. It only does macro expansion once, and it doesn’t correctly call Erie-defined functions from within a macro.

Despite those limitations it can correctly compile this module. However, cons and + aren’t properly defined as functions, so executing either one_plus_two or the_first will fail.

(defmodule Core)

(defmacro infix [ast]
  ((lambda [list]
    [(Elixir.Enum.at list 1) (Elixir.Enum.at list 0) (Elixir.Enum.at list 2)])
    (Elixir.Enum.at ast 0)))

(def one_plus_two []
  (infix (1 + 2)))

(defmacro a_list [ast]
  ['cons 1 nil])

(def the_first []
  (Elixir.List.first (a_list nil)))

I’m pretty happy with the progress I’ve made so far on Erie, but there is obviously a lot more to do. Some possible next steps are:

  • REPL
  • Types
  • Standard library
  • More advanced macros
  • Syntax highlighting

More advanced macros can definitely wait. A standard library will be very handy, but it’s not preventing me from writing Erie right now. Types are going to be very important soon, but I think a REPL will have the biggest impact. The ability to write small bits of Erie and execute them immediately will help shape the ergonomics and dictate what should be included in the standard library.

Macro Plan

No Lisp is complete without a macro system. If only it were so easy to implement.

Having never written a language before, let alone a macro system, I’m going to try to break it down into manageable pieces. I have no idea if this is the right approach, but we’ll start with it and see where it takes us.

The steps I’m envisioning are

  1. Parse the code.
  2. Find instances of defmacro.
  3. Find invocations of each of those macros.
  4. Convert the output of the parser into an Erie abstract syntax tree.
  5. Execute each macro with the AST as the argument.
  6. Support “unquoting” somehow.
  7. Replace the macros in the AST with their results.
  8. Repeat until all macros have been expanded.
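Steps 3, 5, 7, and 8 together amount to a fixed-point expansion over the tree. A rough sketch over plain-list ASTs (the module and the `macros` map are hypothetical, and “unquoting” is ignored):

```elixir
defmodule Expander.Sketch do
  # Expand macros repeatedly until a pass changes nothing (step 8).
  # `macros` maps a macro name to a function that receives the call's arguments.
  def expand_all(ast, macros) do
    expanded = expand(ast, macros)
    if expanded == ast, do: ast, else: expand_all(expanded, macros)
  end

  # Replace each macro invocation with the macro's result (steps 3, 5, and 7).
  defp expand([name | _] = form, macros) when is_atom(name) do
    case Map.fetch(macros, name) do
      {:ok, fun} -> fun.(tl(form))
      :error -> Enum.map(form, &expand(&1, macros))
    end
  end

  defp expand(form, macros) when is_list(form), do: Enum.map(form, &expand(&1, macros))
  defp expand(form, _macros), do: form
end
```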

The simplest macro I can think of is comment: replace the inner forms with a single nil.

(defmacro comment [ast]
  nil)

(def loop []
  (comment
    (Enum.map [1 2 3]
      (lambda [x]
        (+ 1 x)))))

Step One

After parsing, we have this.

[
  [
    {:atom, 1, :defmacro},
    {:atom, 1, :comment},
    {:list, 1, [{:atom, 1, :ast}]},
    {:atom, 2, nil}
  ],
  [
    {:atom, 4, :def},
    {:atom, 4, :loop},
    {:list, 4, []},
    [
      {:atom, 5, :comment},
      [
        {:symbol, 6, :"Enum.map"},
        {:list, 6, [{:integer, 6, 1}, {:integer, 6, 2}, {:integer, 6, 3}]},
        [
          {:atom, 7, :lambda},
          {:list, 7, [{:atom, 7, :x}]},
          [{:atom, 8, :+}, {:integer, 8, 1}, {:atom, 8, :x}]
        ]
      ]
    ]
  ]
]

Step Two

We can find instances of defmacro just like we find instances of def. In this case we only find one: comment.

Step Three

We find invocations of comment by looking through the results of the parser for anything that looks like a function call to comment.
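A sketch of that search (a hypothetical helper, not the actual compiler code): recursively walk the parser output looking for any list whose head is an `{:atom, _, name}` tuple matching a known macro name.

```elixir
defmodule Finder.Sketch do
  # Collect every sub-list that looks like a call to `name`.
  def invocations([{:atom, _, name} | _] = form, name) do
    [form | invocations_in(form, name)]
  end

  def invocations(form, name) when is_list(form), do: invocations_in(form, name)
  def invocations(_form, _name), do: []

  defp invocations_in(list, name), do: Enum.flat_map(list, &invocations(&1, name))
end
```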

Step Four

Converting the output of the parser into an Erie abstract syntax tree will look something like this.

[
  [:defmacro, :comment, [:ast], nil],
  [:def, :loop, [],
    [:comment, [:"Enum.map", [1, 2, 3], [:lambda, [:x], [:+, 1, :x]]]]
  ]
]
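The conversion itself is mostly a matter of dropping the category and line metadata from each tuple. A minimal sketch (a hypothetical helper; real code would need to keep line numbers around for error reporting):

```elixir
defmodule Ast.Sketch do
  # {:list, _, elems} unwraps to a plain list; other tuples keep only the value.
  def strip({:list, _line, elems}), do: strip(elems)
  def strip({_category, _line, value}), do: value
  def strip(forms) when is_list(forms), do: Enum.map(forms, &strip/1)
end
```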

Step Five

To execute the comment macro, we need to compile it as if it were a normal function. That is possible with :compile.forms/1, but as far as I know, it needs to be contained in a module.

My current strategy is to define all macros as functions in a “macro” module whose name is prefixed with MACRO. E.g. the macros of the module Erie.Core will be defined as functions in MACRO.Erie.Core.
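To make the idea concrete, here is a sketch of compiling a `comment/1` macro function into a `MACRO.`-prefixed module with `:compile.forms/1` and loading it. The module name and the EAF are illustrative, hand-written to match what the translator might produce.

```elixir
# EAF for roughly: -module('MACRO.Erie.Core'). comment(_Ast) -> nil.
forms = [
  {:attribute, 1, :module, :"MACRO.Erie.Core"},
  {:attribute, 1, :export, [comment: 1]},
  {:function, 2, :comment, 1,
   [{:clause, 2, [{:var, 2, :_Ast}], [], [{:atom, 2, nil}]}]}
]

{:ok, mod, bin} = :compile.forms(forms)
# ~c"nofile" is the charlist the loader expects for an in-memory module.
{:module, _} = :code.load_binary(mod, ~c"nofile", bin)

# The expander can now execute the macro as a plain function call.
:"MACRO.Erie.Core".comment([:anything])
```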

Step Six

TBD.

Step Seven

The result of the call to comment is nil, so we can replace that list in the parser output with nil. We’ll also remove the macro definition, which should leave us with this.

[
  [
    {:atom, 4, :def},
    {:atom, 4, :loop},
    {:list, 4, []},
    {:atom, 5, nil}
  ]
]

Step Eight

In this case, there is nothing left to do. comment doesn’t need any further expansion, so we can continue the pipeline and pass this to the translator stage, which will produce Erlang Abstract Form for final compilation.

Proper let Syntax

Code for this post is available at e6c19d930f.

Most of the basic features of Erie are working, but I am still missing let bindings. I looked for inspiration in the syntax of other Lisps and found a couple of options. Common Lisp and Racket are technically the same because Racket treats [] as (), but I included both because Racket often uses [] to make certain forms stand out.

None of the styles strike me as particularly better. Clojure has fewer characters, but I like the way CL and Racket group the binding and the value more explicitly.

Clojure

(let [a 1 b 2]
  (+ a b))

Common Lisp

(let ((a 1) (b 2))
  (+ a b))

Racket

(let ([a 1]
      [b 2])
  (+ a b))

Regardless of the syntax, implementing let should be straightforward. Erlang Abstract Format supports match, which is very similar. In Elixir

x = 42

is represented as

{:match, 1, {:var, 1, :x}, {:integer, 1, 42}}

in EAF and could be used to represent let bindings.

But before implementing that, I found this answer on Stack Overflow, and it revealed a new way to look at let. To sum it up, let doesn’t need to be implemented in the compiler. It can be a macro that converts its forms into a tree of nested lambdas. (Samples use the currently proposed Erie let syntax.)

(let
  [a 1]
  [b 2]
  (+ a b))

is equivalent to

((lambda [a]
  ((lambda [b]
    (+ a b)) 2)) 1)
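The rewrite is mechanical, which is exactly what makes it a good candidate for a macro. A sketch of the transformation over Erie-style AST lists (a hypothetical helper; it assumes the last form is the body and everything before it is a binding pair):

```elixir
defmodule Let.Sketch do
  # [:let, [a, 1], [b, 2], body] becomes
  # [[:lambda, [a], [[:lambda, [b], body], 2]], 1]
  def expand([:let, body]), do: body

  def expand([:let, [name, value] | rest]) do
    [[:lambda, [name], expand([:let | rest])], value]
  end
end
```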

So instead of adding let as a feature in the compiler, I’ll save the syntax decision for when macros are implemented.

Erie's Function Signatures

Code for this post is available at 9e7d81b810.

I had written previously that a function in Erie might look like this. After the function name is the list of arguments and their types, followed by the return type, a doc comment, and finally the function body.

(def to_string [result: (Result String Integer)] String
  """
  Convert the result into a printable string to
  be used in interfaces.
  """
  (case result
    [{'ok int} (Integer.to_string int)]
    [{'error str} str]))

After further deliberation, I realized including the argument and return types in the definition makes things a little bit awkward if I want to support pattern matching on the arguments. Not only does it make the types redundant, but the colon syntax gets weird.

(def to_string [{'ok int}: (Result String Integer)] String
  …)

(def to_string [{'error ""}: (Result String Integer)] String
  …)

(def to_string [fallthrough: (Result String Integer)] String
  …)

Including the doc comment in the function definition also gets strange when there are multiple clauses for the function. Which of the clauses should the comment be written under?

My latest thinking is to move the types and doc comment out of the definition and into their own lines. Perhaps something like this.

(doc to_string
  """
  Convert a result type into a printable
  string to be used in interfaces.
  """)
(sig to_string (Result String Integer) String)
(def to_string [{'ok int}]
  …)

(def to_string [{'error ""}]
  …)

(def to_string [fallthrough]
  …)

Now the doc and sig lines could technically be written anywhere in the module, but the formatter can move them to the correct location.

Parsing Function Definitions

In the previous article, we converted not quite Erie code into Erlang Abstract Format that could be compiled, loaded, and run. I say “not quite” because the code was intentionally simplified to get the lexer and parser working. Now we’re going to parse real Erie code with arguments and types in the function definitions. For now, all of the type information will be ignored to get things running.

We’ll start with a 0-arity function. The empty list of arguments is represented with [], the return type with the symbol Integer, and the last element is the hardcoded return value of 1.

I’ve been using this guide from Viktor Söderqvist to help me write the tests’ expected values in Erlang Abstract Format.

test "0 arity" do
  code = """
  (defmodule Core)

  (def give_me_one [] Integer 1)
  """

  {:ok, forms} = Parser.parse(code)

  assert {:ok,
          [
            {:attribute, 1, :module, :"Erie.Core"},
            {:attribute, 1, :export, [give_me_one: 0]},
            {:function, 3, :give_me_one, 0,
              [
                {:clause, 3, [], [], [{:integer, 3, 1}]}
              ]}
          ]} == Translator.to_eaf(forms)
end

The first thing we need to do is read the name of the module. We can be naïve and expect this to always be the first line. The output from our parser will look something like this. Note the new :symbol category that didn’t exist in the previous article.

[
  [{:atom, 1, :defmodule}, {:symbol, 1, :Core}],
  [
    {:atom, 3, :def},
    {:atom, 3, :give_me_one},
    [],
    {:symbol, 3, :Integer},
    {:atom, 3, 1}
  ]
]

The new :symbol category is applied to any atom that begins with ' or a capital letter, e.g. 'my_symbol or MySymbol. To translate this code to EAF, we’ll build a specialized function that matches the entire defmodule list. We automatically prepend Erie. to the module name, similar to what Elixir does, to provide some namespacing.

def extract_module([[{:atom, _, :defmodule}, {:symbol, line, mod}] | rest]) do
  mod = ("Erie." <> Atom.to_string(mod)) |> String.to_atom()
  {:ok, mod, line, rest}
end

Once we have the module, we can translate each function. Here we’re matching on the def atom, the name of the function, an empty argument list, and ignoring the return type. We’re also storing the function name and arity in a separate structure so we can export it later to make it publicly invokable.

def translate(struct, [form | tail], ret) do
  case form do
    [{:atom, _, :def}, {:atom, line, name}, [], {:symbol, _l2, _ret_type} | body] ->
      body = translate_body(body)

      %{struct | functions: [{name, 0} | struct.functions]}
      |> translate(tail, [
        {:function, line, name, 0, [{:clause, line, [], [], body}]}
        | ret
      ])
  end
end

def translate_body(list) do
  list |> translate_body([]) |> Enum.reverse()
end

def translate_body([{:integer, line, val} | rest], accum) do
  translate_body(rest, [{:integer, line, val} | accum])
end

def translate_body([], accum) do
  accum
end

This is able to successfully translate the 0-arity function, which is great, but obviously not enough. Next we’ll translate a 1-arity function that returns the same integer that was passed as an argument.

test "1 arity" do
  code = """
  (defmodule Core)

  (def identity ['x Integer] Integer x)
  """

  {:ok, forms} = Parser.parse(code)

  assert {:ok,
          [
            {:attribute, 1, :module, :"Erie.Core"},
            {:attribute, 1, :export, [identity: 1]},
            {:function, 3, :identity, 1,
              [
                {:clause, 3, [{:var, 3, :x}], [], [{:var, 3, :x}]}
              ]}
          ]} == Translator.to_eaf(forms)
end

To handle the identity function, we need to translate the arguments and the body. Popping the first element(s) off their respective lists, converting to EAF, then rebuilding the list has been a very simple and straightforward solution.

def translate(struct, [form | tail], ret) do
  case form do
    [{:atom, _, :def}, {:atom, line, name}, params, {:symbol, _l2, _return_type} | body] ->
      body = translate_body(body)
      params = translate_params(params)
      arity = Enum.count(params)

      %{struct | functions: [{name, arity} | struct.functions]}
      |> translate(tail, [
        {:function, line, name, arity, [{:clause, line, params, [], body}]}
        | ret
      ])
  end
end

def translate_body(list) do
  list |> translate_body([]) |> Enum.reverse()
end

def translate_body([{:atom, line, val} | rest], accum) do
  translate_body(rest, [{:var, line, val} | accum])
end

def translate_body([], accum) do
  accum
end

def translate_params(list) do
  list |> translate_params([]) |> Enum.reverse()
end

def translate_params([{:symbol, line, name}, {:symbol, _, _type} | rest], accum) do
  translate_params(rest, [{:var, line, name} | accum])
end

def translate_params([], accum) do
  accum
end

With a couple of tweaks to our .yrl and .xrl files and this code, we can parse something that actually looks like Erie. It also puts us in a good place to extend the parsing code and handle more advanced scenarios.

Parsing Erie to EAF

To create a new BEAM language, there are several options for what to compile to before letting the Erlang compiler take over. Two options are Erlang Abstract Form (EAF) and Core Erlang. Elixir compiles to EAF, and Lisp Flavoured Erlang compiles to Core Erlang. The Core Erlang specification is from 2004 and seems a bit too complex. EAF is much simpler to me, as it’s a tree made from standard Erlang terms like lists and tuples.

Before getting started on the compiler, I spent time learning leex and yecc. There are some fantastic articles linked under References that helped me, and you should read them for more context on this article.

src/scanner.xrl

This is a simple leex file, but it can already tokenize a basic lisp. It’s not good enough for Erie, but it’s a good base.

It can tokenize integers and floats and treats everything else as an atom. Note how it considers commas as whitespace (stolen from Clojure).

Definitions.

Digit      = [0-9]
Alpha      = [A-Za-z_]
Whitespace = [\n\t\s,]

Rules.

\(                 : {token, {'(', TokenLine}}.
\)                 : {token, {')', TokenLine}}.
{Digit}+\.{Digit}+ : {token, {float, TokenLine, TokenChars}}.
{Digit}+           : {token, {int, TokenLine, list_to_integer(TokenChars)}}.
{Alpha}+           : {token, {atom, TokenLine, list_to_atom(TokenChars)}}.
{Whitespace}+      : skip_token.

Erlang code.

Using Elixir, we invoke the resulting scanner module. It’s able to identify each token and its category (float/int/atom). Note the call to to_charlist/1: our scanner module is Erlang and needs special treatment when called from Elixir.

iex(1)> code = """
...(1)> (defmodule Core)
...(1)>
...(1)> (def give_me_one 1)
...(1)> """
iex(2)> {:ok, tokens, _} = code |> String.to_charlist() |> :scanner.string()
{:ok,
 [
   {:"(", 1},
   {:atom, 1, :defmodule},
   {:atom, 1, :Core},
   {:")", 1},
   {:"(", 3},
   {:atom, 3, :def},
   {:atom, 3, :give_me_one},
   {:int, 3, 1},
   {:")", 3}
 ], 4}

src/parser.yrl

After tokenization we need to parse the result, using yecc, into more meaningful forms. Because our lisp is so simple, this parser doesn’t have much to do other than create lists based on the parentheses.

Note that each of the items in the Terminals list should map to the first element of the tuples produced by our scanner module.

Nonterminals list elems elem group.
Terminals '(' ')' int float atom.
Rootsymbol group.

group -> list       : ['$1'].
group -> list group : ['$1'|'$2'].

list -> '(' ')'       : [].
list -> '(' elems ')' : '$2'.

elems -> elem       : ['$1'].
elems -> elem elems : ['$1'|'$2'].

elem -> int   : '$1'.
elem -> float : '$1'.
elem -> atom  : '$1'.
elem -> list  : '$1'.

Erlang code.

The output of our parser module looks very similar to our original code.

iex(3)> {:ok, list} = :parser.parse(tokens)
{:ok,
 [
   [{:atom, 1, :defmodule}, {:atom, 1, :Core}],
   [{:atom, 3, :def}, {:atom, 3, :give_me_one}, {:int, 3, 1}]
 ]}

We still don’t have Erlang Abstract Form, but we’re ready to convert the output of the parser module into EAF.

lib/erie.ex

This function is highly customized and won’t work with any other code, but it ultimately produces enough EAF to call our lisp function from IEx.

Note the :export line. This is hardcoded, but makes the give_me_one function public and allows us to invoke it.

def translate([form | tail], ret) do
  case form do
    [{:atom, _l1, :defmodule}, {:atom, l2, mod}] ->
      translate(tail, [
        {:attribute, l2, :export, [{:give_me_one, 0}]},
        {:attribute, l2, :module, mod} | ret
      ])

    [{:atom, l1, :def}, {:atom, _l2, name}, {:int, l3, val}] ->
      translate(tail, [
        {:function, l1, name, 0,
          [{:clause, l3, [], [], [{:integer, l3, val}]}]
        } | ret])

    _ ->
      translate(tail, ret)
  end
end

Our final step is to reverse the forms, compile them, and load the resulting binary. We’re using the built-in Erlang functions :compile.forms/1 and :code.load_binary/3. Note the 'nofile' charlist. We’re back to calling Erlang from Elixir, so we need to abide.

iex(4)> forms = list |> Erie.translate([]) |> Enum.reverse()
[
  {:attribute, 1, :module, :Core},
  {:attribute, 1, :export, [give_me_one: 0]},
  {:function, 3, :give_me_one, 0, [{:clause, 3, [], [], [{:integer, 3, 1}]}]}
]
iex(5)> {:ok, mod, bin} = :compile.forms(forms)
{:ok, :Core, <<70, 79, 82, 49, 0, 0, 1, 188, 66, ...>>}
iex(6)> :code.load_binary(mod, 'nofile', bin)
{:module, :Core}

Now that our module is loaded, we can invoke our function.

iex(7)> :Core.give_me_one()
1

And with that, we have successfully tokenized, parsed, and compiled a brand new language on the BEAM.

References

New Language on the BEAM

Elixir is an enjoyable, productive language. OTP provides wonderful libraries, and the BEAM is a kick-ass virtual machine. What else would you want as an Elixir/Erlang/OTP developer?

Static type system? Gleam and Alpaca have that.

More parentheses? Lisp Flavoured Erlang has plenty of those.

Great Erlang/OTP interop? Elixir is already very good at that.

What I want is something that combines all of the above. I’ve been inspired by Racket and Clojure, but love working with OTP, so I’m going to write a new language. Because it’s for me, I’m going to name it after myself and call it Erie.

I think it might look something like this. Pardon the lack of syntax highlighting. That will come later.

(defmodule MyNs.MyModule
  [use: MyNs.MacroModule
   import: Ecto.Query
   alias: MyNs.Other.Long.ModuleName])

(deftype (Result e a)
  {'ok a}
  {'error e})

(deftype (Optional a)
  a
  nil)

(def to_string [result: (Result String Integer)] String
  (case result
    [{'ok int} (Integer.to_string int)]
    [{'error str} str]))

(def join_with_oxford_comma [list: (List String)] String
  """
  Combines `list` into a grammatically correct string.
  "one" or "one and two" or "one, two, and three".
  """
  (case list
    [[] ""]
    [[one] one]
    [[one two] (string-append one " and " two)]
    [list
      (let
        [last (Enum.at list -1)
         all-but (Enum.take list (range 0..-1))]
        (string-append (Enum.join all-but ",") " and " last))]))

A few features to note.

  • Polymorphic type system.
  • Basic types are more similar to Elixir’s (lists, tuples, maps, and strings as binaries).
  • Pattern matching will be front and center.

Everything is subject to change as it gets built, but after “writing” a few dozen lines of Erie, this seems like a good place to start.

Deploy and run a load-balanced container with `docker push`

I have some problems when running on AWS Elastic Container Service. These are not showstopper problems, but they create enough friction that there is a potential product to fix them.

The number of pieces needed to get a container running in ECS is quite high. You need an EC2 instance, an ECS cluster, an ECS service, a load balancer, a target group, then a task definition. If you do not know what all of those are, you will be studying the documentation for hours, then experimenting to get your setup correct. After all of that, hopefully you finally have a container running in your infrastructure.

Now imagine your application is written in Erlang/Elixir and you want to cluster your nodes. Clustering requires a number of different ports, but ECS will assign them random host ports (e.g. 4369 on the container may be exposed as 32567 on the host). Random host ports are not a problem when you can use an ELB and a target group, but ECS only allows one per service. Now you either need to set up your own service discovery with something like Consul, or lock yourself into one container per EC2 host by forcing the same port numbers on the host as in the container (i.e. 4369 on the container is exposed as 4369 on the host).

There must be some room for convention over configuration here. If I push a container to a docker registry, why can’t it be deployed automatically just from that? If my container exposes port 80, can’t a load balancer be set up automatically to forward port 80 traffic to my containers? How about an automatic hostname with SSL? Smart auto-scaling should be built in with no configuration needed. If I expose multiple ports in my container, how about automatic service discovery?

It is time to start interviewing others who deploy containers to see if there is a real problem here.

Elm Workshop

I am hosting a two-day workshop in Boston on May 10 and 11 for the Elm programming language.

It has been almost one year since I started producing Elmseeds. I’ve really enjoyed putting them together to try to teach people the power of Elm. The way I see it, the more people who write Elm, the more likely I am to write Elm for a living. I’ve been doing what I can to speed up that process, but I want to take it to the next level, so I’m hosting a workshop.

It is a two-day workshop, hosted in downtown Boston, with free lunch, snacks, and coffee. On the first day, we will walk through basic and intermediate topics, including the Elm language, the architecture, navigation, and HTTP requests. The second day covers advanced topics that help you get Elm into production: animations, testing, native modules, deployment, and more. By the end of the workshop, you will be prepared to return to work and seamlessly embed Elm in an existing product.

Sound interesting? See details on Eventbrite.
