about:drewcsillag

Jan 16, 2023 - 10 minute read - scaling configuration

Scaling Configuration: Object Building Languages

YAML, Go-templated YAML, HOCON, HCL(2), GCL, JSON: just some of the common languages used for configuring the world we live in, chosen because configuration needs to be human readable, not just machine readable. At first it's all fine, but then, to ensure consistency and reuse, they grow templating features, and the templating languages become more complex to handle the emerging needs as projects require more.

Some evolve to be Turing complete; some specifically eschew Turing completeness, as they need to ensure that nothing bad happens at runtime. As the templating advances, things get harder to understand and debug, as the configuration language rarely provides sufficient tooling. It can be hard to understand what the result of all the templating actually is. Eventually the templates, and sometimes their consumers, become impenetrable. That's before you get to the odd quirks some of them have around getting quoting or indentation correct as you template. Either way, as things scale up, configuration gets messy.

There’s definitely a better way.

The title spoils what I propound, but it’s Object Building Languages (OBLs). If you’ve seen CDK8s, Pystachio, Bazel (sorta, as its intermediate form isn’t really exposed), or others of the sort, you’ll have seen the concepts. For the rest, the idea is that we use a normal programming language to instantiate configuration objects, and then generate the form that’s actually used by whatever the runtime system is.

The advantages are that you can standardize things and get the modularity you need, without a poorly-bolted-on programming language to deal with. Not only that, but by picking an existing language as your base, you get all of that language’s standard tooling for debugging, editing, and so forth. Additionally, because the end system isn’t interpreting it directly, you can test the output (e.g. YAML) of the DSL to ensure that it’s actually doing what you think it is. This last one is critical when refactoring larger setups, because you can be sure that the output didn’t change.

Because it’s a real programming language, you can also add whatever validation you need, enforce any opinions your org has, and so on, rather than trying to wedge these into a conventional configuration language where they never quite seem to fit.

So when there’s an OBL for your use case, you’d be wise to consider it. But if there isn’t, or if you don’t like the OBLs you can find, writing one isn’t that hard. As an example, I’ll build a mini version of a Tekton DSL, with Python as the base, for a Task.

As to base language choice, Python has the advantage that just about everyone knows it: it’s widely used, it has a good debugger, and tooling is plentiful. TypeScript could be a good choice too, but I don’t know it well enough to be sure. Compiled languages like Go or Rust could probably work as well, but the overhead there is probably more than it’s worth.

In any case, the DSL we’re building looks like this:

build = Step(
    name = "build",
    image = "ubuntu:20.04",
    script = """
git clone git@github.com:foo/bar
cd bar
go build
""")

Task(
    name = "build-only",
    description = "",
    steps = [build],
)

That would generate our desired output:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: build-only
spec:
  steps:
  - name: build
    image: ubuntu:20.04
    script: |2

      git clone git@github.com:foo/bar
      cd bar
      go build

Actually, for our Step here, I’d really like to make a standard way of making steps, as they should all use ubuntu:20.04:

def stdstep(name, script):
    return Step(name = name,
        image = "ubuntu:20.04",
        script = script)

build = stdstep(
    name = "build",
    script = """
git clone git@github.com:foo/bar
cd bar
go build
"""
)

A little overkill for our case, but illustrates the point.

Back to the implementation: it’s evident we need these classes:

from ruamel.yaml import YAML, RoundTripRepresenter
import sys

tasks = {}
class Task:
    def __init__(self, name, description, steps):
        self.name = name
        self.description = description
        self.steps = steps
        tasks[self.name] = self

class Step:
    def __init__(self, name, image, script):
        self.name = name
        self.image = image
        self.script = script

The tasks dictionary might be a bit unexpected, but you need a way to figure out what to ultimately output. You could take the Bazel approach and do everything by name, and it wouldn’t be wrong, but it is more complex, as you may have to deal with namespacing and other issues that Python’s own code namespacing handles all by itself.

With those in place, it’s just a matter of executing the configuration script:

ns = {
    "Task": Task,
    "Step": Step,
}

f = open(sys.argv[1]).read()
# compile it so filename info will be retained and reported in the event of an error
co = compile(f, sys.argv[1], 'exec')
exec(co, ns)
vs = [v.dict() for v in tasks.values()]
y.dump_all(vs, stream=sys.stdout) # y is a configured ruamel YAML() instance; see the full listing

For the simple case, that’s all you need: run your DSL runner against the configuration file and voilà, properly formatted YAML. Well, not quite: we never implemented the dict() methods it calls.

class Task:
    ...
    def dict(self):
        return {
            "apiVersion": "tekton.dev/v1beta1",
            "kind": "Task",
            "metadata": {"name": self.name},
            "spec": {
                "steps": [s.dict() for s in self.steps]
            }
        }

class Step:
    ...
    def dict(self):
        return {
            "name": self.name,
            "image": self.image,
            "script": self.script
        }

OK, now you can run the dslrunner and get output. The downside for you as the DSL maintainer is that you need to make sure your DSL exposes the necessary parts of the underlying YAML schema, but it’s only occasional toil after initially writing it, as your users will let you know when there’s something they need that you don’t expose. Or they may shoot you a PR, because looking at the runner, it’s very clear what needs to happen. You could cheese out by adding a **kwargs to the constructors and merging it into the dict() output, but I don’t recommend it: it lets people specify incorrect attributes on the objects and nothing will tell them.
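To make the refactoring-safety point from earlier concrete, here’s a sketch of the kind of golden test this enables. The Step class is repeated inline so the snippet stands alone, and the expected dict plays the role of a checked-in golden file; in a real setup you’d run dslrunner.py and diff its full YAML output instead.

```python
# golden-test sketch: build the objects, compare against a checked-in
# expectation; any refactor that changes the output fails the test
class Step:
    def __init__(self, name, image, script):
        self.name, self.image, self.script = name, image, script

    def dict(self):
        return {"name": self.name, "image": self.image, "script": self.script}

build = Step(name="build", image="ubuntu:20.04", script="go build\n")

# the "golden" value; normally this lives in a file next to the DSL source
expected = {"name": "build", "image": "ubuntu:20.04", "script": "go build\n"}
assert build.dict() == expected
```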

If you have some sort of schema file for your output form, it can be a decent idea to write something to translate from the schema to the Python classes you’d require, rather than having to write these out by hand.
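As a minimal sketch of what such a generator’s output machinery could look like: make_config_class and the Param fields below are hypothetical, standing in for whatever your schema actually contains.

```python
# hypothetical: build a config class from a (name, fields) pair pulled out
# of a schema file, instead of hand-writing each class
def make_config_class(name, fields):
    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(fields)
        if unknown:
            # reject attributes the schema doesn't know about
            raise TypeError(f"{name}: unknown fields {sorted(unknown)}")
        for field in fields:
            setattr(self, field, kwargs.get(field))

    def to_dict(self):
        return {field: getattr(self, field) for field in fields}

    return type(name, (), {"__init__": __init__, "dict": to_dict})

# e.g. a schema entry for a parameter might generate:
Param = make_config_class("Param", ["name", "type", "default"])
```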

First-time-right considerations

If you choose Python as your base DSL, there are some things you should consider at the beginning, as tightening this kind of thing down after the fact can be difficult/practically impossible:

Overriding the __import__ hook

Allowing arbitrary imports can lead to … unexpected behavior, especially depending on exactly when evaluation of the DSL occurs vs. when its output is consumed. So you may want to turn off import entirely, or alternatively have an allowlist of things users are allowed to import, only adding to it when you’ve considered the use case. Another way to deal with this is to import the modules you wish to expose and add them to the top-level namespace, or to provide an import-like function that hands out modules the runner has already imported.
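A sketch of the allowlist variant, done by swapping __import__ in the builtins handed to DSL scripts; the ALLOWED_MODULES set is an assumption, to be grown case by case.

```python
# a sketch of an import allowlist for DSL scripts
import builtins

ALLOWED_MODULES = {"math", "re"}  # assumed starting set; expand deliberately

def restricted_import(name, globals=None, locals=None, fromlist=(), level=0):
    if name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"import of {name!r} is not allowed in DSL files")
    return builtins.__import__(name, globals, locals, fromlist, level)

# hand DSL scripts a copy of the builtins with the import hook swapped out
safe_builtins = dict(vars(builtins))
safe_builtins["__import__"] = restricted_import
ns = {"__builtins__": safe_builtins}  # plus Task, Step, load, etc.
```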

If you don’t allow import, you’ll still need a way to include things from other files if you want the templating and consistency that caused you to set out to do this in the first place. Because you can control the namespace that the DSL scripts run in, you could implement something like this:

def make_ns(): 
    new_ns = {
        "Task": Task,
        "Step": Step,
    }
    new_ns['load'] = lambda spec, *names: load(new_ns, spec, names)

    return new_ns

ns = make_ns() # the top level DSL script namespace

def load(ns, some_kind_of_path_specifier, list_of_names):
    # get script_text however you choose
    script_text = get_the_script(some_kind_of_path_specifier)
    load_ns = make_ns() # make a new namespace
    co = compile(script_text, some_kind_of_path_specifier, 'exec')
    exec(co, load_ns) # exec the script in the new namespace
    # extract the names from the loaded namespace into the caller's namespace
    for name in list_of_names:
        ns[name] = load_ns[name]

This would enable us to put stdstep into a separate file. For the current example, we’ll put it in common.in, and then in the top level DSL file,

load("common.in", "stdstep")

and the rest continues as before. Not shown in the example, but you’d want loads to be relative to the file doing the loading, so you’d need to add bookkeeping to track that and do the right thing.

Depending on your environment, you might want something more than just a file path as the path specifier. For example, / might not be / in the filesystem, but relative to some other root. You could have it be a URL, or even use Bazel-style targets as the specifier, as long as you know how to retrieve the configuration stored there.

Consider what’s in __builtins__

The __builtins__ namespace has some things in it you might want to disallow as well. open, compile, eval, and exec should be at the top of your list, but examine the rest to see what else to consider. Things that reach outside the interpreter are the ones to be wary of.
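One way to prune it is to hand DSL scripts a filtered copy of the builtins; the BLOCKED set below is a starting point, not a complete audit.

```python
import builtins

# names that reach outside the interpreter; extend after your own review
BLOCKED = {"open", "compile", "eval", "exec", "input", "breakpoint", "__import__"}

safe_builtins = {k: v for k, v in vars(builtins).items() if k not in BLOCKED}
ns = {"__builtins__": safe_builtins}  # plus Task, Step, load, etc.
```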

How Paranoid?

You could also consider disallowing while loops, try/except, or other constructs, if you were so inclined, by examining the AST after compiling the source files. The ast module can help here.
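A sketch of such a check; which node types to forbid is purely a policy choice, and you’d run this on each script before execing it.

```python
import ast

# map forbidden AST node types to a human-readable label
FORBIDDEN = {ast.While: "while loops", ast.Try: "try/except"}

def check_policy(source, filename):
    tree = ast.parse(source, filename)
    for node in ast.walk(tree):
        for node_type, label in FORBIDDEN.items():
            if isinstance(node, node_type):
                raise SyntaxError(
                    f"{label} are not allowed ({filename}, line {node.lineno})")
```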

Mistakes

Obviously, people make mistakes. Something you can add to catch some of them is argument validation in the object constructors. Validation errors are one kind, but syntax errors are another. One nice thing about this implementation is that you get usable errors for the mere cost of explicitly compiling the script before execing it.
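For instance, the None check on Task names (the one that produces the error shown further down) is just a guard clause at the top of the constructor:

```python
tasks = {}

class Task:
    def __init__(self, name, description, steps):
        # validate up front, so the traceback points at the offending DSL line
        if name is None:
            raise ValueError("name cannot be None")
        self.name = name
        self.description = description
        self.steps = steps
        tasks[self.name] = self
```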

One example: a syntax error in a loaded file:

Traceback (most recent call last):
  File "/Users/drew/devel/dsl-py/dslrunner.py", line 63, in <module>
    exec(co, ns)
  File "DSL3.in", line 3, in <module>
    load("common.in", "stdstep")
  File "/Users/drew/devel/dsl-py/dslrunner.py", line 46, in <lambda>
    new_ns['load'] = lambda spec, names: load(new_ns, spec, names)
  File "/Users/drew/devel/dsl-py/dslrunner.py", line 53, in load
    co = compile(script_text, path, 'exec')
  File "common.in", line 2
    Step(name = name,
        ^
SyntaxError: '(' was never closed

Another example, if I put a validation in the Task constructor to check that the name isn’t None and I change the input to pass None for the name, you get a usable error:

Traceback (most recent call last):
  File "/Users/drew/devel/dsl-py/dslrunner.py", line 63, in <module>
    exec(co, ns)
  File "DSL3.in", line 20, in <module>
    Task(
  File "/Users/drew/devel/dsl-py/dslrunner.py", line 11, in __init__
    raise ValueError("name cannot be None")
ValueError: name cannot be None

While there is some noise from the dslrunner in the tracebacks, the error points in the DSL code are clearly delineated.

As things get built out, you might consider a traceback filter to strip out the noisy parts and make things easier for your users.
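A sketch of such a filter, assuming the runner lives in dslrunner.py: drop any frame whose file is the runner itself and reformat the rest.

```python
import traceback

def format_dsl_traceback(exc, runner_file="dslrunner.py"):
    # keep only frames that come from the user's DSL files
    frames = [f for f in traceback.extract_tb(exc.__traceback__)
              if runner_file not in f.filename]
    lines = ["Traceback (most recent call last):\n"]
    lines += traceback.format_list(frames)
    lines.append(f"{type(exc).__name__}: {exc}\n")
    return "".join(lines)
```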

Summary

As configuration languages evolve, they don’t evolve especially well. They grow into complexity, they’re hard to refactor safely, and they get hard to debug because they lack the tooling that mainstream languages like Python have. With a little code, most of it the objects you expose, you can write a flexible DSL preprocessor that evolves much better, because you have the language’s tooling available to help.

The full dslrunner.py from this post:

from ruamel.yaml import YAML, RoundTripRepresenter
import sys

tasks = {}

class Task:
    def __init__(self, name, description, steps) -> None:
        self.name = name
        self.description = description
        self.steps = steps
        tasks[self.name] = self

    def dict(self):
        return {
            "apiVersion": "tekton.dev/v1beta1",
            "kind": "Task",
            "metadata": {"name": self.name},
            "spec": {
                "steps": [s.dict() for s in self.steps]
            }
        }

class Step:
    def __init__(self, name, image, script) -> None:
        self.name = name
        self.image = image
        self.script = script

    def dict(self):
        return {
            "name": self.name,
            "image": self.image,
            "script": self.script
        }

def make_ns(): 
    new_ns = {
        "Task": Task,
        "Step": Step,
    }
    new_ns['load'] = lambda spec, *names: load(new_ns, spec, names)

    return new_ns

def load(ns, path, list_of_names):
    script_text = open(path).read() #this is whatever you choose
    load_ns = make_ns() # make a new namespace
    co = compile(script_text, path, 'exec')
    exec(co, load_ns) #exec the script in the new namespace
    for name in list_of_names: #extract the names from the loaded namespace into the top level namespace
        ns[name] = load_ns[name]

ns = make_ns()

f = open(sys.argv[1]).read()
# compile it so filename info will be retained in the event of error
co = compile(f, sys.argv[1], 'exec')
exec(co, ns)

vs = [v.dict() for v in tasks.values()]

# make it so the YAML output looks closer to the way a human would write it
def repr_str(dumper: RoundTripRepresenter, data: str):
    if '\n' in data:
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

y = YAML()
y.representer.add_representer(str, repr_str)

y.dump_all(vs, stream = sys.stdout)