Container Reuse

Overview

Conducto nodes run your commands in containers. This page shows how to control which nodes reuse containers and which nodes get fresh ones. It mostly comes down to container_reuse_context, which is a node parameter.

If your commands don't make filesystem changes, then you don't need to care about this. To learn how to pass data between nodes without relying on a shared filesystem, check out data stores instead.

The Default Strategy: Global Reuse

Reusing containers is faster than creating new ones. So unless you tell it otherwise, Conducto will reuse containers from elsewhere in the pipeline.

This strategy is global because it will reuse a container from anywhere in the pipeline. By contrast, the local reuse strategy limits container reuse to certain parts of your pipeline.

Global Reuse isn't Strict

Global reuse will not delay creating a container in hopes of using fewer. The number of containers that this strategy creates depends on:

  • When is a container needed?
  • When do containers become available?
  • Are existing containers compatible with needed ones?
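To make those three questions concrete, here is a toy model in plain Python. It is only a sketch and not how Conducto schedules containers: the function name count_containers, the job tuples, and the image labels are all made up for illustration. It assumes a job reuses the first idle container with a matching image, and otherwise creates a new one right away.

```python
def count_containers(jobs):
    """Count containers created under greedy, non-strict reuse.

    jobs: iterable of (start, end, image) tuples (hypothetical model).
    """
    containers = []  # each entry: {"image": str, "free_at": number}
    for start, end, image in sorted(jobs):
        for c in containers:
            # Reuse only if the container is idle and compatible.
            if c["image"] == image and c["free_at"] <= start:
                c["free_at"] = end
                break
        else:
            # Nothing suitable is available, so create one without waiting.
            containers.append({"image": image, "free_at": end})
    return len(containers)


back_to_back = [(0, 1, "py"), (1, 2, "py"), (2, 3, "py")]
overlapping = [(0, 2, "py"), (1, 3, "py"), (2, 4, "py")]
mixed_images = [(0, 1, "py"), (1, 2, "node"), (2, 3, "py")]

print(count_containers(back_to_back))  # 1
print(count_containers(overlapping))   # 2
print(count_containers(mixed_images))  # 2
```

The same three jobs need one container or two depending only on timing and compatibility, which is why the count is hard to predict by hand.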

If you find yourself thinking too hard about how global reuse will happen, it's probably time to either stop relying on container reuse at all, or switch to controlling container reuse manually.

Same File, Different Node

Global reuse is simple when you have a Serial Node with children that have identical parameters. When one node completes, its container becomes available for the next node.

In the example below the nodes named 1, 2, and 3 each increment a number in a file, and the nodes named ==3? test to see if the number is 3.

Two serial node parents, both running the same test--the second one fails

If a node will use the local reuse strategy, it gets an icon to the left of its state indicator.

first unset/==3? succeeds because the initial container has been reused twice by the time it runs:

  • 1 sets it to 1
  • 2 increments to 2
  • 3 increments to 3
  • ==3? checks that it's 3

The second batch continues to reuse the original container, so it starts at 3 and is incremented three more times by the time second unset/==3? runs.

  • 1 increments to 4
  • 2 increments to 5
  • 3 increments to 6
  • ==3? is surprised that it's not 3
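The two batches can be replayed without Conducto at all. In this sketch a dict stands in for the container's filesystem and run_batch is a hypothetical helper that plays the roles of nodes 1, 2, 3 and ==3?. It only models the counter file, not how Conducto runs commands.

```python
def run_batch(fs):
    """Simulate nodes 1, 2, 3 and ==3? against one container filesystem."""
    for _ in range(3):
        # Each increment node bumps the counter file, creating it if missing.
        fs["counter"] = fs.get("counter", 0) + 1
    # The ==3? node passes only if the counter is exactly 3.
    return fs.get("counter") == 3


shared_container = {}  # one container reused by every node in the pipeline

first_passed = run_batch(shared_container)   # counter goes 1, 2, 3
second_passed = run_batch(shared_container)  # counter goes 4, 5, 6

print(first_passed, second_passed)  # True False
```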

Since all nodes are in the same container reuse context (i.e. the global one) we can visualize this as a single unbroken tree.

The pipeline tree as a single reuse context

In order to make the second test pass, let's tell Conducto to use exactly one container per batch.

import conducto as co
CRC = co.ContainerReuseContext

with co.Serial() as root:
    with co.Serial(name="first", container_reuse_context=CRC.NEW):
        nodes(3) # adds the three increment nodes and the test node
    with co.Serial(name="second", container_reuse_context=CRC.NEW):
        nodes(3)

Container Reuse Contexts

As we saw in pipeline structure and controlling a pipeline, node parameters like image or cpu are inherited from a node's parent if not explicitly set. container_reuse_context is different from these because its value is not inherited.

Setting container_reuse_context on one node sets aside one reuse context. A node's children don't inherit the value because if they did, setting it once would create many reuse contexts.

Setting it to 'new' on a parent creates a local container reuse context among its children. That is, all of those children must use a single container, and that container won't be reused elsewhere.

The pipeline tree, segmented into two reuse contexts

The dotted lines in the diagram above mean that the two subtrees form separate container reuse contexts. The links to the root are used for other purposes, but not for container sharing.

In the Conducto web app, a colored square icon to the left of the node state column indicates the root of a container reuse context. The darkened square icons go with children that are inheriting the local container reuse strategy from such a parent.

Two serial node parents, both running the same test--both pass

You should put nodes in separate reuse contexts if it's important that their filesystems not interact. In this case each batch gets its own container, so the counter starts at 1 in both batches.

If you want to disallow reuse entirely, you can set container_reuse_context to 'new' on your Exec nodes. This will put each node in its own reuse context--which will mean that every command gets a fresh container.

Separate reuse contexts

Previously, ==3? failed because there was too much container reuse. This time it fails because there was not enough.

The pipeline tree, segmented into four reuse contexts

The incremented file doesn't appear in the container that runs the test command, since each container is new.
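Here is the same idea as a small plain-Python sketch, again using dicts as stand-ins for container filesystems. The helper run_nodes is hypothetical and only models the counter file. It compares one container per batch with one container per node.

```python
def run_nodes(containers):
    """Run nodes 1, 2, 3 and ==3?, one container (dict) per node, in order."""
    filesystems = iter(containers)
    for _ in range(3):
        fs = next(filesystems)
        # Increment the counter file in whichever container this node got.
        fs["counter"] = fs.get("counter", 0) + 1
    # The ==3? node looks for the file in its own container.
    return next(filesystems).get("counter") == 3


batch_container = {}
per_batch = run_nodes([batch_container] * 4)  # all four nodes share one container
per_node = run_nodes([{}, {}, {}, {}])        # every node gets a fresh container

print(per_batch, per_node)  # True False
```

With a fresh container per node, the test never sees the file that the increment nodes wrote.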

Local Reuse is Strict

The local reuse strategy will delay creating a container until one becomes available. This means that Parallel nodes in a local container reuse context might as well be Serial nodes.
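One way to picture this is as a pool with a cap on the number of containers. The sketch below is a toy model, not Conducto's scheduler: schedule and its limit argument are made-up names, and it assumes every job is ready at time 0. A local context behaves like a cap of one.

```python
import heapq


def schedule(durations, limit):
    """Return (containers_used, finish_time) for jobs that are all ready at t=0."""
    free_at = []  # min-heap of times at which each container goes idle
    for d in durations:
        if len(free_at) < limit:
            # Under the cap: start the job immediately in a new container.
            heapq.heappush(free_at, d)
        else:
            # At the cap: wait for the earliest container to become idle.
            earliest = heapq.heappop(free_at)
            heapq.heappush(free_at, earliest + d)
    return len(free_at), max(free_at, default=0)


durations = [2, 3, 1]
print(schedule(durations, limit=1))               # (1, 6)
print(schedule(durations, limit=len(durations)))  # (3, 3)
```

With a cap of one, the three "parallel" jobs finish at the same time they would if they ran serially.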

The root of each local container reuse context is special. It determines the parameters for all other nodes in the context. If you try to override them, your overrides will be ignored. This ensures that a single container can indeed be used by each node in the context.

Escaping to Global

You can have as many local reuse contexts as you like, but a pipeline only has one global reuse context. It is possible to place nodes in a global context even if their parents are in a local context. This is done by setting container_reuse_context to 'global'.

In the pipeline definition, it looks like this:

with co.Parallel():
    nodes(n) # global by default
    with co.Serial(name="local", container_reuse_context=CRC.NEW):
        nodes(n) # local by inheritance
        with co.Serial(name="nested global", container_reuse_context=CRC.GLOBAL):
            nodes(n) # global by inheritance

In the Conducto web app you can see that the local reuse context indicators don't appear for the final batch, even though it has a grandparent with container_reuse_context set to 'new'. This is because the node with container_reuse_context set to 'global' is the nearer ancestor. So the last four Exec nodes end up back in the global context, just like the first four.
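The "nearest ancestor wins" rule can be sketched in a few lines of plain Python. This is a model of the behavior described on this page, not Conducto's code: the Node class, its crc field, and reuse_context are hypothetical names, and the tree mirrors only one Exec node from each batch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    crc: Optional[str] = None  # "new", "global", or None when unset
    parent: Optional["Node"] = None


def reuse_context(node):
    """Walk up from the node to the nearest ancestor that sets crc."""
    current = node
    while current is not None:
        if current.crc is not None:
            # "new" roots a local context; "global" escapes back to the global one.
            return current.name if current.crc == "new" else "global"
        current = current.parent
    return "global"  # nothing set anywhere: the default strategy


root = Node("root")
top_exec = Node("1", parent=root)
local = Node("local", crc="new", parent=root)
local_exec = Node("local/1", parent=local)
nested = Node("nested global", crc="global", parent=local)
nested_exec = Node("nested global/1", parent=nested)

for n in (top_exec, local_exec, nested_exec):
    print(n.name, "->", reuse_context(n))
```

The Exec node under "nested global" resolves to the global context even though "local" sits above it, because the walk stops at the first ancestor with an explicit setting.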

Children of a nested global context reuse containers from those near the root

In the example above, the first node in the nested global context turned up a 2. This is because it reused a container from one of the nodes near the root.

The section in the middle, on the other hand, was isolated in a local container reuse context, which is why its test passed.

Parallelism

One final comment about the previous example: The first test failed. This happened because the root was a Parallel node, so it didn't wait for node '1' to finish before creating containers for nodes '2', '3' and '==3?'.

This example shows why it's a bad idea to run commands in parallel that depend on each other. Since you can't control their order, you might get unpredictable results.

Conclusion

When it comes to sharing data between nodes, you've got options. By default, Conducto will do its best to speed things up by reusing containers where it can. If you don't rely on the filesystem for communicating between nodes, this might be ideal. But if it's not what you want, you can change it by segmenting your pipeline into subtrees and changing how container reuse works in each subtree.

Armed with these capabilities, you should now have complete control over the filesystem that your commands run with. Also, now that you understand how containers are created in the context of a pipeline, you're ready to start asking Conducto to create standalone containers for a given node, which is how you debug a Conducto pipeline.
