Safe cleanup after experiment deletion or error

lmcc-lab · Jul 8, 2024

Hi all,

Consider this post a general request for advice on handling errors and experiment deletions. We run an ion trap, and when writing new experiments we sometimes have exceptions raised. This can be problematic if the exception is raised at an inconvenient time, such as when the ion is pumped into a dark state; depending on how long it takes us to reset everything, the ion may heat up and be ejected. To help avoid this, I would like a code structure that handles exceptions (and exceptionless deletions) by running some code that brings the hardware into a better state before the exception is raised.

My first attempt at this was to write experiments as a separate class from the runtime experiment, where the runtime experiment handles the exceptions. For example:

from artiq.experiment import *
from artiq_common_methods.experiments import Example, Termination

class GracefulTermination(EnvExperiment):

    def build(self):
        self.experiment = Example(self)
        self.setattr_device("scheduler")
        self.termination = Termination(self)

    def run(self):
        try:
            self.experiment.run()
        except Exception as e:
            self.termination.yb171(e)
        except SystemExit as e:
            self.termination.yb171(e)

In this example, GracefulTermination is the experiment that is run, but its run method calls self.experiment.run(). It then just has try/except checks to handle the exceptions. If an exception is raised, it calls a termination method that sets the hardware so that the ion remains stable after the error, and then raises the exception.
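
A minimal sketch of this cleanup-then-re-raise pattern in plain Python, so that it runs without ARTIQ or hardware. The names run_guarded and make_safe are hypothetical and not part of ARTIQ; make_safe stands in for the termination method above. Catching BaseException covers both Exception and SystemExit in a single clause, and a bare raise propagates the original exception with its traceback intact.

```python
def run_guarded(run, make_safe):
    """Call run(); if it raises anything, call make_safe() and re-raise.

    Both arguments are hypothetical callables used for illustration only.
    """
    try:
        run()
    except BaseException:
        # BaseException includes Exception, SystemExit and KeyboardInterrupt.
        make_safe()
        raise  # propagate the original exception unchanged


if __name__ == "__main__":
    log = []

    def failing_run():
        log.append("pumped to dark state")
        raise RuntimeError("something went wrong mid-sequence")

    try:
        run_guarded(failing_run, lambda: log.append("hardware made safe"))
    except RuntimeError as exc:
        log.append("re-raised: " + str(exc))

    print(log)
```

Note that the cleanup here only runs when an exception actually reaches the wrapper, which is exactly the limitation described next.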

The current issue with this implementation is that it doesn't handle exceptionless deletions, where you use the scheduler to delete an experiment and it does so successfully without raising any errors. In this case the hardware may still be left in an undesirable state. I don't know of a good way to intercept this and handle it correctly.

I'd also like to know if there is a better way of doing this. Is there a method I can write that would override a parent method, similar to what we do with run and EnvExperiment (from what I've seen there isn't)? Is there a way of having the scheduler handle it?

Kind regards,

Liam

fsagbuya · Jul 9, 2024

Hi, lmcc-lab,

Your current approach with try-except is a good start for catching most exceptions and running cleanup code. In addition, you can use scheduler.request_termination() and the TerminationRequested exception to terminate your experiment gracefully. AFAIK, there's currently no built-in way to intercept exceptionless deletions in ARTIQ: the scheduler deletes the experiment without raising an exception. To address this, here are a few options:

If you are manually deleting an experiment, you can always perform a cleanup before that.
Implementing periodic state checks to verify the hardware state and clean up if necessary.
Modifying some parts of the ARTIQ codebase to implement a cleanup callback or add an exception in the scheduler's deleter.
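
The graceful-termination route mentioned above can be sketched roughly as follows. In a real experiment the scheduler comes from self.setattr_device("scheduler"), and as far as I understand its check_pause() returns True when the experiment should yield or stop, while pause() raises TerminationRequested if termination was requested. To keep the sketch runnable without ARTIQ, the TerminationRequested class and FakeScheduler below are simplified stand-ins, and run_shots, do_shot and make_safe are hypothetical names. This only helps with "request termination"; it does not cover a forced delete.

```python
class TerminationRequested(Exception):
    """Stand-in for artiq.experiment.TerminationRequested (assumption:
    defined locally only so the sketch runs without ARTIQ installed)."""


class FakeScheduler:
    """Hypothetical, simplified stand-in for the ARTIQ scheduler device.

    It pretends that termination is requested at the given check number.
    """

    def __init__(self, terminate_at_check):
        self.checks = 0
        self.terminate_at_check = terminate_at_check

    def check_pause(self):
        # True once the experiment is expected to yield or terminate.
        self.checks += 1
        return self.checks >= self.terminate_at_check

    def pause(self):
        # Simplification: here pause() only models the termination case.
        if self.checks >= self.terminate_at_check:
            raise TerminationRequested


def run_shots(scheduler, n_shots, do_shot, make_safe):
    """Run up to n_shots, polling the scheduler between shots.

    make_safe() runs whether the loop finishes, fails, or is terminated.
    """
    try:
        for _ in range(n_shots):
            if scheduler.check_pause():
                scheduler.pause()  # raises TerminationRequested on termination
            do_shot()
    finally:
        make_safe()


if __name__ == "__main__":
    log = []
    try:
        run_shots(FakeScheduler(terminate_at_check=3), 10,
                  lambda: log.append("shot"),
                  lambda: log.append("hardware made safe"))
    except TerminationRequested:
        log.append("terminated gracefully")
    print(log)
```

Polling between shots means the experiment only stops at points where the ion is in a known state, at the cost of a termination latency of up to one shot.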

sb10q · Jul 9, 2024

See also idle kernels.

lmcc-lab · Jul 10, 2024

sb10q Idle kernels seem like a good way of doing this (if I understand what it's doing). From the documentation, it looks like it's a way of specifying the state of the hardware "whenever it is not connected to the host via Ethernet".

If I have artiq_master running, would this mean that the host is always connected via Ethernet, and so the idle kernel wouldn't be used? Or is it that the idle kernel is used whenever an experiment isn't being run?

I guess I'd like to understand a basic example of how the idle kernel could be used. For example, could an idle experiment be

from artiq.experiment import *
from artiq_common_methods.experiments import IdleExperiment

class IdleState(EnvExperiment):

    def build(self):
        self.setattr_device('core')
        self.experiment = IdleExperiment(self)
    
    def prepare(self):
        self.experiment.ion_trapped = self.get_dataset('ion_trapped')

    @kernel
    def run(self):
        self.core.reset()
        self.experiment.run()

Here IdleExperiment contains methods that will be run depending on whether the ion is trapped (according to the docs, these methods need to run on the kernel only). So when an experiment has finished running, would this experiment be run?

As a simple example instead, if we had an idle experiment like so

from artiq.experiment import *

class ExampleIdle(EnvExperiment):
    
    def build(self):
        self.setattr_device('core')
        self.setattr_device("led")

    @kernel
    def run(self):
        self.core.reset()
        self.led.on()

then would the LED come on after every experiment?

lmcc-lab · Jul 10, 2024

fsagbuya Thank you for your recommendations. Your first point

fsagbuya you can always perform a cleanup before that.

I'm not sure what you mean by that. Would you be able to give an example?

fsagbuya Modifying some parts of the ARTIQ codebase to implement a cleanup callback or add an exception in the scheduler's deleter.

This might be an interesting way of handling it (especially if I just say "this is how we are going to write experiments from now on"). I'll see if idle kernels are what I want first, though.

architeuthis · Jul 10, 2024

Idle kernels seem like a good way of doing this (if I understand what it's doing). From the documentation, it looks like it's a way of specifying the state of the hardware "whenever it is not connected to the host via Ethernet".

I believe they run automatically (and loop) while no other kernel is running or scheduled, regardless of whether or not the master is connected. That sentence in the manual should probably be changed...

I guess I'd like to understand a basic example of how the idle kernel could be used.

The archetypal example idle kernel is artiq/examples/kasli/idle_kernel.py.