Hi,

I've run into some issues when trying to ramp the amplitude of a laser controlled by a SUServo. To get a controlled slope I submit a relatively large number of steps, but beyond a certain number of steps all the slack is suddenly consumed.

Having investigated further, I've distilled it down to a simple example of just a train of N TTL pulses (see below). There are a couple of different behaviors I don't understand.

  1. Why is the slack suddenly consumed for large N?

This graph shows the consumed slack as a function of the number of pulses, with a jump at 65 pulses.

  2. Why does the critical value of N vary erratically with the number of TTLs being toggled?

1 TTL: 65
2 TTLs: 257
3 TTLs: 129
4 TTLs: 129
5 TTLs: 65
6 TTLs: 86

Ndscan:

from artiq.coredevice.core import Core
from artiq.coredevice.ttl import TTLInOut
from artiq.experiment import *
from ndscan.experiment import *


class ElapsedTimer(ExpFragment):

    def build_fragment(self):
        self.setattr_device("core")
        self.core: Core

        self.ttl3: TTLInOut = self.get_device("ttl3")
        self.ttl5: TTLInOut = self.get_device("ttl5")
        self.ttl6: TTLInOut = self.get_device("ttl6")
        self.ttl7: TTLInOut = self.get_device("ttl7")
        self.ttl9: TTLInOut = self.get_device("ttl9")
        self.ttl10: TTLInOut = self.get_device("ttl10")
        self.ttl11: TTLInOut = self.get_device("ttl11")

        self.setattr_result("clock_time")

        self.setattr_param("num_pulses", IntParam, "num_pulses", 1000)
        self.num_pulses: IntParam

    @kernel
    def run_once(self):
        self.core.reset()
        delay(1 * s)

        start_rtio = self.core.get_rtio_counter_mu()
        start_slack = now_mu() - start_rtio

        for _ in range(self.num_pulses.get()):
            with parallel:
                # self.ttl3.pulse(duration=0.1 * ms)
                # self.ttl5.pulse(duration=0.1 * ms)
                # self.ttl6.pulse(duration=0.1 * ms)
                # self.ttl7.pulse(duration=0.1 * ms)
                self.ttl9.pulse(duration=0.1 * ms)
                self.ttl10.pulse(duration=0.1 * ms)
                self.ttl11.pulse(duration=0.1 * ms)
            delay(0.1 * ms)

        end_rtio = self.core.get_rtio_counter_mu()
        end_slack = now_mu() - end_rtio

        print(
            "NDSCAN ELAPSED TIMER:",
            self.num_pulses.get(),
            "\nelapsed counter time",
            self.core.mu_to_seconds(end_rtio - start_rtio) * 1000,
            "ms\nelapsed slack time",
            self.core.mu_to_seconds(end_slack - start_slack) * 1000,
            "ms\nstart slack",
            self.core.mu_to_seconds(start_slack) * 1000,
            "ms\nend slack",
            self.core.mu_to_seconds(end_slack) * 1000,
            "ms",
        )

        self.clock_time.push(self.core.mu_to_seconds(end_rtio - start_rtio))

    def get_default_analyses(self):
        return [
            OnlineFit(
                "line",
                data={
                    "x": self.num_pulses,
                    "y": self.clock_time,
                },
            ),
        ]


ScanElapsedTime = make_fragment_scan_exp(ElapsedTimer)

Pure Artiq:

from artiq.coredevice.core import Core
from artiq.coredevice.ttl import TTLInOut
from artiq.experiment import *


class ElapsedTimer(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.core: Core

        self.ttl10: TTLInOut = self.get_device("ttl10")
        self.ttl11: TTLInOut = self.get_device("ttl11")
        self.ttl9: TTLInOut = self.get_device("ttl9")
        self.ttl3: TTLInOut = self.get_device("ttl3")

        self.setattr_argument(
            "num_pulses",
            NumberValue(
                default=1000,
                precision=0,
                scale=1,
                step=1,
                min=1,
                max=10000,
                type="int",
            ),
            tooltip="Number of pulses to send",
        )

    @kernel
    def run(self):
        self.core.reset()
        delay(1 * s)

        start_rtio = self.core.get_rtio_counter_mu()
        start_slack = now_mu() - start_rtio

        for _ in range(self.num_pulses):
            with parallel:
                self.ttl3.pulse(duration=0.1 * ms)
                self.ttl9.pulse(duration=0.1 * ms)
                self.ttl10.pulse(duration=0.1 * ms)
                self.ttl11.pulse(duration=0.1 * ms)
            delay(0.1 * ms)

        end_rtio = self.core.get_rtio_counter_mu()
        end_slack = now_mu() - end_rtio

        print(
            "ELAPSED TIMER:",
            self.num_pulses,
            "\nelapsed counter time",
            self.core.mu_to_seconds(end_rtio - start_rtio) * 1000,
            "ms\nelapsed slack time",
            self.core.mu_to_seconds(end_slack - start_slack) * 1000,
            "ms\nstart slack",
            self.core.mu_to_seconds(start_slack) * 1000,
            "ms\nend slack",
            self.core.mu_to_seconds(end_slack) * 1000,
            "ms",
        )

    hk99 Why is the slack suddenly consumed for large N?

    SED lane capacity. I suggest you read about the RTIO event scheduling architecture here. But here are the main points:

    1. The depth of each FIFO buffer is 128 by default, and each pulse generates 2 events (a rising and a falling edge).
    2. If you run out of FIFO space, the kernel stalls and waits until space is available.

    So once the FIFO space is saturated, queued events need to be consumed before more pulses can be submitted.
    You can expect the first pulse to occur approximately 1 second into the experiment, due to the 1 s delay you specified.
    The sudden drop of slack is simply the kernel waiting for this to happen.
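    The observed jump at 65 pulses for a single TTL falls straight out of this. A back-of-the-envelope sketch, assuming the default per-lane FIFO depth of 128 and no event spreading, so that a single TTL with strictly increasing timestamps keeps hitting the same lane:

        FIFO_DEPTH = 128      # default depth of each SED lane FIFO
        EVENTS_PER_PULSE = 2  # one rising plus one falling edge

        # With 1 TTL, every event lands in the same lane, so the kernel
        # stalls on the first pulse that no longer fits:
        critical_n = FIFO_DEPTH // EVENTS_PER_PULSE + 1
        print(critical_n)  # -> 65, matching the jump in the graph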

    hk99 2. Why does the critical value of N vary erratically with the number of TTLs being toggled?

    I assume you have read the docs by now.

    You should be able to queue up more pulses in the 1-TTL loop if you enable event spreading.
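    (If it helps: on recent ARTIQ versions event spreading can, as far as I remember, be toggled at runtime through the core device config, e.g. artiq_coremgmt config write -s sed_spread_enable 1 followed by a reboot, but do check the manual for your version.)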

    The loops with 3 TTLs and 4 TTLs queue the same number of events because event spreading cannot fully utilize the FIFO capacity when only 3 TTLs are controlled. You may want to work out which SED lane each RTIO event uses, step by step, on a piece of paper to observe the lane consumption pattern.

    In the case of the 3-TTL loop, your event loop can be interpreted as using 4 slots in lane N and 2 slots in lane N+1, then moving the cursor to lane N+2. There are 8 SED lanes. Since 8 and 2 are not coprime, this consumption pattern will not even out over a large number of iterations.

    The end result is that even lanes are filled more heavily than odd lanes, as the toy simulation below illustrates.
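    This is my own sketch of the interpretation above, not the actual SED gateware logic, assuming 8 lanes of depth 128; it reproduces the measured critical N for 3 TTLs:

        NUM_LANES = 8
        FIFO_DEPTH = 128  # default per-lane depth

        lanes = [0] * NUM_LANES
        cursor = 0
        iteration = 0
        while True:
            iteration += 1
            # One loop iteration: 4 events in lane N, 2 events in lane N+1,
            # then the cursor moves on to lane N+2.
            if lanes[cursor] + 4 > FIFO_DEPTH:
                break  # this iteration would overfill a lane: the kernel stalls
            lanes[cursor] += 4
            lanes[(cursor + 1) % NUM_LANES] += 2
            cursor = (cursor + 2) % NUM_LANES

        print(iteration)  # -> 129, matching the 3-TTL measurement
        print(lanes)      # even lanes at 128, odd lanes only half full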

    For the numbers of TTLs that don't exhibit this non-coprime property (i.e. where the lane consumption evens out for large N), you can calculate the naive critical N by simply dividing the total SED lane capacity by the number of RTIO events per iteration, as sketched below.
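    Concretely, a rough version of that calculation, assuming all 8 lanes of depth 128 are usable (1024 events in total), matches your measured values for 2, 4 and 6 TTLs:

        NUM_LANES, FIFO_DEPTH = 8, 128
        total_capacity = NUM_LANES * FIFO_DEPTH  # 1024 events across all lanes

        for num_ttls in (2, 4, 6):
            events_per_iteration = 2 * num_ttls  # each pulse = rise + fall
            naive_n = total_capacity // events_per_iteration + 1
            print(num_ttls, "TTLs:", naive_n)
        # -> 2 TTLs: 257, 4 TTLs: 129, 6 TTLs: 86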


      occheung

      Thank you for such detailed replies. This makes a lot of sense now. I had initially assumed it was to do with lane consumption, but the variation with the number of TTLs threw me off.