Hi, this is very embarrassing. I understand that there are many posts out there (like this one) with similar issues, but I find it hard to wrap my head around what needs to be done in my case. This is a long post explaining my situation, so I shall bold my main questions to highlight them.

I have a digital delay generator (DDG) that provides five TTL output channels with well-defined phases between them. These pulses fire at 10 Hz. We use four of these channels to trigger the flashlamps and Q-switches of our pulsed YAG lasers, and to keep the temporal separation of all the triggered laser pulses well defined. The fifth TTL channel marks t=0 of each new cycle in the 10 Hz cycle. I wish to synchronize Kasli (cu3) with the DDG, so that all the TTL pulses that my Kasli sends out are synchronized to the laser pulses triggered by the DDG.

I could, in principle, use Kasli to trigger the DDG, but each time the DDG resets its phase, the lasers take a few seconds to warm up to the new timings, so it is preferable to use the DDG to trigger Kasli and keep the DDG running continuously at 10 Hz. The lasers will keep firing at 10 Hz and we will not use most of these pulses, but this is how pulsed YAG lasers like to work.

I feel that I must be missing something very trivial, but is there a function that blocks in run() and releases the block as soon as a rising edge arrives on a TTL input?

I talked to another ARTIQ user and obtained (and minimally modified) this code from them:

from artiq.experiment import *
from artiq.coredevice.rtio import rtio_input_timestamp


class MissingTrigger(Exception):
    pass

class ExternalTrigger(HasEnvironment):
    def build(self, trigger=None, t_timeout = 100*ms):
        self.setattr_device("core")
        self.trigger = trigger
        self.t_timeout = t_timeout

    def prepare(self):
        self.t_timeout_mu = self.core.seconds_to_mu(self.t_timeout)
        self.t_buffer_mu = self.core.seconds_to_mu(20*us)

    @kernel
    def wait_for_trigger(self):
        t_gate_open = now_mu()
        self.trigger._set_sensitivity(1)  # open the gate: enable rising-edge detection
        # Loop until all old (before current gate open) events are consumed, or
        # there is a timeout
        t_trig_mu = 0
        while True:
            # Wait for a trigger event for up to t_timeout_mu before returning
            t_trig_mu = rtio_input_timestamp(now_mu() + self.t_timeout_mu, self.trigger.channel)
            # Stop on a timeout (negative) or on an event within the current gate period
            if t_trig_mu < 0 or t_trig_mu >= t_gate_open:
                break
        # advance the timeline cursor past the current wall-clock time before closing the gate
        t_wall = self.core.get_rtio_counter_mu()
        at_mu(t_wall + self.t_buffer_mu)
        self.trigger._set_sensitivity(0)
        if t_trig_mu < 0:
            raise MissingTrigger()
        return t_trig_mu

class TTLin_block(EnvExperiment):
    def build(self):
        self.setattr_device("core")

        # Get all TTL out
        self.ttls = [ self.get_device("ttl"+str(i)) for i in range(4,64) ]
        self.start = self.get_device("ttl0")

    def prepare(self):
        self.lt = ExternalTrigger(self, self.start)
        self.lt.prepare()

    @kernel
    def run(self): # works, but 300 ns jitter
        self.core.reset()
        self.lt.wait_for_trigger()
        self.ttls[0].pulse(5*ms)

I use my oscilloscope to look at the t=0 signal and the TTL pulse output from my ARTIQ crate, and I see jitter in the temporal separation between these two pulses with a standard deviation of about 300 ns. The above code implements a blocking function with wait_for_trigger(), which is what I am looking for, but I am hoping to push the jitter down to 50 ns or below. There is a >20 µs delay between the two pulses because of self.t_buffer_mu, but I can easily handle that. Is there a more "RTIO-efficient" way to do this so that the timing jitter is reduced?

The timing jitter is there because you are not using the returned timestamp correctly - look at the last at_mu timing instruction: it has the non-deterministic RTIO counter value as input.
If you were not using that value, the jitter would be <1 ns.

Something like this should work:

while self.trigger.timestamp_mu(now_mu()) >= 0:  # flush buffer
    pass
delay(1.*us)  # slack
close_mu = self.trigger.gate_rising(101.*ms)
trigger_mu = self.trigger.timestamp_mu(close_mu)
if trigger_mu < 0:
    raise MissingTrigger()
at_mu(trigger_mu)  # set the timeline cursor to the trigger's hardware timestamp

    Alternatively, you could do all the timing with ARTIQ and set up an idle kernel that maintains a constant duty cycle for the YAG when no experiment is explicitly running.
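
    For concreteness, here is a minimal sketch of what such an idle kernel could look like (the channel numbers, pulse widths and flashlamp-to-Q-switch delay below are placeholders, not real settings):

    class YAGIdle(EnvExperiment):
        def build(self):
            self.setattr_device("core")
            self.setattr_device("ttl4")  # placeholder: flashlamp trigger output
            self.setattr_device("ttl5")  # placeholder: Q-switch trigger output

        @kernel
        def run(self):
            self.core.reset()
            while True:
                self.ttl4.pulse(10*us)        # fire the flashlamp
                delay(180*us)                 # placeholder flashlamp-to-Q-switch delay
                self.ttl5.pulse(10*us)        # fire the Q-switch
                delay(100*ms - 200*us)        # pad the loop to a 100 ms (10 Hz) period

    The compiled kernel is then set as the idle kernel via the idle_kernel config key (artiq_compile followed by artiq_coremgmt config write -f idle_kernel idle.elf), as described in the manual.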

      sb10q Thank you! It works after some slight modifications. I am seeing jitter on a scale of less than 1 ns (I believe my oscilloscope cannot even resolve the real jitter from here on). I just have some clarifying follow-up questions below, once again highlighting the more important ones in bold.

      I made some modifications to the code because of some RTIOUnderflows that I am seeing:

      while self.start.timestamp_mu(now_mu()) >= 0:  # flush buffer
          pass
      delay(5.*us)  # slack ; 1 µs is not enough, slack at -4000
      close_mu = self.start.gate_rising(101.*ms)
      trigger_mu = self.start.timestamp_mu(close_mu)
      if trigger_mu < 0:
          raise MissingTrigger()
      # at_mu(trigger_mu + self.core.seconds_to_mu(5.*us)) # will give large negative slack ~-100000
      at_mu(trigger_mu + self.t_buffer_mu) # this works
      
      self.ttls[0].pulse(5*ms)

      where self.t_buffer_mu = self.core.seconds_to_mu(5*us) is defined in prepare().

      I hope the fact that I need a delay longer than 1.*us does not mean that there is something wrong with my system. I also need a 5.*us delay with the at_mu(trigger_mu) to get around the -4000 slack RTIOUnderflow. Do you know why I would need this extra 5 µs delay with my setup (cu3)?

      Am I also right to assume that self.core.seconds_to_mu(5.*us) takes time to evaluate, so it is better to evaluate it beforehand, assign it to a variable, and use that variable as I am doing above?

      I feel like I need to understand this better so that I can avoid similar mistakes in the future. When you say that at_mu in my original code has a non-deterministic RTIO counter value as input, are you referring to self.core.get_rtio_counter_mu()? And by non-deterministic, I assume you mean that get_rtio_counter_mu() only evaluates when it is invoked, so it is not deterministic; whereas timestamp_mu(close_mu) is deterministic even though it still depends on when the next input event happens during gate_rising(101.*ms)?

        jbqubit This is certainly something that has crossed our minds; reducing the number of auxiliary instruments also makes everything way more elegant. I just do not want to be accused of not doing my homework before asking questions, but I do not mind receiving a slight push from more experienced users. I am still in the process of learning about all the types of kernels (e.g. idle, startup).

        The manual says that the idle kernel only runs when the core device is not connected to a PC via Ethernet. My impression from your reply is that the idle kernel runs whenever the kernel is idling. This would give me a constant 10 Hz cycle with well-defined phases between the pulses, but when a job needs to run, wouldn't the 10 Hz cycle be disrupted? I believe I would have to tell the job to run its own 10 Hz cycle, unless the idle kernel can run in parallel with submitted jobs. It is also not immediately clear to me at this point how a submitted job learns the phase of the 10 Hz cycle from the idle kernel, so that it knows when to jump in with the TTL output pulses that need to be synchronized to the 10 Hz cycle.

        It is just that every time the 10 Hz cycle gets restarted, the lasers take a few seconds to warm up to the new timing, which would introduce a significant amount of dead time in the experiment. The solution implemented in my post above is good enough for our applications, but I am definitely interested in exploring other solutions just to see what the system is capable of.

        The manual is incomplete. The idle kernel also runs automatically when there is no other kernel scheduled.

        If you want a task to run with a consistent duty cycle spanning multiple experiment kernels (including the idle kernel), use the rtio_counter, which is continuously incremented every 8 ns in most setups. Coerce all your experiments to rtio_counter % 12500000 == 0. You may need to schedule an additional YAG pulse at the end of some kernels to account for the kernel-switch latency.
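
        As a rough sketch of what coercing experiments to the counter phase could look like (12 500 000 counts of 8 ns = 100 ms = 100 000 000 machine units; the helper name is made up and would live on an experiment that has the core device):

        @kernel
        def align_to_10hz(self):
            period_mu = self.core.seconds_to_mu(100*ms)
            # the RTIO counter is free-running hardware time, read here in machine units
            t = self.core.get_rtio_counter_mu()
            # move the timeline cursor forward to the next 100 ms boundary of that
            # counter, so every kernel places its events at the same 10 Hz phase
            at_mu(t - t % period_mu + period_mu)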

        ngkiaboon I hope the fact that I need a delay longer than 1.*us does not mean that there is something wrong with my system.

        The delays were just examples; I didn't tune them.

        ngkiaboon Am I also right to assume that self.core.seconds_to_mu(5.*us) takes time to evaluate, so it is better to evaluate it beforehand, assign it to a variable, and use that variable as I am doing above?

        The compiler should already do constant folding, so that shouldn't be necessary.
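
        In code terms, per the reply above both of these forms should end up equivalent (a sketch reusing the names from the snippet above):

        at_mu(trigger_mu + self.t_buffer_mu)                # precomputed on the host in prepare()
        at_mu(trigger_mu + self.core.seconds_to_mu(5.*us))  # inline; the compiler should fold the constant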

        ngkiaboon I feel like I need to understand this better so that I can avoid similar mistakes in the future. When you say that at_mu in my original code has a non-deterministic RTIO counter value as input, are you referring to self.core.get_rtio_counter_mu()?

        Yes. See https://m-labs.hk/artiq/manual/rtio.html
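
        To spell out the distinction in code form (a sketch reusing names from the snippets above; buffer_mu is a placeholder):

        # Non-deterministic: get_rtio_counter_mu() returns whatever the hardware clock
        # reads when the CPU happens to execute this line, so anything scheduled
        # relative to it inherits the software's timing jitter.
        at_mu(self.core.get_rtio_counter_mu() + buffer_mu)

        # Deterministic: timestamp_mu() returns the timestamp latched in hardware for
        # the input edge itself, so events scheduled relative to it track the trigger
        # with RTIO resolution, regardless of when the CPU gets around to reading it.
        at_mu(self.trigger.timestamp_mu(close_mu) + buffer_mu)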