Hi ARTIQ community,

I’ve recently been trying to implement a dynamical decoupling sequence using a TTL channel and the Phaser board (upconverter variant). The sequence looks like this:

The green (TTL) initialization and readout are typically about 1 us and a few hundred nanoseconds, respectively. The RF pulses are generally in the tens of nanoseconds (50-100 ns) if we disregard the minimum pulse width of the Phaser board. Tau is a swept variable, but generally starts in the low tens of nanoseconds as well. Ultimately the pulse sequence is very fast.

To achieve longer experiments without underflow, I’ve implemented it all with DMA, where a single pulse sequence is recorded for each value of tau, and each recording is then repeated however many times we specify:

   @kernel
   def record(self, name, index):
       with self.core_dma.record(name):
           # self.core.break_realtime() DO NOT BREAK REALTIME IN DMA
           delay(self.post_rf)
        cursor = now_mu()  # remember the timeline position where the RF sequence starts
           self.phaser0.channel[1].oscillator[0].set_amplitude_phase(self.rf_amplitude, phase=0.)
           delay(self.rf_pi2_time)
           self.phaser0.channel[1].oscillator[0].set_amplitude_phase(0.0, phase=0.)
           # delay(self.readout_padding)  
           for iteration in range(self.N_iterations):  #rf sequence for every iteration
               delay(self.tau_list[index])
               self.phaser0.channel[1].oscillator[0].set_amplitude_phase(self.rf_amplitude, phase=0.)
               delay(self.rf_pi_time)
               self.phaser0.channel[1].oscillator[0].set_amplitude_phase(0.0, phase=0.)
               # delay(self.readout_padding)
               delay(self.tau_list[index])
               delay(self.tau_list[index])
               self.phaser0.channel[1].oscillator[0].set_amplitude_phase(self.rf_amplitude, phase=0.)
               delay(self.rf_pi_time)
               self.phaser0.channel[1].oscillator[0].set_amplitude_phase(0.0, phase=0.)
               # delay(self.readout_padding)
               delay(self.tau_list[index])


           self.phaser0.channel[1].oscillator[0].set_amplitude_phase(self.rf_amplitude, phase=0.)
           delay(self.rf_pi2_time)
           self.phaser0.channel[1].oscillator[0].set_amplitude_phase(0.0, phase=0.)
           delay(self.readout_padding)
           # green laser sequence
           at_mu(cursor)
           delay(self.ttl_start) #ttl/phaser delay compensation
           cursor = now_mu()

           self.ttl6.pulse(self.green_init_duration)
           delay(self.rf_pi2_time)
           delay(self.all_N_pulse_repetition_length[index])
           delay(self.rf_pi2_time)
           delay(self.readout_padding)


           cursor = now_mu()  # updates to right where readout happens
           self.ttl6.on()
           delay(self.counting_duration)
           self.ttl6.off()


           at_mu(cursor)
           delay(self.ttl_delay) #Photon Counting/Green delay compensation
           self.ttl0_counter.set_config(count_rising=True, count_falling=False, send_count_event=False,
                                        reset_to_zero=False)
           delay(self.counting_duration)
           self.ttl0_counter.set_config(count_rising=False, count_falling=False, send_count_event=False,
                                        reset_to_zero=False)
           delay(self.post_rf)

Naming is a little off here, but self.post_rf is just delay padding on either end of the experiment.
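For context, one recording is created per tau value before playback starts. A rough sketch of that preparation step (assuming self.DMA_names holds one unique name per tau index; my actual code differs slightly) would look like:

   @kernel
   def record_all(self):
       # Sketch: create one DMA recording per tau value.
       # Assumes self.DMA_names[i] is a unique name for index i.
       self.core.break_realtime()
       for i in range(self.data_size):
           self.record(self.DMA_names[i], i)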

In the context of the run method of the experiment:

   @kernel
   def run(self):
       self.turn_off_laser()
       self.phaser_init()
       self.phaser0.channel[1].set_duc_frequency((self.freq_resonant - self.freq_center) * MHz)  # set frequency (offset from center)
       self.phaser0.duc_stb()
       print("list of all times:", self.tau_list, "data size:", self.data_size)
       self.set_dataset("times", self.tau_list, broadcast=False)
       self.set_dataset("on", np.full(self.data_size, np.nan), broadcast=False)
       self.core.reset()
       self.core.break_realtime()
       for i in range(self.data_size):
           print(i)
           self.core.break_realtime()
           handle = self.core_dma.get_handle(self.DMA_names[i]) 
           delay(500*us)
           self.ttl0_counter.set_config(count_rising=False, count_falling=False, send_count_event=False,
                                        reset_to_zero=True)
           for j in range(self.n_cycles):
               self.core_dma.playback_handle(handle)
           self.ttl0_counter.set_config(count_rising=False, count_falling=False, send_count_event=True,
                                        reset_to_zero=False)
           self.mutate_dataset("on", i, self.ttl0_counter.fetch_count())
       self.core.break_realtime()
       self.ttl6.off()
       self.delete_all() #delete DMA recordings
       self.core.break_realtime()
       self.phaser0.set_cfg(dac_txena=0)
       print(self.data_size)
       print("experiment done")

The main issue I have is that there seems to be a hard cutoff for the N_iterations (N in the figure) we specify in the code. Typically, I deal with underflow by adding a delay between sequence repetitions that pads out the slack lost during the experiment. However, I find that at a certain N_iterations value (62), no matter how much delay I pad the experiment with, it will always underflow.
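For concreteness, the padding I mean is an extra delay inside the repetition loop in run(), roughly like this (self.repetition_padding is just an illustrative name, not a parameter from the code above):

        for j in range(self.n_cycles):
            # Hypothetical extra padding between repetitions to recover slack;
            # self.repetition_padding is an illustrative parameter name.
            delay(self.repetition_padding)
            self.core_dma.playback_handle(handle)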

ARTIQ will always say that the underflow occurs within the handle playback operation:

root:Terminating with exception (RTIOUnderflow: RTIO underflow at 2852735500885580 mu, channel 23)
Core Device Traceback:
Traceback (most recent call first):
File "ksupport/lib.rs", line 416, column 18, in (Rust function)
<unknown>
^
File "C:\Users\DQP\repos\dqpcontrol\artiq-master\repository\dds_enrique_DMA.py", line 75, in artiq_worker_dds_enrique_DMA.DDS_DMA.run(..., ...) (RA=+0xc80)
self.core_dma.playback_handle(handle)
File "<artiq>\coredevice\dma.py", line 114, in ?? (RA=+0x155c)
dma_playback(now_mu(), ptr)
artiq.coredevice.exceptions.RTIOUnderflow(0): RTIO underflow at 2852735500885580 mu, channel 23

End of Core Device Traceback
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\artiq\lib\site-packages\artiq\master\worker_impl.py", line 343, in main
exp_inst.run()
File "C:\ProgramData\anaconda3\envs\artiq\lib\site-packages\artiq\language\core.py", line 54, in run_on_core
return getattr(self, arg).run(run_on_core, ((self,) + k_args), k_kwargs)
File "C:\ProgramData\anaconda3\envs\artiq\lib\site-packages\artiq\coredevice\core.py", line 140, in run
self._run_compiled(kernel_library, embedding_map, symbolizer, demangler)
File "C:\ProgramData\anaconda3\envs\artiq\lib\site-packages\artiq\coredevice\core.py", line 130, in run_compiled
self.comm.serve(embedding_map, symbolizer, demangler)
File "C:\ProgramData\anaconda3\envs\artiq\lib\site-packages\artiq\coredevice\comm_kernel.py", line 716, in serve
self._serve_exception(embedding_map, symbolizer, demangler)
File "C:\ProgramData\anaconda3\envs\artiq\lib\site-packages\artiq\coredevice\comm_kernel.py", line 698, in _serve_exception
raise python_exn
artiq.coredevice.exceptions.RTIOUnderflow: RTIO underflow at 2852735500885580 mu, channel 23

Analysis with GTKWave after exporting the VCD shows that the experiment seems to underflow at the first TTL pulse. (This is at a large time value because other experiments also end up in the VCD.)
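For anyone following along: such a VCD can be dumped from the core device with the core analyzer and then opened in GTKWave, e.g.:

    artiq_coreanalyzer -w dump.vcd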

My assumption is that this is tied to the DMA playback in some capacity, but I can’t point to anything specific that I could change in my code to address it.

I've tried adding delays in many spots to compensate for the slack, but none have worked. Some things I have tried:

  • Delays at various stages before playback_handle is called
  • Increasing delay at start/end of sequence within the DMA
  • Increasing the starting Tau and pi_pulse time (hundreds of ns)
  • Removing print statements

Hopefully I’ve provided a good amount of information, but if you want to help and need any additional information, I will be happy to provide it. Thanks!


    I don't think absolute timestamps (now_mu, at_mu) will work in a DMA context. Are you sure your code is fine for small N_iterations? Even if they did work, you'd have a lot of SED lane churn, which is not helpful.

      rjo Thanks for the response! From my experience, the absolute timestamps do work for sequences in DMA, and I've used this strategy before to more simply compensate for relative delays between channels in my experiments. The alternative (that I know of) would be to wrap the branches in sequential statements inside a with parallel block, which I implemented this morning. Unfortunately, I found that the N_iterations limit was lower for the parallel/sequential method than for the current version with the same parameters. I may explore it a bit more to see if there are advantages to it, but currently I'm not aware of any. As for small N_iterations (and larger ones as well), I've viewed the output waveforms on an oscilloscope before to verify proper operation. Here's an example from an N_iterations = 2 run I captured last week:
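      As for the structure of the parallel/sequential version I mentioned above, it roughly looks like this (a simplified sketch, not the exact code):

          with parallel:
              with sequential:
                  # RF branch on the phaser oscillator
                  delay(self.post_rf)
                  self.phaser0.channel[1].oscillator[0].set_amplitude_phase(self.rf_amplitude, phase=0.)
                  delay(self.rf_pi2_time)
                  self.phaser0.channel[1].oscillator[0].set_amplitude_phase(0.0, phase=0.)
                  # ... pi pulses and tau delays as in the DMA version ...
              with sequential:
                  # green laser / photon-counting branch on the TTLs
                  delay(self.ttl_start)
                  self.ttl6.pulse(self.green_init_duration)
                  # ... readout pulse and counter gating as in the DMA version ...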

      As for your other comment on the lane churn, my understanding of the lanes is still pretty elementary, so I mostly understand that I should avoid scheduling too many events at the same time to avoid sequencing errors. Could you explain the "lane churn" a little more, and how the current code implementation has a lot of it? A link to somewhere to read about it would work too. I appreciate the feedback!

      Enrique_Garcia Analysis with GTKWave after exporting the VCD shows that the experiment seems to underflow at the first TTL pulse. (This is at a large time value because other experiments also end up in the VCD.)

      I can't tell much from that VCD screenshot. All I see is slack < 0 (so it underflows) somewhere.

      My guess is that the SED lane was full when issuing the RF sequence, so it needs to wait for FIFO space to become available, and then it underflows when you go back in time to deal with the green laser.

      If you want to track the cause of the underflow, I recommend looking at the RTIO events and the observable trend in the slack vs. RTIO event plot, and analyzing that. For example,

      It is obvious in this case that we have used up some particular SED FIFO's space, and the sequence is gradually losing slack anyway, so it causes an underflow.
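      If it helps to narrow things down, you can also spot-check the remaining slack directly from a kernel (outside the DMA recording, since printing is slow and will itself cost slack). A minimal sketch:

          @kernel
          def print_slack(self):
              # Slack = how far the timeline cursor is ahead of the hardware RTIO counter.
              slack_mu = now_mu() - self.core.get_rtio_counter_mu()
              print("slack (mu):", slack_mu)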