We are having a difficult time ramping a Zotino output faster than 20 us per step using for loops without getting underflow errors. As the code below shows, adding slack does not alleviate the problem. Is this possible at all, or is DMA required? This is the code we were using:

        from artiq.experiment import *

        class ZotinoTestLoopVersion(EnvExperiment):
            def build(self):
                self.setattr_device("core")
                self.setattr_device("zotino0")  # Zotino DAC
                self.dac = self.zotino0

            @kernel
            def run(self):

                self.core.reset()  # Clear the RTIO FIFOs and reset the timeline
                self.dac.init()
                delay(1*ms)
                times = self.core.seconds_to_mu(20*us)  # step time in machine units
                # Precompute the DAC codes for the ramp
                data = [0]*100
                for i in range(0,99,1):
                    delay_mu(times)
                    data[i] = self.dac.voltage_to_mu(i/10)
                self.core.break_realtime()
                delay(2*s)  # additional slack before the ramps start
                # Output the ramp 100 times on channel 7
                for rep in range(100):
                    for i in range(0,99,1):
                        delay_mu(times)
                        self.dac.set_dac_mu([data[i]],[7])
                    delay_mu(times)
                    self.dac.set_dac_mu([0],[7])  # return to DAC code 0

You should try this with DMA; there is an example of TTL loops in the manual.

We tried DMA with Zotino and got the step time down to 1.5 us. Zotino is 1 MS/s, so I don't know why it failed to output anything below 1.5 us. There was no error. Fastino should go twice as fast.
Using the DMA method, I timed how long it takes to record a ramp with 100 points: the record function took 20 ms to run on the kernel. That seems like a long time for only 100 points. I will eventually have many ramps in an experimental sequence, and I cannot have it take 1 second just to load the ramps. Any suggestions?
My code is below:

    from artiq.experiment import *

    class DMA_zotino(EnvExperiment):
        def build(self):
            self.setattr_device("core")
            self.setattr_device("core_dma")
            self.setattr_device("zotino0")
            self.dac = self.zotino0

        @kernel
        def record(self):
            # Record a 100-point ramp on channel 7 into the DMA trace "ramp"
            with self.core_dma.record("ramp"):
                for i in range(0,100,1):
                    delay(1.5*us)
                    self.dac.set_dac([i/10],[7])
                self.dac.set_dac_mu([0],[7])

        @kernel
        def run(self):
            self.core.reset()
            self.dac.init()
            # Time the recording: RTIO counter (wall clock) and timeline cursor
            timeee = self.core.get_rtio_counter_mu()
            nowtime = now_mu()
            self.record()
            print(self.core.mu_to_seconds(self.core.get_rtio_counter_mu() - timeee))
            print(self.core.mu_to_seconds(now_mu() - nowtime))

            pulses_handle = self.core_dma.get_handle("ramp")
            self.core.break_realtime()
            # delay(40*ms)
            for i in range(100000):
                self.core_dma.playback_handle(pulses_handle)
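
For scale, 20 ms for 100 points is about 200 us of recording time per point. The plain-Python sketch below extrapolates that figure to longer sequences. It assumes the recording cost scales linearly with the number of points, which has not been verified here; `estimate_record_time` and its parameters are hypothetical names used only for illustration.

```python
# Rough extrapolation of DMA recording time from a single measurement.
# Assumption: recording cost scales linearly with the number of points.

MEASURED_POINTS = 100     # points in the recorded ramp
MEASURED_TIME_S = 20e-3   # measured duration of record() in seconds


def estimate_record_time(n_points, measured_points=MEASURED_POINTS,
                         measured_time_s=MEASURED_TIME_S):
    """Return the estimated recording time in seconds for n_points."""
    per_point_s = measured_time_s / measured_points
    return n_points * per_point_s


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        print(f"{n} points: ~{estimate_record_time(n) * 1e3:.0f} ms")
```

Under that assumption, roughly 5000 points in total would already cost about one second of recording time.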

    jonhood Why are you starting the experiment and then recording? You can optimise this. What I have done is record all the DMA sequences beforehand, get the handles, and store them in a predefined list with one entry per stage of the experimental sequence. Then, within the actual run section (where you take nowtime), I play them back. I am also running many ramps in an experimental sequence; here is a snippet of my code.

    @kernel
    def run(self):
        self.core.reset()
        self.record()
        for m in self.list_of_active_stages:
            self.list_of_pulses_handle[m] = self.core_dma.get_handle(self.list_of_stages[m])
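
The control flow described here (record every sequence once, look up each handle once, then only play back inside the repeated part) can be illustrated without hardware. The sketch below is plain Python, not ARTIQ kernel code: `FakeDMA` is a hypothetical stand-in that only counts calls, and the stage names are made up. It shows the ordering of the steps, not real timing.

```python
# Plain-Python illustration of "record once, fetch handles once, replay many times".
# FakeDMA is a hypothetical stand-in, NOT the ARTIQ core_dma driver.

class FakeDMA:
    def __init__(self):
        self.traces = {}
        self.record_calls = 0
        self.lookup_calls = 0
        self.playback_calls = 0

    def record(self, name, points):
        # Store the sequence under a name (the slow step on real hardware).
        self.record_calls += 1
        self.traces[name] = list(points)

    def get_handle(self, name):
        # Look up a previously recorded sequence by name.
        self.lookup_calls += 1
        return name

    def playback_handle(self, handle):
        # Replay a recorded sequence; here we only return the stored points.
        self.playback_calls += 1
        return self.traces[handle]


def run_sequence(dma, stages, repetitions):
    # Setup phase: record every stage and cache its handle exactly once.
    for name, points in stages.items():
        dma.record(name, points)
    handles = [dma.get_handle(name) for name in stages]

    # Repeated phase: only playback happens inside the loop.
    for _ in range(repetitions):
        for handle in handles:
            dma.playback_handle(handle)


if __name__ == "__main__":
    dma = FakeDMA()
    stages = {"load": [0.0, 0.5, 1.0], "cool": [1.0, 0.5, 0.0]}
    run_sequence(dma, stages, repetitions=10)
    print(dma.record_calls, dma.lookup_calls, dma.playback_calls)
```

The recording and handle lookups happen once per stage, while the number of playbacks grows with the number of repetitions.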

      han94ros Thank you for the advice! Our only concern with pre-recording all of our ramps at the beginning of the experiment is running out of memory in ARTIQ. How many ramps and points do you preload in your experiment? Have you run into any memory overflow problems yet? Do you happen to know how much memory is available in ARTIQ?

        Our entire experiment lasts only 500 ms. If the record() function takes several seconds, that will limit how quickly we can repeat the experiment. It also seems that recording 1000 points shouldn't take 200 ms. han94ros, how long does your record function take? Thanks for your help.

          dpeana I have preloaded 140 ramps with 25 points each. The memory specifications can be found in the wiki for the Kasli; I suppose you have Kasli version 2. It uses the Artix-7 FPGA, so I would recommend checking its datasheet; it should be 720 Kb. I don't know how to check the memory usage; if you find out, please let me know. Aside from that, I don't see why it should not be able to hold several lists of floating-point values in memory. It really shouldn't be a problem if you are only using one Urukul (AD9910) and a Zotino.

            jonhood I am recording before running the experiment, as shown in the code I posted. I am in the process of testing the entire control sequence now and will get back to you if there are issues.

            jonhood You should create a separate initialisation script to initialise the hardware, e.g. Urukul and Zotino, as this can take hundreds of microseconds. That script can then be used as the base class for subsequent experiments. Likewise, for optimising individual variables, I have found it best to set the parameters in a separate scheduler script that uses get_argument to override the original values and scheduler.submit, rather than running the optimisation within the main experiment. Besides complicating the script, the latter will obviously increase the kernel compilation time.

            han94ros The memory specifications can be found in the wiki for the Kasli; I suppose you have Kasli version 2. It uses the Artix-7 FPGA, so I would recommend checking its datasheet; it should be 720 Kb.

            Kasli stores the DMA sequences in SDRAM, not in the on-chip FPGA memory.

              sb10q Sorry for the incorrect information. Could you please delete my original reply and this message as well?