Large mismatch between PC clock time and Kasli kernel time

Ddpeana · Nov 8, 2021

I replaced now_mu()'s with rtio_get_counter and the results are practically identical. This implies to me that there is a deeper problem and that the mismatch is real. It should be noted as well that if the 'delay(1000ms)' is set to be less than ~ 900ms, I get underflow and if it is > 1000ms (i.e. 3000ms), the timing mismatch still remains 1000*ms.

sb10q · Nov 9, 2021

sb10q And note that with "seamless handover" the remaining of your 1000ms delay rolls into the next kernel, and break_realtime() only ever increases the time cursor position.

Have you considered the consequences of this?

sb10q · Nov 9, 2021

Is it correct that if you remove the 1000ms delay (which should not be necessary) then you get a underflow error?
Where does the underflow occur? How large is the negative slack?
Normally if you use break_realtime in the second kernel, it should not underflow if you do dds.set() immediately afterwards.

Ddpeana · Nov 9, 2021

In answer to your first question, I am not sure I understand what you mean by consequences. I understand that that delay will carry over into the next kernel so the minimum time for the experiment will be given by that delay. But that is not the issue I am describing as I discussed in my answer immediately above this. I would expect that if it was just carry over from a previous kernel's delay, that the time would match the kernel delay time. i.e. If I had a 3000 * ms delay, I would measure a 3000 * ms pc time. That is not what I am observing. I am observing that for x * ms delay where x > 1000 * ms, I get the same 1000 * ms delay regardless of what x . However, if x < 1000*ms, I get underflow.

In answer to your second question, if I remove the 1000 * ms delay, I DO get underflow:

	from artiq.experiment import * 
	import numpy as np
	import time

	class Speedtest(EnvExperiment):
		def build(self):
			self.setattr_device("core")

			self.setattr_device("urukul2_cpld")
			self.urukul2_ch3 = self.get_device("urukul2_ch3")
			
		def run(self):
			self.kernel_init()
			t  = time.time()
			kernel_time = self.kernel_run()
			print("pc time (ms)", (time.time() - t)*1000)
			print("kernel time (ms)",kernel_time*1000)
		 
		@kernel
		def kernel_init(self):
			self.core.break_realtime()
			self.core.reset()
			self.urukul2_cpld.init()
			self.urukul2_ch3.init()
			delay(1*ms)
			
			self.urukul2_ch3.set(frequency = 100*MHz, amplitude = 0.777)
			self.urukul2_ch3.set_att(0.0)
			self.urukul2_ch3.sw.on()
			
			# delay(1100*ms)
		
		@kernel
		def kernel_run(self):
			t = self.core.get_rtio_counter_mu()
			self.core.break_realtime()
			self.urukul2_ch3.set(frequency = 100*MHz, amplitude = 0.777) # Adds 750ms
			self.urukul2_ch3.set_att(0.0) # Adds 120ms
			self.urukul2_ch3.sw.on()
			kernel_time = self.core.mu_to_seconds((self.core.get_rtio_counter_mu() - t))
			return kernel_time

This code yields:

INFO:dashboard:artiq.dashboard.experiments:Submitted 'repo:Speedtest', RID is 19637 
ERROR:worker(19637,speedtest.py):root:Terminating with exception (RTIOUnderflow: RTIO underflow at 2883970193352 mu, channel 65542, slack -3840 mu)
Core Device Traceback (most recent call last):
  File "C:\Users\hoodl\Documents\GitHub\Artiq\Experiments\repository/speedtest.py", line 37, in artiq_worker_speedtest.Speedtest.kernel_run(..., ...) (RA=+0xa34)
	self.urukul2_ch3.set(frequency = 100*MHz, amplitude = 0.777) # Adds 750ms
  File "<artiq>\coredevice/ad9910.py", line 717, in ... artiq.coredevice.ad9910.AD9910.set<artiq.coredevice.ad9910.AD9910>(...) (inlined)
	return self.pow_to_turns(self.set_mu(
  File "<artiq>\coredevice/ad9910.py", line 524, in ... artiq.coredevice.ad9910.AD9910.set_mu<artiq.coredevice.ad9910.AD9910>(...) (inlined)
	self.write64(_AD9910_REG_PROFILE0 + profile,
  File "<artiq>\coredevice/ad9910.py", line 314, in ... artiq.coredevice.ad9910.AD9910.write64<artiq.coredevice.ad9910.AD9910>(...) (inlined)
	self.bus.set_config_mu(urukul.SPI_CONFIG, 8,
  File "<artiq>\coredevice/spi2.py", line 169, in ... artiq.coredevice.spi2.SPIMaster.set_config_mu<artiq.coredevice.spi2.SPIMaster>(...) (inlined)
	rtio_output((self.channel << 8) | SPI_CONFIG_ADDR, flags |
  File "ksupport/rtio.rs", line 67, column 14, in (Rust function)
	<unknown>
				 ^
artiq.coredevice.exceptions.RTIOUnderflow(0): RTIO underflow at 2883970193352 mu, channel 65542, slack -3840 mu
Traceback (most recent call last):
  File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\master\worker_impl.py", line 300, in main
	exp_inst.run()
  File "C:\Users\hoodl\Documents\GitHub\Artiq\Experiments\repository\speedtest.py", line 15, in run
	kernel_time = self.kernel_run()
  File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\language\core.py", line 54, in run_on_core
	return getattr(self, arg).run(run_on_core, ((self,) + k_args), k_kwargs)
  File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\coredevice\core.py", line 137, in run
	self.comm.serve(embedding_map, symbolizer, demangler)
  File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\coredevice\comm_kernel.py", line 644, in serve
	self._serve_exception(embedding_map, symbolizer, demangler)
  File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\coredevice\comm_kernel.py", line 636, in _serve_exception
	raise python_exn
artiq.coredevice.exceptions.RTIOUnderflow: RTIO underflow at 2883970193352 mu, channel 65542, slack -3840 mu
ERROR:master:artiq.master.scheduler:got worker exception in run stage, deleting RID 19637

sb10q · Nov 10, 2021

Let's simplify the code and look at problems one by one.

In kernel_init: You never need break_realtime immediately before (or after) reset. Remove.
In kernel_run: Try adding a small delay (e.g. 1ms) immediately after break_realtime and see if this resolves the underflow.
Most likely not the issue here but you want to use time.monotonic(), not time.time().
Why are you setting the frequency in kernel_init already? Can this be removed to simplify the example further?
After making these modifications, what is the output of the program?

Ddpeana · Nov 10, 2021

Your requested changes. I think removing the dds.set() commands removed the underflow problem but the 1000 * ms delay is still present. The first block represents break_realtime() + 1 ms delay. The second block represents 1100 ms without break_realtime(). The third block represents 1 ms without break_realtime().

from artiq.experiment import *
import numpy as np
import time

class Speedtest(EnvExperiment):
    def build(self):
        self.setattr_device("core")

        self.setattr_device("urukul0_cpld")
        self.dds = self.get_device("urukul0_ch3")
        
    def run(self):
        self.kernel_init()
        t  = time.monotonic()
        kernel_time = self.kernel_run()
        print("pc time (ms)", (time.monotonic() - t)*1000)
        print("kernel time (ms)",kernel_time*1000)
     
    @kernel
    def kernel_init(self):
        self.core.reset()
        self.urukul0_cpld.init()
        self.dds.init()
        delay(1*ms)
                
    @kernel
    def kernel_run(self):
        t = self.core.get_rtio_counter_mu()
        self.core.break_realtime()
        delay(1*ms)
        self.dds.set(frequency = 50*MHz, amplitude = 0.777) # Adds 750ms
        self.dds.set_att(0.0) # Adds 120ms
        self.dds.sw.on()
        kernel_time = self.core.mu_to_seconds((self.core.get_rtio_counter_mu() - t))
        return kernel_time

yields

INFO:dashboard:artiq.dashboard.experiments:Submitted 'repo:Speedtest', RID is 19683
INFO:worker(19683,speedtest.py):print:pc time (ms) 1078.9999999979045
INFO:worker(19683,speedtest.py):print:kernel time (ms) 0.14468

and

from artiq.experiment import *
import numpy as np
import time

class Speedtest(EnvExperiment):
    def build(self):
        self.setattr_device("core")

        self.setattr_device("urukul0_cpld")
        self.dds = self.get_device("urukul0_ch3")
        
    def run(self):
        self.kernel_init()
        t  = time.monotonic()
        kernel_time = self.kernel_run()
        print("pc time (ms)", (time.monotonic() - t)*1000)
        print("kernel time (ms)",kernel_time*1000)
     
    @kernel
    def kernel_init(self):
        self.core.reset()
        self.urukul0_cpld.init()
        self.dds.init()
        delay(1*ms)
                
    @kernel
    def kernel_run(self):
        t = self.core.get_rtio_counter_mu()
        # self.core.break_realtime()
        delay(1100*ms)
        self.dds.set(frequency = 50*MHz, amplitude = 0.777) # Adds 750ms
        self.dds.set_att(0.0) # Adds 120ms
        self.dds.sw.on()
        kernel_time = self.core.mu_to_seconds((self.core.get_rtio_counter_mu() - t))
        return kernel_time

yields

INFO:dashboard:artiq.dashboard.experiments:Submitted 'repo:Speedtest', RID is 19684
INFO:worker(19684,speedtest.py):print:pc time (ms) 1061.9999999980791
INFO:worker(19684,speedtest.py):print:kernel time (ms) 0.147608

and

from artiq.experiment import *
import numpy as np
import time

class Speedtest(EnvExperiment):
    def build(self):
        self.setattr_device("core")

        self.setattr_device("urukul0_cpld")
        self.dds = self.get_device("urukul0_ch3")
        
    def run(self):
        self.kernel_init()
        t  = time.monotonic()
        kernel_time = self.kernel_run()
        print("pc time (ms)", (time.monotonic() - t)*1000)
        print("kernel time (ms)",kernel_time*1000)
     
    @kernel
    def kernel_init(self):
        self.core.reset()
        self.urukul0_cpld.init()
        self.dds.init()
        delay(1*ms)
                
    @kernel
    def kernel_run(self):
        t = self.core.get_rtio_counter_mu()
        # self.core.break_realtime()
        delay(1*ms)
        self.dds.set(frequency = 50*MHz, amplitude = 0.777) # Adds 750ms
        self.dds.set_att(0.0) # Adds 120ms
        self.dds.sw.on()
        kernel_time = self.core.mu_to_seconds((self.core.get_rtio_counter_mu() - t))
        return kernel_time

yields

INFO:dashboard:artiq.dashboard.experiments:Submitted 'repo:Speedtest', RID is 19686
ERROR:worker(19686,speedtest.py):root:Terminating with exception (RTIOUnderflow: RTIO underflow at 192374553848 mu, channel 2, slack -1086875216 mu)
Core Device Traceback (most recent call last):
  File "C:\Users\hoodl\Documents\GitHub\Artiq\Experiments\repository/speedtest.py", line 31, in artiq_worker_speedtest.Speedtest.kernel_run(..., ...) (RA=+0x9f4)
    self.dds.set(frequency = 50*MHz, amplitude = 0.777) # Adds 750ms
  File "<artiq>\coredevice/ad9910.py", line 717, in ... artiq.coredevice.ad9910.AD9910.set<artiq.coredevice.ad9910.AD9910>(...) (inlined)
    return self.pow_to_turns(self.set_mu(
  File "<artiq>\coredevice/ad9910.py", line 524, in ... artiq.coredevice.ad9910.AD9910.set_mu<artiq.coredevice.ad9910.AD9910>(...) (inlined)
    self.write64(_AD9910_REG_PROFILE0 + profile,
  File "<artiq>\coredevice/ad9910.py", line 314, in ... artiq.coredevice.ad9910.AD9910.write64<artiq.coredevice.ad9910.AD9910>(...) (inlined)
    self.bus.set_config_mu(urukul.SPI_CONFIG, 8,
  File "<artiq>\coredevice/spi2.py", line 169, in ... artiq.coredevice.spi2.SPIMaster.set_config_mu<artiq.coredevice.spi2.SPIMaster>(...) (inlined)
    rtio_output((self.channel << 8) | SPI_CONFIG_ADDR, flags |
  File "ksupport/rtio.rs", line 67, column 14, in (Rust function)
    <unknown>
                 ^
artiq.coredevice.exceptions.RTIOUnderflow(0): RTIO underflow at 192374553848 mu, channel 2, slack -1086875216 mu
Traceback (most recent call last):
  File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\master\worker_impl.py", line 300, in main
    exp_inst.run()
  File "C:\Users\hoodl\Documents\GitHub\Artiq\Experiments\repository\speedtest.py", line 15, in run
    kernel_time = self.kernel_run()
  File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\language\core.py", line 54, in run_on_core
    return getattr(self, arg).run(run_on_core, ((self,) + k_args), k_kwargs)
  File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\coredevice\core.py", line 137, in run
    self.comm.serve(embedding_map, symbolizer, demangler)
  File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\coredevice\comm_kernel.py", line 644, in serve
    self._serve_exception(embedding_map, symbolizer, demangler)
  File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\coredevice\comm_kernel.py", line 636, in _serve_exception
    raise python_exn
artiq.coredevice.exceptions.RTIOUnderflow: RTIO underflow at 192374553848 mu, channel 2, slack -1086875216 mu
ERROR:master:artiq.master.scheduler:got worker exception in run stage, deleting RID 19686

sb10q · Nov 10, 2021

The underflow is expected if you remove break_realtime in the second kernel
kernel_run takes 148µs to execute on the core device. There is nothing seriously wrong with that.
It takes 1s to compile and load kernel_run. This is surprisingly long for such a simple kernel - I'm getting 1.2s (on a cheap Core i3 laptop).
Most of the time (about 1.1s on i3 laptop) is spent compiling self.dds.set(frequency = 50*MHz, amplitude = 0.777).
This can be easily reproduced without hardware by using artiq_compile.

Ddpeana · Nov 10, 2021

sb10q Is there any way to reduce this long compilation time? It only seems to really affect the DDS in this excessive way. If I remember correctly, Zotino.set() gave 100-200 ms. The rest of our experiment takes ~ 300 ms and waiting 1 + seconds every step of the experiment slows us down significantly. Is there a way to prevent recompiling every step?

Our experiment is similar to the speedtest code except the kernel call is itself in a for loop for variable scanning and averaging similar to:

from artiq.experiment import *
import numpy as np
import time

class Speedtest(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("urukul0_cpld")
        self.dds = self.get_device("urukul0_ch3")
        
    def run(self):
        self.kernel_init()
        for i in range(10):
            t  = time.monotonic()
            kernel_time = self.kernel_run()
            print("pc time (ms)", (time.monotonic() - t)*1000)
            print("kernel time (ms)",kernel_time*1000)
     
    @kernel
    def kernel_init(self):
        self.core.reset()
        self.urukul0_cpld.init()
        self.dds.init()
        delay(1*ms)
                
    @kernel
    def kernel_run(self):
        t = self.core.get_rtio_counter_mu()
        self.core.break_realtime()
        delay(1*ms)
        self.dds.set(frequency = 50*MHz, amplitude = 0.777) # Adds 750ms
        self.dds.set_att(0.0) # Adds 120ms
        self.dds.sw.on()
        kernel_time = self.core.mu_to_seconds((self.core.get_rtio_counter_mu() - t))        
        return kernel_time

this code yields:

INFO:dashboard:artiq.dashboard.experiments:Submitted 'repo:Speedtest', RID is 19699
INFO:worker(19699,speedtest.py):print:pc time (ms) 1110.000000000582
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.14796
INFO:worker(19699,speedtest.py):print:pc time (ms) 1063.0000000091968
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.1474
INFO:worker(19699,speedtest.py):print:pc time (ms) 1092.9999999934807
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.147872
INFO:worker(19699,speedtest.py):print:pc time (ms) 1093.9999999973224
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.1474
INFO:worker(19699,speedtest.py):print:pc time (ms) 1077.9999999940628
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.14649600000000002
INFO:worker(19699,speedtest.py):print:pc time (ms) 1078.0000000086147
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.14785600000000002
INFO:worker(19699,speedtest.py):print:pc time (ms) 1092.9999999934807
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.14986400000000002
INFO:worker(19699,speedtest.py):print:pc time (ms) 1093.9999999973224
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.148608
INFO:worker(19699,speedtest.py):print:pc time (ms) 1046.9999999913853
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.14816800000000002
INFO:worker(19699,speedtest.py):print:pc time (ms) 1062.000000005355
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.146

Its fine if it takes a few seconds to compile at very the beginning but I am wondering if there is a more efficient way to do this that prevents recompilation every run.

Jjonhood · Nov 10, 2021

1 second is a very long time to wait every time we want to set the dds in a kernel session. It also makes underflow errors difficult to manage without a lot of break_realtime()s. If we call multiple kernel functions then we only get the 1 s delay the first time we call dds.set(). But we get the 1s delay if we then go back to the computer and and then back to kernel. Does anyone know how we can speed this up?

sb10q · Nov 11, 2021

Is there any way to reduce this long compilation time?

Yes, NAC3 (on i9-9900K) takes 47µs to compile dds.set().

han94ros · Nov 11, 2021

jonhood Why are you going back to the computer? can you not set all the frequency values in the script and cycle through them so that you only have to compile the kernel session once?

Jjonhood · Nov 11, 2021

sb10q That is great! will nac3 be incorporated in artiq 7? how do we use it?

sb10q · Nov 11, 2021

NAC3 is currently experimental software, which will be released in ARTIQ-8 and initially for Linux/Nix only. nix shell github:m-labs/artiq/nac3 but many things still need more work at the moment, including porting most of the drivers.
To be clear, I do not recommend that you try it unless you are interested in the technical details.

Jjonhood · Nov 11, 2021

han94ros The run function (not kernel) has a for loop to scan variables. That code cannot be in kernel because we use the multiscanmanager to iterate. I don't see any other way around that.

Jjonhood · Nov 11, 2021

sb10q Well that is huge improvement! Nice work.

sb10q · Nov 12, 2021

jonhood I don't see any other way around that.

You can have the kernel make a RPC to get the next scan point. Instead of using a for loop to iterate, you can call next() explicitly on the iterator.
https://docs.python.org/3/library/functions.html#next
If you have complicated nested for loops, you can also make your own coroutine/iterator by using the yield statement.
If you don't have too many points, you can also just collect them into a list and send it to the device. So you would avoid the RPC latency between points.

han94ros · Nov 12, 2021

"If you don't have too many points, you can also just collect them into a list and send it to the device. So you would avoid the RPC latency between points."

This is the option I used to allow successive points to run within microsecond rather millisecond temporal resolution and allow for the inclusion of other functionalities e.g setting of RF on several channels or combinations of single tone and modulated outputs.