I replaced now_mu() with get_rtio_counter_mu() and the results are practically identical. This suggests to me that there is a deeper problem and that the mismatch is real. It should also be noted that if the delay(1000*ms) is set to less than ~900ms, I get an underflow, and if it is greater than 1000ms (e.g. 3000*ms), the timing mismatch still remains 1000*ms.
Large mismatch between PC clock time and Kasli kernel time
Is it correct that if you remove the 1000ms delay (which should not be necessary), you get an underflow error? Where does the underflow occur? How large is the negative slack?
Normally, if you use break_realtime in the second kernel, it should not underflow if you do dds.set() immediately afterwards.
In answer to your first question, I am not sure I understand what you mean by consequences. I understand that the delay will carry over into the next kernel, so the minimum time for the experiment will be given by that delay. But that is not the issue I am describing, as discussed in my answer immediately above. If it were just carry-over from a previous kernel's delay, I would expect the measured time to match the kernel delay time; i.e. with a 3000*ms delay, I would measure 3000*ms of PC time. That is not what I observe. For an x*ms delay with x > 1000, I get the same 1000*ms mismatch regardless of the value of x. However, if x < 1000, I get an underflow.
In answer to your second question, if I remove the 1000 * ms delay, I DO get underflow:
from artiq.experiment import *
import numpy as np
import time


class Speedtest(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("urukul2_cpld")
        self.urukul2_ch3 = self.get_device("urukul2_ch3")

    def run(self):
        self.kernel_init()
        t = time.time()
        kernel_time = self.kernel_run()
        print("pc time (ms)", (time.time() - t)*1000)
        print("kernel time (ms)", kernel_time*1000)

    @kernel
    def kernel_init(self):
        self.core.break_realtime()
        self.core.reset()
        self.urukul2_cpld.init()
        self.urukul2_ch3.init()
        delay(1*ms)
        self.urukul2_ch3.set(frequency=100*MHz, amplitude=0.777)
        self.urukul2_ch3.set_att(0.0)
        self.urukul2_ch3.sw.on()
        # delay(1100*ms)

    @kernel
    def kernel_run(self):
        t = self.core.get_rtio_counter_mu()
        self.core.break_realtime()
        self.urukul2_ch3.set(frequency=100*MHz, amplitude=0.777)  # Adds 750ms
        self.urukul2_ch3.set_att(0.0)  # Adds 120ms
        self.urukul2_ch3.sw.on()
        kernel_time = self.core.mu_to_seconds(self.core.get_rtio_counter_mu() - t)
        return kernel_time
This code yields:
INFO:dashboard:artiq.dashboard.experiments:Submitted 'repo:Speedtest', RID is 19637
ERROR:worker(19637,speedtest.py):root:Terminating with exception (RTIOUnderflow: RTIO underflow at 2883970193352 mu, channel 65542, slack -3840 mu)
Core Device Traceback (most recent call last):
File "C:\Users\hoodl\Documents\GitHub\Artiq\Experiments\repository/speedtest.py", line 37, in artiq_worker_speedtest.Speedtest.kernel_run(..., ...) (RA=+0xa34)
self.urukul2_ch3.set(frequency = 100*MHz, amplitude = 0.777) # Adds 750ms
File "<artiq>\coredevice/ad9910.py", line 717, in ... artiq.coredevice.ad9910.AD9910.set<artiq.coredevice.ad9910.AD9910>(...) (inlined)
return self.pow_to_turns(self.set_mu(
File "<artiq>\coredevice/ad9910.py", line 524, in ... artiq.coredevice.ad9910.AD9910.set_mu<artiq.coredevice.ad9910.AD9910>(...) (inlined)
self.write64(_AD9910_REG_PROFILE0 + profile,
File "<artiq>\coredevice/ad9910.py", line 314, in ... artiq.coredevice.ad9910.AD9910.write64<artiq.coredevice.ad9910.AD9910>(...) (inlined)
self.bus.set_config_mu(urukul.SPI_CONFIG, 8,
File "<artiq>\coredevice/spi2.py", line 169, in ... artiq.coredevice.spi2.SPIMaster.set_config_mu<artiq.coredevice.spi2.SPIMaster>(...) (inlined)
rtio_output((self.channel << 8) | SPI_CONFIG_ADDR, flags |
File "ksupport/rtio.rs", line 67, column 14, in (Rust function)
<unknown>
^
artiq.coredevice.exceptions.RTIOUnderflow(0): RTIO underflow at 2883970193352 mu, channel 65542, slack -3840 mu
Traceback (most recent call last):
File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\master\worker_impl.py", line 300, in main
exp_inst.run()
File "C:\Users\hoodl\Documents\GitHub\Artiq\Experiments\repository\speedtest.py", line 15, in run
kernel_time = self.kernel_run()
File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\language\core.py", line 54, in run_on_core
return getattr(self, arg).run(run_on_core, ((self,) + k_args), k_kwargs)
File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\coredevice\core.py", line 137, in run
self.comm.serve(embedding_map, symbolizer, demangler)
File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\coredevice\comm_kernel.py", line 644, in serve
self._serve_exception(embedding_map, symbolizer, demangler)
File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\coredevice\comm_kernel.py", line 636, in _serve_exception
raise python_exn
artiq.coredevice.exceptions.RTIOUnderflow: RTIO underflow at 2883970193352 mu, channel 65542, slack -3840 mu
ERROR:master:artiq.master.scheduler:got worker exception in run stage, deleting RID 19637
Let's simplify the code and look at the problems one by one.
- In kernel_init: you never need break_realtime immediately before (or after) reset. Remove it.
- In kernel_run: try adding a small delay (e.g. 1ms) immediately after break_realtime and see if this resolves the underflow.
- Most likely not the issue here, but you want to use time.monotonic(), not time.time().
- Why are you setting the frequency in kernel_init already? Can this be removed to simplify the example further?
- After making these modifications, what is the output of the program?
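For reference, the time.monotonic() pattern for host-side timing looks like this (plain Python, no ARTIQ involved; the sleep stands in for the kernel call being timed):

```python
import time

t0 = time.monotonic()  # monotonic clock: immune to NTP/wall-clock adjustments
time.sleep(0.05)       # placeholder for self.kernel_run()
elapsed_ms = (time.monotonic() - t0) * 1000
print("pc time (ms)", elapsed_ms)
```

Unlike time.time(), time.monotonic() can never go backwards, so elapsed-time measurements are not corrupted by clock synchronization on the host.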
Here are your requested changes. Removing the dds.set() commands seems to have removed the underflow problem, but the 1000*ms delay is still present. The first block is break_realtime() plus a 1ms delay. The second block is a 1100ms delay without break_realtime(). The third block is a 1ms delay without break_realtime().
from artiq.experiment import *
import numpy as np
import time


class Speedtest(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("urukul0_cpld")
        self.dds = self.get_device("urukul0_ch3")

    def run(self):
        self.kernel_init()
        t = time.monotonic()
        kernel_time = self.kernel_run()
        print("pc time (ms)", (time.monotonic() - t)*1000)
        print("kernel time (ms)", kernel_time*1000)

    @kernel
    def kernel_init(self):
        self.core.reset()
        self.urukul0_cpld.init()
        self.dds.init()
        delay(1*ms)

    @kernel
    def kernel_run(self):
        t = self.core.get_rtio_counter_mu()
        self.core.break_realtime()
        delay(1*ms)
        self.dds.set(frequency=50*MHz, amplitude=0.777)  # Adds 750ms
        self.dds.set_att(0.0)  # Adds 120ms
        self.dds.sw.on()
        kernel_time = self.core.mu_to_seconds(self.core.get_rtio_counter_mu() - t)
        return kernel_time
yields
INFO:dashboard:artiq.dashboard.experiments:Submitted 'repo:Speedtest', RID is 19683
INFO:worker(19683,speedtest.py):print:pc time (ms) 1078.9999999979045
INFO:worker(19683,speedtest.py):print:kernel time (ms) 0.14468
and
from artiq.experiment import *
import numpy as np
import time


class Speedtest(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("urukul0_cpld")
        self.dds = self.get_device("urukul0_ch3")

    def run(self):
        self.kernel_init()
        t = time.monotonic()
        kernel_time = self.kernel_run()
        print("pc time (ms)", (time.monotonic() - t)*1000)
        print("kernel time (ms)", kernel_time*1000)

    @kernel
    def kernel_init(self):
        self.core.reset()
        self.urukul0_cpld.init()
        self.dds.init()
        delay(1*ms)

    @kernel
    def kernel_run(self):
        t = self.core.get_rtio_counter_mu()
        # self.core.break_realtime()
        delay(1100*ms)
        self.dds.set(frequency=50*MHz, amplitude=0.777)  # Adds 750ms
        self.dds.set_att(0.0)  # Adds 120ms
        self.dds.sw.on()
        kernel_time = self.core.mu_to_seconds(self.core.get_rtio_counter_mu() - t)
        return kernel_time
yields
INFO:dashboard:artiq.dashboard.experiments:Submitted 'repo:Speedtest', RID is 19684
INFO:worker(19684,speedtest.py):print:pc time (ms) 1061.9999999980791
INFO:worker(19684,speedtest.py):print:kernel time (ms) 0.147608
and
from artiq.experiment import *
import numpy as np
import time


class Speedtest(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("urukul0_cpld")
        self.dds = self.get_device("urukul0_ch3")

    def run(self):
        self.kernel_init()
        t = time.monotonic()
        kernel_time = self.kernel_run()
        print("pc time (ms)", (time.monotonic() - t)*1000)
        print("kernel time (ms)", kernel_time*1000)

    @kernel
    def kernel_init(self):
        self.core.reset()
        self.urukul0_cpld.init()
        self.dds.init()
        delay(1*ms)

    @kernel
    def kernel_run(self):
        t = self.core.get_rtio_counter_mu()
        # self.core.break_realtime()
        delay(1*ms)
        self.dds.set(frequency=50*MHz, amplitude=0.777)  # Adds 750ms
        self.dds.set_att(0.0)  # Adds 120ms
        self.dds.sw.on()
        kernel_time = self.core.mu_to_seconds(self.core.get_rtio_counter_mu() - t)
        return kernel_time
yields
INFO:dashboard:artiq.dashboard.experiments:Submitted 'repo:Speedtest', RID is 19686
ERROR:worker(19686,speedtest.py):root:Terminating with exception (RTIOUnderflow: RTIO underflow at 192374553848 mu, channel 2, slack -1086875216 mu)
Core Device Traceback (most recent call last):
File "C:\Users\hoodl\Documents\GitHub\Artiq\Experiments\repository/speedtest.py", line 31, in artiq_worker_speedtest.Speedtest.kernel_run(..., ...) (RA=+0x9f4)
self.dds.set(frequency = 50*MHz, amplitude = 0.777) # Adds 750ms
File "<artiq>\coredevice/ad9910.py", line 717, in ... artiq.coredevice.ad9910.AD9910.set<artiq.coredevice.ad9910.AD9910>(...) (inlined)
return self.pow_to_turns(self.set_mu(
File "<artiq>\coredevice/ad9910.py", line 524, in ... artiq.coredevice.ad9910.AD9910.set_mu<artiq.coredevice.ad9910.AD9910>(...) (inlined)
self.write64(_AD9910_REG_PROFILE0 + profile,
File "<artiq>\coredevice/ad9910.py", line 314, in ... artiq.coredevice.ad9910.AD9910.write64<artiq.coredevice.ad9910.AD9910>(...) (inlined)
self.bus.set_config_mu(urukul.SPI_CONFIG, 8,
File "<artiq>\coredevice/spi2.py", line 169, in ... artiq.coredevice.spi2.SPIMaster.set_config_mu<artiq.coredevice.spi2.SPIMaster>(...) (inlined)
rtio_output((self.channel << 8) | SPI_CONFIG_ADDR, flags |
File "ksupport/rtio.rs", line 67, column 14, in (Rust function)
<unknown>
^
artiq.coredevice.exceptions.RTIOUnderflow(0): RTIO underflow at 192374553848 mu, channel 2, slack -1086875216 mu
Traceback (most recent call last):
File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\master\worker_impl.py", line 300, in main
exp_inst.run()
File "C:\Users\hoodl\Documents\GitHub\Artiq\Experiments\repository\speedtest.py", line 15, in run
kernel_time = self.kernel_run()
File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\language\core.py", line 54, in run_on_core
return getattr(self, arg).run(run_on_core, ((self,) + k_args), k_kwargs)
File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\coredevice\core.py", line 137, in run
self.comm.serve(embedding_map, symbolizer, demangler)
File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\coredevice\comm_kernel.py", line 644, in serve
self._serve_exception(embedding_map, symbolizer, demangler)
File "C:\Users\hoodl\.conda\envs\artiq-6\lib\site-packages\artiq\coredevice\comm_kernel.py", line 636, in _serve_exception
raise python_exn
artiq.coredevice.exceptions.RTIOUnderflow: RTIO underflow at 192374553848 mu, channel 2, slack -1086875216 mu
ERROR:master:artiq.master.scheduler:got worker exception in run stage, deleting RID 19686
- The underflow is expected if you remove break_realtime in the second kernel.
- kernel_run takes 148µs to execute on the core device. There is nothing seriously wrong with that.
- It takes 1s to compile and load kernel_run. This is surprisingly long for such a simple kernel; I'm getting 1.2s (on a cheap Core i3 laptop).
- Most of the time (about 1.1s on the i3 laptop) is spent compiling self.dds.set(frequency=50*MHz, amplitude=0.777).
- This can be easily reproduced without hardware by using artiq_compile.
sb10q Is there any way to reduce this long compilation time? It only seems to affect the DDS this severely; if I remember correctly, Zotino.set() gave 100-200ms. The rest of our experiment takes ~300ms, and waiting 1+ seconds at every step of the experiment slows us down significantly. Is there a way to prevent recompiling at every step?
Our experiment is similar to the speedtest code except the kernel call is itself in a for loop for variable scanning and averaging similar to:
from artiq.experiment import *
import numpy as np
import time


class Speedtest(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        self.setattr_device("urukul0_cpld")
        self.dds = self.get_device("urukul0_ch3")

    def run(self):
        self.kernel_init()
        for i in range(10):
            t = time.monotonic()
            kernel_time = self.kernel_run()
            print("pc time (ms)", (time.monotonic() - t)*1000)
            print("kernel time (ms)", kernel_time*1000)

    @kernel
    def kernel_init(self):
        self.core.reset()
        self.urukul0_cpld.init()
        self.dds.init()
        delay(1*ms)

    @kernel
    def kernel_run(self):
        t = self.core.get_rtio_counter_mu()
        self.core.break_realtime()
        delay(1*ms)
        self.dds.set(frequency=50*MHz, amplitude=0.777)  # Adds 750ms
        self.dds.set_att(0.0)  # Adds 120ms
        self.dds.sw.on()
        kernel_time = self.core.mu_to_seconds(self.core.get_rtio_counter_mu() - t)
        return kernel_time
this code yields:
INFO:dashboard:artiq.dashboard.experiments:Submitted 'repo:Speedtest', RID is 19699
INFO:worker(19699,speedtest.py):print:pc time (ms) 1110.000000000582
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.14796
INFO:worker(19699,speedtest.py):print:pc time (ms) 1063.0000000091968
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.1474
INFO:worker(19699,speedtest.py):print:pc time (ms) 1092.9999999934807
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.147872
INFO:worker(19699,speedtest.py):print:pc time (ms) 1093.9999999973224
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.1474
INFO:worker(19699,speedtest.py):print:pc time (ms) 1077.9999999940628
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.14649600000000002
INFO:worker(19699,speedtest.py):print:pc time (ms) 1078.0000000086147
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.14785600000000002
INFO:worker(19699,speedtest.py):print:pc time (ms) 1092.9999999934807
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.14986400000000002
INFO:worker(19699,speedtest.py):print:pc time (ms) 1093.9999999973224
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.148608
INFO:worker(19699,speedtest.py):print:pc time (ms) 1046.9999999913853
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.14816800000000002
INFO:worker(19699,speedtest.py):print:pc time (ms) 1062.000000005355
INFO:worker(19699,speedtest.py):print:kernel time (ms) 0.146
It's fine if it takes a few seconds to compile at the very beginning, but I am wondering if there is a more efficient way to do this that prevents recompilation on every run.
1 second is a very long time to wait every time we want to set the DDS in a kernel session. It also makes underflow errors difficult to manage without a lot of break_realtime() calls. If we call multiple kernel functions, we only get the 1s delay the first time we call dds.set(). But we get the 1s delay again if we go back to the computer and then back to the kernel. Does anyone know how we can speed this up?
Is there any way to reduce this long compilation time?
Yes, NAC3 (on an i9-9900K) takes 47µs to compile dds.set().
NAC3 is currently experimental software; it will be released in ARTIQ-8 and initially for Linux/Nix only. You can try it with nix shell github:m-labs/artiq/nac3, but many things still need more work at the moment, including porting most of the drivers.
To be clear, I do not recommend that you try it unless you are interested in the technical details.
jonhood I don't see any other way around that.
You can have the kernel make an RPC to get the next scan point. Instead of using a for loop to iterate, you can call next() explicitly on the iterator.
https://docs.python.org/3/library/functions.html#next
If you have complicated nested for loops, you can also make your own coroutine/iterator using the yield statement.
If you don't have too many points, you can also just collect them into a list and send it to the device. So you would avoid the RPC latency between points.
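A host-side sketch of that iterator idea (plain Python; the kernel-side RPC plumbing is omitted, and scan_points/next_point are illustrative names, not ARTIQ API):

```python
def scan_points(start, stop, n, averages):
    # Generator covering the full nested scan (averaging x frequency),
    # so the iteration state lives on the host between RPCs.
    step = (stop - start) / (n - 1)
    for _ in range(averages):
        for i in range(n):
            yield start + i * step

_points = scan_points(10.0, 100.0, 10, averages=2)

def next_point():
    # What an RPC handler on the host could do: hand the kernel one
    # point per call, and a sentinel (None) once the scan is exhausted.
    return next(_points, None)
```

The kernel then loops until the sentinel is returned, so a single compiled kernel covers the whole scan instead of being recompiled per point.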
"If you don't have too many points, you can also just collect them into a list and send it to the device. So you would avoid the RPC latency between points."
This is the option I used. It allows successive points to run with microsecond rather than millisecond temporal resolution, and allows for the inclusion of other functionality, e.g. setting RF on several channels or combinations of single-tone and modulated outputs.
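For completeness, the host-side batching can be as simple as this (a sketch; scan_list is an illustrative name, and the list would be passed as an argument to a single kernel call that loops over it on the core device):

```python
# Build the whole scan on the host; one kernel call (hence one
# compilation) then iterates over every point on the core device.
freqs_hz = [10e6 + i * 10e6 for i in range(10)]  # 10 MHz .. 100 MHz
amps = [0.777] * len(freqs_hz)

# Plain lists of floats map cleanly onto kernel argument types.
scan_list = list(zip(freqs_hz, amps))
```

Inside the kernel, the loop over scan_list runs entirely on the core device, so there is no per-point RPC or recompilation latency.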