The experiment below compares loading parameters from the core device's cache with loading them through individual RPC calls. These are typical results for a Kasli SoC v1.1.1 connected to the host machine by a direct Ethernet cable:
---- Kasli SoC parameter cache for 200 parameters ----
get_parameters(): 1.9 ms (run once at beginning of sequence, mostly irrelevant)
core_cache.put(): 0.04 ms (no clue why so quick)
core_cache.get(): 3.5 ms (synchronous)
---- Individual RPC-calls for 200 parameters ----
200 x get_parameter(): 24.2 ms (synchronous)
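To put these totals on a per-parameter footing, the following sketch divides them by the parameter count. It uses only the figures printed above; the variable names are illustrative.

```python
# Totals reported above for 200 parameters (milliseconds).
n_param = 200
cache_get_total_ms = 3.5
rpc_total_ms = 24.2

# Average cost of a single call, in microseconds.
cache_get_us = 1e3 * cache_get_total_ms / n_param
rpc_us = 1e3 * rpc_total_ms / n_param

# How much faster the cache read is than the RPC for this parameter count.
speedup = rpc_total_ms / cache_get_total_ms

print(f"core_cache.get(): {cache_get_us:.1f} us per parameter")
print(f"get_parameter():  {rpc_us:.1f} us per parameter")
print(f"cache speed-up:   {speedup:.1f}x")
```

For 200 parameters this works out to roughly 17.5 us per cache read against 121 us per RPC, a factor of about 6.9.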
Questions
- CoreCache.put(key, value) accepts only a list of integers. Is it technically possible to modify the firmware so that a few more, or even all, fundamental data types can be stored in the cache? Most important to us would be TFloat, followed by TBool. If this is technically possible, please tell me whether you would consider doing this as funded development. (If so, I would like to follow up by email.)
- Calling CoreCache.put(key, value) several times in a row returns after a few tens of microseconds, regardless of whether we store 100 or 1000 parameters in the cache. Why do those calls return so fast?
- The duration of a single call to CoreCache.get(key) increases quickly with the number of parameters. As a result, at 100 parameters consecutive calls to CoreCache.get(key) are 10 times faster than consecutive RPC calls; at 1000 parameters, they are only 2 times faster. (You can verify these factors using my experiment code below.) Can this scaling of the duration of CoreCache.get(key) be improved, or is it imposed by the Kasli SoC's CPU speed?
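Until the cache accepts other types, one possible stopgap for the first question is to encode floats and booleans as 32-bit integers before calling CoreCache.put(key, value) and to decode them after CoreCache.get(key). The sketch below is plain host-side Python; the helper names and the scale factor are hypothetical and not part of ARTIQ, and fixed-point scaling trades range for resolution.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
SCALE = 1_000_000  # hypothetical resolution: 1e-6 per integer step


def encode_float(value: float, scale: int = SCALE) -> int:
    """Map a float to a fixed-point integer that fits into an int32."""
    fixed = round(value * scale)
    if not INT32_MIN <= fixed <= INT32_MAX:
        raise ValueError(f"{value} does not fit into int32 at scale {scale}")
    return fixed


def decode_float(fixed: int, scale: int = SCALE) -> float:
    """Inverse of encode_float, up to the chosen resolution."""
    return fixed / scale


def encode_bool(flag: bool) -> int:
    """Represent a boolean as 0 or 1."""
    return 1 if flag else 0


def decode_bool(fixed: int) -> bool:
    """Treat any non-zero integer as True."""
    return fixed != 0


# Values as they would be passed to the cache: each one a list of integers.
# The keys are made-up examples.
cache_entries = {
    "detuning": [encode_float(-12.345678)],
    "shutter_open": [encode_bool(True)],
}
print(decode_float(cache_entries["detuning"][0]))
print(decode_bool(cache_entries["shutter_open"][0]))
```

Decoding is a single division, so it should be cheap to do inside a kernel, though that has not been tested here. With this scale factor, values beyond roughly +/-2147 no longer fit and raise a ValueError.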
Experiment
Simply choose values for self.N_param and self.verbose inside the build function and then run the experiment on any Kasli or Kasli SoC:
from artiq.experiment import EnvExperiment, kernel, rpc
from artiq.language.types import TNone, TBool, TInt32, TStr, TList, TTuple
from artiq.language.units import ns, us, ms, s, MHz
import numpy as np
import time


class TestCache(EnvExperiment):
    def build(self) -> TNone:
        self.setattr_device("core")        # artiq.coredevice.core.Core
        self.setattr_device("core_cache")  # artiq.coredevice.cache.CoreCache
        self.N_param = 200
        self.verbose = False
        self.counter = 0

    def get_parameters(self) -> TTuple([TList(TStr), TList(TInt32)]):
        return (["a"*i for i in range(1, self.N_param+1)], [i for i in range(1, self.N_param+1)])

    def get_parameter(self) -> TInt32:
        self.counter += 1
        return self.counter

    @rpc(flags={"async"})
    def set_time(self, label : TStr) -> TNone:
        setattr(self, label, time.time())

    @rpc(flags={"async"})
    def print_times(self) -> TNone:
        time.sleep(1*s)
        print(f"---- Kasli SoC parameter cache for {self.N_param} parameters ----")
        print("get_parameters():", np.around(1e3*(self.t1-self.t0), 1), "ms (run once at beginning of sequence, mostly irrelevant)")
        print("core_cache.put():", np.around(1e3*(self.t3-self.t2), 2), "ms (no clue why so quick)")
        print("core_cache.get():", np.around(1e3*(self.t5-self.t4), 1), "ms (synchronous)")
        print(f"---- Individual RPC-calls for {self.N_param} parameters ----")
        print(self.N_param, "x get_parameter():", np.around(1e3*(self.t6-self.t5), 1), "ms (synchronous)")

    @kernel
    def run(self) -> TNone:
        self.core.reset()
        batch_list, indiv_list = [0], [0]

        self.set_time("t0")
        labels, values = self.get_parameters()
        self.set_time("t1")

        delay(500*ms)
        self.core.wait_until_mu(now_mu())  # make sure previous operation has completed

        self.set_time("t2")
        for i in range(self.N_param):
            self.core_cache.put(labels[i], [values[i]])
        self.set_time("t3")

        delay(500*ms)
        self.core.wait_until_mu(now_mu())  # make sure previous operation has completed

        self.set_time("t4")
        for i in range(self.N_param):
            if self.verbose:
                batch_list = batch_list + [self.core_cache.get(labels[i])[0]]
            else:
                x = self.core_cache.get(labels[i])[0]
        self.set_time("t5")
        for i in range(self.N_param):
            if self.verbose:
                indiv_list = indiv_list + [self.get_parameter()]
            else:
                x = self.get_parameter()
        self.set_time("t6")

        if self.verbose:
            print("batch_list =", batch_list)
            print("indiv_list =", indiv_list)
        self.print_times()
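One detail of get_parameters() is worth keeping in mind when reading the scaling numbers: the i-th key is "a"*i, so the keys themselves grow with N_param, up to 1000 characters at 1000 parameters. The sketch below builds fixed-length keys instead, which might help separate the cost of the lookup from the cost of handling long strings. The helper name make_labels and the key format are hypothetical, and whether key length actually contributes to the measured time is an assumption that has not been verified.

```python
def make_labels(n_param: int, width: int = 6) -> list:
    """Return n_param unique keys that all have the same length."""
    return [f"p{i:0{width}d}" for i in range(1, n_param + 1)]


growing = ["a" * i for i in range(1, 1001)]  # keys as built in the experiment
fixed = make_labels(1000)                    # constant-length alternative

# Total number of key characters handled over one pass through all keys.
print(sum(len(k) for k in growing))
print(sum(len(k) for k in fixed))
```

At 1000 parameters the growing keys add up to 500500 characters, while the fixed-length keys add up to 7000. Replacing the first list returned by get_parameters() with make_labels(self.N_param) would leave the rest of the experiment unchanged.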