Question

Is there a way to load an experimental parameter and its value (e.g. {"MOT_frequ": 180e6}) into the memory of the Kasli SoC master at run-time (e.g. by calling an @rpc-function), and then to retrieve that parameter's value at run-time from any and all @kernel-functions that run on the master, without calling more @rpc-functions?

Background info

Imagine if you could run the same experiment in an infinite loop while changing parameter values on the fly, all without re-compiling. The following structure comes to mind:

  • Experimental parameters and their values are saved on the host machine in .toml files, which are monitored with the watchdog package from PyPI (a minimal host-side sketch follows this list).
  • The host machine keeps all parameters and their values loaded into RAM at all times and refreshes the relevant ones every time a .toml file is saved by the user.
  • Every @kernel-function that contains at least one experimental parameter is written so that it obtains all relevant parameter values at run-time from the host machine by calling an @rpc-function that returns them.
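For the host-side monitoring, a minimal sketch could look as follows. It assumes a hypothetical parameters.toml file, the watchdog package from PyPI and Python 3.11's tomllib; the ParameterStore class is made up for illustration.

import tomllib  # Python >= 3.11; use the third-party "toml" package otherwise
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class ParameterStore(FileSystemEventHandler):
    """Keep the parameters from a .toml file in RAM and reload them on every save."""

    def __init__(self, path):
        self.path = path
        self.parameters = {}
        self.reload()

    def reload(self):
        with open(self.path, "rb") as f:
            self.parameters = tomllib.load(f)

    def on_modified(self, event):
        if event.src_path.endswith(".toml"):
            self.reload()

store = ParameterStore("parameters.toml")
observer = Observer()
observer.schedule(store, path=".", recursive=False)
observer.start()
# store.parameters["MOT_frequ"] now tracks the file contents, e.g. 180e6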

Clearly, the average experiment would call tens if not hundreds of @rpc-functions within a single run. If those calls over Ethernet return too slowly, they will cause an RTIO underflow.

Solution: Wouldn't it be ideal to load all parameters and their values into the memory of the Kasli SoC master once per run, by calling a single @rpc-function at the beginning of the experiment, and then to access these parameters from any and all @kernel-functions on the master without calling any more @rpc-functions?
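For integer parameters, the coredevice cache (artiq.coredevice.cache.CoreCache) already allows exactly this pattern. A minimal sketch, with made-up parameter names and values, might look like this:

from artiq.experiment import EnvExperiment, kernel
from artiq.language.types import TInt32, TStr, TList, TTuple

class CachedParameters(EnvExperiment):

    def build(self):
        self.setattr_device("core")
        self.setattr_device("core_cache")  # artiq.coredevice.cache.CoreCache

    def get_parameters(self) -> TTuple([TList(TStr), TList(TInt32)]):
        # Runs on the host as an RPC; keys and values are made up.
        return (["mot_detuning_hz", "cooling_time_us"], [180_000_000, 500])

    @kernel
    def run(self):
        self.core.reset()
        keys, values = self.get_parameters()  # one RPC per run
        for i in range(len(keys)):
            self.core_cache.put(keys[i], [values[i]])
        self.pulse_sequence()  # no further RPCs needed

    @kernel
    def pulse_sequence(self):
        detuning = self.core_cache.get("mot_detuning_hz")[0]
        # ... use detuning, e.g. to program a DDS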

Relevant posts


@rjo The experiment further below compares the speed of loading parameters from the coredevice cache with the speed of loading them through individual RPC calls. These are typical results for a Kasli SoC v1.1.1 connected to the host machine by a direct Ethernet cable:

---- Kasli SoC parameter cache for 200 parameters ----
get_parameters(): 1.9 ms (run once at beginning of sequence, mostly irrelevant)
core_cache.put(): 0.04 ms (no clue why so quick)
core_cache.get(): 3.5 ms (synchronous)
---- Individual RPC-calls for 200 parameters ----
200 x get_parameter(): 24.2 ms (synchronous)

Questions

  1. CoreCache.put(key, value) accepts only a list of integers. Is it technically possible to modify the firmware so that a few more, or even all, fundamental data types can be stored in the cache? Most important to us would be TFloat, followed by TBool. If this is technically possible, please let me know whether you would consider doing this as a funded development; if so, I would like to send you an email to arrange it. (A fixed-point workaround sketch for floats follows this list.)
  2. Calling CoreCache.put(key, value) several times in a row returns after a few tens of microseconds, regardless of whether we store 100 or 1000 parameters in the cache. Why do those calls return so quickly?
  3. The duration of a single call to CoreCache.get(key) increases quickly with the number of parameters. As a result, at 100 parameters consecutive calls to CoreCache.get(key) are 10 times faster than consecutive RPC calls, while at 1000 parameters they are only 2 times faster. (You can verify these factors with my experiment code below.) Can this scaling of CoreCache.get(key)'s duration be improved, or is it dictated by the Kasli SoC's CPU speed?
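Until something like question 1 is implemented, one possible workaround (just a sketch, with a made-up key name and the assumption that whole-hertz resolution is acceptable) is to encode a float in fixed point as an int32 on the host and scale it back inside the kernel:

from artiq.experiment import EnvExperiment, kernel
from artiq.language.types import TInt32, TStr, TList, TTuple

class FixedPointCacheDemo(EnvExperiment):

    def build(self):
        self.setattr_device("core")
        self.setattr_device("core_cache")

    def get_scaled_parameters(self) -> TTuple([TList(TStr), TList(TInt32)]):
        mot_frequ = 180e6  # Hz, a float on the host
        # Whole hertz fit into an int32 up to ~2.1 GHz.
        return (["MOT_frequ_Hz"], [round(mot_frequ)])

    @kernel
    def run(self):
        self.core.reset()
        keys, values = self.get_scaled_parameters()
        self.core_cache.put(keys[0], [values[0]])
        freq = float(self.core_cache.get("MOT_frequ_Hz")[0])  # back to a float, in Hz
        # ... use freq, e.g. to program a DDS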

Experiment

Simply choose values for self.N_param and self.verbose inside the build-function and then run the experiment on any Kasli or Kasli SoC:

from artiq.experiment import EnvExperiment, kernel, rpc, delay, now_mu
from artiq.language.types import TNone, TBool, TInt32, TStr, TList, TTuple
from artiq.language.units import ns, us, ms, s, MHz
import numpy as np
import time

class TestCache(EnvExperiment):

    def build(self) -> TNone:
        self.setattr_device("core") # artiq.coredevice.core.Core
        self.setattr_device("core_cache") # artiq.coredevice.cache.CoreCache
        self.N_param = 200
        self.verbose = False
        self.counter = 0

    def get_parameters(self) -> TTuple([TList(TStr), TList(TInt32)]):
        return (["a"*i for i in range(1, self.N_param+1)], [i for i in range(1, self.N_param+1)])

    def get_parameter(self) -> TInt32:
        self.counter += 1
        return self.counter

    @rpc(flags={"async"})
    def set_time(self, label : TStr) -> TNone:
        setattr(self, label, time.time())
    
    @rpc(flags={"async"})
    def print_times(self) -> TNone:
        time.sleep(1*s)
        print(f"---- Kasli SoC parameter cache for {self.N_param} parameters ----")
        print("get_parameters():", np.around(1e3*(self.t1-self.t0), 1), "ms (run once at beginning of sequence, mostly irrelevant)")
        print("core_cache.put():", np.around(1e3*(self.t3-self.t2), 2), "ms (no clue why so quick)")
        print("core_cache.get():", np.around(1e3*(self.t5-self.t4), 1), "ms (synchronous)")
        print(f"---- Individual RPC-calls for {self.N_param} parameters ----")
        print(self.N_param, "x get_parameter():", np.around(1e3*(self.t6-self.t5), 1), "ms (synchronous)")

    @kernel
    def run(self) -> TNone:
        self.core.reset()
        batch_list, indiv_list = [0], [0]
        self.set_time("t0")
        labels, values = self.get_parameters()
        self.set_time("t1")
        delay(500*ms)
        self.core.wait_until_mu(now_mu()) # make sure previous operation has completed
        self.set_time("t2")
        for i in range(self.N_param):
            self.core_cache.put(labels[i], [values[i]])
        self.set_time("t3")
        delay(500*ms)
        self.core.wait_until_mu(now_mu()) # make sure previous operation has completed
        self.set_time("t4")
        for i in range(self.N_param):
            if self.verbose:
                batch_list = batch_list + [self.core_cache.get(labels[i])[0]]
            else:
                x = self.core_cache.get(labels[i])[0]
        self.set_time("t5")
        for i in range(self.N_param):
            if self.verbose:
                indiv_list = indiv_list + [self.get_parameter()]
            else:
                x = self.get_parameter()
        self.set_time("t6")
        if self.verbose:
            print("batch_list =", batch_list)
            print("indiv_list =", indiv_list)
        self.print_times()

@rjo I just sent an email to sales@m-labs.hk to request a quote for a funded development (assessing whether TFloat could be stored in the Kasli SoC's on-board cache).

@rjo Do you have any thoughts on my questions 2 and 3 above?