We have a system where a Kasli-SoC v1.1 (master) connects directly to three Kasli v2.0.2 boards (satellites). After updating the gateware on all Kaslis to the latest release-8 branch, we have problems with the following startup kernel:
```python
from artiq.experiment import *


class StartupKernel(EnvExperiment):
    def build(self):
        print("Generating startup kernel...")
        self.core = self.get_device("core")

    @kernel
    def run(self):
        self.core.reset()
        print("Starting startup-kernel...")
        self._wait_for_drtio_destinations()
        print("Startup-kernel finished.")
        self.core.wait_until_mu(now_mu())

    @kernel
    def _wait_for_drtio_destinations(self):
        # Poll each DRTIO destination (0-3) until it reports that it is up.
        for d in range(4):
            while not self.core.get_rtio_destination_status(d):
                # delay() only advances the timeline cursor (now_mu);
                # it does not pause the CPU between polls.
                delay(100 * ms)
        # Block until the RTIO counter catches up with the timeline cursor.
        self.core.wait_until_mu(now_mu())
```
The master cannot establish the links to the satellites, and it prints the following on the UART:
```
[ 2.853581s] INFO(runtime::comms): Loading startup kernel...
[ 2.859356s] INFO(runtime::rtio_mgt::drtio): [DEST#0] destination is up
[ 2.866080s] INFO(runtime::comms): Starting startup kernel...
[ 2.871910s] INFO(ksupport::kernel::core1): kernel starting
[ 2.877566s] INFO(ksupport::kernel::api): kernel: Starting startup-kernel...
[ 5.091775s] INFO(runtime::rtio_mgt::drtio): [LINK#1] link RX became up, pinging
[ 25.098963s] ERROR(runtime::rtio_mgt::drtio): [LINK#1] ping failed
[ 25.105131s] INFO(runtime::rtio_mgt::drtio): [LINK#2] link RX became up, pinging
[ 45.111915s] ERROR(runtime::rtio_mgt::drtio): [LINK#2] ping failed
[ 45.318912s] INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
[ 65.325877s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] ping failed
[ 65.332044s] INFO(runtime::rtio_mgt::drtio): [LINK#1] link RX became up, pinging
[ 85.338840s] ERROR(runtime::rtio_mgt::drtio): [LINK#1] ping failed
[ 85.345000s] INFO(runtime::rtio_mgt::drtio): [LINK#2] link RX became up, pinging
[ 105.351800s] ERROR(runtime::rtio_mgt::drtio): [LINK#2] ping failed
[ 105.558792s] INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
[ 125.565761s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] ping failed
[ 125.571922s] INFO(runtime::rtio_mgt::drtio): [LINK#1] link RX became up, pinging
[ 145.578716s] ERROR(runtime::rtio_mgt::drtio): [LINK#1] ping failed
[ 145.584877s] INFO(runtime::rtio_mgt::drtio): [LINK#2] link RX became up, pinging
[ 165.591676s] ERROR(runtime::rtio_mgt::drtio): [LINK#2] ping failed
[ 165.798672s] INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
[ 185.805640s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] ping failed
[ 185.811800s] INFO(runtime::rtio_mgt::drtio): [LINK#1] link RX became up, pinging
[ 205.818598s] ERROR(runtime::rtio_mgt::drtio): [LINK#1] ping failed
[ 205.824764s] INFO(runtime::rtio_mgt::drtio): [LINK#2] link RX became up, pinging
[ 225.831560s] ERROR(runtime::rtio_mgt::drtio): [LINK#2] ping failed
[ 226.038552s] INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
[ 246.045516s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] ping failed
[ 246.051677s] INFO(runtime::rtio_mgt::drtio): [LINK#1] link RX became up, pinging
[ 246.259513s] INFO(runtime::rtio_mgt::drtio): [LINK#1] remote replied after 1 packets
[ 246.470514s] ERROR(runtime::rtio_mgt::drtio): [LINK#1] failed to load routing table (timed out)
[ 246.678521s] ERROR(runtime::rtio_mgt::drtio): [LINK#1] failed to set rank (timed out)
[ 246.686333s] INFO(runtime::rtio_mgt::drtio): [LINK#1] link initialization completed
[ 246.694058s] INFO(runtime::rtio_mgt::drtio): [LINK#2] link RX became up, pinging
[ 266.700477s] ERROR(runtime::rtio_mgt::drtio): [LINK#2] ping failed
[ 266.906480s] ERROR(runtime::rtio_mgt::drtio): [DEST#2] communication failed (timed out)
```
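The timing in this log is very regular: each link is pinged for roughly 20 s before the attempt fails, and the master cycles through LINK#1, LINK#2 and LINK#0. To compare attempts across longer UART captures, a small host-side script can pull out each ping attempt and its duration. This is a hypothetical helper (`ping_attempts` and `LOG_RE` are names made up for this sketch, not part of ARTIQ), and it assumes the log line format shown above.

```python
import re

# Matches LINK lines such as:
# [ 5.091775s] INFO(runtime::rtio_mgt::drtio): [LINK#1] link RX became up, pinging
LOG_RE = re.compile(
    r"\[\s*(?P<t>[\d.]+)s\]\s+\w+\([^)]*\):\s+\[LINK#(?P<link>\d+)\]\s+(?P<msg>.*)"
)


def ping_attempts(log_text):
    """Return (link, start_s, end_s, ok) for every completed ping attempt."""
    attempts = []
    pending = {}  # link number -> timestamp of its "pinging" line
    for line in log_text.splitlines():
        m = LOG_RE.search(line)
        if not m:
            continue  # DEST lines and non-DRTIO lines are ignored
        t, link, msg = float(m["t"]), int(m["link"]), m["msg"].strip()
        if msg.endswith("pinging"):
            pending[link] = t
        elif link in pending and (msg == "ping failed" or msg.startswith("remote replied")):
            start = pending.pop(link)
            attempts.append((link, start, t, msg.startswith("remote replied")))
    return attempts


sample = """\
[ 5.091775s] INFO(runtime::rtio_mgt::drtio): [LINK#1] link RX became up, pinging
[ 25.098963s] ERROR(runtime::rtio_mgt::drtio): [LINK#1] ping failed
[ 246.051677s] INFO(runtime::rtio_mgt::drtio): [LINK#1] link RX became up, pinging
[ 246.259513s] INFO(runtime::rtio_mgt::drtio): [LINK#1] remote replied after 1 packets
"""
for link, start, end, ok in ping_attempts(sample):
    print(f"LINK#{link}: {end - start:.1f} s, {'ok' if ok else 'failed'}")
# prints:
# LINK#1: 20.0 s, failed
# LINK#1: 0.2 s, ok
```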
Meanwhile, the satellites repeatedly print the following on their UARTs:

```
[ 4.589290s] WARN(satman): received unexpected aux packet
```
On the other hand, if I rewrite the startup kernel so that it does nothing (i.e. it does not wait for `self.core.get_rtio_destination_status(d)`), the links are established without a problem.
My guess is that `self.core.get_rtio_destination_status(d)` always returns `False`, so the kernel never leaves the polling loop, and that communication with the satellites somehow fails while the startup kernel is running.
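One way to test this guess without the startup kernel blocking forever is to give the polling loop a deadline and report which destinations never came up. The sketch below shows that pattern in plain host-side Python, not as an ARTIQ kernel: `wait_for_destinations` is a hypothetical helper, and `get_status`, `clock` and `sleep` are injected so that the logic can be checked with simulated values. In a real kernel, the status call would be `self.core.get_rtio_destination_status(d)`, and the timing would have to come from the core device rather than from `time.monotonic`/`time.sleep`.

```python
import time


def wait_for_destinations(get_status, destinations, timeout_s, poll_s=0.1,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll get_status(d) for each destination until it is up or the deadline passes.

    Returns the destinations that never reported up (empty list if all did).
    """
    deadline = clock() + timeout_s
    down = []
    for d in destinations:
        while not get_status(d):
            if clock() >= deadline:
                # Give up on this destination; later ones are still checked once.
                down.append(d)
                break
            sleep(poll_s)
    return down


# Simulated check: destination 2 never comes up.
up = {0: True, 1: True, 2: False, 3: True}
fake_time = [0.0]


def fake_sleep(s):
    fake_time[0] += s


print(wait_for_destinations(up.get, range(4), timeout_s=1.0,
                            clock=lambda: fake_time[0], sleep=fake_sleep))
# prints [2]
```

Because the deadline is shared, a single stuck destination costs at most `timeout_s` in total rather than blocking the whole loop indefinitely.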
How can we fix this?