I had issues with a Kasli 2.0 (configured as a satellite) crashing randomly after a few days of operation. The master (a Kasli 1) suddenly lost the link to the satellite, which caused "aux_packet" errors in the dashboard, and reestablished it after ~10 minutes. After some investigation, I found that the satellite Kasli runs much hotter (>120 °C) than the master Kasli (~75 °C). I read out the temperature via artiq_flash, but I modified the scripts it calls so that the respective Kasli is not restarted (verified via the serial monitor).
This is the temperature graph of the satellite over a couple of hours. You can see that the temperature is pretty high in general and that there are spikes and drops in the temperature. Red circles indicate an initialization of all devices on the satellite and blue circles indicate a crash of the satellite.
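Spikes and drops in a temperature log like this can also be flagged automatically. The following is only a minimal Python sketch under stated assumptions: the readings are already available as a list of values in °C, and the function names `find_jumps` and `over_limit`, the 10 °C jump threshold, the 100 °C limit, and the sample values are all made up for illustration. They are not the measurements from the graph and not part of any ARTIQ tool.

```python
from typing import List, Tuple


def find_jumps(temps_c: List[float], jump_c: float = 10.0) -> List[Tuple[int, float]]:
    """Return (index, delta) for every reading that differs from the
    previous reading by at least jump_c degrees (spike or drop)."""
    jumps = []
    for i in range(1, len(temps_c)):
        delta = temps_c[i] - temps_c[i - 1]
        if abs(delta) >= jump_c:
            jumps.append((i, delta))
    return jumps


def over_limit(temps_c: List[float], limit_c: float = 100.0) -> List[int]:
    """Return the indices of all readings above limit_c."""
    return [i for i, t in enumerate(temps_c) if t > limit_c]


# Hypothetical readings in °C, invented for this example only.
sample = [118.0, 119.5, 121.0, 105.0, 106.0, 120.5]

print(find_jumps(sample))   # readings with a jump of 10 °C or more
print(over_limit(sample))   # readings above 100 °C
```

Comparing the flagged indices with the times of the device initializations and crashes would show whether the jumps line up with those events; the thresholds would need to be tuned to the actual sampling interval.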
There is a space of one full rack unit above and below the ARTIQ crate, and the back is completely open. By adding a relatively large fan on top, I was able to reduce the temperature enough that no more crashes occur, but it is still above 100 °C.
How much free space is expected around a crate with a Kasli 2 so that the temperature stays around 70 °C? Why does the Kasli 2 run so much hotter?