Hi,
our Kasli occasionally starts making the following noise:
https://berkeley.box.com/s/xas154cegcm20iopufm6e2t97ykuhs9l
It doesn't seem to affect operations (yet). Any idea what it could be? It sounds too low-frequency for a fan.
Malte
Actually this does sound like a fan problem, maybe something brushing against the blades.
It was indeed the tiny fan on our slave Kasli v2.0. We saw overheating issues and checked the temperature, which read ~100°C. We replaced the fan (twice) and the slave temperature now reads below 80°C, but since replacing the fan we see VERY inconsistent behavior from the ARTIQ system.
More specifically, we can no longer reliably establish a connection between the master and the slave. Sometimes randomly replugging the power cables or the SFP connectors helps to establish the connection ("[DEST#1] destination is up" shows up in the master coremgmt log). Sometimes the destination goes down right after I check the coremgmt log. Sometimes the Sinara tester runs fine and destination 1 goes down when I try to run something custom on the ARTIQ box. Sometimes the master doesn't ping the slave at all, sometimes it pings only once, and sometimes it keeps pinging the slave but all the pings fail. Overall, it is very frustrating to see such behavior after just replacing a fan.
I have some debugging output available. This is the slave log from a case where it successfully establishes the connection with the master and the connection then dies after I run the coremgmt log command:
This seems to indicate that the "uplink is down". The master doesn't seem to show any problems: I can always get its log, and its temperature is around 50°C.
In addition I have the log of the same situation from the master:
Could this inconsistent behavior be due to the slave (Kasli) temperature still being relatively high (70-80°C)? What other debugging can I do? Please help.
@sb10q We replaced the fan with the Sunon fan that's recommended in other forum posts (it is more powerful than the original fan) and see no change in the behavior of the ARTIQ system. The temperature of the Kasli is still ~80°C and the RTIO connection fails after tens of minutes (or ~1-2 hours when starting with the box at room temperature).
We tried new SFP cables (passive and active) without any success.
Is it possible that the heat sink that the fan is attached to on the Kasli board has lost physical contact with the board? If so, how can we replace it?
Where on the board is the temperature sensor that provides the Kasli temperature reading?
How can we debug this problem further?