After ARTIQ has been running for about a day, even with only a few simple monitoring experiments in the background, submitting a new experiment typically fails with this error:
artiq.master.worker:worker exception details
Traceback (most recent call last):
  File "C:\Users\ssr1\.conda\envs\artiq-677\lib\site-packages\artiq\master\worker.py", line 252, in _worker_action
    completed = await self._handle_worker_requests()
  File "C:\Users\ssr1\.conda\envs\artiq-677\lib\site-packages\artiq\master\worker.py", line 238, in _handle_worker_requests
    await self._send(reply)
  File "C:\Users\ssr1\.conda\envs\artiq-677\lib\site-packages\artiq\master\worker.py", line 169, in _send
    raise WorkerTimeout(
artiq.master.worker.WorkerTimeout: Timeout sending data to worker (RID 27875)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\ssr1\.conda\envs\artiq-677\lib\site-packages\artiq\master\scheduler.py", line 268, in _do
    completed = await run.run()
  File "C:\Users\ssr1\.conda\envs\artiq-677\lib\site-packages\artiq\master\scheduler.py", line 34, in worker_method
    return await m(*args, **kwargs)
  File "C:\Users\ssr1\.conda\envs\artiq-677\lib\site-packages\artiq\master\worker.py", line 278, in run
    completed = await self._worker_action({"action": "run"})
  File "C:\Users\ssr1\.conda\envs\artiq-677\lib\site-packages\artiq\master\worker.py", line 254, in _worker_action
    raise WorkerWatchdogTimeout
artiq.master.worker.WorkerWatchdogTimeout
This always happens when I launch a new experiment in the morning after none have been launched overnight, so I am not sure whether the problem is related to ARTIQ (or its experiments) having been running for a long time, or to the long gap since the last new experiment was launched.
Resubmitting the experiment a few times sometimes resolves the issue (eventually a submission goes through without timing out). Rebooting ARTIQ always resolves it. Neither workaround is suitable for us now or going forward. Could anyone explain why this issue arises?
UPDATE: I set up a background experiment that schedules a simple experiment (it logs the numbers from 1 to 100) every hour. It was launched at 14:00. Other experiments were submitted regularly during the day until around 18:00, and ARTIQ was working fine. After that, the hourly experiments kept working until 21:00, when the timeout error occurred. So it seems that the timeout still occurs even when ARTIQ is kept "active" by regular submissions.
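
For anyone who wants to try reproducing this, here is a rough sketch of one way such an hourly submission could be queued. It is not the exact code used above. It assumes the scheduler object exposes submit(pipeline_name, expid, priority, due_date, flush), which is how I understand ARTIQ's `scheduler` virtual device to behave. The file name `log_numbers.py`, the class name `LogNumbers`, and the helper names are hypothetical.

```python
import time


def make_expid(file, class_name, arguments=None, log_level=30):
    """Build an experiment description dict (layout assumed from ARTIQ's expid)."""
    return {
        "file": file,                  # hypothetical path in the experiment repository
        "class_name": class_name,      # hypothetical experiment class
        "arguments": arguments or {},
        "log_level": log_level,        # 30 corresponds to logging.WARNING
    }


def hourly_due_dates(start, count, period_s=3600.0):
    """Return `count` due dates (seconds since epoch), one per period."""
    return [start + i * period_s for i in range(count)]


def schedule_hourly(scheduler, expid, start, count,
                    pipeline_name="main", priority=0):
    """Queue `count` runs of `expid`, one per hour, and return their RIDs.

    `scheduler` is anything with a submit(pipeline_name, expid, priority,
    due_date, flush) method; that signature is an assumption here.
    """
    rids = []
    for due in hourly_due_dates(start, count):
        rids.append(scheduler.submit(pipeline_name, expid, priority, due, False))
    return rids


if __name__ == "__main__":
    # Stand-in scheduler so the sketch runs without ARTIQ installed.
    class PrintingScheduler:
        def __init__(self):
            self.next_rid = 0

        def submit(self, pipeline_name, expid, priority, due_date, flush):
            print(pipeline_name, expid["class_name"], time.ctime(due_date))
            self.next_rid += 1
            return self.next_rid

    expid = make_expid("log_numbers.py", "LogNumbers")
    schedule_hourly(PrintingScheduler(), expid, time.time(), 3)
```

Inside a real experiment, the stand-in would presumably be replaced by the scheduler device (requested with self.setattr_device("scheduler") in build()). Logging the RID and submission time of each run should make it easier to see which run first hits the WorkerTimeout.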