We are seeing a near-identical set of errors to those in this post by RianneSL, both in the log and when we attempt to run some of the example code on our new system.

We just installed the firmware downloaded using AFWS, so I'd have expected the firmware to then match our artiq version. How does one go about checking the firmware version to compare to the artiq version?

log message:

[   122.479191s] ERROR(runtime::session): idle kernel aborted: unexpected request RunAborted from kernel CPU
[   122.488418s]  INFO(runtime::session): no connection, starting idle kernel
[   122.532253s]  INFO(kernel): panic at ksupport/lib.rs:523:5: Exception(LoadFault) at PC 0x45060164, trap value 0x45061010

Error message when running rtio.py:

> artiq_run rtio.py
Traceback (most recent call last):
  File "C:\Users\jar jar binks\miniconda3\envs\Kartiq\Scripts\artiq_run-script.py", line 9, in <module>
    sys.exit(main())
  File "C:\Users\jar jar binks\miniconda3\envs\Kartiq\lib\site-packages\artiq\frontend\artiq_run.py", line 224, in main
    return run(with_file=True)
  File "C:\Users\jar jar binks\miniconda3\envs\Kartiq\lib\site-packages\artiq\frontend\artiq_run.py", line 210, in run
    raise exn
  File "C:\Users\jar jar binks\miniconda3\envs\Kartiq\lib\site-packages\artiq\frontend\artiq_run.py", line 203, in run
    exp_inst.run()
  File "C:\Users\jar jar binks\miniconda3\envs\Kartiq\lib\site-packages\artiq\language\core.py", line 54, in run_on_core
    return getattr(self, arg).run(run_on_core, ((self,) + k_args), k_kwargs)
  File "C:\Users\jar jar binks\miniconda3\envs\Kartiq\lib\site-packages\artiq\coredevice\core.py", line 140, in run
    self._run_compiled(kernel_library, embedding_map, symbolizer, demangler)
  File "C:\Users\jar jar binks\miniconda3\envs\Kartiq\lib\site-packages\artiq\coredevice\core.py", line 130, in _run_compiled
    self.comm.serve(embedding_map, symbolizer, demangler)
  File "C:\Users\jar jar binks\miniconda3\envs\Kartiq\lib\site-packages\artiq\coredevice\comm_kernel.py", line 706, in serve
    self._read_header()
  File "C:\Users\jar jar binks\miniconda3\envs\Kartiq\lib\site-packages\artiq\coredevice\comm_kernel.py", line 249, in _read_header
    sync_byte = self._read(1)[0]
  File "C:\Users\jar jar binks\miniconda3\envs\Kartiq\lib\site-packages\artiq\coredevice\comm_kernel.py", line 237, in _read
    raise ConnectionResetError("Core device connection closed unexpectedly")
ConnectionResetError: Core device connection closed unexpectedly

Additionally, when we had previously tried running rtio.py, we got an error message which explicitly noted a version mismatch, which is not present now. It looked like this before:

WARNING:artiq.coredevice.comm_kernel: Mismatch between gateware (7.0.beta) and software (7.8123.3038639) versions
…
ConnectionResetError: Core device connection closed unexpectedly

We have since upgraded the gateware.

Any idea what might be causing this?

We are now experiencing this issue as well with our variant illinois2 crate, so I am interested to follow developments here.

One way I know to check the gateware version is to monitor the boot log over the Kasli's UART, for example with PuTTY. I can confirm that both our firmware and software versions match as artiq 7.8123.3038639 (neither is beta); nonetheless, this error persists.

    Thanks wpc3, I must have just missed that line in the UART output. Looking at it, our gateware and software versions indeed match:

     __  __ _ ____         ____
    |  \/  (_) ___|  ___  / ___|
    | |\/| | \___ \ / _ \| |
    | |  | | |___) | (_) | |___
    |_|  |_|_|____/ \___/ \____|
    
    MiSoC Bootloader
    Copyright (c) 2017-2022 M-Labs Limited
    
    Bootloader CRC passed
    Gateware ident 7.8123.3038639;ucsb5master
    Initializing SDRAM...
    Read leveling scan:
    Module 1:
    00000000001111111111100000000000
    Module 0:
    00000000000111111111110000000000
    Read leveling: 15+-5 16+-5 done
    SDRAM initialized
    Memory test passed
    
    Booting from flash...
    Starting firmware.
    [     0.000015s]  INFO(runtime): ARTIQ runtime starting...
    [     0.003941s]  INFO(runtime): software ident 7.8123.3038639;ucsb5master
    [     0.010477s]  INFO(runtime): gateware ident 7.8123.3038639;ucsb5master
    [     0.017077s]  INFO(runtime): log level set to INFO by default
    [     0.022764s]  INFO(runtime): UART log level set to INFO by default
    [     0.139579s]  WARN(runtime::rtio_clocking): rtio_clock setting not recognised. Falling back to default.
    [     0.147682s]  INFO(runtime::rtio_clocking): using internal 125MHz RTIO clock
    [     0.424173s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
    [     4.573411s]  INFO(board_artiq::si5324):   ...locked
    [     4.578547s]  INFO(runtime::rtio_clocking::crg): Using internal RTIO clock
    [     4.609707s]  INFO(runtime): network addresses: MAC=e8-eb-1b-45-67-0e IPv4=192.168.1.75 IPv6-LL=fe80::eaeb:1bff:fe45:670e IPv6=no configured address
    [     4.623470s]  INFO(board_artiq::drtio_routing): could not read routing table from configuration, using default
    [     4.632185s]  INFO(board_artiq::drtio_routing): routing table: RoutingTable { 0: 0; 1: 1 0; 2: 2 0; 3: 3 0; }
    [     4.660356s]  INFO(runtime::mgmt): management interface active
    [     4.672569s]  INFO(runtime::session): accepting network sessions
    [     4.685619s]  INFO(runtime::session): running startup kernel
    [     4.690119s]  INFO(runtime::session): no startup kernel found
    [     4.695889s]  INFO(runtime::session): no connection, starting idle kernel
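For anyone who would rather not compare the ident lines by eye, the check above can be sketched in a few lines of Python. This is only a rough sketch: it assumes the UART output has already been captured as text (for example from a PuTTY session log) and that the runtime prints its idents in the `software ident ...` / `gateware ident ...` form shown in the log above. The function names are made up for illustration and are not part of ARTIQ.

```python
import re

# Matches runtime lines such as:
#   [     0.003941s]  INFO(runtime): software ident 7.8123.3038639;ucsb5master
# The pattern is case-sensitive, so the bootloader's capitalised
# "Gateware ident" line is skipped and only the runtime lines are used.
IDENT_RE = re.compile(r"\b(software|gateware) ident (\S+)")


def read_idents(boot_log):
    """Return a dict with the 'software' and 'gateware' idents found in the log."""
    idents = {}
    for line in boot_log.splitlines():
        match = IDENT_RE.search(line)
        if match:
            idents[match.group(1)] = match.group(2)
    return idents


def idents_match(boot_log):
    """True only if both idents are present and identical."""
    idents = read_idents(boot_log)
    return (
        "software" in idents
        and "gateware" in idents
        and idents["software"] == idents["gateware"]
    )


sample = """\
[     0.003941s]  INFO(runtime): software ident 7.8123.3038639;ucsb5master
[     0.010477s]  INFO(runtime): gateware ident 7.8123.3038639;ucsb5master
"""
print(read_idents(sample), idents_match(sample))
```

Note that this only compares what is flashed on the board; the version of the ARTIQ software on the host still has to be compared separately.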

    I ran into something similar to this when working on a MWE to test https://github.com/m-labs/artiq/issues/1969.

    As in this post and the other post by @RianneSL (see the link in the first post of this thread), I am not able to run led.py with a recent artiq version (https://github.com/m-labs/artiq/commit/b89584632225f8cf10e6557b001d80eee45f93b4) on our Kasli-2.0. A ConnectionResetError occurs. However, with a minor tweak to the code it runs. A simplified MWE is shown below:

    from artiq.experiment import *
    
    
    class LED(EnvExperiment):
        def build(self):
            self.setattr_device("core")
            self.a = 1
    
        @kernel
        def run(self):
            self.core.reset()
            self.a  # this line is critical for the code to run

    The above code would not work with the self.a line replaced by b = 1, print("A") or delay(10.). Also, if I add "a" in kernel_invariants, the code would not work either. It seems that somehow the kernel function requires using an instance variable that is not a kernel invariant to avoid the ConnectionResetError?

    Thank you! This also fixed our ConnectionResetError.

    19 days later

    I ran into the same error when I installed a new system over the weekend, and the fix also worked in our case. But after flashing startup.py and idle.py onto the master, the error still appeared on the serial monitor. I fixed this by using print(self.a) instead of just self.a in the kernel, so I think some optimization during compilation removed the "nonsensical" statement. I am a bit surprised that artiq_compile and running the same experiment from the dashboard apparently produce different binaries like this. Is this expected behaviour, and where does it come from?

    22 days later

    I tried to reproduce this, but couldn't provoke the error, even when using a build exactly equivalent to 7.8123.3038639;ucsb5master (though I don't have access to your AFWS files). Calling artiq_run in a loop with the above test case runs fine for thousands of repetitions even without the self.a line in the kernel.

    Could somebody who still experiences the issue please provide a more detailed list of steps to reproduce it (preferably at https://github.com/m-labs/artiq/issues/1975)? Ideally also with the most recent version from Git, if you are set up to do that.
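For those trying to reproduce this, the "run it in a loop" test mentioned above can be sketched as below. It is a minimal sketch, not the script actually used for the test: it assumes the command to repeat exits with a non-zero status when it fails (as a Python script does after an uncaught exception), and `count_failures` is a made-up helper name.

```python
import subprocess
import sys


def count_failures(command, repetitions):
    """Run `command` `repetitions` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(repetitions):
        # Capture the output so that thousands of runs do not flood the terminal.
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
    return failures


# Stand-in command so the sketch runs anywhere; see the note below for the
# kind of command one would substitute on a machine with ARTIQ installed.
print(count_failures([sys.executable, "-c", "pass"], repetitions=3))
```

On a machine with ARTIQ and a core device, one would pass something like `["artiq_run", "led.py"]` as the command, where `led.py` stands for the test case from this thread.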

    I can reproduce the error on Windows using conda. I freshly flashed our boards with ARTIQ-7.8123.3038639 and ran into the same problems without the self.a hack. Running a simple script returns:

    ConnectionResetError: Core device connection closed unexpectedly

    with the artiq_coremgmt log reporting the LoadFault exception:

    [  3089.777900s]  INFO(kernel): panic at ksupport/lib.rs:523:5: Exception(LoadFault) at PC 0x45060164, trap value 0x45061010
    [  3089.787890s] ERROR(runtime::session): session aborted: unexpected request RunAborted from kernel CPU
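When collecting these panics for a bug report, it can help to pull the fields out of the log line programmatically. The snippet below is a small sketch that assumes the panic line always has the shape shown above; `parse_panic` is a hypothetical helper, not an ARTIQ function.

```python
import re

# Shape assumed from the log line quoted in this thread:
#   panic at ksupport/lib.rs:523:5: Exception(LoadFault) at PC 0x45060164, trap value 0x45061010
PANIC_RE = re.compile(
    r"panic at (?P<location>\S+): Exception\((?P<kind>\w+)\) "
    r"at PC (?P<pc>0x[0-9a-fA-F]+), trap value (?P<trap>0x[0-9a-fA-F]+)"
)


def parse_panic(line):
    """Return the panic fields as a dict, or None if the line is not a panic line."""
    match = PANIC_RE.search(line)
    if match is None:
        return None
    return {
        "location": match.group("location"),
        "kind": match.group("kind"),
        "pc": int(match.group("pc"), 16),      # program counter at the fault
        "trap": int(match.group("trap"), 16),  # trap value reported by the CPU
    }


line = (
    "[  3089.777900s]  INFO(kernel): panic at ksupport/lib.rs:523:5: "
    "Exception(LoadFault) at PC 0x45060164, trap value 0x45061010"
)
print(parse_panic(line))
```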

    When I insert the hack from above and run the script again, these error messages disappear.
    If I run the hacked script via the dashboard, the experiment is executed, but I get the following error:

    INFO:dashboard:root:ARTIQ dashboard version: 7.8123.3038639
    INFO:dashboard:root:ARTIQ dashboard connected to moninj_proxy (::1)
    INFO:dashboard:artiq.dashboard.experiments:Submitted 'repo:tests/TestSimple', RID is 272
    ERROR:dashboard:artiq.dashboard.moninj:failed to connect to moninj. Is aqctl_moninj_proxy running?
    Traceback (most recent call last):
      File "C:\tools\miniconda3\envs\artiq-7\lib\site-packages\artiq\dashboard\moninj.py", line 693, in mi_connector
        await new_mi_connection.connect(self.mi_addr, self.mi_port)
      File "C:\tools\miniconda3\envs\artiq-7\lib\site-packages\artiq\coredevice\comm_moninj.py", line 32, in connect
        self._reader, self._writer = await async_open_connection(
      File "C:\tools\miniconda3\envs\artiq-7\lib\site-packages\sipyco\keepalive.py", line 80, in async_open_connection
        reader, writer = await asyncio.open_connection(host, port, *args, **kwargs)
      File "C:\tools\miniconda3\envs\artiq-7\lib\asyncio\streams.py", line 47, in open_connection
        transport, _ = await loop.create_connection(
      File "C:\tools\miniconda3\envs\artiq-7\lib\asyncio\base_events.py", line 1064, in create_connection
        raise exceptions[0]
      File "C:\tools\miniconda3\envs\artiq-7\lib\asyncio\base_events.py", line 1049, in create_connection
        sock = await self._connect_sock(
      File "C:\tools\miniconda3\envs\artiq-7\lib\asyncio\base_events.py", line 960, in _connect_sock
        await self.sock_connect(sock, address)
      File "C:\tools\miniconda3\envs\artiq-7\lib\asyncio\proactor_events.py", line 705, in sock_connect
        return await self._proactor.connect(sock, address)
      File "C:\tools\miniconda3\envs\artiq-7\lib\site-packages\qasync\_windows.py", line 43, in _process_events
        value = callback(transferred, key, ov)
      File "C:\tools\miniconda3\envs\artiq-7\lib\asyncio\windows_events.py", line 604, in finish_connect
        ov.getresult()
    ConnectionRefusedError: [WinError 1225] The remote computer refused the network connection

    Starting aqctl_moninj_proxy manually does not help the situation.

    Additionally, the core is not available during prepare. Executing e.g. print(f"{self.core.get_rtio_destination_status(10) = }") in the prepare method returns the same error message as above, with or without the self.a workaround:

    ConnectionResetError: Core device connection closed unexpectedly

      I can reproduce the same behavior running artiq_master and artiq_run on a Linux virtual machine.

      Running artiq_dashboard on Linux repeatedly gives the following error message:

      I don't know, however, whether the first one, regarding libGL, is not more related to VirtualBox. Code can be run from the dashboard with the hack, as seen under Windows.


        ThorstenGroh Additionally the core is not available during the prepare.

        The core device shouldn't be accessed during prepare, nor should any other devices – previous experiments in the pipeline are still running then. In fact, we might want to enforce that more directly by raising an exception when a kernel is run at that point. The idea of prepare is to provide a way to do possibly expensive setup work while another experiment is still using the hardware.

        More on the LoadFault issue on GitHub.

        ThorstenGroh I don't know, however, whether the first one, regarding libGL, is not more related to VirtualBox.

        Yes, this would very likely be related to the setup of the distro in your VM, not ARTIQ itself.

        5 days later

        As it turns out, this is related to the LLD (LLVM linker) version – as a workaround until an official fix is available, you can install LLVM 11, 12 or 13 instead of version 14 or 15.
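To see which LLD version an environment would pick up, something along these lines might help. It is a sketch under two assumptions that are not stated in this thread: that the linker is reachable on the PATH as `ld.lld`, and that `ld.lld --version` prints a line containing `LLD <major>.<minor>...`. The affected versions (14 and 15) are the ones named above; the helper names are made up.

```python
import re
import shutil
import subprocess

# Major versions named in this thread as triggering the problem.
AFFECTED_MAJORS = {14, 15}


def lld_major(version_output):
    """Extract the major version from `ld.lld --version` output, or None."""
    match = re.search(r"LLD (\d+)\.", version_output)
    return int(match.group(1)) if match else None


def installed_lld_major(tool="ld.lld"):
    """Return the major version of `tool` on the PATH, or None if it cannot be determined."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run([tool, "--version"], capture_output=True, text=True)
    return lld_major(result.stdout)


major = installed_lld_major()
if major is None:
    print("ld.lld not found on PATH, or version output not recognised")
elif major in AFFECTED_MAJORS:
    print(f"LLD {major} found: one of the versions reported as affected")
else:
    print(f"LLD {major} found: not one of the versions reported as affected")
```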

        Should now be fixed, thanks @dpn!
        So just update the ARTIQ software to get rid of the problem.