Hi @harry

Since you're not seeing the same error as I am, which firmware version are you running on your Kasli? I wonder whether my firmware version might be causing the problem; it's the only other idea I have at the moment for resolving this issue.

    LukeBaker Since you mentioned you're using the beta version of ARTIQ: I used ARTIQ v6.7268.e31ee1f0.beta, which is newer than yours. You could try rebuilding the gateware and firmware with the latest ARTIQ build from one of our Nix channels.

    Essentially, to enter the ARTIQ environment on your lab computer, you just need to enter a Nix shell using:

    $ nix-channel --add https://your-desired-channel-url
    $ nix-channel --update
    $ nix-shell "<artiq-fast/shell-dev.nix>"

    Then, download or obtain the ARTIQ description JSON file (and modify the IP address inside it to match your current network settings), connect your lab computer's USB port to Kasli's USB/JTAG port, and run the following to rebuild the gateware/firmware and reflash it onto Kasli:

    [nix-shell]$ python -m artiq.gateware.targets.kasli_generic /path/to/description.json
    [nix-shell]$ artiq_flash -d artiq_kasli --srcbuild

    Afterwards, run the following command and you should obtain a log showing the correct ARTIQ version number:

    [nix-shell]$ artiq_coremgmt log
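Once you have the log, a quick way to confirm that the firmware and the host software come from the same build is to compare the version strings field by field. The sketch below assumes the `<major>.<build>.<git hash>.<suffix>` layout seen in this thread (e.g. `6.7268.e31ee1f0.beta`); the helper names are hypothetical, and the firmware string would be copied out of the `artiq_coremgmt log` output by hand rather than parsed from the whole log.

```python
import re
from typing import NamedTuple


class ArtiqVersion(NamedTuple):
    major: int    # e.g. 6
    build: int    # e.g. 7268; in this thread's examples a higher number is newer
    commit: str   # short git hash, e.g. "e31ee1f0"
    suffix: str   # e.g. "beta"; empty string if absent


# Assumed layout (hypothetical): optional "v", then <major>.<build>.<hash>[.<suffix>].
_VERSION_RE = re.compile(r"v?(\d+)\.(\d+)\.([0-9a-f]+)(?:\.(\w+))?")


def parse_artiq_version(version: str) -> ArtiqVersion:
    """Parse a bare version string (not a whole log line) into its fields."""
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognised ARTIQ version string: {version!r}")
    major, build, commit, suffix = match.groups()
    return ArtiqVersion(int(major), int(build), commit, suffix or "")


def same_build(firmware: str, host: str) -> bool:
    """True when both strings share the same major version, build number and commit."""
    fw = parse_artiq_version(firmware)
    sw = parse_artiq_version(host)
    return fw[:3] == sw[:3]
```

For example, `same_build("v6.7268.e31ee1f0.beta", "6.7268.e31ee1f0.beta")` is True, while a firmware on build 7268 and a host on a different build would return False. Using `fullmatch` on a bare string avoids accidentally matching something like an IP address elsewhere in a log line.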

    Let's see if reflashing the gateware/firmware would help.

    12 days later

    Hi @harry

    I'm just updating you on the progress I've been making with the DMA issue.
    I've managed to update our nix-shell environment, the Kasli firmware and our conda environment (which we use to run experiments) to version 6.7345... (the latest version, I believe).

    Unfortunately, I'm now experiencing a different error. When I try to run an experiment from the dashboard (just a simple Urukul signal output), the master responds with a TimeoutError and no signal appears on my oscilloscope.

    I was also going to attach the coremgmt log, but when I try to retrieve it I get the same TimeoutError.

    I'm going to try swapping the Kasli boards around to check whether it's a hardware issue (when I'm back in the office), but I'm a bit stuck on the software side. I can't seem to get v6.7345... to respond. Should I try reverting to your suggested version, v6.7268?

      LukeBaker It could be a connection problem if pinging your crate at its original IP address gives 100% packet loss. First, please double-check that the JSON file contains the correct IP address for the crate:

      {
          "target": "kasli",
          "variant": ...,
          "hw_rev": ...,
          "base": "standalone",
          "core_addr": "crate_ip_address",
          "peripherals": [
              ...
          ]
      }
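Before reflashing, it can help to check that `core_addr` in the description file is a real IP address rather than a leftover placeholder such as `crate_ip_address`, and that it is the address you expect. A minimal sketch using only the Python standard library; the function names are hypothetical, and `192.0.2.10` in the usage note is a documentation-only example address, not a real crate.

```python
import ipaddress
import json


def read_core_addr(description_path: str) -> str:
    """Return the 'core_addr' entry of a Kasli description JSON file.

    Raises KeyError if the entry is missing, and ValueError if it is not a
    valid IP address (for example the placeholder "crate_ip_address").
    """
    with open(description_path, encoding="utf-8") as f:
        description = json.load(f)
    if "core_addr" not in description:
        raise KeyError(f"{description_path} has no 'core_addr' entry")
    # ip_address() rejects anything that is not a valid IPv4/IPv6 address.
    return str(ipaddress.ip_address(description["core_addr"]))


def core_addr_matches(description_path: str, expected_ip: str) -> bool:
    """True if the file's core_addr equals the address you intend to use."""
    actual = ipaddress.ip_address(read_core_addr(description_path))
    return actual == ipaddress.ip_address(expected_ip)
```

Usage: `core_addr_matches("/path/to/description.json", "192.0.2.10")`. Comparing `ip_address` objects rather than raw strings means formatting differences in IPv6 addresses do not cause false mismatches.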

      Then, please do the following to store the same IP address in Kasli's flash memory:

      # Ensure there is no existing file named `kasli.config` because that will be overwritten by the following commands.
      [nix-shell]$ artiq_mkfs -s ip crate_ip_address kasli.config
      [nix-shell]$ artiq_flash -f kasli.config storage
      # Reinitialise Kasli.
      [nix-shell]$ artiq_flash start
      # Optional: run this to also make sure your device database file is using the right Kasli configuration.
      [nix-shell]$ artiq_ddb_template /path/to/description.json -o device_db.py
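If this sequence has to be repeated on several boards, it can be wrapped in a small script that enforces the "no existing `kasli.config`" precaution before anything is written. This is a hypothetical helper, not part of ARTIQ: it only issues the exact commands listed above, and by default it prints them instead of running them. With `dry_run=False` it should be run inside the nix-shell so the ARTIQ tools are on the PATH.

```python
import os
import subprocess
from typing import List


def store_ip_commands(ip: str, config_file: str = "kasli.config") -> List[List[str]]:
    """The artiq_mkfs / artiq_flash sequence from this thread, as argument lists."""
    return [
        ["artiq_mkfs", "-s", "ip", ip, config_file],     # build the config image
        ["artiq_flash", "-f", config_file, "storage"],  # write it to Kasli's flash
        ["artiq_flash", "start"],                       # reinitialise Kasli
    ]


def store_ip(ip: str, config_file: str = "kasli.config", dry_run: bool = True) -> List[List[str]]:
    """Print (or, with dry_run=False, run) the sequence after a clobber check."""
    if os.path.exists(config_file):
        # artiq_mkfs would overwrite this file, so refuse instead.
        raise FileExistsError(f"{config_file} already exists; move or rename it first")
    commands = store_ip_commands(ip, config_file)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing command
    return commands
```

For example, `store_ip("192.0.2.10")` prints the three commands so they can be reviewed before rerunning with `dry_run=False`.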

      After waiting a few seconds, ping the crate at this IP address again and check whether the timeout issue is gone.
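Alongside ping, one could also test whether the core device accepts TCP connections, which is closer to what the dashboard and `artiq_coremgmt` actually need. This sketch swaps in a plain TCP connect in place of ICMP ping; the port number 1380 is an assumption (believed to be the management port `artiq_coremgmt` uses) and should be confirmed against the documentation for your ARTIQ version.

```python
import socket

# Assumption: core device management port; verify for your ARTIQ version.
ASSUMED_MGMT_PORT = 1380


def tcp_reachable(host: str, port: int = ASSUMED_MGMT_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable and timed-out connections all raise OSError subclasses.
        return False
```

Usage: `tcp_reachable("192.0.2.10")` (substituting the crate's real address). If ping succeeds but this returns False, the network path is up but nothing is answering on that port.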

      7 days later

      Hi @harry

      I managed to get the crate connected at the right IP address by following your instructions. I tested the crate with the ARTIQ beta version on the firmware and the same beta version in my ARTIQ conda environment. However, even when testing with the same code, I still received a timeout error. The same thing happens if I set the 'hotfix' delay below 9 µs (above 9 µs there is no timeout error). It looks like the same error occurs even with the beta version.

      Further, I set up a new Kasli box with two TTLs and a different Kasli board. I created the same pattern as before and tested its output, but I still get the same timeout error on the new board. (As above, the timeout error occurs only when the hotfix delay is below 9 µs.)

      This could mean two things: either I should roll back to a slightly older version of the beta, or the Kasli boards we've ordered are older revisions and will need to be replaced. What's the revision number of the physical Kasli board you're using that doesn't cause a timeout error? If ours is older, we may need to get it swapped out.

      I can't think of any other reason this problem would occur across multiple boards and multiple ARTIQ versions.
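To pin the 9 µs threshold down more precisely on each board, the delay can be bisected rather than stepped by hand. In this sketch `runs_ok` is a hypothetical callable the user would supply (for example, one that submits the pulse pattern with a given hotfix delay and returns False if it errors). It assumes the behaviour is monotonic, i.e. every delay above the threshold succeeds, as the reports above suggest.

```python
from typing import Callable


def find_min_delay(
    runs_ok: Callable[[float], bool],
    lo: float,
    hi: float,
    resolution: float = 0.1e-6,
) -> float:
    """Bisect for the smallest delay (in seconds) for which runs_ok(delay) is True.

    Requires runs_ok(lo) to be False and runs_ok(hi) to be True. The returned
    delay is at most `resolution` above the true threshold.
    """
    if runs_ok(lo) or not runs_ok(hi):
        raise ValueError("need runs_ok(lo) == False and runs_ok(hi) == True")
    while hi - lo > resolution:
        mid = (lo + hi) / 2
        if runs_ok(mid):
            hi = mid  # mid works: the threshold is at or below mid
        else:
            lo = mid  # mid fails: the threshold is above mid
    return hi
```

With a stand-in predicate such as `lambda d: d >= 9e-6` and bounds of 1 µs and 20 µs, this needs only about ten calls to land within 0.1 µs of the threshold, which keeps the number of hardware runs small.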

      Hi @harry,

      Sorry to update again. The desync error is indeed a problem, but what I neglected to realise was that the RTIOUnderflow error we've both been experiencing with these pulse patterns is a much bigger problem in itself (sorry, this completely went over my head), because we need pulse patterns like these to run without any RTIOUnderflow at all, regardless of the desync issue. I'm going to create some simpler examples and post them in a new thread. I think the desync issue may be a secondary problem whose root cause is the RTIOUnderflow error itself.

      @LukeBaker Great to hear back from you. My Kasli board is v1.1, and Urukul is v1.3. These boards are compatible with both release-5 and beta versions of ARTIQ.

      I wonder whether you can still reproduce the "timeout" error; you could try release-5 first. When this error happens, can you verify that pinging the Kasli still works? If it doesn't, there's most likely a connection issue between Kasli and your computer. Please test the connection with ping first; if pinging succeeds, please show me a log of the whole "timeout" error.

      TimeoutError is completely different from RTIOUnderflow, and it normally isn't related to how your experiment code controls the peripherals. To make the problem easier to diagnose, please attach a full console log. Thanks!
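For context on why the two errors behave differently: RTIOUnderflow is raised on the core device when the kernel submits an event whose timestamp is already behind the RTIO counter, i.e. when the "slack" between the timeline cursor and wall-clock time runs out. A TimeoutError, by contrast, typically comes from the host side when communication with the core device does not complete in time. The toy model below illustrates only the underflow side; the per-event CPU cost and starting slack are made-up numbers, not measurements of Kasli, and real kernel timing is more complex.

```python
from typing import Optional


def first_underflow(
    n_events: int, delay: float, cpu_cost: float, initial_slack: float
) -> Optional[int]:
    """Return the index of the first event that would underflow, or None.

    Toy model in any consistent time unit: event i sits at timeline position
    initial_slack + i * delay and is submitted at wall-clock time
    (i + 1) * cpu_cost. It underflows if submission happens after its timestamp.
    """
    slack = initial_slack
    for i in range(n_events):
        slack -= cpu_cost  # wall clock advances while the CPU prepares event i
        if slack < 0:
            return i  # timestamp already in the past: RTIOUnderflow
        slack += delay  # the next event is scheduled `delay` later on the timeline
    return None
```

In microseconds, `first_underflow(1000, delay=10.0, cpu_cost=5.0, initial_slack=10.0)` returns None because each event adds more timeline than it costs, whereas `first_underflow(1000, delay=1.0, cpu_cost=5.0, initial_slack=100.0)` returns 24: the slack shrinks by 4 per event until it runs out. Nothing in this model involves the network, which is why a TimeoutError points elsewhere.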