Does the experiment keep running indefinitely, or is it interrupted by an RTIOUnderflow error?
What is in the device log (artiq_coremgmt log or UART if you have it connected) after the issue has occurred?
Does the issue also occur with a delay longer than 100ns?

The experiment runs indefinitely. However, an RTIO underflow error occurs if the hotfix delay I have at the end of the for loop is too small (around 5*us instead of 20*us).

I've pasted the device log from artiq_coremgmt, with the log level set to TRACE, at the bottom of this reply.

I set the delay between pulses to 200*ns, 500*ns and 1000*ns, and each time the same desyncing effect occurred.

I also checked the ch2 connection (which is on the same Urukul board as ch3) to see how those pulses compare. They are the same as the pulses from ch1.

[     0.000009s]  INFO(runtime): ARTIQ runtime starting...
[     0.003933s]  INFO(runtime): software ident 6.7191.8451e58f.beta;squartikul
[     0.010911s]  INFO(runtime): gateware ident 6.7191.8451e58f.beta;squartikul
[     0.017888s]  INFO(runtime): log level set to INFO by default
[     0.023616s]  INFO(runtime): UART log level set to INFO by default
[     0.029999s]  INFO(runtime::rtio_clocking): using internal RTIO clock (by default)
[     0.315159s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
[     2.773536s]  INFO(board_artiq::si5324):   ...locked
[     2.803878s]  INFO(runtime): network addresses: MAC=04-91-62-c7-28-92 IPv4=192.168.1.70 IPv6-LL=fe80::691:62ff:fec7:2892 IPv6=no configured address
[     2.817994s]  INFO(runtime::mgmt): management interface active
[     2.832275s]  INFO(runtime::session): accepting network sessions
[     2.847695s]  INFO(runtime::session): running startup kernel
[     2.852199s]  INFO(runtime::session): no startup kernel found
[     2.857978s]  INFO(runtime::session): no connection, starting idle kernel
[     2.864848s]  INFO(runtime::session): no idle kernel found
[ 42009.939904s]  INFO(runtime::moninj): new connection from 192.168.1.38:62557
[ 42014.043161s]  INFO(runtime::session): new connection from 192.168.1.38:62558
[ 42014.094936s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 42014.289664s]  INFO(runtime::session): no connection, starting idle kernel
[ 42014.295749s]  INFO(runtime::session): no idle kernel found
[ 42067.017094s]  INFO(runtime::session): new connection from 192.168.1.38:62560
[ 42067.070355s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 42154.666019s]  INFO(runtime::session): no connection, starting idle kernel
[ 42154.671841s]  INFO(runtime::session): no idle kernel found
[ 42157.180964s]  INFO(runtime::session): new connection from 192.168.1.38:62562
[ 42157.232598s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 42215.089980s]  INFO(runtime::session): no connection, starting idle kernel
[ 42215.095813s]  INFO(runtime::session): no idle kernel found
[ 42217.717122s]  INFO(runtime::session): new connection from 192.168.1.38:62564
[ 42217.768745s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 42260.277710s]  INFO(runtime::session): no connection, starting idle kernel
[ 42260.283514s]  INFO(runtime::session): no idle kernel found
[ 42263.068035s]  INFO(runtime::session): new connection from 192.168.1.38:62565
[ 42263.119470s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 43483.361821s]  INFO(runtime::session): no connection, starting idle kernel
[ 43483.367628s]  INFO(runtime::session): no idle kernel found
[ 43510.120251s]  INFO(runtime::session): new connection from 192.168.1.38:62595
[ 43510.172441s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 43510.367329s]  INFO(runtime::session): no connection, starting idle kernel
[ 43510.373395s]  INFO(runtime::session): no idle kernel found
[ 43544.942403s]  INFO(runtime::session): new connection from 192.168.1.38:62596
[ 43544.993971s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 43545.188497s]  INFO(runtime::session): no connection, starting idle kernel
[ 43545.195103s]  INFO(runtime::session): no idle kernel found
[ 43560.728391s]  INFO(runtime::session): new connection from 192.168.1.38:62600
[ 43560.780166s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 43560.973473s]  INFO(runtime::session): no connection, starting idle kernel
[ 43560.979540s]  INFO(runtime::session): no idle kernel found
[ 43605.672793s]  INFO(runtime::session): new connection from 192.168.1.38:62602
[ 43605.724219s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 43605.918707s]  INFO(runtime::session): no connection, starting idle kernel
[ 43605.924771s]  INFO(runtime::session): no idle kernel found
[ 43827.268321s]  INFO(runtime::session): new connection from 192.168.1.38:62604
[ 43827.320078s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 43853.303875s]  INFO(runtime::session): no connection, starting idle kernel
[ 43853.309794s]  INFO(runtime::session): no idle kernel found
[ 43855.865369s]  INFO(runtime::session): new connection from 192.168.1.38:62606
[ 43855.916984s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 43856.205176s]  INFO(runtime::session): no connection, starting idle kernel
[ 43856.211243s]  INFO(runtime::session): no idle kernel found
[ 44510.594148s]  INFO(runtime::session): new connection from 192.168.1.38:62624
[ 44510.646072s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 44510.943421s]  INFO(runtime::session): no connection, starting idle kernel
[ 44510.949493s]  INFO(runtime::session): no idle kernel found
[ 44521.989072s]  INFO(runtime::session): new connection from 192.168.1.38:62625
[ 44522.040490s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 44543.174618s]  INFO(runtime::mgmt): new connection from 192.168.1.38:62626
[ 44555.545606s]  INFO(runtime::mgmt): new connection from 192.168.1.38:62627
[ 44615.540896s]  INFO(runtime::session): no connection, starting idle kernel
[ 44615.546774s]  INFO(runtime::session): no idle kernel found
[ 44623.207472s]  INFO(runtime::mgmt): new connection from 192.168.1.38:62628
[ 48014.922075s]  INFO(runtime::moninj): new connection from 192.168.1.38:62730
[ 48020.530730s]  INFO(runtime::session): new connection from 192.168.1.38:62731
[ 48020.583081s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 48034.195707s]  INFO(runtime::session): no connection, starting idle kernel
[ 48034.201543s]  INFO(runtime::session): no idle kernel found
[ 48036.199839s]  INFO(runtime::session): new connection from 192.168.1.38:62733
[ 48036.252554s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 48047.948435s]  INFO(runtime::session): no connection, starting idle kernel
[ 48047.954271s]  INFO(runtime::session): no idle kernel found
[ 48052.337354s]  INFO(runtime::mgmt): new connection from 192.168.1.38:62734
[ 48107.967394s]  INFO(runtime::session): new connection from 192.168.1.38:62735
[ 48108.019112s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 48139.632549s]  INFO(runtime::session): no connection, starting idle kernel
[ 48139.638375s]  INFO(runtime::session): no idle kernel found
[ 48141.912322s]  INFO(runtime::session): new connection from 192.168.1.38:62736
[ 48141.964151s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 48179.381268s]  INFO(runtime::session): no connection, starting idle kernel
[ 48179.387094s]  INFO(runtime::session): no idle kernel found
[ 48181.957910s]  INFO(runtime::session): new connection from 192.168.1.38:62737
[ 48182.009982s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 48211.625857s]  INFO(runtime::session): no connection, starting idle kernel
[ 48211.631682s]  INFO(runtime::session): no idle kernel found
[ 48213.560019s]  INFO(runtime::session): new connection from 192.168.1.38:62738
[ 48213.611694s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 48244.468614s]  INFO(runtime::session): no connection, starting idle kernel
[ 48244.474441s]  INFO(runtime::session): no idle kernel found
[ 48246.377791s]  INFO(runtime::session): new connection from 192.168.1.38:62739
[ 48246.429479s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 48278.266611s]  INFO(runtime::session): no connection, starting idle kernel
[ 48278.272436s]  INFO(runtime::session): no idle kernel found
[ 48280.254595s]  INFO(runtime::session): new connection from 192.168.1.38:62741
[ 48280.306483s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 48376.060909s]  INFO(runtime::session): no connection, starting idle kernel
[ 48376.066738s]  INFO(runtime::session): no idle kernel found
[ 48377.959778s]  INFO(runtime::session): new connection from 192.168.1.38:62743
[ 48378.011727s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 48378.311866s]  INFO(runtime::session): no connection, starting idle kernel
[ 48378.317961s]  INFO(runtime::session): no idle kernel found
[ 48386.816180s]  INFO(runtime::session): new connection from 192.168.1.38:62744
[ 48386.868065s]  INFO(runtime::kern_hwreq): resetting RTIO
[ 48391.532928s]  INFO(runtime::session): no connection, starting idle kernel
[ 48391.538754s]  INFO(runtime::session): no idle kernel found
[ 48597.231486s]  INFO(runtime::mgmt): new connection from 192.168.1.38:62746
[ 48597.237114s]  INFO(runtime::mgmt): changing log level to TRACE
[ 48606.676908s]  INFO(runtime::mgmt): new connection from 192.168.1.38:62747
12 days later

Hi @sb10q

I was hoping I could get an update as to whether you think this is a problem that can be fixed at the moment? It may not be solvable on my end until lockdown is over.

Hello @LukeBaker! I tried adapting your non-DMA code to generate the 100ns pulses using two channels on a single Urukul, with an appropriate delay (e.g. 20us) added between each group of 6 pulses per channel. However, I haven't been able to replicate your desync issue. All signals were always correctly timed and no suspicious log messages appeared. Here's an example of oscilloscope output:

Could you please confirm whether or not the desync issue happens on your setup if only 1 Urukul card is used?

(Edit: ARTIQ Version used: v6.7268.e31ee1f0.beta)

5 days later

Hi @harry ,

Sorry for the late response. It is true that if the time delay after each pulse sequence is high enough, then the desync doesn't occur. It won't occur with a 20us delay for a 2.4us long pulse pattern. However, as the time delay is reduced, say to 10us, the desync does occur (For the same length pulse pattern). This means that there are irregularities in the output pulses (For a reason I'm not yet sure about). This doesn't matter in most cases, but it does cause an Underflow error when using the DMA recording feature, which is something we'd like to use as it allows for quicker pulse patterns.

I've attached an image of the same pulse pattern with a delay length of 10us using two channels from a single Urukul card.

    LukeBaker From what I've gathered, you experienced the desync issue whenever the RTIO underflow error was raised, regardless of the use of DMA.

    With my own code, an RTIO underflow error still happens if the delay is too short (e.g. 10us), but I have never experienced any desync issue.

    My oscilloscope captures the first few groups of the 2.4us pulse pattern and shows nothing abnormal. To help us investigate, could you please provide the following:

    • Please test without using DMA, i.e. similar to what you showed in this reply, and confirm that the desync issue still happens when an underflow error would follow.
    • Please confirm that your oscilloscope is also capturing the first group of pulses emitted before the program stops at an underflow error.
    • Please attach the exact experiment code where your desync issue happens (i.e. when you took that newest screenshot).

    We would need to see how exactly the issue you're experiencing could be reproduced on our side. Thank you.

    4 days later

    Hi @harry

    When not using the DMA, if the hotfix delay is exactly 8.4966us, I get no underflow error, but the pulses are desynced. If the delay is 8.4965us, I get about half a second of pulses and then an underflow error (which I guess would make sense if the desync was pushing the RTIO counter forward?). I captured one of the pulses with an 8.4965us delay, showing that the pulses desync before the underflow error occurs. This confirms your first statement.

    Below is the full code for the image in my previous post and the image above, with the hotfix delay changed in each case.

    from artiq.experiment import *
    
    class Exp_DMATesting_NoDMA_WithSequence(EnvExperiment):
        """Exp DMA Testing No DMA With Sequence"""
        
        def build(self):
            self.setattr_device("core")
            
            self.u0 = self.get_device("ttl_urukul4_sw3")
            self.u1 = self.get_device("ttl_urukul1_sw1")
            self.u2 = self.get_device("ttl_urukul1_sw0")
            
            self.u0a = self.get_device("urukul4_ch3")
            self.u1a = self.get_device("urukul1_ch1")
            self.u2a = self.get_device("urukul1_ch0")
        
        @kernel
        def run(self):
            self.core.reset()
            
            # Initialize Urukuls
            self.u0a.cpld.init()
            self.u0a.init()
            self.u1a.cpld.init()
            self.u1a.init()
            self.u2a.cpld.init()
            self.u2a.init()
            
            # Set low attenuation
            self.u0a.set_att(1.0)
            self.u1a.set_att(1.0)
            self.u2a.set_att(1.0)
            
            # Set frequency much higher than pulse length
            self.u0a.set(120*MHz, amplitude=1.0)
            self.u1a.set(120*MHz, amplitude=1.0)
            self.u2a.set(120*MHz, amplitude=1.0)
            
            # Create a simple pulse pattern without the use of the DMA
            while True:
                for i in range(50):
                    for j in range(3):
                        self.u1.pulse(100*ns)
                        delay(100*ns)
                        self.u2.pulse(100*ns)
                        delay(100*ns)
                        self.u1.pulse(100*ns)
                        delay(100*ns)
                        self.u2.pulse(100*ns)
                        delay(100*ns)
                    delay(10*us)
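
As a rough back-of-the-envelope check on the threshold reported above, the loop body can be reduced to an event count and a timeline span. This sketch assumes that each pulse() submits two RTIO events (one per edge) and that the kernel CPU needs a roughly constant time per event; both are simplifying assumptions rather than measured facts, and the helper names are hypothetical.

```python
# Toy estimate of the timeline time available per RTIO event.
# Assumption: each pulse() submits two events (rising and falling edge).

NS = 1e-9
US = 1e-6

def pattern_stats(inner_repeats=3, pulses_per_repeat=4,
                  pulse_width=100 * NS, gap=100 * NS):
    """Return (number of RTIO events, timeline span in seconds) for one group."""
    pulses = inner_repeats * pulses_per_repeat
    events = 2 * pulses
    span = pulses * (pulse_width + gap)
    return events, span

def implied_event_cost(trailing_delay):
    """Timeline time available per event when the group repeats forever."""
    events, span = pattern_stats()
    return (span + trailing_delay) / events

events, span = pattern_stats()
print(f"events per group: {events}, pattern span: {span / US:.1f} us")
print(f"time per event at 8.4966 us delay: "
      f"{implied_event_cost(8.4966 * US) / NS:.0f} ns")
```

With the 8.4966us threshold quoted above, this works out to roughly 0.45us of timeline per event. If the assumptions hold, any trailing delay that leaves less than that per event would be expected to end in an underflow eventually.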

      (Edit: made correction on my descriptions about the signals I get on the oscilloscope: those are NOT erroneous)

      LukeBaker I tested with your code on one Urukul card, and I still cannot reproduce any desync error.

      Some clarification about my previous statement: in situations where an RTIO underflow follows, I do not see a desync between the pair of signals. Instead, one of the signals always becomes continuous, following the frequency and amplitude settings. See below for some examples. I have never seen a situation where the signals are out of order (desynced) while both are still emitting TTL-gated pulses.


      • Upper fig: delay set to 8.5us (first run), RTIO underflow occurs, u1 emits continuous 120MHz wave and takes place before u2; oscilloscope trigger set on u2.
      • Lower fig: delay set to 8.5us (same code, second run), RTIO underflow occurs, u2 emits continuous 120MHz wave and takes place before u1; oscilloscope trigger set on u1.

      On the other hand, in situations where RTIO underflow does not follow, e.g. delay is longer than 8.5us, no desync effect or erroneous signals can be produced based on your code.

      As far as I understand, calling pulse() on a TTL does not insert any additional delay before turning the TTL on. So I don't think the desync you see is caused by the underflow error.
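
To make the point about pulse() concrete, here is a toy pure-Python model of the timeline cursor. It assumes pulse() behaves as on(), delay(duration), off(); the `Timeline` class and the channel names are hypothetical stand-ins for illustration, not ARTIQ APIs.

```python
# Toy model of an RTIO timeline cursor; this is not ARTIQ code.
# Assumption: pulse(t) is equivalent to on(); delay(t); off().

class Timeline:
    def __init__(self):
        self.now_ns = 0      # cursor position in nanoseconds
        self.events = []     # (timestamp_ns, channel, level)

    def delay(self, ns):
        self.now_ns += ns

    def pulse(self, channel, ns):
        self.events.append((self.now_ns, channel, 1))
        self.delay(ns)
        self.events.append((self.now_ns, channel, 0))

def one_group(tl, trailing_delay_ns):
    """Replay the u1/u2 pattern from the experiment above for one group."""
    for _ in range(3):
        for channel in ("u1", "u2", "u1", "u2"):
            tl.pulse(channel, 100)
            tl.delay(100)
    tl.delay(trailing_delay_ns)

tl = Timeline()
one_group(tl, 10_000)
rising = [(t, ch) for t, ch, level in tl.events if level == 1]
for t, ch in rising[:4]:
    print(f"{ch} rises at {t} ns")
```

In this model every rising edge lands on a fixed 200ns grid no matter how long the CPU takes to submit it, which is why a kernel that is too slow would be expected to show up as an underflow rather than as shifted edges.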

      7 days later

      Hi @harry

      Because you're not experiencing the same error as me, what firmware are you using for your Kasli? I wonder if the version of firmware I have might be causing the problem? This is the only other idea I have at the moment for resolving this issue.

        LukeBaker Since you have indicated you're using the Beta version of ARTIQ, I used ARTIQ v6.7268.e31ee1f0.beta which is newer than the one you have. I guess you can try to rebuild the gateware and firmware using the latest ARTIQ build on ONE of our Nix channels:

        Essentially, to enter the ARTIQ environment on your lab computer, you just need to enter a Nix shell using:

        $ nix-channel --add https://your-desired-channel-url
        $ nix-channel --update
        $ nix-shell "<artiq-fast/shell-dev.nix>"

        Then, download/obtain the ARTIQ description JSON file (inside of which the IP address should be modified to match your current network setting), connect your lab computer USB port to your Kasli USB/JTAG port, and then run the following to rebuild and reflash the gateware/firmware onto Kasli:

        [nix-shell]$ python -m artiq.gateware.targets.kasli_generic /path/to/description.json
        [nix-shell]$ artiq_flash -d artiq_kasli --srcbuild

        Afterwards, run the following command and you should obtain a log showing the correct ARTIQ version number:

        [nix-shell]$ artiq_coremgmt log

        Let's see if reflashing the gateware/firmware would help.
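
A small helper can make the version comparison less error-prone. The sketch below parses the `software ident` and `gateware ident` lines in the format shown in the log earlier in this thread; the function name and the regular expression are my own assumptions about that format and are not part of ARTIQ.

```python
import re

# Matches lines like:
# [     0.003933s]  INFO(runtime): software ident 6.7191.8451e58f.beta;squartikul
IDENT_RE = re.compile(r"(software|gateware) ident (\S+)")

def parse_idents(log_text):
    """Return a dict mapping 'software' and 'gateware' to their ident strings."""
    return {kind: ident for kind, ident in IDENT_RE.findall(log_text)}

# Sample lines copied from the log posted earlier in this thread.
sample_log = """\
[     0.003933s]  INFO(runtime): software ident 6.7191.8451e58f.beta;squartikul
[     0.010911s]  INFO(runtime): gateware ident 6.7191.8451e58f.beta;squartikul
"""

idents = parse_idents(sample_log)
if idents.get("software") != idents.get("gateware"):
    print("warning: software and gateware idents differ:", idents)
else:
    print("running", idents.get("software"))
```

To use it on a real device, save the output of `artiq_coremgmt log` to a file and pass the file contents to `parse_idents`.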

        12 days later

        Hi @harry

        I'm just updating you on the progress I've been making with the DMA issue.
        I've managed to update our nix-shell environment, the Kasli firmware and our conda environment (which we use to run experiments) to version 6.7345..., which is the latest version, I believe.

        Unfortunately I'm now experiencing a different error. When I try to run an experiment from the dashboard (Just a simple Urukul signal output), the master responds with a TimeoutError and no signal is generated on my oscilloscope.

        I was also going to attach the coremgmt log, but when I try to retrieve it I also get the same Timeout Error.

        I'm going to try and switch around the Kasli boards to check if it's a hardware issue (When I'm back in the office), but I'm a bit stuck on what to do on the software side of things. I can't seem to get V6.7345... to respond. Should I try reverting back to your suggested version of V6.7268?

          LukeBaker It could be a connection problem, which would be the case if you ping your crate at its original IP address and get 100% packet loss. First, please double-check that the JSON file contains the correct IP address for the crate:

          {
              "target": "kasli",
              "variant": ...,
              "hw_rev": ...,
              "base": "standalone",
              "core_addr": "crate_ip_address",
              "peripherals": [
                  ...
              ]
          }
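
Before flashing, the address in the description file can be sanity-checked with the Python standard library. This is a sketch only: the sample document uses placeholder values (the IP address is the one from the log earlier in this thread), `check_core_addr` is a hypothetical helper, and it assumes `core_addr` is a literal IP address, so a hostname would be rejected.

```python
import ipaddress
import json

def check_core_addr(description_text):
    """Return the parsed core_addr, raising ValueError if it is not an IP address."""
    description = json.loads(description_text)
    return ipaddress.ip_address(description["core_addr"])

# Placeholder description; the other fields are omitted here on purpose.
sample = '{"target": "kasli", "base": "standalone", "core_addr": "192.168.1.70"}'
print("core_addr OK:", check_core_addr(sample))

# A leftover template value is caught before anything is flashed.
try:
    check_core_addr('{"core_addr": "crate_ip_address"}')
except ValueError as err:
    print("core_addr rejected:", err)
```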

          Then, please do the following to store the same IP address in Kasli's flash memory:

          # Ensure there is no existing file named `kasli.config` because that will be overwritten by the following commands.
          [nix-shell]$ artiq_mkfs -s ip crate_ip_address kasli.config
          [nix-shell]$ artiq_flash -f kasli.config storage
          # Reinitialise Kasli.
          [nix-shell]$ artiq_flash start
          # Optional: run this to also make sure your device database file is using the right Kasli configuration.
          [nix-shell]$ artiq_ddb_template /path/to/description.json -o device_db.py

          Afterwards (wait a few seconds), ping the crate with this IP address again, and see if the timeout issue is gone.

          7 days later

          Hi @harry

          I managed to get the crate connected at the right IP address using your instructions. I tested the crate with the ARTIQ beta version on the firmware and the same beta version in my ARTIQ conda environment. However, even when testing with the same code, I still received a timeout error. The same effect occurs if I make the 'hotfix' delay value go below 9us (above 9us there is no timeout error). It looks like the same error is occurring even with the beta version.

          Further, I managed to set up a new Kasli box with two TTLs and a different Kasli board. I created the same pattern as before and tested the output. However, I still get the same timeout error on the new board (and, as above, the timeout error occurs only when the hotfix delay is below 9us).

          This could mean two things. Either I should roll back to a slightly older version of the beta, or the Kasli boards we've ordered are slightly older versions and we'll need to get them replaced? What's the revision number for the physical Kasli board you're using that doesn't cause a timeout error? It may be that our board is slightly older and we'll need to get it swapped out.

          I can't think of any other reason this problem would be occurring across multiple different boards and multiple different Artiq versions.

          Hi @harry ,

          Sorry to update again. The desync error is indeed a problem, but what I neglected to realise was that the RTIOUnderflow error that we've both been experiencing for these pulse patterns is a much bigger problem in and of itself (sorry, this completely flew over my head), because we need pulse patterns like these not to cause an RTIOUnderflow at all (regardless of the desyncing issue). I'm going to create some simpler examples and post a new thread. I think it's possible the desync issue is a lower-tier problem whose root cause is the RTIOUnderflow error itself.

          @LukeBaker Great to have heard back from you. My Kasli board is v1.1, and Urukul is v1.3. These boards are compatible with both release-5 and beta versions of ARTIQ.

          I wonder whether you can still reproduce the "timeout" error. You could try release-5 first. When this error happens, can you verify that pinging the Kasli still works? If it doesn't, there is most likely a connection issue between the Kasli and your computer. Please test the connection with ping first; if the ping is successful, please show me a log of the whole "timeout" error.

          TimeoutError is completely different from RTIOUnderflow, and it normally isn't related to how your experiment code controls the peripherals. To make things clearer, please consider attaching a full console log. Thanks!