Nothing much to add about the specific use case, as I haven't looked into the DRG myself, but in general:

Clearly, the required delay between at_mu(now_mu() & 7) and t_io_update_mu = now_mu() + [...] depends on the frequency.

That sounds like two separate problems. Assuming that your DDS is properly set up for phase control (clocked phase-coherently with the FPGA and sync delay/IO_UPDATE alignment tuned correctly), updates should be applied completely deterministically within the AD9910 state machine (PROFILE pin switching is another determinacy issue, but you are not doing that). Thus, it sounds like this frequency-dependent delay may in fact be an incorrect time-offset in some calculation (with the different alignments "lining up" different period multiples for each frequency, rather than the true zero that would work for all frequencies).

Beware that phase discontinuities do occur if one introduces a delay > 2 seconds (roughly) between any of the frequency mirrorings. The most likely cause is improper handling of integer overflows / underflows during multiplication.

Yep, roughly 2 seconds sounds suspiciously like 231 nanoseconds, so if I had to bet, I'd say that an intermediate value is incorrectly using 32 bit precision. Sign bits can be a bit tricky to get right, but in general, the multiplication easily accessible from ARTIQ Python (cast to int64 first to get 32 bit x 32 bit -> 64 bit) should be enough to calculate all the phases given the 32 bit width of the DDS registers.

    dpn Thank you, that helps. Do you see a mistake in my calculation of dt_mu below? Or another logic error?

    • In device_db.py, Urukul 0 is set to 1 GHz SYS_CLK and 250 MHz SYNC_CLK.
    • I verify that self.urukul0_ch0.sync_data.io_update_delay == self.urukul0_ch0.tune_io_update_delay().
    • I schedule self.urukul0_ch0.io_update.pulse_mu(8) only at coarse RTIO time stamps because only those have a constant delay to the SYS_CLK cycle where the new FTW comes into effect. I also store the time stamp now_mu() right before each self.urukul0_ch0.io_update.pulse_mu(8) like so:
      # align to coarse RTIO which aligns SYNC_CLK
      at_mu(now_mu() & ~7)
      delay_mu(int64(self.urukul0_ch0.sync_data.io_update_delay))
      # store time stamp
      self.urukul0_ch0.t_acc_start_mu = now_mu()
      # transfer new FTW to active output register
      self.urukul0_ch0.io_update.pulse_mu(8)
    • At the next frequency change, the duration dt_mu of phase accumulation only depends on the difference of the two time stamps:
      # align to coarse RTIO which aligns SYNC_CLK
      at_mu(now_mu() & ~7)
      delay_mu(int64(self.urukul0_ch0.sync_data.io_update_delay))
      # calculate duration of phase accumulation
      now = now_mu()
      dt_mu = now - self.urukul0_ch0.t_acc_start_mu
      # store new time stamp
      self.urukul0_ch0.t_acc_start_mu = now
      # transfer mirror frequency 1 to active output register
      self.urukul0_ch0.io_update.pulse_mu(8)
    • dpn replied to this.

      dtsevas That sounds sensible. (4 mu alignment sufficient instead of 8, but of course, this doesn't hurt, and there isn't really a reason to go for the finer one given the event dispatch limitations.)

      Just to be sure, I'd also check the output of the sync-delay auto-tuning routine, as in my experience this is a good first test for any clocking issues. (I'd expect to see a window width of 2, or at least 1.)

        dpn By default, my device_db.py contains

                "sync_delay_seed": "eeprom_urukul0:68",
                "io_update_delay": "eeprom_urukul0:68"

        and the experiment

                print("IO_UPDATE | measured optimal delay:", self.urukul0_ch0.tune_io_update_delay(),
                      " | actually used:", self.urukul0_ch0.sync_data.io_update_delay)
                delay(20*ms)
                sync_in_delay, window_size = self.urukul0_ch0.tune_sync_delay()
                print("SYNC_IN   | measured optimal delay:", sync_in_delay,
                      "| actually used:", self.urukul0_ch0.sync_data.sync_delay_seed,
                      "\n              measured window size:", window_size)

        outputs:

        IO_UPDATE | measured optimal delay: 0  | actually used: 0
        SYNC_IN   | measured optimal delay: 15 | actually used: -1 
                      measured window size: 5

        There are some variations over time:

        • After power-on, self.urukul0_ch0.tune_io_update_delay() returned 3 in the first few experiment runs and has been returning 0 ever since.
        • self.urukul0_ch0.tune_sync_delay() has always been returning either 2 and 5 for the window size.

        Variations over time are bad, right?

        Also, what does self.urukul0_ch0.sync_data.sync_delay_seed == -1 mean?

        Luckily, I can offer some good news: My time calculations were correct and the DDS parameter values are updated with deterministic timing. My phase tracking was still wrong because self.urukul0_ch0.set_cfr1(phase_autoclear=1) and self.urukul0_ch0.io_update.pulse_mu(8) clear the phase accumulator a time dt *before* the new DDS parameter values come into effect. Possible solutions:

        1. In PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING, instead of self.urukul0_ch0.acc_pow = 0, I could try to do self.urukul0_ch0.acc_pow = round(ftw * dt). I don't know if the required dt would be always be constant or if it would be the same for all AD9910 chips. But it seems that dt = 4*ns. If that is true, it might mean that the phase accumulator gets cleared when the io_pulse is registered by SYNC_CLK's rising edge, whereas the new DDS parameter values come into effect one SYNC_CLK cycle (4 ns) later.
        2. I could reset the phase accumulator via self.urukul0_ch0.set_mu(0, 0, 0, PHASE_MODE_ABSOLUTE) and afterwards use *only* PHASE_MODE_CONTINUOUS, because PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING reset the phase accumulator and thereby screw up the phase tracking.

        Seeds < 0 mean that auto-tuning during init() is disabled. (I haven't used the EEPROM sync_data yet, but I assume it behaves the same as when specified in the device DB.) With that, the multi-chip sync functionality should be disabled altogether. This leads to a non-deterministic relationship of the 250 MHz DDS SYNC_CLK to the 125 MHz FPGA clock (due to the intermediary 1 GHz SYSCLK stage). You'll want to enable that for the timing relationships to be stable across reboots.

        I haven't checked the behaviour of the auto-tune function in anger recently (we have a manual experiment to plot the incidence of errors across all possible delays for debugging clock issues), but I don't think I've ever seen a width of 5. As long as it is happy at all, the clocking should be fine, though.

        Regarding the autoclear timing, I don't think "a fraction of a nanosecond" can be quite right, since the DDS core only runs at 1 GHz (so 1 ns). Now that you mention it, I do remember that we were seeing a small glitch as well back in 2020 during development of phase-coherent SUServo, which we couldn't quite get rid of (within a day or two of work). In the end, our preliminary understanding was that, referencing the timing diagram in the manual (fig. 49), the accumulator clear might already happen at A, while the new POW is loaded a cycle later at B. It might actually only be a two or three clock SYSCLK cycles, but we didn't follow this up further, as we just switched the SUServo implementation not to clear after the first time (as you suggest in 2), rather just keeping track of the evolution of the hardware accumulator in the gateware driver.

        Just to be explicit, I think your calculation is probably correct and will give the correct phase relationship after the glitch has settled, but I am not sure there is a way to get rid of the glitch.

          dpn I came to the same conclusion after some experimenting. I think it's exactly one SYNC_CLK cycle (= 4 ns) and it happens exactly in the order you described, i.e. phase accumulator is cleared at A and new parameter values become effective at B in Figure 49. "I/O_UPDATE Transferring Data from I/O Buffer to Active Registers" of the AD9910 manual, which you referenced. This can be compensated by the new POW, though, so no problems here. If the glitch cannot be removed, we just manually keep track of the phase accumulator's value across the entire experiment via self.urukul0_ch0.acc_pow = (old_ftw - new_ftw) * 4.

            Can it be compensated, though? What we saw was a glitch where the output went to zero absolute phase (cosine output -> max voltage) for a SYNC_CLK cycle and then resumed with the new parameters. Even with the correct POW, the glitch would still be present. Don't take this as gospel, though; as I said, there was still an alternative solution available, so we stopped experimenting fairly quickly.

              dtsevas Correction to myself: The correct starting value of the accumulator after an auto-clear is self.urukul0_ch0.acc_pow = -new_ftw*4.
              This is *wrong*: self.urukul0_ch0.acc_pow = (old_ftw - new_ftw) * 4.
              Surprisingly, it seems that no phase is accumulated within the one SYNC_CLK cycle that starts with registration of the io_pulse (point A in the aforementioned Figure) and ends with activation of the new register values (point B in the aforementioned Figure), no matter the old value of the active FTW.

              dpn I just experimented a bit with the accumulator auto-clear and the glitch was always there, you are right. I will do the same as you: Reset the accumulator once before the important part of the experiment starts and then keep track of the accumulator's value throughout the experiment.

              Do you see any value in PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING? I am thinking about replacing them by a new function called ad9910.clear_accumulator(self) and calling that function once in ad9910.init(). I want to submit a pull request to github.com/m-labs/artiq sometime soon.

                @dpn I want to synchronize all AD9910 chips to the Kasli SoC's clock and to align io_update to the coarse RTIO grid (8 ns spacing). I could write the outputs of ad9910.tune_sync_delay() and ad9910.tune_io_update_delay() manually into device_db.py. Do you know of any other way? How does "io_update_delay": "eeprom_urukul0:64" work, for instance?

                  dtsevas Or maybe I should keep PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING and make them work without clearing the accumulator?

                  dtsevas Yes, there absolutely is value in PHASE_MODE_TRACKING. We use it all over the place as the default for coherent operations, as it provides a solution that works irrespective of global state. This can be valuable to keep otherwise unnecessary interdependencies out of the code, for instance when using the same RF chain to address multiple transitions that don't necessarily have anything to do with each other on the code side. The glitch doesn't matter if it only occurs once at the begining of a pulse (where, in addition, likely the RF switch is still off anyway).

                  As long as you know all the timing information ahead of time (e.g. no DMA sequences, or at most one global one), I suppose you could just add the "software phase tracking" to some global device wrapper to similarly avoid a dependence of various pieces of code on each other, but for the cases where the glitch isn't important, it' s awhole lot of complexity for nothing.

                  dtsevas We haven't been using the EEPROM support, as it didn't exist when we first started developing/using Urukul. There must be some sort of tool to write those values into the on-board EEPROM.

                    dpn Understood, thank you! I will not attempt to change anything about PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING. I have offered in the M-Labs chatroom to prepare a pull request with the accumulator tracking and frequency mirroring. Let's see if they think it makes sense to add to global ARTIQ or not.

                    Note: The overflow/underflow errors are also fixed. Turns out the AD9910 phase accumulator treats the FTW as an *unsigned* integer, so one can always calculate like this:

                    self.acc_pow += self.ftw * dt_mu * self.ad9910.sysclk_per_mu

                    dpn We haven't been using the EEPROM support, as it didn't exist when we first started developing/using Urukul. There must be some sort of tool to write those values into the on-board EEPROM.

                    artiq_sinara_tester write the delays to the EEPROM, should the AD9910 Urukul has synchronization enabled.

                    https://github.com/m-labs/artiq/blob/cfc5eb0a1a6dc6d846a99ca01d450c4024a391eb/artiq/frontend/artiq_sinara_tester.py#L319-L324

                    dtsevas How does "io_update_delay": "eeprom_urukul0:64" work, for instance?

                    The data at offset 64 of the EEPROM of the Urukul is read when the device is initialized on host.