Background

When the AD9910's digital ramp generator (DRG) starts from a non-zero parameter value and then performs a falling ramp of that parameter, there is an unavoidable, undesired discontinuity in the ramped parameter that lasts for 1 SYNC_CLK cycle (4 nanoseconds). Source, relevant discussion.

In amplitude ramps, we cannot prevent this.

In frequency ramps, we can program a rising ramp that results in an effective falling ramp:

  • The AD9910's system clock (SYS_CLK) runs at 1 GHz, so programming either a frequency f < 400 MHz or its mirror frequency 1*GHz-f produces a sine wave of frequency f with identical amplitude.
  • I have confirmed the previous statement using an Urukul output and a 1 GHz oscilloscope with an FFT function.
  • Note: ad9910.frequency_to_ftw() doesn't handle frequencies at or above 500 MHz, because the ARTIQ compiler has no unsigned integers and the corresponding frequency tuning words exceed the signed 32-bit range. So I used this code:
    @kernel
    def frequency_to_uint32(self, frequency: TFloat) -> TInt32:
        """
        Linearly map frequency ∈ [0*GHz, 1*GHz] to an unsigned 32-bit integer {0,1,..., 2**32-1}.
        Hacking is necessary because the ARTIQ compiler does *not* know unsigned integers.

        :param frequency: Must be in the interval [0*GHz, 1*GHz].
        """
        if frequency < 0*Hz:
            raise ValueError("Invalid AD9910 frequency!")
        elif frequency < self.urukul0_ch0.sysclk / 2:
            return self.urukul0_ch0.frequency_to_ftw(frequency)
        elif frequency <= self.urukul0_ch0.sysclk:
            return -1 - self.urukul0_ch0.frequency_to_ftw(self.urukul0_ch0.sysclk - frequency)
        else:
            raise ValueError("Invalid AD9910 frequency!")
        return int32(0) # prevents compiler crash
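The mapping above can be checked on the host with a plain-Python model (not an ARTIQ kernel). This is a sketch that assumes a 1 GHz SYS_CLK, ARTIQ-style rounding of the FTW and ideal aliasing about SYS_CLK/2; the helper names `ftw_unsigned`, `as_int32` and `output_frequency` are hypothetical.

```python
SYSCLK = 1e9  # Hz, assumed AD9910 system clock

def ftw_unsigned(frequency: float) -> int:
    """Unsigned 32-bit FTW for 0 <= frequency < SYSCLK (ARTIQ-style rounding assumed)."""
    if not 0.0 <= frequency < SYSCLK:
        raise ValueError("frequency out of range")
    return round(frequency * 2**32 / SYSCLK) % 2**32

def as_int32(ftw: int) -> int:
    """Reinterpret an unsigned 32-bit FTW as the signed int32 the ARTIQ compiler handles."""
    return ftw - 2**32 if ftw >= 2**31 else ftw

def output_frequency(ftw: int) -> float:
    """Frequency actually emitted: programmed f and SYSCLK - f alias onto the same tone."""
    f = ftw * SYSCLK / 2**32
    return min(f, SYSCLK - f)

mirror_ftw = ftw_unsigned(SYSCLK - 300e6)                  # program 700 MHz ...
print(as_int32(mirror_ftw), output_frequency(mirror_ftw))  # ... negative int32, ~300 MHz emitted
```

For 300 MHz the mirror word equals 2**32 - FTW(300 MHz), which is exactly the bit pattern of the negative int32 -FTW(300 MHz).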

Application

To ramp frequency from 300 MHz to 200 MHz with the DRG, we program a rising ramp from 700 MHz to 800 MHz.
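With the same aliasing model, a rising ramp over the mirror band appears at the output as a falling ramp. A minimal sketch, assuming ideal aliasing at a 1 GHz SYS_CLK and a hypothetical 11-point ramp; it ignores the DRG's step-size and ramp-rate registers entirely.

```python
SYSCLK = 1e9  # Hz, assumed AD9910 system clock

def mirror(frequency: float) -> float:
    """Mirror frequency SYSCLK - f, which the DDS emits as f."""
    return SYSCLK - frequency

def aliased_output(programmed: float) -> float:
    """Frequency actually emitted for a programmed frequency in [0, SYSCLK)."""
    return min(programmed, SYSCLK - programmed)

# Desired falling ramp 300 MHz -> 200 MHz, programmed as a rising ramp over the mirror band.
start, stop = mirror(300e6), mirror(200e6)   # 700 MHz -> 800 MHz
steps = 10                                   # hypothetical ramp resolution
programmed = [start + (stop - start) * k / steps for k in range(steps + 1)]
emitted = [aliased_output(f) for f in programmed]

assert all(a < b for a, b in zip(programmed, programmed[1:]))  # DRG ramp rises
assert all(a > b for a, b in zip(emitted, emitted[1:]))        # output frequency falls
print([round(f / 1e6, 1) for f in emitted])
```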

Voila, no discontinuity, right? Wrong. The DRG no longer causes a discontinuity, but now the switch from 300 MHz (f) to 700 MHz (1*GHz-f) causes one.

Problem

The AD9910 FAQs state: "When a new frequency is programmed into the DDS, the next phase will simply be incremental with respect to the last phase value in the phase accumulator, and therefore the output sinewave will be phase continuous."

Unfortunately, it turns out that the AD9910's phase accumulator effectively increments the phase for f < 500 MHz and decrements it for f >= 500 MHz (read as a signed 32-bit integer, the frequency tuning word is negative in the latter case). As a result, the DDS output is mirrored in time about the exact moment of the switch from f to 1*GHz-f. A sine wave is symmetric under time reversal about its maxima and minima, so the mirroring is harmless if the switch happens at one of those extremal points. At all other times, however, the sine wave is not symmetric, so the mirroring constitutes an effective phase jump of the DDS output. Here is an example of the Urukul output at the precise moment of switching from 200 MHz to 800 MHz:

{mirroring-of-sine-wave-at-switch-to-mirror-frequency.png}

You see the problem with this massive phase jump, right?
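The time-mirroring argument can be checked numerically. The sketch assumes a cosine output (the AD9910 default) and models the switch as reversing the direction of the phase: without the switch the output after t_sw would be cos(θ_sw + x), with it the output is cos(θ_sw - x). Their difference is 2·|sin θ_sw · sin x|, so it vanishes only when θ_sw is a multiple of π, i.e. at a maximum or minimum. The function name is hypothetical.

```python
import math

def mirrored_deviation(theta_sw: float, samples: int = 1000) -> float:
    """Worst-case |expected - actual| output over one period after the switch.

    expected: cos(theta_sw + x), i.e. if the phase had kept running forward;
    actual:   cos(theta_sw - x), i.e. the phase runs backwards after the switch;
    x = 2*pi*f*(t - t_sw) is sampled over one period.
    """
    worst = 0.0
    for k in range(samples):
        x = 2 * math.pi * k / samples
        worst = max(worst, abs(math.cos(theta_sw + x) - math.cos(theta_sw - x)))
    return worst

for theta in (0.0, math.pi / 4, math.pi / 2, math.pi):
    print(f"theta_sw = {theta:.3f} rad -> max deviation {mirrored_deviation(theta):.3f}")
```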

I would really appreciate some good ideas on how to solve it.

I sketch one potential fix below, but it's very cumbersome.

Potential fix

  • The AD9910 offers a deterministic way to reset the phase accumulator via its control function register 1 (CFR1), and ARTIQ offers a way to obtain the current RTIO time cursor via now_mu().
  • Consequently, we could determine the time t_old of an extremal point of the output sinewave and write it into the core device cache.
  • When we want to perform a switch from f_old to its mirror frequency f_new at time t_sw, we can load t_old from cache and calculate:
    turns_since_last_extremum = (f_old * (t_sw - t_old)) % 0.5
    phi_new = phi_old + 2 * (0.5 - turns_since_last_extremum)  # at the beginning of the experiment, we choose `phi_old = 0`
    t_new = t_sw + (0.5 - turns_since_last_extremum) / f_new
    self.urukul0_ch0.set(f_new, phi_new, A_old)  # perform the switch from `f` to `1*GHz-f`
    # `phi_new` ensures that the phase of the effective output is continuous

The above idea leads to some finite error in t_new, which will accumulate every time we switch between a frequency and its mirror frequency. Also, the above idea needs more development to keep track of frequency ramps. Seems like a bad idea overall.

See phase mode.

You should not use the default phase mode PHASE_MODE_CONTINUOUS.

There is already a phase tracking scheme in the AD9910 driver if you are not using the continuous mode (for single-tone at least).
When you use your negative FTW, it will compute the negative phase offset. Since AD9910 outputs cosine by default (CFR1[16]), it should always generate a continuous waveform.
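The driver's tracking expression is plain 32-bit modular arithmetic, which is why a negated FTW yields the negated phase offset. A host-side model, assuming that int32 products on the core device wrap around (emulated by `wrap32`) and using an exactly negated FTW (-ftw) for the mirror; the formula follows the `dt * ftw * sysclk_per_mu >> 16` line of `set_mu` in this thread, and the function names are hypothetical.

```python
def wrap32(x: int) -> int:
    """Reduce x to a signed 32-bit value, emulating int32 overflow on the core device (assumed)."""
    return (x + 2**31) % 2**32 - 2**31

def tracking_pow_offset(dt_mu: int, ftw: int, sysclk_per_mu: int = 1) -> int:
    """Phase offset in 16-bit POW units accumulated over dt_mu machine units.

    Models `dt * ftw * sysclk_per_mu >> 16` with int32 operands: the product is
    reduced to 32 bits, then arithmetically shifted (floor division by 2**16).
    """
    dt = wrap32(dt_mu)
    return wrap32(dt * ftw * sysclk_per_mu) >> 16

ftw = round(0.1 * 2**32)  # 100 MHz at an assumed 1 GHz SYS_CLK, sysclk_per_mu = 1
for dt_mu in (7, 12345, 3_000_000_000):
    up = tracking_pow_offset(dt_mu, ftw)     # regular FTW
    down = tracking_pow_offset(dt_mu, -ftw)  # exactly negated (mirror) FTW
    # The two offsets cancel modulo 2**16, up to one LSB from the floor in >> 16.
    print(dt_mu, up & 0xFFFF, down & 0xFFFF, (up + down) & 0xFFFF)
```

The sum comes out as 0 or 0xFFFF modulo 2**16 because `>> 16` rounds toward minus infinity for both signs.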

But it seems that auto-clearing the phase accumulator generates glitches after the I/O update when updating single-tone profiles, as you might have found out already.

Then I guess you do have to do it in PHASE_MODE_CONTINUOUS.

A few points to add to the sketch of potential fix:

  • You should clear the phase accumulator prior to generating anything, so you can calculate the accumulated phase at any point. Unset autoclear before you generate anything (glitch otherwise).
  • I will assume the accumulator is exactly 0 at the beginning (right as the old waveform is programmed). When you program the new waveform, find the phase accumulator value (phi_acc), then program -2*phi_acc as POW (with appropriate shifts). This makes the waveform start at a phase offset of approximately -phi_acc. The original post had already explained this idea.
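The -2*phi_acc rule can be sanity-checked with a host-side model. The sketch assumes the output phase in turns is acc/2**32 + POW/2**16 (ignoring the chip's internal phase truncation and any pipeline delay), a cosine output, and an FTW that is exactly negated at the switch; `mirror_pow` and `continuity_error` are hypothetical names. Because only the upper 16 bits of the accumulator can be compensated through the POW, a residual error of less than 2**-15 turns remains.

```python
import math

def total_phase_turns(acc: int, pow_: int) -> float:
    """Output phase in turns, modelled as acc/2**32 + POW/2**16 (assumption)."""
    return (acc / 2**32 + pow_ / 2**16) % 1.0

def mirror_pow(acc: int, pow_old: int) -> int:
    """16-bit POW to program together with the negated FTW.

    theta = (acc >> 16) + pow_old is the current phase in POW units (low accumulator
    bits truncated); the new offset pow_old - 2*theta flips the phase to about -theta,
    so the cosine output continues smoothly.
    """
    theta = (acc >> 16) + pow_old
    return (pow_old - 2 * theta) & 0xFFFF

def continuity_error(acc: int, pow_old: int, ftw: int, samples: int = 64) -> float:
    """Worst |expected - actual| cosine sample after switching from ftw to -ftw."""
    pow_new = mirror_pow(acc, pow_old)
    worst = 0.0
    for n in range(samples):
        expected = math.cos(2 * math.pi * total_phase_turns(acc + ftw * n, pow_old))
        actual = math.cos(2 * math.pi * total_phase_turns(acc - ftw * n, pow_new))
        worst = max(worst, abs(expected - actual))
    return worst

print(continuity_error(0x12345678, 1000, round(0.1 * 2**32)))
```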

    @occheung Thank you for your advice! I didn't understand "autoclear causes a glitch", so I accidentally reproduced your findings from yesterday. Next, I will try switching to the mirror frequency without the autoclear. Purely for documentation purposes:

    1. pow_ has a time resolution of oscillation_period/2**16, i.e. 0.15 picoseconds at f=100 MHz or 0.08 picoseconds at f=200 MHz. However, now_mu()-ref_time_mu only has a time resolution of 1 nanosecond, which is already 10% of one period at f=100 MHz or 20% of one period at f=200 MHz. For a good phase match before and after the switch to the mirror frequency, we need to add a constant p to pow_ that cannot be calculated from now_mu()-ref_time_mu.
    2. At the precise moment of the switch from (100*MHz, 0.0, 0.1) to (900*MHz, pow_, 0.1), i.e. at the rising edge of the io_update pulse, the AD9910 chip outputs bullshit for roughly 5 nanoseconds:

    {problematic-dds-output-at-switch-to-mirror-frequency}

    3. I scanned p (and therefore pow_) in the range (-2π, 2π), but the output always contained 5 nanoseconds of bullshit. See for yourself in this video.
    4. I introduced an additional delay d between the buffer write self.urukul0_ch0.write64(...) and the transfer to the active output registers self.urukul0_ch0.io_update.pulse_mu(8) and performed a fine 2-dimensional scan of d and p in the range (0, 40 ns) x (-2π, 2π), but the output always contained 5 nanoseconds of bullshit.
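The resolution figures in point 1 follow from period/2**16 and from the 1 ns machine unit; here is a quick host-side check (helper names hypothetical). Note that one POW LSB corresponds to a shorter time at higher frequency, since the period shrinks.

```python
def pow_time_resolution_ps(frequency_hz: float) -> float:
    """Time shift (ps) equivalent to one 16-bit POW LSB at the given frequency."""
    return 1e12 / frequency_hz / 2**16

def fraction_of_period(step_s: float, frequency_hz: float) -> float:
    """Fraction of one oscillation period spanned by a timing step."""
    return step_s * frequency_hz

for f in (100e6, 200e6):
    print(f"{f / 1e6:.0f} MHz: 1 POW LSB = {pow_time_resolution_ps(f):.3f} ps, "
          f"1 ns = {fraction_of_period(1e-9, f):.0%} of a period")
```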

    Run it yourself

    The code below:

    • Aligns the output phase of self.urukul0_ch0 deterministically to the edges of self.ttl0 via self.dds_set(100*MHz, 0.0, 0.1, PHASE_MODE_ABSOLUTE).
    • Switches to the mirror frequency via self.dds_set(900*MHz, 0.0, 0.1, PHASE_MODE_TRACKING, self.urukul0_ch0.t_sw, p, d), where p is the constant phase added to pow_ and d is the aforementioned delay before io_update.
    • Marks the switch to the mirror frequency via self.ttl0.off().
    • Repeats the above steps 840 times while:
      • scanning d over (0, 40 ns) in 40 steps (outer loop),
      • scanning p (and hence pow_) over (-2π, 2π) in 21 steps (inner loop),
      • taking 21 ms per iteration for a total duration < 20 seconds.

    To observe the output, trigger your scope on the falling edge of self.ttl0 and set it to normal trigger mode and 100 nanosecond time span. Also, make sure your scope is able to trigger every 21 ms or increase that duration.

    from artiq.language.environment import EnvExperiment
    from artiq.language.core import kernel, rpc, delay, delay_mu, now_mu, at_mu, parallel
    from artiq.language.units import ns, us, ms, s, Hz, MHz, V
    from artiq.language.types import TInt32, TInt64, TFloat, TStr, TBool, TTuple
    from artiq.coredevice.i2c import i2c_write_byte
    from artiq.coredevice.kasli_i2c import port_mapping
    from artiq.coredevice.ad9910 import _PHASE_MODE_DEFAULT, PHASE_MODE_CONTINUOUS, PHASE_MODE_ABSOLUTE, PHASE_MODE_TRACKING, _AD9910_REG_PROFILE0,\
                                        _AD9910_REG_RAMP_LIMIT, _AD9910_REG_RAMP_STEP, _AD9910_REG_RAMP_RATE
    from artiq.coredevice.urukul import DEFAULT_PROFILE
    from numpy import int32, uint32, int64, uint64
    
    # Maps Kasli EEM port indices that are visible on the PCB
    # to actual electrical port(?) indices that need to be passed to the FPGA.
    KASLI_I2C_BOARD_TO_PORT_MAPPING = [port%8 for port in port_mapping.values()]
    # for `artiq.coredevice.i2c.i2c_write_byte(busno, busaddr, data, ack=True)`
    # and `artiq.coredevice.i2c.i2c_read_byte(busno, busaddr)`
    DIO_SMA_BUS_NUMBER = 0
    DIO_SMA_BUS_ADDRESS = 0x7c # = 124 (decimal) or 01111100 (binary)
    
    # @rpc(flags={"async"})
    # def rpc_print(reg):
    #     for i, r in enumerate(reg):
    #         print(f"REG{i}: {r:64b}  |  decimal: {r}")
    #     print(f"bits: 3210987654321098765432109876543210987654321098765432109876543210 <-- LSB here")
    
    @rpc
    def print_binary(number, type_cast, nr_bits):
        bits = ""
        for i in range(nr_bits):
            bits = str(i % 10) + bits
        print("bits    :", bits)
        print("binary  :", f"{type_cast(number):{int(nr_bits)}b}")
    
    class DRGAmplitudeTest(EnvExperiment):
    
        def build(self):
            self.setattr_device("core") # artiq.coredevice.core.Core
            self.setattr_device("core_cache") # artiq.coredevice.cache.CoreCache
            device_db = self.get_device_db() # dict, DO NOT EDIT!
            self.n_kasli_socs = 1 + len(device_db["core"]["arguments"]["satellite_cpu_targets"])
            self.setattr_device("i2c_switch0") # artiq.coredevice.i2c.I2CSwitch
            self.setattr_device("ttl0") # artiq.coredevice.ttl.TTLInOut
            self.setattr_device("ttl1") # artiq.coredevice.ttl.TTLInOut
            self.setattr_device("ttl2") # artiq.coredevice.ttl.TTLInOut
            self.setattr_device("urukul0_cpld") # artiq.coredevice.urukul.CPLD
            self.setattr_device("urukul0_ch0") # artiq.coredevice.ad9910.AD9910
            self.urukul0_ch0.t_sw = int64(0)
    
        @kernel
        def init(self):
            r"""
            Should be called once after every reboot or power-cycle of the Kasli (SoC).
            """
            for i in range(self.n_kasli_socs):
                while not self.core.get_rtio_destination_status(i):
                    pass
            self.core.reset()
            self.core.break_realtime()
            self.i2c_switch0.set(channel = KASLI_I2C_BOARD_TO_PORT_MAPPING[0])
            delay(1*us)
            i2c_write_byte(
                busno   = DIO_SMA_BUS_NUMBER,
                busaddr = DIO_SMA_BUS_ADDRESS,
                data    = 0
            )
            delay(1*us)
            self.i2c_switch0.unset()
            self.core.break_realtime()
            for ttl in [self.ttl0, self.ttl1, self.ttl2]:
                ttl.output()
                delay(1*us)
                ttl.off()
                delay(1*us)
            self.urukul0_cpld.init()
            delay(1*us)
            self.urukul0_cpld.cfg_att_en_all(1)
            delay(1*us)
            self.urukul0_ch0.sw.off()
            delay(1*us)
            self.urukul0_ch0.init()
            delay(1*us)
            self.urukul0_ch0.set_phase_mode(PHASE_MODE_CONTINUOUS)
            delay(1*us)
            self.urukul0_ch0.set_att(0.0)
            delay(1*us)
            self.core.wait_until_mu(now_mu())
    
        @kernel
        def frequency_to_uint32(self, frequency: TFloat) -> TInt32:
            """
            Linearly map frequency ∈ [0*GHz, 1*GHz] to an unsigned 32-bit integer {0,1,..., 2**32-1}.
            Hacking is necessary because the ARTIQ compiler does *not* know unsigned integers.
    
            :param frequency: Must be in the interval [0*GHz, 1*GHz].
            """
            if frequency < 0*Hz:
                raise ValueError("Invalid AD9910 frequency!")
            elif frequency < self.urukul0_ch0.sysclk / 2:
                return self.urukul0_ch0.frequency_to_ftw(frequency)
            elif frequency <= self.urukul0_ch0.sysclk:
                return -1 - self.urukul0_ch0.frequency_to_ftw(self.urukul0_ch0.sysclk - frequency)
            else:
                raise ValueError("Invalid AD9910 frequency!")
            return int32(0) # prevents compiler crash
    
        @kernel
        def set_mu(self, ftw: TInt32, pow_: TInt32, asf: TInt32,
                   phase_mode: TInt32 = _PHASE_MODE_DEFAULT,
                   ref_time_mu: TInt64 = int64(-1),
                   profile: TInt32 = DEFAULT_PROFILE,
                   ram_destination: TInt32 = -1,
                   p: TInt32 = 0, d: TInt64 = 0) -> TInt32:
            if phase_mode == _PHASE_MODE_DEFAULT:
                phase_mode = self.urukul0_ch0.phase_mode
            # Align to coarse RTIO which aligns SYNC_CLK. I.e. clear fine TSC
            # This will not cause a collision or sequence error.
            at_mu(now_mu() & ~7)
            if phase_mode != PHASE_MODE_CONTINUOUS:
                # Auto-clear phase accumulator on IO_UPDATE.
                # This is active already for the next IO_UPDATE
                self.urukul0_ch0.set_cfr1(phase_autoclear=1)
                if phase_mode == PHASE_MODE_TRACKING and ref_time_mu < 0:
                    # set default fiducial time stamp
                    ref_time_mu = 0
                if ref_time_mu >= 0:
                    # 32 LSB are sufficient.
                    # Also no need to use IO_UPDATE time as this
                    # is equivalent to an output pipeline latency.
                    dt = int32(now_mu() - ref_time_mu)
                    pow_ += (dt * ftw * self.urukul0_ch0.sysclk_per_mu >> 16) + 13000 + round(p/10 * (1 << 16))
            self.urukul0_ch0.write64(_AD9910_REG_PROFILE0 + profile,
                                     (asf << 16) | (pow_ & 0xffff), ftw)
            delay_mu(int64(self.urukul0_ch0.sync_data.io_update_delay))
            delay_mu(d)
            self.urukul0_ch0.t_sw = now_mu()
            self.urukul0_ch0.io_update.pulse_mu(8)  # assumes 8 mu > t_SYN_CCLK
            at_mu(now_mu() & ~7)  # clear fine TSC again
            delay(90*ns)
            self.ttl0.off()
            delay(-90*ns)
            if phase_mode != PHASE_MODE_CONTINUOUS:
                self.urukul0_ch0.set_cfr1()
                # future IO_UPDATE will activate
            return pow_
        
        @kernel
        def dds_set(self, frequ: TFloat, turns: TFloat, amp: TFloat,
                    phase_mode: TInt32 = _PHASE_MODE_DEFAULT,
                    ref_time_mu: TInt64 = int64(-1),
                    p: TInt32 = 0, d: TInt64 = 0):
            # if phase_mode == PHASE_MODE_TRACKING:
            #     phase_mode = self.urukul0_ch0.phase_mode
            ftw = self.frequency_to_uint32(frequ)
            pow_ = self.urukul0_ch0.turns_to_pow(turns)
            asf = self.urukul0_ch0.amplitude_to_asf(amp)
            self.set_mu(ftw, pow_, asf, phase_mode, ref_time_mu, p=p, d=d)
            self.core_cache.put("urukul0_ch0", [ftw, pow_, asf])
        
        @rpc(flags={"async"})
        def print_async(self, d, p):
            print("delay =", d, "turns =", p/10)
    
        @kernel
        def run(self):
            self.init()
            self.core.reset()
            self.core.break_realtime()
            # ---------------------------------
            # t0 = now_mu()
            # self.urukul0_cpld.set_profile(0, 7)
            # print(now_mu() - t0)
            delay(100*ms)
            for d in range(0, 40, 1):
                for p in range(-10, 11, 1):
                    self.dds_set(100*MHz, 0.0, 0.1, PHASE_MODE_ABSOLUTE)
                    delay(20*us)
                    self.urukul0_ch0.sw.on()
                    delay(151*ns)
                    self.ttl0.on()
                    delay(-1.8*us)
                    self.dds_set(900*MHz, 0.0, 0.1, PHASE_MODE_TRACKING, self.urukul0_ch0.t_sw, p, d)
                    delay(1*us)
                    self.urukul0_ch0.sw.off()
                    self.print_async(d, p)
                    delay(20*ms)
                    self.core.wait_until_mu(now_mu())
                    delay(1*ms)

    occheung I thought I had figured everything out, but then I ran into now_mu() affecting the AD9910's output. I would be grateful for your opinion about artiq/issues/2776.

    Also, phase accumulation between two frequency changes only depends on the time difference between the two changes, right? So recording now_mu() right before each self.urukul0_ch0.io_update.pulse_mu(8) and taking the difference should yield the precise duration of phase accumulation, right?

    The arithmetic is making my head spin. I accumulate the phase inside self.urukul0_ch0.acc_pow = int64(0). When phase has been accumulated by a mirror frequency, I probably have to add +2*((~ftw)*dt_mu) instead of -2*(ftw*dt_mu), right? But how do I handle the case of underflows or overflows? Will those just work?
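On the overflow question: since the accumulator only ever adds the 32-bit FTW modulo 2**32, one formula covers both regular and mirror FTWs, and wraparound in any wider intermediate is harmless as long as the result is finally reduced modulo 2**32 (this assumes two's-complement wraparound rather than trapping). The host-side sketch, with hypothetical helper names, also shows that `~ftw` equals `-ftw - 1`, so a bitwise-NOT mirror FTW (as produced by `-1 - ...` in frequency_to_uint32 and by `~` in the mirror code) differs from the exact negation by one FTW LSB. Bookkeeping that treats `~ftw` as the exact negative of `ftw` therefore drifts by one LSB per SYS_CLK cycle, about 0.47 turns after 2e9 cycles (2 s at 1 GHz); this may be worth ruling out alongside the overflow hypothesis.

```python
MASK32 = 2**32 - 1

def advance(acc: int, ftw: int, cycles: int) -> int:
    """Accumulator value after `cycles` SYS_CLK cycles: (acc + ftw*cycles) mod 2**32.

    Works for both signs of the int32 FTW: a negative FTW is the same 32-bit word
    as 2**32 + ftw, so 'decrementing' needs no separate branch.
    """
    return (acc + ftw * cycles) & MASK32

def as_int32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed int32."""
    x &= MASK32
    return x - 2**32 if x >= 2**31 else x

ftw = round(0.1 * 2**32)      # 100 MHz at an assumed 1 GHz SYS_CLK
cycles = 2_000_000_000        # about 2 s at 1 GHz

# A negative int32 FTW is the unsigned word 2**32 - ftw.
assert advance(0, -ftw, 1) == 2**32 - ftw

# Bitwise NOT is not negation: ~ftw == -ftw - 1, i.e. one FTW LSB apart.
assert ~ftw == -ftw - 1
drift = advance(0, ~ftw, cycles) - advance(0, -ftw, cycles)
print(as_int32(drift) / 2**32, "turns of drift after", cycles, "cycles")
```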

    The following experiment mirrors the frequency of self.urukul0_ch0 thirteen times with random delays in between, completely *without* phase discontinuities. To observe the output, trigger your oscilloscope on the falling edge of self.ttl0, self.ttl1 or self.ttl2, and set it to normal trigger mode with a 100 ns time span.

    Beware that phase discontinuities *do* occur if one introduces a delay > 2 seconds (roughly) between any of the frequency mirrorings. The most likely cause is improper handling of integer overflows / underflows during multiplication.

    @occheung Do you have any advice how to handle the overflows / underflows during multiplication?

    Correction from later: The experiment has no phase discontinuities for round frequencies like 100 or 200 MHz. For the random frequency 238.3486 MHz, every single mirroring causes a phase discontinuity. The discontinuities become smaller if I change delay_mu(1) to delay_mu(0), and they almost disappear for delay_mu(2). Clearly, the required delay between at_mu(now_mu() & ~7) and t_io_update_mu = now_mu() + [...] depends on the frequency. I am close to giving up and doing falling frequency ramps the exact same way as falling amplitude ramps, i.e. with a 4 ns-long discontinuity in the ramped parameter.

    from artiq.language.environment import EnvExperiment
    from artiq.language.core import kernel, rpc, delay, delay_mu, now_mu, at_mu, parallel
    from artiq.language.units import ns, us, ms, s, Hz, MHz, V
    from artiq.language.types import TInt32, TInt64, TFloat, TStr, TBool, TList, TTuple
    from artiq.coredevice.i2c import i2c_write_byte
    from artiq.coredevice.kasli_i2c import port_mapping
    from artiq.coredevice.ad9910 import _PHASE_MODE_DEFAULT, PHASE_MODE_CONTINUOUS, PHASE_MODE_ABSOLUTE, PHASE_MODE_TRACKING, _AD9910_REG_PROFILE0
    from artiq.coredevice.urukul import DEFAULT_PROFILE
    from numpy import int32, uint32, int64, uint64
    
    # Maps Kasli EEM port indices that are visible on the PCB
    # to actual electrical port(?) indices that need to be passed to the FPGA.
    KASLI_I2C_BOARD_TO_PORT_MAPPING = [port%8 for port in port_mapping.values()]
    # for `artiq.coredevice.i2c.i2c_write_byte(busno, busaddr, data, ack=True)`
    # and `artiq.coredevice.i2c.i2c_read_byte(busno, busaddr)`
    DIO_SMA_BUS_NUMBER = 0
    DIO_SMA_BUS_ADDRESS = 0x7c # = 124 (decimal) or 01111100 (binary)
    
    @rpc
    def print_binary(number, type_cast=uint32, nr_bits=32, print_bits=False):
        print("binary  :", f"{type_cast(number):{int(nr_bits)}b}")
        if print_bits:
            bits = ""
            for i in range(nr_bits):
                bits = str(i % 10) + bits
            print("bits    :", bits)
    
    class DRGAmplitudeTest(EnvExperiment):
    
        def build(self):
            self.setattr_device("core") # artiq.coredevice.core.Core
            self.setattr_device("core_cache") # artiq.coredevice.cache.CoreCache
            device_db = self.get_device_db() # dict, DO NOT EDIT!
            self.n_kasli_socs = 1 + len(device_db["core"]["arguments"]["satellite_cpu_targets"])
            self.setattr_device("i2c_switch0") # artiq.coredevice.i2c.I2CSwitch
            self.setattr_device("ttl0") # artiq.coredevice.ttl.TTLInOut
            self.setattr_device("ttl1") # artiq.coredevice.ttl.TTLInOut
            self.setattr_device("ttl2") # artiq.coredevice.ttl.TTLInOut
            self.setattr_device("ttl3") # artiq.coredevice.ttl.TTLInOut
            self.setattr_device("urukul0_cpld") # artiq.coredevice.urukul.CPLD
            self.setattr_device("urukul0_ch0") # artiq.coredevice.ad9910.AD9910
            self.urukul0_ch0.ftw = int32(0)
            self.urukul0_ch0.pow = int32(0)
            self.urukul0_ch0.asf = int32(0)
            self.urukul0_ch0.t_acc_start_mu = int64(0)
            self.urukul0_ch0.acc_pow = int64(0)
    
        @kernel
        def init(self):
            r"""
            Should be called once after every reboot or power-cycle of the Kasli (SoC).
            """
            for i in range(self.n_kasli_socs):
                while not self.core.get_rtio_destination_status(i):
                    pass
            self.core.reset()
            self.core.break_realtime()
            self.i2c_switch0.set(channel = KASLI_I2C_BOARD_TO_PORT_MAPPING[0])
            i2c_write_byte(
                busno   = DIO_SMA_BUS_NUMBER,
                busaddr = DIO_SMA_BUS_ADDRESS,
                data    = 0
            )
            self.i2c_switch0.unset()
            self.core.break_realtime()
            for ttl in [self.ttl0, self.ttl1, self.ttl2, self.ttl3]:
                ttl.output()
                delay(1*us)
                ttl.off()
                delay(1*us)
            self.urukul0_cpld.init()
            self.urukul0_cpld.cfg_att_en_all(1)
            self.urukul0_ch0.sw.off()
            self.urukul0_ch0.init()
            self.urukul0_ch0.set_phase_mode(PHASE_MODE_CONTINUOUS)
            self.urukul0_ch0.set_att(0.0)
            self.core.wait_until_mu(now_mu())
    
        @kernel
        def frequency_to_uint32(self, frequency: TFloat) -> TInt32:
            """
            Linearly map frequency ∈ [0*GHz, 1*GHz] to an unsigned 32-bit integer {0,1,..., 2**32-1}.
            Hacking is necessary because the ARTIQ compiler does *not* know unsigned integers.
    
            :param frequency: Must be in the interval [0*GHz, 1*GHz].
            """
            if frequency < 0*Hz:
                raise ValueError("Invalid AD9910 frequency!")
            elif frequency < self.urukul0_ch0.sysclk / 2:
                return self.urukul0_ch0.frequency_to_ftw(frequency)
            elif frequency <= self.urukul0_ch0.sysclk:
                return -1 - self.urukul0_ch0.frequency_to_ftw(self.urukul0_ch0.sysclk - frequency)
            else:
                raise ValueError("Invalid AD9910 frequency!")
            return int32(0) # prevents compiler crash
    
        @kernel
        def set_mu(self, ftw: TInt32, pow_: TInt32, asf: TInt32,
                   phase_mode: TInt32 = _PHASE_MODE_DEFAULT,
                   ref_time_mu: TInt64 = int64(-1),
                   profile: TInt32 = DEFAULT_PROFILE) -> TInt32:
            if phase_mode == _PHASE_MODE_DEFAULT:
                phase_mode = self.urukul0_ch0.phase_mode
            # Align to coarse RTIO which aligns SYNC_CLK. I.e. clear fine TSC
            # This will not cause a collision or sequence error.
            at_mu(now_mu() & ~7)
            if phase_mode != PHASE_MODE_CONTINUOUS:
                # Auto-clear phase accumulator on IO_UPDATE.
                # This is active already for the next IO_UPDATE
                self.urukul0_ch0.set_cfr1(phase_autoclear=1)
                if phase_mode == PHASE_MODE_TRACKING and ref_time_mu < 0:
                    # set default fiducial time stamp
                    ref_time_mu = 0
                if ref_time_mu >= 0:
                    # 32 LSB are sufficient.
                    # Also no need to use IO_UPDATE time as this
                    # is equivalent to an output pipeline latency.
                    dt = int32(now_mu()) - int32(ref_time_mu)
                    pow_ += dt * ftw * self.urukul0_ch0.sysclk_per_mu >> 16
            self.urukul0_ch0.write64(_AD9910_REG_PROFILE0 + profile,
                                     (asf << 16) | (pow_ & 0xffff), ftw)
            delay_mu(int64(self.urukul0_ch0.sync_data.io_update_delay))
            t_io_update_mu = now_mu()
            self.urukul0_ch0.io_update.pulse_mu(8)  # assumes 8 mu > t_SYN_CCLK
            at_mu(now_mu() & ~7)  # clear fine TSC again
            if phase_mode != PHASE_MODE_CONTINUOUS:
                # phase accumulator has been reset
                self.urukul0_ch0.acc_pow = 0
                self.urukul0_ch0.set_cfr1()
                # future IO_UPDATE will activate
            else:
                # calculate phase-offset word in AD9910's phase accumulator at rising flank of io_update
                dt_mu = t_io_update_mu - self.urukul0_ch0.t_acc_start_mu
                if self.urukul0_ch0.ftw >= 0:
                    # regular frequency, so AD9910 has been incrementing phase accumulator
                    self.urukul0_ch0.acc_pow += self.urukul0_ch0.ftw * dt_mu * self.urukul0_ch0.sysclk_per_mu
                else:
                    # mirror frequency, so AD9910 has been decrementing phase accumulator
                    # note: ~ftw == -ftw - 1, so flipping all bits gives f + f_mirror = 1*GHz minus one FTW LSB (~0.23 Hz)
                    self.urukul0_ch0.acc_pow -= (~self.urukul0_ch0.ftw) * dt_mu * self.urukul0_ch0.sysclk_per_mu
            self.urukul0_ch0.t_acc_start_mu = t_io_update_mu
            self.urukul0_ch0.ftw = ftw
            self.urukul0_ch0.pow = pow_
            self.urukul0_ch0.asf = asf
            return pow_
    
        @kernel
        def mirror(self, debug_ttl):
            # align to coarse RTIO which aligns SYNC_CLK
            at_mu(now_mu() & ~7)
            # delay by 1 nanosecond, otherwise phase discontinuity
            # see also https://github.com/m-labs/artiq/issues/2776
            delay_mu(1)
            # pre-calculate RTIO timeline cursor at next rising flank of io_update
            T_write64_mu = 1248 # RTIO timeline cursor advancement per register-write
            io_update_delay_mu = int64(self.urukul0_ch0.sync_data.io_update_delay)
            t_io_update_mu = now_mu() + T_write64_mu + io_update_delay_mu
            # flipping all bits: ftw + ~ftw == 2**32 - 1, i.e. f + f_mirror = 1*GHz minus one FTW LSB (~0.23 Hz)
            ftw_mirror = ~self.urukul0_ch0.ftw
            # pre-calculate phase-offset word of AD9910's phase accumulator at next rising flank of io_update
            dt_mu = t_io_update_mu - self.urukul0_ch0.t_acc_start_mu
            if self.urukul0_ch0.ftw >= 0:
                # regular frequency, so AD9910 has been incrementing phase accumulator
                self.urukul0_ch0.acc_pow += self.urukul0_ch0.ftw * dt_mu * self.urukul0_ch0.sysclk_per_mu
            else:
                # mirror frequency, so AD9910 has been decrementing phase accumulator
                self.urukul0_ch0.acc_pow -= ftw_mirror * dt_mu * self.urukul0_ch0.sysclk_per_mu
            # pre-calculate phase-offset word of AD9910's total output phase at next rising flank of io_update
            pow_io_update_mu = (self.urukul0_ch0.acc_pow >> 16) + self.urukul0_ch0.pow
            # mirror phase-offset word around a multiple of 2π
            self.urukul0_ch0.pow -= 2*pow_io_update_mu
            # write mirror frequency to single-tone register
            self.urukul0_ch0.ftw = ftw_mirror
            self.urukul0_ch0.write64(_AD9910_REG_PROFILE0 + 7,
                                     (self.urukul0_ch0.asf << 16) | (self.urukul0_ch0.pow & 0xffff), self.urukul0_ch0.ftw)
            delay_mu(io_update_delay_mu)
            # record new switching time
            self.urukul0_ch0.t_acc_start_mu = now_mu()
            # transfer mirror frequency to active output register
            self.urukul0_ch0.io_update.pulse_mu(8)
            delay_mu(84)
            debug_ttl.off()
            delay_mu(-84)
            # verify that we pre-calculated the accumulated phase correctly
            if self.urukul0_ch0.t_acc_start_mu != t_io_update_mu:
                raise ValueError("You pre-calculated the RTIO time cursor wrongly, "
                                 "so you caused a phase discontinuity. Check on scope!")
    
        @kernel
        def dds_set(self, frequ: TFloat, turns: TFloat, amp: TFloat,
                    phase_mode: TInt32 = _PHASE_MODE_DEFAULT,
                    ref_time_mu: TInt64 = int64(-1)):
            self.set_mu(self.frequency_to_uint32(frequ),
                        self.urukul0_ch0.turns_to_pow(turns),
                        self.urukul0_ch0.amplitude_to_asf(amp),
                        phase_mode, ref_time_mu)
    
        @rpc(flags={"async"})
        def print_async(self, d):
            print("d =", d)
    
        @kernel
        def run(self):
            self.init()
            self.core.reset()
            self.core.break_realtime()
            for d in range(-20, 20+1, 1):
                for p in range(1):
                    delay(50*ms)
                    self.dds_set(100*MHz, 0.0, 0.1, PHASE_MODE_ABSOLUTE)
                    delay(20*us)
                    self.urukul0_ch0.sw.on()
                    delay(151*ns)
                    self.ttl0.on()
                    self.ttl1.on()
                    self.ttl2.on()
                    delay(-1*us + d*ns)
                    self.mirror(self.ttl0)
                    delay_mu(434+8-5*d)
                    self.mirror(self.ttl3)
                    delay_mu(4321+2*d)
                    self.mirror(self.ttl3)
                    delay_mu(459+4*d)
                    self.mirror(self.ttl3)
                    delay_mu(3456+11*d)
                    self.mirror(self.ttl3)
                    # delay(2*s)
                    delay_mu(2349+1*d)
                    self.mirror(self.ttl3)
                    delay_mu(2541-9*d)
                    self.mirror(self.ttl3)
                    delay_mu(984-4*d)
                    self.mirror(self.ttl1)
                    delay_mu(12345+3*d)
                    self.mirror(self.ttl3)
                    delay_mu(491+7*d)
                    self.mirror(self.ttl3)
                    delay_mu(42459-8*d)
                    self.mirror(self.ttl3)
                    delay_mu(987+31*d)
                    self.mirror(self.ttl2)
                    delay_mu(38132+d)
                    self.mirror(self.ttl3)
                    delay(1*us)
                    self.urukul0_ch0.sw.off()
                    delay(10*ms)
                    self.print_async(d)
                    # print_binary(spow, uint64, 64, print_bits=False)
                    # print_binary(0x7fffffff, print_bits=True)
                    # print_binary(self.frequency_to_uint32(300*MHz), print_bits=False)
                    self.core.wait_until_mu(now_mu())

    Nothing much to add about the specific use case, as I haven't looked into the DRG myself, but in general:

    Clearly, the required delay between at_mu(now_mu() & ~7) and t_io_update_mu = now_mu() + [...] depends on the frequency.

    That sounds like two separate problems. Assuming that your DDS is properly set up for phase control (clocked phase-coherently with the FPGA and sync delay/IO_UPDATE alignment tuned correctly), updates should be applied completely deterministically within the AD9910 state machine (PROFILE pin switching is another determinacy issue, but you are not doing that). Thus, it sounds like this frequency-dependent delay may in fact be an incorrect time-offset in some calculation (with the different alignments "lining up" different period multiples for each frequency, rather than the true zero that would work for all frequencies).
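To make this concrete: if the timestamp used for the phase bookkeeping is off by a constant offset, the accumulated phase is misjudged by f·offset turns, and the -2·phi correction doubles that error. A sketch under that assumption, with a purely hypothetical 5 ns offset: the residual vanishes for frequencies where the offset is a whole number of half periods (100 and 200 MHz here) but not for 238.3486 MHz, which would be consistent with, though not proof of, the earlier observation.

```python
def mirror_phase_error_turns(frequency_hz: float, offset_s: float) -> float:
    """Residual phase error, in turns within [-0.5, 0.5), of a mirror switch whose
    phase bookkeeping uses a timestamp that is off by offset_s.

    The accumulated phase is misjudged by frequency_hz * offset_s turns and the
    -2*phi correction doubles it, so the residual is 2*f*offset modulo one turn.
    """
    err = (2.0 * frequency_hz * offset_s) % 1.0
    return err - 1.0 if err >= 0.5 else err

HYPOTHETICAL_OFFSET_S = 5e-9  # illustrative constant timing error, not a measured value
for f in (100e6, 200e6, 238.3486e6):
    print(f"{f / 1e6:g} MHz: residual {mirror_phase_error_turns(f, HYPOTHETICAL_OFFSET_S):+.3f} turns")
```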

    Beware that phase discontinuities do occur if one introduces a delay > 2 seconds (roughly) between any of the frequency mirrorings. The most likely cause is improper handling of integer overflows / underflows during multiplication.

    Yep, roughly 2 seconds sounds suspiciously like 2^31 nanoseconds (about 2.15 s), so if I had to bet, I'd say that an intermediate value is incorrectly using 32-bit precision. Sign bits can be a bit tricky to get right, but in general, the multiplication easily accessible from ARTIQ Python (cast to int64 first to get 32 bit x 32 bit -> 64 bit) should be enough to calculate all the phases given the 32-bit width of the DDS registers.
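A quick arithmetic check of the 2^31 ns hypothesis, plus one way such a bug can show up. This is a plain-Python sketch, not ARTIQ kernel code; it assumes 1 RTIO mu = 1 ns = 1 SYSCLK cycle (1 GHz SYSCLK), and all helper names are hypothetical. The accumulated phase depends only on the low 32 bits of `ftw * dt`, so exact integer arithmetic is safe (even if `dt` itself wraps, as long as the multiply also wraps modulo 2^32), whereas routing the product through a float, shown here purely as an assumed failure mode and not as a diagnosis of the experiment code in this thread, corrupts the low bits once the product exceeds 2^53.

```python
# Plain-Python sketch, not ARTIQ kernel code.
# Assumption: 1 RTIO machine unit == 1 ns == 1 SYSCLK cycle (1 GHz SYSCLK).
# Helper names are hypothetical.

ACC_MASK = (1 << 32) - 1  # the AD9910 phase accumulator is 32 bits wide


def wrap_int32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed two's-complement int32."""
    x &= ACC_MASK
    return x - (1 << 32) if x >= (1 << 31) else x


def accumulated_phase(ftw: int, dt_mu: int) -> int:
    """Accumulator advance after dt_mu SYSCLK cycles at tuning word ftw, exact mod 2**32."""
    return (ftw * dt_mu) & ACC_MASK


def accumulated_phase_via_float(ftw: int, dt_mu: int) -> int:
    """Same quantity routed through a float product (an assumed failure mode)."""
    return int(float(ftw) * float(dt_mu)) & ACC_MASK


# A signed 32-bit time difference first overflows at 2**31 ns.
print(2**31 / 1e9)        # that threshold in seconds
print(wrap_int32(2**31))  # dt = 2**31 ns reinterpreted as int32 is negative

ftw = 0x4CCCCCCD          # about 300 MHz at 1 GHz SYSCLK
dt_mu = 3_000_000_001     # about 3 s (odd, so the exact product is odd)

# Wrapping dt alone is harmless if the multiply is also done modulo 2**32 ...
print(accumulated_phase(ftw, wrap_int32(dt_mu)) == accumulated_phase(ftw, dt_mu))
# ... but a float product above 2**53 loses the low bits of the accumulator.
print(accumulated_phase(ftw, dt_mu) == accumulated_phase_via_float(ftw, dt_mu))
```

In a kernel, the exact path corresponds to the int64 product suggested above, masked to the 32-bit accumulator width.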

      dpn Thank you, that helps. Do you see a mistake in my calculation of dt_mu below? Or another logic error?

      • In device_db.py, Urukul 0 is set to 1 GHz SYS_CLK and 250 MHz SYNC_CLK.
      • I verify that self.urukul0_ch0.sync_data.io_update_delay == self.urukul0_ch0.tune_io_update_delay().
      • I schedule self.urukul0_ch0.io_update.pulse_mu(8) only at coarse RTIO time stamps because only those have a constant delay to the SYS_CLK cycle where the new FTW comes into effect. I also store the time stamp now_mu() right before each self.urukul0_ch0.io_update.pulse_mu(8) like so:
        # align to coarse RTIO which aligns SYNC_CLK
        at_mu(now_mu() & ~7)
        delay_mu(int64(self.urukul0_ch0.sync_data.io_update_delay))
        # store time stamp
        self.urukul0_ch0.t_acc_start_mu = now_mu()
        # transfer new FTW to active output register
        self.urukul0_ch0.io_update.pulse_mu(8)
      • At the next frequency change, the duration dt_mu of phase accumulation only depends on the difference of the two time stamps:
        # align to coarse RTIO which aligns SYNC_CLK
        at_mu(now_mu() & ~7)
        delay_mu(int64(self.urukul0_ch0.sync_data.io_update_delay))
        # calculate duration of phase accumulation
        now = now_mu()
        dt_mu = now - self.urukul0_ch0.t_acc_start_mu
        # store new time stamp
        self.urukul0_ch0.t_acc_start_mu = now
        # transfer mirror frequency 1 to active output register
        self.urukul0_ch0.io_update.pulse_mu(8)
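The time-stamp bookkeeping in the two snippets above can be checked off-target with a plain-Python model. This is a sketch under assumptions: `io_update_timestamp` and `PhaseClock` are hypothetical stand-ins for the kernel code, `io_update_delay` is the tuned value from sync_data, and 1 mu = 1 ns = 1 SYSCLK cycle. Because both time stamps are snapped to the same 8 ns grid and shifted by the same constant delay, the delay cancels and dt_mu is always a multiple of 8, i.e. a whole number of 250 MHz SYNC_CLK cycles.

```python
# Plain-Python model of the kernel time-stamp bookkeeping (hypothetical names).
# Assumption: 1 RTIO mu == 1 ns == 1 SYSCLK cycle.

COARSE_MASK = ~7  # 8 ns coarse RTIO grid (a 4 ns grid would already match SYNC_CLK)


def io_update_timestamp(now_mu: int, io_update_delay: int) -> int:
    """Mimic at_mu(now_mu() & ~7); delay_mu(io_update_delay); return now_mu()."""
    return (now_mu & COARSE_MASK) + io_update_delay


class PhaseClock:
    """Host-side mirror of the t_acc_start_mu bookkeeping."""

    def __init__(self, io_update_delay: int):
        self.io_update_delay = io_update_delay
        self.t_acc_start_mu = None

    def start(self, now_mu: int) -> None:
        """First IO_UPDATE: only store the aligned time stamp."""
        self.t_acc_start_mu = io_update_timestamp(now_mu, self.io_update_delay)

    def next_update(self, now_mu: int) -> int:
        """Return dt_mu since the previous IO_UPDATE and store the new time stamp."""
        now = io_update_timestamp(now_mu, self.io_update_delay)
        dt_mu = now - self.t_acc_start_mu
        self.t_acc_start_mu = now
        return dt_mu


clock = PhaseClock(io_update_delay=0)
clock.start(1_000_003)
print(clock.next_update(1_002_349))  # dt_mu between two grid-aligned updates
```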

        dtsevas That sounds sensible. (4 mu alignment would be sufficient instead of 8, but of course this doesn't hurt, and there isn't really a reason to go for the finer one given the event dispatch limitations.)

        Just to be sure, I'd also check the output of the sync-delay auto-tuning routine, as in my experience this is a good first test for any clocking issues. (I'd expect to see a window width of 2, or at least 1.)

          dpn By default, my device_db.py contains

                  "sync_delay_seed": "eeprom_urukul0:68",
                  "io_update_delay": "eeprom_urukul0:68"

          and the experiment

                  print("IO_UPDATE | measured optimal delay:", self.urukul0_ch0.tune_io_update_delay(),
                        " | actually used:", self.urukul0_ch0.sync_data.io_update_delay)
                  delay(20*ms)
                  sync_in_delay, window_size = self.urukul0_ch0.tune_sync_delay()
                  print("SYNC_IN   | measured optimal delay:", sync_in_delay,
                        "| actually used:", self.urukul0_ch0.sync_data.sync_delay_seed,
                        "\n              measured window size:", window_size)

          outputs:

          IO_UPDATE | measured optimal delay: 0  | actually used: 0
          SYNC_IN   | measured optimal delay: 15 | actually used: -1 
                        measured window size: 5

          There are some variations over time:

          • After power-on, self.urukul0_ch0.tune_io_update_delay() returned 3 in the first few experiment runs and has been returning 0 ever since.
          • self.urukul0_ch0.tune_sync_delay() has always returned either 2 or 5 for the window size.

          Variations over time are bad, right?

          Also, what does self.urukul0_ch0.sync_data.sync_delay_seed == -1 mean?

          Luckily, I can offer some good news: My time calculations were correct and the DDS parameter values are updated with deterministic timing. My phase tracking was still wrong because self.urukul0_ch0.set_cfr1(phase_autoclear=1) and self.urukul0_ch0.io_update.pulse_mu(8) clear the phase accumulator a time dt *before* the new DDS parameter values come into effect. Possible solutions:

          1. In PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING, instead of self.urukul0_ch0.acc_pow = 0, I could try self.urukul0_ch0.acc_pow = round(ftw * dt). I don't know whether the required dt would always be constant, or whether it would be the same for all AD9910 chips. But it seems that dt = 4*ns. If that is true, it might mean that the phase accumulator gets cleared when the io_update pulse is registered by SYNC_CLK's rising edge, whereas the new DDS parameter values come into effect one SYNC_CLK cycle (4 ns) later.
          2. I could reset the phase accumulator via self.urukul0_ch0.set_mu(0, 0, 0, PHASE_MODE_ABSOLUTE) and afterwards use *only* PHASE_MODE_CONTINUOUS, because PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING reset the phase accumulator and thereby screw up the phase tracking.
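Option 2 amounts to keeping a software copy of the 32-bit accumulator. A minimal host-side sketch, assuming the accumulator is cleared exactly once (with FTW = 0, as in set_mu(0, 0, 0, PHASE_MODE_ABSOLUTE)) and every later update uses PHASE_MODE_CONTINUOUS; `AccumulatorTracker` and its methods are hypothetical, and 1 mu = 1 SYSCLK cycle is assumed. The POW needed for a target output phase is then the target minus the accumulator's current top 16 bits. Masking the FTW also accepts the negative int32 values returned by frequency_to_uint32() above 500 MHz.

```python
# Hypothetical software model of the AD9910 phase accumulator in continuous mode.
# Assumptions: one clear at t_clear_mu with FTW = 0; 1 mu == 1 SYSCLK cycle.

ACC_MASK = (1 << 32) - 1


class AccumulatorTracker:
    def __init__(self, t_clear_mu: int):
        self.acc = 0              # accumulator value at self.t_mu
        self.t_mu = t_clear_mu    # time stamp of the last IO_UPDATE
        self.ftw = 0              # FTW active since self.t_mu

    def acc_at(self, t_mu: int) -> int:
        """Accumulator value at t_mu, assuming no IO_UPDATE in between."""
        return (self.acc + self.ftw * (t_mu - self.t_mu)) & ACC_MASK

    def set_ftw(self, t_mu: int, ftw: int) -> None:
        """Record an IO_UPDATE at t_mu that switches to a new FTW (no auto-clear)."""
        self.acc = self.acc_at(t_mu)
        self.t_mu = t_mu
        self.ftw = ftw & ACC_MASK  # also maps negative int32 "unsigned" FTWs

    def pow_for(self, t_mu: int, target_phase_turns: float) -> int:
        """16-bit POW so that accumulator + POW gives the target phase at t_mu."""
        target = round(target_phase_turns * (1 << 16)) & 0xFFFF
        return (target - (self.acc_at(t_mu) >> 16)) & 0xFFFF


trk = AccumulatorTracker(t_clear_mu=0)
trk.set_ftw(1000, 0x4CCCCCCD)   # about 300 MHz from t = 1000 ns
trk.set_ftw(3000, 0x33333333)   # about 200 MHz from t = 3000 ns
print(trk.acc_at(3000), trk.pow_for(3000, 0.25))
```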

          Seeds < 0 mean that auto-tuning during init() is disabled. (I haven't used the EEPROM sync_data yet, but I assume it behaves the same as when specified in the device DB.) With that, the multi-chip sync functionality should be disabled altogether. This leads to a non-deterministic relationship of the 250 MHz DDS SYNC_CLK to the 125 MHz FPGA clock (due to the intermediary 1 GHz SYSCLK stage). You'll want to enable that for the timing relationships to be stable across reboots.
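A toy model of why that relationship can change across reboots. SYNC_CLK is SYSCLK/4; assuming the divide-by-4 counter can come up in any of its four states when it is not synchronized, SYNC_CLK rising edges sit at one of four offsets (0 to 3 ns) relative to the coarse RTIO grid, so the latency from a grid-aligned IO_UPDATE to the SYNC_CLK edge that registers it differs between boots. The function name is hypothetical, 1 mu = 1 ns is assumed, and setup/hold effects are ignored.

```python
# Toy model: unsynchronized SYSCLK/4 divider (hypothetical names, 1 mu == 1 ns).

SYNC_CLK_PERIOD_NS = 4  # 1 GHz SYSCLK / 4 = 250 MHz SYNC_CLK


def next_sync_edge(t_ns: int, divider_phase_ns: int) -> int:
    """First SYNC_CLK rising edge at or after t_ns, with edges at divider_phase_ns + 4*k."""
    return t_ns + (divider_phase_ns - t_ns) % SYNC_CLK_PERIOD_NS


t_io_update = 8 * 1000  # an IO_UPDATE on the 8 ns coarse RTIO grid
for phase in range(SYNC_CLK_PERIOD_NS):
    latency = next_sync_edge(t_io_update, phase) - t_io_update
    print(f"divider phase {phase} ns -> registered {latency} ns after IO_UPDATE")
```

With multi-chip sync enabled, only one of these offsets should occur, which is what makes the timing reproducible.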

          I haven't checked the behaviour of the auto-tune function in anger recently (we have a manual experiment to plot the incidence of errors across all possible delays for debugging clock issues), but I don't think I've ever seen a width of 5. As long as it is happy at all, the clocking should be fine, though.

          Regarding the autoclear timing, I don't think "a fraction of a nanosecond" can be quite right, since the DDS core only runs at 1 GHz (so 1 ns). Now that you mention it, I do remember that we were seeing a small glitch as well back in 2020 during development of phase-coherent SUServo, which we couldn't quite get rid of (within a day or two of work). In the end, our preliminary understanding was that, referencing the timing diagram in the manual (fig. 49), the accumulator clear might already happen at A, while the new POW is loaded a cycle later at B. It might actually only be two or three SYSCLK cycles, but we didn't follow this up further, as we just switched the SUServo implementation not to clear after the first time (as you suggest in 2), rather just keeping track of the evolution of the hardware accumulator in the gateware driver.

          Just to be explicit, I think your calculation is probably correct and will give the correct phase relationship after the glitch has settled, but I am not sure there is a way to get rid of the glitch.

            dpn I came to the same conclusion after some experimenting. I think it's exactly one SYNC_CLK cycle (= 4 ns) and it happens exactly in the order you described, i.e. the phase accumulator is cleared at A and the new parameter values become effective at B in Figure 49 ("I/O_UPDATE Transferring Data from I/O Buffer to Active Registers") of the AD9910 manual, which you referenced. This can be compensated by the new POW, though, so no problems here. If the glitch cannot be removed, we just manually keep track of the phase accumulator's value across the entire experiment via self.urukul0_ch0.acc_pow = (old_ftw - new_ftw) * 4.

              Can it be compensated, though? What we saw was a glitch where the output went to zero absolute phase (cosine output -> max voltage) for a SYNC_CLK cycle and then resumed with the new parameters. Even with the correct POW, the glitch would still be present. Don't take this as gospel, though; as I said, there was still an alternative solution available, so we stopped experimenting fairly quickly.

                dtsevas Correction to myself: The correct starting value of the accumulator after an auto-clear is self.urukul0_ch0.acc_pow = -new_ftw*4.
                This is *wrong*: self.urukul0_ch0.acc_pow = (old_ftw - new_ftw) * 4.
                Surprisingly, it seems that no phase is accumulated within the one SYNC_CLK cycle that starts with registration of the io_pulse (point A in the aforementioned Figure) and ends with activation of the new register values (point B in the aforementioned Figure), no matter the old value of the active FTW.
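The two candidate formulas correspond to two models of what happens during the one SYNC_CLK cycle (4 SYSCLK cycles) between A and B. A cycle-by-cycle sketch, assuming the tracking convention counts accumulation from the IO_UPDATE time stamp at A and one SYSCLK cycle per step; `simulate`, `tracked` and the `old_ftw_runs_until_b` flag are hypothetical. If the old FTW kept running after the clear, the offset would be (old_ftw - new_ftw) * 4; if, as observed, the accumulator stays at zero until B, the offset is -new_ftw * 4 regardless of old_ftw.

```python
# Hypothetical cycle-by-cycle model of the auto-clear window between A and B.
# Assumption: accumulation is counted from point A; one step == one SYSCLK cycle.

ACC_MASK = (1 << 32) - 1
SYSCLK_PER_SYNC_CLK = 4  # 1 GHz SYSCLK, 250 MHz SYNC_CLK


def simulate(old_ftw: int, new_ftw: int, n_cycles: int, old_ftw_runs_until_b: bool) -> int:
    """Accumulator n_cycles after A: cleared at A, new FTW active from B (cycle 4)."""
    acc = 0  # auto-clear takes effect at A
    for cycle in range(n_cycles):
        if cycle < SYSCLK_PER_SYNC_CLK:
            step = old_ftw if old_ftw_runs_until_b else 0  # A..B window
        else:
            step = new_ftw
        acc = (acc + step) & ACC_MASK
    return acc


def tracked(new_ftw: int, offset: int, n_cycles: int) -> int:
    """Software model: starting offset plus new_ftw accumulated since A."""
    return (offset + new_ftw * n_cycles) & ACC_MASK


old, new, n = 0x4CCCCCCD, 0x33333333, 1000
print(simulate(old, new, n, False) == tracked(new, -new * 4, n))       # stalled window
print(simulate(old, new, n, True) == tracked(new, (old - new) * 4, n))  # old FTW keeps running
```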

                dpn I just experimented a bit with the accumulator auto-clear and the glitch was always there, you are right. I will do the same as you: Reset the accumulator once before the important part of the experiment starts and then keep track of the accumulator's value throughout the experiment.

                Do you see any value in PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING? I am thinking about replacing them by a new function called ad9910.clear_accumulator(self) and calling that function once in ad9910.init(). I want to submit a pull request to github.com/m-labs/artiq sometime soon.

                  @dpn I want to synchronize all AD9910 chips to the Kasli SoC's clock and to align io_update to the coarse RTIO grid (8 ns spacing). I could write the outputs of ad9910.tune_sync_delay() and ad9910.tune_io_update_delay() manually into device_db.py. Do you know of any other way? How does "io_update_delay": "eeprom_urukul0:64" work, for instance?

                    dtsevas Or maybe I should keep PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING and make them work without clearing the accumulator?