AD9910: Mirror frequencies invert sign of phase accumulation

occheung · 11 Jun

But it seems that auto-clearing the phase accumulator generates glitches at the I/O update when updating single-tone profiles, as you might have found out already.

Then I guess you do have to do it in PHASE_MODE_CONTINUOUS.

occheung · 12 Jun

A few points to add to the sketch of a potential fix:

You should clear the phase accumulator prior to generating anything, so you can calculate the accumulated phase at any point. Unset autoclear before you generate anything (glitch otherwise).
I will assume the accumulator is exactly 0 at the beginning (right as the old waveform is programmed). When you program the new waveform, find the phase accumulator value (phi_acc), then program -2*phi_acc as POW (with appropriate shifts). This makes the waveform start at a phase offset of approximately -phi_acc. The original post had already explained this idea.
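The POW rule in the second point can be checked with plain integer arithmetic. The sketch below is a host-side Python model, not ARTIQ kernel code. It assumes a 32-bit phase accumulator, a 16-bit POW added to the accumulator's top 16 bits, cosine output (the AD9910 default), and the exact two's-complement negation -ftw as the mirror tuning word. The helper names `mirror_pow` and `output_phase` are hypothetical. With a nonzero starting POW, the same idea becomes pow_new = -2*(acc >> 16) - pow_old.

```python
MASK32 = (1 << 32) - 1


def mirror_pow(acc: int, pow_: int) -> int:
    """POW to program together with the mirror FTW (hypothetical helper).

    The output phase at the switch is theta = acc + (pow_ << 16).
    pow_new = -2 * (acc >> 16) - pow_ makes the new phase approximately
    -theta (exactly -theta if the 16 LSBs of acc are zero).
    """
    return (-2 * (acc >> 16) - pow_) & 0xFFFF


def output_phase(acc: int, pow_: int) -> int:
    """32-bit output phase word: accumulator plus POW in the upper 16 bits."""
    return (acc + (pow_ << 16)) & MASK32


# Example: 100 MHz at 1 GHz SYSCLK, switching to the exact mirror FTW.
ftw = round(0.1 * 2**32)
ftw_mirror = (-ftw) & MASK32
acc0 = 0x3A5B0000          # accumulator at the switch (16 LSBs zero here)
pow0 = 0x1234
pow1 = mirror_pow(acc0, pow0)

for n in range(5):
    acc_cont = (acc0 + ftw * n) & MASK32          # if we had not switched
    acc_mirr = (acc0 + ftw_mirror * n) & MASK32   # after the switch
    # The mirrored phase is the negation of the continued phase,
    # so cos() of both phase words gives identical samples.
    assert (output_phase(acc_cont, pow0) + output_phase(acc_mirr, pow1)) & MASK32 == 0
```

If the 16 LSBs of the accumulator are not zero, the two phases differ by 2*(acc & 0xFFFF) accumulator LSBs, i.e. by less than two POW LSBs, which is where the "approximately" comes from.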

dtsevas · 12 Jun

@occheung Thank you for your advice! I didn't understand "autoclear causes a glitch", so I accidentally reproduced your findings from yesterday. Next, I will try switching to the mirror frequency without the autoclear. Purely for documentation purposes:

pow_ has a time resolution of oscillation_period/2**16, i.e. 0.15 picoseconds at f=100 MHz or 0.076 picoseconds at f=200 MHz. However, now_mu()-ref_time_mu only has a time resolution of 1 nanosecond, which is already 10% of one period at f=100 MHz or 20% of one period at f=200 MHz. For a good phase match before and after the switch to the mirror frequency, we therefore need to add a constant p to pow_ that cannot be calculated from now_mu()-ref_time_mu.
At the precise moment of the switch from (100*MHz, 0.0, 0.1) to (900*MHz, pow_, 0.1), i.e. at the rising edge of the io_update pulse, the AD9910 chip outputs bullshit for roughly 5 nanoseconds:

[Image: problematic DDS output at the switch to the mirror frequency]

I scanned p (and therefore pow_) in the range (-2π, 2π), but the output always contained 5 nanoseconds of bullshit. See for yourself in this video.
I introduced an additional delay d between the buffer write self.urukul0.ch0.write64(...) and the transfer to the active output registers self.urukul0_ch0.io_update.pulse_mu(8) and performed a fine 2-dimensional scan of d and p in the range (0, 40 ns) x (-2π, 2π), but the output always contained 5 nanoseconds of bullshit.
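The resolution figures in the first point above come from two one-line formulas: one POW LSB shifts the output by period/2**16, and a 1 ns timestamp step spans f × 1 ns of a period. A quick Python check (the helper names are hypothetical):

```python
from fractions import Fraction

POW_BITS = 16          # AD9910 phase-offset word width
NS_PER_S = 10**9


def pow_time_resolution_ps(f_hz: int) -> float:
    """Time shift equivalent to one POW LSB: period / 2**16, in picoseconds."""
    return 1e12 / (f_hz * 2**POW_BITS)


def ns_step_fraction_of_period(f_hz: int) -> Fraction:
    """Fraction of one output period spanned by a 1 ns timestamp step."""
    return Fraction(f_hz, NS_PER_S)


def pow_lsb_per_ns(f_hz: int) -> Fraction:
    """Number of POW LSBs that a 1 ns timing error corresponds to."""
    return Fraction(f_hz * 2**POW_BITS, NS_PER_S)


for f_hz in (100_000_000, 200_000_000):
    print(f"{f_hz / 1e6:.0f} MHz: {pow_time_resolution_ps(f_hz):.4f} ps per POW LSB, "
          f"1 ns = {float(ns_step_fraction_of_period(f_hz)):.0%} of a period, "
          f"{float(pow_lsb_per_ns(f_hz)):.1f} POW LSBs")
```

One POW LSB is about 0.15 ps at 100 MHz and about 0.076 ps at 200 MHz (finer at the higher frequency), while a 1 ns step corresponds to about 6554 and 13107 POW LSBs respectively.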

Run it yourself

The code below:

Aligns the output phase of self.urukul0_ch0 deterministically to the edges of self.ttl0 via self.dds_set(100*MHz, 0.0, 0.1, PHASE_MODE_ABSOLUTE).
Switches to the mirror frequency via self.dds_set(900*MHz, 0.0, 0.1, PHASE_MODE_TRACKING, self.urukul0_ch0.t_sw, p, d), where p is the constant phase added to pow_ and d is the aforementioned delay before io_update.
Marks the switch to the mirror frequency via self.ttl0.off().
Repeats the above steps 840 times while:
- scanning d over (0, 40 ns) in 40 steps (outer loop),
- scanning the offset p added to pow_ over (-2π, 2π) in 21 steps (inner loop),
- taking 21 ms per iteration for a total duration < 20 seconds.

To observe the output, trigger your scope on the falling edge of self.ttl0 and set it to normal trigger mode and 100 nanosecond time span. Also, make sure your scope is able to trigger every 21 ms or increase that duration.

from artiq.language.environment import EnvExperiment
from artiq.language.core import kernel, rpc, delay, delay_mu, now_mu, at_mu, parallel
from artiq.language.units import ns, us, ms, s, Hz, MHz, V
from artiq.language.types import TInt32, TInt64, TFloat, TStr, TBool, TTuple
from artiq.coredevice.i2c import i2c_write_byte
from artiq.coredevice.kasli_i2c import port_mapping
from artiq.coredevice.ad9910 import _PHASE_MODE_DEFAULT, PHASE_MODE_CONTINUOUS, PHASE_MODE_ABSOLUTE, PHASE_MODE_TRACKING, _AD9910_REG_PROFILE0,\
                                    _AD9910_REG_RAMP_LIMIT, _AD9910_REG_RAMP_STEP, _AD9910_REG_RAMP_RATE
from artiq.coredevice.urukul import DEFAULT_PROFILE
from numpy import int32, uint32, int64, uint64

# Maps Kasli EEM port indices that are visible on the PCB
# to actual electrical port(?) indices that need to be passed to the FPGA.
KASLI_I2C_BOARD_TO_PORT_MAPPING = [port%8 for port in port_mapping.values()]
# for `artiq.coredevice.i2c.i2c_write_byte(busno, busaddr, data, ack=True)`
# and `artiq.coredevice.i2c.i2c_read_byte(busno, busaddr)`
DIO_SMA_BUS_NUMBER = 0
DIO_SMA_BUS_ADDRESS = 0x7c # = 124 (decimal) or 01111100 (binary)

# @rpc(flags={"async"})
# def rpc_print(reg):
#     for i, r in enumerate(reg):
#         print(f"REG{i}: {r:64b}  |  decimal: {r}")
#     print(f"bits: 3210987654321098765432109876543210987654321098765432109876543210 <-- LSB here")

@rpc
def print_binary(number, type_cast, nr_bits):
    bits = ""
    for i in range(nr_bits):
        bits = str(i % 10) + bits
    print("bits    :", bits)
    print("binary  :", f"{type_cast(number):{int(nr_bits)}b}")

class DRGAmplitudeTest(EnvExperiment):

    def build(self):
        self.setattr_device("core") # artiq.coredevice.core.Core
        self.setattr_device("core_cache") # artiq.coredevice.cache.CoreCache
        device_db = self.get_device_db() # dict, DO NOT EDIT!
        self.n_kasli_socs = 1 + len(device_db["core"]["arguments"]["satellite_cpu_targets"])
        self.setattr_device("i2c_switch0") # artiq.coredevice.i2c.I2CSwitch
        self.setattr_device("ttl0") # artiq.coredevice.ttl.TTLInOut
        self.setattr_device("ttl1") # artiq.coredevice.ttl.TTLInOut
        self.setattr_device("ttl2") # artiq.coredevice.ttl.TTLInOut
        self.setattr_device("urukul0_cpld") # artiq.coredevice.urukul.CPLD
        self.setattr_device("urukul0_ch0") # artiq.coredevice.ad9910.AD9910
        self.urukul0_ch0.t_sw = int64(0)

    @kernel
    def init(self):
        r"""
        Should be called once after every reboot or power-cycle of the Kasli (SoC).
        """
        for i in range(self.n_kasli_socs):
            while not self.core.get_rtio_destination_status(i):
                pass
        self.core.reset()
        self.core.break_realtime()
        self.i2c_switch0.set(channel = KASLI_I2C_BOARD_TO_PORT_MAPPING[0])
        delay(1*us)
        i2c_write_byte(
            busno   = DIO_SMA_BUS_NUMBER,
            busaddr = DIO_SMA_BUS_ADDRESS,
            data    = 0
        )
        delay(1*us)
        self.i2c_switch0.unset()
        self.core.break_realtime()
        for ttl in [self.ttl0, self.ttl1, self.ttl2]:
            ttl.output()
            delay(1*us)
            ttl.off()
            delay(1*us)
        self.urukul0_cpld.init()
        delay(1*us)
        self.urukul0_cpld.cfg_att_en_all(1)
        delay(1*us)
        self.urukul0_ch0.sw.off()
        delay(1*us)
        self.urukul0_ch0.init()
        delay(1*us)
        self.urukul0_ch0.set_phase_mode(PHASE_MODE_CONTINUOUS)
        delay(1*us)
        self.urukul0_ch0.set_att(0.0)
        delay(1*us)
        self.core.wait_until_mu(now_mu())

    @kernel
    def frequency_to_uint32(self, frequency: TFloat) -> TInt32:
        """
        Linearly map frequency ∈ [0*GHz, 1*GHz] to an unsigned 32-bit integer {0,1,..., 2**32-1}.
        Hacking is necessary because the ARTIQ compiler does *not* know unsigned integers.

        :param frequency: Must be in the interval [0*GHz, 1*GHz].
        """
        if frequency < 0*Hz:
            raise ValueError("Invalid AD9910 frequency!")
        elif frequency < self.urukul0_ch0.sysclk / 2:
            return self.urukul0_ch0.frequency_to_ftw(frequency)
        elif frequency <= self.urukul0_ch0.sysclk:
            return -1 - self.urukul0_ch0.frequency_to_ftw(self.urukul0_ch0.sysclk - frequency)
        else:
            raise ValueError("Invalid AD9910 frequency!")
        return int32(0) # prevents compiler crash

    @kernel
    def set_mu(self, ftw: TInt32, pow_: TInt32, asf: TInt32,
               phase_mode: TInt32 = _PHASE_MODE_DEFAULT,
               ref_time_mu: TInt64 = int64(-1),
               profile: TInt32 = DEFAULT_PROFILE,
               ram_destination: TInt32 = -1,
               p: TInt32 = 0, d: TInt64 = 0) -> TInt32:
        if phase_mode == _PHASE_MODE_DEFAULT:
            phase_mode = self.urukul0_ch0.phase_mode
        # Align to coarse RTIO which aligns SYNC_CLK. I.e. clear fine TSC
        # This will not cause a collision or sequence error.
        at_mu(now_mu() & ~7)
        if phase_mode != PHASE_MODE_CONTINUOUS:
            # Auto-clear phase accumulator on IO_UPDATE.
            # This is active already for the next IO_UPDATE
            self.urukul0_ch0.set_cfr1(phase_autoclear=1)
            if phase_mode == PHASE_MODE_TRACKING and ref_time_mu < 0:
                # set default fiducial time stamp
                ref_time_mu = 0
            if ref_time_mu >= 0:
                # 32 LSB are sufficient.
                # Also no need to use IO_UPDATE time as this
                # is equivalent to an output pipeline latency.
                dt = int32(now_mu() - ref_time_mu)
                pow_ += (dt * ftw * self.urukul0_ch0.sysclk_per_mu >> 16) + 13000 + round(p/10 * (1 << 16))
        self.urukul0_ch0.write64(_AD9910_REG_PROFILE0 + profile,
                                 (asf << 16) | (pow_ & 0xffff), ftw)
        delay_mu(int64(self.urukul0_ch0.sync_data.io_update_delay))
        delay_mu(d)
        self.urukul0_ch0.t_sw = now_mu()
        self.urukul0_ch0.io_update.pulse_mu(8)  # assumes 8 mu > t_SYN_CCLK
        at_mu(now_mu() & ~7)  # clear fine TSC again
        delay(90*ns)
        self.ttl0.off()
        delay(-90*ns)
        if phase_mode != PHASE_MODE_CONTINUOUS:
            self.urukul0_ch0.set_cfr1()
            # future IO_UPDATE will activate
        return pow_
    
    @kernel
    def dds_set(self, frequ: TFloat, turns: TFloat, amp: TFloat,
                phase_mode: TInt32 = _PHASE_MODE_DEFAULT,
                ref_time_mu: TInt64 = int64(-1),
                p: TInt32 = 0, d: TInt64 = 0):
        # if phase_mode == PHASE_MODE_TRACKING:
        #     phase_mode = self.urukul0_ch0.phase_mode
        ftw = self.frequency_to_uint32(frequ)
        pow_ = self.urukul0_ch0.turns_to_pow(turns)
        asf = self.urukul0_ch0.amplitude_to_asf(amp)
        self.set_mu(ftw, pow_, asf, phase_mode, ref_time_mu, p=p, d=d)
        self.core_cache.put("urukul0_ch0", [ftw, pow_, asf])
    
    @rpc(flags={"async"})
    def print_async(self, d, p):
        print("delay =", d, "turns =", p/10)

    @kernel
    def run(self):
        self.init()
        self.core.reset()
        self.core.break_realtime()
        # ---------------------------------
        # t0 = now_mu()
        # self.urukul0_cpld.set_profile(0, 7)
        # print(now_mu() - t0)
        delay(100*ms)
        for d in range(0, 40, 1):
            for p in range(-10, 11, 1):
                self.dds_set(100*MHz, 0.0, 0.1, PHASE_MODE_ABSOLUTE)
                delay(20*us)
                self.urukul0_ch0.sw.on()
                delay(151*ns)
                self.ttl0.on()
                delay(-1.8*us)
                self.dds_set(900*MHz, 0.0, 0.1, PHASE_MODE_TRACKING, self.urukul0_ch0.t_sw, p, d)
                delay(1*us)
                self.urukul0_ch0.sw.off()
                self.print_async(d, p)
                delay(20*ms)
                self.core.wait_until_mu(now_mu())
                delay(1*ms)

dtsevas · 13 Jun

occheung I thought I had figured everything out, but then I ran into now_mu() affecting the AD9910's output. I would be grateful for your opinion about artiq/issues/2776.

dtsevas · 13 Jun

Also, phase accumulation between two frequency changes only depends on the time difference between the two changes, right? So recording now_mu() right before each self.urukul0_ch0.io_update.pulse_mu(8) and taking the difference should yield the precise duration of phase accumulation, right?

dtsevas · 13 Jun

The arithmetic is making my head spin. I accumulate the phase in self.urukul0_ch0.acc_pow, which I initialize to int64(0). When phase has been accumulated at a mirror frequency, I probably have to add +2*((~ftw)*dt_mu) instead of -2*(ftw*dt_mu), right? But how do I handle underflows and overflows? Will those just work?
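In the accumulator's modulo-2**32 arithmetic, the sign question can be side-stepped. The hardware simply adds the 32-bit FTW every SYSCLK cycle; interpreting the same bits as a signed int32 turns the mirror case into a decrement automatically, so one multiply covers both cases, only the low 32 bits of the product matter, and the result depends on the timestamps only through their difference dt. Note that flipping the bits gives ~ftw == -ftw - 1, not -ftw. The sketch below models this in plain Python (not ARTIQ kernel code). `to_int32`, `wrap_int64`, `advance_hw` and `advance_tracked` are hypothetical helpers, and it assumes 1 mu = 1 SYSCLK cycle (1 GHz SYSCLK) and that int64 products in the kernel wrap modulo 2**64, which is worth confirming for the ARTIQ version in use.

```python
MASK32 = (1 << 32) - 1


def to_int32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed (two's-complement) int32."""
    x &= MASK32
    return x - (1 << 32) if x & (1 << 31) else x


def wrap_int64(x: int) -> int:
    """Emulate a wrapping signed 64-bit result."""
    return ((x + (1 << 63)) % (1 << 64)) - (1 << 63)


def advance_hw(acc: int, ftw_u32: int, n_cycles: int) -> int:
    """Reference model: add the unsigned FTW once per SYSCLK cycle, mod 2**32."""
    for _ in range(n_cycles):
        acc = (acc + ftw_u32) & MASK32
    return acc


def advance_tracked(acc: int, ftw_u32: int, dt_cycles: int) -> int:
    """Closed form: one signed multiply, no branch on the sign of the FTW."""
    return (acc + wrap_int64(to_int32(ftw_u32) * dt_cycles)) & MASK32


ftw = round(238.3486e6 / 1e9 * 2**32)    # regular FTW at 1 GHz SYSCLK
ftw_mirror = ~ftw & MASK32               # bit-flipped mirror FTW
assert to_int32(ftw_mirror) == -ftw - 1  # ~ftw is -ftw - 1, not -ftw

for f in (ftw, ftw_mirror):
    assert advance_tracked(12345, f, 1000) == advance_hw(12345, f, 1000)

# An int64 product that overflows still has the correct low 32 bits.
huge_dt = 3 * 10**12
assert wrap_int64(ftw * huge_dt) & MASK32 == (ftw * huge_dt) & MASK32
```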

dtsevas · 13 Jun

The following experiment mirrors the frequency of self.urukul0_ch0 thirteen times, with arbitrary delays in between, completely *without* phase discontinuities. To observe the output, trigger your oscilloscope on the falling edge of self.ttl0, self.ttl1 or self.ttl2, and set normal trigger mode and a time span of 100 ns.

Beware that phase discontinuities *do* occur if one introduces a delay > 2 seconds (roughly) between any of the frequency mirrorings. The most likely cause is improper handling of integer overflows / underflows during multiplication.

@occheung Do you have any advice how to handle the overflows / underflows during multiplication?

Correction from later: The experiment has no phase discontinuities for round frequencies like 100 or 200 MHz. For the arbitrary frequency 238.3486 MHz, every single mirroring causes a phase discontinuity. The discontinuities become smaller if I change delay_mu(1) to delay_mu(0) and they almost disappear for delay_mu(2). Clearly, the required delay between at_mu(now_mu() & ~7) and t_io_update_mu = now_mu() + [...] depends on the frequency. I am close to giving up and doing falling frequency ramps the exact same way as falling amplitude ramps, i.e. with a 4 ns-long discontinuity in the ramped parameter.

from artiq.language.environment import EnvExperiment
from artiq.language.core import kernel, rpc, delay, delay_mu, now_mu, at_mu, parallel
from artiq.language.units import ns, us, ms, s, Hz, MHz, V
from artiq.language.types import TInt32, TInt64, TFloat, TStr, TBool, TList, TTuple
from artiq.coredevice.i2c import i2c_write_byte
from artiq.coredevice.kasli_i2c import port_mapping
from artiq.coredevice.ad9910 import _PHASE_MODE_DEFAULT, PHASE_MODE_CONTINUOUS, PHASE_MODE_ABSOLUTE, PHASE_MODE_TRACKING, _AD9910_REG_PROFILE0
from artiq.coredevice.urukul import DEFAULT_PROFILE
from numpy import int32, uint32, int64, uint64

# Maps Kasli EEM port indices that are visible on the PCB
# to actual electrical port(?) indices that need to be passed to the FPGA.
KASLI_I2C_BOARD_TO_PORT_MAPPING = [port%8 for port in port_mapping.values()]
# for `artiq.coredevice.i2c.i2c_write_byte(busno, busaddr, data, ack=True)`
# and `artiq.coredevice.i2c.i2c_read_byte(busno, busaddr)`
DIO_SMA_BUS_NUMBER = 0
DIO_SMA_BUS_ADDRESS = 0x7c # = 124 (decimal) or 01111100 (binary)

@rpc
def print_binary(number, type_cast=uint32, nr_bits=32, print_bits=False):
    print("binary  :", f"{type_cast(number):{int(nr_bits)}b}")
    if print_bits:
        bits = ""
        for i in range(nr_bits):
            bits = str(i % 10) + bits
        print("bits    :", bits)

class DRGAmplitudeTest(EnvExperiment):

    def build(self):
        self.setattr_device("core") # artiq.coredevice.core.Core
        self.setattr_device("core_cache") # artiq.coredevice.cache.CoreCache
        device_db = self.get_device_db() # dict, DO NOT EDIT!
        self.n_kasli_socs = 1 + len(device_db["core"]["arguments"]["satellite_cpu_targets"])
        self.setattr_device("i2c_switch0") # artiq.coredevice.i2c.I2CSwitch
        self.setattr_device("ttl0") # artiq.coredevice.ttl.TTLInOut
        self.setattr_device("ttl1") # artiq.coredevice.ttl.TTLInOut
        self.setattr_device("ttl2") # artiq.coredevice.ttl.TTLInOut
        self.setattr_device("ttl3") # artiq.coredevice.ttl.TTLInOut
        self.setattr_device("urukul0_cpld") # artiq.coredevice.urukul.CPLD
        self.setattr_device("urukul0_ch0") # artiq.coredevice.ad9910.AD9910
        self.urukul0_ch0.ftw = int32(0)
        self.urukul0_ch0.pow = int32(0)
        self.urukul0_ch0.asf = int32(0)
        self.urukul0_ch0.t_acc_start_mu = int64(0)
        self.urukul0_ch0.acc_pow = int64(0)

    @kernel
    def init(self):
        r"""
        Should be called once after every reboot or power-cycle of the Kasli (SoC).
        """
        for i in range(self.n_kasli_socs):
            while not self.core.get_rtio_destination_status(i):
                pass
        self.core.reset()
        self.core.break_realtime()
        self.i2c_switch0.set(channel = KASLI_I2C_BOARD_TO_PORT_MAPPING[0])
        i2c_write_byte(
            busno   = DIO_SMA_BUS_NUMBER,
            busaddr = DIO_SMA_BUS_ADDRESS,
            data    = 0
        )
        self.i2c_switch0.unset()
        self.core.break_realtime()
        for ttl in [self.ttl0, self.ttl1, self.ttl2, self.ttl3]:
            ttl.output()
            delay(1*us)
            ttl.off()
            delay(1*us)
        self.urukul0_cpld.init()
        self.urukul0_cpld.cfg_att_en_all(1)
        self.urukul0_ch0.sw.off()
        self.urukul0_ch0.init()
        self.urukul0_ch0.set_phase_mode(PHASE_MODE_CONTINUOUS)
        self.urukul0_ch0.set_att(0.0)
        self.core.wait_until_mu(now_mu())

    @kernel
    def frequency_to_uint32(self, frequency: TFloat) -> TInt32:
        """
        Linearly map frequency ∈ [0*GHz, 1*GHz] to an unsigned 32-bit integer {0,1,..., 2**32-1}.
        Hacking is necessary because the ARTIQ compiler does *not* know unsigned integers.

        :param frequency: Must be in the interval [0*GHz, 1*GHz].
        """
        if frequency < 0*Hz:
            raise ValueError("Invalid AD9910 frequency!")
        elif frequency < self.urukul0_ch0.sysclk / 2:
            return self.urukul0_ch0.frequency_to_ftw(frequency)
        elif frequency <= self.urukul0_ch0.sysclk:
            return -1 - self.urukul0_ch0.frequency_to_ftw(self.urukul0_ch0.sysclk - frequency)
        else:
            raise ValueError("Invalid AD9910 frequency!")
        return int32(0) # prevents compiler crash

    @kernel
    def set_mu(self, ftw: TInt32, pow_: TInt32, asf: TInt32,
               phase_mode: TInt32 = _PHASE_MODE_DEFAULT,
               ref_time_mu: TInt64 = int64(-1),
               profile: TInt32 = DEFAULT_PROFILE) -> TInt32:
        if phase_mode == _PHASE_MODE_DEFAULT:
            phase_mode = self.urukul0_ch0.phase_mode
        # Align to coarse RTIO which aligns SYNC_CLK. I.e. clear fine TSC
        # This will not cause a collision or sequence error.
        at_mu(now_mu() & ~7)
        if phase_mode != PHASE_MODE_CONTINUOUS:
            # Auto-clear phase accumulator on IO_UPDATE.
            # This is active already for the next IO_UPDATE
            self.urukul0_ch0.set_cfr1(phase_autoclear=1)
            if phase_mode == PHASE_MODE_TRACKING and ref_time_mu < 0:
                # set default fiducial time stamp
                ref_time_mu = 0
            if ref_time_mu >= 0:
                # 32 LSB are sufficient.
                # Also no need to use IO_UPDATE time as this
                # is equivalent to an output pipeline latency.
                dt = int32(now_mu()) - int32(ref_time_mu)
                pow_ += dt * ftw * self.urukul0_ch0.sysclk_per_mu >> 16
        self.urukul0_ch0.write64(_AD9910_REG_PROFILE0 + profile,
                                 (asf << 16) | (pow_ & 0xffff), ftw)
        delay_mu(int64(self.urukul0_ch0.sync_data.io_update_delay))
        t_io_update_mu = now_mu()
        self.urukul0_ch0.io_update.pulse_mu(8)  # assumes 8 mu > t_SYN_CCLK
        at_mu(now_mu() & ~7)  # clear fine TSC again
        if phase_mode != PHASE_MODE_CONTINUOUS:
            # phase accumulator has been reset
            self.urukul0_ch0.acc_pow = 0
            self.urukul0_ch0.set_cfr1()
            # future IO_UPDATE will activate
        else:
            # calculate phase-offset word in AD9910's phase accumulator at the rising edge of io_update
            dt_mu = t_io_update_mu - self.urukul0_ch0.t_acc_start_mu
            if self.urukul0_ch0.ftw >= 0:
                # regular frequency, so AD9910 has been incrementing phase accumulator
                self.urukul0_ch0.acc_pow += self.urukul0_ch0.ftw * dt_mu * self.urukul0_ch0.sysclk_per_mu
            else:
                # mirror frequency, so AD9910 has been decrementing phase accumulator
                # mirror FTWs are bit-flipped (f_mirror + f = 1*GHz minus one FTW LSB), so ~ftw recovers the original ftw
                self.urukul0_ch0.acc_pow -= (~self.urukul0_ch0.ftw) * dt_mu * self.urukul0_ch0.sysclk_per_mu
        self.urukul0_ch0.t_acc_start_mu = t_io_update_mu
        self.urukul0_ch0.ftw = ftw
        self.urukul0_ch0.pow = pow_
        self.urukul0_ch0.asf = asf
        return pow_

    @kernel
    def mirror(self, debug_ttl):
        # align to coarse RTIO which aligns SYNC_CLK
        at_mu(now_mu() & ~7)
        # delay by 1 nanosecond, otherwise phase discontinuity
        # see also https://github.com/m-labs/artiq/issues/2776
        delay_mu(1)
        # pre-calculate RTIO timeline cursor at the next rising edge of io_update
        T_write64_mu = 1248 # RTIO timeline cursor advancement per register-write
        io_update_delay_mu = int64(self.urukul0_ch0.sync_data.io_update_delay)
        t_io_update_mu = now_mu() + T_write64_mu + io_update_delay_mu
        # flipping all of ftw's bits gives f_mirror + f = 1*GHz minus one FTW LSB (about 0.23 Hz)
        ftw_mirror = ~self.urukul0_ch0.ftw
        # pre-calculate phase-offset word of AD9910's phase accumulator at the next rising edge of io_update
        dt_mu = t_io_update_mu - self.urukul0_ch0.t_acc_start_mu
        if self.urukul0_ch0.ftw >= 0:
            # regular frequency, so AD9910 has been incrementing phase accumulator
            self.urukul0_ch0.acc_pow += self.urukul0_ch0.ftw * dt_mu * self.urukul0_ch0.sysclk_per_mu
        else:
            # mirror frequency, so AD9910 has been decrementing phase accumulator
            self.urukul0_ch0.acc_pow -= ftw_mirror * dt_mu * self.urukul0_ch0.sysclk_per_mu
        # pre-calculate phase-offset word of AD9910's total output phase at the next rising edge of io_update
        pow_io_update_mu = (self.urukul0_ch0.acc_pow >> 16) + self.urukul0_ch0.pow
        # mirror phase-offset word around a multiple of 2π
        self.urukul0_ch0.pow -= 2*pow_io_update_mu
        # write mirror frequency to single-tone register
        self.urukul0_ch0.ftw = ftw_mirror
        self.urukul0_ch0.write64(_AD9910_REG_PROFILE0 + 7,
                                 (self.urukul0_ch0.asf << 16) | (self.urukul0_ch0.pow & 0xffff), self.urukul0_ch0.ftw)
        delay_mu(io_update_delay_mu)
        # record new switching time
        self.urukul0_ch0.t_acc_start_mu = now_mu()
        # transfer mirror frequency to active output register
        self.urukul0_ch0.io_update.pulse_mu(8)
        delay_mu(84)
        debug_ttl.off()
        delay_mu(-84)
        # verify that we pre-calculated the accumulated phase correctly
        if self.urukul0_ch0.t_acc_start_mu != t_io_update_mu:
            raise ValueError("You pre-calculated the RTIO time cursor wrongly, \
                             so you caused a phase discontinuity. Check on scope!")

    @kernel
    def dds_set(self, frequ: TFloat, turns: TFloat, amp: TFloat,
                phase_mode: TInt32 = _PHASE_MODE_DEFAULT,
                ref_time_mu: TInt64 = int64(-1)):
        self.set_mu(self.frequency_to_uint32(frequ),
                    self.urukul0_ch0.turns_to_pow(turns),
                    self.urukul0_ch0.amplitude_to_asf(amp),
                    phase_mode, ref_time_mu)

    @rpc(flags={"async"})
    def print_async(self, d):
        print("d =", d)

    @kernel
    def run(self):
        self.init()
        self.core.reset()
        self.core.break_realtime()
        for d in range(-20, 20+1, 1):
            for p in range(1):
                delay(50*ms)
                self.dds_set(100*MHz, 0.0, 0.1, PHASE_MODE_ABSOLUTE)
                delay(20*us)
                self.urukul0_ch0.sw.on()
                delay(151*ns)
                self.ttl0.on()
                self.ttl1.on()
                self.ttl2.on()
                delay(-1*us + d*ns)
                self.mirror(self.ttl0)
                delay_mu(434+8-5*d)
                self.mirror(self.ttl3)
                delay_mu(4321+2*d)
                self.mirror(self.ttl3)
                delay_mu(459+4*d)
                self.mirror(self.ttl3)
                delay_mu(3456+11*d)
                self.mirror(self.ttl3)
                # delay(2*s)
                delay_mu(2349+1*d)
                self.mirror(self.ttl3)
                delay_mu(2541-9*d)
                self.mirror(self.ttl3)
                delay_mu(984-4*d)
                self.mirror(self.ttl1)
                delay_mu(12345+3*d)
                self.mirror(self.ttl3)
                delay_mu(491+7*d)
                self.mirror(self.ttl3)
                delay_mu(42459-8*d)
                self.mirror(self.ttl3)
                delay_mu(987+31*d)
                self.mirror(self.ttl2)
                delay_mu(38132+d)
                self.mirror(self.ttl3)
                delay(1*us)
                self.urukul0_ch0.sw.off()
                delay(10*ms)
                self.print_async(d)
                # print_binary(spow, uint64, 64, print_bits=False)
                # print_binary(0x7fffffff, print_bits=True)
                # print_binary(self.frequency_to_uint32(300*MHz), print_bits=False)
                self.core.wait_until_mu(now_mu())

dpn · 13 Jun

Nothing much to add about the specific use case, as I haven't looked into the DRG myself, but in general:

Clearly, the required delay between at_mu(now_mu() & ~7) and t_io_update_mu = now_mu() + [...] depends on the frequency.

That sounds like two separate problems. Assuming that your DDS is properly set up for phase control (clocked phase-coherently with the FPGA and sync delay/IO_UPDATE alignment tuned correctly), updates should be applied completely deterministically within the AD9910 state machine (PROFILE pin switching is another determinacy issue, but you are not doing that). Thus, it sounds like this frequency-dependent delay may in fact be an incorrect time-offset in some calculation (with the different alignments "lining up" different period multiples for each frequency, rather than the true zero that would work for all frequencies).
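The "lining up" argument can be made concrete: a constant timing offset tau shifts the tracked phase by f·tau turns, and by 2·f·tau turns in the mirror POW, which contains the accumulated phase twice. Whenever that product is an integer, the mistake is invisible on the scope. The sketch below is plain Python with a hypothetical helper name, and tau = 5 ns is chosen purely for illustration, not taken from any measurement in this thread; it shows that such an offset would hide at 100 and 200 MHz but not at 238.3486 MHz.

```python
from fractions import Fraction


def phase_error_turns(f_hz: int, tau_ns: int, factor: int = 1) -> Fraction:
    """Phase error in turns (mod 1) caused by a timing offset of tau_ns.

    factor=2 models the mirror switch, where the accumulated phase enters
    the new POW twice (pow_new = -2*phi_acc - pow_old).
    """
    return Fraction(factor * f_hz * tau_ns, 10**9) % 1


for f_hz in (100_000_000, 200_000_000, 238_348_600):
    err = phase_error_turns(f_hz, tau_ns=5, factor=2)
    print(f"{f_hz / 1e6} MHz: residual error {float(err):.6f} turns")
```

Exact fractions are used so that "integer number of turns" is tested exactly rather than up to floating-point rounding.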

Beware that phase discontinuities do occur if one introduces a delay > 2 seconds (roughly) between any of the frequency mirrorings. The most likely cause is improper handling of integer overflows / underflows during multiplication.

Yep, roughly 2 seconds sounds suspiciously like 2³¹ nanoseconds, so if I had to bet, I'd say that an intermediate value is incorrectly using 32 bit precision. Sign bits can be a bit tricky to get right, but in general, the multiplication easily accessible from ARTIQ Python (cast to int64 first to get 32 bit x 32 bit -> 64 bit) should be enough to calculate all the phases given the 32 bit width of the DDS registers.
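One more candidate that lines up exactly with 2**31 ns, offered as a hypothesis to check rather than a diagnosis: in `set_mu()` and `mirror()` of the second experiment, time spent at a mirror frequency is tracked by subtracting (~ftw) * dt_mu. Because the mirror FTW is produced by flipping bits, the hardware accumulator actually decreases by (~ftw) + 1 per SYSCLK cycle, so the tracked value drifts by exactly one accumulator LSB per cycle. At 1 GHz SYSCLK this stays below one POW LSB for dt < 2**16 ns (about 66 µs), which would be consistent with short delays looking clean, and it reaches half a turn at dt = 2**31 ns (about 2.15 s). The Python model below (hypothetical helper names, 1 mu = 1 SYSCLK cycle assumed) quantifies the drift and shows that adding the mirror FTW reinterpreted as a signed int32 does not drift.

```python
MASK32 = (1 << 32) - 1
HALF_TURN = 1 << 31


def actual_acc_change(ftw: int, dt_cycles: int) -> int:
    """Hardware: the accumulator adds the bit-flipped mirror FTW every cycle."""
    return ((~ftw & MASK32) * dt_cycles) & MASK32


def tracked_acc_change(ftw: int, dt_cycles: int) -> int:
    """Tracking as in the posted code: subtract the original ftw per cycle."""
    return (-ftw * dt_cycles) & MASK32


def signed_acc_change(ftw: int, dt_cycles: int) -> int:
    """Alternative: add the mirror FTW reinterpreted as a signed int32."""
    mirror = ~ftw & MASK32
    signed = mirror - (1 << 32) if mirror & HALF_TURN else mirror
    return (signed * dt_cycles) & MASK32


ftw = round(0.1 * 2**32)  # 100 MHz at 1 GHz SYSCLK; its mirror is ~900 MHz
for dt in (1_000, 65_535, 1_000_000, HALF_TURN):
    drift = (tracked_acc_change(ftw, dt) - actual_acc_change(ftw, dt)) & MASK32
    print(f"dt = {dt} ns: drift = {drift} LSB = {drift / 2**32:.6f} turns")
    assert drift == dt  # exactly one accumulator LSB per cycle
    assert signed_acc_change(ftw, dt) == actual_acc_change(ftw, dt)
```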

dtsevas · 15 Jun

dpn Thank you, that helps. Do you see a mistake in my calculation of dt_mu below? Or another logic error?

In device_db.py, Urukul 0 is set to 1 GHz SYS_CLK and 250 MHz SYNC_CLK.
I verify that self.urukul0_ch0.sync_data.io_update_delay == self.urukul0_ch0.tune_io_update_delay().

I schedule self.urukul0_ch0.io_update.pulse_mu(8) only at coarse RTIO time stamps because only those have a constant delay to the SYS_CLK cycle where the new FTW comes into effect. I also store the time stamp now_mu() right before each self.urukul0_ch0.io_update.pulse_mu(8) like so:

# align to coarse RTIO which aligns SYNC_CLK
at_mu(now_mu() & ~7)
delay_mu(int64(self.urukul0_ch0.sync_data.io_update_delay))
# store time stamp
self.urukul0_ch0.t_acc_start_mu = now_mu()
# transfer new FTW to active output register
self.urukul0_ch0.io_update.pulse_mu(8)

At the next frequency change, the duration dt_mu of phase accumulation only depends on the difference of the two time stamps:

# align to coarse RTIO which aligns SYNC_CLK
at_mu(now_mu() & ~7)
delay_mu(int64(self.urukul0_ch0.sync_data.io_update_delay))
# calculate duration of phase accumulation
now = now_mu()
dt_mu = now - self.urukul0_ch0.t_acc_start_mu
# store new time stamp
self.urukul0_ch0.t_acc_start_mu = now
# transfer mirror frequency 1 to active output register
self.urukul0_ch0.io_update.pulse_mu(8)

dpn · 15 Jun

dtsevas That sounds sensible. (4 mu alignment sufficient instead of 8, but of course, this doesn't hurt, and there isn't really a reason to go for the finer one given the event dispatch limitations.)

Just to be sure, I'd also check the output of the sync-delay auto-tuning routine, as in my experience this is a good first test for any clocking issues. (I'd expect to see a window width of 2, or at least 1.)

dtsevas · 16 Jun

dpn By default, my device_db.py contains

        "sync_delay_seed": "eeprom_urukul0:68",
        "io_update_delay": "eeprom_urukul0:68"

and the experiment

        print("IO_UPDATE | measured optimal delay:", self.urukul0_ch0.tune_io_update_delay(),
              " | actually used:", self.urukul0_ch0.sync_data.io_update_delay)
        delay(20*ms)
        sync_in_delay, window_size = self.urukul0_ch0.tune_sync_delay()
        print("SYNC_IN   | measured optimal delay:", sync_in_delay,
              "| actually used:", self.urukul0_ch0.sync_data.sync_delay_seed,
              "\n              measured window size:", window_size)

outputs:

IO_UPDATE | measured optimal delay: 0  | actually used: 0
SYNC_IN   | measured optimal delay: 15 | actually used: -1 
              measured window size: 5

There are some variations over time:

After power-on, self.urukul0_ch0.tune_io_update_delay() returned 3 in the first few experiment runs and has been returning 0 ever since.
self.urukul0_ch0.tune_sync_delay() has always returned a window size of either 2 or 5.

Variations over time are bad, right?

Also, what does self.urukul0_ch0.sync_data.sync_delay_seed == -1 mean?

Luckily, I can offer some good news: My time calculations were correct and the DDS parameter values are updated with deterministic timing. My phase tracking was still wrong because self.urukul0_ch0.set_cfr1(phase_autoclear=1) and self.urukul0_ch0.io_update.pulse_mu(8) clear the phase accumulator a time dt *before* the new DDS parameter values come into effect. Possible solutions:

In PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING, instead of self.urukul0_ch0.acc_pow = 0, I could try to do self.urukul0_ch0.acc_pow = round(ftw * dt). I don't know if the required dt would always be constant or if it would be the same for all AD9910 chips. But it seems that dt = 4*ns. If that is true, it might mean that the phase accumulator gets cleared when the io_pulse is registered by SYNC_CLK's rising edge, whereas the new DDS parameter values come into effect one SYNC_CLK cycle (4 ns) later.
I could reset the phase accumulator via self.urukul0_ch0.set_mu(0, 0, 0, PHASE_MODE_ABSOLUTE) and afterwards use *only* PHASE_MODE_CONTINUOUS, because PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING reset the phase accumulator and thereby screw up the phase tracking.

dpn · 16 Jun

Seeds < 0 mean that auto-tuning during init() is disabled. (I haven't used the EEPROM sync_data yet, but I assume it behaves the same as when specified in the device DB.) With that, the multi-chip sync functionality should be disabled altogether. This leads to a non-deterministic relationship of the 250 MHz DDS SYNC_CLK to the 125 MHz FPGA clock (due to the intermediary 1 GHz SYSCLK stage). You'll want to enable that for the timing relationships to be stable across reboots.

I haven't checked the behaviour of the auto-tune function in anger recently (we have a manual experiment to plot the incidence of errors across all possible delays for debugging clock issues), but I don't think I've ever seen a width of 5. As long as it is happy at all, the clocking should be fine, though.

Regarding the autoclear timing, I don't think "a fraction of a nanosecond" can be quite right, since the DDS core only runs at 1 GHz (so 1 ns). Now that you mention it, I do remember that we were seeing a small glitch as well back in 2020 during development of phase-coherent SUServo, which we couldn't quite get rid of (within a day or two of work). In the end, our preliminary understanding was that, referencing the timing diagram in the manual (fig. 49), the accumulator clear might already happen at A, while the new POW is loaded a cycle later at B. It might actually only be two or three SYSCLK cycles, but we didn't follow this up further, as we just switched the SUServo implementation to not clear after the first time (as you suggest in 2), rather just keeping track of the evolution of the hardware accumulator in the gateware driver.

Just to be explicit, I think your calculation is probably correct and will give the correct phase relationship after the glitch has settled, but I am not sure there is a way to get rid of the glitch.

Ddtsevas · 16 Jun

dpn I came to the same conclusion after some experimenting. I think it's exactly one SYNC_CLK cycle (= 4 ns), and it happens exactly in the order you described: the phase accumulator is cleared at A and the new parameter values become effective at B in Figure 49 ("I/O_UPDATE Transferring Data from I/O Buffer to Active Registers") of the AD9910 manual, which you referenced. This can be compensated by the new POW, though, so no problem there. If the glitch cannot be removed, we can just manually keep track of the phase accumulator's value across the entire experiment via self.urukul0_ch0.acc_pow = (old_ftw - new_ftw) * 4.

dpn · 16 Jun

Can it be compensated, though? What we saw was a glitch where the output went to zero absolute phase (cosine output -> max voltage) for a SYNC_CLK cycle and then resumed with the new parameters. Even with the correct POW, the glitch would still be present. Don't take this as gospel, though; as I said, there was still an alternative solution available, so we stopped experimenting fairly quickly.

Ddtsevas · 16 Jun

dtsevas Correction to myself: The correct starting value of the accumulator after an auto-clear is self.urukul0_ch0.acc_pow = -new_ftw*4.
This is *wrong*: self.urukul0_ch0.acc_pow = (old_ftw - new_ftw) * 4.
Surprisingly, it seems that no phase is accumulated during the SYNC_CLK cycle that starts with the registration of the io_update pulse (point A in the aforementioned figure) and ends with the activation of the new register values (point B), regardless of the old value of the active FTW.
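The corrected starting value can be checked with a toy model. Assuming a 1 GHz SYSCLK and an accumulator that is cleared at A, held at 0 for one SYNC_CLK cycle and then advances with new_ftw from B, a software tracker that starts at -new_ftw*4 at A agrees with the hardware from B onwards. The function names are hypothetical. The two disagree between A and B, which is the glitch window that no POW can remove.

```python
# Toy model of the auto-clear timing. Assumptions: 1 GHz SYSCLK, times in
# SYSCLK cycles (= ns) measured from point A, new FTW active from point B.
MASK32 = 2**32 - 1
GAP = 4  # SYSCLK cycles from A to B (one SYNC_CLK cycle)


def hardware_acc(new_ftw: int, t: int) -> int:
    """Accumulator t cycles after A, as observed: frozen at 0 until B."""
    if t <= GAP:
        return 0
    return (new_ftw * (t - GAP)) & MASK32


def tracked_acc(new_ftw: int, t: int) -> int:
    """Software tracker started at A with acc_pow = -new_ftw * 4."""
    return (-new_ftw * GAP + new_ftw * t) & MASK32


new_ftw = 0x1999999A  # arbitrary example FTW (about 100 MHz at 1 GHz SYSCLK)
# True: model and hardware agree from B onwards
print(all(hardware_acc(new_ftw, t) == tracked_acc(new_ftw, t)
          for t in range(GAP, 1000)))
# False: they differ inside the glitch window between A and B
print(hardware_acc(new_ftw, 1) == tracked_acc(new_ftw, 1))
```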

Ddtsevas · 16 Jun

dpn I just experimented a bit with the accumulator auto-clear and the glitch was always there, you are right. I will do the same as you: Reset the accumulator once before the important part of the experiment starts and then keep track of the accumulator's value throughout the experiment.
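The "reset once, then track" approach can be sketched as a small host-side model. AccumulatorTracker and its methods are hypothetical, not ARTIQ API. The sketch assumes the accumulator was cleared exactly once at t_mu = 0, that only PHASE_MODE_CONTINUOUS updates follow, and that t_mu is the time at which the new FTW actually becomes active (i.e. the I/O update latency is already folded in).

```python
# Hypothetical software phase tracker (not an ARTIQ API).
# Assumes one accumulator clear at t_mu = 0, then only continuous-mode
# updates, with t_mu being the moment the new FTW becomes active.
MASK32 = 2**32 - 1


class AccumulatorTracker:
    def __init__(self, sysclk_per_mu: int = 1):
        self.sysclk_per_mu = sysclk_per_mu  # SYSCLK cycles per RTIO mu
        self.acc = 0   # modelled 32-bit accumulator value
        self.ftw = 0   # currently active FTW
        self.t_mu = 0  # time of the last update

    def advance(self, t_mu: int) -> None:
        """Propagate the modelled accumulator up to time t_mu."""
        dt = t_mu - self.t_mu
        self.acc = (self.acc + self.ftw * dt * self.sysclk_per_mu) & MASK32
        self.t_mu = t_mu

    def switch(self, new_ftw: int, t_mu: int, phase_turns: float = 0.0,
               ref_time_mu: int = 0) -> int:
        """Switch to new_ftw at t_mu and return the 16-bit POW that makes the
        output phase equal new_ftw * (t - ref_time_mu) + phase_turns."""
        self.advance(t_mu)
        self.ftw = new_ftw
        wanted = (new_ftw * (t_mu - ref_time_mu) * self.sysclk_per_mu
                  + round(phase_turns * 2**32)) & MASK32
        # Both wanted and acc advance at new_ftw from here on, so the
        # difference is constant and can be loaded once as the POW.
        return ((wanted - self.acc) & MASK32) >> 16


tr = AccumulatorTracker()
print(tr.switch(2**16, 0))   # first waveform, referenced to t = 0
print(tr.switch(2**20, 10))  # phase-coherent switch 10 mu later
```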

Do you see any value in PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING? I am thinking about replacing them with a new function called ad9910.clear_accumulator(self) and calling that function once in ad9910.init(). I want to submit a pull request to github.com/m-labs/artiq sometime soon.

Ddtsevas · 16 Jun

@dpn I want to synchronize all AD9910 chips to the Kasli SoC's clock and to align io_update to the coarse RTIO grid (8 ns spacing). I could write the outputs of ad9910.tune_sync_delay() and ad9910.tune_io_update_delay() manually into device_db.py. Do you know of any other way? How does "io_update_delay": "eeprom_urukul0:64" work, for instance?
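For reference, here is a sketch of the two device_db.py variants as I understand the AD9910 driver. As far as I can tell from ad9910.py, sync_delay_seed and io_update_delay accept either integers or a string of the form "eeprom_<device>:<offset>", and with the EEPROM variant both keys must be the same string. Channel names, chip_select, pll_n, the EEM port and the numeric values are placeholders, not values from this thread.

```python
# Sketch of device_db.py entries; all concrete values are placeholders.
device_db = {
    # Variant 1: hard-coded values in the device DB.
    "urukul0_ch0": {
        "type": "local",
        "module": "artiq.coredevice.ad9910",
        "class": "AD9910",
        "arguments": {
            "chip_select": 4,
            "cpld_device": "urukul0_cpld",
            "sw_device": "ttl_urukul0_sw0",
            "pll_n": 32,
            "sync_delay_seed": 15,  # placeholder; < 0 disables tuning in init()
            "io_update_delay": 0,   # placeholder
        },
    },
    # Variant 2: values read from the Urukul EEPROM at offset 64.
    "eeprom_urukul0": {
        "type": "local",
        "module": "artiq.coredevice.kasli_i2c",
        "class": "KasliEEPROM",
        "arguments": {"port": "EEM2"},  # placeholder EEM port
    },
    "urukul0_ch1": {
        "type": "local",
        "module": "artiq.coredevice.ad9910",
        "class": "AD9910",
        "arguments": {
            "chip_select": 5,
            "cpld_device": "urukul0_cpld",
            "sw_device": "ttl_urukul0_sw1",
            "pll_n": 32,
            "sync_delay_seed": "eeprom_urukul0:64",
            "io_update_delay": "eeprom_urukul0:64",
        },
    },
}
```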

Ddtsevas · 16 Jun

dtsevas Or maybe I should keep PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING and make them work without clearing the accumulator?

dpn · 16 Jun

dtsevas Yes, there absolutely is value in PHASE_MODE_TRACKING. We use it all over the place as the default for coherent operations, as it provides a solution that works irrespective of global state. This can be valuable to keep otherwise unnecessary interdependencies out of the code, for instance when using the same RF chain to address multiple transitions that don't necessarily have anything to do with each other on the code side. The glitch doesn't matter if it only occurs once at the beginning of a pulse (where, in addition, the RF switch is likely still off anyway).
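The following pure-Python model shows why tracking mode needs no global state: the POW depends only on the FTW, a phase offset and the time elapsed since a reference time. It is a simplified model of the idea, not a copy of ARTIQ's ad9910.py; tracking_pow is a hypothetical name, and the model assumes the accumulator is auto-cleared at the update and ignores the 4 ns clear-to-update offset discussed above.

```python
# Simplified model of the idea behind PHASE_MODE_TRACKING (not ARTIQ code).
# Assumes an accumulator auto-clear at the update; ignores the 4 ns offset.

def tracking_pow(ftw: int, pow_offset: int, t_mu: int, ref_time_mu: int,
                 sysclk_per_mu: int = 1) -> int:
    """16-bit POW so that phase(t) = pow_offset + ftw * (t - ref_time)."""
    dt = (t_mu - ref_time_mu) * sysclk_per_mu  # elapsed SYSCLK cycles
    # Keep the top 16 bits of the 32-bit phase advance, then wrap to 16 bits.
    return (pow_offset + ((ftw * dt) >> 16)) & 0xFFFF


print(tracking_pow(2**16, 0, 5, 0))       # 5 cycles at 1/65536 turn each
print(tracking_pow(2**16, 0xFFFF, 1, 0))  # wraps around to 0
```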

As long as you know all the timing information ahead of time (e.g. no DMA sequences, or at most one global one), I suppose you could just add the "software phase tracking" to some global device wrapper to similarly avoid a dependence of various pieces of code on each other, but for the cases where the glitch isn't important, it's a whole lot of complexity for nothing.

dtsevas We haven't been using the EEPROM support, as it didn't exist when we first started developing/using Urukul. There must be some sort of tool to write those values into the on-board EEPROM.

Ddtsevas · 16 Jun

dpn Understood, thank you! I will not attempt to change anything about PHASE_MODE_ABSOLUTE and PHASE_MODE_TRACKING. I have offered in the M-Labs chatroom to prepare a pull request with the accumulator tracking and frequency mirroring. Let's see if they think it makes sense to add to global ARTIQ or not.

Note: The overflow/underflow errors are also fixed. It turns out that the AD9910 phase accumulator treats the FTW as an *unsigned* integer, so one can always calculate like this:

self.acc_pow += self.ftw * dt_mu * self.ad9910.sysclk_per_mu
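A minimal sketch of why this works, assuming a 1 GHz SYSCLK and sysclk_per_mu = 1: the FTW of the mirror frequency f_sysclk - f is the 32-bit two's complement of the FTW of f, so accumulating it as an unsigned number with wraparound is the same as accumulating a negative FTW. This is the sign inversion in the thread title. ftw_of and advance are hypothetical helpers; advance performs the update line above with an explicit 32-bit mask.

```python
# Assumptions: 1 GHz SYSCLK, sysclk_per_mu = 1, 32-bit accumulator.
F_SYSCLK = 1e9
MASK32 = 2**32 - 1


def ftw_of(f_hz: float) -> int:
    """Unsigned 32-bit FTW for f_hz."""
    return round(f_hz / F_SYSCLK * 2**32) & MASK32


def advance(acc: int, ftw: int, dt_mu: int, sysclk_per_mu: int = 1) -> int:
    """acc += ftw * dt_mu * sysclk_per_mu, wrapped like the hardware."""
    return (acc + ftw * dt_mu * sysclk_per_mu) & MASK32


ftw_100 = ftw_of(100e6)
ftw_900 = ftw_of(900e6)  # mirror of 100 MHz
print((ftw_100 + ftw_900) & MASK32)  # → 0, i.e. ftw_900 == -ftw_100 mod 2**32

acc = advance(0, ftw_100, 10)        # 10 ns at 100 MHz
print(advance(acc, ftw_900, 10))     # → 0, 10 ns at 900 MHz winds it back
```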