I have a DRTIO setup with an external 125 MHz clock input (see also https://forum.m-labs.hk/d/599-reference-clock-for-drtio-system). I ran some tests to check whether the two crates have their RTIO timestamps synchronized.

When connecting two TTL outputs from the master to a scope, the pulse trains overlap and the outputs are clearly in sync with nanosecond precision.

For the same experiment with one TTL output from the master and one from the satellite, there is a 276 ns latency on the satellite output. In the figure, blue is the master.
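
The kind of kernel used for this test looks roughly as follows (a minimal sketch; ttl_master and ttl_sat are hypothetical device names standing in for one TTL output on each crate):

from artiq.experiment import *


class SyncTest(EnvExperiment):
    def build(self):
        self.setattr_device("core")
        # Hypothetical device names: one TTL output on the master crate
        # and one TTL output on the satellite crate.
        self.setattr_device("ttl_master")
        self.setattr_device("ttl_sat")

    @kernel
    def run(self):
        self.core.reset()
        delay(1*ms)
        for i in range(100):
            with parallel:
                # Both pulses are scheduled at the same RTIO timestamp.
                self.ttl_master.pulse(1*us)
                self.ttl_sat.pulse(1*us)
            delay(10*us)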

If I look at the UART logs of the two crates, I see the following:

master:

[2023-07-07 14:10:21] [     0.332466s]  INFO(board_artiq::drtio_routing): routing table: RoutingTable { 0: 0; 1: 1 0; }
[2023-07-07 14:10:21] [     0.344508s]  INFO(runtime::mgmt): management interface active
[2023-07-07 14:10:21] [     0.356530s]  INFO(runtime::session): accepting network sessions
[2023-07-07 14:10:21] [     0.369568s]  INFO(runtime::session): running startup kernel
[2023-07-07 14:10:21] [     0.414358s]  INFO(runtime::rtio_mgt::drtio): [DEST#0] destination is up
[2023-07-07 14:10:24] [     3.227096s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
[2023-07-07 14:10:30] [     9.116778s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] remote replied after 29 packets
[2023-07-07 14:10:30] [     9.302240s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link initialization completed
[2023-07-07 14:10:30] [     9.349074s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] destination is up
[2023-07-07 14:10:30] [     9.354518s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] buffer space is 128

satellite:

[2023-07-07 14:10:21] [     0.029856s]  INFO(board_misoc::io_expander): MCP23017 io expander 0 not found. Checking for PCA9539.
[2023-07-07 14:10:21] [     0.058400s]  INFO(board_misoc::io_expander): MCP23017 io expander 1 not found. Checking for PCA9539.
[2023-07-07 14:10:22] [     0.456360s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
[2023-07-07 14:10:24] [     2.401478s]  INFO(board_artiq::si5324):   ...locked
[2023-07-07 14:10:24] [     3.033241s]  INFO(satman): uplink is up, switching to recovered clock
[2023-07-07 14:10:24] [     3.066184s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
[2023-07-07 14:10:26] [     4.794828s]  INFO(board_artiq::si5324):   ...locked
[2023-07-07 14:10:29] [     8.197790s]  INFO(board_artiq::si5324::siphaser): calibration successful, lead: 212, width: 433 (347deg)
[2023-07-07 14:10:30] [     8.687151s]  INFO(satman): TSC loaded from uplink
[2023-07-07 14:10:30] [     8.816918s]  INFO(satman): rank: 1
[2023-07-07 14:10:30] [     8.819018s]  INFO(satman): routing table: RoutingTable { 0: 0; 1: 1 0; }

So it does seem that the master takes its clock from the external 125 MHz input and the satellite locks to the clock recovered over DRTIO.
Still, there is this 276 ns latency. Everything runs on ARTIQ v7.8173.ff97675. @sb10q any idea why that is the case?

    Just some added info:

    1. Switching to ext0_synth0_125to125 does not make a difference for this issue (see https://github.com/m-labs/artiq/issues/1946).
    2. The latency seems to be constant over reboots.
    3. Two outputs on the satellite are synchronized with each other, just like two outputs from the master.

      sb10q Ok, I see. So there is no compensation for this in gateware? If not, I guess we'll have to measure the latency and then compensate in software.

      Correct - generally the decision was made not to compensate for latency in ARTIQ because there are other user-supplied components after the ARTIQ devices that introduce unknown latency, so this has to be sorted out by the user either way.

      Adding gateware to add a different user-defined time offset to each RTIO channel may have merit, but this needs to be considered carefully (in particular with the implementation of RTIO underflow detection). Also, with the speed of the Zynq devices we could perhaps simply do such offsetting in firmware at low performance cost and in a much easier way.
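
      As a rough illustration of the software-side compensation discussed here, a minimal sketch (ttl_master and ttl_sat are hypothetical device names, and the 276 mu offset is a placeholder that has to be replaced by the value measured for the specific setup):

      import numpy as np
      from artiq.experiment import *


      class CompensatedPulses(EnvExperiment):
          def build(self):
              self.setattr_device("core")
              self.setattr_device("ttl_master")  # hypothetical: TTL output on the master
              self.setattr_device("ttl_sat")     # hypothetical: TTL output on the satellite
              # Placeholder: satellite output latency measured for this setup, in machine units.
              self.sat_latency_mu = np.int64(276)

          @kernel
          def run(self):
              self.core.reset()
              delay(1*ms)
              t = now_mu()
              # Schedule the satellite event earlier by the measured latency
              # so that both physical edges line up on the scope.
              at_mu(t - self.sat_latency_mu)
              self.ttl_sat.pulse(1*us)
              at_mu(t)
              self.ttl_master.pulse(1*us)

      The same offset could equally be folded into a small helper around the satellite channels rather than applied by hand at every event.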

        sb10q Ok, I accept that. Though instead of measuring the latency myself, can the latency also just be calculated based on the gateware implementation? Like, is there a known expression to calculate that?

        Things like your fiber length would also go into the equation, but yes it is deterministic and can be calculated (not with a simple expression though).
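
        In practice the offset can also be measured with RTIO itself rather than with a scope. A minimal sketch, assuming a spare TTL input on the master and hypothetical device names ttl_out and ttl_in; running it once with a master output and once with a satellite output on the same cable, then subtracting the two results, isolates the extra latency of the satellite path:

        from artiq.experiment import *


        class OutputToInputOffset(EnvExperiment):
            def build(self):
                self.setattr_device("core")
                self.setattr_device("ttl_out")  # hypothetical: output under test, looped back into ttl_in
                self.setattr_device("ttl_in")   # hypothetical: a TTL input on the master

            @kernel
            def run(self):
                self.core.reset()
                delay(1*ms)
                t0 = now_mu()
                with parallel:
                    self.ttl_out.pulse(1*us)
                    self.ttl_in.gate_rising(10*us)
                # timestamp_mu() returns -1 if no edge was seen within the gate.
                t_edge = self.ttl_in.timestamp_mu(now_mu())
                print("offset [mu]:", t_edge - t0)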

          5 days later

          sb10q We have measured the latency to be 279 mu for our specific case. Just wondering, do you know if (short) direct-attach SFP cables result in lower latency because of the lack of optical transceivers? I assume that using fibers of different lengths will make a negligible difference to the latency.

          No, the 10G BiDi optical transceivers are analog devices (laser intensity modulator for TX, PD->TIA->limiting amplifier for RX) and they contribute a negligible amount of latency. (Fun fact: you can hack DWDM SFPs to do low-cost FM spectroscopy of HCN by modifying their laser current driver to reduce the modulation depth and exploiting laser diode chirp. SNR is poor because of the large residual AM, but it works, at least until the telecom DFB manufacturers improve the chirp characteristics of their lasers.)