I'm setting up a DRTIO master and satellite. Both are Kasli v1.1.

[nix-shell:~/artiq-test]$ cat sinara-systems/brittonlab-kasli-master.json 
 
{
    "target": "kasli",
    "min_artiq_version": "6.0",
    "variant": "brittonlab-kasli-master",
    "hw_rev": "v1.1",
    "base": "master",
    "core_addr": "192.168.1.40",
    "peripherals": [
        {
            "type": "dio",
            "ports": [0],
            "bank_direction_low": "input",
            "bank_direction_high": "output"
        }
   ]
}

[nix-shell:~/artiq-test]$ cat sinara-systems/brittonlab-kasli-satellite.json 
{
    "target": "kasli",
    "min_artiq_version": "6.0",
    "variant": "brittonlab-kasli-satellite",
    "hw_rev": "v1.1",
    "base": "satellite",
    "peripherals": [
        {
            "type": "dio",
            "ports": [0],
            "bank_direction_low": "input",
            "bank_direction_high": "output"
        }
   ]
}
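
Before building, it can help to sanity-check that the two descriptions really form a master/satellite pair. The following is only a minimal sketch: `check_drtio_pair` is a hypothetical helper, not part of ARTIQ, and the checks it performs are assumptions about what is worth verifying rather than rules enforced by the build system. The dictionaries are trimmed copies of the two JSON files above.

```python
import json

# Trimmed copies of the two descriptions above (peripherals omitted).
MASTER = json.loads("""{
    "target": "kasli", "variant": "brittonlab-kasli-master",
    "hw_rev": "v1.1", "base": "master", "core_addr": "192.168.1.40"
}""")
SATELLITE = json.loads("""{
    "target": "kasli", "variant": "brittonlab-kasli-satellite",
    "hw_rev": "v1.1", "base": "satellite"
}""")


def check_drtio_pair(master, satellite):
    """Return a list of problems found; an empty list means none.

    Hypothetical helper: the checks are assumptions, not ARTIQ rules.
    """
    problems = []
    if master.get("base") != "master":
        problems.append("master description should have base 'master'")
    if satellite.get("base") != "satellite":
        problems.append("satellite description should have base 'satellite'")
    if "core_addr" not in master:
        problems.append("master description has no core_addr")
    if master.get("variant") == satellite.get("variant"):
        problems.append("both descriptions share one variant name")
    return problems


if __name__ == "__main__":
    print(check_drtio_pair(MASTER, SATELLITE))
```

Distinct variant names matter here because the later `artiq_flash -V ...` calls select the build by variant name.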


[nix-shell:~/artiq-test]$  python -m artiq.gateware.targets.kasli_generic sinara-systems/brittonlab-kasli-master.json

[nix-shell:~/artiq-test]$  python -m artiq.gateware.targets.kasli_generic sinara-systems/brittonlab-kasli-satellite.json

# Confirm that both Kasli #1 and Kasli #2 can be properly flashed with a master variant of the gateware/firmware. This is a check that the hardware is working properly. 

# connected USB to Kasli #1, Ethernet to SFP0 (192.168.1.40)
[nix-shell:~/artiq-test]$ artiq_flash --srcbuild -d artiq_kasli -V brittonlab-kasli-master
# confirmed that I can ping Kasli #1

# connected USB to Kasli #2, Ethernet to SFP0 (192.168.1.41)
[nix-shell:~/artiq-test]$ artiq_flash --srcbuild -d artiq_kasli -V brittonlab-kasli-master
# confirmed that I can ping Kasli #2
# GOOD: Both Kasli are working properly in master role

# keep USB connected to Kasli #2
[nix-shell:~/artiq-test]$ artiq_flash --srcbuild -d artiq_kasli -V brittonlab-kasli-satellite
# cycle power on Kasli #2

# connect Kasli #1 SFP1 (downstream) to Kasli #2 SFP0 (upstream)
# configure routing table
[nix-shell:~/artiq-test]$ artiq_route rt.bin init
[nix-shell:~/artiq-test]$ artiq_route rt.bin set 0 0
[nix-shell:~/artiq-test]$ artiq_route rt.bin set 1 1 0
[nix-shell:~/artiq-test]$ artiq_route rt.bin show
  0:   0
  1:   1   0
[nix-shell:~/artiq-test]$ artiq_coremgmt -D 192.168.1.40 config write -f routing_table rt.bin 
[nix-shell:~/artiq-test]$ artiq_coremgmt -D 192.168.1.40 reboot
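
After the reboot, the master prints the routing table it actually loaded (the `RoutingTable { ... }` line in the log). A small sketch for turning that log line into a Python dict, so it can be compared with what was written using `artiq_route`; `parse_routing_table` is a hypothetical helper and the format is inferred only from the log lines in this thread.

```python
import re


def parse_routing_table(line):
    """Parse 'RoutingTable { 0: 0; 1: 1 0; }' into {destination: [hops]}."""
    match = re.search(r"RoutingTable \{(.*)\}", line)
    if match is None:
        raise ValueError("no routing table found in line")
    table = {}
    for entry in match.group(1).split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip the empty piece after the last ';'
        dest, hops = entry.split(":")
        table[int(dest)] = [int(hop) for hop in hops.split()]
    return table


if __name__ == "__main__":
    line = ("INFO(board_artiq::drtio_routing): routing table: "
            "RoutingTable { 0: 0; 1: 1 0; }")
    # Expected to match what 'artiq_route rt.bin show' printed above.
    print(parse_routing_table(line))  # {0: [0], 1: [1, 0]}
```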

Look at Kasli #1 master log.

[nix-shell:~/artiq-test]$ artiq_coremgmt -D 192.168.1.40 log
[     0.000009s]  INFO(runtime): ARTIQ runtime starting...
[     0.003933s]  INFO(runtime): software ident 7.7614.011f3bdb.beta;brittonlab-kasli-master
[     0.012046s]  INFO(runtime): gateware ident 7.7614.011f3bdb.beta;brittonlab-kasli-master
[     0.020186s]  INFO(runtime): log level set to INFO by default
[     0.025906s]  INFO(runtime): UART log level set to INFO by default
[     0.032290s]  INFO(runtime::rtio_clocking): using internal RTIO clock
[     0.309725s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
[     3.463786s]  INFO(board_artiq::si5324):   ...locked
[     3.494739s]  INFO(runtime): network addresses: MAC=80-1f-12-47-2c-7f IPv4=192.168.1.40 IPv6-LL=fe80::821f:12ff:fe47:2c7f IPv6=no configured address
[     3.512116s]  INFO(board_artiq::drtio_routing): routing table: RoutingTable { 0: 0; 1: 1 0; }
[     3.524637s]  INFO(runtime::mgmt): management interface active
[     3.538953s]  INFO(runtime::session): accepting network sessions
[     3.554368s]  INFO(runtime::session): running startup kernel
[     3.558864s]  INFO(runtime::session): no startup kernel found
[     3.564653s]  INFO(runtime::session): no connection, starting idle kernel
[     3.571506s]  INFO(runtime::session): no idle kernel found
[     3.576888s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
[     5.554406s]  INFO(runtime::mgmt): new connection from 192.168.1.68:44310
[     7.598641s]  INFO(runtime::mgmt): new connection from 192.168.1.68:44312
[    21.177270s]  INFO(runtime::mgmt): new connection from 192.168.1.68:44314
[    23.768125s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] ping failed
[    23.773096s]  INFO(runtime::rtio_mgt::drtio): [DEST#0] destination is up
[    23.979546s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
[    28.802686s]  INFO(runtime::mgmt): new connection from 192.168.1.68:44316

Look at Kasli #2 (satellite) UART log.

# look at UART on Kasli #2 (satellite) as it boots
[nix-shell:~/artiq-test]$ artiq_flash -t kasli -V brittonlab-kasli-satellite start; flterm /dev/ttyUSB2
Open On-Chip Debugger 0.10.0-snapshot (2021-04-08-04:15)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
none separate
adapter speed: 25000 kHz
Info : ftdi: if you experience problems at higher adapter clocks, try the command "ftdi_tdo_sample_edge falling"
Info : clock speed 25000 kHz
Info : JTAG tap: xc7.tap tap/device found: 0x13631093 (mfg: 0x049 (Xilinx), part: 0x3631, ver: 0x1)
Info : gdb server disabled
TEMP 63.17 C
VCCINT 0.991 V
VCCAUX 1.773 V
VCCBRAM 0.995 V
VPVN 0.000 V
VREFP 0.000 V
VREFN 0.000 V
VCCPINT 0.000 V
VCCPAUX 0.000 V
VCCODDR 0.000 V
 
__  __ _ ____         ____ 
|  \/  (_) ___|  ___  / ___|
| |\/| | \___ \ / _ \| |    
| |  | | |___) | (_) | |___ 
|_|  |_|_|____/ \___/ \____|
 
MiSoC Bootloader
Copyright (c) 2017-2021 M-Labs Limited
 
Bootloader CRC passed
Gateware ident 7.7614.011f3bdb.beta;brittonlab-kasli-satellite
Initializing SDRAM...
Read leveling scan:
Module 1:
00000000000011111111111000000000
Module 0:
00000000000011111111110000000000
Read leveling: 17+-5 16+-5 done
SDRAM initialized
Memory test passed
 
Booting from flash...
Starting firmware.
[     0.000004s]  INFO(satman): ARTIQ satellite manager starting...
[     0.005612s]  INFO(satman): software ident 7.7614.011f3bdb.beta;brittonlab-kasli-satellite
[     0.013897s]  INFO(satman): gateware ident 7.7614.011f3bdb.beta;brittonlab-kasli-satellite
[     0.292661s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
[     2.920849s]  INFO(board_artiq::si5324):   ...locked

The problem is that the ping fails repeatedly in the master's log. I have shown only the first failure.
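
To see at a glance how often the ping fails, the master log can be summarized per link. This is a rough sketch, assuming the log has been saved as text (for example from `artiq_coremgmt log`); `summarize_links` and `link_is_up` are hypothetical helpers, and the message strings are taken from the logs in this thread rather than from any documented format.

```python
import re
from collections import Counter

# Message prefixes as they appear in the master logs in this thread.
EVENTS = {
    "ping failed": "ping_failed",
    "link RX became up": "rx_up",
    "remote replied": "replied",
    "link initialization completed": "initialized",
}


def summarize_links(log_text):
    """Count DRTIO link events per link number."""
    summary = {}
    for line in log_text.splitlines():
        match = re.search(r"\[LINK#(\d+)\] (.*)", line)
        if match is None:
            continue
        counter = summary.setdefault(int(match.group(1)), Counter())
        for prefix, name in EVENTS.items():
            if match.group(2).startswith(prefix):
                counter[name] += 1
    return summary


def link_is_up(summary, link):
    """True if the link ever completed initialization."""
    return summary.get(link, Counter())["initialized"] > 0


SAMPLE = """\
[     3.576888s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
[    23.768125s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] ping failed
[    23.979546s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
"""

if __name__ == "__main__":
    result = summarize_links(SAMPLE)
    print(dict(result[0]), link_is_up(result, 0))
```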

I will order 10 Gbps SFP transceivers and update the Sinara SFP page based on my observations. What SFP part number does M-Labs recommend? -Joe

    15 days later

    jbqubit I am having exactly this problem using ARTIQ 6 gateware built on Linux. The satellite keeps rebooting, which drops the connection with the master and causes an RTIO unreachable error. If you use gtkterm to monitor the Kasli #2 UART, you should see it restart as it prints the bootloader banner.

    I changed to 10 Gbps transceivers following a recommendation from the group at Oxford, but the same error occurs, so I think it might be a gateware error if you also reproduce the same fault. It seems unlikely that the satellite should reboot in response to losing the DRTIO link.

    Let me know if you had more success than me!
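
One way to confirm whether the satellite is rebooting is to capture its UART output to a file and count how many times the firmware start-up banner appears. A minimal sketch, assuming the capture is plain text; `count_boots` is a hypothetical helper and the marker string is simply the first satman line seen in the logs in this thread.

```python
# First line printed by the satellite firmware in the logs in this thread.
BOOT_MARKER = "ARTIQ satellite manager starting..."


def count_boots(uart_text):
    """Count how many times the satellite firmware started in a capture."""
    return sum(1 for line in uart_text.splitlines() if BOOT_MARKER in line)


if __name__ == "__main__":
    capture = (
        "[     0.000004s]  INFO(satman): ARTIQ satellite manager starting...\n"
        "[     2.920849s]  INFO(board_artiq::si5324):   ...locked\n"
        "[     0.000004s]  INFO(satman): ARTIQ satellite manager starting...\n"
    )
    # More than one start within a single capture suggests a reboot.
    print(count_boots(capture))  # 2
```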

      sb10q I replicated @jbqubit's issue with 10 Gbps transceivers and a different environment on the same machine. jdp I did not observe the satellite rebooting, although it is not returning any pings to the master at all, so there is no DRTIO link to lose.


        For reference: we successfully use FS part numbers 36351 and 36353 (as recommended on another thread) with fibers such as part no. 40442. Gateware/firmware is now built against ARTIQ v7.7636.ea1dd2da.beta for Kasli v2.0.1 (master) and Kasli v1.1 (satellite).


          rgresia Thanks for the feedback, sorry for hijacking the thread with a different problem!

          airwoodix Thanks - I am going to try reflashing against this version to see what happens, but we are using identical hardware

          airwoodix / rgresia /@sb10q I just rebuilt my master/satellite on 7.7637.522c2f59 and I replicate your error - now the satellite has nothing in the log, and I can't get the master to run anything on the satellite (master log below).

          If I run with release 6 gateware on the master and 7.7637.522c2f59 on the satellite, the system works (albeit with our recurring problem of the satellite randomly rebooting), which at least shows that the fiber connection is not the limiting problem with master/satellite builds. I will attempt to roll back to 7.7636.ea1dd2da and try again, and will report my findings later.

          [     0.000009s]  INFO(runtime): ARTIQ runtime starting...
          [     0.003934s]  INFO(runtime): software ident 7.7637.522c2f59.beta;Kasli_Earth_artiq6_Master
          [     0.012221s]  INFO(runtime): gateware ident 7.7637.522c2f59.beta;Kasli_Earth_artiq6_Master
          [     0.020513s]  INFO(runtime): log level set to INFO by default
          [     0.026238s]  INFO(runtime): UART log level set to INFO by default
          [     0.032613s]  INFO(runtime::rtio_clocking): using internal RTIO clock (by default)
          [     0.311207s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
          [     7.857771s]  INFO(board_artiq::si5324):   ...locked
          [     7.888651s]  INFO(runtime): network addresses: MAC=04-91-62-c6-ea-ea IPv4=192.168.1.71 IPv6-LL=fe80::691:62ff:fec6:eaea IPv6=no configured address
          [     7.903694s]  WARN(board_artiq::drtio_routing): could not read routing table from configuration, using default
          [     7.912449s]  INFO(board_artiq::drtio_routing): routing table: RoutingTable { 0: 0; 1: 1 0; 2: 2 0; }
          [     7.926820s]  INFO(runtime::mgmt): management interface active
          [     7.941132s]  INFO(runtime::session): accepting network sessions
          [     7.956544s]  INFO(runtime::session): running startup kernel
          [     7.961015s]  INFO(runtime::session): no startup kernel found
          [     7.967067s]  INFO(runtime::session): no connection, starting idle kernel
          [     7.973659s]  INFO(runtime::session): no idle kernel found
          [     7.979061s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
          [    27.733913s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] ping failed
          [    27.738815s]  INFO(runtime::rtio_mgt::drtio): [DEST#0] destination is up
          [    27.945405s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
          [    32.853484s]  INFO(runtime::mgmt): new connection from 192.168.1.54:58317
          [    48.101487s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] ping failed
          [    48.306943s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
          [    68.462976s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] ping failed
          [    68.668391s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
          [   287.165685s]  INFO(runtime::session): new connection from 192.168.1.38:63671
          [   287.242330s]  INFO(runtime::kern_hwreq): resetting RTIO
          [   287.352925s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)
          [   287.560424s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] reset failed, aux packet error (timeout)
          [   287.666309s]  INFO(runtime::session): no connection, starting idle kernel
          [   287.672334s]  INFO(runtime::session): no idle kernel found
          [   287.970375s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)
          [   288.379421s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)
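
Since several master/satellite version combinations are being tried, it may be useful to pull the ident strings out of both logs and compare them mechanically. This is only a sketch: `parse_ident` and `same_build` are hypothetical helpers, and the `version;variant` layout is inferred from the ident lines quoted in this thread, not from a documented format.

```python
from typing import NamedTuple


class Ident(NamedTuple):
    major: int    # e.g. 7
    build: str    # everything after the major version, e.g. "7637.522c2f59.beta"
    variant: str  # e.g. "Kasli_Earth_artiq6_Master"


def parse_ident(ident):
    """Split an ident such as '7.7637.522c2f59.beta;Some_Variant'."""
    version, _, variant = ident.partition(";")
    major, _, build = version.partition(".")
    return Ident(int(major), build, variant)


def same_build(a, b):
    """True if two idents come from the same version (variant ignored)."""
    return (a.major, a.build) == (b.major, b.build)


if __name__ == "__main__":
    master = parse_ident("7.7637.522c2f59.beta;Kasli_Earth_artiq6_Master")
    satellite = parse_ident("6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite")
    print(master, satellite, same_build(master, satellite))
```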


          @sb10q @airwoodix we have been playing around and have hit a roadblock.

          1) Prior to today, my gateware was built on 6.7582.c2248278 for both master and satellite. I flashed it back in March and things worked (DDS, TTL and analog), aside from the reboots every 20 minutes. The satellite log always reported a boot like the one below:

          __  __ _ ____         ____
          |  \/  (_) ___|  ___  / ___|
          | |\/| | \___ \ / _ \| |
          | |  | | |___) | (_) | |___
          |_|  |_|_|____/ \___/ \____|
          
          MiSoC Bootloader
          Copyright (c) 2017-2021 M-Labs Limited
          
          Bootloader CRC passed
          Gateware ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
          Initializing SDRAM...
          Read leveling scan:
          Module 1:
          00000000000011111111111000000000
          Module 0:
          00000000000011111111111100000000
          Read leveling: 17+-5 17+-6 done
          SDRAM initialized
          Memory test passed
          
          Booting from flash...
          Starting firmware.
          [     0.000004s]  INFO(satman): ARTIQ satellite manager starting...
          [     0.005612s]  INFO(satman): software ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
          [     0.014072s]  INFO(satman): gateware ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
          [     0.293009s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
          [     2.245255s]  INFO(board_artiq::si5324):   ...locked
          [     2.393374s]  INFO(satman): uplink is up, switching to recovered clock
          [     2.426444s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
          [     4.169475s]  INFO(board_artiq::si5324):   ...locked
          [     6.857361s]  INFO(board_artiq::si5324::siphaser): calibration successful, lead: 80, width: 435 (349deg)
          [     7.098975s]  WARN(satman): aux packet error (routing error)
          [     7.306905s]  WARN(satman): aux packet error (routing error)
          [     7.521824s]  WARN(satman): aux packet error (routing error)
          [     7.729419s]  INFO(satman): TSC loaded from uplink
          [     7.733062s]  WARN(satman): aux packet error (routing error)
          [     7.940356s]  WARN(satman): aux packet error (routing error)
          [     8.343516s]  INFO(satman): rank: 1
          [     8.345622s]  INFO(satman): routing table: RoutingTable { 0: 0; 1: 1 0; 2: 2 0; }
          [    16.036747s]  INFO(satman): resetting RTIO

          2) I swapped satellite to 7.7637.522c2f59 using latest artiq7 beta, and then it never gets past

          [     2.426444s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
          [     4.169475s]  INFO(board_artiq::si5324):   ...locked

          However, having first updated just the satellite and left the master on 6.7582.c2248278 this operated exactly as before.

          3) This morning I tried the master on 7.7637.522c2f59 as well and again see the same behaviour, except that now, when calling the Zotino's self.z0.init(), the AO card errors.

          4) I attempted to roll back to 6.7582.c2248278 on both satellite and master, using the same firmware compiled back in March and transferred using the same virtual environment. I now never see the satellite boot past

          [     2.426444s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
          [     4.169475s]  INFO(board_artiq::si5324):   ...locked

          and the Zotino init error, which was never an issue before, is now reproducible.
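
To pin down where the satellite boot stalls, the UART capture can be checked against the milestones seen in the working boot log. A rough sketch under stated assumptions: `boot_progress` is a hypothetical helper, and the milestone strings and their order are taken from the successful satellite logs in this thread, not from the firmware source.

```python
# Milestones in the order they appear in the working satellite logs above.
MILESTONES = [
    ("si5324_locked", "...locked"),
    ("uplink_up", "uplink is up, switching to recovered clock"),
    ("siphaser_calibrated", "calibration successful"),
    ("tsc_loaded", "TSC loaded from uplink"),
    ("rank_assigned", "rank:"),
    ("routing_table", "routing table:"),
]


def boot_progress(uart_text):
    """Return (milestones reached, first milestone missing or None)."""
    reached = [name for name, needle in MILESTONES if needle in uart_text]
    missing = [name for name, _ in MILESTONES if name not in reached]
    return reached, (missing[0] if missing else None)


STALLED = """\
[     0.000004s]  INFO(satman): ARTIQ satellite manager starting...
[     0.292661s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
[     2.920849s]  INFO(board_artiq::si5324):   ...locked
"""

if __name__ == "__main__":
    # A boot that stops after the first Si5324 lock never reports the uplink.
    print(boot_progress(STALLED))  # (['si5324_locked'], 'uplink_up')
```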

          If I flash the master with gateware compiled against an identical .json but now set to "standalone", then the code runs without issue, so the Zotino is functional and there is no hardware issue or problem with the ARTIQ experiment file. I also used a third Kasli (all v1.1) and any pair behaves the same.

          At this point I am stuck trying to get back to where I started this morning --> Is there a reason why flashing the same firmware created back in March is giving me different behaviour now? Does the nix environment inherit new bootloader files etc that might have changed since I did this back in March which are not tracked by simply changing the artiq environment and then update when launching the nix shell?

          For reference, my nix environment script is simply nix-shell -I articSrc=~/artiqDev/artig6 and ~/artiqDev/nix-scripts/artiq-fast/shell-dev.nix

          Thanks in advance for the help

            jdp Does the nix environment inherit new bootloader files etc that might have changed since I did this back in March which are not tracked by simply changing the artiq environment and then update when launching the nix shell?

            No, it doesn't.

            6 days later

            jdp @jbqubit and I have had success using this cable, but unfortunately I am unsure why. After moving cables around and several restarts while trying to set up the rest of the system, it started working and we have not had any issues since. No software changes were made.

              a month later

              We now have this working in the lab using the following optical transceivers.

              • FS P/N: SFP-10G-BX #36351 :: Cisco SFP-10G-BXU Compatible 10GBASE-BX10-U BiDi SFP+ 1270nm-TX/1330nm-RX 10km DOM LC SMF Transceiver Module

              • FS P/N: SFP-10G-BX #36353 :: Cisco SFP-10G-BXD Compatible 10GBASE-BX10-D BiDi SFP+ 1330nm-TX/1270nm-RX 10km DOM LC SMF Transceiver Module
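
Note that the two part numbers are a complementary BiDi pair: each module transmits on the wavelength the other receives, so one of each is needed per link. A tiny sketch of that check, using only the wavelengths listed above; `BiDiModule` and `is_matched_pair` are hypothetical names for illustration.

```python
from typing import NamedTuple


class BiDiModule(NamedTuple):
    part: str
    tx_nm: int  # transmit wavelength in nm
    rx_nm: int  # receive wavelength in nm


# Wavelengths as listed for the two FS part numbers above.
MODULE_BXU = BiDiModule("FS #36351", tx_nm=1270, rx_nm=1330)
MODULE_BXD = BiDiModule("FS #36353", tx_nm=1330, rx_nm=1270)


def is_matched_pair(a, b):
    """True if each module transmits on the wavelength the other receives."""
    return a.tx_nm == b.rx_nm and a.rx_nm == b.tx_nm


if __name__ == "__main__":
    print(is_matched_pair(MODULE_BXU, MODULE_BXD))  # True
    print(is_matched_pair(MODULE_BXU, MODULE_BXU))  # False
```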

              Here's what the core device logs look like when successful.

              $ artiq_coremgmt log
              [     0.000009s]  INFO(runtime): ARTIQ runtime starting...
              [     0.003932s]  INFO(runtime): software ident 7.unknown.beta;brittonlab-laserlab-left
              [     0.011609s]  INFO(runtime): gateware ident 7.unknown.beta;brittonlab-laserlab-left
              [     0.019304s]  INFO(runtime): log level set to INFO by default
              [     0.025024s]  INFO(runtime): UART log level set to INFO by default
              [     0.031410s]  INFO(runtime::rtio_clocking): using internal RTIO clock
              [     0.308853s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
              [     3.454839s]  INFO(board_artiq::si5324):   ...locked
              [     3.485744s]  INFO(runtime): network addresses: MAC=80-1f-12-47-2c-7f IPv4=192.168.1.40 IPv6-LL=fe80::821f:12ff:fe47:2c7f IPv6=no configured address
              [     3.500881s]  WARN(board_artiq::drtio_routing): could not read routing table from configuration, using default
              [     3.509634s]  INFO(board_artiq::drtio_routing): routing table: RoutingTable { 0: 0; 1: 1 0; 2: 2 0; }
              [     3.523794s]  INFO(runtime::mgmt): management interface active
              [     3.538105s]  INFO(runtime::session): accepting network sessions
              [     3.553560s]  INFO(runtime::session): running startup kernel
              [     3.558044s]  INFO(runtime::session): no startup kernel found
              [     3.563839s]  INFO(runtime::session): no connection, starting idle kernel
              [     3.570691s]  INFO(runtime::session): no idle kernel found
              [     3.576089s]  INFO(runtime::rtio_mgt::drtio): [DEST#0] destination is up
              [     3.782417s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging
              [     7.393209s]  INFO(runtime::mgmt): new connection from 192.168.1.81:58416
              [     9.076542s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] remote replied after 26 packets
              [     9.163652s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link initialization completed
              [     9.170398s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] destination is up
              [     9.176845s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] buffer space is 128
              [ 21167.820131s]  INFO(runtime::mgmt): new connection from 192.168.1.81:58908
              $ artiq_flash -t kasli start; flterm /dev/ttyUSB2
              Open On-Chip Debugger 0.10.0-snapshot (2021-07-12-19:45)
              Licensed under GNU GPL v2
              For bug reports, read
              	http://openocd.org/doc/doxygen/bugs.html
              none separate
              adapter speed: 25000 kHz
              Info : ftdi: if you experience problems at higher adapter clocks, try the command "ftdi_tdo_sample_edge falling"
              Info : clock speed 25000 kHz
              Info : JTAG tap: xc7.tap tap/device found: 0x13631093 (mfg: 0x049 (Xilinx), part: 0x3631, ver: 0x1)
              Info : gdb server disabled
              TEMP 88.47 C
              VCCINT 0.995 V
              VCCAUX 1.779 V
              VCCBRAM 1.000 V
              VPVN 0.000 V
              VREFP 0.000 V
              VREFN 0.000 V
              VCCPINT 0.000 V
              VCCPAUX 0.000 V
              VCCODDR 0.000 V
              
               __  __ _ ____         ____ 
              |  \/  (_) ___|  ___  / ___|
              | |\/| | \___ \ / _ \| |    
              | |  | | |___) | (_) | |___ 
              |_|  |_|_|____/ \___/ \____|
              
              MiSoC Bootloader
              Copyright (c) 2017-2021 M-Labs Limited
              
              Bootloader CRC passed
              Gateware ident 7.unknown.beta;brittonlab-kasli-satellite
              Initializing SDRAM...
              Read leveling scan:
              Module 1:
              00000000000111111111111000000000
              Module 0:
              00000000000111111111111000000000
              Read leveling: 16+-6 16+-6 done
              SDRAM initialized
              Memory test passed
              
              Booting from flash...
              Starting firmware.
              [     0.000004s]  INFO(satman): ARTIQ satellite manager starting...
              [     0.005611s]  INFO(satman): software ident 7.unknown.beta;brittonlab-kasli-satellite
              [     0.013374s]  INFO(satman): gateware ident 7.unknown.beta;brittonlab-kasli-satellite
              [     0.291616s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
              [     2.251909s]  INFO(board_artiq::si5324):   ...locked
              [     2.580583s]  INFO(satman): uplink is up, switching to recovered clock
              [     2.613653s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
              [     4.356685s]  INFO(board_artiq::si5324):   ...locked
              [     8.470009s]  INFO(board_artiq::si5324::siphaser): calibration successful, lead: 337, width: 437 (351deg)
              [     8.935267s]  INFO(satman): TSC loaded from uplink
              [     9.037914s]  INFO(satman): rank: 1
              [     9.040020s]  INFO(satman): routing table: RoutingTable { 0: 0; 1: 1 0; 2: 2 0; }