sb10q the satellite keeps rebooting after 15-20 minutes (but only when a sequence is running), and we are trying to work out why, but there are no error messages in the log. Setting the log level only changes how much information the master core log returns.

If the satellite crashes like that it is better to look at its UART log anyway.

  • jdp replied to this.

    sb10q is there any way to increase the log level on the satellite to maybe identify the cause?

    Check its normal UART log first.

    • jdp replied to this.

      sb10q here is an example of what we see: the satellite is just periodically rebooting with no error message; the master shows the aux packet errors and the link re-establishing but doesn't report why the satellite has fallen over.

      Satellite Log (Kasli just has 4x DIO SMA and 4x MCX boards)

       __  __ _ ____         ____
      |  \/  (_) ___|  ___  / ___|
      | |\/| | \___ \ / _ \| |
      | |  | | |___) | (_) | |___
      |_|  |_|_|____/ \___/ \____|
      
      MiSoC Bootloader
      Copyright (c) 2017-2021 M-Labs Limited
      
      Bootloader CRC passed
      Gateware ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
      Initializing SDRAM...
      Read leveling scan:
      Module 1:
      00000000000011111111111000000000
      Module 0:
      00000000000011111111111100000000
      Read leveling: 17+-5 17+-6 done
      SDRAM initialized
      Memory test passed
      
      Booting from flash...
      Starting firmware.
      [     0.000004s]  INFO(satman): ARTIQ satellite manager starting...
      [     0.005612s]  INFO(satman): software ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
      [     0.014072s]  INFO(satman): gateware ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
      [     0.293009s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
      [     2.245255s]  INFO(board_artiq::si5324):   ...locked
      [     2.393374s]  INFO(satman): uplink is up, switching to recovered clock
      [     2.426444s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
      [     4.169475s]  INFO(board_artiq::si5324):   ...locked
      [     6.857361s]  INFO(board_artiq::si5324::siphaser): calibration successful, lead: 80, width: 435 (349deg)
      [     7.098975s]  WARN(satman): aux packet error (routing error)
      [     7.306905s]  WARN(satman): aux packet error (routing error)
      [     7.521824s]  WARN(satman): aux packet error (routing error)
      [     7.729419s]  INFO(satman): TSC loaded from uplink
      [     7.733062s]  WARN(satman): aux packet error (routing error)
      [     7.940356s]  WARN(satman): aux packet error (routing error)
      [     8.343516s]  INFO(satman): rank: 1
      [     8.345622s]  INFO(satman): routing table: RoutingTable { 0: 0; 1: 1 0; 2: 2 0; }
      [    16.036747s]  INFO(satman): resetting RTIO
      [    48.406541s]  INFO(satman): resetting RTIO
      [    48.552867s]  INFO(satman): resetting RTIO
      [   110.166787s]  INFO(satman): resetting RTIO
      [   110.320345s]  INFO(satman): resetting RTIO
      [   383.968813s]  INFO(satman): resetting RTIO
      [   384.094257s]  INFO(satman): resetting RTIO
      
       __  __ _ ____         ____
      |  \/  (_) ___|  ___  / ___|
      | |\/| | \___ \ / _ \| |
      | |  | | |___) | (_) | |___
      |_|  |_|_|____/ \___/ \____|
      
      MiSoC Bootloader
      Copyright (c) 2017-2021 M-Labs Limited
      
      Bootloader CRC passed
      Gateware ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
      Initializing SDRAM...
      Read leveling scan:
      Module 1:
      00000000000011111111111000000000
      Module 0:
      00000000000011111111111100000000
      Read leveling: 17+-5 17+-6 done
      SDRAM initialized
      Memory test passed
      
      Booting from flash...
      Starting firmware.
      [     0.000004s]  INFO(satman): ARTIQ satellite manager starting...
      [     0.005612s]  INFO(satman): software ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
      [     0.014072s]  INFO(satman): gateware ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
      [     0.293009s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
      [     2.253302s]  INFO(board_artiq::si5324):   ...locked
      [     3.051533s]  INFO(satman): uplink is up, switching to recovered clock
      [     3.084603s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
      [     4.827635s]  INFO(board_artiq::si5324):   ...locked
      [     8.500670s]  INFO(board_artiq::si5324::siphaser): calibration successful, lead: 259, width: 435 (349deg)
      [     8.675852s]  WARN(satman): aux packet error (routing error)
      [     8.883582s]  WARN(satman): aux packet error (routing error)
      [     9.098713s]  WARN(satman): aux packet error (routing error)
      [     9.306158s]  INFO(satman): TSC loaded from uplink
      [     9.309789s]  WARN(satman): aux packet error (routing error)
      [     9.517295s]  WARN(satman): aux packet error (routing error)
      [     9.932076s]  INFO(satman): rank: 1
      [     9.934181s]  INFO(satman): routing table: RoutingTable { 0: 0; 1: 1 0; 2: 2 0; }
      [    26.772851s]  INFO(satman): resetting RTIO
      [    86.098893s]  INFO(satman): resetting RTIO
      [    86.241249s]  INFO(satman): resetting RTIO
      
       __  __ _ ____         ____
      |  \/  (_) ___|  ___  / ___|
      | |\/| | \___ \ / _ \| |
      | |  | | |___) | (_) | |___
      |_|  |_|_|____/ \___/ \____|
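
Since the satellite prints nothing before it goes down, one thing a captured UART log can still give is how long each boot lasted. The following is a minimal sketch, not part of ARTIQ: the function name `uptimes_before_reboot` is hypothetical, and it assumes that every boot prints the `MiSoC Bootloader` line and that firmware lines start with a `[   12.345678s]` timestamp as in the log above. The value reported is the last logged timestamp before the next banner, so it is only a lower bound on the real uptime.

```python
import re

# Firmware log lines look like "[   384.094257s]  INFO(satman): resetting RTIO".
TIMESTAMP = re.compile(r"^\s*\[\s*(\d+\.\d+)s\]")

# Assumption: the bootloader prints this line at the start of every boot.
BOOT_MARKER = "MiSoC Bootloader"


def uptimes_before_reboot(lines):
    """Return the last firmware timestamp (in seconds) logged in each boot
    that was followed by another boot banner.

    This is a lower bound on the uptime before each reboot, because the
    satellite may have kept running silently after its last log line.
    """
    uptimes = []
    last_ts = None
    for line in lines:
        if BOOT_MARKER in line:
            # A new boot begins; record where the previous one stopped logging.
            if last_ts is not None:
                uptimes.append(last_ts)
            last_ts = None
            continue
        match = TIMESTAMP.match(line)
        if match:
            last_ts = float(match.group(1))
    return uptimes
```

To use it on a saved capture (the file name here is hypothetical), something like `uptimes_before_reboot(open("satellite_uart.log"))` would return one value per observed reboot, which makes it easier to see whether the reboots line up with particular `resetting RTIO` events.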

      Master Log (Kasli just has 4x DIO SMA and 4x MCX boards)

      [104689.643272s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
      [104689.651103s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
      [104689.658938s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
      [104689.666773s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
      [104689.674608s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
      [104689.682443s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
      [104689.690281s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
      [104689.698390s] ERROR(runtime::moninj::remote_moninj): aux packet error (aux packet error)
      [104689.706144s] ERROR(runtime::moninj::remote_moninj): aux packet error (aux packet error)
      [104689.820219s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging  
      [104689.915386s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104690.324361s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104690.733413s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104691.142388s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104691.551559s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104691.960487s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104692.369576s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104692.778520s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104693.187559s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104693.596597s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104694.005673s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104694.414636s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104694.823560s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104695.232533s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104695.641696s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104695.849079s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] remote replied after 15 packets
      [104695.855886s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104696.067695s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104696.275156s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
      [104696.461447s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link initialization completed
      [104696.468527s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] destination is up
      [104696.474657s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] buffer space is 128
      [104696.681648s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] error(s) found (0x03):   
      [104696.687544s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] received packet of an unknown type
      [104696.695727s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] received truncated packet
      [105708.648765s]  INFO(runtime::session): new connection from 192.168.1.38:58273
      [105708.724260s]  INFO(runtime::kern_hwreq): resetting RTIO
      [105709.436403s]  INFO(runtime::session): no connection, starting idle kernel
      [105709.442471s]  INFO(runtime::session): no idle kernel found
      [155065.367958s]  INFO(runtime::moninj): new connection from 192.168.1.38:49884
      [155108.665804s]  INFO(runtime::session): new connection from 192.168.1.38:49890
      [155108.730199s]  INFO(runtime::kern_hwreq): resetting RTIO
      [155108.875240s]  INFO(runtime::kern_hwreq): resetting RTIO
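
The master side can be summarised in a similar way. This is a rough sketch under stated assumptions, not an ARTIQ tool: the function name `link_outages` is hypothetical, and it assumes an outage starts at the first `link went down` error and ends at the next `destination is up` message, matching the wording in the master log above. It pairs those two timestamps so each loss of the satellite shows up as a single (start, end) entry instead of dozens of repeated moninj errors.

```python
import re

# Master log lines look like
# "[104689.643272s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)".
TIMESTAMP = re.compile(r"^\s*\[\s*(\d+\.\d+)s\]")


def link_outages(lines):
    """Return a list of (down_time, up_time) tuples in seconds of master uptime.

    Assumption: an outage starts at the first "link went down" error and
    ends at the next "destination is up" message.
    """
    outages = []
    down_at = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        ts = float(match.group(1))
        if down_at is None and "link went down" in line:
            # First error of a new outage; later repeats are ignored.
            down_at = ts
        elif down_at is not None and "destination is up" in line:
            outages.append((down_at, ts))
            down_at = None
    return outages
```

Because the master and satellite timestamps each count from their own boot, the two logs cannot be aligned directly; the outage start times from the master only tell you when, in master uptime, the satellite disappeared.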

      Is your power supply stable?

      • jdp replied to this.

        sb10q Yes, I swapped the master and satellite power supplies and see the same behaviour.

        That doesn't mean anything. The power consumption can be different on both ends. What power supply are you using?

        • jdp replied to this.

          sb10q Fair comment. We are using two 12V, 5A adaptors (XP Power AFM60US12C2) plugged into the front and rear of the Kasli, but I will swap over to an external benchtop PSU with more current in case that is the issue.

          sb10q Tried an external benchtop PSU - current draw was only a few A at 12V, no change in behaviour

          2 years later

          @jdp did you ever resolve this issue? We are occasionally getting the "aux packet error (timeout)" error, accompanied by all DAC and TTL channels going to 0V.

          • jdp replied to this.
            10 months later

            rowanq apologies, I missed the notification on this. We switched to copper SFP connectors instead of fiber cables and this seemed to solve the problem. I can dig out the part number if you would find it useful.