Hi,

I am in the process of setting up our first ARTIQ rack and I would like to be able to read core log messages (e.g. [ 0.000009s] INFO(runtime): ARTIQ runtime starting...) from a central location.

Currently I am connected to the front-panel USB/virtual COM port and can read the log messages just fine. In the long run, however, I want to use something like Graylog.

So far I have discovered artiq_coremgmt log, but I am not sure it is the right tool for continuous logging, since every call seems to open a new connection:

watch artiq_coremgmt log
Every 2.0s: artiq_coremgmt log
[ 20323.039156s]  INFO(runtime::mgmt): new connection from 192.168.0.180:34248
[ 20325.254006s]  INFO(runtime::mgmt): new connection from 192.168.0.180:34250
[ 20327.468182s]  INFO(runtime::mgmt): new connection from 192.168.0.180:34252
[ 20329.691528s]  INFO(runtime::mgmt): new connection from 192.168.0.180:34254

...and it is probably not smart to do that.

So my question is: how could I send the core log messages of a chosen level to a network location in the syslog "format"?

Thanks!
Jonas

There's aqctl_corelog, which is set up by the example device databases and does continuous logging.
Currently there's no support for the syslog format; you'd have to send the output through a script or modify the tool.
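
As a rough illustration of the "send the output through a script" route, here is a minimal sketch that parses core log lines in the format shown above and forwards those at or above a chosen level to a syslog server using Python's standard logging.handlers.SysLogHandler. The host, port, logger name and "artiq-core" tag are placeholders, and the TRACE/DEBUG level names are an assumption (only INFO, WARN and ERROR appear in this thread). It assumes the core log lines arrive on stdin from some source, for example a UART capture.

```python
import logging
import logging.handlers
import re
import sys

# Matches lines such as:
# "[ 20323.039156s]  INFO(runtime::mgmt): new connection from ..."
LINE_RE = re.compile(
    r"^\s*\[\s*(?P<ts>\d+\.\d+)s\]\s+(?P<level>[A-Z]+)"
    r"\((?P<target>[^)]*)\):\s?(?P<msg>.*)$"
)

# Core log level name -> Python logging level.
# TRACE and DEBUG are assumed names; they do not appear in the thread.
LEVELS = {
    "TRACE": logging.DEBUG,
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "ERROR": logging.ERROR,
}


def parse_core_log_line(line):
    """Return (level, target, timestamp, message), or None if not a log line."""
    m = LINE_RE.match(line.rstrip())
    if m is None or m.group("level") not in LEVELS:
        return None
    return (LEVELS[m.group("level")], m.group("target"),
            float(m.group("ts")), m.group("msg"))


def forward(lines, logger, min_level=logging.INFO):
    """Forward parsed lines at or above min_level; return how many were sent."""
    sent = 0
    for line in lines:
        parsed = parse_core_log_line(line)
        if parsed is None or parsed[0] < min_level:
            continue
        level, target, ts, msg = parsed
        logger.log(level, "%s [%.6fs] %s", target, ts, msg)
        sent += 1
    return sent


def make_syslog_logger(host, port):
    """Build a logger that sends records to a syslog server over UDP."""
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # "artiq-core" is a placeholder tag, not something ARTIQ defines.
    handler.setFormatter(logging.Formatter("artiq-core: %(message)s"))
    logger = logging.getLogger("corelog_syslog")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger


# Usage: some_log_source | python corelog_to_syslog.py HOST PORT
if __name__ == "__main__" and len(sys.argv) == 3:
    forward(sys.stdin, make_syslog_logger(sys.argv[1], int(sys.argv[2])))
```

The min_level argument of forward() is where the "chosen level" filter goes, e.g. logging.WARNING to drop INFO messages.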

Yes, I have tried this, but I get no output from it: it runs but does not print any lines. I called it like this:

aqctl_corelog yyy.yyy.yyy.yyy -v

The connection does appear in the UART log, though:

[ 337.571831s] INFO(runtime::mgmt): new connection from xxx.xxx.xxx.xxx:55932

And when I use the device, the UART prints INFO messages, but nothing appears in the aqctl_corelog console.

I checked -h and https://m-labs.hk/artiq/manual/utilities.html#core-device-logging-controller but it is not clear to me how this tool should work. Can you assist further?

Which ARTIQ version is it?

I think I am on the dev/beta/master branch:

Gateware ident 7.7614.011f3bdb.beta
shell-dev.nix from nix-scripts/artiq-fast @ 10e6220703
artiq (i.e. artiqSrc) @ 011f3bd

gateware compiled with

{
    "target": "kasli",
    "variant": "psi0",
    "hw_rev": "v2.0",
    "base": "standalone",
    "core_addr": "192.168.0.69",
    "peripherals": [
        {
            "type": "fastino",
            "ports": [0]
        },
        {
            "type": "zotino",
            "ports": [1]
        },
        {
            "type": "urukul",
            "hw_rev": "v1.5",
            "dds": "ad9910",
            "ports": [4, 5],
            "synchronization": true,
            "clk_sel": 2
        },
        {
            "type": "sampler",
            "hw_rev": "v2.2",
            "ports": [10, 11]
        }
    ]
}

8 days later

Tried with 6.7604.040aa6fd, same result. No output from aqctl_corelog 192.168.0.69 -v

I see the connection in the UART log:

[ 113.775219s] INFO(runtime::mgmt): new connection from 192.168.0.191:49188

Just to be clear: I want to see the same messages from the UART output in a console on another host.

5 days later

It looks like aqctl_corelog wasn't updated as part of #1591. Adding something like the following should make it work (quickly tested on master; same as in comm_mgmt.py):

# (...)
# writer.write(b"ARTIQ management\n")

endian = await reader.readexactly(1)
if endian == b"e":
    endian = "<"
elif endian == b"E":
    endian = ">"
else:
    raise IOError("Incorrect reply from device: expected e/E.")

# writer.write(struct.pack("B", Request.PullLog.value))
# (...)
# while True:
    length, = struct.unpack(endian + "l", await reader.readexactly(4))
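
To make the effect of the endianness byte easier to see in isolation, here is a small self-contained sketch that runs the same two steps against an in-memory byte stream instead of a real device. The function names are made up for illustration, and the assumption that each length field is followed by exactly that many bytes of log text is mine, not something taken from the ARTIQ source.

```python
import io
import struct


def read_endian_prefix(stream):
    """Read the one-byte endianness marker and return a struct prefix."""
    marker = stream.read(1)
    if marker == b"e":
        return "<"  # little-endian
    if marker == b"E":
        return ">"  # big-endian
    raise IOError("Incorrect reply from device: expected e/E.")


def read_log_chunk(stream, endian):
    """Read one length-prefixed chunk (assumed framing: 4-byte length, then payload)."""
    length, = struct.unpack(endian + "l", stream.read(4))
    return stream.read(length)


# Simulated replies from a little-endian and a big-endian device.
little = io.BytesIO(b"e" + struct.pack("<l", 5) + b"hello")
big = io.BytesIO(b"E" + struct.pack(">l", 5) + b"hello")

for fake_device in (little, big):
    endian = read_endian_prefix(fake_device)
    print(endian, read_log_chunk(fake_device, endian))
```

Without the marker, a client that hard-codes one byte order would decode the length of the other device's chunk incorrectly, since the four length bytes are read in reverse.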

Is there a similar trick for getting the core log out of satellite Kaslis, or am I restricted to using the USB serial output port for this?

Important errors like sequence errors are reported on the master in DRTIO systems. Why do you need the satellite logs?

    sb10q The satellite keeps rebooting after 15-20 minutes (but only when a sequence is running), and we are trying to work out why. There are no error messages in the log, and setting the log level only changes how much information the master core log returns.

    If the satellite crashes like that, it is better to look at its UART log anyway.

      sb10q Is there any way to increase the log level on the satellite to help identify the cause?

      Check its normal UART log first.

        sb10q Here is an example of what we see: the satellite just periodically reboots with no error message. The master shows the aux packet errors and the link re-establishing, but doesn't report why the satellite has fallen over.

        Satellite Log (Kasli just has 4x DIO SMA and 4x MCX boards)

         __  __ _ ____         ____
        |  \/  (_) ___|  ___  / ___|
        | |\/| | \___ \ / _ \| |
        | |  | | |___) | (_) | |___
        |_|  |_|_|____/ \___/ \____|
        
        MiSoC Bootloader
        Copyright (c) 2017-2021 M-Labs Limited
        
        Bootloader CRC passed
        Gateware ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
        Initializing SDRAM...
        Read leveling scan:
        Module 1:
        00000000000011111111111000000000
        Module 0:
        00000000000011111111111100000000
        Read leveling: 17+-5 17+-6 done
        SDRAM initialized
        Memory test passed
        
        Booting from flash...
        Starting firmware.
        [     0.000004s]  INFO(satman): ARTIQ satellite manager starting...
        [     0.005612s]  INFO(satman): software ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
        [     0.014072s]  INFO(satman): gateware ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
        [     0.293009s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
        [     2.245255s]  INFO(board_artiq::si5324):   ...locked
        [     2.393374s]  INFO(satman): uplink is up, switching to recovered clock
        [     2.426444s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
        [     4.169475s]  INFO(board_artiq::si5324):   ...locked
        [     6.857361s]  INFO(board_artiq::si5324::siphaser): calibration successful, lead: 80, width: 435 (349deg)
        [     7.098975s]  WARN(satman): aux packet error (routing error)
        [     7.306905s]  WARN(satman): aux packet error (routing error)
        [     7.521824s]  WARN(satman): aux packet error (routing error)
        [     7.729419s]  INFO(satman): TSC loaded from uplink
        [     7.733062s]  WARN(satman): aux packet error (routing error)
        [     7.940356s]  WARN(satman): aux packet error (routing error)
        [     8.343516s]  INFO(satman): rank: 1
        [     8.345622s]  INFO(satman): routing table: RoutingTable { 0: 0; 1: 1 0; 2: 2 0; }
        [    16.036747s]  INFO(satman): resetting RTIO
        [    48.406541s]  INFO(satman): resetting RTIO
        [    48.552867s]  INFO(satman): resetting RTIO
        [   110.166787s]  INFO(satman): resetting RTIO
        [   110.320345s]  INFO(satman): resetting RTIO
        [   383.968813s]  INFO(satman): resetting RTIO
        [   384.094257s]  INFO(satman): resetting RTIO
        
         __  __ _ ____         ____
        |  \/  (_) ___|  ___  / ___|
        | |\/| | \___ \ / _ \| |
        | |  | | |___) | (_) | |___
        |_|  |_|_|____/ \___/ \____|
        
        MiSoC Bootloader
        Copyright (c) 2017-2021 M-Labs Limited
        
        Bootloader CRC passed
        Gateware ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
        Initializing SDRAM...
        Read leveling scan:
        Module 1:
        00000000000011111111111000000000
        Module 0:
        00000000000011111111111100000000
        Read leveling: 17+-5 17+-6 done
        SDRAM initialized
        Memory test passed
        
        Booting from flash...
        Starting firmware.
        [     0.000004s]  INFO(satman): ARTIQ satellite manager starting...
        [     0.005612s]  INFO(satman): software ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
        [     0.014072s]  INFO(satman): gateware ident 6.7582.c2248278;Kasli_Earth_Luna_artiq6_Satellite
        [     0.293009s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
        [     2.253302s]  INFO(board_artiq::si5324):   ...locked
        [     3.051533s]  INFO(satman): uplink is up, switching to recovered clock
        [     3.084603s]  INFO(board_artiq::si5324): waiting for Si5324 lock...
        [     4.827635s]  INFO(board_artiq::si5324):   ...locked
        [     8.500670s]  INFO(board_artiq::si5324::siphaser): calibration successful, lead: 259, width: 435 (349deg)
        [     8.675852s]  WARN(satman): aux packet error (routing error)
        [     8.883582s]  WARN(satman): aux packet error (routing error)
        [     9.098713s]  WARN(satman): aux packet error (routing error)
        [     9.306158s]  INFO(satman): TSC loaded from uplink
        [     9.309789s]  WARN(satman): aux packet error (routing error)
        [     9.517295s]  WARN(satman): aux packet error (routing error)
        [     9.932076s]  INFO(satman): rank: 1
        [     9.934181s]  INFO(satman): routing table: RoutingTable { 0: 0; 1: 1 0; 2: 2 0; }
        [    26.772851s]  INFO(satman): resetting RTIO
        [    86.098893s]  INFO(satman): resetting RTIO
        [    86.241249s]  INFO(satman): resetting RTIO
        
         __  __ _ ____         ____
        |  \/  (_) ___|  ___  / ___|
        | |\/| | \___ \ / _ \| |
        | |  | | |___) | (_) | |___
        |_|  |_|_|____/ \___/ \____|

        Master Log (Kasli just has 4x DIO SMA and 4x MCX boards)

        [104689.643272s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
        [104689.651103s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
        [104689.658938s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
        [104689.666773s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
        [104689.674608s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
        [104689.682443s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
        [104689.690281s] ERROR(runtime::moninj::remote_moninj): aux packet error (link went down)
        [104689.698390s] ERROR(runtime::moninj::remote_moninj): aux packet error (aux packet error)
        [104689.706144s] ERROR(runtime::moninj::remote_moninj): aux packet error (aux packet error)
        [104689.820219s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link RX became up, pinging  
        [104689.915386s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104690.324361s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104690.733413s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104691.142388s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104691.551559s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104691.960487s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104692.369576s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104692.778520s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104693.187559s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104693.596597s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104694.005673s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104694.414636s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104694.823560s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104695.232533s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104695.641696s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104695.849079s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] remote replied after 15 packets
        [104695.855886s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104696.067695s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104696.275156s] ERROR(runtime::moninj::remote_moninj): aux packet error (timeout)  
        [104696.461447s]  INFO(runtime::rtio_mgt::drtio): [LINK#0] link initialization completed
        [104696.468527s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] destination is up
        [104696.474657s]  INFO(runtime::rtio_mgt::drtio): [DEST#1] buffer space is 128
        [104696.681648s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] error(s) found (0x03):   
        [104696.687544s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] received packet of an unknown type
        [104696.695727s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] received truncated packet
        [105708.648765s]  INFO(runtime::session): new connection from 192.168.1.38:58273
        [105708.724260s]  INFO(runtime::kern_hwreq): resetting RTIO
        [105709.436403s]  INFO(runtime::session): no connection, starting idle kernel
        [105709.442471s]  INFO(runtime::session): no idle kernel found
        [155065.367958s]  INFO(runtime::moninj): new connection from 192.168.1.38:49884
        [155108.665804s]  INFO(runtime::session): new connection from 192.168.1.38:49890
        [155108.730199s]  INFO(runtime::kern_hwreq): resetting RTIO
        [155108.875240s]  INFO(runtime::kern_hwreq): resetting RTIO
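
For longer captures, a small helper can pull the last firmware timestamp before each reboot out of a saved UART log like the satellite log above, which makes it easier to compare uptimes between runs. This is only a sketch: it assumes every reboot shows up as a "MiSoC Bootloader" line, as in the excerpt, and the function name is made up.

```python
import re

# Firmware log lines start with a timestamp such as "[   384.094257s]".
TS_RE = re.compile(r"^\s*\[\s*(\d+\.\d+)s\]")
BANNER = "MiSoC Bootloader"


def uptimes_before_reboot(lines):
    """Return the last timestamp (in seconds) seen before each bootloader banner."""
    uptimes = []
    last_ts = None
    for line in lines:
        if BANNER in line:
            # A banner after at least one timestamped line marks a reboot.
            if last_ts is not None:
                uptimes.append(last_ts)
            last_ts = None
            continue
        m = TS_RE.match(line)
        if m:
            last_ts = float(m.group(1))
    return uptimes
```

Calling uptimes_before_reboot(open("satellite_uart.log")) on a capture file (the file name is a placeholder) gives one entry per observed reboot; the boot still in progress at the end of the capture is not counted.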

        Is your power supply stable?

          sb10q Yes, I swapped the master/satellite power supplies and see the same behaviour.

          That doesn't mean anything. The power consumption can be different on both ends. What power supply are you using?