We're running the latest upstream ARTIQ master gateware.
After a nondeterministic period of time in our experiments, we encounter both link errors and sequence errors.
The logs are as follows:
DEST #0
[ 1542.604249s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] error(s) found (0x04):
[ 1542.610091s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] timeout attempting to get remote buffer space
[ 1542.619944s] WARN(runtime::rtio_mgt::drtio): [LINK#1] unsolicited aux packet: TSCAck
[ 1542.627026s] ERROR(runtime::rtio_mgt::drtio): [LINK#1] error(s) found (0x04):
[ 1542.634141s] ERROR(runtime::rtio_mgt::drtio): [LINK#1] timeout attempting to get remote buffer space
[ 1542.645159s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0006:_1762eom
[ 1542.655393s] ERROR(runtime::rtio_mgt::drtio): [DEST#2] RTIO sequence error involving channel 0x0003:_ground_dp
[ 1542.867581s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0005:_1762sp
[ 1543.081299s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0005:_1762sp
[ 1545.149344s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0003:_650dp
[ 1545.363290s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0008:suservo0
[ 1555.827323s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0003:_650dp
[ 1556.041276s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0005:_1762sp
[ 1556.255132s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0005:_1762sp
[ 1556.468229s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0004:_614dp
[ 1556.682110s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0011:_493eom_11_ttl
[ 1556.896094s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0005:_1762sp
[ 1557.109112s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0005:_1762sp
[ 1557.322117s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0002:_493sigma
[ 1557.535115s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0005:_1762sp
[ 1557.748161s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0005:_1762sp
[ 1557.961308s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0003:_650dp
[ 1558.175227s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0006:_1762eom
[ 1558.389158s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0005:_1762sp
[ 1558.602279s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0005:_1762sp
[ 1560.615320s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0003:_650dp
[ 1560.829287s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0008:suservo0
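When triaging a longer DEST #0 log, it can help to decode the link error masks and tally sequence errors per destination and channel. The following is a minimal Python sketch that relies only on the line formats visible above. The meaning of bit 0x04 comes straight from the log, where it is followed by "timeout attempting to get remote buffer space"; the meanings given for 0x01 and 0x02 are assumptions and should be checked against the runtime source. `triage`, `decode_link_errors` and `LINK_ERROR_BITS` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed meaning of each bit in the "[LINK#n] error(s) found (0x..)" mask.
# Only 0x04 is confirmed by the log (it is followed by the
# "timeout attempting to get remote buffer space" line); 0x01 and 0x02
# are assumptions to be checked against the runtime source.
LINK_ERROR_BITS = {
    0x01: "received packet of an unknown type",  # assumption
    0x02: "received truncated packet",  # assumption
    0x04: "timeout attempting to get remote buffer space",
}

LINK_RE = re.compile(r"\[LINK#(\d+)\] error\(s\) found \((0x[0-9a-fA-F]+)\)")
SEQ_RE = re.compile(
    r"\[DEST#(\d+)\] RTIO sequence error involving channel (0x[0-9a-fA-F]+):(\S+)"
)


def decode_link_errors(mask):
    """Return the message for every bit set in a link error mask."""
    return [msg for bit, msg in LINK_ERROR_BITS.items() if mask & bit]


def triage(log_text):
    """Decode link error masks and count sequence errors per (dest, channel, name)."""
    link_errors = []
    seq_counts = Counter()
    for line in log_text.splitlines():
        m = LINK_RE.search(line)
        if m:
            mask = int(m.group(2), 16)
            link_errors.append((int(m.group(1)), decode_link_errors(mask)))
            continue
        m = SEQ_RE.search(line)
        if m:
            key = (int(m.group(1)), int(m.group(2), 16), m.group(3))
            seq_counts[key] += 1
    return link_errors, seq_counts


# A few lines copied from the DEST #0 log.
SAMPLE = """\
[ 1542.604249s] ERROR(runtime::rtio_mgt::drtio): [LINK#0] error(s) found (0x04):
[ 1542.645159s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0006:_1762eom
[ 1542.867581s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0005:_1762sp
[ 1543.081299s] ERROR(runtime::rtio_mgt::drtio): [DEST#1] RTIO sequence error involving channel 0x0005:_1762sp
"""

links, counts = triage(SAMPLE)
print(links)  # [(0, ['timeout attempting to get remote buffer space'])]
print(counts.most_common(1))  # [((1, 5, '_1762sp'), 2)]
```

Applied to the full DEST #0 log above, the same function counts 22 sequence errors, 10 of which are on channel 0x0005 (_1762sp) of DEST#1.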
DEST #1
[ 2784.146092s] ERROR(satman): received packet of an unknown type
[ 2784.150586s] ERROR(satman): timeout attempting to get buffer space from CRI, destination=0xc0
[ 2784.159099s] ERROR(satman): write underflow, channel=0, timestamp=1545247125414, counter=1545372549488, slack=-125424074
[ 2784.173638s] ERROR(satman): received packet of an unknown type
[ 2784.178139s] ERROR(satman): received truncated packet
[ 2784.183159s] ERROR(satman): write underflow, channel=1, timestamp=437905711215, counter=1545400101264, slack=-1107494390049
DEST #2
[ 2930.554069s] ERROR(satman): write underflow, channel=6, timestamp=1270499749083, counter=1545372648680, slack=-274872899597
[ 2930.563936s] INFO(satman): TSC loaded from uplink
[ 2930.568706s] ERROR(satman): received packet of an unknown type
[ 2930.574510s] ERROR(satman): received truncated packet
[ 2930.579547s] ERROR(satman): timeout attempting to get buffer space from CRI, destination=0x57
[ 2930.588077s] ERROR(satman): write underflow, channel=2, timestamp=1546005020011, counter=12363432125432, slack=-10817427105421
[ 2943.699773s] ERROR(satman): write underflow, channel=3, timestamp=1558521121013, counter=12376550453472, slack=-10818029332459
[ 3012.501525s] ERROR(satman): write underflow, channel=3, timestamp=1627322209389, counter=12445352213288, slack=-10818030003899
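The satellite write-underflow lines report a timestamp, the counter, and the slack. A quick way to sanity-check them is to parse the lines and confirm that slack equals timestamp minus counter. The sketch below does this in Python and converts slack to seconds, *assuming* 1 machine unit = 1 ns (that is, `ref_period = 1e-9` in the device_db). `parse_underflows` and `SECONDS_PER_MU` are hypothetical names.

```python
import re

# Matches satman lines of the form
# "write underflow, channel=C, timestamp=T, counter=N, slack=S".
UNDERFLOW_RE = re.compile(
    r"write underflow, channel=(\d+), timestamp=(-?\d+), "
    r"counter=(-?\d+), slack=(-?\d+)"
)

# Assumption: 1 machine unit (mu) = 1 ns, i.e. ref_period = 1e-9 in the
# device_db. Adjust if your core device uses a different reference period.
SECONDS_PER_MU = 1e-9


def parse_underflows(log_text):
    """Extract write-underflow events and check slack == timestamp - counter."""
    rows = []
    for line in log_text.splitlines():
        m = UNDERFLOW_RE.search(line)
        if m is None:
            continue
        channel, timestamp, counter, slack = (int(g) for g in m.groups())
        rows.append({
            "channel": channel,
            "timestamp": timestamp,
            "counter": counter,
            "slack": slack,
            "consistent": slack == timestamp - counter,
            "slack_s": slack * SECONDS_PER_MU,
        })
    return rows


# The DEST #2 lines from the log.
SAMPLE = """\
[ 2930.554069s] ERROR(satman): write underflow, channel=6, timestamp=1270499749083, counter=1545372648680, slack=-274872899597
[ 2930.563936s] INFO(satman): TSC loaded from uplink
[ 2930.568706s] ERROR(satman): received packet of an unknown type
[ 2930.574510s] ERROR(satman): received truncated packet
[ 2930.579547s] ERROR(satman): timeout attempting to get buffer space from CRI, destination=0x57
[ 2930.588077s] ERROR(satman): write underflow, channel=2, timestamp=1546005020011, counter=12363432125432, slack=-10817427105421
[ 2943.699773s] ERROR(satman): write underflow, channel=3, timestamp=1558521121013, counter=12376550453472, slack=-10818029332459
[ 3012.501525s] ERROR(satman): write underflow, channel=3, timestamp=1627322209389, counter=12445352213288, slack=-10818030003899
"""

for row in parse_underflows(SAMPLE):
    print(row["channel"], row["consistent"], round(row["slack_s"], 1))
```

On these lines every reported slack equals timestamp minus counter. The counter itself jumps by 10818059476752 mu (about 10,800 s if 1 mu = 1 ns) between the underflow logged just before "TSC loaded from uplink" and the next one.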
DEST #3
Nothing appears in the log; it isn't involved in this particular experiment.
After both reinitialising the devices and calling core.reset(), the devices on the offending satellites remain unresponsive. A power cycle of the satellite or a reset of the master FPGA (artiq_flash start) brings the satellites back online.
An additional symptom is that experiments hang. We can work around this by replacing the rtio_input_data calls for Sampler and SUServo with the timestamped alternative (rtio_input_timestamped_data) at the locations linked below:
https://github.com/m-labs/artiq/blob/c1f2ff371784ad47463969b98c455cf66abb2b6a/artiq/coredevice/suservo.py#L149
https://github.com/m-labs/artiq/blob/c1f2ff371784ad47463969b98c455cf66abb2b6a/artiq/coredevice/spi2.py#L241
This behaviour leads me to believe that no rtio_output events are being triggered on these satellites. There is then no input data to read, so the rtio_input_data calls, which wait without a timeout, never return and the experiments hang.
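The difference between the blocking read and the timestamped read with a timeout can be illustrated with a small Python model. This is *not* ARTIQ code: the RTIO input FIFO is stood in for by a `queue.Queue`, and `input_data_blocking` / `input_timestamped_data` are hypothetical names. The model assumes, as we understand it, that `rtio_input_timestamped_data(timeout_mu, channel)` returns `(-1, 0)` when the timeout is reached. For simplicity the model takes a relative timeout in seconds rather than a value in machine units.

```python
import queue

# Hypothetical model, not ARTIQ code: an RTIO input FIFO is represented by a
# queue.Queue of (timestamp_mu, data) tuples. If the outputs that should
# produce input events never take effect on the satellite, it stays empty.


def input_data_blocking(fifo):
    """Models rtio_input_data: waits with no timeout for the next event."""
    _timestamp, data = fifo.get()  # never returns if the FIFO stays empty
    return data


def input_timestamped_data(fifo, timeout_s):
    """Models rtio_input_timestamped_data: returns (timestamp, data), or
    (-1, 0) if nothing arrives before the timeout."""
    try:
        return fifo.get(timeout=timeout_s)
    except queue.Empty:
        return -1, 0


fifo = queue.Queue()
print(input_timestamped_data(fifo, timeout_s=0.1))  # (-1, 0): nothing arrived
fifo.put((1000, 42))
print(input_timestamped_data(fifo, timeout_s=0.1))  # (1000, 42)
```

With the timed variant, a caller can detect the `-1` timestamp and report the missing input instead of hanging, which matches the observation that the workaround stops the experiments from freezing.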
Any help in debugging this would be greatly appreciated. Thanks in advance!