sb10q: Yes, I can create 200 MB HDF5 files (one dataset with np.zeros(26000000)) with no problem and no explicit compression, opening the file with a plain "with h5py.File('blah.h5', 'w'):" block. This is done on the same computer that runs artiq_master.
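For concreteness, a minimal standalone sketch of such a write, assuming h5py and numpy are installed ('blah.h5' and the dataset name 'data' are placeholders):

```python
import numpy as np
import h5py

N = 26_000_000  # 26e6 float64 zeros: ~208 MB in memory, roughly 200 MB on disk

# One dataset, no explicit compression; the with-block closes the file cleanly.
with h5py.File('blah.h5', 'w') as f:
    f.create_dataset('data', data=np.zeros(N))

# Reopen read-only to confirm the file round-trips.
with h5py.File('blah.h5', 'r') as f:
    print(f['data'].shape)  # prints (26000000,)
```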
I tried messing around a bit more with saving files through jobs submitted to artiq_master (I created dummy jobs that only save a large array into a dataset). The limit seems to be around np.zeros(25900000), corresponding to about 101 MB on disk, but the threshold can move around: sometimes np.zeros(25900000) saves a good HDF5 file, and sometimes it gives me an HDF5 file that fails with a "bad object header version number" error, all runs with the exact same arguments. A trend I noticed is that the bad HDF5 files are slightly smaller than the good ones (by one to a few tens of KB), but there are also bad HDF5 files with the same file size as the good ones, again with identical arguments.
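For triage, one quick way to distinguish good files from bad ones is to try opening and walking the file; this is a hedged sketch that assumes h5py surfaces corrupt files as OSError (the helper name is my own, not an h5py API):

```python
import h5py

def is_readable_hdf5(path):
    """Return True if the file opens and every object header can be visited."""
    try:
        with h5py.File(path, 'r') as f:
            f.visit(lambda name: None)  # forces a walk over all objects
        return True
    except OSError:  # h5py raises OSError for missing signatures / bad headers
        return False
```

Running something like this right after each job would at least separate "file never readable" from "file readable but missing data".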
Posts on the HDF Group forum suggest that something could be going wrong during the file writing process. At this point I am not sure whether it is an h5py problem, a hardware problem (e.g. not enough memory because some of it is used by other processes running in the background, despite our upgraded RAM), or a timeout-related issue.
Regarding my suspicion about timeouts: sometimes when bad HDF5 files are written, I see log entries on my artiq_dashboard along the lines of "worker refuses to die", or a job running in parallel (interrupted through scheduler.pause()) terminates with "core device connection closed unexpectedly". This does not happen for all of the bad HDF5 files (most of those are near the threshold), but it happens very reproducibly if I try to write a file that far exceeds the threshold (while still well below 200 MB on disk). So it seems to me that something is watching the tasks and killing them whenever their time is up.
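To illustrate the failure mode I have in mind (purely hypothetical code, not ARTIQ's actual supervision logic): a coroutine wrapped in asyncio.wait_for is cancelled mid-flight once its deadline expires, which would leave a partially written file behind:

```python
import asyncio

async def slow_write():
    # Stand-in for a long HDF5 write; cancellation here = truncated file.
    await asyncio.sleep(2)
    return 'done'

async def supervised():
    try:
        # A hypothetical watchdog giving the write only 0.5 s to finish.
        return await asyncio.wait_for(slow_write(), timeout=0.5)
    except asyncio.TimeoutError:
        return 'killed by watchdog'

print(asyncio.run(supervised()))  # prints: killed by watchdog
```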
You mentioned that there is no watchdog for the writing of the HDF5 file, but are there other watchdogs (in asyncio, maybe) watching related processes that could affect the saving of the HDF5 file?