By default ARTIQ saves data in the "results" subfolder of the working directory from which artiq_master is started. Is there an argument that can be used to change the data saving directory?

We edited the artiq_master.py file in our ARTIQ fork to save data in arbitrary locations, but I am wondering if vanilla ARTIQ supports or should support this.

Why do you need this? Is changing the current directory before running the master not good enough? The general idea is that the current directory is the "home folder" for your ARTIQ instance, with the device and dataset DBs and the results all in one place.

    sb10q Is changing the current directory before running the master not good enough?

    The most important reason for us is that we want to save data in a folder structure that groups data by names of different control computers, e.g.,

    • data
      • computer_1_data
        • artiq_data
        • other_data (for example, data manually recorded)
      • computer_2_data
        • artiq_data
        • other_data
      • ...

    The root "data" folder is shared between different computers with tools such as NAS or Google Drive. ARTIQ data are saved in the corresponding "artiq_data" folder of each of the computers. We can in principle run artiq_master from the "artiq_data" directory, but that creates one additional folder that we don't need in our data saving structure, and also stores the device_db.py and other log files on the NAS which we might not want.

    On the other hand, since artiq_master supports user-defined paths for device_db, dataset_db, repository, and log files, shouldn't it support saving data files in a user-defined directory too?

    @sb10q just want to follow up on this issue. Do you think this issue is worth discussing in the artiq repo? I can also prepare a PR that adds an argument to artiq_master for saving data to another location.

    4 years later

    Is there any update with regard to this functionality? It would be really great to be able to specify a directory to save data to. Also, if we could add a file name prefix for better organization, that would be nice. Organizing data by timestamps alone isn't really all that helpful.

    4 months later

    I don't think there have been any developments, but also choosing output directories is rather contrary to the workflow that really works well in a lot of laboratories. The ARTIQ result files are all saved in a single output directory (well, subdivided by date), which acts as one central location for archival, with the RID (run id) acting as a unique identifier. This directory can be mirrored (using programs like lsyncd) to network shares as required, and analysis scripts, etc. can in turn pull from this central repository.
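
The one-way mirroring described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a replacement for lsyncd: the function name `mirror_results` and the directory arguments are hypothetical, and it copies on demand instead of watching the filesystem.

```python
import shutil
from pathlib import Path


def mirror_results(src_root, dst_root, pattern="*.h5"):
    """Copy result files that are missing or outdated in dst_root.

    One-way only: nothing is ever deleted or modified under src_root.
    Returns the list of destination paths that were written.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for src in sorted(src_root.rglob(pattern)):
        dst = dst_root / src.relative_to(src_root)
        # Skip files whose mirror copy is already at least as new.
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 preserves the modification time
        copied.append(dst)
    return copied
```

Run periodically, for example `mirror_results("results", "/mnt/share/artiq_data")` with a path of your choosing, this keeps the network copy current while the local results directory stays the canonical archive.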

    By keeping the files in one place, with a single canonical naming scheme (the run id), data provenance is easy to ensure – even when coming back to some old results years later, all the old data will be easy to locate. Further search indexes, extra metadata, etc. can be built on top of this. Of course, this isn't the only potential design, but it really has worked well here.
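
As a rough sketch of what "a search index built on top" could look like, the snippet below maps each RID to its result file. It assumes that result file names start with the zero-padded RID followed by a dash and the experiment name; check that against your own results directory. The function name `build_rid_index` is hypothetical.

```python
import re
from pathlib import Path

# Assumed file name shape: digits (the RID), a dash, a name, then ".h5".
RESULT_NAME = re.compile(r"^(\d+)-(.+)\.h5$")


def build_rid_index(results_root):
    """Map each RID (as an int) to the path of its result file."""
    index = {}
    for path in Path(results_root).rglob("*.h5"):
        match = RESULT_NAME.match(path.name)
        if match:
            index[int(match.group(1))] = path
    return index
```

With such an index, analysis scripts only need the RID to locate a run, regardless of which date subfolder it ended up in.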

    As for the implementation, see artiq.master.worker_db (and the top-level worker process code in artiq.master.worker_impl).

      dpn

      Apologies for the code posted below. I do not understand how to format it properly in the code brackets here. I keep trying and failing...

      I think we get the philosophy. But that workflow can be suboptimal depending on how sophisticated the measurements are. For instance, let's imagine a control script with the ability to run 200 different nested for loops. This is completely unmanageable/doesn't scale with the current nested loop approach (unless I am missing something). It is also a real challenge when data is manually recorded. Where does that get put? So the extremes of high level automation and manual data entry don't work great with the current workflow.

      I'll give an example for a high level of sophistication since the manual case is obvious.

      A much cleaner way to handle loops (and change their order) would be to use numpy's meshgrid with a function like this, which builds the nested loops and makes the order easy to alter.

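      ```python
      import numpy as np


      def loops(*arrs):
          '''Creates nested for loops for arbitrary numbers of arguments,
          the first argument is the first loop, and the last is the last '''
          # indexing="ij" plus stacking on the last axis makes the first
          # argument the outermost (slowest-varying) loop.
          grids = np.meshgrid(*arrs, indexing="ij")
          return np.stack(grids, axis=-1).reshape(-1, len(arrs))


      # x, y, z are 1-D arrays of scan values
      loop_array = loops(x, y, z)
      ```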
      Then, to keep track of these loops (i.e., to generate metadata), we might put the array into a dataframe with column names:

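      ```python
      import pandas as pd

      # column_names: one label per scanned parameter
      # direc: output directory path, ending in a path separator
      scan_df = pd.DataFrame(loop_array, columns=column_names)
      scan_df.to_csv(direc + 'scan_params.csv')
      ```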
      Now the 200 nested loops of the script are replaced with a single for loop. Each "state" of the experimental parameters is now recorded in the loops array.
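
To make the "single for loop" concrete, here is a small self-contained sketch. `scan_table` is a hypothetical stand-in for the helper above, written so that the first argument is the slowest-varying (outermost) loop; the arrays are made-up example values.

```python
import numpy as np


def scan_table(*axes):
    """Return one row per parameter combination; the first axis varies slowest."""
    grids = np.meshgrid(*axes, indexing="ij")
    return np.stack(grids, axis=-1).reshape(-1, len(axes))


x = np.array([0.0, 1.0])
y = np.array([10.0, 20.0, 30.0])

table = scan_table(x, y)        # x is the outer loop, y the inner loop
for x_val, y_val in table:      # one flat loop replaces the nesting
    pass  # set the parameters and take one measurement here

swapped = scan_table(y, x)      # swapping arguments swaps the nesting order
```

Changing the loop order is then a matter of reordering the arguments, and the table itself doubles as a record of every parameter state that was visited.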

      It would notionally be nice to throw the loop metadata into the H5 file. However, my understanding of the H5 file is that it is written after a scan and that the datasets in the environment don't support dataframes. Hence, we would like the ability to alter the default directory structure. That is, it would be better to make a folder named after the RID, save the H5 file in that folder by default, and then add other files as needed in that new directory.
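
The per-RID folder idea can be sketched as follows. This is the layout being requested, not how ARTIQ currently stores results; the function name `write_scan_sidecar`, the nine-digit folder name, and the `scan_params.csv` file name are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def write_scan_sidecar(root, rid, column_names, rows):
    """Create <root>/<rid>/ and write scan_params.csv inside it."""
    run_dir = Path(root) / f"{rid:09d}"  # hypothetical folder naming
    run_dir.mkdir(parents=True, exist_ok=True)
    csv_path = run_dir / "scan_params.csv"
    with csv_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(column_names)  # header row
        writer.writerows(rows)         # one row per parameter state
    return csv_path
```

The H5 file and any manually recorded notes for the same run could then sit next to the CSV in that folder, keyed by the same RID.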

      For similar reasons, I think it would be an upgrade to the core language to give the NoScan scan object an explicit .sequence value. This would make the syntax identical to the other scan objects and make it easier to generate the sequence for the scans described above. To me it is not a "NoScan"; it is a "Repeat Scan" that repeats x times, or possibly x*y times if in a nested loop.

      ```python
      import numpy as np
      from artiq.language.scan import ScanObject


      class RepeatScan(ScanObject):
          def __init__(self, value, repetitions=1):
              self.value = value
              self.repetitions = repetitions
              # Same attribute name the other scan objects use
              self.sequence = np.ones(self.repetitions) * self.value

          def _gen(self):
              for i in range(self.repetitions):
                  yield self.value

          def __iter__(self):
              return self._gen()

          def __len__(self):
              return self.repetitions

          def describe(self):
              return {
                  "ty": "BScan2",
                  "value": self.value,
                  "repetitions": self.repetitions
              }
      ```