Why does the git integration require (?) two git repositories?

bodokaiser · Jul 16, 2024

Hello,

I fail to grasp the idea behind the git integration.

In the management tutorial, we set up the following files:

~/artiq-master
- /device_db.py
- /repository (bare git repo)
  - /hooks
    - /post-receive
~/artiq-work (regular git repo)
- /mgmt_tutorial.py

We register ~/artiq-master/repository as the origin of ~/artiq-work, which lets us push commits from ~/artiq-work to ~/artiq-master/repository. When ~/artiq-master/repository receives a push, the post-receive hook fires and triggers a rescan of the repository.
We run artiq_master with the git flag -g inside the ~/artiq-master directory to interface with the ARTIQ hardware and schedule the experiments.
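
For reference, the hook step above can be sketched in Python. This is only an illustration under assumptions: the helper name install_hook is made up, a temporary directory stands in for ~/artiq-master/repository, and the hook body is assumed to be the one-line artiq_client scan-repository --async call that comes up later in this thread.

```python
import stat
import tempfile
from pathlib import Path

# Rescan command as quoted in this thread; everything else here is illustrative.
RESCAN_COMMAND = "artiq_client scan-repository --async"


def install_hook(hooks_dir, name, command=RESCAN_COMMAND):
    """Write an executable /bin/sh hook called `name` that runs `command`."""
    hooks_dir = Path(hooks_dir)
    hooks_dir.mkdir(parents=True, exist_ok=True)
    hook = hooks_dir / name
    hook.write_text("#!/bin/sh\n" + command + "\n")
    # Git skips hook files that are not marked executable.
    mode = hook.stat().st_mode
    hook.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook


if __name__ == "__main__":
    # A temporary directory stands in for ~/artiq-master/repository.
    with tempfile.TemporaryDirectory() as tmp:
        hook = install_hook(Path(tmp) / "repository" / "hooks", "post-receive")
        print(hook.read_text())
```

The executable bit matters here: a hook file without it is silently skipped by git, which is an easy way to end up with pushes that never trigger a rescan.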

Initially, I wondered why not just have one git repository and use a post-push hook, but apparently git does not support such a hook.
Second, maybe the idea of using two git repositories is to promote the master-client architecture, but then why not go "full-server" and have some GitLab runner execute the experiments?

So what is the reasoning behind using two git repositories to integrate ARTIQ into git? What is the suggested workflow?

architeuthis · Jul 17, 2024

As the management system tutorial says:

Using Git to host the experiment repository helps with the tracking of modifications to experiments and with the traceability of a result to a particular version of an experiment.

Note:
The workflow we will describe in this tutorial corresponds to a situation where the ARTIQ master machine is also used as a Git server where multiple users may push and pull code. The Git setup can be customized according to your needs; the main point to remember is that when scanning or submitting, the ARTIQ master uses the internal Git data (not any working directory that may be present) to fetch the latest fully completed commit at the repository’s head.

So textually on one hand the use of two repositories is indeed for the sake of the master-client architecture; the machine the ARTIQ master is running on is also the Git server and multiple clients can push experiments and run them using either of the ARTIQ clients. On the other hand, the intention is also reproducibility; if the master repository always corresponds to some named and recorded commit it should (given some basic discipline in the use of 'execute experiment outside repository' and/or the ARTIQ client's direct submit) be possible to consult the Git history to reproduce and trace a certain version of an experiment and connect it to its result.

In practice, if there's no wider client-server architecture and you're always running the client from the same machine as the master, I don't think there's that much reason to use two repos. As the manual says, the ARTIQ master always draws from the internal Git data, never the working directory, so you can get the same effect of reproducibility using a single, non-bare repository. Whether or not Git supports post-push hooks is irrelevant, because if you are only using Git for local version control there's nowhere to push to anyway. What you'd want to link the rescan to in that case is a post-commit hook, which very much does exist and is supported.
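
To make the difference between the two setups concrete, here is a rough Python sketch of where the rescan hook would live in each case. The function name rescan_hook_path and the ".git directory present" check are my own assumptions for illustration, not anything ARTIQ or Git provides; the check is a simple heuristic and ignores cases such as worktrees, where .git is a file.

```python
import tempfile
from pathlib import Path


def rescan_hook_path(repo):
    """Return where a rescan hook would live for the repository at `repo`.

    Heuristic (an assumption, not ARTIQ or Git API): a `.git` directory means
    a regular repository that is committed to locally, so use post-commit;
    otherwise treat `repo` as a bare repository that is pushed to, so use
    post-receive.
    """
    repo = Path(repo)
    if (repo / ".git").is_dir():
        return repo / ".git" / "hooks" / "post-commit"
    return repo / "hooks" / "post-receive"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a single, non-bare ~/artiq-master/repository.
        single = Path(tmp) / "artiq-master" / "repository"
        (single / ".git").mkdir(parents=True)
        print(rescan_hook_path(single).relative_to(tmp))
```

In either case the hook body itself would presumably be the same rescan command; only the hook's name and location change.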

Can't say I'm familiar with GitLab runners, but what would be the benefit of this as opposed to a post-receive or post-commit hook? Why add more complexity if it isn't necessary?

Edit: Or, TL;DR, see the management system page.

bodokaiser · Jul 17, 2024

architeuthis Thanks for the insights on using two repositories!

I often amend git commits, so a post-commit hook isn't as robust for my needs as a post-push hook, which provides an extra layer to refine commits before they propagate.

Regarding GitLab runners, assume the experiment's git repository is already hosted on some GitLab server.
You can then register the computer connected directly to ARTIQ's master device as a GitLab runner and have it run artiq_master as a background service, with artiq_client scan-repository --async as a GitLab runner job.
The advantage of such a setup is that it isolates the interface between git and ARTIQ's master device, which hides complexity and reduces the risk of the end user breaking things.

But it is a bit of an investment to set these things up. My worry with the local two-repository setup is that end users won't really understand how it works and will opt out by just using artiq_run all the time.