One-way syncing of GitHub to GitLab repositories

Introduction

In software development, it is crucial to manage the code base with a version control system (VCS). The most common VCS nowadays is Git, with GitHub and GitLab as well-known hosting providers. While GitHub is more widely known, GitLab can also be self-hosted. The RRZE operates two GitLab instances for the university, and the NHR@FAU group provides Cx services for both of them. Some users at the university keep their main repository at GitHub but still want to use the Cx services attached to the GitLab instances. The NHR@FAU group has this use case itself. Its software tools, like LIKWID (GitHub repository), require physical access to the hardware and therefore cannot be tested properly in containerized or virtualized environments.

Why this page?

GitLab is open-source and can be self-hosted. However, some features are only available with particular licenses, and none of the GitLab instances at RRZE has such a license (anymore). One of these features was "running Cx for a remote repository", which relied on "pull mirroring": the GitLab server pulls changes from the GitHub repository and runs Cx locally. The open-source license only includes "push mirroring". This page describes how to set up a one-way synchronization of the repositories using only freely available features.

Situation

What we have

- The original repository at GitHub (e.g., LIKWID)
- A local GitLab instance with the open-source license (e.g., the two GitLab servers operated by the RRZE [1], [2])

What we need

- One-way synchronization of the original GitHub repository to a repository at a local GitLab instance
- Running the local CI/CD services
- Reporting the CI/CD results back to the original GitHub repository

Naming

In the following, we use distinct names for the repositories to make clear where things have to be changed:

- GitHub repository: the original repository at GitHub and the source of the one-way synchronization.
- GitLab repository: the copy of the GitHub repository at GitLab and the destination of the one-way synchronization.
- Sync repository: the auxiliary GitLab repository that performs the synchronization.

How it is done

1. Import the GitHub repository into GitLab, the "GitLab repository"

GitLab provides a feature called "Import project" with all licenses. It creates a one-time copy of a remote repository. To import projects with the GitHub-specific importer, you need a GitHub personal access token. Go to https://github.com/settings/tokens (or Profile -> Settings -> Developer settings -> Personal access tokens) and create a new token with the repo scope. The token is only used for the import, so a short expiration period is sufficient. Paste the token into GitLab's importer, which then retrieves the list of accessible repositories from GitHub. Import your repository; this may take some time. You can already adjust the settings of the new repository at this point, e.g., to enable the NHR@FAU Cx services.

If you want changes to the main branch to be synchronized as well, unprotect it in the GitLab repository at Settings -> Repository -> Protected branches. Otherwise, the mirror push from the Sync repository may be rejected for this branch, because that push force-updates branches.
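If you prefer the command line over the web UI, the branch can also be unprotected with the GitLab REST API endpoint `DELETE /projects/:id/protected_branches/:name`. The following is a minimal sketch under some assumptions:

- `GITLAB_API_TOKEN` is a hypothetical variable holding a GitLab token with the api scope that is allowed to manage protected branches.
- The function names are illustrative.
- The server and project path are placeholders.

```shell
# Sketch: unprotect a branch of the GitLab repository via the GitLab REST API.
# Assumption: GITLAB_API_TOKEN (hypothetical) holds a token with "api" scope
# that may manage protected branches; server and project path are placeholders.

# The API accepts "namespace/project" as :id when the slashes are URL-encoded.
urlencode_slashes() {
    printf '%s' "${1//\//%2F}"
}

unprotect_branch() {
    local server="$1" project="$2" branch="$3"
    curl -s -X DELETE \
         -H "PRIVATE-TOKEN: ${GITLAB_API_TOKEN}" \
         "https://${server}/api/v4/projects/$(urlencode_slashes "$project")/protected_branches/$(urlencode_slashes "$branch")"
}

# Example: unprotect_branch GITLAB_SERVER GITLAB_ORG/GITLAB_REPO main
```

Branch names that contain slashes are encoded the same way as the project path.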

2. Set up a separate repository at GitLab, the "Sync repository"

There are multiple ways to achieve the synchronization. The simplest is a separate repository at GitLab whose CI/CD pipeline updates the GitLab repository. So create an empty repository and enable CI/CD in its settings.

3. Create a pipeline trigger in the Sync repository

Go to Settings -> CI/CD -> Pipeline triggers, create a new trigger token and note it down. GitLab then shows several ways to trigger the pipeline. We use the web hook variant: https://GIT_SERVER/api/v4/projects/PROJECT_ID/ref/REF_NAME/trigger/pipeline?token=TOKEN. The PROJECT_ID is usually filled in already; if not, Settings -> General shows it. REF_NAME is the branch whose pipeline is run; at GitLab, the default branch is main.
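Before GitHub is involved, it can help to fire the trigger once by hand and check that a pipeline appears in the Sync repository. The sketch below builds the web hook URL from the placeholders above and sends a POST request to it, which is also what GitHub's web hook will do. The function names are hypothetical.

```shell
# Sketch: build the web hook URL of step 3 and fire it once by hand.
# GIT_SERVER, PROJECT_ID, REF_NAME and TOKEN are the placeholders from the text.
trigger_url() {
    local server="$1" project_id="$2" ref="$3" token="$4"
    printf 'https://%s/api/v4/projects/%s/ref/%s/trigger/pipeline?token=%s\n' \
        "$server" "$project_id" "$ref" "$token"
}

fire_trigger() {
    # The trigger endpoint expects a POST request, like the one GitHub's web hook sends.
    curl -s -X POST "$(trigger_url "$@")"
}

# Example: fire_trigger GIT_SERVER PROJECT_ID main TOKEN
```

Since the trigger token is part of the URL, treat the whole URL as a secret.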

4. Create an access token in the GitLab repository

There are multiple ways to authenticate; here we use a Project Access Token. You can also use an SSH key pair.

Go to Settings -> Access Tokens and create a new token with a scope that grants write access to the repository. Note down the token.

5. Register the GitLab repository write token in the Sync repository

In the Sync repository, go to Settings -> CI/CD -> Variables and add a CI/CD variable GIT_PUSH_TOKEN containing the Project Access Token of the GitLab repository. Enable the masked flag so that the token does not show up in job logs.
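The variable can also be created with the GitLab REST API endpoint `POST /projects/:id/variables`. The same call works for the GITHUB_API_TOKEN variable in step 8. This is a minimal sketch under some assumptions:

- `GITLAB_API_TOKEN` is a hypothetical variable holding a token with the api scope that may manage the project's CI/CD variables.
- The function name is illustrative.

```shell
# Sketch: create a masked CI/CD variable via the GitLab REST API.
# Assumption: GITLAB_API_TOKEN (hypothetical) holds a token with "api" scope
# that may manage the project's CI/CD variables.
set_masked_variable() {
    local server="$1" project="$2" key="$3" value="$4" id
    id=${project//\//%2F}   # "namespace/project" with URL-encoded slashes
    # --form-string passes values literally (-F would treat a leading @ or < specially)
    curl -s \
         -H "PRIVATE-TOKEN: ${GITLAB_API_TOKEN}" \
         "https://${server}/api/v4/projects/${id}/variables" \
         --form-string "key=${key}" \
         --form-string "value=${value}" \
         --form-string "masked=true"
}

# Example (placeholders): set_masked_variable GITLAB_SERVER SYNC_ORG/SYNC_REPO GIT_PUSH_TOKEN "<token>"
```

GitLab only accepts masking for values that meet its format requirements, e.g., a minimum length of 8 characters.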

6. Create the sync CI pipeline in the Sync repository

Create a file .gitlab-ci.yml or use the GitLab pipeline editor:

    update-repo:
      script:
        - git clone --mirror https://github.com/GITHUB_ORG/GITHUB_REPO.git
        - cd GITHUB_REPO.git
        - git remote remove origin
        - git push --mirror "https://whatever:${GIT_PUSH_TOKEN}@GITLAB_SERVER/GITLAB_ORG/GITLAB_REPO.git"

First, the source repository is cloned with the --mirror flag, and we change into the newly created directory. Afterwards, we remove the remote origin. The reason is that git clone --mirror configures origin as a mirror remote, so a plain git push in this directory would behave like git push --mirror. According to the git push man page, this means: "Newly created local refs will be pushed to the remote end, locally updated refs will be force updated on the remote end, and deleted refs will be removed from the remote end." We do not want to push anything to origin, the repository at GitHub (https://github.com/GITHUB_ORG/GITHUB_REPO.git). Finally, we push everything to the GitLab repository, also with the --mirror option, and authenticate with the GitLab repository write token.
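To try the mirroring outside of CI, the commands of the update-repo job can be wrapped into a bash function. This is a minimal sketch under some assumptions:

- The function name and the temporary working directory are illustrative choices.
- Source and destination can be any URLs git understands, including local paths, which makes the behavior easy to test.

```shell
# Sketch: the update-repo job as a reusable bash function (name is hypothetical).
sync_mirror() {
    local src="$1" dst="$2" workdir rc=0
    workdir="$(mktemp -d)" || return 1
    # 1. bare clone with all refs, 2. drop origin so nothing can be pushed back,
    # 3. create, force-update and delete refs on DST so that they match SRC
    git clone --quiet --mirror "$src" "$workdir/repo.git" &&
        git -C "$workdir/repo.git" remote remove origin &&
        git -C "$workdir/repo.git" push --quiet --mirror "$dst" || rc=$?
    rm -rf "$workdir"   # always clean up the temporary clone
    return "$rc"
}

# In the CI job this corresponds to:
# sync_mirror https://github.com/GITHUB_ORG/GITHUB_REPO.git \
#     "https://whatever:${GIT_PUSH_TOKEN}@GITLAB_SERVER/GITLAB_ORG/GITLAB_REPO.git"
```

Running it twice against local bare repositories shows the --mirror semantics: a branch deleted in the source also disappears from the destination on the next sync.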

7. Create a web hook in the GitHub repository

Everything for the one-way synchronization is now set up: as soon as the pipeline in the Sync repository is triggered, all changes are mirrored from the GitHub repository to the GitLab repository. To trigger it automatically, we use the pipeline trigger created in step 3. In the GitHub repository, go to Settings -> Webhooks and create a new web hook with the pipeline trigger URL as payload URL. GitHub sends a first request as soon as you save it. You can check whether it was delivered successfully by editing the web hook and opening the Recent Deliveries tab.
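The web hook can also be created with the GitHub REST API endpoint `POST /repos/{owner}/{repo}/hooks`. This is a minimal sketch under some assumptions:

- `GITHUB_ADMIN_TOKEN` is a hypothetical variable holding a token allowed to manage the repository's web hooks (for classic tokens, e.g., the admin:repo_hook scope).
- Only push events are subscribed.
- The function names are illustrative.

```shell
# Sketch: create the web hook of step 7 via the GitHub REST API.
# Assumption: GITHUB_ADMIN_TOKEN (hypothetical) may manage repository web hooks.
webhook_payload() {
    # "web" is the fixed name for repository web hooks; "push" fires on every push
    printf '{"name": "web", "active": true, "events": ["push"], "config": {"url": "%s"}}\n' "$1"
}

create_webhook() {
    local org="$1" repo="$2" trigger_url="$3"
    curl -s -X POST \
         -H "Accept: application/vnd.github+json" \
         -H "Authorization: token ${GITHUB_ADMIN_TOKEN}" \
         "https://api.github.com/repos/${org}/${repo}/hooks" \
         -d "$(webhook_payload "$trigger_url")"
}

# Example: create_webhook GITHUB_ORG GITHUB_REPO \
#     "https://GIT_SERVER/api/v4/projects/PROJECT_ID/ref/main/trigger/pipeline?token=TOKEN"
```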

8. Create a status update token in the GitHub repository

To send the status of the CI/CD pipeline execution to GitHub, we need to authenticate with GitHub. We already created a similar token in step 1. Go to the same page again, but this time create a token with the repo:status scope and a long expiration period. Note down the token.

In the GitLab repository, add a CI/CD variable named GITHUB_API_TOKEN (the name used by the script below) containing this token, and enable the masked flag.

9. Send status updates about the CI/CD status of the GitLab repository

As a last step, we make the status of the CI/CD pipeline visible at GitHub. GitHub shows a yellow dot for a pending status, so a job at the very beginning of the pipeline has to report it. The final status has to be reported at the end. How you structure your pipeline is up to you, but here is how I did it. Add two new pipeline stages, .prenotify and .postnotify, before and after your existing stages:

    stages:
        - .prenotify
        - ...
        - .postnotify

Add a single job to the .prenotify stage:

    notify-github-pending:
        stage: .prenotify
        when: always
        script:
            - .ci/notify_github.sh pending

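For the .postnotify stage, we need two separate jobs to make use of the when: on_success and when: on_failure keywords of GitLab CI/CD. A job with on_success only runs if all jobs in earlier stages succeeded; a job with on_failure only runs if at least one of them failed: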

    notify-github-success:
        stage: .postnotify
        when: on_success
        script:
            - .ci/notify_github.sh success

    notify-github-failure:
        stage: .postnotify
        when: on_failure
        script:
            - .ci/notify_github.sh failure

The notify_github.sh script basically does the following:

Create a JSON file status.json with the fields:
- state: one of pending, success, failure or error
- target_url: a URL to the pipeline, commonly ${CI_PIPELINE_URL} (optional)
- description: a descriptive string like CI runs at NHR@FAU systems <status> (optional)
- context: a label to differentiate between multiple integrated CI/CD systems; I used ci/${CI_SERVER_HOST} (optional)

Send it to GitHub using the token created in step 8. Use ${CI_COMMIT_SHA} to update the right commit at GitHub. Because the GitLab repository is a mirror, the commit hashes are identical on both sides.

    curl -s -X POST -H "Accept: application/vnd.github+json" \
         -H "Authorization: token ${GITHUB_API_TOKEN}" \
         "https://api.github.com/repos/${GITHUB_ORG}/${GITHUB_REPO}/statuses/${CI_COMMIT_SHA}" \
         -d @status.json
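The full script below only sets state, target_url and description. If you also want the optional context field, the payload could be generated by a small helper like the following. This is a minimal sketch under some assumptions:

- The function names are hypothetical.
- Only double quotes and backslashes are escaped; other control characters are assumed not to occur in the values.

```shell
# Sketch: build status.json including the optional "context" field.
# Escapes only '"' and '\'; other control characters are assumed absent.
json_escape() {
    printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

make_status_json() {
    local state="$1" description="$2"
    # Only states accepted by the GitHub commit status API
    case "$state" in
        pending|success|failure|error) ;;
        *) echo "invalid state: $state" >&2; return 1 ;;
    esac
    printf '{"state": "%s", "target_url": "%s", "description": "%s", "context": "%s"}\n' \
        "$state" \
        "$(json_escape "${CI_PIPELINE_URL}")" \
        "$(json_escape "$description")" \
        "$(json_escape "ci/${CI_SERVER_HOST}")"
}

# Usage in a CI job: make_status_json pending "CI runs at NHR@FAU systems pending" > status.json
```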

The complete .ci/notify_github.sh:

    #!/bin/bash
    # Usage: .ci/notify_github.sh {pending|success|failure}
    # Needs the masked CI/CD variable GITHUB_API_TOKEN from step 8.
    GITHUB_ORG="<YOUR_GITHUB_ORG>"
    GITHUB_REPO="<YOUR_GITHUB_REPO>"
    GITHUB_SHA="${CI_COMMIT_SHA}"

    # HTTP headers for the GitHub REST API, read by curl via -H @headers.curl
    cat << EOF > headers.curl
    Accept: application/vnd.github+json
    Authorization: token ${GITHUB_API_TOKEN}
    EOF

    # Map the requested state to a description; reject anything else
    case "$1" in
        success) DESCRIPTION="CI runs at NHR@FAU systems successful" ;;
        failure) DESCRIPTION="CI runs at NHR@FAU systems failed" ;;
        pending) DESCRIPTION="CI runs at NHR@FAU systems pending" ;;
        *) echo "usage: $0 {pending|success|failure}" >&2; exit 1 ;;
    esac

    cat << EOF > status.json
    {
        "state" : "$1",
        "target_url" : "${CI_PIPELINE_URL}",
        "description" : "${DESCRIPTION}"
    }
    EOF

    GITHUB_API_URL="https://api.github.com/repos/${GITHUB_ORG}/${GITHUB_REPO}/statuses/${GITHUB_SHA}"
    curl -s -X POST -H @headers.curl "${GITHUB_API_URL}" -d @status.json

Now, every time the CI/CD pipeline of the GitLab repository runs, its status is visible at GitHub. Unfortunately, I did not find a way to catch pipeline errors at GitLab (bad YAML, ...) and report them to GitHub, because in that case no job is started, not even the notification job. You could, however, use the error state if the synchronization fails.

What if there are changes pushed to the GitLab repository directly?

This is a one-way synchronization from GitHub to GitLab. Do not change anything on the GitLab side! Such changes are either overwritten or deleted by the next mirror push, or they cause the synchronization to fail. If it happens anyway, delete the GitLab repository, import it again (step 1), create a new push token (step 4) and enable CI/CD. Finally, update the token variable in the Sync repository (step 5).

What about syncing multiple repositories?

GitHub organizations often host multiple repositories, each of which might need to be synced to GitLab. You can use a single Sync repository for all of them:

- Create a branch in the Sync repository for each project.
- In each GitHub repository, create a web hook that uses the corresponding branch name as REF_NAME in the trigger URL.
- Create a GitLab repository write token (step 4) in each destination repository, and register each one in the Sync repository as a CI/CD variable with a distinct name (step 5).
- Adapt the .gitlab-ci.yml in each branch to use the respective source and destination URLs and token variable name.
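As a variation of this setup, every branch could run the same script, which looks up source, destination and token variable by branch name (CI_COMMIT_REF_NAME). Then only one mapping has to be maintained. This is a minimal bash sketch; the repository names project-a and project-b and the variables GIT_PUSH_TOKEN_A and GIT_PUSH_TOKEN_B are hypothetical examples.

```shell
# Sketch: select sync source, destination and token variable by branch name.
# All repository and variable names are hypothetical placeholders.
sync_target() {
    # Prints "SOURCE_URL DESTINATION TOKEN_VARIABLE" for a branch of the Sync repository
    case "$1" in
        project-a) echo "https://github.com/GITHUB_ORG/project-a.git GITLAB_SERVER/GITLAB_ORG/project-a.git GIT_PUSH_TOKEN_A" ;;
        project-b) echo "https://github.com/GITHUB_ORG/project-b.git GITLAB_SERVER/GITLAB_ORG/project-b.git GIT_PUSH_TOKEN_B" ;;
        *) echo "no sync target for branch '$1'" >&2; return 1 ;;
    esac
}

push_url() {
    local dst="$1" tokvar="$2"
    # ${!tokvar} is bash indirect expansion: the value of the variable named in tokvar
    printf 'https://whatever:%s@%s\n' "${!tokvar}" "$dst"
}

# In the job script of every branch (bash assumed):
# read -r src dst tokvar <<< "$(sync_target "${CI_COMMIT_REF_NAME}")"
# git clone --mirror "$src" repo.git && cd repo.git && git remote remove origin
# git push --mirror "$(push_url "$dst" "$tokvar")"
```

The branches still have to exist so that each web hook triggers its own pipeline.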

You could also put all sync jobs into the .gitlab-ci.yml of the main branch and use a single trigger URL. However, every change in one GitHub repository would then synchronize all of them, which wastes resources.

In case you want to know

I tried using GitHub Actions to push to the GitLab repository. Regardless of the authentication method (token or SSH key pair), the connection was refused ("Connection refused"), either by the GitHub Actions environment or by the GitLab server.

I cannot use a local GitHub Actions runner, because the only system that could execute it is located in a private network. Therefore, the git repository sync has to run on a globally accessible server.