One-way syncing of GitHub to GitLab repositories#
Introduction#
In software development, it is crucial to use a version control system (VCS) to manage the code base. The most common VCS nowadays is Git, with GitHub and GitLab as well-known hosting providers. While GitHub has more publicity, GitLab can be self-hosted. The RRZE provides two GitLab instances for the university, and the NHR@FAU group provides Cx services for these two instances. Some users at the university keep their main repository at GitHub but still want to use the Cx services attached to the GitLab instances. The NHR@FAU group has this use case with its software tools like LIKWID (GitHub repository), which requires physical access to the hardware and therefore cannot be tested properly in containerized or virtualized environments.
Why this page?#
Although GitLab is open-source and can be self-hosted, some features are only available with specific licenses, and none of the GitLab instances at RRZE has such a license (anymore). One of these features was "running Cx for a remote repository", which involved "pull mirroring". With "pull mirroring", the GitLab server integrates with the GitHub repository to pull changes and run Cx locally. The open-source license contains only "push mirroring". This page explains how to set up the one-way synchronization of the repositories using only freely usable features.
Situation#
What we have#
- Original repository at GitHub (like LIKWID)
- A local GitLab instance with open-source license (like the two GitLab servers operated by the RRZE [1], [2])
What we need#
- One-way synchronization of the original GitHub repository with a repository at a local GitLab instance
- Run the local CI/CD services
- Return CI/CD results back to original GitHub repository
Naming#
In the following we use distinct names for the repositories to make clear where things should be changed:
- GitHub repository: the original GitHub repository and the source for the one-way synchronization.
- GitLab repository: the duplicate repository of the GitHub repository and the destination of the one-way synchronization.
- Sync repository: The auxiliary repository used for synchronization.
How it is done#
1. Import remote GitHub repository to GitLab, the "GitLab repository"#
GitLab provides, with all licenses, a feature called "Import project" which copies a remote repository. To import projects from GitHub using the GitHub-specific importer, you need a GitHub API token. Go to https://github.com/settings/tokens (or Profile -> Settings -> Developer settings -> Personal access tokens) and create a new token with the repo scope. This token is used just for importing, so a short expiration period is enough. Paste the token into GitLab's importer and it retrieves the list of accessible repositories from GitHub. Import your repository; this will take some time. You can already adjust the settings of the repository, e.g., enable the NHR@FAU Cx services.
Moreover, unprotect your main branch in the GitLab repository at Settings -> Repository -> Protected branches if you want to receive changes to this branch as well.
2. Set up a separate repository at GitLab, the "Sync repository"#
There are multiple ways to achieve this, but the simplest solution is a separate repository at GitLab whose CI/CD pipeline updates the duplicated repository. So create an empty repository and enable CI/CD in its settings.
3. Create a pipeline trigger in the Sync repository#
Go to Settings -> CI/CD -> Pipeline triggers and create a new trigger token. Note down the token. GitLab presents several options for triggering the pipeline. We will use the web hook:
https://GIT_SERVER/api/v4/projects/PROJECT_ID/ref/REF_NAME/trigger/pipeline?token=TOKEN
The PROJECT_ID is commonly already filled in; if not, Settings -> General shows it. The REF_NAME is main by default at GitLab.
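To check the trigger before wiring it to GitHub, the URL can be assembled and called by hand. The following is only a minimal sketch: the function names and the example values (gitlab.example.org, project ID 1234, the literal TOKEN) are placeholders and not part of the setup described here.

```shell
#!/bin/bash
# Assemble the pipeline trigger URL from its four parts.
# All argument values are placeholders; use those of your Sync repository.
build_trigger_url() {
    local server="$1" project_id="$2" ref_name="$3" token="$4"
    printf 'https://%s/api/v4/projects/%s/ref/%s/trigger/pipeline?token=%s\n' \
        "${server}" "${project_id}" "${ref_name}" "${token}"
}

# Fire the trigger manually with an HTTP POST, as a web hook would do.
# Not called here because it needs network access and a valid token.
trigger_pipeline() {
    curl --silent --fail --request POST "$(build_trigger_url "$@")"
}

# Print the URL for the example values
build_trigger_url "gitlab.example.org" "1234" "main" "TOKEN"
```

Calling trigger_pipeline with the real values should start a pipeline in the Sync repository, which is a quick way to verify the token before creating the web hook.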
4. Create an access token in the GitLab repository#
There are multiple ways to authenticate; here we use a Project Access Token. You can also use an SSH key pair.
Go to Settings -> Access Tokens and create a new token with a scope that permits pushing to the repository (write_repository). Note down the token.
5. Register the GitLab repository write token in the Sync repository#
Add a CI/CD variable GIT_PUSH_TOKEN with the Project Access Token of the GitLab repository at Settings -> CI/CD -> Variables with the masked flag enabled.
6. Create sync CI pipeline in Sync repository#
Create a file .gitlab-ci.yml or use the GitLab pipeline editor:
update-repo:
  script:
    - git clone --mirror https://github.com/GITHUB_ORG/GITHUB_REPO.git
    - cd GITHUB_REPO.git
    - git remote remove origin
    - git push --mirror "https://whatever:${GIT_PUSH_TOKEN}@GITLAB_SERVER/GITLAB_ORG/GITLAB_REPO.git"
First, the source repository is cloned with the --mirror flag and we change into the newly created folder. Afterwards we remove the remote origin because the --mirror option of git push behaves as follows: "Newly created local refs will be pushed to the remote end, locally updated refs will be force updated on the remote end, and deleted refs will be removed from the remote end." (see the git push man page). We don't want to push anything to our origin, the repository at GitHub (https://github.com/GITHUB_ORG/GITHUB_REPO.git). Finally, we push everything to the duplicated GitLab repository, also using the --mirror option, and authenticate with the GitLab repository write token.
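The same sequence of commands can be tried out locally without touching GitHub or GitLab. The sketch below is an illustration only: it uses two throwaway repositories in a temporary directory as stand-ins for the GitHub repository and the GitLab repository, and all paths, the tag name and the commit identity are made up for the example.

```shell
#!/bin/bash
set -euo pipefail

work="$(mktemp -d)"

# Stand-in for the GitHub repository: one empty commit and one tag
git init -q "${work}/source"
git -C "${work}/source" -c user.name="Example" -c user.email="example@example.org" \
    commit -q --allow-empty -m "initial commit"
git -C "${work}/source" tag v0.1

# Stand-in for the GitLab repository: an empty bare repository
git init -q --bare "${work}/destination.git"

# The steps of the sync pipeline, with local paths instead of URLs
git clone -q --mirror "${work}/source" "${work}/mirror.git"
git -C "${work}/mirror.git" remote remove origin
git -C "${work}/mirror.git" push -q --mirror "${work}/destination.git"

# Branches and tags of both sides, as "refname objectname" lines
src_refs="$(git -C "${work}/source" for-each-ref --format='%(refname) %(objectname)')"
dst_refs="$(git -C "${work}/destination.git" for-each-ref --format='%(refname) %(objectname)')"

if [ "${src_refs}" = "${dst_refs}" ]; then
    echo "refs are identical"
else
    echo "refs differ"
fi
```

Comparing the ref lists of both sides is a simple way to confirm that branches and tags arrived at the destination with the same commits.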
7. Create a web hook in the GitHub repository#
Everything is now set up for the one-way synchronization: as soon as the pipeline in the Sync repository is triggered, all changes are mirrored from the GitHub repository to the GitLab repository. Now we use the pipeline trigger created in 3. Go to the GitHub repository's Settings -> Webhooks and create a new web hook using the pipeline trigger URL. GitHub will trigger the web hook as soon as you create it. You can check whether it was transmitted successfully by editing it and checking the Recent deliveries tab.
8. Create a status update token in the GitHub repository#
In order to send the status of the CI/CD pipeline execution, we need to authenticate with GitHub. In the first step, we already created a similar token. Go to the token page again, but this time create a token with the scope repo:status and a long expiration period. Note down the token.
Add a CI/CD variable containing the token, e.g. GITHUB_API_TOKEN, in the GitLab repository and enable the masked switch.
9. Send status updates about the CI/CD status of the GitLab repository#
As a last step, we want the status of the CI/CD pipeline execution to be visible in GitHub. GitHub shows a yellow dot while the pipeline is running, so we need to send a notification at the beginning of our pipeline. The final status should be transmitted as well. How your pipeline is structured is up to you, but here is how I did it. Add two new pipeline stages .prenotify and .postnotify surrounding your current stages, i.e. .prenotify comes first and .postnotify comes last in the list of stages.
Add a single job to the .prenotify stage which calls the notification script with the pending state (.ci/notify_github.sh pending).
For the .postnotify stage, we need two separate jobs to use the when: on_success and when: on_failure features of GitLab CI/CD:
notify-github-success:
  stage: .postnotify
  when: on_success
  script:
    - .ci/notify_github.sh success
notify-github-failure:
  stage: .postnotify
  when: on_failure
  script:
    - .ci/notify_github.sh failure
The notify_github.sh script basically does the following:
- Create JSON file status.json with fields:
  - state: one of pending, success, failure or error
  - target_url: a URL to the pipeline, commonly ${CI_PIPELINE_URL} (optional)
  - description: some descriptive string like "CI runs at NHR@FAU systems <status>" (optional)
  - context: can be used to differentiate later if you have multiple CI/CD systems integrated. I used ci/${CI_SERVER_HOST} (optional)
- Send it to GitHub using the token created in 8. Use ${CI_COMMIT_SHA} to update the right commit at GitHub.
notify_github.sh
#!/bin/bash
# Usage: notify_github.sh <pending|success|failure>

GITHUB_ORG="<YOUR_GITHUB_ORG>"
GITHUB_REPO="<YOUR_GITHUB_REPO>"
GITHUB_SHA="${CI_COMMIT_SHA}"

# Request headers; GITHUB_API_TOKEN is the CI/CD variable created in 8.
cat << EOF > headers.curl
Accept: application/vnd.github+json
Authorization: token ${GITHUB_API_TOKEN}
EOF

# One payload file per state
cat << EOF > success.json
{
    "state" : "success",
    "target_url" : "${CI_PIPELINE_URL}",
    "description" : "CI runs at NHR@FAU systems successful"
}
EOF
cat << EOF > failure.json
{
    "state" : "failure",
    "target_url" : "${CI_PIPELINE_URL}",
    "description" : "CI runs at NHR@FAU systems failed"
}
EOF
cat << EOF > pending.json
{
    "state" : "pending",
    "target_url" : "${CI_PIPELINE_URL}",
    "description" : "CI runs at NHR@FAU systems pending"
}
EOF

GITHUB_API_URL="https://api.github.com/repos/${GITHUB_ORG}/${GITHUB_REPO}/statuses/${GITHUB_SHA}"

# Send the payload that matches the state given as first argument
if [ "$1" == "success" ]; then
    curl -s -X POST -H @headers.curl "${GITHUB_API_URL}" -d @success.json
elif [ "$1" == "failure" ]; then
    curl -s -X POST -H @headers.curl "${GITHUB_API_URL}" -d @failure.json
elif [ "$1" == "pending" ]; then
    curl -s -X POST -H @headers.curl "${GITHUB_API_URL}" -d @pending.json
else
    echo "Usage: $0 <pending|success|failure>" >&2
    exit 1
fi
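The three payload files in the script differ only in the state, so they could also be generated from a single template. The following sketch is one possible variant, not the script used at NHR@FAU: the function name status_payload is made up, the example.org values after ':-' are placeholder fallbacks so that it also runs outside a GitLab job, and it additionally sets the optional context field.

```shell
#!/bin/bash
# Print the JSON payload for a commit status on stdout.
# Values after ':-' are placeholders used when the GitLab CI variables are unset.
status_payload() {
    local state="$1"
    # Accept only the states known to the GitHub commit status API
    case "${state}" in
        pending|success|failure|error) ;;
        *) echo "unknown state: ${state}" >&2; return 1 ;;
    esac
    cat << EOF
{
  "state": "${state}",
  "target_url": "${CI_PIPELINE_URL:-https://gitlab.example.org/pipelines/1}",
  "description": "CI runs at NHR@FAU systems: ${state}",
  "context": "ci/${CI_SERVER_HOST:-gitlab.example.org}"
}
EOF
}

# Show the payload for a running pipeline
status_payload "pending"
```

The output could be piped into curl with -d @- instead of writing one JSON file per state.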
Now every time the CI/CD pipeline of the GitLab repository runs, the status is visible at GitHub. Unfortunately, I didn't find a way to catch pipeline errors (bad YAML, ...) at GitLab to notify GitHub of the error, but you could use the error state if the synchronization fails.
See also: GitHub docs related to commit status updates
What if there are changes pushed to the GitLab repository directly?#
This is a one-way synchronization from GitHub to GitLab. Don't change anything on the GitLab side because this probably causes the synchronization to fail! If it happens, delete the duplicated repository at GitLab, import it again, create a new push token and enable CI/CD. Finally, update the token in the Sync repository.
What about syncing multiple repositories?#
GitHub organizations often host multiple repositories, and each of them might require syncing to GitLab. You can use the Sync repository to sync all of them. Create a branch for each project. Afterwards, create a web hook for each branch using the branch name as REF_NAME in the web hook URL.
Create a GitLab repository write token (4.) in each destination repository and add the tokens to the CI/CD variables in the Sync repository (5.) with distinct names. Update the .gitlab-ci.yml files in the branches with the different source and destination URLs and the variable names.
You could also add all sync jobs to the .gitlab-ci.yml in the main branch and use a single web hook, but that wastes resources because every web hook call would sync all repositories.
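With one branch per project, the web hook URLs differ only in the REF_NAME part. As a small sketch, they could be listed like this; the server name, project ID, token and the two branch names are placeholders invented for the example.

```shell
#!/bin/bash
# Placeholders: replace with the values of your Sync repository.
GIT_SERVER="gitlab.example.org"
PROJECT_ID="1234"
TOKEN="TOKEN"

# Print one web hook URL per branch of the Sync repository.
# Each argument is a branch name, used as REF_NAME in the URL.
print_webhook_urls() {
    local branch
    for branch in "$@"; do
        printf '%s: https://%s/api/v4/projects/%s/ref/%s/trigger/pipeline?token=%s\n' \
            "${branch}" "${GIT_SERVER}" "${PROJECT_ID}" "${branch}" "${TOKEN}"
    done
}

# Hypothetical branch names, one per GitHub repository to sync
print_webhook_urls "likwid" "another-tool"
```

Each printed URL then goes into the web hook settings of the corresponding GitHub repository.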
In case you want to know#
I tried using GitHub Actions that push to a remote repository at GitLab, but independent of the authentication used (token or SSH key pair), the GitHub Action or the GitLab server did not allow connections (Connection refused).
I cannot use a local GitHub Actions runner because the system where the runner could be executed is located in a private network, so we need a globally accessible server where we can execute the git repository sync.