random bits of computer info: rsync

Monday, April 26, 2021

Rsync between volumes on two different OpenShift clusters

This is a short HOWTO about rsync-ing data between two distinct OpenShift clusters.

You always have the option to oc rsync the data from the source OpenShift cluster to your local workstation and then oc rsync it from your workstation to the target cluster. But if you have half a terabyte of data, you may not have enough local space, or the transfer may take several days because of network bandwidth limitations.

The method I describe below avoids these inefficiencies, and the rsync process is restarted if a network or system glitch kills it.

It basically works by having:

a kubeconfig file with access to the target OpenShift cluster, stored in a secret on the source OpenShift cluster
a pod on the target OpenShift cluster with the target volume mounted
a pod on the source OpenShift cluster with the source volume and the kubeconfig secret mounted, and an entrypoint running oc rsync

So let's start with generating a proper kubeconfig secret.

$ touch /tmp/kubeconfig
$ chmod 600 /tmp/kubeconfig
$ oc login --config=/tmp/kubeconfig # make sure to use target cluster API endpoint
$ oc project my-target-cluster-namespace --config=/tmp/kubeconfig

Note that the commands below run against the source OpenShift cluster.

$ oc login # use source cluster API endpoint
$ oc create secret generic kubeconfig --from-file=config=/tmp/kubeconfig

I will assume that you already have your target pod running inside the target cluster. Otherwise you can create one similar to the source cluster pod below; just use an entrypoint command that keeps it running permanently, for example /bin/sleep 1000000000000000000.
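If you do not have a target pod yet, here is a minimal sketch of one. It only writes a manifest file; the pod name, namespace and mount path match the placeholders used elsewhere in this post, while the container name, the volume name and the claim name target-persistent-volume-claim-name are my assumptions, so replace them with your own. I reuse the origin-cli image on the assumption that it ships rsync, which the rsync strategy needs on the receiving side as well.

```shell
# Write a hypothetical manifest for a long-running pod on the target cluster.
# All names below are placeholders, not values from a real cluster.
cat > target-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: target-pod-name
  namespace: my-target-cluster-namespace
spec:
  containers:
    - image: quay.io/openshift/origin-cli:4.6
      name: sleeper
      # keep the container alive so that oc rsync can exec into it
      command: ["/bin/sleep", "1000000000000000000"]
      volumeMounts:
        - mountPath: /path/to/data/dir
          name: target-data-volume
  volumes:
    - name: target-data-volume
      persistentVolumeClaim:
        # assumed claim name, use your own
        claimName: target-persistent-volume-claim-name
EOF
```

You would then create it against the target cluster, for example with oc create -f target-pod.yaml --config=/tmp/kubeconfig.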

Now all we need to do is run a pod in the source cluster to do the rsync task. Here is an example pod YAML with comments to make clear what to use in your situation:

apiVersion: v1
kind: Pod
metadata:
  name: rsync-pod
  namespace: my-namespace-on-source-cluster
spec:
  containers:
    # use client version ±1 of target OpenShift cluster version
    - image: quay.io/openshift/origin-cli:4.6
      name: rsync
      command:
      - "oc"
      args:
      - "--namespace=my-target-cluster-namespace"
      - "--kubeconfig=/run/secrets/kube/config"
      # insecure TLS is not recommended but is a quick hack to get you going
      - "--insecure-skip-tls-verify=true"
      - "rsync"
      - "--compress=true"
      - "--progress=true"
      - "--strategy=rsync"
      - "/path/to/data/dir/"
      - "target-pod-name:/path/to/data/dir/"
      volumeMounts:
        - mountPath: /path/to/data/dir
          name: source-data-volume
        - mountPath: /run/secrets/kube
          name: kubeconfig
          readOnly: true
  # restart policy will keep restarting your pod until rsync completes successfully
  restartPolicy: OnFailure
  terminationGracePeriodSeconds: 30
  volumes:
    - name: source-data-volume
      persistentVolumeClaim:
        claimName: source-persistant-volume-claim-name
    - name: kubeconfig
      secret:
        defaultMode: 420
        secretName: kubeconfig

The last command needed creates this pod inside the source cluster:

$ oc create -f rsync-pod.yaml

Now check what state your pod is in:

$ oc describe pod rsync-pod

If it starts properly, monitor the progress:

$ oc logs -f rsync-pod
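Because the pod is restarted on failure, the log stream may end before the copy has finished. A rough sketch of a loop that waits for the pod to reach a final phase is below; wait_for_pod is a hypothetical helper name of mine, and the 30 second default interval is arbitrary.

```shell
# Hypothetical helper: poll the pod phase until the rsync pod is done.
# Usage: wait_for_pod POD_NAME [INTERVAL_SECONDS]
wait_for_pod() {
  pod="$1"
  interval="${2:-30}"
  while true; do
    # .status.phase is one of Pending, Running, Succeeded, Failed, Unknown
    phase="$(oc get pod "$pod" -o jsonpath='{.status.phase}')"
    case "$phase" in
      Succeeded) echo "$pod finished"; return 0 ;;
      Failed)    echo "$pod failed" >&2; return 1 ;;
      *)         sleep "$interval" ;;
    esac
  done
}
```

For example, wait_for_pod rsync-pod 60 checks once a minute. With restartPolicy: OnFailure a failing rsync normally shows up as container restarts rather than a Failed phase, so oc describe pod remains the place to look if the loop never returns.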

Monday, February 15, 2016

rsync to/from OpenShift v3 pods

Update: Some people pointed out that the `oc rsync` command already exists. Shame on me, I missed that. My only solace is that it does not support all rsync options (yet). Read below only if the standard command does not work for you (or if you are curious to know how it works).

I was thinking about an easy way to copy files to/from OpenShift pods and thought it would be awesome if I could make `rsync` use `oc exec` instead of ssh to do that. This should not be a common use case, as pods should generally be stateless, but one may want to back up data from a persistent volume, for example when the environment is not under the pod owner's control.

You may already know that ssh access is not available to OpenShift/Kubernetes managed pods, but at least in OpenShift one can use the client tool to access them in an ssh-like fashion. That's done using the `oc rsh` and `oc exec` sub-commands.

In fact `oc rsh` only wraps `oc exec`, adding the options `-i` to pass stdin to the remote process as well as `-t` for a terminal. For rsync we only need `-i` though. Here's the magic incantation:

$ rsync -av -e 'oc exec -n fs1d4 -i myapp-1-vytqm' -- /tmp/ec2-user/ --:haha/
sending incremental file list
created directory haha
./
gagag
brewer/

sent 142 bytes  received 38 bytes  72.00 bytes/sec
total size is 4  speedup is 0.02

Let me dissect that for you. First we use the `-a` option of rsync because I want a recursive sync that keeps all file properties, as well as `-v` to see what actually happened. You can add and mix any other `rsync` options here like `--delete`, `--exclude`, etc.

Then we specify the rsh command and that is 'oc exec -n fs1d4 -i myapp-1-vytqm' where `fs1d4` is your project name and `myapp-1-vytqm` is the desired pod name. You can use any other `oc` option here like `--config`, `--container`, etc. The important points are:

keep the `-i` option so that rsync can talk to remote process
do not include the `--` option terminator so that rsync can later add it

After that we use the `--` option terminator to tell rsync to treat any further command parameters as path location specifiers and not options. This is important because our hack forces us to use `--` as a hostname in the SRC or DST location specifier. More on that later.

Our SRC location specifier is `/tmp/ec2-user/` and that is a local test directory. The DST location specifier is `--:haha/` which means relative path `haha/` on host `--`. The reason is that if we only specify "haha/", then rsync will consider that location local and will not invoke the remote shell. So we need to specify some hostname. But whatever we specify, it will break our `oc exec` command. So I figured I can specify `--` for the hostname and rsync will just append it to the remote shell command. And in fact we need `--` appended, otherwise the remote call to `rsync` will fail.

You may already know that the local `rsync` calls `rsync` on the remote end and then the two processes communicate over stdin/stdout. To keep `oc exec` from interpreting the options of the remote `rsync` invocation, we need that `--` as a hostname. And to keep `--:haha/` from being interpreted as an option to the local `rsync` invocation, we need the first `--` above.
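To make the role of the fake hostname more visible, here is a small illustration of how the remote command line ends up being assembled. It does not call rsync or `oc` at all, it only concatenates the same pieces rsync would; the exact `--server` flags differ between rsync versions, so I put a `<flags>` placeholder there instead of guessing them.

```shell
# Illustration only: mimic how rsync builds the command it hands to the remote shell.
rsh_cmd='oc exec -n fs1d4 -i myapp-1-vytqm'   # the value given to rsync -e
host='--'                                     # the "hostname" part of --:haha/
remote_path='haha/'                           # the path part of --:haha/

# rsync appends the hostname to the rsh command, then its own server-side call.
# <flags> is a placeholder for the version-dependent --server option string.
remote_cmd="$rsh_cmd $host rsync --server <flags> . $remote_path"
echo "$remote_cmd"
```

The `--` lands exactly where `oc exec` expects its option terminator, so everything after it is passed to the pod as the command to run.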

I doubt my ability to explain, but hopefully I gave you some pointers on how to make it work. Here are a few other important points:

make sure your pod has `rsync` installed and available on the pod's PATH, or use the `--rsync-path` option
make sure you have write access to the destination dir
make sure you're already logged into OpenShift (oc login)
make sure to write correct commands, as mistakes usually produce uninformative output (`strace -f` helps to debug)

Hope that helps somebody.