Why You Should Be Using Kubernetes to Run Your Containerized Jobs

May 15, 2024

Russell Bates

Understanding Kubernetes Jobs

Kubernetes Jobs are a way to run one or more pods to completion on a cluster. A Job creates pods and ensures they run successfully until a specified number of completions is reached. This makes Jobs useful for running batch processes or parallel computational workloads.

How Jobs Work

When you create a Job, it starts running the pods specified in the Job’s template. As pods complete successfully, the Job keeps track of the successes. Once the desired number of successful pod completions is reached, the Job itself is considered complete.

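If a pod fails or gets deleted, the Job retries the work, either by restarting the failed container in place (restartPolicy: OnFailure) or by creating a replacement pod (restartPolicy: Never), until either:

1) A pod succeeds
2) The number of retries reaches the Job’s backoffLimit, at which point the Job is marked as failed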

You can set the Job’s parallelism field to run several pods at the same time, and its completions field to say how many successful pods are required. This lets work that can be divided into independent pieces finish faster by spreading the load across pods.
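
As a rough sketch of how parallelism combines with the completions count, the manifest below asks for six successful pods while running at most two at a time. The Job name, the busybox image, and the echo command are placeholders chosen for illustration; only the completions and parallelism fields are the point here.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-demo        # hypothetical name
spec:
  completions: 6             # the Job is complete once 6 pods have succeeded
  parallelism: 2             # run at most 2 pods at the same time
  template:
    spec:
      containers:
      - name: worker         # placeholder container standing in for real work
        image: busybox:1.36
        command: ["sh", "-c", "echo processing one unit of work && sleep 5"]
      restartPolicy: Never
```

With these values the Job should work through its six completions roughly two pods at a time, assuming the cluster has room to schedule both.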

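Jobs make it easy to run batch jobs or parallel processing reliably on Kubernetes. When a Job finishes, its pods are normally kept so you can still read their logs and status; they are removed when you delete the Job or, if you set ttlSecondsAfterFinished, automatically after that many seconds.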

Example: Calculating Pi

Let’s look at an example Job that calculates the value of pi to 2000 decimal places using Perl:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4

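This Job runs a single pod using the perl:5.34.0 image to execute the bpi function and print pi to 2000 places. The restartPolicy is set to Never, so a failed container is not restarted in place; instead the Job creates a replacement pod, up to the backoffLimit of 4 retries. Job pods may only use Never or OnFailure as their restartPolicy.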
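
For comparison, the other restartPolicy that Job pods accept is OnFailure. The variant below is a sketch of the same pi Job with that setting, assuming you would rather have the kubelet restart a failed container inside the existing pod than have the Job create a new pod. The name pi-onfailure is made up for this example.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-onfailure             # hypothetical name, to avoid clashing with "pi"
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: OnFailure   # restart the failed container in the same pod
  backoffLimit: 4                # container restarts still count against this limit
```

One trade-off to keep in mind: with OnFailure the pod is terminated once the backoff limit is reached, which can make a failing run harder to debug than with Never.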

Save the manifest as job.yaml, create the Job with kubectl apply -f job.yaml, and monitor its status and logs with:

kubectl describe job pi
kubectl logs job/pi

The logs will show the calculated value of pi once the Job succeeds.

Jobs are a great way to run parallel or batch workloads reliably on Kubernetes. The backoffLimit and parallelism settings allow you to control the Job’s execution. For recurring scheduled Jobs, you’ll want to look at CronJobs instead.
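
For the recurring case, a CronJob wraps the same kind of pod template in a schedule. The following is a minimal sketch, assuming you want the pi calculation to run once a day at 03:00. The name pi-nightly and the schedule are illustrative choices, not taken from the example above.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pi-nightly             # hypothetical name
spec:
  schedule: "0 3 * * *"        # standard cron syntax: every day at 03:00
  jobTemplate:
    spec:                      # same fields as a Job's spec
      template:
        spec:
          containers:
          - name: pi
            image: perl:5.34.0
            command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          restartPolicy: Never
      backoffLimit: 4
```

Each time the schedule fires, the CronJob creates a new Job from jobTemplate, so everything said here about backoffLimit and restartPolicy applies to each run.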

Running a Kubernetes Job to Calculate Pi

Kubernetes Jobs are useful for running batch processes or parallel workloads to completion on a cluster. In this tutorial, we’ll run a simple Job that calculates the value of pi to 2000 decimal places using Perl.

Job Configuration

First, let’s look at the Job configuration:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4

A few key points about this Job:

  • It uses the official perl:5.34.0 image
  • The command runs the bpi function to calculate pi to 2000 places
  • restartPolicy is set to Never, so a failed pod is replaced by a new pod rather than restarted in place
  • backoffLimit is set to 4, so the Job is marked as failed once 4 retries have been used up
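
To see how restartPolicy and backoffLimit interact, it can help to run a Job that is guaranteed to fail. The sketch below is a throwaway test fixture rather than part of the pi example: the name always-fails, the busybox image, and the exit 1 command are all assumptions made for illustration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: always-fails           # hypothetical name
spec:
  backoffLimit: 2              # give up after 2 retries
  template:
    spec:
      containers:
      - name: fail
        image: busybox:1.36
        command: ["sh", "-c", "echo simulating a failure; exit 1"]
      restartPolicy: Never     # each retry is a new pod, not an in-place restart
```

Retries are spaced out with an increasing back-off delay, so the Job takes a little while to give up. Once it does, kubectl describe job always-fails should report the Job as failed with the reason BackoffLimitExceeded.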

Creating the Job

To create the Job, simply run:

kubectl apply -f job.yaml

You should see output like: job.batch/pi created

Checking Job Status

You can check on the status of the running Job with:

kubectl describe job pi

This will show details like the number of active pods, completion time, and recent events.

Once the Job completes successfully, the Pods Statuses line will show 1 Succeeded.

Viewing Logs

To view the logs and get the calculated value of pi, you can run:

kubectl logs job/pi

This will output the full 2000 decimal places that were calculated.

You can also view the logs of the individual pod that ran by first getting the pod name:

pods=$(kubectl get pods --selector=job-name=pi --output=jsonpath='{.items[*].metadata.name}')
kubectl logs $pods

Cleaning Up

When the Job is complete, its pods are not removed automatically; they stay around so you can still inspect their logs. Deleting the Job object also deletes the pods it created:

kubectl delete job pi
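
If you would rather not delete finished Jobs by hand, the Job spec has a ttlSecondsAfterFinished field. As a sketch, assuming a 100-second retention window is acceptable (the value and the name pi-with-ttl are arbitrary choices for this example), the pi Job could be written as:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl              # hypothetical name
spec:
  ttlSecondsAfterFinished: 100   # delete the Job and its pods 100s after it finishes
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
```

The timer starts when the Job finishes, whether it completed or failed, so make sure the window is long enough to collect any logs you need.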

This quick tutorial showed how to run a simple batch Job on Kubernetes to calculate pi. Jobs make it easy to run parallelized or discrete workloads reliably on a cluster.

Go deeper with Kubernetes jobs. Check out the official documentation.