Apache Airflow Deployment

Contributors netapp-dorianh kevin-hoke mboglesby Download PDF of this page

NetApp recommends running Apache Airflow on top of Kubernetes. This section describes the tasks that you must complete to deploy Airflow in your Kubernetes cluster.

It is possible to deploy Airflow on platforms other than Kubernetes. Deploying Airflow on platforms other than Kubernetes is outside of the scope of this solution.

Prerequisites

Before you perform the deployment exercise that is outlined in this section, we assume that you have already performed the following tasks:

  1. You already have a working Kubernetes cluster.

  2. You have already installed and configured NetApp Trident in your Kubernetes cluster as outlined in the section “NetApp Trident Deployment and Configuration.”

Install Helm

Airflow is deployed using Helm, a popular package manager for Kubernetes. Before you deploy Airflow, you must install Helm on the deployment jump host. To install Helm on the deployment jump host, follow the installation instructions in the official Helm documentation.

Set Default Kubernetes StorageClass

Before you deploy Airflow, you must designate a default StorageClass within your Kubernetes cluster. The Airflow deployment process attempts to provision new persistent volumes using the default StorageClass. If no StorageClass is designated as the default StorageClass, then the deployment fails. To designate a default StorageClass within your cluster, follow the instructions outlined in the section Kubeflow Deployment. If you have already designated a default StorageClass within your cluster, then you can skip this step.

Use Helm to Deploy Airflow

To deploy Airflow in your Kubernetes cluster using Helm, perform the following tasks from the deployment jump host:

  1. Deploy Airflow using Helm by following the deployment instructions for the official Airflow chart on the Artifact Hub. The example commands that follow show the deployment of Airflow using Helm. Modify, add, and/or remove values in the custom- values.yaml file as needed depending on your environment and desired configuration.

    $ cat << EOF > custom-values.yaml
    ###################################
    # Airflow - Common Configs
    ###################################
    airflow:
      ## the airflow executor type to use
      ##
      executor: "CeleryExecutor"
      ## environment variables for the web/scheduler/worker Pods (for airflow configs)
      ##
      #
    ###################################
    # Airflow - WebUI Configs
    ###################################
    web:
      ## configs for the Service of the web Pods
      ##
      service:
        type: NodePort
    ###################################
    # Airflow - Logs Configs
    ###################################
    logs:
      persistence:
        enabled: true
    ###################################
    # Airflow - DAGs Configs
    ###################################
    dags:
      ## configs for the DAG git repository & sync container
      ##
      gitSync:
        enabled: true
        ## url of the git repository
        ##
        repo: "git@github.com:mboglesby/airflow-dev.git"
        ## the branch/tag/sha1 which we clone
        ##
        branch: master
        revision: HEAD
        ## the name of a pre-created secret containing files for ~/.ssh/
        ##
        ## NOTE:
        ## - this is ONLY RELEVANT for SSH git repos
        ## - the secret commonly includes files: id_rsa, id_rsa.pub, known_hosts
        ## - known_hosts is NOT NEEDED if `git.sshKeyscan` is true
        ##
        sshSecret: "airflow-ssh-git-secret"
        ## the name of the private key file in your `git.secret`
        ##
        ## NOTE:
        ## - this is ONLY RELEVANT for PRIVATE SSH git repos
        ##
        sshSecretKey: id_rsa
        ## the git sync interval in seconds
        ##
        syncWait: 60
    EOF
    $ helm install airflow airflow-stable/airflow -n airflow --version 8.0.8 --values ./custom-values.yaml
    ...
    Congratulations. You have just deployed Apache Airflow!
    1. Get the Airflow Service URL by running these commands:
       export NODE_PORT=$(kubectl get --namespace airflow -o jsonpath="{.spec.ports[0].nodePort}" services airflow-web)
       export NODE_IP=$(kubectl get nodes --namespace airflow -o jsonpath="{.items[0].status.addresses[0].address}")
       echo http://$NODE_IP:$NODE_PORT/
    2. Open Airflow in your web browser
  2. Confirm that all Airflow pods are up and running. It may take a few minutes for all pods to start.

    $ kubectl -n airflow get pod
    NAME                                READY   STATUS    RESTARTS   AGE
    airflow-flower-b5656d44f-h8qjk      1/1     Running   0          2h
    airflow-postgresql-0                1/1     Running   0          2h
    airflow-redis-master-0              1/1     Running   0          2h
    airflow-scheduler-9d95fcdf9-clf4b   2/2     Running   2          2h
    airflow-web-59c94db9c5-z7rg4        1/1     Running   0          2h
    airflow-worker-0                    2/2     Running   2          2h
  3. Obtain the Airflow web service URL by following the instructions that were printed to the console when you deployed Airflow using Helm in step 1.

    $ export NODE_PORT=$(kubectl get --namespace airflow -o jsonpath="{.spec.ports[0].nodePort}" services airflow-web)
    $ export NODE_IP=$(kubectl get nodes --namespace airflow -o jsonpath="{.items[0].status.addresses[0].address}")
    $ echo http://$NODE_IP:$NODE_PORT/
  4. Confirm that you can access the Airflow web service.

Error: Missing Graphic Image