Adding Terraform NSX configurations to a multibranch CI/CD Pipeline

Managing our network with an infrastructure as code (IaC) approach means using the same methodologies and processes we would use for the application code. Even better would be to treat the network configurations required to support the application as an integral part of the app itself. But what exactly does this mean in practice? One option is to “code” the required configuration in a script and make it part of the application’s version control repository. This way, when we run our integration and/or deployment pipeline, the infrastructure is prepared and tested in conjunction with the application itself, following the same process and guaranteeing that everything is tested end to end.

To wrap my head around these concepts, I put together a mock scenario where a simple application (a static web server) is first deployed in a test environment. Network and security configurations are dynamically created based on the content of the development branch of the application’s GitHub repository. After the application is validated in the test environment, the development branch is merged into the production branch, triggering the deployment in the production environment.

The app is packaged as a vSphere virtual machine template that already includes the NGINX web server bits and configuration. In this example, I assume the application has already been built and packaged. This keeps things simple so that I can focus on the network and security piece. It is also a valid approach in the real world, where you may want the exact same bits you tested in a QA environment to go into production without the risk of rebuilding and repackaging.

The NSX configuration is coded in Terraform files. I picked Terraform for its declarative nature and for its ability (compared to the native NSX Policy API) to orchestrate multiple IaaS providers, in this case NSX and vSphere. The topology deployed is very simple: each instance of the application requires a dedicated T1 gateway with NAT enabled (the IP address of the web virtual machine is hardcoded in the template). Depending on whether the application is deployed in QA or production, the web server IP is translated to a different public IP. In this example, I always deploy to the same SDDC, but it would not be difficult to extend the solution to use a separate pair of vCenter and NSX instances for each environment.
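As a rough illustration, the per-tenant T1-plus-NAT topology could be sketched with the NSX-T provider resources along these lines. This is a hedged sketch, not the actual repository code: the edge cluster data source, the router display name, and the translated IP 192.168.10.10 are all assumptions for illustration.

```hcl
# Hypothetical sketch: a dedicated T1 router per tenant, with a DNAT rule
# translating the environment-specific public NAT IP to the web VM's
# hardcoded private address (192.168.10.10 is an assumed value).
resource "nsxt_logical_tier1_router" "tenant" {
  display_name    = "T1-${var.tenant_name}"
  edge_cluster_id = "${data.nsxt_edge_cluster.ec.id}"
}

resource "nsxt_nat_rule" "web_dnat" {
  logical_router_id         = "${nsxt_logical_tier1_router.tenant.id}"
  action                    = "DNAT"
  match_destination_network = "${var.nat_ip}/32"
  translated_network        = "192.168.10.10"
}
```

Because `nat_ip` and `tenant_name` are variables, the same configuration can serve both environments with different `-var` values at plan time.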

Solution Workflow Overview


1. The Cloud Admin is testing a configuration change for the networking supporting the application and pushes this change to the development branch of the repository.

2. Jenkins detects a change in the development branch and proceeds to deploy the application together with the required infrastructure in the QA environment


3. At this point, end-to-end tests for the application can be performed. Ideally, those tests should be automated and coded as part of the pipeline. Alternatively, they can be performed manually, and the operator can inform Jenkins of their success or not. In this example, I completed the tests manually, and they consist of simply browsing the application web page. Once the application has been verified, Jenkins will destroy the test deployment.


4. After the tests have been successful, the application can be promoted to production with confidence. We want to use the same infrastructure code we tested in our QA environment when deploying to production, so we merge the development branch into the production branch.

5. Jenkins detects the change in the production branch of the repository and deploys the app in production using the exact same configuration we used in the QA environment, except for the NAT IP, which is passed as a parameter to Terraform.

Demo

References

GitHub repository used in the demo

Taking a closer look

If you have read this far and also watched the video demo, you probably have seen enough. That said, some people may be interested in how exactly I put the demo together. Although everything is in the GitHub repository, I thought I would provide some additional information about the components here.

The Jenkins Pipeline

The configuration of the Jenkins pipeline on the Jenkins server is actually straightforward. It’s a vanilla multibranch pipeline configured to check my GitHub repository at https://github.com/lcamarda/terraform-jenkins.git


The only additional configuration to be performed is setting up the periodic scan of the repository. Jenkins performs the scan every minute, which worked well for the demo but is probably too frequent for any practical scenario; you would want to look at a webhook for a more elegant solution. The build configuration (the actual actions performed by the pipeline) is retrieved from a file named Jenkinsfile located in the GitHub repository. In this way, the pipeline itself, which describes our deployment process, is also covered by version control.


You can look at the entire pipeline script here

Below you can see how the actions (called steps) for each stage are coded through the Jenkins declarative pipeline syntax. The steps in each stage are performed only if the when condition is matched. In the first stage, we simply fetch the code for the branch we are building.

pipeline {
    agent any
    stages {
        stage('fetch_latest_code_dev') {
            when { branch 'development' }
            steps {
                git branch: "development", url: "https://github.com/lcamarda/terraform-jenkins.git"
            }
        }
        stage('fetch_latest_code_prod') {
            when { branch 'production' }
            steps {
                git branch: "production", url: "https://github.com/lcamarda/terraform-jenkins.git"
            }
        }
....................

After we have retrieved our code (our Terraform files), we delete any files that may have been left in the workspace by previous Terraform runs. We then plan and apply our deployment. For the production branch only, I added a manual check before the actual apply. The values of the two variables, representing the NAT IP and the suffix appended to the names of all created NSX-T objects, are passed as parameters to the terraform plan command depending on the branch being built.

        stage('init_plan_apply_dev') {
            when { branch 'development' }
            steps {
                sh '''
                cd terraform
                rm -f myplan terraform.tfstate terraform.tfstate.backup
                terraform init
                terraform plan -var="nat_ip=172.16.102.10" -var="tenant_name=dev" -out ./myplan
                terraform apply -auto-approve ./myplan
                '''
            }
        }
        stage('init_and_plan_prod') {
            when { branch 'production' }
            steps {
                sh '''
                cd terraform
                rm -f myplan terraform.tfstate terraform.tfstate.backup
                terraform init
                terraform plan -var="nat_ip=172.16.102.20" -var="tenant_name=prod" -out ./myplan
                '''
            }
        }
        stage('terraform_apply_prod') {
            when { branch 'production' }
            steps {
                input message: "Should we apply the Terraform configuration in Production?"
                sh '''
                cd terraform
                terraform apply -auto-approve ./myplan
                '''
            }
        }

The last portion of the pipeline deals with the deletion of the application and its supporting infrastructure. One interesting bit here is that a simple terraform destroy did not give me consistent results, sometimes leaving behind leftover objects (running terraform destroy again would clean them up). The reason was that even though Terraform deleted the virtual machine before the segment the VM was attached to, the logical port on the logical switch was sometimes not removed quickly enough by NSX Manager (which waits for the notification from the ESXi host that the VM has been detached). To work around this issue, the pipeline deletes the web VM first, sleeps for a minute, and then removes the rest of the deployment.

        stage('terraform_destroy_dev') {
            when { branch 'development' }
            steps {
                input message: "Should we destroy the test environment?"
                sh '''
                cd terraform
                terraform destroy -var="nat_ip=172.16.102.10" -var="tenant_name=dev" -target=vsphere_virtual_machine.webvm -auto-approve
                sleep 60
                terraform destroy -var="nat_ip=172.16.102.10" -var="tenant_name=dev" -auto-approve
                '''
            }
        }
        stage('terraform_destroy_prod') {
            when { branch 'production' }
            steps {
                input message: "Should we destroy the prod environment?"
                sh '''
                cd terraform
                terraform destroy -var="nat_ip=172.16.102.20" -var="tenant_name=prod" -target=vsphere_virtual_machine.webvm -auto-approve
                sleep 60
                terraform destroy -var="nat_ip=172.16.102.20" -var="tenant_name=prod" -auto-approve
                '''
            }
        }
    
    }
}

The Terraform Files

Dependencies between NSX-T and vSphere providers

The Terraform configuration is split into multiple files. This is not necessary from a Terraform perspective; I could have put all the configuration in a single .tf file, but managing the settings of the different components separately makes my life easier. Terraform works at the folder level: it considers all the .tf files in the working folder and evaluates the dependencies between all the objects defined for a single IaaS provider. When we are working with multiple providers, in this case NSX-T and vSphere, any cross-provider dependencies must be made explicit by the user. In our case, the dependency I had to specify is around the logical switch the virtual machine is connected to.

The logical switch must be created before the virtual machine can be connected to it. The problem here is that I cannot configure the VM resource in Terraform to connect to an NSX-T logical switch resource, as the two objects are managed by different providers. What I can do is set the virtual machine to connect to an existing opaque network in vCenter. The opaque network appears in vCenter after the logical switch has been created. What I needed to do here is tell Terraform to discover the “existing” opaque network after the NSX-T logical switch has been created. See below:

data "vsphere_network" "terraform_web" {
    name = "${nsxt_logical_switch.web.display_name}"
    datacenter_id = "${data.vsphere_datacenter.dc.id}"
    depends_on = ["nsxt_logical_switch.web"]
}

resource "nsxt_logical_switch" "web" {
  admin_state       = "UP"
  description       = "LS created by Terraform"
  display_name      = "OV-Web-Terraform-${var.tenant_name}"
  transport_zone_id = "${data.nsxt_transport_zone.overlay_tz.id}"
  replication_mode  = "MTEP"
}

The VM then connects to the vsphere_network data source, not to the nsxt_logical_switch resource. Note the additional dependency that prevents Terraform from deploying the VM before the NSX-T logical switch has been created.

resource "vsphere_virtual_machine" "webvm" {
    name             = "${var.tenant_name}-webvm"
    depends_on = ["nsxt_logical_switch.web"]
    resource_pool_id = "${data.vsphere_resource_pool.pool.id}"
    datastore_id     = "${data.vsphere_datastore.datastore.id}"
    num_cpus = 1
    memory   = 2048
    guest_id = "${data.vsphere_virtual_machine.template.guest_id}"
    scsi_type = "${data.vsphere_virtual_machine.template.scsi_type}"
    # Attach the VM to the network data source that refers to the newly created logical switch
    network_interface {
      network_id = "${data.vsphere_network.terraform_web.id}"
    }
}

Variables and credentials

I defined all my variables in the variables.tf file. The values of those variables are either hardcoded in the terraform.tfvars file or passed to the terraform plan command, depending on the environment (test or prod) we are deploying to. Some of the variables represent credentials and are used when defining the NSX-T and vSphere providers. This keeps the demo simple, but it goes without saying that including credentials in files checked into version control is a big no-no. A better approach would be to store those credentials on the Jenkins server as a credentials object. This is what I did here for a Principal Identity certificate.
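For reference, the variable definitions might look along these lines. This is a sketch, not the actual repository contents; in particular, the credential variable names are assumptions made for illustration.

```hcl
# Variables referenced by the pipeline (values come from terraform.tfvars
# or from -var flags at plan time).
variable "nat_ip" {
  description = "Public IP the web server address is translated to"
}

variable "tenant_name" {
  description = "Suffix appended to the names of all created NSX-T objects"
}

# Hypothetical credential variables: better injected from a Jenkins
# credentials object than hardcoded in a checked-in tfvars file.
variable "nsx_password" {}
variable "vsphere_password" {}
```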

Want to learn more about how to use Terraform with NSX-T and/or vSphere?

I am often not a fan of official documentation as a way to take a first glance at a new topic. Still, the Terraform documentation for both the NSX-T and vSphere providers is excellent and includes some great examples.

NSX-T Terraform Provider

vSphere Terraform Provider

Also, this VMware blog about NSX-T and Terraform is great
