<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Hesti Networking Lab]]></title><description><![CDATA[👋 I'm Hung Ha (Hesti) – a Network Engineer passionate about automating the boring stuff. Here at Hesti's Lab, I share real-world guides on Network Automation, ]]></description><link>https://blogs.hestinetlab.com</link><generator>RSS for Node</generator><lastBuildDate>Thu, 23 Apr 2026 22:17:02 GMT</lastBuildDate><atom:link href="https://blogs.hestinetlab.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Deploying Enterprise AWX on K3s]]></title><description><![CDATA[NetDevOps: Deploying Enterprise AWX on K3s
In the world of NetDevOps, moving from manual Ansible CLI execution to a centralized platform is a game-changer. Ansible AWX (the open-source version of Red ]]></description><link>https://blogs.hestinetlab.com/deploying-enterprise-awx-on-k3s</link><guid isPermaLink="true">https://blogs.hestinetlab.com/deploying-enterprise-awx-on-k3s</guid><category><![CDATA[netdevops]]></category><category><![CDATA[ansible]]></category><category><![CDATA[awx]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[NetworkAutomation]]></category><category><![CDATA[k3s]]></category><dc:creator><![CDATA[Hùng Việt]]></dc:creator><pubDate>Sun, 19 Apr 2026 13:28:57 GMT</pubDate><content:encoded><![CDATA[<hr />
<h1>NetDevOps: Deploying Enterprise AWX on K3s</h1>
<p>In the world of <strong>NetDevOps</strong>, moving from manual Ansible CLI execution to a centralized platform is a game-changer. <strong>Ansible AWX</strong> (the open-source upstream project behind the automation controller in Red Hat Ansible Automation Platform) provides the Web UI, RBAC, and API capabilities needed to scale network automation, such as pushing mass firmware upgrades (e.g., Cisco IOS-XE 17.15.04b) across hundreds of devices.</p>
<p>However, deploying AWX on Kubernetes in certain regions (like Vietnam) often hits a major roadblock: the <code>ImagePullBackOff</code> error due to <code>gcr.io</code> (Google Container Registry) being throttled or blocked.</p>
<p>In this guide from <strong>Hesti Networking Lab</strong>, we will deploy AWX using the AWX Operator on <strong>K3s</strong> and implement a specific patch to bypass these network restrictions.</p>
<hr />
<h2>🏗️ 1. Infrastructure Preparation</h2>
<p>To ensure stability for Ansible Execution Environments, we recommend using a full Virtual Machine (VM) rather than a container-based environment.</p>
<ul>
<li><p><strong>OS:</strong> Ubuntu 22.04 / 24.04 LTS</p>
</li>
<li><p><strong>Specs:</strong> 4 vCPU, <strong>8GB RAM (Minimum)</strong>, 40GB+ Disk.</p>
</li>
<li><p><strong>Network:</strong> Internet access and a static IP.</p>
</li>
</ul>
<hr />
<h2>🚀 2. Step 1: Install K3s (The Foundation)</h2>
<p>K3s is a highly available, certified Kubernetes distribution designed for production workloads in resource-constrained environments. It’s perfect for our NetDevOps Lab.</p>
<p>Run the following command to install K3s:</p>
<pre><code class="language-shell">curl -sfL https://get.k3s.io | sh -
</code></pre>
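<p>Copy the kubeconfig to your home directory and point <code>kubectl</code> at it, so you can run it as your current user without <code>sudo</code>:</p>
<pre><code class="language-shell">mkdir -p ~/.kube
sudo k3s kubectl config view --raw &gt; ~/.kube/config
chmod 600 ~/.kube/config
# The kubectl bundled with K3s reads /etc/rancher/k3s/k3s.yaml (root-only) unless KUBECONFIG is set
export KUBECONFIG=~/.kube/config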
</code></pre>
<p>Verify the node is ready:</p>
<pre><code class="language-shell">kubectl get nodes
</code></pre>
<hr />
<h2>⚙️ 3. Step 2: The AWX Operator &amp; GCR Fix</h2>
<p>The "Standard" installation usually fails at this stage because the <code>kube-rbac-proxy</code> image is hosted on <a href="http://gcr.io"><code>gcr.io</code></a>. We will fix this by patching the <code>kustomization.yaml</code> to pull from <a href="http://quay.io"><code>quay.io</code></a> instead.</p>
<p>Create a deployment directory:</p>
<pre><code class="language-shell">mkdir ~/awx-deploy &amp;&amp; cd ~/awx-deploy
</code></pre>
<p>Create <code>kustomization.yaml</code>:</p>
<pre><code class="language-shell">nano kustomization.yaml
</code></pre>
<p><strong>Paste the following content:</strong></p>
<pre><code class="language-yaml">apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Pull the AWX Operator (Version 2.19.1)
  - github.com/ansible/awx-operator/config/default?ref=2.19.1

images:
  - name: quay.io/ansible/awx-operator
    newTag: 2.19.1
  # --- CRITICAL FIX FOR GCR.IO ISSUES ---
  - name: gcr.io/kubebuilder/kube-rbac-proxy
    newName: quay.io/brancz/kube-rbac-proxy
    newTag: v0.15.0

namespace: awx
</code></pre>
<p>Deploy the Operator:</p>
<pre><code class="language-shell">kubectl apply -k .
</code></pre>
<pre><code class="language-shell">kubectl get pods -n awx -w
</code></pre>
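<p>The second command watches the pods in the <code>awx</code> namespace. Wait until the <code>awx-operator-controller-manager</code> pod shows <strong>2/2 Running</strong>, then press <code>Ctrl+C</code> to stop watching.</p>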
<hr />
<h2>🏗️ 4. Step 3: Deploy the AWX Instance</h2>
<p>Now, we define the actual AWX application. We will use <code>nodeport</code> to make the web interface accessible via the VM's IP.</p>
<p>Create <code>awx-demo.yml</code>:</p>
<pre><code class="language-shell">nano awx-demo.yml
</code></pre>
<p><strong>Paste the content:</strong></p>
<pre><code class="language-yaml">---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-demo
spec:
  service_type: nodeport
</code></pre>
<p>Update your <code>kustomization.yaml</code> to include this new resource:</p>
<pre><code class="language-shell">nano kustomization.yaml
</code></pre>
<p><strong>Modify the resources section:</strong></p>
<pre><code class="language-yaml">resources:
  - github.com/ansible/awx-operator/config/default?ref=2.19.1
  - awx-demo.yml  # Add this line
</code></pre>
<p>Apply the final configuration:</p>
<pre><code class="language-shell">kubectl apply -k .
</code></pre>
<hr />
<h2>⏳ 5. Step 4: Verification &amp; Login</h2>
<p>It will take 5-10 minutes for the Operator to pull the Postgres, Redis, Task, and Web images. Monitor the progress:</p>
<pre><code class="language-bash">watch kubectl get pods -n awx
</code></pre>
<h3>Harvest the Rewards:</h3>
<p>Once the <code>awx-demo-task</code> is <strong>4/4 Running</strong> and <code>awx-demo-web</code> is <strong>3/3 Running</strong>, you are ready.</p>
<p><strong>1. Get the Web Access Port:</strong></p>
<pre><code class="language-shell">kubectl get svc awx-demo-service -n awx
</code></pre>
<p><em>Identify the Port mapped to 80 (e.g.,</em> <code>80:31213/TCP</code><em>). Access it via</em> <code>http://&lt;VM_IP&gt;:31213</code><em>.</em></p>
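<p>If you would rather not look up a randomly assigned port, the AWX spec also accepts a fixed NodePort. The following is a minimal sketch, not part of the original setup: it assumes your operator version supports the <code>nodeport_port</code> field and that <code>30080</code> (an arbitrary example inside the default 30000-32767 NodePort range) is free on your node.</p>
<pre><code class="language-yaml">---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx-demo
spec:
  service_type: nodeport
  # Assumption: 30080 is an example value; pick any free port in the NodePort range
  nodeport_port: 30080
</code></pre>
<p>With this in <code>awx-demo.yml</code>, re-running <code>kubectl apply -k .</code> should expose the UI at <code>http://&lt;VM_IP&gt;:30080</code>.</p>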
<p><strong>2. Retrieve the Admin Password:</strong></p>
<pre><code class="language-shell">kubectl get secret awx-demo-admin-password -n awx -o jsonpath="{.data.password}" | base64 --decode ; echo
</code></pre>
<p><strong>Login:</strong></p>
<ul>
<li><p><strong>Username:</strong> <code>admin</code></p>
</li>
<li><p><strong>Password:</strong> (The string retrieved above)</p>
</li>
</ul>
<hr />
<h2>🎯 Summary</h2>
<p>By implementing a simple image patch, we bypassed regional network restrictions and successfully deployed a production-grade AWX instance on K3s. This setup is the perfect "Command Center" for any NetDevOps engineer looking to automate enterprise infrastructure.</p>
<p>Happy Automating!</p>
]]></content:encoded></item><item><title><![CDATA[Automating Cisco IOS Upgrades: A Step-by-Step Ansible Guide (INSTALL MODE)]]></title><description><![CDATA[Introduction
If you’ve been following my blog, you might have seen my earlier guide on Bundle Mode upgrades. That method is classic-simple and familiar. But let’s be honest, if you are managing modern Catalyst 9000s or ISR routers, Install Mode is th...]]></description><link>https://blogs.hestinetlab.com/automating-cisco-os-upgrade-ansible-install-mode</link><guid isPermaLink="true">https://blogs.hestinetlab.com/automating-cisco-os-upgrade-ansible-install-mode</guid><dc:creator><![CDATA[Hùng Việt]]></dc:creator><pubDate>Wed, 31 Dec 2025 07:04:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766971368080/768f8d50-557c-49f0-bad1-db8ce6dfa0b9.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>If you’ve been following my blog, you might have seen my earlier guide on <a target="_blank" href="https://blogs.hestinetlab.com/automating-cisco-ios-xe-upgrade-ansible"><strong>Bundle Mode</strong></a> upgrades. That method is classic, simple, and familiar. But let’s be honest: if you are managing modern Catalyst 9000s or ISR routers, Install Mode is the standard we should all be aiming for.</p>
<p>Unlike the legacy method, where we simply pointed the boot variable to a <strong>.bin</strong> file, Install Mode extracts the packages into the flash…</p>
<p>Here is how I do it using a workflow in Ansible Automation Platform (AAP), formerly Ansible Tower.</p>
<h2 id="heading-diagram">Diagram</h2>
<p>I use the same diagram as in the <a target="_blank" href="https://blogs.hestinetlab.com/automating-cisco-ios-xe-upgrade-ansible"><strong>Bundle Mode</strong></a> guide.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767164027716/43f59a8c-9358-4555-8692-4a2b25721b8c.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-the-logic-the-3-phase-workflow"><strong>The logic: The 3-phase Workflow</strong></h2>
<p>I still use the same 3-phase logic, which requires manual intervention (human-in-the-loop) to verify everything before the reload. However, Phase 2 is significantly different: it now handles the Install Mode process automatically, rather than just configuring the boot system for Bundle Mode.</p>
<p><strong>1. Phase 1: Preparation &amp; Staging</strong></p>
<ul>
<li><p>Health Checks: Version, Storage, Configuration Register, Routing.</p>
</li>
<li><p>Staging: Transfer image &amp; Verify MD5</p>
</li>
</ul>
<p><strong>2. Phase 2: Activation (The Upgrade)</strong></p>
<ul>
<li><p>Backup configuration</p>
</li>
<li><p><mark>Install add, activate &amp; commit</mark></p>
</li>
</ul>
<p><strong>3. Phase 3: Post-checks</strong></p>
<ul>
<li><p>Validate version</p>
</li>
<li><p>Compare with Pre-check</p>
</li>
</ul>
<h2 id="heading-the-orchestration">The Orchestration</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768655515553/5be72f1a-9f83-49cb-ac12-2fdf9b82b17f.png" alt class="image--center mx-auto" /></p>
<p>My Logic: The “Human-in-the-Loop” Strategy</p>
<ul>
<li><p><strong>Phase 1 (Pre-checks):</strong> Runs automatically. It gathers data, checks storage, and stages the image. While Ansible is copying the large image file, I simply grab a cup of coffee and relax ☕—this is the longest part of the process.</p>
</li>
<li><p><strong>The Safety Pause (Approval Node):</strong> This is critical. The workflow pauses here, waiting for me to review the Pre-check report. If I see any red flags (like insufficient storage or unstable routing), I deny the job. <strong>Nothing gets rebooted.</strong></p>
</li>
<li><p><strong>Phase 2 (The Upgrade):</strong> This triggers only after I manually click "Approve". This ensures the reboot happens exactly when I want it (e.g., during a maintenance window).</p>
</li>
<li><p><strong>Visual Debugging:</strong> If Phase 3 turns red, I know instantly that the Post-check failed, eliminating the need to dig through thousands of log lines manually.</p>
</li>
</ul>
<h3 id="heading-the-implementation"><strong>The Implementation</strong></h3>
<p>For this demonstration, I utilized a <strong>Cisco C8000v</strong> virtual router running <strong>IOS XE</strong>. The workflow aims to perform an upgrade from version <strong>17.06.03</strong> to <strong>17.15.04c</strong> using the Install Mode method.</p>
<blockquote>
<p>Additionally, to keep the lab simple, I configured the AAP controller itself to act as the backend SCP Server for hosting and transferring the image files.</p>
</blockquote>
<p>Here is the core logic for the <strong>Pre-check phase</strong>. My approach is to capture a full snapshot of the hardware inventory and current network state, including the routing table and interface status. I check each component individually and transfer the image file from the SCP Server, saving all these parameters as <code>extra_vars</code> to ensure a precise comparison during the post-check phase.</p>
<pre><code class="lang-yaml">  <span class="hljs-attr">tasks:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Get</span> <span class="hljs-string">all</span> <span class="hljs-string">elements</span> <span class="hljs-string">hardware</span> <span class="hljs-string">device</span>
      <span class="hljs-attr">cisco.ios.ios_facts:</span>
        <span class="hljs-attr">gather_subset:</span> <span class="hljs-string">hardware</span>
      <span class="hljs-attr">register:</span> <span class="hljs-string">pre_upgrade_facts</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Display</span> <span class="hljs-string">pre-upgrade</span> <span class="hljs-string">facts</span>
      <span class="hljs-attr">debug:</span>
        <span class="hljs-attr">var:</span> <span class="hljs-string">pre_upgrade_facts</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Check</span> <span class="hljs-string">current</span> <span class="hljs-string">IOS</span> <span class="hljs-string">version</span>
      <span class="hljs-attr">ansible.builtin.set_fact:</span>    
        <span class="hljs-attr">current_ios_version:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ pre_upgrade_facts.ansible_facts.ansible_net_version }}</span>"</span>
</code></pre>
<p>You can view full Source Code for Phase 1 here: <a target="_blank" href="https://github.com/haviethung3004/ansible-cisco-ios-upgrade/blob/main/01_upgrade_pre_check.yaml">01_upgrade_pre_check.yaml</a></p>
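<p>One way to hand the Phase 1 snapshot to later workflow nodes is <code>ansible.builtin.set_stats</code>, whose data AAP passes to downstream nodes as extra variables. The tasks below are a minimal sketch under assumptions, not a copy of the repository playbook: they reuse the <code>workflow_interface_up_count</code> name that the post-check reads, and assume it is a dictionary keyed by hostname. The <code>pre_interface_up_raw</code> register name is made up for illustration.</p>
<pre><code class="lang-yaml">    - name: Count interfaces in the 'up' state before the upgrade
      cisco.ios.ios_command:
        commands: "show ip interface brief | count up"
      register: pre_interface_up_raw   # hypothetical register name

    - name: Publish the count as a workflow artifact keyed by hostname
      ansible.builtin.set_stats:
        data:
          # Builds {hostname: count}; the trailing number of the 'count' output is the match count
          workflow_interface_up_count: "{{ { inventory_hostname: (pre_interface_up_raw.stdout[0] | regex_search('\\d+$') | int) } }}"
        aggregate: true    # merge the per-host dictionaries into one
        per_host: false
</code></pre>
<p>Keying the dictionary by hostname lets a single workflow run cover several devices, since each host later looks up only its own entry.</p>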
<p>Moving to <strong>Phase 2</strong>, we execute the actual upgrade. <strong>This is the main difference compared to Bundle Mode.</strong> Instead of changing the boot system variable, we utilize the <code>install add</code> command to unpack the image and the <code>install activate</code> command to reload the device and apply the new version.</p>
<pre><code class="lang-yaml">    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Show</span> <span class="hljs-string">install</span> <span class="hljs-string">summary</span>
      <span class="hljs-attr">cisco.ios.ios_command:</span>
        <span class="hljs-attr">commands:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">show</span> <span class="hljs-string">install</span> <span class="hljs-string">summary</span>
      <span class="hljs-attr">register:</span> <span class="hljs-string">install_summary</span>


    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Add</span> <span class="hljs-string">new</span> <span class="hljs-string">IOS</span> <span class="hljs-string">image</span> <span class="hljs-bullet">-</span> <span class="hljs-string">install</span> <span class="hljs-string">mode</span>
      <span class="hljs-attr">cisco.ios.ios_command:</span>
        <span class="hljs-attr">commands:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">command:</span> <span class="hljs-string">install</span> <span class="hljs-string">add</span> <span class="hljs-string">file</span> <span class="hljs-string">flash:/{{</span> <span class="hljs-string">new_ios_image</span> <span class="hljs-string">}}</span>
      <span class="hljs-attr">when:</span> <span class="hljs-string">new_ios_version</span> <span class="hljs-string">not</span> <span class="hljs-string">in</span> <span class="hljs-string">install_summary.stdout[0]</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Show</span> <span class="hljs-string">install</span> <span class="hljs-string">summary</span>
      <span class="hljs-attr">cisco.ios.ios_command:</span>
        <span class="hljs-attr">commands:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">show</span> <span class="hljs-string">install</span> <span class="hljs-string">summary</span>
      <span class="hljs-attr">register:</span> <span class="hljs-string">install_summary</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Display</span> <span class="hljs-string">and</span> <span class="hljs-string">verify</span> <span class="hljs-string">if</span> <span class="hljs-string">installation</span> <span class="hljs-string">was</span> <span class="hljs-string">successful</span>
      <span class="hljs-attr">ansible.builtin.assert:</span>
        <span class="hljs-attr">that:</span> 
          <span class="hljs-bullet">-</span> <span class="hljs-string">"new_ios_version in install_summary.stdout[0] "</span>
        <span class="hljs-attr">fail_msg:</span> <span class="hljs-string">"IOS XE installation failed or version mismatch."</span>
        <span class="hljs-attr">success_msg:</span> <span class="hljs-string">"IOS XE installation successful and version verified."</span>
</code></pre>
<p>You can access the Source Code for Phase 2 here: <a target="_blank" href="https://github.com/haviethung3004/ansible-cisco-ios-upgrade/blob/main/02_upgrade_install_mode.yaml">02_upgrade_install_mode.yaml</a></p>
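<p>The tail end of Phase 2, after <code>install activate</code> has triggered the reload, could look roughly like the sketch below. It is an illustration under assumptions rather than the repository code: the <code>delay</code> and <code>timeout</code> values are arbitrary, SSH is assumed to listen on port 22, and <code>install commit</code> is assumed to run without extra prompts on your platform.</p>
<pre><code class="lang-yaml">    - name: Wait for SSH to come back after the reload
      ansible.builtin.wait_for:
        host: "{{ ansible_host | default(inventory_hostname) }}"
        port: 22
        delay: 120      # assumption: give the device time to actually go down
        timeout: 1800   # assumption: upper bound for the reload, tune per platform
      delegate_to: localhost

    - name: Commit the new version so it persists across reloads
      cisco.ios.ios_command:
        commands:
          - install commit
</code></pre>
<p>Running the port check from the controller instead of the device avoids depending on a session that the reload has just torn down.</p>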
<p>In <strong>Phase 3</strong>, we validate the upgrade. The playbook compares the current network state against the "snapshot" taken in Phase 1. This ensures the new version is active and confirms there are no unintended changes to the routing table or interface status.</p>
<pre><code class="lang-yaml">    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Get</span> <span class="hljs-string">interface</span> <span class="hljs-string">status</span> <span class="hljs-string">after</span> <span class="hljs-string">upgrade</span>
      <span class="hljs-attr">cisco.ios.ios_command:</span>
        <span class="hljs-attr">commands:</span> <span class="hljs-string">"show ip interface brief"</span>
      <span class="hljs-attr">register:</span> <span class="hljs-string">post_interface_status</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">get</span> <span class="hljs-string">post</span> <span class="hljs-string">check</span> <span class="hljs-string">interface</span> <span class="hljs-string">up</span> <span class="hljs-string">count</span>
      <span class="hljs-attr">cisco.ios.ios_command:</span>
        <span class="hljs-attr">commands:</span> <span class="hljs-string">"show ip interface brief | count up"</span>
      <span class="hljs-attr">register:</span> <span class="hljs-string">post_interface_up_raw</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Set</span> <span class="hljs-string">fact</span> <span class="hljs-string">for</span> <span class="hljs-string">interface</span> <span class="hljs-string">up</span> <span class="hljs-string">count</span>
      <span class="hljs-attr">ansible.builtin.set_fact:</span>
        <span class="hljs-attr">post_interface_up_count:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ (post_interface_up_raw.stdout[0] | regex_search('\\d+$') | int) }}</span>"</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Compare</span> <span class="hljs-string">Interface</span> <span class="hljs-string">Up</span> <span class="hljs-string">Count</span> <span class="hljs-string">before</span> <span class="hljs-string">and</span> <span class="hljs-string">after</span> <span class="hljs-string">upgrade</span>
      <span class="hljs-attr">ansible.builtin.assert:</span>
        <span class="hljs-attr">that:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">" post_interface_up_count == workflow_interface_up_count[inventory_hostname] "</span>
        <span class="hljs-attr">fail_msg:</span> <span class="hljs-string">"Number of interfaces in 'up' state has changed after upgrade."</span>
        <span class="hljs-attr">success_msg:</span> <span class="hljs-string">"Number of interfaces in 'up' state is consistent before and after upgrade."</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Get</span> <span class="hljs-string">IP</span> <span class="hljs-string">Routing</span> <span class="hljs-string">Table</span> <span class="hljs-string">Summary</span> <span class="hljs-string">after</span> <span class="hljs-string">upgrade</span>
      <span class="hljs-attr">cisco.ios.ios_command:</span>
        <span class="hljs-attr">commands:</span> <span class="hljs-string">"show ip route summary"</span>
      <span class="hljs-attr">register:</span> <span class="hljs-string">post_route_raw</span>
</code></pre>
<p>The full code for this validation phase is available here: <a target="_blank" href="https://github.com/haviethung3004/ansible-cisco-ios-upgrade/blob/main/03_upgrade_post_check.yaml">03_upgrade_post_check.yaml</a></p>
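<p>The "Validate version" step can be sketched in the same style. This is a hedged example, not the repository code: it assumes <code>new_ios_version</code> is written exactly the way the device reports its version in <code>ansible_net_version</code>, which you should confirm on your platform before relying on it. The <code>post_upgrade_facts</code> register name is made up for illustration.</p>
<pre><code class="lang-yaml">    - name: Gather facts after the upgrade
      cisco.ios.ios_facts:
        gather_subset: min
      register: post_upgrade_facts   # hypothetical register name

    - name: Confirm the device is running the target version
      ansible.builtin.assert:
        that:
          # Assumption: both strings use the same version format
          - post_upgrade_facts.ansible_facts.ansible_net_version == new_ios_version
        fail_msg: "Device is not running the target IOS XE version."
        success_msg: "Device is running the target IOS XE version."
</code></pre>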
<p>Once the individual templates are ready, we proceed to <strong>Ansible Automation Platform (AAP)</strong> to orchestrate them. I created a new Workflow Template that combines these three Job Templates into a single pipeline.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768657887669/4e8bbd01-5185-4a78-9a7c-9027c1a9e26b.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768657983194/32be5187-d8dc-4799-8c03-59a44f707bc6.png" alt class="image--center mx-auto" /></p>
<p>As shown below, I added an <strong>Approval Node</strong> right after the Pre-check (Node 1) completes. This allows a human engineer to review the pre-check status before authorizing the actual reboot.</p>
<p>In short: the three Job Templates become three nodes in a single workflow, with an approval node added after Node 1 completes.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768658047336/ca93c235-582e-4692-8928-274e509ed114.png" alt class="image--center mx-auto" /></p>
<blockquote>
<p>I recommend using <strong>Surveys</strong> in AAP to prompt for these <code>extra_vars</code> when launching the job. This way, you can easily switch between devices or environments without modifying the playbook source code.</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768658463182/d44515c9-1d7b-4497-9b14-ccb604899bd8.png" alt class="image--center mx-auto" /></p>
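<p>For reference, the answers a Survey collects end up as plain <code>extra_vars</code>. The fragment below sketches what that might look like, using the variable names the playbooks expect. Every value is a placeholder made up for illustration, and <code>scp_password</code> is left out on purpose because it belongs in a password-type Survey field or a credential.</p>
<pre><code class="lang-yaml"># Hypothetical values for illustration only
scp_server: 192.0.2.10                 # documentation-range IP, replace with your SCP server
scp_username: netops                   # placeholder account name
new_ios_version: "17.15.04c"
new_image_name: "REPLACE_WITH_IMAGE_FILENAME.bin"
new_image_md5: "REPLACE_WITH_MD5_FROM_DOWNLOAD_PAGE"
</code></pre>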
<p>Finally, launch the job, grab a cup of coffee, and watch the automation do the heavy lifting!</p>
<h3 id="heading-the-outcome">The Outcome</h3>
<p>After sipping my coffee, I returned to the desk to find the workflow paused exactly where expected: the <strong>Approval Node</strong>.</p>
<p>I quickly reviewed the output from Node 1 (Pre-check) to ensure the device was ready and the image was successfully staged.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768658539261/3fe1897d-ccd8-4627-9def-b71f9f26f190.png" alt class="image--center mx-auto" /></p>
<p>Satisfied with the results, I clicked <strong>"Approve"</strong> to authorize the reboot.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768658574041/466037fb-5220-4a41-8cbe-e5f1d3e55f7b.png" alt class="image--center mx-auto" /></p>
<p>Once approved, the automation continued to execute Phase 2 (Install &amp; Activate) and Phase 3 without any manual intervention. There is nothing quite like seeing that satisfying <strong>"All Green"</strong> status on the Ansible Automation Platform dashboard.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768658605732/9b5965f8-165e-49fc-97ba-436c70f45250.png" alt class="image--center mx-auto" /></p>
<p>Finally, the post-check comparison confirmed the upgrade was successful with <strong>zero unintended changes</strong> to the routing table or interface statuses.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768658649284/fae6a5f8-0bb7-4871-be7d-64097943b7bb.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>By automating this workflow, we’ve transformed a high-risk, tedious night shift task into a predictable, <strong>click-of-a-button process</strong>. Whether you are using Bundle Mode or the more modern Install Mode, the logic remains the same: Prepare, Verify, and Execute safely.</p>
<p>If you have any questions about the playbooks or the nuances of Install Mode, feel free to drop a comment below!</p>
]]></content:encoded></item><item><title><![CDATA[Automating Cisco IOS Upgrades: A Step-by-Step Ansible Guide (BUNDLE MODE)]]></title><description><![CDATA[Introduction
Upgrading network devices manually is one of the most tedious tasks for any Network man, it involves repetitive steps: checking flash storage, setting up a SCP server, copying files, and praying that the console cable doesn’t disconnect....]]></description><link>https://blogs.hestinetlab.com/automating-cisco-os-upgrade-ansible-bundle-mode</link><guid isPermaLink="true">https://blogs.hestinetlab.com/automating-cisco-os-upgrade-ansible-bundle-mode</guid><dc:creator><![CDATA[Hùng Việt]]></dc:creator><pubDate>Mon, 22 Dec 2025 09:38:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766722595812/6911a28b-6bda-46b3-93d8-6559ff7657a4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>Upgrading network devices manually is one of the most tedious tasks for any network engineer. It involves repetitive steps: checking flash storage, setting up an SCP server, copying files, and praying that the console cable doesn’t disconnect.</p>
<p>When you have to upgrade 50 or 100 switches/routers, manual work is not an option. It is slow and prone to <strong>human error</strong>. In this post, I’ll share how to write Ansible playbooks that automate this process efficiently, and how to use Ansible Automation Platform to orchestrate and manage the workflow.</p>
<p>For this blog, I assume your <strong>Ansible Automation Platform</strong> is already up and running.</p>
<h2 id="heading-diagram">Diagram</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766289223837/447e74d1-7016-4340-968c-9c79a6436119.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-the-logic-the-3-phase-workflow">The logic: The 3-phase Workflow</h2>
<p>Instead of writing a single playbook, I prefer to split the process into three distinct phases. This approach helps me isolate issues and ensures that if something fails, it fails early and safely.</p>
<p><strong>1. Phase 1: Preparation &amp; Staging</strong></p>
<ul>
<li><p>Health Checks: Version, Storage, Configuration Register, Routing.</p>
</li>
<li><p>Staging: Transfer image &amp; Verify MD5</p>
</li>
</ul>
<p><strong>2. Phase 2: Activation (The Upgrade)</strong></p>
<ul>
<li><p>Backup configuration</p>
</li>
<li><p>Boot &amp; Reload</p>
</li>
</ul>
<p><strong>3. Phase 3: Post-checks</strong></p>
<ul>
<li><p>Validate version</p>
</li>
<li><p>Compare with Pre-check</p>
<blockquote>
<p><em>💡</em> <strong><em>Why use AAP for this?</em></strong> <em>By leveraging</em> <em>Ansible Automation Platform (AAP), we can wrap these three distinct phases into a single</em> <em>Workflow Template</em>. This allows us to use the <em>Workflow Visualizer</em> <em>to drag-and-drop logic, add approval nodes easily, and—most importantly—if one phase fails (e.g., Staging), we can fix it and retry just that node without re-running the whole process.</em></p>
</blockquote>
</li>
</ul>
<h2 id="heading-the-orchestration">The Orchestration</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766323881986/afceced9-2717-4a08-875f-34ca9b65d4ec.png" alt class="image--center mx-auto" /></p>
<p>My logic: The “Human-in-the-Loop” Strategy</p>
<ul>
<li><p><strong>Phase 1 (Pre-checks):</strong> Runs automatically. It gathers data, checks storage, and stages the image. While Ansible is copying the file, I simply grab a cup of coffee and relax ☕. This is the longest part of the process.</p>
</li>
<li><p><strong>The Safety Pause (Approval Node):</strong> This is critical. The workflow pauses here and waits for me to review the Pre-check report. If I see any red flags (like full storage or unstable routing), I deny the job. Nothing gets rebooted.</p>
</li>
<li><p><strong>Phase 2 (The Upgrade):</strong> Only triggers after I manually click "Approve". This ensures the reboot happens exactly when I want it.</p>
</li>
<li><p><strong>Visual Debugging:</strong> If Phase 3 turns red, I know instantly that the Post-check failed, without digging through thousands of log lines.</p>
</li>
</ul>
<h2 id="heading-the-implementation">The Implementation</h2>
<p>For this demonstration, I utilized a <strong>Cisco C8000v</strong> virtual router running <strong>IOS XE</strong>. The workflow aims to perform an upgrade from version <strong>17.06.03</strong> to <strong>17.15.04c</strong>.</p>
<blockquote>
<p>Additionally, to keep the lab simple, I configured the <strong>AAP controller itself</strong> to act as the backend <strong>SCP Server</strong> for hosting and transferring the image files.</p>
</blockquote>
<p>Here is the core logic for the Pre-check phase. My approach is to capture a full snapshot of the hardware inventory and current network state, including the routing table and interface status. I check each component individually and transfer the image file from the SCP Server, saving all these parameters as <code>extra_vars</code> to ensure a precise comparison during the post-check phase.</p>
<pre><code class="lang-yaml"><span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Upgrade</span> <span class="hljs-string">Pre-Check</span> <span class="hljs-string">Playbook</span> <span class="hljs-string">for</span> <span class="hljs-string">Cisco</span> <span class="hljs-string">IOS</span> <span class="hljs-string">and</span> <span class="hljs-string">IOS-XE</span> <span class="hljs-string">Devices</span>
  <span class="hljs-attr">hosts:</span> <span class="hljs-string">all</span>
  <span class="hljs-attr">gather_facts:</span> <span class="hljs-literal">no</span>
  <span class="hljs-attr">vars:</span>
    <span class="hljs-attr">ansible_command_timeout:</span> <span class="hljs-number">10800</span>
    <span class="hljs-attr">scp_server:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ scp_server }}</span>"</span>
    <span class="hljs-attr">scp_username:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ scp_username }}</span>"</span>
    <span class="hljs-attr">scp_password:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ scp_password }}</span>"</span>
    <span class="hljs-attr">new_ios_version:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ new_ios_version }}</span>"</span>
    <span class="hljs-attr">new_ios_image:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ new_image_name }}</span>"</span>
    <span class="hljs-attr">new_image_md5:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ new_image_md5 }}</span>"</span>

  <span class="hljs-attr">tasks:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Get</span> <span class="hljs-string">all</span> <span class="hljs-string">elements</span> <span class="hljs-string">hardware</span> <span class="hljs-string">device</span>
      <span class="hljs-attr">cisco.ios.ios_facts:</span>
        <span class="hljs-attr">gather_subset:</span> <span class="hljs-string">hardware</span>
      <span class="hljs-attr">register:</span> <span class="hljs-string">pre_upgrade_facts</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Display</span> <span class="hljs-string">pre-upgrade</span> <span class="hljs-string">facts</span>
      <span class="hljs-attr">ansible.builtin.debug:</span>
        <span class="hljs-attr">var:</span> <span class="hljs-string">pre_upgrade_facts</span>
</code></pre>
<p>You can view the full Source Code for Phase 1 here: <a target="_blank" href="https://github.com/haviethung3004/ansible-cisco-ios-upgrade/blob/main/01_upgrade_pre_check.yaml">01_upgrade_pre_check.yaml</a></p>
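<p>As a rough illustration of the snapshot idea in the Pre-check phase, the tasks below capture two show commands and hand the output to later workflow nodes. This is a minimal sketch, not the code from the linked repository: the commands chosen, the registered name <code>pre_state</code>, and the keys <code>pre_route_summary</code> and <code>pre_interface_brief</code> are assumptions made for this example. It relies on <code>ansible.builtin.set_stats</code>, whose data AWX and AAP expose as artifacts that reach later workflow nodes as extra variables.</p>
<pre><code class="lang-yaml"># Sketch only: variable names and show commands are assumptions for illustration.
- name: Capture routing table summary and interface status
  cisco.ios.ios_command:
    commands:
      - show ip route summary
      - show ip interface brief
  register: pre_state

- name: Pass the snapshot to later workflow nodes as artifacts
  ansible.builtin.set_stats:
    data:
      pre_route_summary: "{{ pre_state.stdout[0] }}"
      pre_interface_brief: "{{ pre_state.stdout[1] }}"
</code></pre>
<p>This form assumes one device per run. With several hosts in the same job, the values would need to be keyed per host (for example by <code>inventory_hostname</code>) so one device's snapshot does not overwrite another's.</p>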
<p>Moving to <strong>Phase 2</strong>, we execute the actual upgrade. This involves backing up the running configuration, setting the new boot system, saving the config, and finally reloading the router. The playbook then waits for the device to come back online before continuing.</p>
<pre><code class="lang-yaml"><span class="hljs-meta">---</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Upgrade</span> <span class="hljs-string">Bundle</span> <span class="hljs-string">Mode</span> <span class="hljs-string">IOS</span> <span class="hljs-string">Device</span>
  <span class="hljs-attr">hosts:</span> <span class="hljs-string">all</span>
  <span class="hljs-attr">gather_facts:</span> <span class="hljs-literal">yes</span>
  <span class="hljs-attr">vars:</span>
    <span class="hljs-attr">ansible_command_timeout:</span> <span class="hljs-number">30</span>
    <span class="hljs-attr">scp_server:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ scp_server }}</span>"</span>
    <span class="hljs-attr">scp_username:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ scp_username }}</span>"</span>
    <span class="hljs-attr">scp_password:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ scp_password }}</span>"</span>
    <span class="hljs-attr">new_ios_version:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ new_ios_version }}</span>"</span>
    <span class="hljs-attr">new_ios_image:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ new_image_name }}</span>"</span>
    <span class="hljs-attr">new_image_md5:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ new_image_md5 }}</span>"</span>

  <span class="hljs-attr">tasks:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Backup</span> <span class="hljs-string">configuration</span> <span class="hljs-string">before</span> <span class="hljs-string">upgrade</span>
      <span class="hljs-attr">cisco.ios.ios_command:</span>
        <span class="hljs-attr">commands:</span> 
          <span class="hljs-bullet">-</span> <span class="hljs-attr">command:</span> <span class="hljs-string">copy</span> <span class="hljs-string">running-config</span> <span class="hljs-string">startup-config</span>
            <span class="hljs-attr">prompt:</span> <span class="hljs-string">'Destination filename \[startup-config\]?'</span>
            <span class="hljs-attr">answer:</span> <span class="hljs-string">"\n"</span>
      <span class="hljs-attr">register:</span> <span class="hljs-string">backup_result</span>
</code></pre>
<p>You can access the Source Code for Phase 2 here: <a target="_blank" href="https://github.com/haviethung3004/ansible-cisco-ios-upgrade/blob/main/02_upgrade_bundle_mode.yaml">02_upgrade_bundle_mode.yaml</a></p>
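<p>The remaining Phase 2 steps (set the boot system, save, reload, wait) could look roughly like the sketch below. It is an illustration under assumptions, not the contents of the linked file: the <code>flash:</code> filesystem, the reload prompt pattern, and the <code>delay</code> and <code>timeout</code> values are guesses that depend on the platform and should be adjusted for your devices.</p>
<pre><code class="lang-yaml"># Sketch only: filesystem name, prompt text and timers are assumptions.
- name: Point the boot variable at the new image and save the config
  cisco.ios.ios_config:
    lines:
      - no boot system
      - "boot system flash:{{ new_ios_image }}"
    save_when: always

- name: Reload the device and confirm the prompt
  cisco.ios.ios_command:
    commands:
      - command: reload
        prompt: 'Proceed with reload\? \[confirm\]'
        answer: "\r"

- name: Wait for SSH to answer again after the reload
  ansible.builtin.wait_for:
    host: "{{ ansible_host | default(inventory_hostname) }}"
    port: 22
    delay: 120      # give the device time to actually go down
    timeout: 1800   # upper bound for the reboot, in seconds
  delegate_to: localhost
</code></pre>
<p>Some platforms drop the SSH session as soon as the reload is confirmed, so the reload task may need its own error handling in practice.</p>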
<p>In <strong>Phase 3</strong>, we validate the upgrade. The playbook compares the current network state against the "snapshot" taken in Phase 1. This ensures the new version is active and confirms there are no unintended changes to the routing table, interface status,…</p>
<pre><code class="lang-yaml"><span class="hljs-meta">---</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Post-check</span> <span class="hljs-string">after</span> <span class="hljs-string">IOS</span> <span class="hljs-string">Upgrade</span> <span class="hljs-string">Playbook</span>
  <span class="hljs-attr">hosts:</span> <span class="hljs-string">all</span>
  <span class="hljs-attr">gather_facts:</span> <span class="hljs-literal">false</span>
  <span class="hljs-attr">vars:</span>
    <span class="hljs-attr">ansible_command_timeout:</span> <span class="hljs-number">30</span>
    <span class="hljs-attr">scp_server:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ scp_server }}</span>"</span>
    <span class="hljs-attr">scp_username:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ scp_username }}</span>"</span>
    <span class="hljs-attr">scp_password:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ scp_password }}</span>"</span>
    <span class="hljs-attr">new_ios_version:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ new_ios_version }}</span>"</span>
    <span class="hljs-attr">new_ios_image:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ new_image_name }}</span>"</span>
    <span class="hljs-attr">new_image_md5:</span> <span class="hljs-string">"<span class="hljs-template-variable">{{ new_image_md5 }}</span>"</span>
  <span class="hljs-attr">tasks:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Gather</span> <span class="hljs-string">IOS</span> <span class="hljs-string">version</span> <span class="hljs-string">after</span> <span class="hljs-string">upgrade</span>
      <span class="hljs-attr">cisco.ios.ios_facts:</span>
        <span class="hljs-attr">gather_subset:</span> <span class="hljs-string">hardware</span>
      <span class="hljs-attr">register:</span> <span class="hljs-string">post_upgrade_facts</span>

    <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Validate</span> <span class="hljs-string">IOS</span> <span class="hljs-string">version</span>
      <span class="hljs-attr">ansible.builtin.assert:</span>
        <span class="hljs-attr">that:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">post_upgrade_facts.ansible_facts.ansible_net_version</span> <span class="hljs-string">==</span> <span class="hljs-string">new_ios_version</span>
        <span class="hljs-attr">fail_msg:</span> <span class="hljs-string">"IOS version after upgrade does not match expected version."</span>
        <span class="hljs-attr">success_msg:</span> <span class="hljs-string">"IOS version after upgrade matches expected version."</span>
</code></pre>
<p>The full code for this validation phase is available here: <a target="_blank" href="https://github.com/haviethung3004/ansible-cisco-ios-upgrade/blob/main/03_upgrade_post_check.yaml">03_upgrade_post_check.yaml</a></p>
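<p>The comparison against the Phase 1 snapshot can be sketched in the same style as the version check. This is a hedged example rather than the repository's code: <code>pre_interface_brief</code> is a hypothetical variable assumed to hold the <code>show ip interface brief</code> output saved during the pre-check and passed into this job as an extra variable.</p>
<pre><code class="lang-yaml"># Sketch only: pre_interface_brief is a hypothetical variable from the pre-check.
- name: Capture interface status after the upgrade
  cisco.ios.ios_command:
    commands:
      - show ip interface brief
  register: post_state

- name: Compare interface status with the pre-upgrade snapshot
  ansible.builtin.assert:
    that:
      - post_state.stdout[0] == pre_interface_brief
    fail_msg: "Interface status changed after the upgrade."
    success_msg: "Interface status matches the pre-upgrade snapshot."
</code></pre>
<p>A plain string comparison is strict: any difference in the output fails the check, including harmless ones. Parsing the output into structured data before comparing is a reasonable refinement if that turns out to be too noisy.</p>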
<p>Once the individual Job Templates are ready, we proceed to <strong>Ansible Automation Platform (AAP)</strong> to orchestrate them. I created a new Workflow Job Template that combines these three Job Templates into a single pipeline.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766394013146/239b7ebe-cf98-4454-b6d2-44ff1c13e0d6.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766394150838/8240d124-d1f5-4623-bd6d-23117252faaf.png" alt class="image--center mx-auto" /></p>
<p>As shown below, I added an <strong>Approval Node</strong> right after the Pre-check (Node 1) completes. This allows a human engineer to review the pre-check status before authorizing the actual reboot.</p>
<p>The three Job Templates are merged as three nodes in one workflow, with an Approval Node added after Node 1 completes:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766394345772/0b193cc9-6e36-4d2a-b716-5275d46f1e0e.png" alt class="image--center mx-auto" /></p>
<p>Note: Remember to replace the values in <code>extra_vars</code> with your actual lab environment details.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766394502767/ac9f889d-f8ad-4e57-a649-d51d5fe76211.png" alt class="image--center mx-auto" /></p>
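<p>For reference, the variables the three playbooks read can be supplied as YAML in the <code>extra_vars</code> field. The keys below come from the playbooks shown earlier; every value is a placeholder invented for this example, not a real setting.</p>
<pre><code class="lang-yaml"># Placeholder values only; replace each one with details from your own lab.
scp_server: "192.0.2.10"       # documentation-range IP used as a stand-in
scp_username: "CHANGE_ME"
scp_password: "CHANGE_ME"
new_ios_version: "CHANGE_ME"   # must equal what ansible_net_version reports after the upgrade
new_image_name: "CHANGE_ME.bin"
new_image_md5: "CHANGE_ME"
</code></pre>
<p>Keeping the SCP password in plain <code>extra_vars</code> is fine for a lab, but a credential or vaulted variable is a safer home for it outside one.</p>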
<p>Finally, launch the job, grab a cup of coffee, and watch the automation do the heavy lifting!</p>
<h2 id="heading-the-outcome">The Outcome</h2>
<p>After sipping my coffee, I returned to the desk to find the workflow paused exactly where expected: the <strong>Approval Node</strong>.</p>
<p>I quickly reviewed the output from Node 1 (Pre-check) to ensure the device was ready.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766395327798/ff664c18-d753-4b79-afab-8979d3cff64a.png" alt class="image--center mx-auto" /></p>
<p>Satisfied with the results, I clicked <strong>"Approve"</strong> to authorize the reboot.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766395391140/6d50ec23-8910-4795-a009-a58fff8ff520.png" alt class="image--center mx-auto" /></p>
<p>Once approved, the automation continued to execute Phase 2 and Phase 3 without any manual intervention. There is nothing quite like seeing that satisfying <strong>"All Green"</strong> status on the Ansible Automation Platform dashboard:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766395455596/1dd15308-dac5-42ce-b55b-8f0149d40ced.png" alt class="image--center mx-auto" /></p>
<p>Finally, the post-check comparison confirmed the upgrade was successful with <strong>zero unintended changes</strong> to the <strong>routing table</strong> or <strong>interface statuses</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766395522699/05bdaefe-774d-4ffd-a61a-99332883756c.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>By automating this workflow, we’ve transformed a high-risk, tedious night shift task into a predictable, click-of-a-button process. Not only does this save time, but it also ensures consistency across hundreds of devices—something manual typing can never guarantee.</p>
<blockquote>
<p>If you have any questions about the playbooks or the logic, feel free to drop a comment below!</p>
</blockquote>
]]></content:encoded></item></channel></rss>