Taking the Network Automation Journey
There’s no doubt about it: Network Automation is one of the hottest topics in the networking industry right now, if not the hottest. And with good reason: automating can save network operators time and free them up to focus on more important tasks. So why do so many people I’ve talked to seem to avoid it?
Whenever I’ve tried to discuss network automation with teams I work with, I get answers like “We want to, but we don’t have time.” or “It’s on our road map, maybe next year.” I’ve even heard “Automation doesn’t apply to us.” and “I’m too old to learn something new now.”
Sure, learning Network Automation, like anything, will take a certain amount of personal commitment and discipline. And, absolutely, it’ll be challenging – just like everything else you’ve done in your IT career!
During Networking Field Day 21 we (the Delegates) got a great presentation from Network to Code on the Network Automation Journey. I also recently got a chance to discuss the Network Automation Journey with none other than Jason Edelman from Network to Code.
Phase 1 – Hygiene
A lot of people probably get excited when it comes to exploring a new topic and maybe want to jump right in! However, you might be setting yourself up for failure if you don’t first take a step back and take a look at your environment to make sure that both you and it are ready for the Automation Journey.
In Phase 1 there’s a big emphasis on Network Hygiene: knowing the environment, knowing where information is, and really understanding your workflows and the things you do most frequently. You might not even think about certain things because you just do them all the time – but those are the perfect kinds of tasks to start automating with. Look for patterns, common tasks, things you do daily. Just take note of what these common things are. Are you bouncing ports, or creating VLANs and dropping ports into them? Maybe it’s a common set of steps you take when you begin troubleshooting an issue.
Also, begin standardizing if standards aren’t already in place. Standardize on network device naming conventions, standard ports (example – gi0/1 is always WAN and gi0/0 is always). Standardize network configurations – there will certainly be valid reasons to stray from a standard, or maybe a different standard applies because it’s a Data Center switch versus a switch in the access layer.
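A naming standard is easiest to keep when it can be checked mechanically. Here is a minimal sketch of such a check. It assumes a made-up convention of site code, role code, and two-digit index (for example nyc-acc-01); the pattern, the role codes, and the function name are all illustrative and would need to be replaced with your own standard.

```python
import re

# Hypothetical convention: 3-letter site code, role code, 2-digit index.
# Example: "nyc-acc-01". Adjust the pattern to match your own standard.
HOSTNAME_PATTERN = re.compile(
    r"^(?P<site>[a-z]{3})-(?P<role>core|dist|acc|wan)-(?P<index>\d{2})$"
)


def check_hostnames(hostnames):
    """Return the hostnames that do not follow the naming convention."""
    return [name for name in hostnames if not HOSTNAME_PATTERN.match(name.lower())]


if __name__ == "__main__":
    devices = ["nyc-acc-01", "NYC-CORE-01", "switch3", "lon-wan-1"]
    # Prints the names that need attention: ['switch3', 'lon-wan-1']
    print(check_hostnames(devices))
```

Running a check like this against your device list gives you a to-do list of names to fix before you build automation that depends on them.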
Phase 2 – Data Management
Start Simple! Make sure you have a source for inventory – know exactly what you have. Know what’s deployed, know what’s on standby, maybe you have a stock room of spare gear, know exactly what’s in the pile to be decommissioned. Have a source for config backups – maybe it’s Ansible, but it doesn’t have to be. Maybe you’re using SolarWinds or maybe you just have a Linux server doing TFTP. Have a cabling plan – what connects to what and where. Some form of IP Address Management too. Know what IP ranges are in use and where, know what static IPs have been consumed and what’s available.
All of these things should exist somewhere – it doesn’t matter where – could even be a flat file like a .csv – as long as it exists.
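To show how little is needed to get started, here is a rough sketch of reading an inventory from a flat .csv and asking it simple questions. The column names, hostnames, and status values are invented for the example; use whatever fields your own inventory tracks.

```python
import csv
import io
from collections import Counter

# Hypothetical inventory contents; in practice this would be a file on disk.
INVENTORY_CSV = """hostname,site,model,status
nyc-acc-01,nyc,c9300,deployed
nyc-acc-02,nyc,c9300,spare
lon-wan-01,lon,isr4331,deployed
lon-acc-09,lon,c3750,decommission
"""


def load_inventory(text):
    """Parse CSV text into a list of dicts, one per device."""
    return list(csv.DictReader(io.StringIO(text)))


def devices_by_status(inventory, status):
    """Return hostnames whose status column matches the given value."""
    return [row["hostname"] for row in inventory if row["status"] == status]


if __name__ == "__main__":
    inventory = load_inventory(INVENTORY_CSV)
    # How many devices are deployed, spare, or waiting to be decommissioned?
    print(Counter(row["status"] for row in inventory))
    print(devices_by_status(inventory, "spare"))
```

Once the data exists in any structured form, moving it into a more capable tool later is a migration, not a discovery project.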
Phase 3 – Quick Wins
In Phase 3, look for quick wins. Quick wins help the team realize the benefits of automation and create a hunger to further develop these skills. You might look to automate common workflows, like a set of commonly used show commands when diagnosing an issue. Maybe you’re looking at a routing issue and want to see what neighborships are formed, grab the routing table, and then grab the routing-protocol-specific RIB. If you’re troubleshooting a VLAN issue, you may want to see what VLANs are configured and grab port configurations. You may also create a specific routine for backing up device configurations. Whatever the case may be, these are quick, easily consumable, highly valued pieces. Quick wins show value back to the business and usually help in getting continued buy-in from leadership.
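One way to picture a quick win like this is a named bundle of show commands that gets run in one go. The sketch below is only an illustration: the bundle names and the collect function are made up, the commands are Cisco IOS-style examples, and the device connection is replaced by a stand-in function so the code runs without any network gear.

```python
# Hypothetical bundles of read-only commands, keyed by the kind of problem.
# The commands use Cisco IOS-style syntax purely as an example.
COMMAND_BUNDLES = {
    "routing": [
        "show ip bgp summary",  # neighbor state
        "show ip route",        # routing table
        "show ip bgp",          # protocol-specific table
    ],
    "vlan": [
        "show vlan brief",
        "show interfaces switchport",
    ],
}


def collect(bundle_name, send_command):
    """Run every command in a bundle and return {command: output}.

    send_command is any callable that takes a command string and returns
    the device output, for example a wrapper around your SSH library.
    """
    return {cmd: send_command(cmd) for cmd in COMMAND_BUNDLES[bundle_name]}


if __name__ == "__main__":
    # Stand-in for a real device session so the sketch runs anywhere.
    def fake_send(command):
        return f"(output of '{command}')"

    for command, output in collect("routing", fake_send).items():
        print(f"### {command}\n{output}\n")
```

Keeping the device connection behind a plain callable means the same bundles can be reused whether you end up using Ansible, a Python SSH library, or a vendor API.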
Phase 4 – Sources of Truth
A lot of network operators view their network as the source of truth. What VLANs are in use at site XYZ? Log in to the switches at the site and do a show vlan brief, right? Not anymore! It’s time to start trusting the data sources and your tools, because what’s on the device isn’t always what’s supposed to be there. Maybe you added a temporary VLAN, or maybe this one site has some extra config you wouldn’t normally do at any other site. Maybe you configured something as a test of a hypothesis during troubleshooting and forgot to remove it. You know what I’m talking about – we’ve all done it.
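The forgotten-VLAN scenario can be made concrete with a small comparison between what your records say and what a device reports. This is a sketch with invented data and an invented function name; how you obtain each list (a .csv, an IPAM export, parsed show output) is up to you.

```python
def vlan_drift(intended, observed):
    """Compare VLAN IDs from the source of truth with those seen on a device."""
    intended, observed = set(intended), set(observed)
    return {
        "missing_on_device": sorted(intended - observed),
        "unexpected_on_device": sorted(observed - intended),
    }


if __name__ == "__main__":
    # Hypothetical data: what the records say vs. what the switch reports.
    source_of_truth = [10, 20, 30]
    from_switch = [10, 20, 30, 999]  # 999 was a forgotten troubleshooting VLAN
    # Prints {'missing_on_device': [], 'unexpected_on_device': [999]}
    print(vlan_drift(source_of_truth, from_switch))
```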
Need to issue a new management IP address to that new device – don’t go logging into the last device you configured, check your source of truth.
Phase 4 is all about migrating the source of truth from ‘show run’ to your tools and data. This is a major component of holistic data management.
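For the management IP example, asking the source of truth can be as simple as the sketch below. The subnet and the list of allocated addresses are made up, and next_free_ip is a hypothetical helper; a real IPAM tool would give you this through its own interface.

```python
import ipaddress


def next_free_ip(subnet, allocated):
    """Return the first usable host address in subnet that is not allocated.

    Returns None when the subnet is full.
    """
    used = {ipaddress.ip_address(ip) for ip in allocated}
    for host in ipaddress.ip_network(subnet).hosts():
        if host not in used:
            return str(host)
    return None


if __name__ == "__main__":
    # Hypothetical management subnet and existing allocations.
    allocated = ["10.0.0.1", "10.0.0.2", "10.0.0.4"]
    # Prints 10.0.0.3
    print(next_free_ip("10.0.0.0/29", allocated))
```

The point is that the answer comes from recorded data, not from logging into the last device you touched and adding one.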
Phase 5 – Read Only Automation
Now let’s start putting it together and building confidence in our tools, data, and newly learned skills. To do that we’ll start with Read Only automation workflows. The last thing we want to do is push a bad config and that can easily happen if we’re not yet 100% confident in our new skills.
Maybe you want to return a list of IP Addresses in use on a set of devices. Or, you’re troubleshooting a routing issue and want to see what neighbor relationships look like across a section of your network.
Link these to tools that others can utilize too – like Slack. How cool would it be to do a show ip ospf neighbors from Slack and have the results returned in line in your conversation!!! Impossible? Hardly!! This is called ChatOps and is very easy to do utilizing the APIs of your favorite collab clients – like MS Teams, Cisco WebEx Teams, and of course Slack. Build as many of these read-only workflows as you need or want until there is enough confidence to automate configuration deployments.
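The core of a ChatOps workflow like this is small: take a chat message, make sure it is a read-only request, run it, and send back the output. The sketch below covers only that middle step. The "!net" trigger word, the allow-list, and the function names are assumptions for illustration, and the actual chat and device connections are left as plain callables, with no real chat API calls shown.

```python
# Read-only commands that chat users are allowed to run (an assumed policy).
ALLOWED_PREFIXES = ("show ",)


def handle_chat_message(text, run_on_device):
    """Turn a chat message like '!net nyc-acc-01 show vlan brief' into a reply.

    run_on_device is any callable taking (hostname, command) and returning
    the output as a string; wiring it to a real chat API is left out here.
    """
    parts = text.strip().split(maxsplit=2)
    if len(parts) < 3 or parts[0] != "!net":
        return "Usage: !net <hostname> <show command>"
    _, hostname, command = parts
    if not command.lower().startswith(ALLOWED_PREFIXES):
        return f"Refusing '{command}': only read-only show commands are allowed."
    return run_on_device(hostname, command)


if __name__ == "__main__":
    # Stand-in for a real device session so the sketch runs anywhere.
    def fake_run(hostname, command):
        return f"[{hostname}] (output of '{command}')"

    print(handle_chat_message("!net nyc-core-01 show ip ospf neighbor", fake_run))
    print(handle_chat_message("!net nyc-core-01 reload", fake_run))
```

Enforcing the read-only rule in code, not by convention, is part of what lets the rest of the team trust the tool.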
Phase 6 – Read Only to Execute
In Phase 5 we focused on read only automation. While doing that we built up confidence in the tools and the skill sets needed to do this. It’s time to take it to the next level and automate execution.
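One cautious way to make that step, shown here as a sketch of my own and not something taken from the presentation, is to have every change produce a diff first and only push when explicitly told to. The function names and the dry_run flag are hypothetical, and push stands in for whatever actually delivers config to a device.

```python
import difflib


def plan_change(running_config, intended_config):
    """Return a unified diff of the change that would be made."""
    return "\n".join(
        difflib.unified_diff(
            running_config.splitlines(),
            intended_config.splitlines(),
            fromfile="running",
            tofile="intended",
            lineterm="",
        )
    )


def apply_change(running_config, intended_config, push, dry_run=True):
    """Return the diff; call push(intended_config) only when dry_run is False."""
    diff = plan_change(running_config, intended_config)
    if diff and not dry_run:
        push(intended_config)
    return diff


if __name__ == "__main__":
    running = "interface Gi0/1\n description WAN"
    intended = "interface Gi0/1\n description WAN uplink"
    # Default is a dry run: the diff is shown and nothing is pushed.
    print(apply_change(running, intended, push=print))
```

Defaulting to a dry run carries the read-only habit from Phase 5 into the first write operations.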
Phase 7 – Focus on Consumability
Time to integrate into more tools – like your service desk. I’ve seen lots of demos where people have successfully configured the ability to send commands via the ITSM tool, like ServiceNow.
By now you have a lot of great tools and sources of truth. It’s time to put them all together. Maybe it’s a Network Vitals Dashboard that can show the health of the network at a glance and allow you to drill down into issues. This may be part of a tool you already have or something that gets created.
The point in Phase 7 is to make these tools easily consumable, easy to use, and easy to trust.
Phase N – The Journey Continues
This is a journey and there is no finite stopping point. Start to look for ways to close the loop. If a particular event happens on a device, trigger a notification – but don’t just tell someone, tell the right team that can resolve the issue the fastest. Look towards ZTP (Zero Touch Provisioning) of new devices. The priorities are going to be different based on the needs of the IT Team.
I asked Jason his thoughts on what the best thing might be to start with when beginning the Network Automation Journey. He said that, more than anything, the key is to just start! Too often people start to look into Network Automation and get paralyzed because there’s just so much out there on the topic.
Jason said to keep in mind that the tool you learn on might not be the tool that goes into production, and to de-couple your personal learning journey from any network automation transformation you may be involved in at work. When it comes to personally learning something, you may be better off learning something that’s open source rather than a Commercial Off The Shelf tool, at least in the beginning. If you’re learning a commercial tool, you may just become proficient with that tool. Jason highlighted Ansible and Salt as two solid starting points, again even if you’re not going to use them in production. With either of these tools you’ll need to know YAML, JSON, and Jinja templates, understand variables, and more. Knowing all of these will undoubtedly help you on your Network Automation Journey even if you need to use a commercial tool in production for any reason, such as size, scale, speed, or simply an organization that doesn’t use open source in production.
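Telling the right team can start as a simple lookup from event type to owner. The event names, team names, and functions below are invented for illustration; send stands in for whatever delivers the message (chat, email, a ticket).

```python
# Hypothetical mapping from event type to the team best placed to fix it.
ROUTES = {
    "bgp_neighbor_down": "wan-team",
    "power_supply_failed": "datacenter-ops",
    "port_security_violation": "security-team",
}
DEFAULT_TEAM = "network-oncall"


def route_event(event):
    """Pick the destination team for an event dict with a 'type' key."""
    return ROUTES.get(event.get("type"), DEFAULT_TEAM)


def notify(event, send):
    """Build a message and hand it to send(team, message)."""
    team = route_event(event)
    device = event.get("device", "unknown device")
    kind = event.get("type", "unknown event")
    send(team, f"{device}: {kind}")
    return team


if __name__ == "__main__":
    event = {"device": "lon-wan-01", "type": "bgp_neighbor_down"}
    notify(event, lambda team, message: print(f"-> {team}: {message}"))
```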
If you didn’t see it live make sure you go check out the NFD21 presentation from Network to Code.
I’d also recommend the book Network Programmability and Automation, by Jason Edelman, Scott Lowe and Matt Oswalt.
Also, make sure you join the Network to Code Slack. They have a self-registration page you can find here. In there you’ll find channels on all sorts of topics, including Ansible. If you have questions you can pop in there and chat with others on their Network Automation Journey.
Also, be sure to check out these other great articles from fellow NFD21 delegates!
Remington Loose – http://localpref.net/2019/10/02/nfd21-delegate/
Jeremy Schulman – https://medium.com/network-automaniac/nfd21-networktocode-d9ad989be3c7
Micheline Murphy – https://www.linkedin.com/pulse/bringing-butter-knife-screwdriver-party-three-network-murphy/
Ed Horley – https://www.howfunky.com/2019/11/network-field-day-21-network-to-code.html
thanks for this breakdown, I feel it’s quite appropriate.
I also like the Configuration Complexity Clock concept: