The Network Services Team has developed a way to dramatically reduce outage time within Emory’s network using a tool that looks at one thing: consistency.
In networking parlance, the “edge” refers to machines that are accessed by end users. Emory’s network infrastructure is a complicated architecture of computers that form an edge ring throughout the enterprise. With over 2,000 devices and more than 160,000 access ports, continuously maintaining and refreshing this network (replacing old devices with new ones) is an arduous task.
Built over many years, networks become a hodgepodge of different devices configured by a variety of network administrators, many of whom no longer work at the organization. As new admins were trained to work on Emory’s network, the consistency of each network upgrade eroded, much like a message in the childhood telephone game. As a result, machine #1 might not be configured exactly like machine #2000.
These slight differences in configuration are a major cause of network outages.
With a focus on reducing outage time, and thus improving customer satisfaction, the Network Services Team asked: can a piece of technology help monitor the consistency of our network? The answer is a piece of software called NetMRI.
NetMRI allows the team to remotely compare all of the devices on the network and see if they comply with a current configuration gold standard. Every deviation from the standard is listed for each machine, showing network engineers exactly which machines are out of compliance, as well as exactly what needs to be adjusted.
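The idea of checking devices against a gold standard can be illustrated with a minimal Python sketch. This is not NetMRI's interface or the team's actual method; the setting names, device names, and values below are hypothetical and exist only to show how per-device deviations might be listed.

```python
# Hypothetical gold-standard settings; names and values are illustrative only.
GOLD_STANDARD = {
    "ssh_enabled": "yes",
    "telnet_enabled": "no",
    "snmp_version": "3",
}


def find_deviations(device_config, gold_standard):
    """Return {setting: (expected, actual)} for every setting that differs."""
    deviations = {}
    for setting, expected in gold_standard.items():
        actual = device_config.get(setting)  # None if the setting is missing
        if actual != expected:
            deviations[setting] = (expected, actual)
    return deviations


def compliance_report(devices, gold_standard):
    """Map each out-of-compliance device name to its deviations."""
    report = {}
    for name, config in devices.items():
        deviations = find_deviations(config, gold_standard)
        if deviations:
            report[name] = deviations
    return report


# Two made-up devices: the first matches the standard, the second does not.
devices = {
    "switch-0001": {"ssh_enabled": "yes", "telnet_enabled": "no", "snmp_version": "3"},
    "switch-2000": {"ssh_enabled": "yes", "telnet_enabled": "yes"},
}

report = compliance_report(devices, GOLD_STANDARD)
for name, deviations in report.items():
    for setting, (expected, actual) in deviations.items():
        print(f"{name}: {setting} expected {expected!r}, found {actual!r}")
```

In this sketch a compliant device is simply absent from the report, so engineers would see only the machines that need attention and the specific settings to adjust.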
“Bringing our security policies up to standard across the enterprise is key,” said Nayef Smith, manager of the network monitoring team. “Before, we had to deal with many different administrative passwords. Now we can make sure the environment is compliant by ensuring our ability to touch these machines in a uniform way. It’s a convergence of change, process, and tools.”
Previously, Emory had tried to set up a standardized deployment system. But over time, different engineers made minute changes in implementation. People were trained to perform configurations the right way, but the right way changed from person to person.
Therefore, in addition to the new tool, the Network Services Team began an initiative to train the staff on Emory’s unique network topology to help eliminate configuration errors during equipment refresh. The baseline certification training, named Emory Cert, was created by Jiann Su-Ming, and all members of network services must get certified before doing configurations.
The early returns have been promising. NetMRI already shows the difference in quality between refreshes done before and after the engineers received Emory Cert training. A NetMRI snapshot of the 1639 Building (Emory Woodruff Memorial Research), which was refreshed before Emory Cert training, showed an 11% error rate on configurations.
The team refreshed 40 Peachtree after the training and NetMRI revealed a mere 1.7% error rate. “All of the machines were actually working at both facilities, but now they will have far fewer errors in the future,” said Steve Lee, who is the service owner for NetMRI.
“By reducing the number of defects, we have improved the quality of the service,” said Smith. “The end goal is to replace our older network infrastructure with new devices that have cleaner network configurations. That will lead to our version of network utopia: zero-error refresh.”