Troubleshooting complex network issues
By Amit Rao, Director – APAC Channels
Tuesday, 10 January, 2017
Troubleshooting complex and sprawling networks can be difficult, with network professionals spending up to 25% of their time trying to get to the bottom of an issue, according to network performance management experts NETSCOUT.
Finding the root cause of network issues is time-consuming. If they’re intermittent issues, it can seem almost impossible to find the cause and resolve the problem. However, by taking a methodical approach, it’s possible to effectively troubleshoot enterprise network problems. We’ve identified the six most common network issues and how to troubleshoot them.
1. Infrastructure performance
End-user complaints often signify that there is an infrastructure issue. However, when application servers and infrastructure devices are operating normally, obvious error states can’t be located, and legacy network monitoring tools report ‘green’, finding the root cause can be challenging. Possible causes include bad cabling, network congestion, server network adapter issues or DNS issues.
There are four steps to troubleshooting these issues:
- Use existing monitoring tools and extract information from SYSLOG receivers.
- Check server and network device log files to understand if there are connectivity issues from the NIC side.
- Examine WAN links and logs to understand whether traffic-shaping devices or policies are affecting performance.
- Check errors including web server, load balancer and application log errors.
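As a rough illustration of the log-review steps above, the following Python sketch counts log lines that match a few problem categories. The keyword patterns and the sample log lines are invented for this example; real devices word their messages differently, so treat the patterns as placeholders to adapt.

```python
import re
from collections import Counter

# Hypothetical keyword patterns (assumptions, not taken from any vendor's
# documentation); adjust them to what your devices actually log.
PATTERNS = {
    "link": re.compile(r"link (is )?down|carrier lost", re.IGNORECASE),
    "nic_errors": re.compile(r"crc error|rx error|tx timeout", re.IGNORECASE),
    "dns": re.compile(r"dns.*(timeout|timed out|servfail)", re.IGNORECASE),
}


def classify_log_lines(lines):
    """Count how many log lines match each problem category."""
    counts = Counter()
    for line in lines:
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[category] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "Jan 10 09:14:02 sw1 kernel: eth0: link is down",
    "Jan 10 09:14:05 app1 kernel: eth1: CRC error count 42",
    "Jan 10 09:15:11 app1 resolver: DNS query timed out",
    "Jan 10 09:16:00 app1 sshd: session opened",
]

if __name__ == "__main__":
    for category, count in classify_log_lines(sample).items():
        print(f"{category}: {count}")
```

A count that climbs for one category on one host is a hint about where to look first, not a diagnosis.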
2. Network services
There are numerous issues that can affect network services, such as DHCP issues or a slow DNS response. Possible causes include misconfigured DHCP or DNS servers, duplicate IP addresses caused by overlapping DHCP scopes, rogue DHCP servers or users manually assigning static IPs. Rogue DHCP servers in particular can enable a ‘man-in-the-middle’ attack and create significant security issues. To troubleshoot, first confirm proper configuration of authorised DHCP servers.
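Two of the causes named here, overlapping DHCP scopes and duplicate IP addresses, lend themselves to a quick check. The sketch below is one possible approach, assuming scopes can be expressed in CIDR notation and that lease data is available as (IP, MAC) pairs; the scope names, addresses and function names are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlapping_scopes(scopes):
    """Return pairs of scope names whose address ranges overlap.

    scopes: dict mapping a scope name to a CIDR string (an assumption;
    scopes defined as start/end ranges would need converting first).
    """
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in scopes.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


def find_duplicate_ips(leases):
    """Return {ip: [macs]} for every IP claimed by more than one MAC."""
    seen = {}
    for ip, mac in leases:
        seen.setdefault(ip, set()).add(mac.lower())
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


if __name__ == "__main__":
    # Invented example data.
    scopes = {"floor1": "10.0.0.0/24", "floor2": "10.0.1.0/24", "guest": "10.0.0.128/25"}
    leases = [
        ("10.0.0.5", "aa:bb:cc:00:00:01"),
        ("10.0.0.5", "aa:bb:cc:00:00:02"),
        ("10.0.0.6", "aa:bb:cc:00:00:03"),
    ]
    print(find_overlapping_scopes(scopes))
    print(find_duplicate_ips(leases))
```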
3. Prove it’s not the network
Most of the time, the network is not to blame for performance issues. People blame the network because of a lack of visibility into network operations, insufficient bandwidth, network complexity, insufficient network expertise and a lack of effective, easy-to-use troubleshooting tools.
To troubleshoot, the IT team should use packet captures, gather network data, review dropped packets and check for excessive retries and congestion in capture files. They should also check network device logs, use ping to check response times and use tracert (traceroute on Unix-like systems) to verify that the network path is correct.
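Checking a capture for excessive retries can be approximated with a simple count. The sketch below assumes the capture has already been exported to tuples of (source, destination, TCP sequence number, payload length); that record layout and the function name are assumptions made for this example, and dedicated analysers use more careful heuristics.

```python
def retransmission_rate(packets):
    """Estimate the share of TCP data segments that look like retransmissions.

    packets: iterable of (src, dst, seq, payload_len) tuples, a hypothetical
    export format. A data segment whose (src, dst, seq) has already been seen
    is counted as a likely retransmission.
    """
    seen = set()
    data_segments = 0
    retransmissions = 0
    for src, dst, seq, payload_len in packets:
        if payload_len == 0:
            continue  # pure ACKs legitimately repeat sequence numbers
        data_segments += 1
        key = (src, dst, seq)
        if key in seen:
            retransmissions += 1
        else:
            seen.add(key)
    return retransmissions / data_segments if data_segments else 0.0


if __name__ == "__main__":
    # Invented capture records: the segment with seq 2000 appears twice.
    capture = [
        ("10.0.0.5:51000", "10.0.1.9:443", 1000, 500),
        ("10.0.0.5:51000", "10.0.1.9:443", 1500, 500),
        ("10.0.0.5:51000", "10.0.1.9:443", 2000, 500),
        ("10.0.0.5:51000", "10.0.1.9:443", 2000, 500),
        ("10.0.1.9:443", "10.0.0.5:51000", 7000, 0),
    ]
    print(f"{retransmission_rate(capture):.0%}")
```

A consistently low rate across several captures is useful evidence when making the case that the network is not the bottleneck.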
4. Wi-Fi and BYOD threats
Wi-Fi networks, combined with bring your own device (BYOD) policies, can create security and performance issues if not managed carefully. These can include chatter, dropped connections, excessive bandwidth consumption, bandwidth-heavy user behaviour (such as streaming music) and congestion. The sheer number of devices can swamp the network.
To troubleshoot, conduct regular Wi-Fi SSID surveys to detect rogue access points and routers. Look up MAC addresses to discover the types of devices attached to networks and implement MAC address filtering if necessary. Also, understand that some devices are well known for causing problems if improperly configured. For example, Apple TV AirPlay can badly impact performance.
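Looking up MAC addresses usually means matching the first three bytes (the OUI) against a vendor registry. The sketch below shows the idea with a tiny, entirely made-up table; the prefixes and vendor names are placeholders rather than real assignments, and in practice the table would be loaded from the published IEEE registry.

```python
# Placeholder OUI table: these prefixes and vendor names are invented for
# illustration and do not correspond to real assignments.
HYPOTHETICAL_OUI_TABLE = {
    "AABBCC": "Example Streaming Box Co.",
    "112233": "Example Phone Maker",
}


def oui_of(mac):
    """Return the six-hex-digit OUI prefix of a MAC address, upper-cased."""
    hex_digits = "".join(ch for ch in mac if ch.isalnum()).upper()
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    int(hex_digits, 16)  # raises ValueError if non-hex characters remain
    return hex_digits[:6]


def group_by_vendor(macs, table=HYPOTHETICAL_OUI_TABLE):
    """Group MAC addresses by vendor name, using 'unknown' when no match."""
    groups = {}
    for mac in macs:
        vendor = table.get(oui_of(mac), "unknown")
        groups.setdefault(vendor, []).append(mac)
    return groups


if __name__ == "__main__":
    seen_on_wifi = ["aa:bb:cc:01:02:03", "11-22-33-0A-0B-0C", "de:ad:be:ef:00:01"]
    for vendor, macs in group_by_vendor(seen_on_wifi).items():
        print(vendor, macs)
```

Many phones now present randomised MAC addresses, so a large 'unknown' group does not necessarily indicate rogue hardware.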
5. Poor Wi-Fi performance
When the Wi-Fi network is underperforming, network teams should check for frequency interference, rogue routers (such as phones being used as hotspots), misconfigured Wi-Fi routers and compatibility issues between certain Wi-Fi clients and routers. Even excessive heat can cause strange symptoms.
To troubleshoot, teams should regularly use an SSID scanner to identify rogue routers and APs in the infrastructure; remember that strange DHCP behaviour is an indicator of rogue DHCP servers. Relocate routers that may be suffering interference due to proximity to sources of electromagnetic interference (EMI) and ensure that all Wi-Fi devices are within their designed operating environment.
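Once an SSID scanner has produced a list of visible access points, comparing it against an inventory of authorised hardware is straightforward. The sketch below assumes scan results are available as dictionaries with `ssid`, `bssid` and `channel` keys; that format, the function name and the sample values are assumptions for illustration.

```python
def find_suspect_aps(scan_results, authorised_bssids, corporate_ssids):
    """Return scanned access points whose BSSID is not in the inventory.

    scan_results: list of dicts with 'ssid', 'bssid' and 'channel' keys
    (a hypothetical export format from an SSID scanner).
    """
    authorised = {bssid.lower() for bssid in authorised_bssids}
    suspects = []
    for ap in scan_results:
        if ap["bssid"].lower() in authorised:
            continue
        if ap["ssid"] in corporate_ssids:
            reason = "corporate SSID advertised by unknown BSSID"
        else:
            reason = "unknown access point"
        suspects.append({**ap, "reason": reason})
    return suspects


if __name__ == "__main__":
    # Invented scan data.
    scan = [
        {"ssid": "CorpNet", "bssid": "02:00:00:00:00:01", "channel": 1},
        {"ssid": "CorpNet", "bssid": "02:00:00:00:00:99", "channel": 6},
        {"ssid": "Dave's Phone", "bssid": "02:00:00:00:00:42", "channel": 6},
    ]
    for ap in find_suspect_aps(scan, ["02:00:00:00:00:01"], {"CorpNet"}):
        print(ap["ssid"], ap["bssid"], ap["reason"])
```

An unknown access point on the same channel as a corporate one, such as a phone hotspot, is also a candidate source of frequency interference.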
6. Intermittent performance issues
Transient issues can take time, and sometimes luck, to capture, diagnose and resolve. Causes can include cabling issues, external sources, power fluctuations, hardware failures and excessive heat.
To troubleshoot, rule out logical sources, then look for illogical sources of interference. Track occurrences of the specific performance issue and look for patterns. As always, start at the physical layer, using a cable tester to see if the issue is related to cabling.
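Tracking occurrences and looking for patterns can start with something as simple as bucketing incident times by hour of day. The sketch below assumes incidents have been recorded as ISO 8601 timestamps; the function names and sample times are invented for this example.

```python
from collections import Counter
from datetime import datetime


def incidents_by_hour(timestamps):
    """Count incidents per hour of day from ISO 8601 timestamp strings."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


def busiest_hours(timestamps, top=3):
    """Return the (hour, count) pairs with the most incidents."""
    return incidents_by_hour(timestamps).most_common(top)


if __name__ == "__main__":
    # Invented incident log: three of four events fall in the 14:00 hour.
    incidents = [
        "2017-01-09T14:05:00",
        "2017-01-10T14:20:00",
        "2017-01-11T14:45:00",
        "2017-01-11T09:10:00",
    ]
    for hour, count in busiest_hours(incidents):
        print(f"{hour:02d}:00 {count} incident(s)")
```

A cluster at a particular hour suggests checking what else happens then, such as scheduled backups, equipment powering up nearby or air conditioning cycling.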
Understanding how to effectively troubleshoot the most common issues can potentially reduce the amount of time network professionals spend on issue resolution, so it’s well worth the effort.