Guide to Application Performance
It is always good to get useful material for free. If you are new to troubleshooting performance issues or want to fill in some gaps, the "Application Troubleshooting Guide" from Fluke Networks is the e-book for you. The 94-page document takes the reader from the basics of the TCP and UDP protocols through the life of a packet. It then explains why troubleshooting performance problems is so important, and goes on to cover application types and the sequence of events when connecting to a server. Each step is thoroughly explained.
The e-book discusses six primary issues that influence performance:
* DNS lookups
* ARP resolution
* Establishing the TCP connection
* Sender/receiver interaction
* Data flow
* Closing the TCP connection
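To make a couple of these phases concrete, here is a minimal Python sketch (my own, not taken from the e-book) that times the DNS lookup and the TCP connection setup separately using only the standard library. The names `measure_phases` and `slowest_phase` are hypothetical. ARP resolution, data flow and connection teardown are not visible at this level and would normally need a packet capture.

```python
import socket
import time


def measure_phases(host, port, timeout=3.0):
    """Return the seconds spent on DNS lookup and on TCP connection setup."""
    timings = {}

    # Phase 1: name resolution (near-instant for a numeric IP address).
    start = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    timings["dns_lookup"] = time.perf_counter() - start

    # Phase 2: TCP three-way handshake to the first resolved address.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        start = time.perf_counter()
        sock.connect(sockaddr)
        timings["tcp_connect"] = time.perf_counter() - start
    finally:
        sock.close()
    return timings


def slowest_phase(timings):
    """Name of the phase that took the longest."""
    return max(timings, key=timings.get)
```

Calling `measure_phases("intranet.example", 443)` (a placeholder host name) and then `slowest_phase` on the result gives a rough first indication of whether name resolution or connection setup dominates the delay.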
I thought the 12-page section, "Understanding How Applications Fail," was well written with plenty of graphics to assist in the learning process.
The introduction to the section "Troubleshooting Applications" put the problem in perspective:
Often troubleshooting is challenging and a best method isn't obvious.... In network troubleshooting, technicians who know the physical level sometimes attempt to solve the problem with cable testers and meters. Technicians who have RF communications background gravitate towards spectrum analyzers.
On the other hand, team members working on a problem may each have a different view of the same problem. The systems analyst will think of CPU performance, insufficient memory and fragmented disk space... The network architect might see it as routing issues. Or, the manager of network infrastructure might decide to test the WAN links for throughput or replace copper link with fiber connections [to increase bandwidth and reduce noise].
"Good troubleshooting requires having a broad base of experience, a solid knowledge of the technology involved and the availability of the proper tools." The troubleshooting tool should never be intrusive or affect network operation. This document is, after all, a marketing tool for Fluke products.
It may seem like common sense, but the troubleshooter should:
* Analyze the network as a whole. This is where network documentation becomes important. I have encountered networks that look good on paper, but what is installed and operating is different. In one case, the documentation showed a 10-Mbps connection because that was what had been ordered. In reality, the connection between the buildings was a T1 circuit (1.544 Mbps).
* Follow the sequence of steps that the application actually uses for access. Too many people make assumptions rather than knowing what is really happening.
* When you think you have located a possible solution, make only one change or correction at a time. When two or more changes are made, how can you tell which was the right one? It is also possible that two changes cancel each other out, leading the troubleshooter to assume that both are worthless. If a change does not work, reverse it to return the network to the conditions that existed before the change was made.
* Do not assume you fully understand the problem after you have made a successful improvement. First, the improvement may be only a partial solution, and other improvements may also be necessary. Second, without a full understanding of the problem, you may draw conclusions that will not be valid in the future.
* Go back to the user with feedback so that the user can help define problems in the future. Also train the user to describe problems more precisely, which can help resolve them more quickly. The feedback is also good PR for the IT department.
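The "one change at a time" advice above can be sketched as a small loop. This is only an illustration under my own assumptions, not a procedure from the Guide: `apply_one_at_a_time` is a hypothetical helper, each change is supplied as a (name, apply, revert) triple, and `measure` is any function that returns a response time where lower is better.

```python
def apply_one_at_a_time(changes, measure):
    """Apply candidate changes individually, keeping only those that help.

    changes: list of (name, apply, revert) tuples; apply and revert are
             zero-argument callables (hypothetical hooks into your tooling).
    measure: zero-argument callable returning a response time (lower is better).
    Returns the names of the changes kept and the final measurement.
    """
    best = measure()  # baseline before touching anything
    kept = []
    for name, apply, revert in changes:
        apply()
        result = measure()
        if result < best:
            # The change helped: keep it and use it as the new baseline.
            kept.append(name)
            best = result
        else:
            # No improvement: restore the previous state before moving on.
            revert()
    return kept, best
```

Because every change is measured against a known baseline and reverted if it does not help, two changes can never mask each other, which is exactly the trap described in the list above.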
The Guide provides five steps for troubleshooting success:
1. You need to determine and isolate the problem. It could be an application, CPU, network or capacity problem.
2. Ensure that a complete application flow analysis is performed. You need to know all the stages of this flow. Missing a stage may mean missing the problem.
3. The obvious next step is to fix the problem.
4. But fixing the problem is not enough. The application must be exercised in its normal state to validate that the fix works under a wide range of conditions.
5. Fully document the problem and the fix in such a way that any other troubleshooter can understand the problem and solution. This is the part that many troubleshooters have difficulty doing.
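Step 4 is the one most easily skipped, so here is a rough sketch of what validating a fix under a range of conditions might look like. It is my own illustration, not something from the Guide: `fix_validated` is a hypothetical helper, the condition names are whatever you choose to sample (time of day, site, load level), and the 10 percent threshold is an arbitrary assumption.

```python
from statistics import median


def fix_validated(before, after, min_gain=0.10):
    """Check that a fix improved the median response time in every condition.

    before, after: dicts mapping a condition name to a list of response
                   times in seconds, sampled before and after the fix.
    min_gain:      required fractional improvement (assumed threshold).
    Returns (ok, report), where report maps each condition to its gain.
    """
    report = {}
    for condition, old_samples in before.items():
        new_samples = after[condition]
        # Gain of 0.5 means the median response time was halved.
        report[condition] = 1 - median(new_samples) / median(old_samples)
    ok = all(gain >= min_gain for gain in report.values())
    return ok, report
```

A fix that halves the response time in the morning but barely moves it at peak load would fail this check, which is a hint that the first improvement was only a partial solution.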
Two case studies are presented at the end of the Guide:
* Obtaining Switch Statistics
* Investigating WAN Link Performance.
Both case studies feature the Fluke product. Even though there is some Fluke emphasis in the Guide, it is well worth archiving for future use. It is also useful for new technicians encountering application performance problems. Think of the Guide as a short course on troubleshooting. If you view the short Application-Centric Video Tour, you will see a demonstration of problem analysis using the Fluke product.