Troubleshooting Connectivity Issues

As a developer supporting software that runs across multiple tiers, you always need to know whether all the layers are connecting and, when things don’t work, which layers are having connectivity issues. An issue as simple as one user not being able to log on to your web site can cost you hours of troubleshooting. Recently, when I started working on the Windows 7 upgrade and later the PowerBuilder/EAServer machine upgrades, I got some first-hand experience. I used the commands below to troubleshoot various connectivity issues between the client, the EAServers and the database.

When I say connectivity, I mean machine to machine, mostly at the software level (client/server, app servers, web servers etc.). This is typically done over TCP/IP using sockets. Between your client application and the server, sockets are opened and closed constantly, and packets are sent across these sockets. Sockets and packets are software abstractions of the byte streams sent across the wire. If you have a complete internet connectivity failure in your department or area, chances are the Network Engineers are working on it already. In any case, the commands below may come in handy in troubleshooting some of those issues, or at least help you report the network issues to the proper team(s).

 

Note: These commands are available on almost all OSes. I’ve listed the syntax for Windows; on other systems they are very similar, apart from differences in spelling and/or the parameters allowed. Even different flavors of Unix (AIX, Solaris etc.) may vary across versions. Search for the command for your OS to get the right syntax.

IPCONFIG

Whenever you have a network issue, the first thing to do is run ipconfig (or ifconfig on Unix, e.g. ifconfig eth0). This establishes that you have a network (or internet, if you are at home) connection and thus an IP address. In this post I am focusing on machine-to-machine connectivity. If you need more information on IPCONFIG, see here. If you want to know how to get your machine connected to the network properly, please see here. Now on to the commands I frequently use to troubleshoot network/connectivity issues.

PING

Ping is your first line of attack when you are on a mission to find out why your PC or software is not connecting to the server/machine as expected. For a simple “is that server there?” type of question, you can simply use ping <server>.

The ping command has several options; see here. I will just mention that the -t option keeps pinging a server until canceled (CTRL-C).

Example

ping www.google.com results in,

Pinging www.google.com [74.125.224.110] with 32 bytes of data:

Reply from 74.125.224.110: bytes=32 time=6ms TTL=49

Reply from 74.125.224.110: bytes=32 time=5ms TTL=49

Reply from 74.125.224.110: bytes=32 time=5ms TTL=49

Reply from 74.125.224.110: bytes=32 time=5ms TTL=49

Ping statistics for 74.125.224.110:

Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:

Minimum = 5ms, Maximum = 6ms, Average = 5ms

You ping to find out whether the server is out there and online, or whether your local machine can reach the remote one. Your connection to the remote server may be interrupted by any number of things, including the server going down, server software (like a web server or application server) going down, a virus (the trojan kind) blocking/rerouting your connections, or simply your Ethernet cable not being plugged into the right wall socket. You may also use ping to check whether your connection to the server is slow or reasonable. That’s about all it can tell you.

Recently, we had some issues with our PowerBuilder software. It was connecting to the remote EAServer, but got errors every now and then. I suspected network timeouts and wanted to find out whether the response time was consistent. I left the command below running on the command line for an extended period of time; it runs until canceled (CTRL-C).

ping -t <server> > C:\Temp\ping.log

This kept writing ping results to ping.log. Later I was able to see that there were periods of slowness. We followed this up with tracert and the Network Engineers’ support, and found a faulty switch in the path!
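Reading a long ping.log by eye gets tedious, so here is a minimal sketch of how the log could be summarized. It assumes the Windows reply format shown earlier; the 100 ms threshold, the function name and the slow and timed-out sample lines are my own illustrations, not real measurements.

```python
import re

# Matches Windows-style reply lines such as
# "Reply from 74.125.224.110: bytes=32 time=6ms TTL=49" (also "time<1ms").
REPLY = re.compile(r"Reply from \S+ bytes=\d+ time[=<](\d+)ms")

def summarize_ping_log(text, slow_ms=100):
    """Return (reply_count, timeout_count, slow_times) for a ping log."""
    times = [int(m.group(1)) for m in REPLY.finditer(text)]
    timeouts = text.count("Request timed out")
    slow = [t for t in times if t >= slow_ms]  # slow_ms is an assumed threshold
    return len(times), timeouts, slow

# Hypothetical log excerpt, for illustration only.
sample = """\
Reply from 74.125.224.110: bytes=32 time=6ms TTL=49
Reply from 74.125.224.110: bytes=32 time=250ms TTL=49
Request timed out.
Reply from 74.125.224.110: bytes=32 time=5ms TTL=49
"""
print(summarize_ping_log(sample))  # prints (3, 1, [250])
```

For a real log you would read the file first, e.g. open(r"C:\Temp\ping.log").read(), and pick a threshold that suits your network.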

Tracert

 

This is my next line of attack; it shows a little more than ping. When I know ping works but seems slow or often times out, I use the trace route command (Windows: tracert, Linux/Unix: traceroute).

Example

Tracing route to www.google.com [74.125.239.116]

over a maximum of 30 hops:

In the screenshot I blurred out the addresses, but you can see how it builds a route from point A to point B. This one took 19 hops (legs) to reach the final destination. It’s always better to have fewer hops, each taking less time. When you complain about the network being slow in a corporate environment, somebody pulls some (patch) cables to reroute you through a different set of stations (routers/switches).

Someone here asked the appropriate question: why does traceroute seem to take more time than a ping? There are some sensible answers too, particularly the analogies in the 2nd answer. The ping above took only 6 ms, while if you add up all the hops in tracert you get a much larger number. The answer is that you shouldn’t add up the timings for each hop. The timing for each hop is a round trip from your machine to that location. Tracert is more for seeing the “route” you are taking than how fast you are, though the timing for each hop may help the Network Engineer deduce whether some router has gone bad. Also, some hops (routers) may show up as timeouts. They may just be blocking you (your IP address) from getting this type of information. (After all, routers have better things to do, like routing?!)
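To make the “don’t add the hops” point concrete, here is a small arithmetic sketch. The hop timings are made up for illustration; they are not taken from a real trace.

```python
# Hypothetical per-hop round-trip times in ms, in hop order (not real data).
hop_rtts_ms = [1, 4, 5, 6]

# Each value is already a full round trip from my machine to that hop,
# so the end-to-end figure is the last hop's time, not the sum.
end_to_end = hop_rtts_ms[-1]
wrong_total = sum(hop_rtts_ms)

# The extra delay contributed by each leg is the difference between neighbours.
per_leg = [b - a for a, b in zip([0] + hop_rtts_ms, hop_rtts_ms)]
print(end_to_end, wrong_total, per_leg)  # prints 6 16 [1, 3, 1, 1]
```

In a real trace the per-leg differences can even come out negative, because each hop is timed separately and a router may be slow to answer the probe itself.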

If you can’t reach a server, you may get something like this from tracert:

It actually lists out all 30 hops with a “Request timed out” message for each hop. Funnily, tracert to www.bing.com ended up in timeouts!

As said above, note that timeouts could just be an indication that some appliance on the way (like a firewall) blocked you from sending a tracert request directly to it. (Though it may allow other routers along the route to talk to it!)

And if a site does not exist, it simply says so.

C:\>tracert www.gxxxoggle.com

Unable to resolve target system name www.gxxxoggle.com.
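The “Unable to resolve” message means the name lookup failed before any packet was sent along the route. A minimal sketch of the same check in Python, using the standard socket module (the resolve helper is my own name, not something tracert provides):

```python
import socket

def resolve(hostname):
    """Return the IPv4 address for hostname, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None

# An address literal is returned as-is, without a DNS lookup.
print(resolve("127.0.0.1"))
```

Calling resolve("www.gxxxoggle.com") should return None as long as that name really does not exist in DNS.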

Here is a nice link from Comcast asking their home users to run Traceroute to find any issues.

This link from Cisco (a major router company) talks in detail about ping and tracert.

NetStat

The next command I would use is NetStat. This one is powerful. With ping and tracert we only learn about the servers we connect to; they give no information about ports(1). But multi-tier software components “talk” to each other through TCP/IP ports. So, when you have a connection problem, it’s not enough that the server is pingable and traceable. You need to know whether the server responds on specific ports.

For example, 80 is the standard port for web servers (8080 is a common alternative), and FTP servers listen on port 21. See this Wikipedia post for a comprehensive list of standard port numbers. Oracle databases by default listen on port 1521 and Sybase (SAP) EAServers listen on 9000 by default. Software typically has a setting to change the port used.
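If you just want to know whether a server answers on a given port, a quick script can do what a manual telnet test does. This is a minimal sketch using Python’s standard socket module; port_is_open and the 3 second timeout are my own choices.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False
```

For instance, port_is_open("dbserver", 1521) would check the Oracle listener, where dbserver is only a placeholder host name. A False result does not tell you why it failed; the server software may be down or a firewall may be in the way.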

Netstat lists all the ports that the current machine (where the command is run) is using, whether listening on or connecting to. Typically server software (even if it runs locally on a PC) listens and the client connects to it. In an n-tier environment, any machine that initiates a request is a client and the one that services the request is the server; roles can change depending on who initiates the request.

Netstat output thus lists the end points of each connection and the status of that connection. Each status value defines the state the connection is in, which makes troubleshooting easier.

Here is typical output of netstat -a:

Proto Local Address Foreign Address State

…..

TCP 0.0.0.0:135 SAM-PC:0 LISTENING

….

TCP 10.xx.xxx.173:10187 dbserver:1521 ESTABLISHED

TCP 10.xx.xxx.173:1192 proxyserver:8080 CLOSE_WAIT

….

The address/port on the left (Local Address) is my machine. The address/port on the right (Foreign Address) is the remote machine. The last field is the status. For a LISTENING row, the foreign address shows my own PC name with port 0 (SAM-PC:0), because no remote end is connected yet.
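If you save netstat output to a file, the columns are easy to pull apart in a script. Here is a minimal sketch that assumes the four-column TCP rows shown above; Conn, parse_netstat and split_endpoint are names I made up for the illustration.

```python
from typing import NamedTuple

class Conn(NamedTuple):
    proto: str
    local: str
    foreign: str
    state: str

def parse_netstat(text):
    """Parse TCP rows of Windows-style netstat output into Conn records."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # TCP rows have four columns; headers and UDP rows (no State) are skipped.
        if len(parts) == 4 and parts[0] == "TCP":
            rows.append(Conn(*parts))
    return rows

def split_endpoint(endpoint):
    """Split 'host:port' on the last colon."""
    host, _, port = endpoint.rpartition(":")
    return host, port

# The rows from the sample output above.
sample = """\
Proto Local Address Foreign Address State
TCP 0.0.0.0:135 SAM-PC:0 LISTENING
TCP 10.xx.xxx.173:10187 dbserver:1521 ESTABLISHED
TCP 10.xx.xxx.173:1192 proxyserver:8080 CLOSE_WAIT
"""
for conn in parse_netstat(sample):
    print(split_endpoint(conn.foreign), conn.state)
```

Splitting on the last colon keeps the host part intact even when the address itself contains colons.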

Possible values for status are

CLOSE_WAIT

CLOSED

ESTABLISHED

FIN_WAIT_1

FIN_WAIT_2

LAST_ACK

LISTENING

SYN_RECEIVED

SYN_SENT

TIME_WAIT

 

Only a few of these are seen frequently on Windows. On server machines, you will typically see a lot of LISTENING ports. Otherwise, if all goes well, you will mostly see ESTABLISHED or CLOSE_WAIT in netstat output. ESTABLISHED means the connection is active. CLOSE_WAIT means the remote end has closed the connection and the local application has not yet closed its socket.

Before the actual connection is established, the client sends a SYN (and shows SYN_SENT), and the server, on receiving it, moves to SYN_RECEIVED and acknowledges. We may not see these often, as they are interim statuses before a connection is established. But if you see a client connection remaining in SYN_SENT for too long, chances are the packet never reached the server (it was probably blocked by a firewall or something).

Too many CLOSE_WAITs are bad. They usually mean the local application is not closing its sockets after the other side has closed the connection.
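To judge whether there are “too many” of a given status, it helps to count rows per status. Below is a minimal sketch; the sample rows are hypothetical and only follow the layout of the output above, and what counts as too many is for you to decide.

```python
from collections import Counter

def count_states(netstat_text):
    """Count TCP rows per state (the last column) in netstat output."""
    states = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "TCP":
            states[parts[3]] += 1
    return states

# Hypothetical rows, for illustration only.
sample = """\
TCP 0.0.0.0:135 SAM-PC:0 LISTENING
TCP 10.xx.xxx.173:10187 dbserver:1521 ESTABLISHED
TCP 10.xx.xxx.173:1192 proxyserver:8080 CLOSE_WAIT
TCP 10.xx.xxx.173:1193 proxyserver:8080 CLOSE_WAIT
"""
print(count_states(sample).most_common())
```

Running this on captures taken a few minutes apart shows whether the CLOSE_WAIT count keeps growing, which is the pattern to worry about.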

This post talks about CLOSE_WAIT and TIME_WAITs.

This post here talks about finding hack attacks using NetStat.

Examples:

Checking connection to Database Server

C:\>netstat -a | findstr 1521

TCP 10.xx.xx.173:10187 dbserver:1521 ESTABLISHED

This tells me there is a connection made to the database server, found by filtering the netstat output with findstr(2) on the Oracle port. (The actual database SID connected to won’t be here. You will have to use Oracle’s own TNSPING for that.)

Checking connection to EAServer

C:\>netstat -a | findstr 9000

TCP 10.xx.xxx.173:9000 SAM-PC:0 LISTENING

TCP 127.0.0.1:9000 SAM-PC:0 LISTENING

Notice there are 2 entries for the EAServer running on port 9000, one on localhost and another on the actual IP address of my machine. This is because I have 2 listeners in EAServer on port 9000, one listening on localhost and another on the IP address of the machine (for anyone connecting from outside).

Now, if I run a client connecting to the EAServer on localhost (both client and server running on my machine), here is what netstat shows me:

C:\>netstat -a | findstr 9000

TCP 127.0.0.1:9000 SAM-PC:0 LISTENING

TCP 127.0.0.1:9000 SAM-PC:15264 ESTABLISHED

TCP 127.0.0.1:15264 SAM-PC:9000 ESTABLISHED

Notice the original server LISTENING is still there. There are 2 new connections established – essentially 2 ends of the same connection. This is because both client and server are running on the same machine. If I am connecting to a remote EAServer, then I will only see one ESTABLISHED entry here and the netstat on the server will show the other end.
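You can reproduce the “two ends of the same connection” picture without EAServer. This is a minimal sketch with plain Python sockets on the loopback address; it lets the OS pick a free port rather than using 9000, and the variable names are mine.

```python
import socket

# Server side: listen on loopback; port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

# Client side: connect, then let the server accept the connection.
client = socket.create_connection(server.getsockname())
accepted, _ = server.accept()

# Each end reports (local address, remote address); they mirror each other,
# just like the two ESTABLISHED rows in the netstat output.
client_end = (client.getsockname(), client.getpeername())
server_end = (accepted.getsockname(), accepted.getpeername())
print("client row:", client_end)
print("server row:", server_end)

for s in (client, accepted, server):
    s.close()
```

The client’s local port is an ephemeral one chosen by the OS, which is the role 15264 plays in the output above.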

This IBM page has a pretty extensive description of the netstat command. Just remember the actual parameters vary by OS, so some of the parameters they mention may not be available elsewhere.

Notes:

(1) The simplest way to check if a particular port is open on a server is to use telnet. See here for the details.

(2) I typically use Findstr along with Netstat to find specific address/port. Findstr is like Unix grep with some quirky syntax.

References

http://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/ios-software-releases-121-mainline/12778-ping-traceroute.html

http://ittrickstipssoftware.blogspot.com/2012/02/how-to-know-if-hacker-attack-on-your-pc.html?showComment=1408582116680#c400238382122762387

http://openmaniak.com/netstat.php

http://docs.oracle.com/cd/E23824_01/html/821-1453/ipconfig-142.html

http://www.business-superstar.com/talking-tech/using-traceroute-to-fix-slow-websites/

http://pcsupport.about.com/od/commandlinereference/p/netstat-command.htm

http://www-01.ibm.com/support/knowledgecenter/SSLTBW_1.12.0/com.ibm.zos.r12.halu101/concepts.htm%23concepts

http://j2eedebug.blogspot.com/2008/12/difference-between-closewait-and.html
