Troubleshoot & repair Linux networks

No network connection on your laptop or problems with your web hosting? We’re here to help

“The Network is the computer” is the famous, prescient phrase coined in 1984 by John Gage, Sun Microsystems’ chief scientist and employee number five. The growth of the web, mobile and cloud computing has borne out that phrase, and a computer without a network connection is just an expensive paperweight.

Fortunately, networking is central to Linux, with the Internet and the Web having been built on UNIX. Most distros ship with built-in tools that will tell you what’s going on, or at least start you on your way to investigating your network problem. More sophisticated tools can be found in your distro’s repository and, as nearly all of them are command-line based, they work just as well on your VPS as on your laptop.

Linux puts the power in your hands – you just need to know where to look. We’re going to take you through the basics of the GNU/Linux networking stack, and what can go wrong with it (and the rest of the Internet). We’ll look at tools and config files to help you and finish with help for using four of the most useful tools: netcat, dig, traceroute and Wireshark.

Network essentials

Where is the network down? Don’t neglect hardware problems: after basic checks, it’s worth looking for pulled cables or fault lights on your Wi-Fi router. Beyond the hardware, there are lots of places along the route your IP packets take where problems can occur. Some are easy to check, while some are more likely than others; let both guide the order in which you tackle your search.

First, some background information. You don’t need to pass an RHCE (Red Hat Certified Engineer) or LPI (Linux Professional Institute) exam; you just need an appreciation of TCP/IP networking. Feel free to skim this section and refer back to it after reading the more practical parts of the article.

TCP

TCP/IP (Transmission Control Protocol/Internet Protocol) is a suite of rules for computers to communicate with each other. TCP sits on top of IP and requires the receiver to acknowledge the data it is sent. That is a lot of overhead compared to UDP (User Datagram Protocol), which makes no delivery checks, but it means there is a lot of useful information available to tools that diagnose TCP/IP problems.

Thirty years ago, when the TCP/IP standards were developed, the computing world was a different place. TCP/IP’s independence from the hardware and transmission medium, its open standards and its common addressing scheme have helped give us the networked world we have today.

Figure: The OSI and TCP/IP networking models

IP, the Internet layer, defines the datagram: the basic unit of transmission on the Internet, consisting of a header and a block of data. The header carries everything needed to deliver the datagram from the originating equipment to its destination. It occupies five 32-bit words (20 bytes) when no options are present, and up to 15 words (60 bytes) when they are.
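Here is a minimal Python sketch that makes the header layout concrete. It is our own illustration, not a tool used elsewhere in this article, and it decodes the fixed 20-byte part of an IPv4 header with the standard struct module. The sample header is hypothetical and hand-built. Its checksum is left at zero, so treat it as decoding practice rather than a real captured packet.

```python
import socket
import struct

# Fixed 20-byte IPv4 header layout (RFC 791), network byte order ("!"):
# version/IHL, TOS, total length, ID, flags/fragment offset,
# TTL, protocol, header checksum, source address, destination address
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")

def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed part of an IPv4 header."""
    if len(packet) < IPV4_HEADER.size:
        raise ValueError("an IPv4 header is at least 20 bytes (five 32-bit words)")
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _checksum, src, dst) = IPV4_HEADER.unpack_from(packet)
    words = ver_ihl & 0x0F  # IHL: header length counted in 32-bit words
    return {
        "version": ver_ihl >> 4,
        "header_words": words,
        "header_bytes": words * 4,
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,  # 1 = ICMP, 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hypothetical header with no options: 192.168.0.2 -> 8.8.8.8, TCP, TTL 64.
# The checksum is left at zero, so this is for decoding practice only.
sample = IPV4_HEADER.pack(
    (4 << 4) | 5, 0, 40, 0x1234, 0, 64, 6, 0,
    socket.inet_aton("192.168.0.2"), socket.inet_aton("8.8.8.8"),
)
print(parse_ipv4_header(sample))
```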

The header contains the destination address for the data. If it’s not on the local network, the datagram is passed to a gateway (or IP router) and travels on until it reaches its destination, its journey determined by routing protocols. The address in IP version 4 (IPv4) is a dotted quad, a 32-bit binary number normally expressed in the form n.n.n.n, where n is anywhere between zero and 255. Certain numbers are reserved. Examples include 127.0.0.1 for localhost, a way for any computer to refer to “myself”, and private address ranges used for local networks, such as 192.168.n.n.
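A dotted quad is just a friendlier way of writing one 32-bit number. The short Python sketch below uses only the standard ipaddress module. It converts a few addresses and flags the loopback and private ones. The choice of example addresses is ours.

```python
import ipaddress

def describe(addr: str) -> str:
    """Show a dotted quad as the 32-bit number it stands for, and what kind of address it is."""
    ip = ipaddress.IPv4Address(addr)
    if ip.is_loopback:
        kind = "loopback"
    elif ip.is_private:
        kind = "private"
    else:
        kind = "public"
    return f"{addr} = {int(ip)} ({kind})"

for a in ("127.0.0.1", "192.168.0.2", "8.8.8.8"):
    print(describe(a))

print(2 ** 32)  # total number of IPv4 addresses: 4,294,967,296, the "4.3 billion"
```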

When even your toaster wants to connect to the Internet, the 4.3 billion addresses provided by IPv4 aren’t enough. IPv6 (version 5 never got going), which defines 128-bit addresses, attempts to fix this. Formalised in 1998, IPv6 still carried under 10 per cent of the world’s Internet traffic at the time of writing. We’ll refer to IPv4 simply as IP from now on.

A subnet mask tells other computers (hosts) and routers which part of an address identifies the subnet and which part identifies the host. Our ADSL router has given our laptop the IP address 192.168.0.2 with a subnet mask of 255.255.255.0. The three 255s cover the first three numbers of the address, so 192.168.0 is the subnet and the host portion is 2.
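The masking described above is a bitwise AND. This minimal Python sketch does the arithmetic by hand, then repeats it with the standard ipaddress module. The helper name split_address is hypothetical. The sketch also shows why a destination outside the subnet has to go via the gateway.

```python
import ipaddress

def split_address(addr: str, mask: str) -> tuple[str, int]:
    """Split an address into (network address, host number) by ANDing with the mask."""
    a = int(ipaddress.IPv4Address(addr))
    m = int(ipaddress.IPv4Address(mask))
    network = ipaddress.IPv4Address(a & m)   # keep the bits where the mask is 1
    host = a & ~m & 0xFFFFFFFF               # keep the bits where the mask is 0
    return str(network), host

print(split_address("192.168.0.2", "255.255.255.0"))  # ('192.168.0.0', 2)

# The standard library does the same job. A destination outside this
# network has to be handed to the default gateway.
laptop = ipaddress.IPv4Interface("192.168.0.2/255.255.255.0")
print(laptop.network)                                      # 192.168.0.0/24
print(ipaddress.IPv4Address("8.8.8.8") in laptop.network)  # False
```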

TCP establishes a virtual connection between a destination and a source, ensuring packets are reassembled in order and re-sending any that get lost. It specifies a port at each end – numbered between 0 and 65535 to indicate the service or application. There’s a long list in /etc/services on your machine but well-known ones include 25 for sending mail and 80 for the web. The combination of IP address and port is known as a socket.
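Each line of /etc/services pairs a service name with a port and protocol, optionally followed by aliases. The Python sketch below parses that format from a small hypothetical sample rather than your real file, so its output is predictable. The function name parse_services is our own.

```python
def parse_services(text: str) -> dict[tuple[str, str], int]:
    """Map (name, protocol) to port from text in /etc/services format."""
    ports = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments and blank lines
        if not line:
            continue
        name, port_proto, *aliases = line.split()
        port, proto = port_proto.split("/")
        for n in (name, *aliases):                # canonical name plus any aliases
            ports[(n, proto)] = int(port)
    return ports

SAMPLE = """\
# service  port/protocol  aliases
smtp       25/tcp         mail
http       80/tcp         www     # WorldWideWeb HTTP
https      443/tcp
"""
services = parse_services(SAMPLE)
print(services[("smtp", "tcp")], services[("www", "tcp")])  # 25 80
```

For everyday lookups there’s no need to parse the file yourself. Python’s socket.getservbyname("smtp", "tcp") asks the system’s services database, which is normally /etc/services.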

Below the level of IP, your physical network hardware (wireless or Ethernet card) uses a MAC (Media Access Control) address: 48 bits written as six colon-separated pairs of hexadecimal digits. The protocols that deal with this are ARP (Address Resolution Protocol), which translates IP addresses to MAC addresses, and its reverse, RARP, which handles translation the other way.

Hostnames like wikipedia.org are used to save you putting 91.198.174.192 into Firefox. The Domain Name System (DNS) uses DNS servers on the Internet to store these names, and hiccoughs in contacting DNS servers account for many networking problems.

Diagnosing issues

What’s not working – connecting to one website or all of them? If it’s just one then it may still be a problem at your end, but if it’s everything, let’s find out where the problem lies.

First, check your network connection. Most desktop distros ship with NetworkManager to manage connections. From the command line, typing nm-tool will report what it knows of your network; look for ‘State: connected’. Newer NetworkManager releases drop nm-tool in favour of nmcli, where nmcli general status does the same job. If you have neither, use ifconfig to see which interfaces are recognised, ethtool for connection status information, or iwconfig for wireless connections.

ethtool shows whether you’re physically connected to the network (Link detected: yes), and iwconfig shows whether you’re associated with a wireless router. ifconfig then gives you your IP address and netmask, telling you that this much of your networking is successfully configured.

Running route -n shows the routing table, including the default gateway to the rest of the Internet. The -n stops route trying to look up a hostname for each address. If no default gateway is shown for addresses outside the local subnet, you will need to fix this. route can add a route by hand as a stopgap, but you still need to address the cause of the problem.
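route gets its information from the kernel, which also exposes the IPv4 routing table as text in /proc/net/route. The Linux-only Python sketch below picks out the default route. It assumes the column layout shown in the sample (interface, destination, gateway, flags, ..., mask), and default_gateway is a hypothetical helper name.

```python
import os
import socket
import struct

RTF_UP, RTF_GATEWAY = 0x0001, 0x0002   # route flag bits, as in the kernel's route.h

def default_gateway(route_table: str):
    """Return (interface, gateway IP) for the default route, or None if there is none."""
    for line in route_table.splitlines()[1:]:       # first line is the column header
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, dest, gw, flags, mask = fields[0], fields[1], fields[2], int(fields[3], 16), fields[7]
        if dest == mask == "00000000" and flags & RTF_UP and flags & RTF_GATEWAY:
            # Addresses are printed as native-order integers in hex, so packing
            # them back in native order ("=I") recovers the original bytes.
            return iface, socket.inet_ntoa(struct.pack("=I", int(gw, 16)))
    return None

if os.path.exists("/proc/net/route"):     # Linux only
    with open("/proc/net/route") as f:
        print(default_gateway(f.read()))
```

The addresses in /proc/net/route are hex in the host’s byte order. On a little-endian PC, the gateway 192.168.0.1 therefore appears as 0100A8C0.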

Servers usually have static IP addresses, and their configuration can be edited to correct the gateway and other network details. Laptops tend to be configured automatically by a DHCP (Dynamic Host Configuration Protocol) server, often running on an ADSL router, where settings for the problem machine can be changed if necessary.

Having corrected settings, a network restart:

sudo service networking restart

…will pick up the revised settings on Debian-based PCs. On Red Hat boxes the service is called network, so drop the -ing. Run route again to check that the default gateway has appeared.

Ping uses another part of the TCP/IP protocol stack, ICMP (Internet Control Message Protocol), to send an ECHO_REQUEST datagram. The ICMP ECHO_RESPONSE sent back by the host or gateway being pinged is used to calculate the round-trip time. Ping tells you whether a machine is up, how much latency there is on the network and how many packets are lost. The exception is a server where an overzealous sysadmin has set it to drop ICMP requests, which in most cases adds negligible security.
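This Python sketch shows what ping actually puts on the wire. It builds an ICMP ECHO_REQUEST message (type 8, code 0) with the Internet checksum from RFC 1071. It only constructs the bytes: sending them needs a raw socket, which normally requires root. The helper names are our own.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum, as used by IP, ICMP, TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def echo_request(ident: int, seq: int, payload: bytes = b"ping") -> bytes:
    """Build an ICMP ECHO_REQUEST (type 8, code 0) message."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)   # checksum field zeroed first
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

packet = echo_request(ident=0x1234, seq=1)
print(packet.hex())
```

A receiver can check the message by running the same checksum over the whole packet, checksum field included. An intact packet gives zero.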

Use ping to check that you have a route to hosts on the Internet. Start by pinging your gateway:

ping 192.168.0.1

…then ping a reliable host such as 8.8.8.8, one of Google’s public DNS servers (the other is 8.8.4.4). We’ve been using IP addresses and -n switches to stop DNS problems distracting us from other network faults, but now’s the time to check DNS itself. Nslookup is less sophisticated than dig, but it’s fine for checking that a domain name resolves to an IP address. If you don’t get an answer, have a look in /etc/resolv.conf.
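Note that nslookup and dig query DNS servers directly. Ordinary applications go through the system resolver instead, which also consults /etc/hosts according to /etc/nsswitch.conf. To test the path your applications actually use, a minimal Python sketch with the standard socket.getaddrinfo call will do. The helper name resolve is hypothetical.

```python
import socket

def resolve(name: str) -> list[str]:
    """Return the IPv4 addresses the system resolver gives for name (empty if lookup fails)."""
    try:
        results = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror as err:
        print(f"lookup failed for {name}: {err}")
        return []
    # Each result is (family, type, proto, canonname, sockaddr); keep the address
    return sorted({sockaddr[0] for *_, sockaddr in results})

print(resolve("localhost"))
```

Suppose resolve("wikipedia.org") comes back empty while ping 8.8.8.8 works. In that case the problem is almost certainly name resolution rather than connectivity.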

If you’ve ruled DNS out, try some of the tools covered later in this article. Use traceroute to see whether you can route all the way there, and telnet and friends to see whether a particular port is open. dig handles more detailed DNS checks, and Wireshark is for investigating unresponsive or slow services.

If it is your webserver that’s the problem, then ssh in and run:

netstat -lnp | grep -i apache

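…(replacing apache with nginx, httpd or whatever is appropriate) to see whether your web server is listening on port 80 on all addresses, shown as 0.0.0.0:80 (or :::80 for IPv6). If the program names come up blank, run it with sudo: netstat can only name processes owned by other users when run as root. You could grep for :80 if that’s the only port you’re concerned with, but it’s worth checking what else Apache is listening on.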
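Without netstat (ss, from iproute2, is its modern replacement), you can still see the kernel’s own view of TCP sockets in /proc/net/tcp. IPv6 sockets are listed separately in /proc/net/tcp6. This Linux-only Python sketch lists sockets in the LISTEN state, whose state code is 0A. It assumes the column layout shown in the kernel’s header line, and the function name is our own.

```python
import os
import socket
import struct

TCP_LISTEN = "0A"   # the "st" column holds the kernel's TCP state number in hex

def listening_ports(proc_net_tcp: str) -> list[tuple[str, int]]:
    """Return (address, port) for every IPv4 TCP socket in the LISTEN state."""
    found = []
    for line in proc_net_tcp.splitlines()[1:]:        # skip the column header
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_LISTEN:
            continue
        addr_hex, port_hex = fields[1].split(":")     # local_address column
        # The address is printed as a native-order integer, so pack it back the
        # same way; the port is printed as an ordinary hex number.
        addr = socket.inet_ntoa(struct.pack("=I", int(addr_hex, 16)))
        found.append((addr, int(port_hex, 16)))
    return found

if os.path.exists("/proc/net/tcp"):   # Linux only
    with open("/proc/net/tcp") as f:
        for addr, port in listening_ports(f.read()):
            print(f"{addr}:{port}")
```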

Configuration files

Everything is a file, even connected devices: that’s the Linux way. In the Eighties many Unix systems kept binary configurations, but Linux, inspired by the Plan 9 operating system, put most configuration information in text files. Knowing where they are and what to do with them means your text editor also becomes a powerful tool for checking, fixing and maintaining your network.

This starts at the hardware level. Most devices appear under /dev, but network interfaces are the exception: the kernel lists them under /sys/class/net instead. /proc and /sys expose the configuration of installed PCI buses and devices, which lshw reads when you call:

lshw -C network

…to check the logical name entry to use with tools like ethtool and ifconfig.
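The kernel lists network interfaces under /sys/class/net, each with an operstate file reporting whether it is up. This Python sketch reads those files. The base path is a parameter, so the function can be pointed at a test directory, and interface_states is a hypothetical name.

```python
from pathlib import Path

def interface_states(base: str = "/sys/class/net") -> dict[str, str]:
    """Map each network interface name to its operstate (up, down, unknown...)."""
    states = {}
    root = Path(base)
    if not root.is_dir():                     # not Linux, or /sys isn't mounted
        return states
    for iface in sorted(root.iterdir()):
        state_file = iface / "operstate"      # skips non-interface entries such as bonding_masters
        if state_file.exists():
            states[iface.name] = state_file.read_text().strip()
    return states

print(interface_states())
```

The loopback interface usually reports unknown rather than up, so don’t read too much into that entry.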

It’s not always simple, though. We swapped between Red Hat and Debian/Ubuntu-based machines and found that each configured its NIC in a different place. On our Ubuntu machine, the Ethernet interface was configured in /etc/network/interfaces. On the Fedora 20 laptop, the NIC was configured in /etc/sysconfig/network-scripts/ifcfg-em1. That file shared a directory with ifcfg-* files for every wireless hub we had ever connected the laptop to.
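Files such as ifcfg-em1 are plain shell-style KEY=value assignments, so a missing or mistyped GATEWAY is easy to spot programmatically. A minimal Python sketch using the standard shlex module follows. The sample values are hypothetical. The keys shown (DEVICE, BOOTPROTO, IPADDR, PREFIX, GATEWAY and so on) are the common ones rather than a complete list.

```python
import shlex

def parse_ifcfg(text: str) -> dict[str, str]:
    """Parse shell-style KEY=value lines, as used in ifcfg-* files."""
    settings = {}
    for token in shlex.split(text, comments=True):   # honours quotes and # comments
        key, sep, value = token.partition("=")
        if sep:                                        # ignore anything that isn't KEY=value
            settings[key] = value
    return settings

# Hypothetical contents of /etc/sysconfig/network-scripts/ifcfg-em1
SAMPLE = """\
# Static configuration for em1
DEVICE=em1
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.0.2
PREFIX=24
GATEWAY=192.168.0.1
NAME="System em1"
"""
cfg = parse_ifcfg(SAMPLE)
missing = [key for key in ("IPADDR", "PREFIX", "GATEWAY") if key not in cfg]
print(cfg.get("GATEWAY"), missing)   # 192.168.0.1 []
```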

Linux’s everything-is-a-file approach also means that hardware issues can often be solved with a text editor. For example, if the kernel isn’t loading the module for your NIC, add the module’s name to /etc/modules (or a similarly named file on your distro) to load it at boot. If the device’s name is what’s causing the error, an alias line mapping the name to its driver module goes in a file under /etc/modprobe.d/:

alias eth0 b44

DNS again

DNS lookups go through the resolver routines. These read the config file /etc/resolv.conf to find out which name servers to ask and which domains to search. Look at the file on your laptop and you may see something like:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 127.0.1.1
search Home

The 127.0.1.1 (rather than 127.0.0.1) points to a local copy of dnsmasq, a lightweight forwarding DNS server running under the control of NetworkManager. In distros without this, dhclient will grab the address of the DNS server from the DHCP server.

To set your own name servers permanently, place entries like the following in /etc/resolvconf/resolv.conf.d/base:

nameserver 8.8.8.8
nameserver 8.8.4.4

…and resolvconf will include them when it writes /etc/resolv.conf. Run resolvconf -u (as root or with sudo) to regenerate the file straight away.

A closer look at /etc/resolv.conf shows it to be a symlink to /run/resolvconf/resolv.conf, which is where resolvconf writes it. To take dnsmasq out of the picture temporarily, try commenting out its dns=dnsmasq line in /etc/NetworkManager/NetworkManager.conf.

DNS servers are queried in the order they appear in /etc/resolv.conf, and the glibc resolver uses at most the first three. Put the one you want to try first. You can also comment out the others by placing a # at the beginning of their lines, so that the resolver ignores them.
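This Python sketch shows at a glance which servers your resolver will try, and in what order. It pulls the nameserver lines out of resolv.conf text and skips comment lines starting with # or ;. The helper name and the sample contents are our own.

```python
import os

def nameservers(resolv_conf: str, limit: int = 3) -> list[str]:
    """Return nameserver addresses in the order the resolver will try them."""
    servers = []
    for line in resolv_conf.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":      # comment lines start with # or ;
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            servers.append(fields[1])
    return servers[:limit]                   # glibc only uses the first three

if os.path.exists("/etc/resolv.conf"):
    with open("/etc/resolv.conf") as f:
        print(nameservers(f.read()))
```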

Opennicproject.org and http://freedns.zone offer DNS with no redirects and no logging, which is essential if you live in a place where what you do online is monitored or restricted.

To round off config files, let’s return to IPv6. If you suspect it of causing trouble, it can be disabled system-wide by editing /etc/modprobe.d/aliases to add:

alias net-pf-10 ipv6 off
alias net-pf-10 off
alias ipv6 off

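…and rebooting. This approach only works where IPv6 is built as a module. If it’s built into your kernel, setting the sysctl net.ipv6.conf.all.disable_ipv6 to 1 is the usual alternative. If you rule IPv6 out as a problem, remember to put it back again: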

alias net-pf-10 ipv6

Fix network problems

(Telnet to) netcat

Netcat does everything that the humble telnet does plus much more, but you may find yourself on a box without netcat, so we’ll start with an example from old-school telnet.

01 Humble telnet

If you started using computers after the Nineties, when telnet was replaced by SSH in a suddenly far less secure world, you may have dismissed it as a relic of the past. But telnet lives on as a useful diagnostic tool, available for any distro, that connects to specific ports to see what’s open and working. For instance, telnet wikipedia.org 80 tests whether a web server is answering.
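In this role, telnet connects to a port and shows whatever the service says first, and that fits in a few lines of Python. This sketch (probe is a hypothetical name) opens a TCP connection, optionally sends some bytes and returns the first reply. A refused connection raises ConnectionRefusedError, which tells you nothing is listening.

```python
import socket

def probe(host: str, port: int, send: bytes = b"", timeout: float = 3.0) -> bytes:
    """Connect like telnet would, optionally send some bytes, and return the first reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if send:
            sock.sendall(send)
        try:
            return sock.recv(1024)
        except socket.timeout:
            return b""   # the port is open, but the service said nothing
```

For example, probe("192.168.0.1", 80, b"HEAD / HTTP/1.0\r\n\r\n") should return the start of an HTTP response if your router’s web interface is up. A mail server on port 25 usually greets you with a 220 banner without being prompted.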

02 Enter netcat

If you can install netcat (nc), you won’t fall back on telnet much. It combines telnet’s simple testing abilities with the power to do almost anything with TCP, UDP or Unix-domain sockets: open TCP connections, send UDP packets, listen on arbitrary TCP and UDP ports and scan ports.

03 Port scan

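It’s not good manners to check every port on someone else’s machine to see what’s left open (a port scan). On your own machines, though, it’s useful both for security (‘that shouldn’t be open’) and diagnostics (‘that should have been open and listening’). Try running something like nc -vnz 192.168.0.1 1-65535. Here -v reports each result, -n skips DNS lookups and -z just checks for listening daemons without sending data. Adding -u scans UDP ports instead, but as UDP has no handshake, those results are less reliable.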
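A TCP port scan is simple at heart: try to complete a connection to each port and note which ones succeed. Here is a minimal Python equivalent. It is a TCP connect scan only, and the function name is hypothetical. As with nc, only point it at machines you’re responsible for.

```python
import socket

def scan(host: str, ports, timeout: float = 0.5) -> list[int]:
    """TCP connect scan: return the ports that accept a connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 when the three-way handshake completes,
            # and an error number (refused, timed out...) otherwise.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

scan("192.168.0.1", range(1, 1025)) checks the well-known ports. Each filtered port can cost the full half-second timeout, so a complete 1-65535 sweep of a firewalled host can take a long time.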

04 Pass the port

One useful nc trick is to set up an impromptu server listening on a particular port. This checks that nothing is impeding a connection on that port between you and the server. Here, we set nc up as a one-off web server and read information about the host that connects to it. With OpenBSD netcat that’s nc -l 8080, while traditional netcat uses nc -l -p 8080.
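The same one-off listener can be sketched in Python. It accepts a single connection, replies with a minimal HTTP response and reports the caller’s address and request. It illustrates what nc -l is doing rather than replacing it. The names REPLY and serve_once are hypothetical, and the sketch handles IPv4 listeners only.

```python
import socket

# A minimal HTTP/1.0 response, enough for a browser or curl to display
REPLY = (b"HTTP/1.0 200 OK\r\n"
         b"Content-Type: text/plain\r\n"
         b"\r\n"
         b"Connection received\r\n")

def serve_once(srv: socket.socket) -> tuple[str, bytes]:
    """Accept one connection on a listening IPv4 socket, reply, and report who called."""
    conn, (peer_ip, _peer_port) = srv.accept()   # blocks until someone connects
    with conn:
        request = conn.recv(4096)                # e.g. a browser's GET line and headers
        conn.sendall(REPLY)
    return peer_ip, request

# To try it by hand (blocks until a browser or curl hits port 8080):
#     with socket.create_server(("", 8080)) as srv:
#         print(serve_once(srv))
```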

Traceroute

You might not think about how the Internet works while you’re using it, but traceroute lifts the lid on where your packets are travelling – showing the time packets take to reach each gateway machine between your machine and the server.

01 Follow the hops

Traceroute tests each hop between you and the destination host. Although not always conclusive, its output shows where problems may be occurring. By default it gives up after 30 hops and sends 60-byte packets. You can adjust both; here we raise the maximum to 255 hops and the packet size to 70 bytes:

traceroute -m 255 wikipedia.org 70

02 Journey times

The times displayed in ms are the round-trip times to each host for the three packets sent. Adjust the number of packets with -q; for example, -q 1 sends just a single packet. A noticeably longer time at one hop, seen from the UK, could simply be your packets crossing the Channel or the Atlantic.

03 Mtr

A row of asterisks means no reply came back from that hop within the time limit. mtr provides a continuous traceroute, which helps to detect intermittent problems. You may only be able to fix problems found in your own networks, but knowing where a problem lies could help you work out a fix that routes around it.

04 Blocked ICMP

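As we mentioned with ping, some sysadmins block ICMP. Firewalls may also drop the UDP probes that Linux traceroute sends by default, so standard traceroute won’t work. Tcptraceroute (or traceroute -T, run as root) sends TCP SYN probes instead, to port 80 by default, which firewalls are more likely to let through.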

Dig

We’ve used the -n option a lot in this tutorial, as DNS issues can easily cloud other problems. Once you’ve cleared up suspected DNS problems on your machine with the resolver, it’s time to reach out through the hierarchical world of DNS servers to check everything is as it should be. Nslookup and host perform simple searches, but dig is the most flexible tool available.

01 Address search

Nslookup may be sufficient for resolving an address or checking that you can, but for useful information about DNS servers and their recursive connections across the Internet, fire up dig, whose flexibility means that it repays a little time spent getting to know some of its options.

02 Names are served

By default, dig returns A records, but it can be used to check other record types such as MX (mail servers). Asking for the NS type instead, as in dig wikipedia.org NS, lists the nameservers for a domain.
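Under the hood, a query like the ones dig sends is a small binary message. This Python sketch builds one in the RFC 1035 wire format. It starts with a 12-byte header with the recursion desired flag set, then gives the name as length-prefixed labels, followed by the record type. Only a handful of record types are mapped, and build_query is a hypothetical helper.

```python
import struct

QTYPES = {"A": 1, "NS": 2, "MX": 15, "AAAA": 28}   # a few record types (RFC 1035, RFC 3596)

def build_query(name: str, qtype: str = "A", query_id: int = 0x1234) -> bytes:
    """Build a DNS query message in RFC 1035 wire format, with recursion desired."""
    flags = 0x0100                                   # RD bit: ask the server to recurse for us
    # Header: ID, flags, 1 question, 0 answers, 0 authority, 0 additional records
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # Each label is prefixed by its length; a zero-length label (the root) ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", QTYPES[qtype], 1)   # class 1 = IN (Internet)
    return header + question

print(build_query("wikipedia.org", "NS").hex())
```

Sending those bytes in a UDP datagram to port 53 of a nameserver such as 8.8.8.8 is essentially what dig does. dig also decodes the reply, which this sketch doesn’t attempt.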

03 Hierarchical

DNS is hierarchical. A query starts at the root servers, which point to the servers for the TLD (top-level domain) such as .com or .uk. Those in turn point to the servers for the domain name itself. With searches taking place recursively there’s plenty of room for errors, or even malicious attacks. dig +trace (for example, dig +trace wikipedia.org) shows you each successive step in the hierarchy taken by your query.

04 +short option

Hierarchical searches output a lot of information that you probably don’t need – even from a standard DNS lookup you may only want the IP address. The +short option gives you just such an abbreviated output, which is also very useful in scripting searches.

Wireshark

Like tcpdump, Wireshark can dump packets from the network, but it also performs interactive analysis. That’s slightly over the top for minor networking problems, but handy for locating bottlenecks in the system. In most distros the wireshark package is the GUI version, with the console version packaged as tshark. Try them both so you can adapt to whichever is best when trouble strikes.

01 Powertool

Despite the baffling number of options available, starting is a simple matter of selecting interfaces from the Capture menu. To get Wireshark to see your interfaces and avoid running it as root user, see Capture Setup/Capture Privileges over at wiki.wireshark.org.

02 Portable troubleshooting

As Wireshark is useful for detecting many problems with packet loss or latency, and won’t be installed everywhere you go, you can avoid the dance around superuser permissions by carrying it around on a USB live distro.

03 Filter cut

Looking at the raw data is overwhelming, and even the choice of filters is large. However, you can right-click a suspicious entry and use it as the basis for a filter, or do the same from the filter hierarchy. In the simplest case, you’re looking for a particular protocol, say DNS or something encrypted with TLS, so you just type its name (dns or tls) into the filter toolbar.

04 Command line shark

On your VPS, or any other box without a GUI, tshark provides Wireshark’s capture and analysis features at the command line. It’s worth installing alongside Wireshark and getting familiar with, so you’re prepared if you ever need it.

 
