Linux FreeS/WAN Troubleshooting Guide

Overview

This document covers several general places where you might have a problem:

During install. Includes troubleshooting OE.
During the negotiation process.
Using an established connection.

This document also contains notes which expand on points made in these sections, and tips for problem reporting. If the other end of your connection is not FreeS/WAN, you'll also want to read our interoperation document.

1. During Install

1.1 RPM install gotchas

With the RPM method:

Be sure you have installed both the userland tools and the kernel components. One will not work without the other. For example, when using FreeS/WAN-produced RPMs for our 2.06 release, you need both:
```
    freeswan-userland-2.06_2.4.20_20.9-0.i386.rpm
    freeswan-module-2.06_2.4.20_20.9-0.i386.rpm
```
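If you are not sure two package files belong together, comparing their names is a quick first check. The sketch below assumes the naming scheme visible in the 2.06 example above, `freeswan-<component>-<version>_<kernel>-<release>.<arch>.rpm`. The function names are hypothetical. A match only suggests the pair is consistent; it does not prove the module suits your running kernel.

```c
#include <stdbool.h>
#include <string.h>

/* Returns the part of "name" that follows "prefix", or NULL if "name"
 * does not start with "prefix". */
static const char *after_prefix(const char *name, const char *prefix)
{
    size_t n = strlen(prefix);
    return strncmp(name, prefix, n) == 0 ? name + n : NULL;
}

/* Hypothetical check: true when the userland and module package file
 * names carry the same "<version>_<kernel>-<release>.<arch>.rpm" tail,
 * assuming the naming scheme of the 2.06 example above. */
bool rpm_pair_matches(const char *userland, const char *module)
{
    const char *u = after_prefix(userland, "freeswan-userland-");
    const char *m = after_prefix(module, "freeswan-module-");
    return u != NULL && m != NULL && strcmp(u, m) == 0;
}
```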

1.2 Problems installing from source

When installing from source, you may find these problems:

Missing library. See this FAQ.
Missing utilities required for compile. See this checklist.
Kernel version incompatibility. See this FAQ.
Another compile problem. Find information in the out.* files, such as out.kpatch and out.kbuild, which are created at compile time in the top-level Linux FreeS/WAN directory. Error messages generated by KLIPS during the boot sequence are accessible with the dmesg command.
Check the list archives and the List in Brief to see if this is a known issue. If it is not, report it to the bugs list as described in our problem reporting section. In some cases, you may be asked to provide debugging information using gdb; details below.
If your kernel compiles but you fail to install your new FreeS/WAN-enabled kernel, review the sections on installing the patched kernel, and testing to see if install succeeded.

1.3 Install checks

ipsec verify checks a number of FreeS/WAN essentials. Here are some hints on what to do when your system doesn't check out:

Problem	Status	Action
`ipsec` not on-path		Add `/usr/local/sbin` to your PATH.
Missing KLIPS support	critical	See this FAQ.
No RSA private key		Follow these instructions to create an RSA key pair for your host. RSA keys are: required for opportunistic encryption, and our preferred method to authenticate pre-configured connections.
`pluto` not running	critical	service ipsec start
No port 500 hole	critical	Open port 500 for IKE negotiation.
Port 500 check N/A		Check that port 500 is open for IKE negotiation.
Failed DNS checks		Opportunistic encryption requires information from DNS. To set this up, see our instructions.
No public IP address		Check that the interface which you want to protect with IPSec is up and running.
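For the "not on-path" row, `/usr/local/sbin` has to appear as a complete entry of the colon-separated PATH. A prefix or a longer path does not count. The sketch below illustrates that rule and is not part of `ipsec verify` itself. The name `path_has_dir` is hypothetical. It compares entries literally, so `/usr/local/sbin/` with a trailing slash would not match.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if "dir" (assumed non-empty) appears as a whole entry in
 * a colon-separated PATH string such as the value of getenv("PATH").
 * A NULL path (PATH unset) never matches. */
bool path_has_dir(const char *path, const char *dir)
{
    size_t dlen = strlen(dir);
    const char *p = path;

    while (p != NULL && *p != '\0') {
        const char *colon = strchr(p, ':');
        size_t len = colon ? (size_t)(colon - p) : strlen(p);
        if (len == dlen && strncmp(p, dir, dlen) == 0)
            return true;
        p = colon ? colon + 1 : NULL;   /* move to the next entry */
    }
    return false;
}
```

A typical call would be `path_has_dir(getenv("PATH"), "/usr/local/sbin")`, with `<stdlib.h>` included for `getenv`.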

1.4 Troubleshooting OE

OE should work with no local configuration, if you have posted DNS TXT records according to the instructions in our quickstart guide. If you encounter trouble, try these hints. We welcome additional hints via the users' mailing list.

Symptom	Problem	Action
OE host is very slow to contact other hosts.	Slow DNS service while running OE.	It's a good idea to run a caching DNS server on your OE host, as outlined in this mailing list message. If your DNS servers are elsewhere, put their IPs in the `clear` policy group, and re-read groups with ipsec auto --rereadgroups
Can't Opportunistically initiate for 192.0.2.2 to 192.0.2.3: no TXT record for 3.2.0.192.in-addr.arpa.	Peer is not set up for OE.	None. Plenty of hosts on the Internet do not run OE. If, however, you have set OE up on that peer, this may indicate that you need to wait up to 48 hours for its DNS records to propagate.
`ipsec verify` does not find DNS records: ... Looking for TXT in forward map: xy.example.com...[FAILED] Looking for TXT in reverse map...[FAILED] ... You also experience authentication failure: Possible authentication failure: no acceptable response to our first encrypted message	DNS records are not posted or have not propagated.	Did you post the DNS records necessary for OE? If not, do so using the instructions in our quickstart guide. If so, wait up to 48 hours for the DNS records to propagate.
`ipsec verify` does not find DNS records, and you experience authentication failure.	For iOE, your ID does not match location of forward DNS record.	In `config setup`, change `myid=` to match the forward DNS where you posted the record. Restart FreeS/WAN. For reference, see our iOE instructions.
`ipsec verify` finds DNS records, yet there is still authentication failure. ( ? )	DNS records are malformed.	Re-create the records and send new copies to your DNS administrator.
`ipsec verify` finds DNS records, yet there is still authentication failure. ( ? )	DNS records show different keys for a gateway vs. its subnet hosts.	All TXT records for boxes protected by an OE gateway must contain the gateway's public key. Re-post any incorrect records using our OE gateway instructions.
OE gateway loses connectivity to its subnet. The gateway's routing table shows routes to the subnet through IPsec interfaces.	The subnet is part of the `private` or `block` policy group on the gateway.	Remove the subnet from the group, and reread groups with ipsec auto --rereadgroups
OE does not work to hosts on the local LAN.	This is a known issue.	See our known issues document for details and a workaround.
FreeS/WAN does not seem to be executing your default policy. In your logs, you see a message like: "/etc/ipsec.d/policies/iprivate-or-clear" line 14: subnet "0.0.0.0/0", source 192.0.2.13/32, already "private-or-clear"	You may have defined your default policy twice. Fullnet (0.0.0.0/0) in a policy group file defines your default policy. It should normally be present in only one policy group file. You may, however, define a second default policy to protect another local endpoint (e.g. a subnet behind the FreeS/WAN gateway).	Find all policies which contain fullnet with: grep -F 0.0.0.0/0 /etc/ipsec.d/policies/* then remove any unwanted occurrences.
You're running FreeS/WAN 2.01 (or later), and initiating a connection to FreeS/WAN 2.00 (or earlier). In your logs, you see a message like: no RSA public key known for '192.0.2.13'; DNS search for KEY failed (no KEY record for 13.2.0.192.in-addr.arpa.) The older FreeS/WAN logs no error.	A protocol level incompatibility between 2.01 (or later) and 2.00 (or earlier) causes this error. It occurs when a FreeS/WAN 2.01 (or later) box for which no KEY record is posted attempts to initiate an OE connection to older FreeS/WAN versions (2.00 and earlier). Note that older versions can initiate to newer versions without this error.	If you control the peer host, upgrade its FreeS/WAN to 2.01 (or later), and post new style TXT records for it. If not, but if you know its sysadmin, perhaps a quick note is in order. If neither option is possible, you can ease the transition by posting an old style KEY record (created with a command like "ipsec showhostkey --key") to the reverse map for the FreeS/WAN 2.01 (or later) box.
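Several of these messages name a reverse-map domain rather than an IP address. The reverse-map name is the address's four octets in reverse order followed by `in-addr.arpa.`. The sketch below builds that name and checks it against a domain quoted in a log message, so you can confirm which host a "no TXT record for ..." line refers to. The function names are hypothetical. Only IPv4 is handled.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Builds the IPv4 reverse-map name used for TXT (or old-style KEY)
 * lookups, e.g. "192.0.2.13" -> "13.2.0.192.in-addr.arpa.".
 * Returns false if "ip" is not a dotted quad or "out" is too small. */
bool reverse_map_name(const char *ip, char *out, size_t outlen)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, ip, &addr) != 1)
        return false;

    /* s_addr is stored in network byte order: b[0] is the first octet. */
    const unsigned char *b = (const unsigned char *)&addr.s_addr;
    int n = snprintf(out, outlen, "%d.%d.%d.%d.in-addr.arpa.",
                     b[3], b[2], b[1], b[0]);
    return n > 0 && (size_t)n < outlen;
}

/* True if a reverse-map name quoted in a log message belongs to "ip". */
bool reverse_name_matches(const char *ip, const char *name)
{
    char buf[64];
    return reverse_map_name(ip, buf, sizeof buf) && strcmp(buf, name) == 0;
}
```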

2. During Negotiation

When you fail to bring up a tunnel, you'll need to find:

your connection state, and often
an error message

before you can diagnose your problem.

2.1 Determine Connection State

Finding current state

You can see connection states (STATE_MAIN_I1 and so on) when you bring up a connection on the command line. If you have missed this, or brought up your connection automatically, use:

ipsec auto --status

The most relevant state is the last one reached.

What's this supposed to look like?

Negotiations should proceed through various states, in the process of:

IKE negotiations (aka Phase 1, Main Mode, STATE_MAIN_*)
IPSEC negotiations (aka Phase 2, Quick Mode, STATE_QUICK_*)

These are done and a connection is established when you see messages like:

    000 #21: "myconn" STATE_MAIN_I4 (ISAKMP SA established)...
    000 #2: "myconn" STATE_QUICK_I2 (sent QI2, IPsec SA established)...

Look for the key phrases "ISAKMP SA established" and "IPsec SA established", with the relevant connection name. Often, these appear at STATE_MAIN_I4 and STATE_QUICK_I2, respectively.
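If you check status from a script, the search for those two phrases can be sketched as below. This assumes the status lines are spelled as in the example above, with the connection name in double quotes. The function name `phases_established` is hypothetical. Like `ipsec auto --status` itself, it reports which states were reached, not whether the connection is still up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Scans "ipsec auto --status" lines and reports whether the named
 * connection has reached the ISAKMP (Phase 1) and IPsec (Phase 2)
 * "established" states. The name is matched with its surrounding double
 * quotes so that "myconn" does not also match "myconn2". */
void phases_established(const char *const lines[], size_t n,
                        const char *conn, bool *isakmp, bool *ipsec)
{
    char quoted[128];
    snprintf(quoted, sizeof quoted, "\"%s\"", conn);

    *isakmp = false;
    *ipsec = false;
    for (size_t i = 0; i < n; i++) {
        if (strstr(lines[i], quoted) == NULL)
            continue;                   /* line is about another connection */
        if (strstr(lines[i], "ISAKMP SA established") != NULL)
            *isakmp = true;
        if (strstr(lines[i], "IPsec SA established") != NULL)
            *ipsec = true;
    }
}
```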

ipsec auto --status tells you which states have been reached, rather than the current state. Because determining the current state is considerably more difficult, Linux FreeS/WAN does not report it. If you are actively bringing a connection up, the last states shown for that connection in the status report probably reflect its current state. Beware, though, of a connection that was brought up correctly but has since gone down: Linux FreeS/WAN will not notice this until it attempts to rekey. Until then, the last known state still indicates that the connection is established.

If your connection is stuck at STATE_MAIN_I1, skip straight to "Connection stuck at STATE_MAIN_I1" in section 2.3.

2.2 Finding error text

Solving most errors will require you to find verbose error text, either on the command line or in the logs.

Verbose start for more information

Note that you can get more detail from ipsec auto using the --verbose flag:

    ipsec auto --verbose --up west-east

More complete information can be gleaned from the log files.

Debug levels count

The amount of description you'll get here depends on ipsec.conf debug settings, klipsdebug= and plutodebug=. When troubleshooting, set at least one of these to all, and when done, reset it to none so your logs don't fill up. Note that you must have enabled the klipsdebug compile-time option for the klipsdebug configuration switch to work.

For negotiation problems plutodebug is most relevant. klipsdebug applies mainly to attempts to use an already-established connection. See also this description of the division of duties within Linux FreeS/WAN.

After raising your debug levels, restart Linux FreeS/WAN to ensure that ipsec.conf is reread, then recreate the error to generate verbose logs.

`ipsec barf` for lots of debugging information

ipsec barf(8) collects a bunch of useful debugging information, including these logs. Use the command

    ipsec barf > barf.west

to generate one.

Find the error

Search out the failure point in your logs. Are there a handful of lines which succinctly describe how things are going wrong or contrary to your expectation? Sometimes the failure point is not immediately obvious: Linux FreeS/WAN's errors are usually not marked "Error". Have a look in the FAQ for what some common failures look like.

Tip: problems snowball. Focus your efforts on the first problem, which is likely to be the cause of later errors.

Play both sides

Also find error text on the peer IPSec box. This gives you two perspectives on the same failure.

At times you will need information that only one side has. The peer may only be able to indicate that an error occurred, and roughly where in the negotiation it happened. If one side keeps retrying, the cause may be a showstopper on the other side. Look at the other side and figure out what it doesn't like.

If the other end is not Linux FreeS/WAN, the principle is the same: replicate the error with its most verbose logging on, and capture the output to a file.

2.3 Interpreting a Negotiation Error

Connection stuck at STATE_MAIN_I1

This error commonly happens because IKE (port 500) packets, needed to negotiate an IPSec connection, cannot travel freely between your IPSec gateways. See our firewall document for details.

Other errors

Other errors require a bit more digging. Use the following resources:

the FAQ. Since this document is constantly updated, the snapshot's FAQ may have a new entry relevant to your problem.
our background document. Special considerations which, while not central to Linux FreeS/WAN, are often tripped over. Includes problems with packet fragmentation, and considerations for testing opportunism.
the list archives. Each of the searchable archives works differently, so it's worth checking each. Use a search term which is generic, but identifies your error, for example "No connection is known for".
Often, you will find that your question has been answered in the past. Finding an archived answer is quicker than asking the list. You may, however, find similar questions without answers. If you do, send their URLs to the list with your trouble report. The additional examples may help the list tech support person find your answer.
Look into the code where the error is being generated. The Pluto code is nicely documented with comments and meaningful variable names.

If you have failed to solve your problem with the help of these resources, send a detailed problem report to the users list, following these guidelines.

3. Using a Connection

3.1 Orienting yourself

How do I know if it works?

Test your connection by sending packets through it. The simplest way to do this is with ping, but the ping needs to test the correct tunnel. See this example scenario if you don't understand this.

If your ping returns, test any other connections you've brought up. If they all check out, great. You may wish to test with large packets for MTU problems.

`ipsec barf` is useful again

If your ping fails to return, generate an ipsec barf debugging report on each IPSec gateway. On a non-Linux FreeS/WAN implementation, gather equivalent information. Use this, and the tips in the next sections, to troubleshoot. Are you sure that both endpoints are capable of hearing and responding to ping?

3.2 Those pesky configuration errors

IPSec may be dropping your ping packets since they do not belong in the tunnels you have constructed:

Your ping may not test the tunnel you intend to test. For details, see our "I can't ping" FAQ.
Alternatively, you may have a configuration error. For example, you may have configured one of the four possible tunnels between two gateways, but not the one required to secure the important traffic you're now testing. In this case, add and start the tunnel, and try again.
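The four possible tunnels between two gateways come from combining each gateway's own address with the subnet behind it. That gives subnet-subnet, subnet-gateway, gateway-subnet and gateway-gateway. The sketch below decides which of them would cover a given source and destination. It assumes IPv4 and one subnet per gateway. The struct, enum and function names are hypothetical. Traffic that matches none of the four is what produces the "no eroute" message discussed next.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* The four tunnels possible between two gateways, plus "none". */
enum tunnel { TUNNEL_NONE, SUBNET_SUBNET, SUBNET_GATEWAY,
              GATEWAY_SUBNET, GATEWAY_GATEWAY };

/* One side of the link: the gateway's own (public) address and the
 * subnet behind it. */
struct gw_end {
    const char *gateway;
    const char *subnet;
    int prefix;
};

/* True if dotted-quad "ip" lies in net/prefix (IPv4, prefix 0..32). */
static bool in_subnet(const char *ip, const char *net, int prefix)
{
    struct in_addr a, n;
    if (inet_pton(AF_INET, ip, &a) != 1 || inet_pton(AF_INET, net, &n) != 1)
        return false;
    uint32_t mask = prefix == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix);
    return (ntohl(a.s_addr) & mask) == (ntohl(n.s_addr) & mask);
}

/* Which tunnel, if any, would carry a packet from src (left side)
 * to dst (right side)? */
enum tunnel tunnel_for(const struct gw_end *left, const struct gw_end *right,
                       const char *src, const char *dst)
{
    bool src_net = in_subnet(src, left->subnet, left->prefix);
    bool src_gw  = in_subnet(src, left->gateway, 32);
    bool dst_net = in_subnet(dst, right->subnet, right->prefix);
    bool dst_gw  = in_subnet(dst, right->gateway, 32);

    if (src_net && dst_net) return SUBNET_SUBNET;
    if (src_net && dst_gw)  return SUBNET_GATEWAY;
    if (src_gw && dst_net)  return GATEWAY_SUBNET;
    if (src_gw && dst_gw)   return GATEWAY_GATEWAY;
    return TUNNEL_NONE;
}
```

A gateway's private-side address lies inside its own subnet. A ping sourced from that address therefore falls under the subnet-subnet tunnel, which is the trick described in section 5.2.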

In either case, you will often see a message like:

klipsdebug... no eroute

which we discuss in this FAQ.

Note:

Network Address Translation (NAT) and IP masquerade may have an effect on which tunnels you need to configure.
When testing a tunnel that protects a multi-node subnet, try several subnet nodes as ping targets, in case one node is routing incorrectly.

3.3 Check Routing and Firewalling

If you've confirmed your configuration assumptions, the problem is almost certainly with routing or firewalling. Isolate the problem using interface statistics, firewall statistics, or a packet sniffer.

Background:

Linux FreeS/WAN supplies all the special routing it needs; you need only route packets out through your IPSec gateway. Verify that routing is as expected on the subnet machines you are using for your ping test. I have seen a tunnel "fail" because a subnet machine was sending packets out through an alternate gateway (not our IPSec gateway) on the return path.
Linux FreeS/WAN requires particular firewalling considerations. Check the firewall rules on your IPSec gateways and ensure that they allow IPSec traffic through. Be sure that no other machine - for example a router between the gateways - is blocking your IPSec packets.

View Interface and Firewall Statistics

Interface reports and firewall statistics can help you track down lost packets at a glance. Check any firewall statistics you keep on your IPSec gateways for dropped packets.

Tip: You can take a snapshot of the packets processed by your firewall with:

    iptables -L -n -v

You can get creative with "diff" to find out what happens to a particular packet during transmission.

Both cat /proc/net/dev and ifconfig display interface statistics, and both are included in ipsec barf. Use either to check if any interface has dropped packets. If you find that one has, test whether this is related to your ping. While you ping continuously, print that interface's statistics several times. Does its drop count increase in proportion to the ping? If so, check why the packets are dropped there.
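The "print the statistics several times" step can be automated by parsing /proc/net/dev snapshots. The sketch below assumes the standard Linux layout of that file. Each line has the interface name and a colon, then 8 receive counters (bytes, packets, errs, drop, fifo, frame, compressed, multicast) and 8 transmit counters (bytes, packets, errs, drop, fifo, colls, carrier, compressed). The struct and function names are hypothetical. The delta ignores counter wraparound and resets.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Drop counters for one interface, taken from a /proc/net/dev line. */
struct drops {
    unsigned long long rx_drop;
    unsigned long long tx_drop;
};

/* Parses one data line of /proc/net/dev; copies the interface name into
 * "name". Returns false for header lines or malformed input. */
bool parse_dev_line(const char *line, char *name, size_t namelen,
                    struct drops *d)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return false;

    const char *start = line;               /* skip leading padding */
    while (*start == ' ')
        start++;
    size_t len = (size_t)(colon - start);
    if (len == 0 || len >= namelen)
        return false;
    memcpy(name, start, len);
    name[len] = '\0';

    unsigned long long f[16];               /* 8 receive, then 8 transmit */
    const char *p = colon + 1;
    for (int i = 0; i < 16; i++) {
        int used = 0;
        if (sscanf(p, "%llu%n", &f[i], &used) != 1)
            return false;
        p += used;
    }
    d->rx_drop = f[3];                      /* receive "drop" column */
    d->tx_drop = f[11];                     /* transmit "drop" column */
    return true;
}

/* Drops recorded between two snapshots of the same interface. */
unsigned long long drops_added(const struct drops *before,
                               const struct drops *after)
{
    return (after->rx_drop - before->rx_drop)
         + (after->tx_drop - before->tx_drop);
}
```

Take one snapshot, send a known number of pings, take another, and compare `drops_added` with the ping count.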

To do this, look at the firewall rules that apply to that interface. If the interface is an IPSec interface, more information may be available in the log. Grep for the word "drop" in a log which was created with klipsdebug=all as the error happened.

See also this discussion on interpreting ifconfig statistics.

3.4 When in doubt, sniff it out

If you have checked configuration assumptions, routing, and firewall rules, and your interface statistics yield no clue, it remains for you to investigate the mystery of the lost packet by the most thorough method: with a packet sniffer (providing, of course, that this is legal where you are working).

In order to detect packets on the ipsec virtual interfaces, you will need an up-to-date sniffer (tcpdump, ethereal, ksnuffle) on your IPSec gateway machines. You may also find it useful to sniff the ping endpoints.

Anticipate your packets' path

Ping, and examine each interface along the projected path, checking for your ping's arrival. If it doesn't get to the next stop, you have narrowed down where to look for it. In this way, you can isolate a problem area and narrow your troubleshooting focus.

Within a machine running Linux FreeS/WAN, this packet flow diagram will help you anticipate a packet's path.

Note that:

from the perspective of the tunneled packet, the entire tunnel is one hop. That's explained in this FAQ.
an encapsulated IPSec packet will look different, when sniffed, from the plaintext packet which generated it. You can see plaintext packets entering an IPSec interface and the resulting cyphertext packets as they emerge from the corresponding physical interface.

Once you isolate where the packet is lost, take a closer look at firewall rules, routing and configuration assumptions as they affect that specific area. If the packet is lost on an IPSec gateway, comb through klipsdebug output for anomalies.

If the packet goes through both gateways successfully and reaches the ping target, but does not return, suspect routing. Check that the ping target routes packets back to the IPSec gateway.

3.5 Check your logs

Here, too, log information can be useful. Start with the guidelines above.

For connection use problems, set klipsdebug=all. Note that you must have enabled the klipsdebug compile-time option to do this. Restart Linux FreeS/WAN so that it rereads ipsec.conf, then recreate the error condition. When searching through klipsdebug data, look especially for the keywords "drop" (as in dropped packets) and "error".
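A sketch of that keyword search over a klipsdebug log, already split into lines, is below. It also reports the first hit, since earlier problems tend to cause later ones. Matching is case-sensitive like a plain grep, so "drop" also catches "dropped". The function name `scan_klips_log` is hypothetical, and a hit is only a place to start reading, not a diagnosis.

```c
#include <stddef.h>
#include <string.h>

/* Counts lines mentioning "drop" or "error" and stores the index of the
 * first such line in *first (or -1 if there is none). */
size_t scan_klips_log(const char *const lines[], size_t n, long *first)
{
    size_t hits = 0;
    *first = -1;
    for (size_t i = 0; i < n; i++) {
        if (strstr(lines[i], "drop") != NULL ||
            strstr(lines[i], "error") != NULL) {
            if (hits == 0)
                *first = (long)i;           /* earliest suspicious line */
            hits++;
        }
    }
    return hits;
}
```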

Often the problem with connection use is not software error, but rather that the software is behaving contrary to expectation.

Interpreting log text

To interpret the Linux FreeS/WAN log text you've found, use the same resources as indicated for troubleshooting connection negotiation: the FAQ, our background document, and the list archives. Looking in the KLIPS code is only for the very brave.

If you are still stuck, send a detailed problem report to the users' list.

3.6 More testing for the truly thorough

Large Packets

If each of your connections passed the ping test, you may wish to test by pinging with large packets (2000 bytes or larger). If these pings do not return, suspect MTU issues, and see this discussion.
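A 2000-byte ping is a useful test because it cannot fit in one packet on a typical 1500-byte Ethernet MTU, so it must be fragmented. The arithmetic below counts the IPv4 fragments needed. It assumes a 20-byte IP header with no options and the 8-byte ICMP echo header. It also assumes that every fragment except the last carries a multiple of 8 data bytes. It does not model IPsec encapsulation overhead, which lowers the effective MTU inside the tunnel further. The function name is hypothetical.

```c
/* Number of IPv4 fragments for an ICMP echo whose data portion is
 * "payload" bytes (the size given to ping -s) on a link with "mtu". */
unsigned fragments_for_ping(unsigned payload, unsigned mtu)
{
    if (mtu < 28)
        return 0;                              /* too small for IP + ICMP headers */
    unsigned ip_data  = payload + 8;           /* ICMP header + echo data */
    unsigned per_frag = ((mtu - 20) / 8) * 8;  /* data bytes per fragment */
    return (ip_data + per_frag - 1) / per_frag;
}
```

With an MTU of 1500, `fragments_for_ping(2000, 1500)` is 2. The largest unfragmented ping data size on that link is 1472 bytes.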

Stress Tests

In most users' view, a simple ping test, and perhaps a large-packet ping test suffice to indicate a working IPSec connection.

Some people might like to do additional stress tests prior to production use. They may be interested in this testing protocol we use at interoperation conferences, aka "bakeoffs". We also have a testing directory that ships with the release.

4. Problem Reporting

4.1 How to ask for help

Ask for troubleshooting help on the users' mailing list, users@lists.freeswan.org. While sometimes an initial query with a quick description of your intent and error will twig someone's memory of a similar problem, it's often necessary to send a second mail with a complete problem report.

When reporting problems to the mailing list(s), please include:

a brief description of the problem
if it's a compile problem, the actual output from make, showing the problem. Try to edit it down to only the relevant part, but when in doubt, be as complete as you can. If it's a kernel compile problem, any relevant out.* files
if it's a run-time problem, pointers to where we can find the complete output from "ipsec barf" from BOTH ENDS (not just one of them). Remember that it's common outside the US and Canada to pay for download volume, so if you can't post barfs on the web and send the URL to the mailing list, at least bundle and compress them (for example, with tar and gzip).
If you can, try to simplify the case that is causing the problem. In particular, if you clear your logs, start FreeS/WAN with no other connections running, cause the problem to happen, and then do ipsec barf on both ends immediately, that gives the smallest and least cluttered output.
any other error messages, complaints, etc. that you saw. Please send the complete text of the messages, not just a summary.
what your network setup is. Include subnets, gateway addresses, etc. A schematic diagram is a good format for this information.
exactly what you were trying to do with Linux FreeS/WAN, and exactly what went wrong
a fix, if you have one. But remember, you are sending mail to people all over the world; US residents and US citizens in particular, please read doc/exportlaws.html before sending code -- even small bug fixes -- to the list or to us.
When in doubt about whether to include some seemingly-trivial item of information, include it. It is rare for problem reports to have too much information, and common for them to have too little.

Here are some good general guidelines on bug reporting: How To Ask Questions The Smart Way and How to Report Bugs Effectively.

4.2 Where to ask

To report a problem, send mail about it to the users' list. If you are certain that you have found a bug, report it to the bugs list. If you encounter a problem while doing your own coding on the Linux FreeS/WAN codebase and think it is of interest to the design team, notify the design list. When in doubt, default to the users' list. More information about the mailing lists is found here.

For a number of reasons -- including export-control regulations affecting almost any private discussion of encryption software -- we prefer that problem reports and discussions go to the lists, not directly to the team. Beware that the list goes worldwide; US citizens, read this important information about your export laws. If you're using this software, you really should be on the lists. To get onto them, visit lists.freeswan.org.

If you do send private mail to our coders or want a private reply from them, please make sure that the return address on your mail (From or Reply-To header) is a valid one. They have more important things to do than to unravel addresses that have been mangled in an attempt to confuse spammers.

5. Additional Notes on Troubleshooting

The following sections supplement the Guide: information available on your system; testing between security gateways; ifconfig reports for KLIPS debugging; using GDB on Pluto.

5.1 Information available on your system

Logs used

Linux FreeS/WAN logs to:

/var/log/secure (or, on Debian, /var/log/auth.log)
/var/log/messages

Check both places to get full information. If you find nothing, check your /etc/syslog.conf or equivalent (see syslog.conf(5)) to see where it is directing authpriv messages.

man pages provided

ipsec.conf(5): Manual page for IPSEC configuration file.
ipsec(8): Primary man page for ipsec utilities.

Other man pages are on this list and in

/usr/local/man/man3
/usr/local/man/man5
/usr/local/man/man8/ipsec_*

Status information

ipsec auto --status: Command to get status report from running system. Displays Pluto's state. Includes the list of connections which are currently "added" to Pluto's internal database; lists state objects reflecting ISAKMP and IPsec SAs being negotiated or installed.
ipsec look: Brief status info.
ipsec barf: Copious debugging info.

5.2 Testing between security gateways

Sometimes you need to test a subnet-subnet tunnel. This is a tunnel between two security gateways, which protects traffic on behalf of the subnets behind these gateways. On this network:

     Sunset==========West------------------East=========Sunrise
                     IPSec gateway         IPSec gateway
           local net       untrusted net       local net

you might name this tunnel sunset-sunrise. You can test this tunnel by having a machine behind one gateway ping a machine behind the other gateway, but this is not always convenient or even possible.

Simply pinging one gateway from the other is not useful. Such a ping does not normally go through the tunnel. The tunnel handles traffic between the two protected subnets, not between the gateways. Depending on the routing in place, a ping might

either succeed by finding an unencrypted route
or fail by finding no route. Packets without an IPSEC eroute are discarded.

Neither event tells you anything about the tunnel. You can explicitly create an eroute to force such packets through the tunnel, or you can create additional tunnels as described in our configuration document, but those may be unnecessary complications in your situation.

The trick is to explicitly test between both gateways' private-side IP addresses. Since the private-side interfaces are on the protected subnets, the resulting packets do go via the tunnel. Use either ping -I or traceroute -i, both of which allow you to specify a source interface. (Note: unsupported on older Linuxes). The same principles apply for a road warrior (or other) case where only one end of your tunnel is a subnet.

5.3 ifconfig reports for KLIPS debugging

When diagnosing problems using ifconfig statistics, you may wonder what type of activity increments a particular counter for an ipsecN device. Here's an index, posted by KLIPS developer Richard Guy Briggs:

Here is a catalogue of the types of errors that can occur for which
statistics are kept when transmitting and receiving packets via klips.
I notice that they are not necessarily logged in the right counter.
. . .

Sources of ifconfig statistics for ipsec devices

rx-errors:
- packet handed to ipsec_rcv that is not an ipsec packet.
- ipsec packet with payload length not modulo 4.
- ipsec packet with bad authenticator length.
- incoming packet with no SA.
- replayed packet.
- incoming authentication failed.
- got esp packet with length not modulo 8.

tx_dropped:
- cannot process ip_options.
- packet ttl expired.
- packet with no eroute.
- eroute with no SA.
- cannot allocate sk_buff.
- cannot allocate kernel memory.
- sk_buff internal error.


The standard counters are:

```c
struct enet_statistics
{
        int     rx_packets;             /* total packets received */
        int     tx_packets;             /* total packets transmitted */
        int     rx_errors;              /* bad packets received */
        int     tx_errors;              /* packet transmit problems */
        int     rx_dropped;             /* no space in linux buffers */
        int     tx_dropped;             /* no space available in linux */
        int     multicast;              /* multicast packets received */
        int     collisions;

        /* detailed rx_errors: */
        int     rx_length_errors;
        int     rx_over_errors;         /* receiver ring buff overflow */
        int     rx_crc_errors;          /* recved pkt with crc error */
        int     rx_frame_errors;        /* recv'd frame alignment error */
        int     rx_fifo_errors;         /* recv'r fifo overrun */
        int     rx_missed_errors;       /* receiver missed packet */

        /* detailed tx_errors */
        int     tx_aborted_errors;
        int     tx_carrier_errors;
        int     tx_fifo_errors;
        int     tx_heartbeat_errors;
        int     tx_window_errors;
};
```

of which I think only the first 6 are useful.

5.4 Using GDB on Pluto

You may need to use the GNU debugger, gdb(1), on Pluto. This should be necessary only in unusual cases, for example if you encounter a problem which the Pluto developer cannot readily reproduce or if you are modifying Pluto.

Here are the Pluto developer's suggestions for doing this:

Can you get a core dump and use gdb to find out what Pluto was doing
when it died?

To get a core dump, you will have to set dumpdir to point to a
suitable directory (see ipsec.conf(5)).

To get gdb to tell you interesting stuff:
        $ script
        $ cd dump-directory-you-chose
        $ gdb /usr/local/lib/ipsec/pluto core
        (gdb) where
        (gdb) quit
        $ exit

The resulting output will have been captured by the script command in
a file called "typescript".  Send it to the list.

Do not delete the core file.  I may need to ask you to print out some
more relevant stuff.

Note that the dumpdir parameter takes effect only when the IPsec subsystem is restarted -- on reboot or ipsec setup restart.
