Linux FreeS/WAN Troubleshooting

misconfiguration of the IPSEC system in the /etc/ipsec.conf file, e.g. incorrect interface or next hop information
misconfiguration of a manual connection in the /etc/ipsec.conf file
routing problems causing IPSEC packets to be lost
bugs in KLIPS
mismatch between the transforms we support and those another IPSEC implementation offers.

Interoperation problems

The IPSEC RFCs are complex and include a number of optional features. There is considerable opportunity for even two correct, standard-conforming, implementations to disagree on details in a way that blocks interoperation. Of course, misinterpretations of the standards and implementation or configuration errors on either end can also foul things up.

That said, FreeS/WAN interoperates successfully with many other implementations. There is a list in another of our documents.

Known areas where problems may appear are:

The RFCs define two modes for IKE negotiations -- main mode and aggressive mode. Aggressive mode is slightly faster, but reveals more information to an eavesdropper. FreeS/WAN does not implement aggressive mode, so any negotiation another implementation tries that way will fail.
FreeS/WAN does not implement single DES because DES is insecure. Suggestions on what to do if the device you want to talk to has only DES are in our FAQ document.
For automatic keying, the FreeS/WAN default is to provide perfect forward secrecy. We see no reason not to; this is more secure and costs little. Some other implementations, however, turn PFS off by default.

Dropped packets

The firewall ate my packets!

If firewalls filter out:

either the UDP port 500 packets used in IKE negotiations
or the ESP and AH (protocols 50 and 51) packets used to implement the IPSEC tunnel

then IPSEC cannot work. The first thing to check if packets seem to be vanishing is the firewall rules on the two gateway machines and any other machines along the path that you have access to.

For details, see our document on firewalls.

Small packets work, but large transfers fail

If tests with ping(1) and a small packet size succeed, but tests or transfers with larger packet sizes fail, suspect problems with path MTU discovery.

IPSEC makes packets larger by adding an ESP or AH header. This can tickle assorted bugs in path MTU discovery mechanisms and cause a variety of annoying symptoms. Here is one example of a discussion of this problem from the mailing list:

Date: Mon, 3 Apr 2000
From: "Michael H. Warfield" <mhw@wittsend.com>

Paul Koning wrote:

>  Chris>  It appears that the Osicom router discards IP
>  Chris> fragments...

> Amazing.  A device that discards fragments isn't a router, it's at
> best a boat anchor.

        It may not be exactly what it appears.  I ran into a similar problem
with an ISDN link a while ago giving similar symptoms.  Turned out that
the device was negotiating an MTU that it really couldn't handle and the
device in front of it (a Linux box with always defragment enabled) was
defragmenting the huge IPSec datagrams and then refragmenting them into
hunks that the ISDN PPP thought it could handle but couldn't.  Problem was
solved by manually capping the MTU on the ISDN link to a smaller value.

        I gave the FreeSwan guys a hard time until tracking it down since
FreeSwan was the only thing that appeared to be able to tickle the bug.
Nothing else seemed to be broken.  What it really was that MTU discovery
was avoiding the problem on normal links and it was only the IPSEC tunnels
that were creating huge datagrams that went through the defragment/refragment
process.

        Point here is that it "appeared" as though the ISDN link was
failing to handle fragments when it was really a configuration error and
a software bug resulting in a bad MTU that was really the culprit.  So
it may not be that the router is not handling fragments.  It may be that
it's missconfigured or has some other bug that only FreeSwan is tripping
over.

>       paul

        Mike
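
To make the size growth behind these MTU problems concrete, here is a rough sketch of the arithmetic. It assumes tunnel-mode ESP with Triple DES in CBC mode and a 96-bit truncated HMAC, an outer IPv4 header without options, and no AH. The function names are ours, chosen for illustration, and are not part of FreeS/WAN.

```python
# Rough size arithmetic for tunnel-mode ESP. Assumptions (not taken from
# FreeS/WAN source): 3DES-CBC encryption, 96-bit truncated HMAC
# authentication, outer IPv4 header without options, no AH.

OUTER_IP_HEADER = 20   # new IPv4 header added in tunnel mode
ESP_HEADER = 8         # SPI (4 bytes) + sequence number (4 bytes)
IV_LENGTH = 8          # one cipher block of IV for 3DES-CBC
BLOCK_SIZE = 8         # 3DES block size; the encrypted part is padded to this
ESP_TRAILER = 2        # pad-length byte + next-header byte
AUTH_LENGTH = 12       # 96-bit authenticator

FIXED_OVERHEAD = OUTER_IP_HEADER + ESP_HEADER + IV_LENGTH + AUTH_LENGTH


def esp_tunnel_size(inner_len):
    """Size on the wire of an inner IP packet of inner_len bytes."""
    to_encrypt = inner_len + ESP_TRAILER
    # Round up to a whole number of cipher blocks.
    padded = (to_encrypt + BLOCK_SIZE - 1) // BLOCK_SIZE * BLOCK_SIZE
    return FIXED_OVERHEAD + padded


def largest_inner_packet(path_mtu):
    """Largest inner packet that still fits in path_mtu unfragmented."""
    room = path_mtu - FIXED_OVERHEAD
    return room // BLOCK_SIZE * BLOCK_SIZE - ESP_TRAILER
```

Under these assumptions a full 1500-byte packet no longer fits on a 1500-byte link once encapsulated, which is why large transfers can fail while small pings succeed.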

Dropped connections

Networks being what they are, IPSEC connections can be broken for any number of reasons, ranging from hardware failures to various software problems such as the path MTU problems discussed above. Fortunately, various diagnostic tools exist that help you sort out many of the possible problems.

There is one situation, however, where FreeS/WAN may destroy a perfectly good connection for no readily apparent reason. This occurs when things are misconfigured so that two tunnels from the same gateway expect the same subnet on the far end.

In this situation, the first tunnel comes up fine and works until the second is established. At that point, because of the way we track connections internally, the first tunnel ceases to exist as far as this gateway is concerned. Of course the far end does not know that and a storm of error messages appears on both systems as it tries to use the tunnel.

If the far end gives up, goes back to square one and negotiates a new tunnel, then that wipes out the second tunnel and ...

The solution is simple. Do not build multiple conn descriptions with the same remote subnet.

This is actually intended to be a feature, rather than a bug. Consider the situation where a single remote system goes down, then comes back up and reconnects to the gateway. It is useful to have the gateway tear down the old tunnel and recover resources when the reconnection is made. It recognises that situation by checking the remote subnet for each tunnel it builds and discarding duplicates. This works fine as long as you don't configure multiple tunnels with the same remote subnet.
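
A configuration file can be checked for this mistake before any tunnel is brought up. The sketch below is a hypothetical helper, not a FreeS/WAN tool. It assumes that "right" is the remote end in every conn, that section headers start in column one and parameters are indented, and it ignores "conn %default" and does not follow include directives.

```python
from collections import defaultdict

# Hypothetical sample in ipsec.conf style; names and addresses are invented.
SAMPLE = """
config setup
    interfaces=%defaultroute

conn east
    left=10.0.0.1
    rightsubnet=192.168.2.0/24

conn west
    rightsubnet=192.168.2.0/24   # same remote subnet as "east"

conn north
    rightsubnet=192.168.3.0/24
"""


def duplicate_remote_subnets(conf_text, remote_key="rightsubnet"):
    """Return {subnet: [conn names]} for subnets used by more than one conn."""
    by_subnet = defaultdict(list)
    current = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()   # drop comments
        if not line.strip():
            continue
        if not raw[0].isspace():
            # Unindented line: a section header such as "conn east".
            fields = line.split()
            is_conn = fields[0] == "conn" and len(fields) > 1
            current = fields[1] if is_conn else None
            continue
        if current is None or current == "%default":
            continue
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == remote_key:
            by_subnet[value.strip()].append(current)
    return {s: names for s, names in by_subnet.items() if len(names) > 1}
```

Calling duplicate_remote_subnets(SAMPLE) reports the subnet shared by "east" and "west"; on a gateway where "left" is the remote end, pass remote_key="leftsubnet" instead.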

Interoperability problems

Systems that want to use DES

Linux FreeS/WAN does not support DES transforms. Neither Pluto's IKE connections nor KLIPS' IPSEC connections can use DES. Since DES is insecure we do not, and will not at any future time, provide it.

DES is a mandatory part of the IPSEC standard. Despite that, we will not implement DES. We believe it is more important to provide security than to comply with a standard which has been subverted into allowing weak algorithms.

Some implementations may offer DES as the default. In such cases we urge you to change them to Triple DES. If this is not possible, for example because export laws prevent your vendor from offering you adequate cryptography, we urge you to complain vigorously to

your government
the vendor
the embassy of the nation whose laws are problematic for you.

In the meanwhile, use FreeS/WAN to get strong crypto until the laws are fixed.

FreeS/WAN does have DES code in it as a sort of historical accident, since we need it to implement our default (currently, our only) block cipher, Triple DES. However, since DES is insecure, we do not provide any interface to that code and, as a matter of policy, will provide no help to anyone who may wish to use it.

Pluto problem hints

From a message posted to the mailing list Jan 14 2000 by Pluto developer Hugh Redelmeier:

Until ipsec auto and whack/pluto get fixed:

        When puzzled by Pluto behaviour, always look in
        /var/log/secure -- that's the unadulterated story.

        To get the whole whack output (almost a subset of
        the story from Pluto), give auto the --verbose flag
        on each invocation.  Eg:
                ipsec auto --verbose --up sadaisy


Bonus hint: problems snowball.  So look for the first problem first,
it is likely to be the cause of later problems.

And a final hint: If one side keeps retrying to no avail, it may be
because the other is unhappy about something and won't reply.  Go look
at the other side to figure out what it doesn't like.

Pluto error "no acceptable transform" message

From another of Hugh's messages:

Background:

When one IKE system (for example, Pluto) is negotiating with another
to create an SA, the Initiator proposes a bunch of choices and the
Responder replies with one that it has selected.

The structure of the choices is fairly complicated.  An SA payload
contains a list of lists of "Proposals".  The outer list is a set of
choices: the selection must be from one element of this list.

Each of these elements is a list of Proposals.  A selection must be
made from each of the elements of the inner list.  In other words,
*all* of them apply (that is how, for example, both AH and ESP can
apply at once).

Within each of these Proposals is a list of Transforms.  For each
Proposal selected, one Transform must be selected (in other words,
each Proposal provides a choice of Transforms).

Each Transform is made up of a list of Attributes describing, well,
attributes.  Such as lifetime of the SA.  Such as algorithm to be
used.  All the Attributes apply to a Transform.

You will have noticed a pattern here: layers alternate between being
disjunctions ("or") and conjunctions ("and").

For Phase 1 / Main Mode (negotiating an ISAKMP SA), this structure is
cut back.  There must be exactly one Proposal.  So this degenerates to
a list of Transforms, one of which must be chosen.

In your case, no proposal was considered acceptable to Pluto (the
Responder).  So negotiation ceased.  Pluto logs the reason it rejects
each Transform.  So look back in the log to see what is going wrong.
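
The alternating "or"/"and" layers described above can be modelled in a few lines. This is only an illustrative sketch, not Pluto's code: the data layout, the ALLOWED policy table and the function names are all assumptions made for the example.

```python
# SA payload model: a list (OR) of lists (AND) of Proposals; each Proposal
# holds a list (OR) of Transforms; each Transform is a dict of Attributes,
# all of which must be acceptable (AND).

# Hypothetical responder policy: acceptable values for each attribute.
ALLOWED = {
    "cipher": {"3DES"},
    "auth": {"HMAC-MD5", "HMAC-SHA1"},
}

# Hypothetical offer from an initiator.
OFFER = [
    # Alternative 1: ESP only, single DES.
    [{"protocol": "ESP",
      "transforms": [{"cipher": "DES", "auth": "HMAC-MD5"}]}],
    # Alternative 2: AH and ESP together.
    [{"protocol": "AH",
      "transforms": [{"auth": "HMAC-SHA1"}]},
     {"protocol": "ESP",
      "transforms": [{"cipher": "DES"}, {"cipher": "3DES"}]}],
]


def acceptable(transform):
    """AND layer: every attribute of the transform must be allowed."""
    return all(value in ALLOWED.get(name, set())
               for name, value in transform.items())


def select_proposals(sa_payload):
    """Return [(protocol, transform), ...] or None if nothing is acceptable."""
    for alternative in sa_payload:              # OR: pick one alternative
        chosen = []
        for proposal in alternative:            # AND: every proposal applies
            pick = next((t for t in proposal["transforms"]   # OR: one transform
                         if acceptable(t)), None)
            if pick is None:
                break                           # this alternative fails
            chosen.append((proposal["protocol"], pick))
        else:
            return chosen
    return None                                 # "no acceptable transform"
```

With this policy the first alternative is rejected because it offers only DES, and the second is selected using the 3DES transform for ESP. An offer containing only the first alternative yields None, the situation in which negotiation ceases.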

Connection names in Pluto error messages

From another of Hugh's messages:

| Jan 17 16:21:10 remus Pluto[13631]: "jumble" #1: responding to Main Mode from Road Warrior 130.205.82.46
| Jan 17 16:21:11 remus Pluto[13631]: "jumble" #1: no suitable connection for peer @banshee.wittsend.com
| 
|     The connection "jumble" has nothing to do with the incoming
| connection requests, which were meant for the connection "banshee".

You are right.  The message tells you which Connection Pluto is
currently using, which need not be the right one.  It need not be the
right one now for the negotiation to eventually succeed!  This is
described in ipsec_pluto(8) in the section "Road Warrior Support".

There are two times when Pluto will consider switching Connections for
a state object.  Both are in response to receiving ID payloads (one in
Phase 1 / Main Mode and one in Phase 2 / Quick Mode).  The second is
not unique to Road Warriors.  In fact, neither is the first any more
(two connections for the same pair of hosts could differ in Phase 1 ID
payload; probably nobody else has tried this).

ECONNREFUSED error message

From John Denker, on the mailing list:

1)  The log message
  some IKE message we sent has been rejected with 
  ECONNREFUSED (kernel supplied no details)
is much more suitable than the previous version.  Thanks.

2) Minor suggestion for further improvement: it might be worth mentioning
that the command
  tcpdump -i eth1 icmp[0] != 8 and icmp[0] != 0
is useful for tracking down the details in question.  We shouldn't expect
all IPsec users to figure that out on their own.  The log message might
even provide a hint as to where to look in the docs.

Reply from Pluto developer Hugh Redelmeier:

Good idea.

I've added a bit to pluto(8)'s BUGS section along these lines.
I didn't have the heart to lengthen this message.

Information available on your system

man pages provided

ipsec.conf(5): Manual page for IPSEC configuration file.
ipsec(8): Primary man page for ipsec utilities.

Other man pages are on this list and in

/usr/local/man/man3
/usr/local/man/man5
/usr/local/man/man8/ipsec_*

Status information

/proc/net/ipsec*: Various files reporting the status of IPSEC.
ipsec auto --status: Command to get status report from running system. Displays Pluto's state: the list of "added" conns and the list of state objects reflecting ISAKMP and IPsec SAs being negotiated or installed.
ipsec look: Brief status info.
ipsec barf: Copious debugging info.

ifconfig reports for KLIPS debugging

From a mail message from our KLIPS developer:

Here is a catalogue of the types of errors that can occur for which
statistics are kept when transmitting and receiving packets via klips.
I notice that they are not necessarily logged in the right counter.
. . .

Sources of ifconfig statistics for ipsec devices

rx-errors:
- packet handed to ipsec_rcv that is not an ipsec packet.
- ipsec packet with payload length not modulo 4.
- ipsec packet with bad authenticator length.
- incoming packet with no SA.
- replayed packet.
- incoming authentication failed.
- got esp packet with length not modulo 8.

tx_dropped:
- cannot process ip_options.
- packet ttl expired.
- packet with no eroute.
- eroute with no SA.
- cannot allocate sk_buff.
- cannot allocate kernel memory.
- sk_buff internal error.


The standard counters are:

struct enet_statistics
{
	int	rx_packets;		/* total packets received */
	int	tx_packets;		/* total packets transmitted */
	int	rx_errors;		/* bad packets received */
	int	tx_errors;		/* packet transmit problems */
	int	rx_dropped;		/* no space in linux buffers */
	int	tx_dropped;		/* no space available in linux */
	int	multicast;		/* multicast packets received */
	int	collisions;

	/* detailed rx_errors: */
	int	rx_length_errors;
	int	rx_over_errors;		/* receiver ring buff overflow */
	int	rx_crc_errors;		/* recved pkt with crc error */
	int	rx_frame_errors;	/* recv'd frame alignment error */
	int	rx_fifo_errors;		/* recv'r fifo overrun */
	int	rx_missed_errors;	/* receiver missed packet */

	/* detailed tx_errors */
	int	tx_aborted_errors;
	int	tx_carrier_errors;
	int	tx_fifo_errors;
	int	tx_heartbeat_errors;
	int	tx_window_errors;
};

of which I think only the first 6 are useful.
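
The six useful counters can also be read without ifconfig. The following sketch assumes the /proc/net/dev layout used by 2.2 and later kernels (eight receive fields followed by eight transmit fields per interface); older kernels use a different layout. The function name and the sample text are ours.

```python
# Hypothetical sample in the /proc/net/dev layout of 2.2+ kernels.
SAMPLE_PROC_NET_DEV = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
ipsec0:   52000     400    3    0    0     0          0         0    48000     390    0    7    0     0       0          0
"""


def ipsec_counters(proc_net_dev_text):
    """Return the six basic counters for every interface named ipsec*."""
    result = {}
    for line in proc_net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        name = name.strip()
        if not sep or not name.startswith("ipsec"):
            continue                      # header line or other interface
        fields = [int(x) for x in rest.split()]
        if len(fields) < 16:
            continue                      # not the layout assumed here
        result[name] = {
            "rx_packets": fields[1],
            "rx_errors": fields[2],
            "rx_dropped": fields[3],
            "tx_packets": fields[9],
            "tx_errors": fields[10],
            "tx_dropped": fields[11],
        }
    return result
```

On a live system the text would come from reading /proc/net/dev. A growing rx_errors or tx_dropped value on an ipsec device can then be matched against the lists of causes above.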

Testing between security gateways

Sometimes you need to test the tunnel between two security gateways. This can be done by having a machine behind one gateway ping a machine behind the other gateway, but this is not always convenient or even possible.

Simply pinging one gateway from the other is not useful. Such a ping does not normally go through the tunnel. The tunnel handles traffic between the two protected subnets, not between the gateways. Depending on the routing in place, a ping might

either succeed by finding an unencrypted route
or fail by finding no route. Packets without an IPSEC eroute are discarded.

Neither event tells you anything about the tunnel. You can explicitly create an eroute to force such packets through the tunnel, or you can create additional tunnels as described in our configuration document, but those may be unnecessary complications in your situation.

The trick is to explicitly use an IP address for the subnet-side interface of one gateway machine, either as the target of a ping or as the origin of a traceroute. Since that interface is on the protected subnet, the resulting packets do go via the tunnel.

From the mailing list:

> > I have two gateways, SG1 and SG2, with I/Fs i and e (for internal and
> > external), and two hosts, H1 and H2 set up as:
> >
> >      H1-----(i)SG1(e)===========(e)SG2(i)------H2
> >
> > And I want to test a tunnel set up between the H1 subnet and the H2
> > subnet, but the H2 host may not exist yet, or may not be responding.
> >
> > If I ping SG2i from H1, all traffic in both directions is encrypted,
> > testing the tunnel.
.....
> > If I understand correctly, this could be accomplished by the 'ping -I'
> > feature of which you spoke earlier or 'traceroute -i'?
>
> Indeed,
>   traceroute -i eth0 -f 20 otherSG
> appears to give me a solution using only N machines, the SGs themselves.
> This is very nice.  Note that in this example, eth0 is the *private* (i)
> interface.  If you try it with the (e) interface or the ipsec0 interface,
> you won't get the desired result.  If you leave off the -f 20, the trace
> will hang in some totally bizarre way.

Linux does not seem to support ping -I as some other Unix versions do, according to mailing list comments. However traceroute as described above does work.
