Known issues with Opportunistic Encryption Claudia Schmeing ------------------------------------------ This is an overview of known issues with OE, current June 2002. This document supplements: FreeS/WAN Quickstart Guide doc/quickstart.html#opp.setup Opportunism HOWTO doc/opportunism.howto Opportunism spec doc/opportunism.spec Internet Draft doc/draft-richardson-ipsec-opportunistic.txt * Use a recent Linux FreeS/WAN to try OE. Pre-1.95, pluto's use of %trap and %hold were not yet ironed out. Best practise: use 1.97 and up. In particular, 1.91 to 1.94 have serious problems with %trap and %hold bugs. These bugs, introduced while coding the support structure for OE, affect both OE and VPN connections. * At the moment, you can only run OE on one physical interface. You can only define one OE connection in ipsec.conf, and that conn will apply to one interface. Normally this will be the public interface which your default route uses; it is, however, configurable. Background: if you were to define two or more OE conns, there is currently no syntax to express when you'd prefer one over the others. Further, Pluto may not know the gateway address early in connection negotiations. Theoretically, it might make sense to select between multiple OE conns based on some criterion, such as address ranges. This might be useful for * "local OE", essentially a low maintenance VPN * a complex routing scenario. At this time, such selection is not on the coders' to-do list. Currently, Pluto expects only one OE connection. If you add another, Pluto may choose randomly between them, producing unpredictable results. * When placing your DNS server within your security perimeter, it is good to put it * on your security gateway * on a machine to which you have a VPN connection Due to Stoopid Routing Tricks, it will not work to have your DNS server on your LAN. References: "Getting DNS through" in opportunism.howto http://lists.freeswan.org/pipermail/design/2002-April/002285.html * There is no good clean facility to delete OE connections. Available are: ipsec auto --status to list connections ipsec whack --deletestate to delete by state#. * Building OE conns between nodes on a LAN is not possible. This is a side effect of conflicts about ARP entries in the rt_cache and our "stupid routing tricks". There is no known workaround at this time. "Stupid routing tricks" are an ongoing issue, and should go away in KLIPS 3, planned for fall 2002. There may also be a solution before this. * When one end restarts, encryption is one-way until rekey. When two machines have an OE link, but one restarts, neither end acts to create a fresh connection until rekey time. Until then, the continuously running machine sends encrypted text to its rebooted peer, while the rebooted machine returns cleartext. This condition can be replicated in a testbed. This is a serious problem; Linux FreeS/WAN should never encrypt only in one direction. Further, the system administrator of the continuously running system has no way, shy of a tcpdump, to know that encrypted packets are being replied to in the clear. Workaround: the _updown script should be altered to block cleartext packets from a host we expect to send encrypted. For a longer explanation see: http://lists.freeswan.org/pipermail/design/2002-May/002582.html http://lists.freeswan.org/pipermail/design/2002-June/002610.html * CNAME lookups can fail. In some instances, Opportunistic Encryption does not work with reverse delegation. A common method of reverse delegation is to use CNAMES. When delegating DNS, a provider uses a CNAME to indicate the domain which is now authoritative for the reverse lookup. Because of errors in BIND implementations, Linux FreeS/WAN may not properly follow a well constructed CNAME trail ending in a KEY. If you were to follow the same trail manually, you would find the correct KEY. When a DNS server queried by Linux FreeS/WAN follows a CNAME, it seems to forget what record type it is looking for, and it returns a PTR, despite the fact that a KEY was requested. We have seen that this affects KEY lookups. We suspect that it affects all non-PTR records, including KEY and TXT. Fix (untested): Set up a DNS locally (within your security perimeter) which includes a fix. We believe the problem is fixed in Bind 9, and if not, we can create a patch to fix a Bind 9 implementation, and submit it to the maintainers. Workaround: Meanwhile, if you are experiencing this bug, you can send your provider KEY and TXT records for direct insertion into the reverse ZONE files, rather than asking your provider to delegate authority using CNAME. People who own IP blocks, rather than leasing them, may not experience this problem. If you were assigned IPs more than five years ago, you may own your IPs. * Failure to look up own host name is a show stopper. If Linux FreeS/WAN can't get the key for your public interface from DNS, it will not keep trying, and you will not be able to do OE. The error message is: May 14 09:40:24 road Pluto[21210]: failure to fetch key for 193.110.157.18 from DNS: failure querying DNS for KEY of 18.157.110.193.in-addr.arpa.: Host name lookup failure Proposed fix: patch Pluto to question the DNS again in a sensible backoff pattern. Workaround: fix the hostname lookup failure and try again. * To make OE operation smoother, we may need a script that runs and warns if we have the reverse DNS records, but not the software running. The reverse records advertise that we can do OE, but when the software is not running this is false advertising. * Consider potential weirdness when trying coaxial OE. If you have coaxial connections (one within another) you should have use of one of these encrypted links, but it is not clear which KLIPS will prefer. In particular, the behaviour may not be symmetrical. Weird but predictable: If you have both a gateway and a host who advertise (via DNS) an ability to do OE you need to be serious about doing host-based OE, or you will be stuck in initiator-only mode. If your host advertises but does not run OE, then when a peer tries to connect to your host, it will fail to clear. The peer will then not try to encrypt traffic bound for that host as it travels to the gateway. To remedy the situation, restart ipsec on the peer (or otherwise flush out the %pass eroute), and ping the peer from your host to initiate OE.