Monday, November 17, 2008

Observe Loopback and Inter-Zone IP Packets With OpenSolaris

I'm happy to announce that the IP Observability Devices component of the Clearview project has integrated into OpenSolaris build 103 (also see Phil Kirk's announcement to the ON community).  This adds the following new capabilities to OpenSolaris:

  • Network observability at the IP layer for traditional DLPI-based tools such as snoop
  • Observability of loopback IP packets
  • Observability of inter-zone IP packets
  • Tools such as snoop can be run from within a non-global zone to observe packets associated with that zone
  • Snoop filtering based on zone id

The snoop command has grown a new "-I <interface-name>" option to access this feature.  Its semantics are to snoop the IP interface named <interface-name> at the IP layer.  When observing a particular IP interface with this facility, packets that have a source or destination IP address assigned to that interface can be observed, as well as packets that are forwarded to or from that IP interface, and broadcast and multicast packets received by that interface.  Additional internal filtering is performed to ensure that an observer from a non-global zone can only see packets that belong to that zone, with the exception of the global zone, from which packets to or from any zone that shares its stack can be observed.  Any IP interface visible through "ifconfig -a" can be observed using this feature.

We are also working towards integrating support for these IP Observability Devices into Wireshark and tcpdump in the near future.


Here are some examples using snoop:

Example 1: Observing the Loopback Interface


bash-3.2# snoop -I lo0
Using device ipnet/lo0 (promiscuous mode)
localhost -> localhost    ICMP Echo request (ID: 37110 Sequence number: 0)
localhost -> localhost    ICMP Echo reply (ID: 37110 Sequence number: 0)

The lo0 interface has the 127.0.0.1 address assigned to it, and so any communication using the address 127.0.0.1 is seen above (in this case, I was simply doing "ping 127.0.0.1").  Snoop's verbose output mode displays a new "ipnet" header that precedes all IP packets observed:

bash-3.2# snoop -v -I lo0
Using device ipnet/lo0 (promiscuous mode)
IPNET:  ----- IPNET Header -----
IPNET: 
IPNET:  Packet 1 arrived at 10:40:33.68506
IPNET:  Packet size = 108 bytes
IPNET:  dli_version = 1
IPNET:  dli_type = 4
IPNET:  dli_srczone = 0
IPNET:  dli_dstzone = 0
IPNET: 
...

Note above that the source and destination zone ids are displayed.  In this case, I was running "ping 127.0.0.1" in the global zone, and so both the source and destination zone ids are "0".


Example 2: Running Snoop From a Non-Global Zone


bash-3.2# zoneadm list -v
ID NAME             STATUS     PATH                           BRAND    IP
0 global           running    /                              native   shared
4 test             running    /zones/test                    native   shared
bash-3.2# zlogin test
[Connected to zone 'test' pts/2]
...
bash-3.2# ifconfig -a
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0:1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 10.8.57.34 netmask ffffff00 broadcast 10.8.57.255
lo0:1: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
bge0:2: flags=202000841<UP,RUNNING,MULTICAST,IPv6,CoS> mtu 1500 index 2
        inet6 2002:a08:39f0:1::f/64
bash-3.2# snoop -I bge0
Using device ipnet/bge0 (promiscuous mode)
whitestar1-2.East.Sun.COM -> mf-ubur-01.East.Sun.COM DNS C 253.57.8.10.in-addr.arpa. Internet PTR ?
mf-ubur-01.East.Sun.COM -> whitestar1-2.East.Sun.COM DNS R 2.0.0.224.in-addr.arpa. Internet PTR ALL-ROUTERS.MCAST.NET.
whitestar1-6.East.Sun.COM -> whitestar1-2.East.Sun.COM TCP D=22 S=62117 Syn Seq=195630514 Len=0 Win=49152 Options=<mss
whitestar1-2.East.Sun.COM -> whitestar1-6.East.Sun.COM TCP D=62117 S=22 Syn Ack=195630515 Seq=195794440 Len=0 Win=49152
whitestar1-6.East.Sun.COM -> whitestar1-2.East.Sun.COM TCP D=22 S=62117 Ack=195794441 Seq=195630515 Len=0 Win=49152
whitestar1-2.East.Sun.COM -> whitestar1-6.East.Sun.COM TCP D=62117 S=22 Push Ack=195630515 Seq=195794441 Len=20 Win=491

Although not evident from the snoop output above, whitestar1-2 is 10.8.57.34 (the bge0:1 IP address in this non-global zone), and whitestar1-6 is actually an IP address in another zone on the same system.  By snooping the bge0 interface, the user sees all packets associated with the bge0 IP addresses in the zone, even those that are locally delivered to other zones.  Using snoop's verbose output mode allows us to see which zones these packets are flowing between:

bash-3.2# snoop -v -I bge0 whitestar1-6
Using device ipnet/bge0 (promiscuous mode)
IPNET:  ----- IPNET Header -----
IPNET: 
IPNET:  Packet 1 arrived at 10:44:10.86739
IPNET:  Packet size = 76 bytes
IPNET:  dli_version = 1
IPNET:  dli_type = 4
IPNET:  dli_srczone = 0
IPNET:  dli_dstzone = 4
IPNET: 
...

We can see above that the packet was from the global zone to the test zone.
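
When a capture spans many packets, reading each IPNET header by eye gets tedious.  The following is a minimal sketch, not part of snoop itself, that reduces verbose output to one source/destination zone pair per packet.  It assumes the dli_srczone and dli_dstzone lines appear exactly as in the output above; the function name zone_pairs is made up for illustration.

```shell
# Summarize source/destination zone ids from "snoop -v -I <intf>" output
# read on standard input.  The field names (dli_srczone, dli_dstzone) are
# taken from the verbose output shown above; zone_pairs is a hypothetical
# helper name.
zone_pairs() {
    awk '
        $2 == "dli_srczone" { src = $4 }
        $2 == "dli_dstzone" { print src " -> " $4 }
    '
}
```

Piping the verbose capture above through it ("snoop -v -I bge0 whitestar1-6 | zone_pairs") would print "0 -> 4" for the packet shown.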

Example 3: Filtering by Zone ID


Filtering by zone id can be useful on a system that has multiple zones.  In this example, an administrator in the global zone observes packets being sent to or from IP addresses in the "test" zone.

bash-3.2# zoneadm list -v
ID NAME             STATUS     PATH                           BRAND    IP
0 global           running    /                              native   shared
4 test             running    /zones/test                    native   shared
bash-3.2# snoop -I bge0 zone 4
Using device ipnet/bge0 (promiscuous mode)
whitestar1-6.East.Sun.COM -> whitestar1-2.East.Sun.COM TCP D=22 S=61658 Syn Seq=374055417 Len=0 Win=49152 Options=<mss
whitestar1-2.East.Sun.COM -> whitestar1-6.East.Sun.COM TCP D=61658 S=22 Syn Ack=374055418 Seq=374124525 Len=0 Win=49152
whitestar1-6.East.Sun.COM -> whitestar1-2.East.Sun.COM TCP D=22 S=61658 Ack=374124526 Seq=374055418 Len=0 Win=49152

This can be particularly useful with the loopback interface, as the 127.0.0.1 address is shared among all shared-stack zones, and it can be difficult to associate a loopback packet to an application in a zone.

Note that there is a pending RFE to also be able to enter a zone name as well as a zone id as the argument to the snoop "zone" filtering primitive.  For now, the zone id is the only allowable argument.
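
Until that RFE is addressed, the name-to-id lookup can be done in the shell.  This is only a sketch: it assumes the column layout of "zoneadm list -v" shown above (id first, name second), and the function name zone_id_by_name is hypothetical.

```shell
# Print the numeric id of the zone named $1, reading "zoneadm list -v"
# output on standard input.  Returns non-zero (and prints nothing) if no
# such zone is listed.  zone_id_by_name is a hypothetical helper name.
zone_id_by_name() {
    while read -r id name rest; do
        if [ "$name" = "$1" ]; then
            echo "$id"
            return 0
        fi
    done
    return 1
}
```

It could then be combined with the filter from this example, along the lines of: snoop -I bge0 zone "$(zoneadm list -v | zone_id_by_name test)"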

Thursday, June 5, 2008

Clearview Vanity Naming BigAdmin Article

I expanded one of my previous blog entries on network datalink vanity naming in OpenSolaris into a more thorough article with more examples.  The result is the following BigAdmin article:

http://www.sun.com/bigadmin/sundocs/articles/vnamingsol.jsp

Enjoy.

Friday, May 30, 2008

Maybe Some Ice Cream With That OpenSolaris

Well, the pickles and beer in the refrigerator were not enough to bribe my Ferrari into installing OpenSolaris.  Maybe some Ice Cream will coax it into behaving better.  Luckily, the cleaning people empty out the freezer on the last Friday of the month at 2:00pm (which is today!), leaving plenty of room for...




Props to Will Young for claiming to have done something like this first. ;-)

Not Too Much Mustard on That Ferrari Please

My Acer Ferrari 3400 cannot go through an OpenSolaris installation without overheating and powering itself down.  OpenSolaris has no power management for this laptop, so the CPU runs at 100% clock rate 100% of the time.  Other OSs don't have this problem.

Luckily, the Ferrari has no problems sharing a cramped space with mustard, pickles, left-over Chinese food, and a beer.  Ferrari 3400, meet OpenSolaris 2008.05:


Friday, March 28, 2008

Configuring an OpenSolaris 6to4 router

A common problem in enterprise networks is that many IT departments have not begun to deploy IPv6 within their supported infrastructure, but developers need IPv6 networking in order to develop and test products which support IPv6.  6to4 (defined in RFC 3056) can be a quick way to obtain IPv6 connectivity between IPv6 nodes separated by such IPv4-only networks.  The general idea is that each 6to4 site has a 6to4 router which is responsible for automatically tunneling IPv6 packets from its site to other 6to4 routers in other 6to4 sites (or native IPv6 networks with the use of relay routers[1]) over IPv4.  6to4, then, can often be the answer for such developers, since configuring a 6to4 router in a lab environment or in a small subnet within an enterprise network is very easy and addresses their basic IPv6 connectivity requirements.

OpenSolaris[2] can be used as a 6to4 router, and I've received so many requests for basic instructions on how to configure a 6to4 router with OpenSolaris, that I've decided to write a short blog entry on the subject.  Note that while this blog may come in handy, there is in fact official Sun documentation on 6to4 routing [3] which may be even more useful.

The following instructions configure a persistent configuration which will be enabled after a reboot of the system.  All of this can also be configured similarly on the running system, but it is simpler to give one set of instructions.  Experienced administrators will surely know how to interpret these instructions to apply configuration to the running system, and that's left as an exercise to the reader.

  1. Enable IPv6 on one of the physical interfaces of the 6to4 router:
    touch /etc/hostname6.<intf>
    Where <intf> is the interface in question (e.g., e1000g0).

  2. Configure a 6to4 tunneling interface on the 6to4 router:
    echo "tsrc <v4addr> up" > /etc/hostname6.ip.6to4tun0
    Where <v4addr> is the IPv4 address of the 6to4 router.

  3. Enable IPv6 forwarding on the 6to4 router:
    routeadm -e ipv6-forwarding

  4. Reboot the system.  When the system comes back up, it will have an IPv6 interface named ip.6to4tun0 which will have an address like 2002:<hex-v4addr>::1 [4].  The "2002:<hex-v4addr>::" part is the 48-bit 6to4 site-prefix for your 6to4 site.  All IPv6 nodes in the site that use this 6to4 router must share this common prefix, although it needs to be further sub-divided within each IPv6 subnet in the site in order to be useful (that's what the remaining 16 bits of the /64 prefix are for).  For example, if the site consists of a single IPv6 subnet, then it's easy enough to create a single "2002:<hex-v4addr>:1::/64" prefix by following the remaining steps.

  5. Enable IPv6 router advertisements on the 6to4 router so that IPv6 hosts on the subnet automatically configure their IPv6 addresses and use this router as their default router:
    cat << EOF > /etc/inet/ndpd.conf
    ifdefault AdvSendAdvertisements 1
    prefix 2002:<hex-v4addr>:1::/64 <intf>
    EOF
    
    Where <hex-v4addr> is the same as the <hex-v4addr> displayed in step 4, and <intf> is the physical interface attached to the IPv6 subnet in question.  The ":1" following <hex-v4addr> is important, as this is the 16-bit subnet-id for the prefix being advertised.  It distinguishes this /64 prefix from other prefixes in the site, which all share a common /48.  The subnet-id must be non-zero (because the 0 subnet-id was allocated to the 6to4 router's ip.6to4tun0 interface) and unique within the site, so it doesn't necessarily need to be "1".

    If the 6to4 router is attached to more than one subnet, then there would be additional "prefix" entries in the ndpd.conf file above, one for each interface.  Each prefix would then have its own unique 16-bit subnet id.

  6. Restart the neighbor discovery daemon for the changes to take effect.
    svcadm restart routing/ndp

At this point, hosts which have IPv6 enabled on the link connected to the 6to4 router's <intf> interface will automatically configure IPv6 addresses based on the advertised prefix, and will have a default route to the 6to4 router.  All packets destined off-link to other 6to4 sites will be tunneled to the remote 6to4 routers.

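
For a router attached to several subnets, the ndpd.conf contents described in step 5 could be generated rather than typed.  The sketch below is an illustration only: the function name ndpd_conf is hypothetical, and numbering the subnet-ids 1, 2, 3, ... in argument order is an arbitrary choice (any non-zero ids that are unique within the site would do).

```shell
# Print ndpd.conf contents advertising one /64 prefix per interface.
# $1 is the 48-bit site prefix without the trailing "::" (for example
# 2002:0a01:0203); the remaining arguments are interface names.
# Subnet-ids are assigned 1, 2, 3, ... in argument order, which is an
# assumption of this sketch.  ndpd_conf is a hypothetical helper name.
ndpd_conf() {
    site=$1
    shift
    echo "ifdefault AdvSendAdvertisements 1"
    subnet=1
    for intf in "$@"; do
        printf 'prefix %s:%x::/64 %s\n' "$site" "$subnet" "$intf"
        subnet=$((subnet + 1))
    done
}
```

For instance, "ndpd_conf 2002:0a01:0203 e1000g0 e1000g1" prints the ifdefault line followed by "prefix 2002:0a01:0203:1::/64 e1000g0" and "prefix 2002:0a01:0203:2::/64 e1000g1"; the output can be redirected to /etc/inet/ndpd.conf.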
<shameless plug>Of course, when the Clearview IP Tunneling Device Driver component delivers to Nevada, one will be able to use dladm(1M) to create a 6to4 tunnel with a meaningful name, and to observe packets in the 6to4 tunnel using snoop(1M), wireshark, or other such tools.</shameless plug>


[1] I'm skipping discussing relay routers for various reasons which I won't go into here.
[2] In fact, Solaris starting with Solaris 9.
[3] Look for 6to4.  Within this documentation, there are also instructions on how to configure 6to4 on Solaris, similar to this blog entry.

[4] The 2002::/16 prefix is the "magic" 6to4 prefix that allows 6to4 routers to tunnel to one another.  The 32 bits that follow these initial 16 bits are an IPv4 address.  It is the IPv4 address of the 6to4 router which is responsible for the automatic IPv6 tunneling of packets for its 6to4 site.  For example, when a 6to4 router needs to tunnel an IPv6 packet with a destination of 2002:0a01:0203:1::1, it will know to automatically encapsulate this IPv6 packet in an IPv4 header with a destination of 10.1.2.3 (the IPv4 address of the remote 6to4 router).
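
The mapping described in [4] is simple enough to compute in the shell.  The following sketch derives the 48-bit site prefix from a dotted-quad IPv4 address; the function name sixto4_prefix is hypothetical, and it assumes a well-formed address with no leading zeros in any octet.

```shell
# Print the 48-bit 6to4 site prefix for the dotted-quad IPv4 address in $1:
# the fixed 2002::/16 prefix followed by the four address bytes in hex.
# sixto4_prefix is a hypothetical helper name.
sixto4_prefix() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    printf '2002:%02x%02x:%02x%02x::/48\n' "$1" "$2" "$3" "$4"
}
```

Running "sixto4_prefix 10.1.2.3" prints 2002:0a01:0203::/48, which matches the example in [4].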

Tuesday, January 29, 2008

Using New Networking Features in OpenSolaris

The Nemo Unification and Vanity Naming component of project Clearview has integrated into OpenSolaris build 83, which (among other things) allows administrators to give meaningful names to network datalink interfaces, including VLAN interfaces.  I thought I'd share how I used this feature on one of our lab routers here in Sun.

The system has four Ethernet NICs, but needs to be the router for 8 separate lab subnets.  The aggregate bandwidth of four Gigabit pipes is plenty for all of the lab subnets combined, so it wasn't really worthwhile to go and add four more NICs to the system (plus, that's not really scalable).  Instead, I created a single link aggregation (802.3ad) including all four Ethernet links, and created individual tagged VLAN interfaces (one for each of the 8 subnets) on top of this aggregation.

Step by step, here's what I did.  Keep in mind that this is done using a nightly build of OpenSolaris from after January 24th 2008.  Here was the list of datalinks on the system before I started changing things (bonus points for anyone who can tell me what kind of system I'm doing this on based on the devices listed below) :-) :

bash-3.2# dladm show-link
LINK CLASS MTU STATE OVER
nge0 phys 1500 up --
nge1 phys 1500 up --
e1000g0 phys 1500 up --
e1000g1 phys 1500 up --
bash-3.2# dladm show-phys
LINK MEDIA STATE SPEED DUPLEX DEVICE
nge0 Ethernet up 1000Mb full nge0
nge1 Ethernet up 1000Mb full nge1
e1000g0 Ethernet up 1000Mb full e1000g0
e1000g1 Ethernet up 1000Mb full e1000g1

First, I unplumbed all IP interfaces on each of these links by issuing appropriate "ifconfig <intf> unplumb" commands.  This was necessary since renaming datalinks requires that no IP interfaces be plumbed above them.  I then gave each of these interfaces more generic names.  The benefit of doing this is that if we replace the Ethernet cards in the future with cards of a different chip set, we won't have to change the interface names associated with that card (one of the big benefits of Clearview UV vanity naming).

bash-3.2# dladm rename-link nge0 eth0
bash-3.2# dladm rename-link nge1 eth1
bash-3.2# dladm rename-link e1000g0 eth2
bash-3.2# dladm rename-link e1000g1 eth3
bash-3.2# dladm show-link
LINK CLASS MTU STATE OVER
eth0 phys 1500 up --
eth1 phys 1500 up --
eth2 phys 1500 up --
eth3 phys 1500 up --
bash-3.2# dladm show-phys
LINK MEDIA STATE SPEED DUPLEX DEVICE
eth0 Ethernet up 1000Mb full nge0
eth1 Ethernet up 1000Mb full nge1
eth2 Ethernet up 1000Mb full e1000g0
eth3 Ethernet up 1000Mb full e1000g1

Then I created a link aggregation using these four Ethernet links:

bash-3.2# dladm create-aggr -P L2,L3 -l eth0 -l eth1 -l eth2 -l eth3 default0

I named the link "default0" because this is the main untagged subnet for the lab network, and the network to which the default route points.  Now the set of links looks like:

bash-3.2# dladm show-link
LINK CLASS MTU STATE OVER
eth0 phys 1500 up --
eth1 phys 1500 up --
eth2 phys 1500 up --
eth3 phys 1500 up --
default0 aggr 1500 up eth0 eth1 eth2 eth3

The next step was to create the VLAN links on top of this aggregation.  Our lab subnets have a color-coded naming scheme, which I used when naming the VLAN links.  This is convenient when diagnosing network problems with particular systems, as our DNS naming uses a parallel scheme.  For example, if a system's hostname is blue-98, I know to do my network snooping on the "blue" link.  Creating the VLAN links was as simple as:

bash-3.2# dladm create-vlan -v 2 -l default0 orange0
bash-3.2# dladm create-vlan -v 3 -l default0 green0
bash-3.2# dladm create-vlan -v 4 -l default0 blue0
bash-3.2# dladm create-vlan -v 5 -l default0 white0
bash-3.2# dladm create-vlan -v 6 -l default0 yellow0
bash-3.2# dladm create-vlan -v 7 -l default0 red0
bash-3.2# dladm create-vlan -v 8 -l default0 cyan0
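
With more subnets, the same commands could be generated from a list of names.  This is a sketch rather than what I actually ran: the function name vlan_commands is hypothetical, it assumes consecutive VLAN ids, and it only prints the commands so that they can be reviewed before being piped to a shell.

```shell
# Print one "dladm create-vlan" command per name.  $1 is the underlying
# link, $2 the first VLAN id, and the remaining arguments are link name
# stems; VLAN ids are assigned consecutively in argument order.  Nothing
# is executed (dry run).  vlan_commands is a hypothetical helper name.
vlan_commands() {
    link=$1
    vid=$2
    shift 2
    for name in "$@"; do
        echo "dladm create-vlan -v $vid -l $link ${name}0"
        vid=$((vid + 1))
    done
}
```

Invoked as "vlan_commands default0 2 orange green blue white yellow red cyan", it prints the seven commands listed above.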

There is now one link for each subnet in the lab (one untagged link, and seven tagged VLAN links).

bash-3.2# dladm show-link
LINK CLASS MTU STATE OVER
eth0 phys 1500 up --
eth1 phys 1500 up --
eth2 phys 1500 up --
eth3 phys 1500 up --
default0 aggr 1500 up eth0 eth1 eth2 eth3
orange0 vlan 1500 up default0
green0 vlan 1500 up default0
blue0 vlan 1500 up default0
white0 vlan 1500 up default0
yellow0 vlan 1500 up default0
red0 vlan 1500 up default0
cyan0 vlan 1500 up default0
bash-3.2# dladm show-vlan
LINK VID OVER FLAGS
orange0 2 default0 -----
green0 3 default0 -----
blue0 4 default0 -----
white0 5 default0 -----
yellow0 6 default0 -----
red0 7 default0 -----
cyan0 8 default0 -----

I then plumbed IP interfaces in each subnet.  For example:

bash-3.2# ifconfig orange0 plumb ...
bash-3.2# ifconfig green0 plumb ...
...

Configuring this router also involved configuring IPv4 dynamic routing and forwarding, IPv6 dynamic routing and forwarding, etc.  All of these latter steps involved placing the network interface names in some sort of persistent configuration (like /etc/hostname.<intf>, /etc/inet/ndpd.conf, and IP filter rules to name a few).  This is where giving meaningful names to network interfaces has the most value.  With all of these interface names in various configuration files, we don't want to ever have to go and reconfigure all of those things if the underlying hardware of the system were to change from under them.  Before Clearview UV's vanity naming feature, a VLAN interface above the e1000g1 interface would look something like e1000g8001 (for VLAN tag 8), thanks to the moldy "VLAN PPA-hack".  This is ridiculous enough as an interface name, but what happens when I replace my e1000g1 card with a Broadcom card which has a device name of bge0?  I need to go fetch every piece of configuration on the system that made reference to e1000g1 and e1000g8001, and change everything to bge0 and bge8000.

With Clearview UV's vanity naming feature I could have named the link something meaningful like "private1", and assigned the newly added bge0 card that same name (using the dladm rename-link command I showcased above) to keep all of my network configuration intact.
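
For reference, the legacy names in the comparison above come from the VLAN PPA convention, in which the number appended to the driver name is the VLAN id multiplied by 1000 plus the device instance.  A small sketch of that arithmetic (the function name legacy_vlan_name is hypothetical):

```shell
# Print the legacy "VLAN PPA-hack" interface name for driver $1,
# instance $2 and VLAN id $3, using PPA = VLAN id * 1000 + instance.
# legacy_vlan_name is a hypothetical helper name.
legacy_vlan_name() {
    echo "$1$(( $3 * 1000 + $2 ))"
}
```

"legacy_vlan_name e1000g 1 8" prints e1000g8001 and "legacy_vlan_name bge 0 8" prints bge8000, the two names that would have had to be swapped throughout the configuration.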