Xen network configuration and multiple VLANs

Xen networking is powerful enough to allow for extreme customization. Although the default networking configuration is usually more than enough for simple scenarios, it can fall short when trying to support multiple guests standing on different VLANs.

In this short article, I describe the steps needed to configure Xen to attach itself to multiple VLANs using a one-bridge-per-VLAN network interface mapping, and then to attach each Xen domainU to as many VLANs as needed.

In the sample scenario, we will use a Cisco Catalyst 3560G-24TS switch carrying traffic from five different VLANs:

  • VLAN2 is the administrative VLAN used to administer all the networking gear and boxes.
  • VLAN10 carries Internet traffic coming from the first ISP.
  • VLAN20 carries Internet traffic coming from the second ISP.
  • VLAN100 carries the access network traffic.
  • VLAN200 carries the core network traffic.

The final Xen configuration will provide five bridging network interfaces, one per VLAN. Each Xen domainU can freely attach to any of these bridging network interfaces in order to gain access to the traffic being carried by each VLAN.

Each bridging interface, |brname|, is named following the convention xenbr|vlan|:

  • xenbr2 is the bridging interface standing on VLAN2.
  • xenbr10 is the bridging interface standing on VLAN10.
  • xenbr20 is the bridging interface standing on VLAN20.
  • xenbr100 is the bridging interface standing on VLAN100.
  • xenbr200 is the bridging interface standing on VLAN200.

Also, Xen creates and manages several virtual network interfaces, named in the form of vif|X|.|Y|, where |X| equals the Xen domain numeric ID and |Y| is a sequential interface index. For example, consider a Xen domainU started with the following virtual network interface definition:

vif = [ 'mac=00:16:3e:00:00:44, bridge=xenbr10',
        'mac=00:16:3e:00:00:45, bridge=xenbr20' ]

If Xen assigns the domain an ID of, say, 2, it gets two virtual network interfaces: vif2.0, attached to xenbr10, and vif2.1, attached to xenbr20.
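
The naming rule can be sketched as a small shell helper. The function name vif_names is hypothetical and not part of the Xen tools, and the domain ID has to be looked up at run time (for example with xm list), since Xen assigns it when the domain starts.

```shell
#!/bin/sh
# Print the host-side interface names Xen would create for a domain,
# following the vifX.Y convention (X = domain ID, Y = interface index).
# vif_names is a hypothetical helper, not part of the Xen tools.
vif_names() {
    domid=$1
    count=$2
    idx=0
    while [ "$idx" -lt "$count" ]; do
        echo "vif${domid}.${idx}"
        idx=$((idx + 1))
    done
}

# A domain with ID 2 and two entries in its vif list:
vif_names 2 2
```

Once the domain is running, brctl show lists each bridge together with the vif interfaces attached to it, which is a quick way to confirm the mapping.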

Setting up the bridging interfaces:

This can be done manually, by invoking brctl addbr |brname| in order to create a new bridging interface.

For example, the following commands will create five bridging interfaces, one for each supported VLAN:

brctl addbr xenbr2
brctl addbr xenbr10
brctl addbr xenbr20
brctl addbr xenbr100
brctl addbr xenbr200

Alternatively, bridge creation can be automated at system startup by creating a file named /etc/sysconfig/network-scripts/ifcfg-|brname|, where |brname| is the name assigned to the bridging interface. For example, /etc/sysconfig/network-scripts/ifcfg-xenbr2 is the configuration file for the bridging interface standing on VLAN2:

DEVICE=xenbr2
BOOTPROTO=static
IPADDR=192.168.0.10
NETMASK=255.255.0.0
ONBOOT=yes
TYPE=Bridge

Setting up the VLAN interfaces and adding them to the existing bridging interfaces:

This can be done manually, by invoking vconfig add |ifname| |vlan| to configure VLAN number |vlan| using 802.1q tagging on interface |ifname|. This will activate a virtual interface named |ifname|.|vlan|:

  • Any traffic sent to this interface will get tagged for VLAN |vlan|.
  • Any traffic received from interface |ifname| carrying an 802.1q VLAN tag matching |vlan| will be untagged and received by this interface.

vconfig add eth0 2
vconfig add eth0 10
vconfig add eth0 20
vconfig add eth0 100
vconfig add eth0 200

This will add five new VLAN interfaces, one for every supported VLAN.

Once the VLAN interfaces are ready, we add them to their corresponding bridging interfaces by using brctl addif |brname| |ifname|.|vlan|:

brctl addif xenbr2 eth0.2
brctl addif xenbr10 eth0.10
brctl addif xenbr20 eth0.20
brctl addif xenbr100 eth0.100
brctl addif xenbr200 eth0.200
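
The manual steps (creating the bridges, creating the VLAN interfaces and joining the two) can be combined into one loop. The following is only a sketch: the variables TRUNK, VLANS and RUN are my own naming, the trunk is assumed to be eth0, and by default the commands are printed rather than executed. The two ifconfig lines are an addition of mine, on the assumption that freshly created VLAN and bridge interfaces start out down.

```shell
#!/bin/sh
# Print (or run) the commands that build one bridge per VLAN on top of a
# trunk interface. Assumptions: the trunk is eth0, the VLAN list matches
# the sample scenario, and RUN is a hypothetical switch for this sketch.
TRUNK=${TRUNK-eth0}
VLANS=${VLANS-"2 10 20 100 200"}
RUN=${RUN-echo}     # "echo" only prints; export RUN= (empty) to execute as root

setup_vlan_bridges() {
    for vlan in $VLANS; do
        $RUN brctl addbr "xenbr${vlan}"
        $RUN vconfig add "$TRUNK" "$vlan"
        $RUN brctl addif "xenbr${vlan}" "${TRUNK}.${vlan}"
        # Assumption: new VLAN and bridge interfaces start out down.
        $RUN ifconfig "${TRUNK}.${vlan}" up
        $RUN ifconfig "xenbr${vlan}" up
    done
}

setup_vlan_bridges
```

Reviewing the printed commands first is deliberate: running them over a remote session on eth0 can cut the connection if something is wrong.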

Creating a VLAN interface and adding it to an existing bridging interface can also be automated at system startup using a single configuration file named ifcfg-|ifname|.|vlan|, like /etc/sysconfig/network-scripts/ifcfg-eth0.2:

DEVICE=eth0.2
BOOTPROTO=none
ONBOOT=yes
TYPE=Ethernet
VLAN=yes
BRIDGE=xenbr2
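
With five VLANs, writing ten nearly identical files by hand gets tedious, so they can be generated. This is a sketch under a few assumptions: write_vlan_ifcfg is a hypothetical helper, the files go to a scratch directory instead of /etc/sysconfig/network-scripts, and the bridges are generated without an IP address (BOOTPROTO=none). A bridge that needs an address, like xenbr2 above, would keep its static settings.

```shell
#!/bin/sh
# Write ifcfg-xenbrN and ifcfg-<trunk>.N files for a list of VLANs.
# Assumptions: write_vlan_ifcfg is a hypothetical helper; the output goes
# to a scratch directory so nothing under /etc is touched by accident;
# bridges get no IP address here (BOOTPROTO=none).
write_vlan_ifcfg() {
    outdir=$1
    trunk=$2
    shift 2
    mkdir -p "$outdir" || return 1
    for vlan in "$@"; do
        cat > "$outdir/ifcfg-xenbr${vlan}" <<EOF
DEVICE=xenbr${vlan}
BOOTPROTO=none
ONBOOT=yes
TYPE=Bridge
EOF
        cat > "$outdir/ifcfg-${trunk}.${vlan}" <<EOF
DEVICE=${trunk}.${vlan}
BOOTPROTO=none
ONBOOT=yes
TYPE=Ethernet
VLAN=yes
BRIDGE=xenbr${vlan}
EOF
    done
}

OUTDIR=${TMPDIR:-/tmp}/ifcfg-preview
write_vlan_ifcfg "$OUTDIR" eth0 2 10 20 100 200
ls "$OUTDIR"
```

After checking the generated files, they can be copied into /etc/sysconfig/network-scripts by hand.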

Keeping Xen from reconfiguring the network:

Since we have already configured the network manually, we don’t want Xen to interfere with the configuration. To keep Xen from reconfiguring the network, make sure none of the following lines appear uncommented in the file /etc/xen/xend-config.sxp:

(network-script network-bridge)
(network-script network-route)
(network-script network-nat)
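
A quick check for leftover directives can be sketched with grep. The helper name active_network_scripts is hypothetical, and the sketch assumes that comment lines in the file start with a # character.

```shell
#!/bin/sh
# List network-script directives that are still active (not commented out)
# in a xend configuration file. active_network_scripts is a hypothetical
# helper; lines starting with '#' are treated as comments.
active_network_scripts() {
    grep -E '^[[:space:]]*\(network-script[[:space:]]' "$1"
}

CONFIG=${CONFIG:-/etc/xen/xend-config.sxp}
if [ -r "$CONFIG" ] && active_network_scripts "$CONFIG"; then
    echo "xend will still run a network script; comment out the lines above" >&2
fi
```

If the check prints nothing, no network-script line is active and xend should leave the bridges alone.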

Additional notes:

I have been experiencing a very strange behavior on Xen domainU guests while using this network configuration: it seems that UDP traffic gets stuck at the network stack and does not flow through unless I load the ip_conntrack.ko kernel module.

Failing to load the ip_conntrack.ko kernel module, even with an unconfigured, empty firewall, allows ICMP and TCP traffic to flow from and to the guest network stack, but UDP traffic, like DNS queries, gets stuck and doesn’t even touch the physical network interface.

This is really strange, isn’t it?
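
For anyone hitting the same symptom, here is a small sketch that checks whether the module is loaded. The helper module_loaded is hypothetical; it reads a /proc/modules-style listing, and the optional second argument exists only so a saved copy can be used when experimenting. I believe later kernels name the equivalent module nf_conntrack, so treat the module name as an assumption tied to the kernels of this article's era.

```shell
#!/bin/sh
# Report whether a kernel module appears in a /proc/modules-style listing.
# module_loaded is a hypothetical helper; the second argument lets the
# listing be swapped for a saved copy when experimenting.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

if module_loaded ip_conntrack 2>/dev/null; then
    echo "ip_conntrack is loaded"
else
    echo "ip_conntrack is not loaded; as root, try: modprobe ip_conntrack"
fi
```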

Linksys, OpenWRT and multiple VLANs

The Cisco Linksys WRT54G/GS/GL is made up of a six-port configurable switch, a standard Ethernet controller (usually a Broadcom controller named eth0) and a Wireless controller (usually a Broadcom controller named eth1).

The following diagram illustrates the components that make up the Cisco Linksys and how they are interconnected:

                                            Linksys rear
 Trunk    Internet    1     2     3     4   port number
  ---        ---     ---   ---   ---   ---
  |5|        |4|     |3|   |2|   |1|   |0|  switch port number
  ---        ---     ---   ---   ---   ---
  |           |       |                 |
  |         vlan1     |----- vlan0 -----|
  |
  | Miniswitch
  ----------------------------------------
  | Linux
  |
  |           ---- vlan0 -> LAN
  |           |
  |----- eth0 -
              |
              ---- vlan1 -> Internet/WAN

The standard Ethernet controller is attached to the sixth port (port #5) of the switch and is configured as an 802.1q VLAN trunk port. This allows running several VLANs using a single connection to the switch.

By default, OpenWRT configures two per-VLAN network interfaces:

  • vlan0:

    stands on the VLAN0 (the Local Area Network which comprises the four ports labeled as 1, 2, 3 and 4 at the rear of the box).

  • vlan1:

    stands on the WAN network (the port labeled Internet at the rear of the box).

The VLAN configuration is controlled using NVRAM variables. The variable labeled vlan0ports defines which switch ports are assigned onto the VLAN0, while vlan1ports defines which switch ports are assigned onto the VLAN1.

This is the default NVRAM configuration:

nvram set vlan0ports="3 2 1 0 5*"
nvram set vlan0hwname=et0
nvram set vlan1ports="4 5"
nvram set vlan1hwname=et0

  • vlan0ports:

    states that ports #3, #2, #1 and #0 (the ports labeled as 1, 2, 3 and 4 at the rear of the box) are assigned onto VLAN0. Additionally, port #5 is also assigned onto VLAN0.

    The asterisk beside the 5 means VLAN0 is the default, native VLAN for this port, so any untagged traffic is considered to belong to VLAN0.

  • vlan1ports:

    states that port #4 (the port labeled as Internet at the rear of the box) is assigned onto VLAN1. Additionally, port #5 is also assigned onto VLAN1 since it’s a trunk port.

    The lack of an asterisk means VLAN1 is not the default, native VLAN for this port.

NOTE: vlannhwname (where n is the VLAN number) needs to have a value assigned to it, even though the init scripts only check that it is non-empty and never use the value itself. This value is usually et0.

NOTE: Care must be exercised because port numbers are zero-based, as illustrated before, and the sixth port (port #5) must be assigned to every VLAN, since it is the VLAN trunk port.
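
To make the format of these values concrete, here is a sketch that walks through a ports string and checks that the trunk port is a member. The helper describe_vlan_ports is hypothetical, and port 5 is assumed to be the trunk, as on the boxes described here. The function body runs in a subshell with globbing turned off so that the asterisk in 5* is not expanded against file names.

```shell
#!/bin/sh
# Describe a vlanNports value and check that the CPU-facing trunk port
# is a member. Assumptions: describe_vlan_ports is a hypothetical helper
# and the trunk is port 5, as on the WRT54G family described here.
describe_vlan_ports() (
    set -f                          # keep "5*" from being glob-expanded
    has_trunk=no
    for entry in $1; do
        port=${entry%\*}            # strip a trailing asterisk, if any
        if [ "$port" != "$entry" ]; then
            echo "port $port: member, default VLAN for untagged traffic"
        else
            echo "port $port: member"
        fi
        [ "$port" = 5 ] && has_trunk=yes
    done
    [ "$has_trunk" = yes ]          # non-zero status when port 5 is missing
)

describe_vlan_ports "3 2 1 0 5*"
describe_vlan_ports "4 5"
```

The exit status makes it usable as a guard before writing a new value, for example to refuse a list that forgets port 5.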

The following code snippet from /etc/init.d/S10boot shows how the init script tells the switch which ports are onto which VLANs:

# configure the switch based on nvram
[ -d /proc/switch/eth0 ] && {
  for nr in $(seq 0 15); do
    vp="$(nvram get vlan${nr}ports)"
    [ -z "$vp" -o -z "$(nvram get vlan${nr}hwname)" ] || {
        echo "$vp" > /proc/switch/eth0/vlan/$nr/ports
    }
  done
}

We can also see that up to sixteen VLANs are supported by the switch.
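
The logic of that snippet can be replayed away from the router, which is handy for trying out a new VLAN layout before committing it. In this sketch get_var stands in for nvram get and reads plain shell variables, and a scratch directory stands in for /proc/switch/eth0. Both are assumptions of mine and not part of OpenWRT.

```shell
#!/bin/sh
# Replay the switch-configuration loop against a scratch directory.
# Assumptions: get_var stands in for "nvram get" and reads plain shell
# variables; SWITCH_DIR stands in for /proc/switch/eth0.
vlan0ports="3 2 1 0 5*"; vlan0hwname=et0
vlan1ports="4 5";        vlan1hwname=et0

get_var() {
    eval "printf '%s' \"\${$1-}\""
}

configure_switch() {
    switch_dir=$1
    nr=0
    while [ "$nr" -le 15 ]; do          # VLAN numbers 0..15, sixteen in total
        vp=$(get_var "vlan${nr}ports")
        # Skip a VLAN unless both its ports and hwname variables are set.
        if [ -n "$vp" ] && [ -n "$(get_var "vlan${nr}hwname")" ]; then
            mkdir -p "$switch_dir/vlan/$nr"
            echo "$vp" > "$switch_dir/vlan/$nr/ports"
        fi
        nr=$((nr + 1))
    done
}

SWITCH_DIR=${TMPDIR:-/tmp}/switch-preview
configure_switch "$SWITCH_DIR"
cat "$SWITCH_DIR/vlan/0/ports"
```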

Custom VLANs

The Linksys and OpenWRT combination is so flexible that we can configure additional VLANs. In fact, I was looking to add an additional administrative VLAN (VLAN2) granting me full access to the box while I could restrict access from the LAN and WAN to the minimum — for example, by using additional firewall rules.

This is depicted in the following figure:

                                            Linksys rear
 Trunk    Internet    1     2     3     4   port number
  ---        ---     ---   ---   ---   ---
  |5|        |4|     |3|   |2|   |1|   |0|  switch port number
  ---        ---     ---   ---   ---   ---
  |           |       |     |           |
  |         vlan1   vlan2   |-- vlan0 --|
  |
  | Linksys
  ----------------------------------------
  | Linux
  |
  |           ---- vlan0 -> LAN
  |           |
  |----- eth0 ---- vlan1 -> Internet/WAN
              |
              ---- vlan2 -> Administrative VLAN

To achieve this configuration, we need to remove port #3 (labeled as 1 at the rear of the box) from VLAN0 and assign it onto VLAN2. We also need to add port #5 to the VLAN2 since it is the VLAN trunk port used to carry the traffic from the switch to Linux through the standard Ethernet controller:

nvram set vlan0ports="2 1 0 5*"
nvram set vlan0hwname=et0
nvram set vlan1ports="4 5"
nvram set vlan1hwname=et0
nvram set vlan2ports="3 5"
nvram set vlan2hwname=et0
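
The same change can be derived mechanically from the current port list. The sketch below only prints the commands; move_port_cmds is a hypothetical helper and port 5 is assumed to be the trunk. It also prints a final nvram commit, which I add because nvram set on its own does not write the change to flash.

```shell
#!/bin/sh
# Print the nvram commands that move one switch port from VLAN0 to a new
# VLAN. Assumptions: move_port_cmds is a hypothetical helper, the trunk
# is port 5, and the result is reviewed by hand before being run.
move_port_cmds() (
    set -f                              # keep "5*" from being glob-expanded
    old_ports=$1; port=$2; new_vlan=$3
    kept=
    for entry in $old_ports; do
        # Keep every entry except the port being moved.
        [ "${entry%\*}" = "$port" ] || kept="${kept:+$kept }$entry"
    done
    echo "nvram set vlan0ports=\"$kept\""
    echo "nvram set vlan${new_vlan}ports=\"$port 5\""
    echo "nvram set vlan${new_vlan}hwname=et0"
    echo "nvram commit"
)

move_port_cmds "3 2 1 0 5*" 3 2
```

Since the switch is programmed by /etc/init.d/S10boot, a reboot after committing is the simplest way to apply the new layout.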

I’ve defined three custom NVRAM variables that will get used by an additional init script to configure the VLAN2 as an administrative VLAN, granting full access to the box:

  • adm_ifname:

    defines the Linux network interface name assigned to the administrative VLAN, in the form of vlann, where n is the VLAN number.

  • adm_ipaddr:

    defines the IP address for the administrative interface.

  • adm_netmask:

    defines the network mask for the administrative interface.

For example:

nvram set adm_ifname=vlan2
nvram set adm_ipaddr=192.168.0.100
nvram set adm_netmask=255.255.0.0

I’ve also coded up an additional init script, named /etc/init.d/S41network, used to bring up the administrative interface. I’ve decided not to fiddle with /etc/init.d/S40network to avoid breaking things and having problems during upgrades.

These are the contents of /etc/init.d/S41network:

#!/bin/sh
IFNAME=$(nvram get adm_ifname)
VLAN=${IFNAME##vlan}
IPADDR=$(nvram get adm_ipaddr)
NETMASK=$(nvram get adm_netmask)
vconfig add eth0 $VLAN
ifconfig vlan${VLAN} up ${IPADDR} netmask ${NETMASK}
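
The script trusts adm_ifname to follow the vlann convention. A guard against a mistyped value could look like the sketch below, where vlan_id_from_ifname is a hypothetical helper and the vconfig command is only printed, not run.

```shell
#!/bin/sh
# Derive the VLAN number from an interface name such as "vlan2" and
# reject anything that does not follow the vlanN convention.
# vlan_id_from_ifname is a hypothetical helper for this sketch.
vlan_id_from_ifname() {
    id=${1#vlan}
    case $id in
        ''|*[!0-9]*) return 1 ;;        # empty or non-numeric suffix
    esac
    [ "vlan$id" = "$1" ] || return 1    # the "vlan" prefix was missing
    echo "$id"
}

# How the start-up script could guard its commands (printed, not run):
IFNAME=vlan2
if VLAN=$(vlan_id_from_ifname "$IFNAME"); then
    echo "vconfig add eth0 $VLAN"
else
    echo "adm_ifname '$IFNAME' is not of the form vlanN" >&2
fi
```

Failing early matters here because a wrong interface name would otherwise leave the administrative VLAN unconfigured without any visible error.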

Testing

To test this custom configuration, I recommend disabling the firewall by removing the executable permission bit from /etc/init.d/S45firewall and /etc/init.d/S41network, just to avoid being locked out of the box in case problems arise.

Firewalling

I’ve also replaced the firewalling init script, /etc/init.d/S45firewall, with my own version. This allows for a fine-grained and tighter configuration.

Since the box will act as a routing firewall, and since it has 3 VLANs, I wanted to apply the following policy:

  • Any traffic coming from or going to the administrative VLAN (VLAN2) is allowed:

    This rule allows administering the box from a computer attached to the VLAN2, while blocking administrative access from other VLANs.

  • Incoming ICMP Echo Requests and ICMP Time Exceeded control messages are allowed from any interface:

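    This rule allows certain ICMP control messages to reach the box. ICMP Echo Request is needed for the box to respond to ping, and ICMP Time Exceeded (TTL) is needed so that tools such as traceroute keep working. Note that PMTU discovery depends on a different message, ICMP Destination Unreachable (Fragmentation Needed), which is admitted only when it relates to an already tracked connection.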

  • Any other incoming traffic from the LAN is rejected:

    This rule rejects any other traffic which does not match previous rules. Traffic is explicitly rejected, so we avoid having clients blocked waiting for an RST TCP segment.

  • Any other incoming traffic from the WAN is dropped:

    This rule silently drops any traffic coming from the WAN which does not match any previous rule. This will make external scan attacks much slower.

  • Local DNS queries coming from the local box going to configured DNS servers are allowed:

    This rule allows the local machine to resolve DNS queries sent against configured DNS servers (those configured in the wan_dns NVRAM variable). This is rarely needed, but the ipkg command requires a working DNS name resolution.

  • HTTP traffic from the local machine to the WAN is allowed:

    This rule allows upgrading and installing packages using the ipkg command.

  • Outgoing ICMP Echo Requests and ICMP Time Exceeded control messages are allowed from any interface:

    This rule allows certain ICMP control messages to depart from the box. ICMP Echo Request is needed for the box to run ping, and ICMP Time Exceeded (TTL) is what the box sends back when a packet's TTL expires here, so traceroute through the box keeps working.

  • Forwarding SSH/NX traffic coming from WAN to the designated SSH/NX server in the LAN:

    This rule allows accessing the SSH/NX server from the WAN. In addition, I apply SNAT to make IP datagrams appear to come from the firewall box since I have multiple DSL links.

  • Forwarding HTTP and HTTP/S traffic coming from the LAN targeted to the WAN:

    This rule allows using HTTP and HTTP/S services from the LAN.

  • DNS queries coming from the LAN going to configured DNS servers are allowed:

    This rule allows the machines in the LAN to resolve DNS queries sent against configured DNS servers (those configured in the wan_dns NVRAM variable).

  • Forwarding ICMP Echo Requests coming from the LAN to the WAN:

    This allows pinging external hosts from the LAN. ICMP Time Exceeded, however, is not forwarded, since the firewall sits in the middle between the LAN and the WAN (and I do use SNAT and DNAT).

Here is the complete /etc/init.d/S45firewall script:

#!/bin/sh
IPTABLES=/usr/sbin/iptables
FW_INET_IFACE=$(nvram get wan_ifname)
FW_INET_IP=$(nvram get wan_ipaddr)
FW_PRIVATE_IFACE=$(nvram get lan_ifname)
FW_PRIVATE_IP=$(nvram get lan_ipaddr)
FW_ADM_IFACE=$(nvram get adm_ifname)
NX_IP=10.200.0.10

$IPTABLES -F
$IPTABLES -t nat -F

# Configure SNAT/DNAT/MASQUERADE
$IPTABLES -t nat -A PREROUTING -i ${FW_INET_IFACE} -p tcp \
                               -d ${FW_INET_IP} --dport 179 \
                               -j DNAT --to-destination ${NX_IP}:22
$IPTABLES -t nat -A POSTROUTING -o ${FW_PRIVATE_IFACE} -p tcp \
                                -d ${NX_IP} --dport 22 \
                                -j SNAT --to-source ${FW_PRIVATE_IP}
$IPTABLES -t nat -A POSTROUTING -o ${FW_INET_IFACE} -j MASQUERADE

# Configure input firewall filtering:
# Allow:
#   - Traffic flowing from the loopback interface
#   - Traffic coming from the administrative VLAN
#   - ICMP Echo Request coming from WAN
#   - ICMP Time Exceeded (TTL) coming from WAN
#   - Traffic from an already established or related connection
# Block:
#   - Any traffic coming from the WAN
# Reject:
#   - Any other traffic coming from the LAN
$IPTABLES -A INPUT -i lo -j ACCEPT
$IPTABLES -A INPUT -i ${FW_ADM_IFACE} -j ACCEPT
$IPTABLES -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A INPUT -p icmp --icmp-type echo-request -j ACCEPT
$IPTABLES -A INPUT -p icmp --icmp-type time-exceeded -j ACCEPT
$IPTABLES -A INPUT -i ${FW_INET_IFACE} -j DROP
$IPTABLES -A INPUT -j REJECT

# Configure output firewall filtering:
# Allow:
#   - Traffic flowing to the loopback interface
#   - HTTP traffic
#   - ICMP Echo Request going to WAN
#   - ICMP Time Exceeded (TTL) going to WAN
#   - DNS queries to configured WAN name servers
#   - Traffic from an already established or related connection
# Reject:
#   - Any other traffic
$IPTABLES -A OUTPUT -o lo -j ACCEPT
$IPTABLES -A OUTPUT -o ${FW_INET_IFACE} -p tcp -m tcp \
                     --dport 80 -m state --state NEW -j ACCEPT
$IPTABLES -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A OUTPUT -p icmp --icmp-type echo-request -j ACCEPT
$IPTABLES -A OUTPUT -p icmp --icmp-type time-exceeded -j ACCEPT
for ns in $(nvram get wan_dns); do
        $IPTABLES -A OUTPUT -o ${FW_INET_IFACE} -p udp -m udp \
                            -d "$ns" --dport 53 -j ACCEPT
        $IPTABLES -A OUTPUT -o ${FW_INET_IFACE} -p tcp -m tcp \
                            -d "$ns" --dport 53 -j ACCEPT
done
$IPTABLES -A OUTPUT -j REJECT

# Configure forward firewall filtering:
# Allow:
#   - Incoming SSH/NX traffic -> the filtering takes place after the
#     PREROUTING chain has been processed and, since DNAT has already
#     been performed, the traffic is filtered according to its final
#     destination (the SSH/NX server)
#   - Outgoing DNS queries to configured WAN name servers
#   - Outgoing HTTP and HTTP/S traffic
#   - ICMP Echo Request coming from LAN going to WAN
#   - Traffic from an already established or related connection
# Drop:
#   - Any other traffic
$IPTABLES -A FORWARD -i ${FW_INET_IFACE} -o ${FW_PRIVATE_IFACE} -p tcp -m tcp \
                     -d ${NX_IP} --dport 22 -m state --state NEW -j ACCEPT
$IPTABLES -A FORWARD -i ${FW_PRIVATE_IFACE} -o ${FW_INET_IFACE} -p tcp -m tcp \
                     --dport 80 -m state --state NEW -j ACCEPT
$IPTABLES -A FORWARD -i ${FW_PRIVATE_IFACE} -o ${FW_INET_IFACE} -p tcp -m tcp \
                     --dport 443 -m state --state NEW -j ACCEPT
$IPTABLES -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A FORWARD -i ${FW_PRIVATE_IFACE} -o ${FW_INET_IFACE} \
                     -p icmp --icmp-type echo-request -j ACCEPT
for ns in $(nvram get wan_dns); do
        $IPTABLES -A FORWARD -i ${FW_PRIVATE_IFACE} -o ${FW_INET_IFACE} \
                             -p udp -m udp -d "$ns" --dport 53 -j ACCEPT
        $IPTABLES -A FORWARD -i ${FW_PRIVATE_IFACE} -o ${FW_INET_IFACE} \
                             -p tcp -m tcp -d "$ns" --dport 53 -j ACCEPT
done
$IPTABLES -A FORWARD -j DROP
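
Before loading a rule set like this on a remote box, it helps to preview what the loops will generate. The sketch below prints the per-nameserver rules instead of applying them. The helper dns_rules is hypothetical, and the interface name and server addresses are placeholders rather than values from my setup.

```shell
#!/bin/sh
# Preview the per-nameserver rules the script generates, without touching
# the running firewall. Assumptions: dns_rules is a hypothetical helper,
# and the interface name and server addresses below are placeholders.
dns_rules() {
    chain=$1; out_if=$2; shift 2
    for ns in "$@"; do
        for proto in udp tcp; do
            echo "iptables -A $chain -o $out_if -p $proto -m $proto -d $ns --dport 53 -j ACCEPT"
        done
    done
}

dns_rules OUTPUT vlan1 192.0.2.53 198.51.100.53
```

Once the real script has run, iptables -L -n -v and iptables -t nat -L -n -v list the installed rules together with their packet counters, which makes it easy to see which rules are actually being hit.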