Status bar in “screen”

In order to have screen behave a little bit more like byobu, edit or create (if not present) /etc/screenrc or ~/.screenrc and add the following lines:

autodetach on 
startup_message off 
hardstatus alwayslastline 
shelltitle 'bash'
hardstatus string '%{gk}[%{wk}%?%-Lw%?%{=b kR}(%{W}%n*%f %t%?(%u)%?%{=b kR})%{= w}%?%+Lw%?%? %{g}][%{d}%l%{g}][ %{= w}%Y/%m/%d %0C:%s%a%{g} ]%{W}'
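
These settings take effect in new screen sessions. An already running session can reload them from screen’s command prompt (assuming the default C-a escape key):

C-a :source ~/.screenrc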

Mac OS X VPN and static routing

Problem

The VPN client built into Mac OS X has a checkbox saying “Send all traffic over VPN connection”. Turning this on causes all traffic to get routed over the VPN. Turning this off means that only the VPN IP block will get routed over the VPN. If there are additional IP networks behind the VPN gateway, they won’t be reachable unless you manually add static routes.

Solution

Mac OS X uses a program called pppd to negotiate a point-to-point connection. pppd is in charge of performing mutual authentication and creating a ppp network interface. pppd is used, at least, by PPTP and L2TP over IPSec VPNs in Mac OS X.

When a PPP connection is established, the pppd program will look for a script named /etc/ppp/ip-up and, if it exists and is executable, will run it. This file does not exist in a default, clean installation of Mac OS X, but it can easily be created and customized to add static routes whenever a VPN connection is established.

When pppd executes this script, it passes several pieces of information on the command line. The following sample script describes them:

$ cat /etc/ppp/ip-up
#!/bin/sh
#
# This script is called with the following arguments:
#
# $2: VPN interface name (e.g. ppp0)
# $3: 0
# $4: local VPN address (e.g. 10.0.0.1)
# $5: remote VPN gateway (e.g. 10.255.255.0)
# $6: local gateway used to reach the remote VPN gateway
#
# Example:
#
# $ ifconfig ppp0
# ppp0: flags=8051 mtu 1280
#  inet 10.0.0.1 --> 10.255.255.0 netmask 0xfffffc00 

if [ "$5" = "10.255.255.0" ]; then
  # Add static routes to Hetzner OST3 environment
  /sbin/route add -net 192.0.2.0/24 -interface ppp0
  /sbin/route add -net 192.168.253.0/24 -interface ppp0
fi
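
As noted above, pppd only runs /etc/ppp/ip-up if it exists and is executable. A minimal way to set the ownership and permissions, assuming the script was just created as root:

$ sudo chown root:wheel /etc/ppp/ip-up
$ sudo chmod 755 /etc/ppp/ip-up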

Debugging MAAS 2.x Ephemeral images

MAAS 2.x relies on Ephemeral images during commissioning of nodes. Basically, an Ephemeral image consists of a kernel, a RAM disk and a squashfs file system that is booted over the network (PXE) and relies on cloud-init to discover a node’s hardware (e.g. number of CPUs, RAM, disks, etc.).

There are times when, for some reason, the commissioning process fails and you need to do some troubleshooting. Typically, the node boots over PXE but cloud-init fails and you are left at the login screen of a non-configured host (e.g. the hostname is ‘ubuntu’). But Ephemeral images don’t allow anyone to log in interactively. The solution consists of injecting a backdoor into the Ephemeral image, such as setting a password for the root user. Next, I will explain how to do this.

Ephemeral images are downloaded from the Internet by the MAAS region controller and synchronized to MAAS rack controllers. The images are published under:

https://images.maas.io/ephemeral-v3/daily/

Inside this directory, there is a subdirectory named after the Ubuntu release code name (e.g. Xenial):

https://images.maas.io/ephemeral-v3/daily/xenial/

Under this, another subdirectory named after the CPU architecture (e.g. AMD64):

https://images.maas.io/ephemeral-v3/daily/xenial/amd64/

And under this, another subdirectory named with some timestamp:

https://images.maas.io/ephemeral-v3/daily/xenial/amd64/20171011/

If you browse this location, you will find something like this:

[DIR]  ga-16.04/                12-Oct-2017 01:57       -
[DIR]  hwe-16.04-edge/          12-Oct-2017 01:57       -
[DIR]  hwe-16.04/               12-Oct-2017 01:57       -
[   ]  squashfs                 12-Oct-2017 01:57       156M
[TXT]  squashfs.manifest        12-Oct-2017 01:57       13K

The squashfs filesystem is shared among the different kernel/ramdisk combinations (GA, which stands for General Availability, HWE and HWE Edge). As mentioned before, these files are downloaded and kept up to date on MAAS rack controllers under:

/var/lib/maas/boot-resources/snapshot-20171020-091808/ubuntu/amd64/hwe-16.04-edge/xenial/daily

The on-disk layout differs from the Web layout, as each kernel/ramdisk combination has its own subdirectory together with the squashfs filesystem. But let’s not digress. To introduce a backdoor, such as a password for the root user, do the following:

# cd /var/lib/maas/boot-resources/snapshot-20171020-091808/ubuntu/amd64/hwe-16.04-edge/xenial/daily
# unsquashfs squashfs
# openssl passwd -1 ubuntu
$1$lqVUYmVl$6QatT6qYPVpFo8nbEO4Ve1
# cp squashfs-root/etc/passwd squashfs-root/etc/passwd~
# sed 's,^root:x:0:0:root:/root:/bin/bash$,root:$1$lqVUYmVl$6QatT6qYPVpFo8nbEO4Ve1:0:0:root:/root:/bin/bash,g' squashfs-root/etc/passwd~ > squashfs-root/etc/passwd
# cp squashfs squashfs.dist
# mksquashfs squashfs-root squashfs -xattrs -comp xz -noappend
# chown maas:maas squashfs

Now that the squashfs image has been unpacked, patched and re-packed, one can try commissioning the node again. If commissioning fails, one can log in interactively as user root with password ubuntu.
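
Since the original filesystem was kept aside as squashfs.dist, the backdoor can be removed later by restoring it from the same directory:

# cp squashfs.dist squashfs
# chown maas:maas squashfs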

How to run Docker inside a Nova/LXD container

I’ve been experimenting with deploying OpenStack using Nova/LXD (instead of Nova/KVM) for quite some time, using conjure-up as the deployment tool. It is simple, easy to set up and use, and produces a usable OpenStack cluster.

However, I’ve been unable to run Docker inside a Nova instance (implemented as an LXD container) using an out-of-the-box installation deployed by conjure-up. The underlying reason is that the LXD container hosting nova-compute lacks some privileges. Also, inside this nova-compute container, Nova/LXD spawns nested LXD containers, one per Nova instance, which in turn lack some additional privileges required by Docker.

Long story short, you can apply the docker LXD profile to both the nova-compute container and those nested LXD containers inside it where you want to run Docker, and Docker will run fine:

⟫ juju status nova-compute
Model                         Controller                Cloud/Region         Version    SLA
conjure-openstack-novalx-1d1  conjure-up-localhost-718  localhost/localhost  2.2-beta4  unsupported

App                  Version  Status  Scale  Charm                Store       Rev  OS      Notes
lxd                  2.0.9    active      1  lxd                  jujucharms   10  ubuntu
neutron-openvswitch  10.0.0   active      1  neutron-openvswitch  jujucharms  240  ubuntu
nova-compute         15.0.2   active      1  nova-compute         jujucharms  266  ubuntu

Unit                      Workload  Agent  Machine  Public address  Ports  Message
nova-compute/0*           active    idle   4        10.0.8.61              Unit is ready
  lxd/0*                  active    idle            10.0.8.61              Unit is ready
  neutron-openvswitch/0*  active    idle            10.0.8.61              Unit is ready

Machine  State    DNS        Inst id        Series  AZ  Message
4        started  10.0.8.61  juju-59ffc3-4  xenial      Running
...

From the previous output, notice how the nova-compute/0 unit is running in machine #4, and that the underlying LXD container is named juju-59ffc3-4. Now, let’s see the LXD profiles used by this container:

⟫ lxc info juju-59ffc3-4 | grep Profiles
Profiles: default, juju-conjure-openstack-novalx-1d1

The docker LXD profile is missing from this container, which will cause any nested container that tries to use Docker to fail. Entering the nova-compute/0 container, we initially see no nested containers: since there are no Nova instances, there are no LXD containers. Remember that when using Nova/LXD, there is a 1:1 mapping between a Nova instance and an LXD container:

⟫ lxc exec juju-59ffc3-4 /bin/bash
root@juju-59ffc3-4:~# lxc list
+------+-------+------+------+------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+-------+------+------+------+-----------+

Let’s spawn a Nova instance for testing:

⟫ juju ssh nova-cloud-controller/0
ubuntu@juju-59ffc3-13:~$ source novarc
ubuntu@juju-59ffc3-13:~$ openstack server create --flavor m1.small --image xenial-lxd --nic net-id=ubuntu-net test1

Now, if we take a look inside the nova-compute/0 container, we will see a nested container:

⟫ juju ssh nova-compute/0
ubuntu@juju-59ffc3-4:~$ sudo -i
root@juju-59ffc3-4:~# lxc list
+-------------------+---------+-------------------+------+------------+-----------+
|       NAME        |  STATE  |       IPV4        | IPV6 |    TYPE    | SNAPSHOTS |
+-------------------+---------+-------------------+------+------------+-----------+
| instance-00000001 | RUNNING | 10.101.0.9 (eth0) |      | PERSISTENT | 0         |
+-------------------+---------+-------------------+------+------------+-----------+
root@juju-59ffc3-4:~# lxc info instance-00000001 | grep Profiles
Profiles: instance-00000001

Here one can see that the nested container is using a profile named after the Nova instance. Let’s enter this nested container, install Docker and try to spawn a Docker container:

root@juju-59ffc3-4:~# lxc exec instance-00000001 /bin/bash
root@test1:~# apt-get update
...
root@test1:~# apt-get -y install docker.io
...
root@test1:~# docker run -it ubuntu /bin/bash
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
b6f892c0043b: Pull complete
55010f332b04: Pull complete
2955fb827c94: Pull complete
3deef3fcbd30: Pull complete
cf9722e506aa: Pull complete
Digest: sha256:382452f82a8bbd34443b2c727650af46aced0f94a44463c62a9848133ecb1aa8
Status: Downloaded newer image for ubuntu:latest
docker: Error response from daemon: containerd: container not started.

Here we can see that Docker was unable to spawn the Docker container.

The first thing we are going to try is adding the docker LXD profile to the nested container, the one hosting our Nova instance:

⟫ juju ssh nova-compute/0
ubuntu@juju-59ffc3-4:~$ sudo -i
root@juju-59ffc3-4:~# lxc list
+-------------------+---------+-------------------+------+------------+-----------+
|       NAME        |  STATE  |       IPV4        | IPV6 |    TYPE    | SNAPSHOTS |
+-------------------+---------+-------------------+------+------------+-----------+
| instance-00000001 | RUNNING | 10.101.0.5 (eth0) |      | PERSISTENT | 0         |
+-------------------+---------+-------------------+------+------------+-----------+
root@juju-59ffc3-4:~# lxc info instance-00000001 | grep Profiles
Profiles: instance-00000001
root@juju-59ffc3-4:~# lxc profile apply instance-00000001 instance-00000001,docker
Profile instance-00000001,docker applied to instance-00000001

Now, let’s try again to run a Docker container:

root@juju-59ffc3-4:~# lxc exec instance-00000001 /bin/bash
root@test1:~# docker run -it ubuntu /bin/bash
root@7fc441a9b0a5:/# uname -r
4.10.0-21-generic
root@7fc441a9b0a5:/#
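
As mentioned at the beginning, the nova-compute container itself may also need the docker profile. If so, it can be attached in the same way from the LXD host; a sketch, bearing in mind that lxc profile apply replaces the full profile list, so the container’s existing profiles have to be repeated:

⟫ lxc profile apply juju-59ffc3-4 default,juju-conjure-openstack-novalx-1d1,docker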

But this, besides being a manual process, is not elegant. There’s another solution which requires no operator intervention. It consists of a Python patch to the Nova/LXD driver that allows selectively adding additional LXD profiles to Nova containers:

$ juju ssh nova-compute/0
ubuntu@juju-59ffc3-4:~$ sudo -i
root@juju-59ffc3-4:~# patch -d/ -p0 << EOF
--- /usr/lib/python2.7/dist-packages/nova_lxd/nova/virt/lxd/config.py.orig      2017-06-07 19:41:47.685278274 +0000
+++ /usr/lib/python2.7/dist-packages/nova_lxd/nova/virt/lxd/config.py   2017-06-07 19:42:58.891624467 +0000
@@ -56,11 +56,17 @@
         instance_name = instance.name
         try:
 
+            # Profiles to be applied to the container
+            profiles = [str(instance.name)]
+            lxd_profiles = instance.flavor.extra_specs.get('lxd:profiles')
+            if lxd_profiles:
+                profiles += lxd_profiles.split(',')
+
             # Fetch the container configuration from the current nova
             # instance object
             container_config = {
                 'name': instance_name,
-                'profiles': [str(instance.name)],
+                'profiles': profiles,
                 'source': self.get_container_source(instance),
                 'devices': {}
             }
EOF
root@juju-59ffc3-4:~# service nova-compute restart

Now, let’s create a new flavor named docker, with an extra spec that adds the docker LXD profile to all instances that use this flavor:

⟫ juju ssh nova-cloud-controller/0
ubuntu@juju-59ffc3-13:~$ source novarc
ubuntu@juju-59ffc3-13:~$ openstack flavor create --disk 20 --vcpus 2 --ram 1024 docker
ubuntu@juju-59ffc3-13:~$ openstack flavor set --property lxd:profiles=docker docker
ubuntu@juju-59ffc3-13:~$ openstack server create --flavor docker --image xenial-lxd --nic net-id=ubuntu-net test2

Then, inside the nova-compute container:

⟫ juju ssh nova-compute/0
ubuntu@juju-59ffc3-4:~$ sudo -i
root@juju-59ffc3-4:~# lxc list
+-------------------+---------+--------------------------------+------+------------+-----------+
|       NAME        |  STATE  |              IPV4              | IPV6 |    TYPE    | SNAPSHOTS |
+-------------------+---------+--------------------------------+------+------------+-----------+
| instance-00000001 | RUNNING | 172.17.0.1 (docker0)           |      | PERSISTENT | 0         |
|                   |         | 10.101.0.9 (eth0)              |      |            |           |
+-------------------+---------+--------------------------------+------+------------+-----------+
| instance-00000003 | RUNNING | 10.101.0.8 (eth0)              |      | PERSISTENT | 0         |
+-------------------+---------+--------------------------------+------+------------+-----------+
root@juju-59ffc3-4:~# lxc info instance-00000003 | grep Profiles
Profiles: instance-00000003, docker
root@juju-59ffc3-4:~# lxc exec instance-00000003 /bin/bash
root@test2:~# apt-get update
...
root@test2:~# apt-get -y install docker.io
...
root@test2:~# docker run -it ubuntu /bin/bash
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
b6f892c0043b: Pull complete
55010f332b04: Pull complete
2955fb827c94: Pull complete
3deef3fcbd30: Pull complete
cf9722e506aa: Pull complete
Digest: sha256:382452f82a8bbd34443b2c727650af46aced0f94a44463c62a9848133ecb1aa8
Status: Downloaded newer image for ubuntu:latest
root@fd74cfa04876:/# uname -r
4.10.0-21-generic
root@fd74cfa04876:/#

So, that’s it. With this small patch, which adds support for the lxd:profiles extra spec, it becomes much easier to run Docker inside Nova instances hosted in LXD containers.

Juju and apt-cacher

I’ve been playing quite a lot lately with Juju and other related software projects, like conjure-up and LXD. They make it so easy to spin complex software stacks like OpenStack up and down that you don’t even realize how much bandwidth you are using until your hosting provider starts alerting you about high traffic consumption. And guess where most of this traffic comes from? From installing packages.

So I decided to save on bandwidth by using apt-cacher. It is straightforward to set up and get running. In the end, if you follow the steps described in the previous link, or this one, you will end up with a Perl program listening on port 3142 of your machine that you can use as an Apt cache.
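
For machines outside Juju or conjure-up (including the host itself), apt can be pointed at the cache with a one-line proxy snippet; a minimal example, assuming apt-cacher is listening on localhost:3142 as described above:

$ echo 'Acquire::http::Proxy "http://localhost:3142";' | sudo tee /etc/apt/apt.conf.d/01proxy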

For Juju, one can use a YAML configuration file like this:

apt-http-proxy: http://localhost:3142
apt-https-proxy: http://localhost:3142

Then bootstrap Juju using the following command:

$ juju bootstrap --config config.yaml localhost lxd

For conjure-up, it is also very easy:

$ conjure-up \
    --apt-proxy http://localhost:3142 \
    --apt-https-proxy http://localhost:3142 \
    ...

OpenStack Newton and LXD

Background

This post is about deploying a minimal OpenStack Newton cluster atop LXD on a single machine. Most of what is mentioned here is based on OpenStack on LXD.

Introduction

The rationale behind using LXD is simplicity and feasibility: it doesn’t require more than one x86_64 server with 8 CPU cores, 64GB of RAM and an SSD drive large enough to perform an all-in-one deployment of OpenStack Newton.

According to Canonical, “LXD is a pure-container hypervisor that runs unmodified Linux guest operating systems with VM-style operations at incredible speed and density.” Instead of using full virtual machines to run the different OpenStack components, LXD is used, which allows for a higher “machine” (container) density. In practice, an LXD container behaves pretty much like a virtual or bare-metal machine.

For this experiment, I will be using Ubuntu 16.04.2 on a 128GB machine with 12 CPU cores and 4x240GB SSD drives configured as software RAID0. For increased performance and efficiency, ZFS is also used (on a dedicated partition, separate from the base OS) as the backing store for LXD.

Preparation

$ sudo add-apt-repository ppa:juju/devel
$ sudo add-apt-repository ppa:ubuntu-lxc/lxd-stable
$ sudo apt update
$ sudo apt install \
    juju lxd zfsutils-linux squid-deb-proxy \
    python-novaclient python-keystoneclient \
    python-glanceclient python-neutronclient \
    python-openstackclient curl
$ git clone https://github.com/falfaro/openstack-on-lxd.git

It is important to run all the following commands inside the openstack-on-lxd directory where the Git repository has been cloned locally.

LXD set up

$ sudo lxd init

The relevant part here is the network configuration. IPv6 is not properly supported by Juju, so make sure not to enable it. For IPv4, use the 10.0.8.0/24 subnet and assign the 10.0.8.1 IPv4 address to LXD itself. The DHCP range could be something like 10.0.8.2 to 10.0.8.200.
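
The resulting bridge settings can be reviewed afterwards; a sketch, assuming a recent LXD from the PPA and the default bridge name lxdbr0:

$ lxc network show lxdbr0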

NOTE: Having LXD listen on the network is also an option for managing it remotely, but beware of the security implications of exposing it over a public network. Using ZFS (or btrfs) should also increase performance and efficiency (e.g. copy-on-write saves disk space by avoiding duplication across containers running the same base image).

Using an MTU of 9000 for container interfaces will likely increase performance:

$ lxc profile device set default eth0 mtu 9000

The next step is to spawn an LXD container for testing purposes:

$ lxc launch ubuntu-daily:xenial openstack
$ lxc exec openstack bash
# exit

A specific LXD profile named juju-default will be used when deploying OpenStack. In particular, this profile allows nesting LXD (required by nova-compute), allows running privileged containers, and preloads certain kernel modules required inside the OpenStack containers.

$ lxc profile create juju-default 2>/dev/null || \
  echo "juju-default profile already exists"
$ cat lxd-profile.yaml | lxc profile edit juju-default

Bootstrap Juju controller

$ juju bootstrap --config config.yaml localhost lxd

Deploy OpenStack

$ juju deploy bundle-newton-novalxd.yaml
$ watch juju status

Testing

After Juju has finished deploying OpenStack, make sure there is a file named novarc in the current directory. This file is required to be sourced in order to use the OpenStack CLI:

$ source novarc
$ openstack catalog list
$ nova service-list
$ neutron agent-list
$ cinder service-list

Create Nova flavors:

$ openstack flavor create --public \
    --ram   512 --disk  1 --ephemeral  0 --vcpus 1 m1.tiny
$ openstack flavor create --public \
    --ram  1024 --disk 20 --ephemeral 40 --vcpus 1 m1.small
$ openstack flavor create --public \
    --ram  2048 --disk 40 --ephemeral 40 --vcpus 2 m1.medium
$ openstack flavor create --public \
    --ram  8192 --disk 40 --ephemeral 40 --vcpus 4 m1.large
$ openstack flavor create --public \
    --ram 16384 --disk 80 --ephemeral 40 --vcpus 8 m1.xlarge

Add the typical SSH key:

$ openstack keypair create --public-key ~/.ssh/id_rsa.pub mykey

Create a Neutron external network and a virtual network for testing:

$ ./neutron-ext-net \
    -g 10.0.8.1 -c 10.0.8.0/24 \
    -f 10.0.8.201:10.0.8.254 ext_net
$ ./neutron-tenant-net \
    -t admin -r provider-router \
    -N 10.0.8.1 internal 192.168.20.0/24

CAVEAT: Nova/LXD does not support the use of QCOW2 images in Glance. Instead, one has to use raw images. For example:

$ curl http://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-root.tar.gz | \
  glance image-create --name xenial --disk-format raw --container-format bare
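
One can then confirm that the image was registered:

$ openstack image list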

Then:

$ openstack server create \
    --image xenial --flavor m1.tiny --key-name mykey --wait \
    --nic net-id=$(neutron net-list | grep internal | awk '{ print $2 }') \
    openstack-on-lxd-ftw

NOTE: For reasons I do not yet understand, one can’t use a flavor other than m1.tiny. The reason is that this flavor is the only one that does not request any ephemeral disk. As soon as ephemeral disk is requested, the LXD subsystem inside the nova-compute container will complain with the following error:

$ juju ssh nova-compute/0
$ sudo tail -f /var/log/nova/nova-compute.log
...
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2078, in _build_resources
    yield resources
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1920, in _build_and_run_instance
    block_device_info=block_device_info)
  File "/usr/lib/python2.7/dist-packages/nova/virt/lxd/driver.py", line 317, in spawn
    self._add_ephemeral(block_device_info, lxd_config, instance)
  File "/usr/lib/python2.7/dist-packages/nova/virt/lxd/driver.py", line 1069, in _add_ephemeral
    raise exception.NovaException(reason)
NovaException: Unsupport LXD storage detected. Supported storage drivers are zfs and btrfs.

If Cinder is available, create a test Cinder volume:

$ cinder create --name testvolume 10
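
And, if desired, attach it to the test instance created earlier (a sketch using the server and volume names from the examples above):

$ openstack server add volume openstack-on-lxd-ftw testvolume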

Persistent loopback interfaces in Mac OS X

One of the things that I miss in Mac OS X is support for multiple loopback addresses. Not just 127.0.0.1, but anything in the form 127.* (e.g. 127.0.1.1 or 127.0.0.2).

To add an additional IPv4 address to the loopback interface, one can use the following command in a Mac OS X terminal:

$ sudo ifconfig lo0 alias 127.0.1.1

The problem is that this doesn’t persist across reboots. To make it persist, one can create a “launchd” daemon that configures this additional IPv4 address. Something like this:

$ cat << EOF | sudo tee /Library/LaunchDaemons/com.felipe-alfaro.loopback1.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
    <key>Label</key>
    <string>com.felipe-alfaro.loopback1</string>
    <key>ProgramArguments</key>
    <array>
        <string>/sbin/ifconfig</string>
        <string>lo0</string>
        <string>alias</string>
        <string>127.0.1.1</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
  </dict>
</plist>
EOF

Then, start the service up:

$ sudo launchctl load /Library/LaunchDaemons/com.felipe-alfaro.loopback1.plist

And make sure it did work:

$ sudo launchctl list | grep com.felipe-alfaro
-   0   com.felipe-alfaro.loopback1
$ ifconfig lo0
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
    options=1203<RXCSUM,TXCSUM,TXSTATUS,SW_TIMESTAMP>
    inet 127.0.0.1 netmask 0xff000000 
    inet6 ::1 prefixlen 128 
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
    inet 127.0.1.1 netmask 0xff000000 
    nd6 options=201<PERFORMNUD,DAD>

And, for the record, it is totally possible to have multiple services like the one above, one for each additional IPv4 address. Just make sure to name the .plist files differently, as well as the service names in the Label tag.

Default user in WSL

The Windows Subsystem for Linux (WSL) defaults to running as the “root” user. In order to change that behavior, just create a Linux user. Let’s imagine this user is named “jdoe”. To have WSL start the session as “jdoe” instead of “root”, just run the following command from a “cmd.exe” window:

C:\Users\JohnDoe> lxrun /setdefaultuser jdoe

Take into account that any running WSL sessions will be killed immediately.
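
If the “jdoe” user does not exist yet, create it first from within a WSL session; a minimal sketch using standard Ubuntu tooling:

# adduser jdoe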

QPID and OpenStack

If you are still using QPID in your OpenStack deployment, be careful with the QPID topology version used. It seems some components in Havana default to version 2 while others in Icehouse default to version 1.

To avoid problems, you may want to explicitly set the following configuration option in files like /etc/nova/nova.conf:

qpid_topology_version=1
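
For reference, a minimal sketch of how this would look inside /etc/nova/nova.conf, assuming the option lives in the [DEFAULT] section:

[DEFAULT]
...
qpid_topology_version=1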

How PKI-based tokens from Keystone are authenticated

This article tries to explain how tokens generated by Keystone (using the PKI token format, not UUID) can be authenticated by clients (e.g. cinder, neutron, nova, etc.).

The relevant fragment of /etc/keystone/keystone.conf that specifies the PKI material used to sign Keystone tokens (the signing key, the signing certificate and its corresponding CA certificate, together with the key size and expiration period) usually looks like this (default values shown):

[signing]
token_format = PKI
certfile = /etc/keystone/ssl/certs/signing_cert.pem
keyfile = /etc/keystone/ssl/private/signing_key.pem
ca_certs = /etc/keystone/ssl/certs/ca.pem
cert_subject = /C=US/ST=Unset/L=Unset/O=Unset/CN=www.example.com
key_size = 2048
valid_days = 3650

The Keystone client middleware, implemented in the keystoneclient.middleware.auth_token Python module, verifies the signature of a given Keystone token (whose data is in CMS syntax). The actual method in this module is cms_verify. This method relies on its counterpart cms_verify defined in keystoneclient.common.cms and requires the actual data, the signing certificate and the corresponding CA certificate.

The token’s data, signing certificate and its corresponding CA certificate are stored on local disk, inside a directory specified by the signing_dir option in the keystone_authtoken section. By default, this option is set to None. When None or absent, a temporary directory is created, as one can see in the verify_signing_dir method:

def verify_signing_dir(self):
    if os.path.exists(self.signing_dirname):
        if not os.access(self.signing_dirname, os.W_OK):
            raise ConfigurationError(
                'unable to access signing_dir %s' % self.signing_dirname)
        uid = os.getuid()
        if os.stat(self.signing_dirname).st_uid != uid:
            self.LOG.warning(
                'signing_dir is not owned by %s', uid)
        current_mode = stat.S_IMODE(os.stat(self.signing_dirname).st_mode)
        if current_mode != stat.S_IRWXU:
            self.LOG.warning(
                'signing_dir mode is %s instead of %s',
                oct(current_mode), oct(stat.S_IRWXU))
    else:
        os.makedirs(self.signing_dirname, stat.S_IRWXU)

When debug is True for any particular OpenStack service, one can see the value of the signing_dir option during startup in the logs:

2015-04-15 19:03:25.069 9449 DEBUG glance.common.config [-] keystone_authtoken.signing_dir = None log_opt_values /usr/lib/python2.6/site-packages/oslo/config/cfg.py:1953

The signing certificate and its corresponding CA certificate are retrieved from Keystone via an HTTP request and stored on local disk. The methods that implement this in keystoneclient.middleware.auth_token look like this:

def _fetch_cert_file(self, cert_file_name, cert_type):
    path = '/v2.0/certificates/' + cert_type
    response = self._http_request('GET', path)
    if response.status_code != 200:
        raise exceptions.CertificateConfigError(response.text)
    self._atomic_write_to_signing_dir(cert_file_name, response.text)

def fetch_signing_cert(self):
    self._fetch_cert_file(self.signing_cert_file_name, 'signing')

def fetch_ca_cert(self):
    self._fetch_cert_file(self.signing_ca_file_name, 'ca')

Which translates to HTTP requests to Keystone like this:

2015-04-15 19:03:34.704 9462 DEBUG urllib3.connectionpool [-] "GET /v2.0/certificates/signing HTTP/1.1" 200 4251 _make_request /usr/lib/python2.6/site-packages/urllib3/connectionpool.py:295
2015-04-15 19:03:34.727 9462 DEBUG urllib3.connectionpool [-] "GET /v2.0/certificates/ca HTTP/1.1" 200 1277 _make_request /usr/lib/python2.6/site-packages/urllib3/connectionpool.py:295

As said before, in order to verify a Keystone token, the cms_verify method takes the signing certificate and the corresponding CA certificate (as stored on local disk) plus the token data, and passes them to an external openssl process for verification:

def cms_verify(self, data):
    """Verifies the signature of the provided data's IAW CMS syntax.

    If either of the certificate files are missing, fetch them and
    retry.
    """
    while True:
        try:
            output = cms.cms_verify(data, self.signing_cert_file_name,
                                    self.signing_ca_file_name)
        except exceptions.CertificateConfigError as err:
            if self.cert_file_missing(err.output,
                                      self.signing_cert_file_name):
                self.fetch_signing_cert()
                continue
            if self.cert_file_missing(err.output,
                                      self.signing_ca_file_name):
                self.fetch_ca_cert()
                continue
            self.LOG.error('CMS Verify output: %s', err.output)
            raise
...

This translates into the Keystone middleware spawning an openssl command to validate the input (the Keystone token). Something like:

openssl cms -verify -certfile /tmp/keystone-signing-OFShms/signing_cert.pem -CAfile /tmp/keystone-signing-OFShms/cacert.pem -inform PEM -nosmimecap -nodetach -nocerts -noattr << EOF
-----BEGIN CMS-----
MIIBxgYJKoZIhvcNAQcCoIIBtzCCAbMCAQExCTAHBgUrDgMCGjAeBgkqhkiG9w0B
BwGgEQQPeyJyZXZva2VkIjogW119MYIBgTCCAX0CAQEwXDBXMQswCQYDVQQGEwJV
UzEOMAwGA1UECAwFVW5zZXQxDjAMBgNVBAcMBVVuc2V0MQ4wDAYDVQQKDAVVbnNl
dDEYMBYGA1UEAwwPd3d3LmV4YW1wbGUuY29tAgEBMAcGBSsOAwIaMA0GCSqGSIb3
DQEBAQUABIIBABzCPXw9Kv49gArUWpAOWPsK8WRRnt6WS9gMaACvkllQs8vHEN11
nLBFGmO/dSTQdyXR/gQU4TuohsJfnYdh9rr/lrC3sVp1pCO0TH/GKmf4Lp1axrQO
c/gZym7qCpFKDNv8mAAHIbGFWvBa8H8J+sos/jC/RQYDbX++7TgPTCZdCbLlzglh
jKZko07P86o3k14Hq6o7VGpMGu9EjOziM6uOg391yylCVbqRazwoSszKm29s/LHH
dyvEc+RM9iRaNNTiP5Sa/bU3Oo25Ke6cleTcTqIdBaw+H5C1XakCkhpw3f8z0GkY
h0CAN2plwwqkT8xPYavBLjccOz6Hl3MrjSU=
-----END CMS-----
EOF

One has to pay attention to the purposes of the signing certificate. If its purposes are wrong, tokens generated by Keystone won’t be validated by the Keystone clients (middleware). This shows up in the logs with an error message that typically looks like this:

2015-04-15 18:52:13.027 29533 WARNING keystoneclient.middleware.auth_token [-] Verify error: Command 'openssl' returned non-zero exit status 4
2015-04-15 18:52:13.027 29533 DEBUG keystoneclient.middleware.auth_token [-] Token validation failure. _validate_user_token /usr/lib/python2.6/site-packages/keystoneclient/middleware/auth_token.py:836
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token Traceback (most recent call last):
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token File "/usr/lib/python2.6/site-packages/keystoneclient/middleware/auth_token.py", line 823, in _validate_user_token
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token verified = self.verify_signed_token(user_token)
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token File "/usr/lib/python2.6/site-packages/keystoneclient/middleware/auth_token.py", line 1258, in verify_signed_token
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token if self.is_signed_token_revoked(signed_text):
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token File "/usr/lib/python2.6/site-packages/keystoneclient/middleware/auth_token.py", line 1216, in is_signed_token_revoked
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token revocation_list = self.token_revocation_list
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token File "/usr/lib/python2.6/site-packages/keystoneclient/middleware/auth_token.py", line 1312, in token_revocation_list
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token self.token_revocation_list = self.fetch_revocation_list()
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token File "/usr/lib/python2.6/site-packages/keystoneclient/middleware/auth_token.py", line 1358, in fetch_revocation_list
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token return self.cms_verify(data['signed'])
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token File "/usr/lib/python2.6/site-packages/keystoneclient/middleware/auth_token.py", line 1239, in cms_verify
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token self.signing_ca_file_name)
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token File "/usr/lib/python2.6/site-packages/keystoneclient/common/cms.py", line 148, in cms_verify
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token raise e
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token CalledProcessError: Command 'openssl' returned non-zero exit status 4
2015-04-15 18:52:13.027 29533 TRACE keystoneclient.middleware.auth_token
2015-04-15 18:52:13.028 29533 DEBUG keystoneclient.middleware.auth_token [-] Marking token as unauthorized in cache _cache_store_invalid /usr/lib/python2.6/site-packages/keystoneclient/middleware/auth_token.py:1154
2015-04-15 18:52:13.028 29533 WARNING keystoneclient.middleware.auth_token [-] Authorization failed for token
2015-04-15 18:52:13.029 29533 INFO keystoneclient.middleware.auth_token [-] Invalid user token - deferring reject downstream
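
When running into this error, a quick way to inspect the signing certificate’s purposes is with openssl, using the certfile path from the keystone.conf fragment shown earlier:

$ openssl x509 -in /etc/keystone/ssl/certs/signing_cert.pem -noout -purpose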