
Qiutong Men

Reaching the unreachable star

Forcing intra-host traffic to cable with Docker and macvlan

Motivation

It seems like a weird requirement: direct traffic out of the intra-host network stack, have it traverse a switch, and bounce it back into another interface on the same machine. Why would one ever need intra-host communication with both worse latency and worse throughput? And if real network latency and throughput are wanted, why not just put the two ends of the protocol on two machines?

Well, it happens: we were building an emulation. Two instances had to be put under different filesystem and network namespaces: they must regard each other as separate nodes, yet they must share access to the same hardware attached to the host machine, so we could not put them on two bare-metal machines.

Injecting artificial latency and throughput noise was not a good option either, because TBD

Operation

We have a dual-port NIC whose interfaces ens0 and ens1 are recognized by the host in the default namespace. Both are down, with no address ever assigned.

  1. Separate the ports into isolated namespaces to avoid the kernel shortcut
sudo ip netns add ns0
sudo ip link set ens0 netns ns0
sudo ip netns exec ns0 ip addr add 192.168.10.100/24 dev ens0
sudo ip netns exec ns0 ip link set ens0 up

sudo ip netns add ns1
sudo ip link set ens1 netns ns1
sudo ip netns exec ns1 ip addr add 192.168.10.101/24 dev ens1
sudo ip netns exec ns1 ip link set ens1 up
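Why the isolation matters: if both ports stayed in the default namespace, the kernel would classify the peer's address as local and short-circuit the traffic through the loopback device, never touching the wire. A quick way to confirm the shortcut is gone (assuming the setup above):

```shell
# Inside ns0, 192.168.10.101 is no longer a local address, so the route
# points out of the physical port rather than the loopback device:
sudo ip netns exec ns0 ip route get 192.168.10.101
# In the single-namespace case, `ip route get` would instead report
# a "local" route via dev lo.
```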
  2. Verify they can ping each other via the fabric
sudo ip netns exec ns1 tcpdump -i ens1 icmp > logs.out &
sudo ip netns exec ns0 ping 192.168.10.101
  3. Create macvlan interfaces in bridge mode on top of the ports
sudo ip netns exec ns0 ip link add macvlan0 link ens0 type macvlan mode bridge

sudo ip netns exec ns1 ip link add macvlan1 link ens1 type macvlan mode bridge
  4. Create Docker containers without a network
docker run -dit --name c0 --network none alpine sh
docker run -dit --name c1 --network none alpine sh
pid0=$(docker inspect -f '{{.State.Pid}}' c0)
pid1=$(docker inspect -f '{{.State.Pid}}' c1)
  5. Move the prepared macvlans into the corresponding container namespaces
# macvlan0 lives in ns0, so the move must be executed from inside ns0
sudo ip netns exec ns0 ip link set macvlan0 netns $pid0
sudo nsenter -t $pid0 -n ip link set macvlan0 name eth0
sudo nsenter -t $pid0 -n ip link set eth0 up
# the container subnet need not match the host interface's
sudo nsenter -t $pid0 -n ip addr add 192.168.1.100/24 dev eth0

sudo ip netns exec ns1 ip link set macvlan1 netns $pid1
sudo nsenter -t $pid1 -n ip link set macvlan1 name eth0
sudo nsenter -t $pid1 -n ip link set eth0 up
# the container subnet need not match the host interface's
sudo nsenter -t $pid1 -n ip addr add 192.168.1.101/24 dev eth0
  6. Verify the containers can ping each other via the fabric:
sudo ip netns exec ns1 tcpdump -i ens1 icmp > logs.out &
docker exec -it c0 ping 192.168.1.101
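tcpdump already shows the ICMP packets on the wire, but another hypothetical check is to watch the parent interface's hardware counters while pinging; the transmit counter only moves if frames really leave through the NIC:

```shell
# Count tx packets on ens0 (the macvlan's parent, still inside ns0)
# before and after the ping; the delta should cover the 5 echo requests.
before=$(sudo ip netns exec ns0 cat /sys/class/net/ens0/statistics/tx_packets)
docker exec c0 ping -c 5 192.168.1.101
after=$(sudo ip netns exec ns0 cat /sys/class/net/ens0/statistics/tx_packets)
echo "tx delta: $((after - before))"
```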

But how many copies?

It should be only two, the same as regular over-the-fabric communication: kernel -> fabric -> kernel.

Why does it work?

TBD.

Lessons

TBD.

docker create macvlan network
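This note presumably refers to Docker's built-in macvlan driver, which can replace the manual link juggling above when the parent interface can stay in the default namespace (that requirement conflicts with the namespace isolation used here, which is likely why the manual route was taken). A rough, untested sketch with illustrative names:

```shell
# Let Docker manage the macvlan; "macnet0" is a hypothetical network name
docker network create -d macvlan \
  --subnet=192.168.1.0/24 \
  -o parent=ens0 \
  macnet0
# Docker then creates and moves the macvlan into the container itself
docker run -dit --name c0 --network macnet0 --ip 192.168.1.100 alpine sh
```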

create veth pairs
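The other alternative noted here would be veth pairs: one end handed to the container, the other enslaved to a Linux bridge together with the physical port. A rough, untested sketch under the same naming as above (note the bridge, not ens0, would then have to hold the namespace's address):

```shell
# Bridge the physical port with one end of a veth pair inside ns0
sudo ip netns exec ns0 ip link add br0 type bridge
sudo ip netns exec ns0 ip link set ens0 master br0
sudo ip netns exec ns0 ip link add veth0 type veth peer name veth0p
sudo ip netns exec ns0 ip link set veth0p master br0
sudo ip netns exec ns0 ip link set br0 up
sudo ip netns exec ns0 ip link set veth0p up
# hand the free end to the container, as with the macvlan above
sudo ip netns exec ns0 ip link set veth0 netns $pid0
```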