Move to vanilla firecracker snapshots #794

Closed
CuriousGeorgiy opened this issue Sep 4, 2023 · 0 comments · Fixed by #816
CuriousGeorgiy commented Sep 4, 2023

Currently, we have our own custom implementation of snapshots for firecracker, but it would be highly desirable to move to vanilla firecracker snapshots.

Firecracker has supported snapshots since v0.23.0 [1], and they are also supported in the firecracker-go-sdk since v1.0.0 [2]. However, microVM snapshots are not supported in firecracker-containerd, so we need to patch it to support snapshot-restore requests.

Firecracker snapshots impose a limitation on resource names during snapshot loading that is incompatible with container snapshot mounts, so we will need to patch firecracker by adding a parameter for the new container snapshot's path to the snapshot-load request. We will also need to patch the firecracker-go-sdk to forward this new request parameter. See also firecracker-microvm/firecracker#4014 for details.
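
For illustration, here is a minimal sketch of what forwarding such a parameter could look like over Firecracker's API socket. The request body mirrors the fields of the existing `PUT /snapshot/load` call; the `container_snapshot_path` field and all Go identifiers are hypothetical placeholders, not the final patched API:

```go
package snapshots

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

// snapshotLoadParams mirrors the body of Firecracker's PUT /snapshot/load
// request. ContainerSnapshotPath is the hypothetical extra parameter this
// issue proposes, pointing the restored microVM at a fresh container
// snapshot mount.
type snapshotLoadParams struct {
	SnapshotPath          string `json:"snapshot_path"`
	MemFilePath           string `json:"mem_file_path"`
	EnableDiffSnapshots   bool   `json:"enable_diff_snapshots"`
	ResumeVM              bool   `json:"resume_vm"`
	ContainerSnapshotPath string `json:"container_snapshot_path,omitempty"` // hypothetical field
}

// loadSnapshot issues the load request over Firecracker's API Unix socket.
func loadSnapshot(ctx context.Context, socketPath string, params snapshotLoadParams) error {
	body, err := json.Marshal(params)
	if err != nil {
		return err
	}
	client := &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", socketPath)
		},
	}}
	req, err := http.NewRequestWithContext(ctx, http.MethodPut,
		"http://localhost/snapshot/load", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("snapshot load failed: %s", resp.Status)
	}
	return nil
}
```

The firecracker-go-sdk patch would then expose the same field on its snapshot-load request instead of talking to the socket directly.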

Footnotes

  1. https://github.com/firecracker-microvm/firecracker/releases/tag/v0.23.0

  2. https://github.com/firecracker-microvm/firecracker-go-sdk/releases/tag/v1.0.0

CuriousGeorgiy added the enhancement (New feature or request) and upstream labels on Sep 4, 2023
CuriousGeorgiy self-assigned this on Sep 4, 2023
CuriousGeorgiy added a commit to CuriousGeorgiy/vHive that referenced this issue Sep 4, 2023
The networking management component is responsible for managing the virtual
networks that firecracker microVMs will use, so that they remain compatible
with snapshot-restore.

Part of vhive-serverless#794

Signed-off-by: Georgiy Lebedev <lebedev.gk@phystech.edu>
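
As a rough illustration of the intended shape of that component (all names below are hypothetical, not vHive's actual API):

```go
package networking

import "context"

// NetworkConfig describes the per-VM networking resources needed for
// snapshot-restore: a dedicated namespace, the tap device recreated inside
// it, and the veth tunnel plus clone IP used for host-side routing.
type NetworkConfig struct {
	NamespaceName string // e.g. "VMns4"
	TapName       string // tap device recreated inside the namespace
	GuestIP       string // original function IP, preserved across restores
	CloneIP       string // unique per-VM address seen on the host
	VethVMName    string // namespace end of the veth pair
	VethHostName  string // host end of the veth pair
}

// NetworkManager hands out and reclaims per-VM network configurations.
type NetworkManager interface {
	// CreateNetwork builds the namespace, tap device, veth pair and the
	// routing/NAT rules for one VM and returns the resulting configuration.
	CreateNetwork(ctx context.Context, vmID string) (*NetworkConfig, error)
	// RemoveNetwork tears down everything CreateNetwork set up.
	RemoveNetwork(ctx context.Context, vmID string) error
}
```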
CuriousGeorgiy added a commit to CuriousGeorgiy/vHive that referenced this issue Sep 4, 2023
Currently, each firecracker VM needs to use a TAP network device to route
its packets into the network stack of the physical host. When saving and
restoring a function instance, the tap device name and the IP address of
the function's server, running inside the container, are preserved (see
also the current requirements for vanilla firecracker snapshot
loading [1]). This leads to networking conflicts on the host and limits
snapshot restoration to a single instance per physical machine.

To bypass this obstacle, the following network topology is proposed (a setup
sketch in Go follows the list):

1. A new network namespace (e.g., VMns4) is created for each VM, in which
the TAP device from the snapshotted VM is rebuilt and receives the original
IP address of the function. The TAP device carries all incoming and outgoing
packets between the serverless function and the VM's network interface. Each
VM runs in its own network namespace, so networking resources no longer
conflict on the host.

2. A local virtual tunnel is established between the VM inside its network
namespace and the host node via a virtual ethernet (veth) pair. A link is
then established between the two ends of the veth pair: one in the network
namespace (veth4-0) and one in the host namespace (veth4-1). In contrast,
the default vHive configuration sets up a similar forwarding system through
network bridges.

3. Inside the network namespace we add a routing rule that redirects all
packets via the VM end of the veth pair towards a default gateway
(172.17.0.17). Thus, all packets sent by the function show up at the host's
end of the tunnel.

4. To avoid IP conflicts when routing packets to and from functions, each
VM is assigned a unique clone address (e.g., 172.18.0.5). All packets
leaving the VM end of the virtual ethernet pair get their source address
rewritten to the clone address of the corresponding VM. Packets entering
the host end of the virtual ethernet pair get their destination address
rewritten to the original address of the VM. As a result, each VM still
thinks it is using the original address while, in reality, its address is
translated to a clone address that is different for every VM. This is
accomplished using two rules in the NAT table of the VM's network
namespace: one in the POSTROUTING chain and one in the PREROUTING chain.
The POSTROUTING rule alters network packets before they are sent out
through the virtual tunnel, from the VM namespace to the host, and rewrites
the packet's source IP address. Similarly, the PREROUTING rule overwrites
the destination address of incoming packets before routing. Together they
ensure that packets going into the namespace have the original IP address
of the VM (172.16.0.2) as their destination, while packets coming out of
the namespace have the clone IP address (172.18.0.5) as their source. The
original IP address is the same (172.16.0.2) for all VMs in the enhanced
snapshotting mode; only the clone address differs per VM.

5. In the routing table of the host, we add a rule that dictates that any
packet whose destination IP is the clone IP of a VM will be routed towards
the end of the tunnel situated in the corresponding network namespace,
through a set gateway (172.17.0.18). This ensures that whenever packets
arrive on the host for a VM, they are immediately sent down the right
virtual tunnel.

6. In the host's nftables filter table we add two rules to the FORWARD
chain that allow traffic from the host end of the veth pair (veth4-1) to
the default host interface (eno49) and vice versa.
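
The setup sketch referenced above: a minimal Go sequence that shells out to
`ip` and `iptables` for one VM, using the example names and addresses from
the steps (VMns4, veth4-0/veth4-1, 172.16.0.2 guest IP, 172.18.0.5 clone IP,
172.17.0.17/172.17.0.18 tunnel ends, eno49 uplink). The tap name and the /30
prefix are illustrative assumptions, and the real implementation may drive
netlink directly instead of invoking commands:

```go
package networking

import (
	"fmt"
	"os/exec"
)

// run executes one command and surfaces its output on failure.
func run(name string, args ...string) error {
	if out, err := exec.Command(name, args...).CombinedOutput(); err != nil {
		return fmt.Errorf("%s %v: %v: %s", name, args, err, out)
	}
	return nil
}

// setupVMNetwork wires up a single VM following steps 1-6 above.
func setupVMNetwork() error {
	cmds := [][]string{
		// 1. Per-VM namespace with the tap device rebuilt inside it
		//    (the tap name must match the one recorded in the snapshot).
		{"ip", "netns", "add", "VMns4"},
		{"ip", "netns", "exec", "VMns4", "ip", "tuntap", "add", "tap0", "mode", "tap"},
		{"ip", "netns", "exec", "VMns4", "ip", "link", "set", "tap0", "up"},
		// 2. veth tunnel between the namespace (veth4-0) and the host (veth4-1).
		{"ip", "link", "add", "veth4-0", "type", "veth", "peer", "name", "veth4-1"},
		{"ip", "link", "set", "veth4-0", "netns", "VMns4"},
		{"ip", "netns", "exec", "VMns4", "ip", "addr", "add", "172.17.0.18/30", "dev", "veth4-0"},
		{"ip", "netns", "exec", "VMns4", "ip", "link", "set", "veth4-0", "up"},
		{"ip", "addr", "add", "172.17.0.17/30", "dev", "veth4-1"},
		{"ip", "link", "set", "veth4-1", "up"},
		// 3. Default route inside the namespace towards the host end of the tunnel.
		{"ip", "netns", "exec", "VMns4", "ip", "route", "add", "default", "via", "172.17.0.17"},
		// 4. NAT inside the namespace: rewrite the guest IP to the clone IP and back.
		{"ip", "netns", "exec", "VMns4", "iptables", "-t", "nat", "-A", "POSTROUTING",
			"-s", "172.16.0.2", "-j", "SNAT", "--to-source", "172.18.0.5"},
		{"ip", "netns", "exec", "VMns4", "iptables", "-t", "nat", "-A", "PREROUTING",
			"-d", "172.18.0.5", "-j", "DNAT", "--to-destination", "172.16.0.2"},
		// 5. Host route: traffic for the clone IP goes down this VM's tunnel.
		{"ip", "route", "add", "172.18.0.5", "via", "172.17.0.18"},
		// 6. Allow forwarding between the tunnel and the default host interface.
		{"iptables", "-A", "FORWARD", "-i", "veth4-1", "-o", "eno49", "-j", "ACCEPT"},
		{"iptables", "-A", "FORWARD", "-i", "eno49", "-o", "veth4-1", "-j", "ACCEPT"},
	}
	for _, c := range cmds {
		if err := run(c[0], c[1:]...); err != nil {
			return err
		}
	}
	return nil
}
```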

The tap manager will be refactored into a new networking management component
responsible for managing the network topology described above.

1. https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md#loading-snapshots

Closes vhive-serverless#797
Part of vhive-serverless#794

Signed-off-by: Georgiy Lebedev <lebedev.gk@phystech.edu>
CuriousGeorgiy added a commit to CuriousGeorgiy/vHive that referenced this issue Sep 4, 2023
Encapsulate container image management into a separate module that provides
an image manager class.

Closes vhive-serverless#799
Part of vhive-serverless#794

Signed-off-by: Georgiy Lebedev <lebedev.gk@phystech.edu>
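
As a sketch of what such a module wraps (assuming the standard containerd Go client; the real image manager's API and caching policy may differ):

```go
package images

import (
	"context"
	"sync"

	"github.com/containerd/containerd"
)

// ImageManager caches images pulled through a containerd client so repeated
// function deployments do not re-pull the same reference.
type ImageManager struct {
	client *containerd.Client
	mu     sync.Mutex
	cache  map[string]containerd.Image
}

func NewImageManager(client *containerd.Client) *ImageManager {
	return &ImageManager{client: client, cache: make(map[string]containerd.Image)}
}

// GetImage returns a cached image or pulls and unpacks it on first use.
func (m *ImageManager) GetImage(ctx context.Context, ref string) (containerd.Image, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if img, ok := m.cache[ref]; ok {
		return img, nil
	}
	img, err := m.client.Pull(ctx, ref, containerd.WithPullUnpack)
	if err != nil {
		return nil, err
	}
	m.cache[ref] = img
	return img, nil
}
```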
CuriousGeorgiy added a commit to CuriousGeorgiy/vHive that referenced this issue Sep 5, 2023
Currently, each firecracker VM needs to use a TAP network device to route
its packets into the network stack of the physical host. When saving and
restoring a function instance, the tap device name and the IP address of
the function's server, running inside the container, are preserved (see
also the current requirements for vanilla firecracker snapshot
loading [1]). This leads to networking conflicts on the host and limits
snapshot restoration to a single instance per physical machine.

To bypass this obstacle, a new network topology is proposed (see vhive-serverless#797 for
details), and a new networking management component is introduced.

1. https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md#loading-snapshots

Closes vhive-serverless#797
Part of vhive-serverless#794

Signed-off-by: Georgiy Lebedev <lebedev.gk@phystech.edu>
leokondrashov pushed a commit that referenced this issue Sep 6, 2023
Currently, each firecracker VM needs to use a TAP network device to route
its packets into the network stack of the physical host. When saving and
restoring a function instance, the tap device name and the IP address of
the function's server, running inside the container, are preserved (see
also the current requirements for vanilla firecracker snapshot
loading [1]). This leads to networking conflicts on the host and limits
snapshot restoration to a single instance per physical machine.

To bypass this obstacle, a new network topology is proposed (see #797 for
details), and a new networking management component is introduced.

1. https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md#loading-snapshots

Closes #797
Part of #794

Signed-off-by: Georgiy Lebedev <lebedev.gk@phystech.edu>
CuriousGeorgiy added a commit to CuriousGeorgiy/vHive that referenced this issue Sep 6, 2023
Currently, VM snapshots are managed by the orchestrator via a table of idle
function instances. With vhive-serverless#794, snapshot management will become
more complicated, so it requires refactoring into a separate component.

Closes vhive-serverless#802
Part of vhive-serverless#794

Signed-off-by: Georgiy Lebedev <lebedev.gk@phystech.edu>
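
Purely as an illustration of the intended separation (all names below are hypothetical, not the final vHive API), the component could expose something along these lines:

```go
package snapshotting

import "context"

// Snapshot bundles the on-disk artifacts of one VM snapshot.
type Snapshot struct {
	ID           string
	Image        string // function/container image the snapshot was taken from
	MemFilePath  string
	SnapFilePath string
}

// SnapshotManager owns the lifecycle of VM snapshots instead of the
// orchestrator's ad-hoc table of idle function instances.
type SnapshotManager interface {
	// AcquireSnapshot returns a ready-to-restore snapshot for the image, if any.
	AcquireSnapshot(ctx context.Context, image string) (*Snapshot, bool)
	// CommitSnapshot registers a freshly created snapshot as usable.
	CommitSnapshot(ctx context.Context, snap *Snapshot) error
	// ReleaseSnapshot marks the snapshot idle again after a restore completes.
	ReleaseSnapshot(ctx context.Context, id string) error
}
```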
CuriousGeorgiy added a commit to CuriousGeorgiy/vHive that referenced this issue Sep 6, 2023
In the scope of vhive-serverless#794, we will need to manage container
snapshots backed by the device mapper thin pool, so we need to introduce a
separate module for this.

Closes vhive-serverless#805
Part of vhive-serverless#794

Signed-off-by: Georgiy Lebedev <lebedev.gk@phystech.edu>
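
For reference, creating and activating a thin snapshot through dmsetup looks roughly like the sketch below; the pool path, device IDs, sizes, and function names are placeholders, and a real module would also handle suspending active origins and tracking device IDs (as containerd's devmapper snapshotter does):

```go
package devmapper

import (
	"fmt"
	"os/exec"
)

// dmsetup runs a single dmsetup command, returning its output on failure.
func dmsetup(args ...string) error {
	if out, err := exec.Command("dmsetup", args...).CombinedOutput(); err != nil {
		return fmt.Errorf("dmsetup %v: %v: %s", args, err, out)
	}
	return nil
}

// createThinSnapshot snapshots thin device originID as snapshotID inside the
// given pool (e.g. "/dev/mapper/fc-dev-thinpool") and activates it under
// /dev/mapper/<name>. sizeSectors must match the origin's virtual size.
// Note: an active origin must be suspended before create_snap; omitted here.
func createThinSnapshot(pool string, originID, snapshotID int, name string, sizeSectors uint64) error {
	// Ask the pool to create a snapshot of the origin thin device.
	msg := fmt.Sprintf("create_snap %d %d", snapshotID, originID)
	if err := dmsetup("message", pool, "0", msg); err != nil {
		return err
	}
	// Activate the snapshot as its own device-mapper target.
	table := fmt.Sprintf("0 %d thin %s %d", sizeSectors, pool, snapshotID)
	return dmsetup("create", name, "--table", table)
}
```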
CuriousGeorgiy added a commit to CuriousGeorgiy/vHive that referenced this issue Sep 12, 2023
Currently, the host interface name is kept separate from the VM pool and tap
manager, and as a result it needs to be passed to all VM pool methods.

This is inconvenient, so encapsulate the host interface name into the VM
pool and tap manager.

Closes vhive-serverless#810
Part of vhive-serverless#794

Signed-off-by: Georgiy Lebedev <lebedev.gk@phystech.edu>
CuriousGeorgiy added a commit to CuriousGeorgiy/vHive that referenced this issue Sep 12, 2023
Closes vhive-serverless#794

Signed-off-by: Georgiy Lebedev <lebedev.gk@phystech.edu>
CuriousGeorgiy added a commit to CuriousGeorgiy/vHive that referenced this issue Sep 12, 2023
Replace the tap manager with the new network manager and drop the tap
manager.

Rework snapshot manager to maintain one list of snapshots with a new usable
feature.

Create a device snapshot for the base container image during patch
creation.

Drop VM offloading APIs and replace them with snapshot creation or VM
shutdown.

Add a new network pool size option to the orchestrator.

Update firecracker-related binaries.

Update firecracker integration code.

Closes vhive-serverless#794

Signed-off-by: Georgiy Lebedev <lebedev.gk@phystech.edu>
ustiugov pushed a commit that referenced this issue Sep 14, 2023
Currently, the host interface name is kept separate from the VM pool and tap
manager, and as a result it needs to be passed to all VM pool methods.

This is inconvenient, so encapsulate the host interface name into the VM
pool and tap manager.

Closes #810
Part of #794

Signed-off-by: Georgiy Lebedev <lebedev.gk@phystech.edu>
leokondrashov pushed a commit that referenced this issue Oct 16, 2023
Closes #794

Signed-off-by: Georgiy Lebedev <lebedev.gk@phystech.edu>