Windows Containerd Support #1581

Closed
perithompson opened this issue Nov 19, 2020 · 10 comments
Labels
area/OS/windows Issues or PRs related to the Windows operating system. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@perithompson
Contributor

perithompson commented Nov 19, 2020

Describe the problem/challenge you have
With the announcement that Windows containerd support is stable in Kubernetes 1.20, and with the deprecation of dockershim, we would like to move forward with Windows support for containerd.

Anything else you would like to add?
I have been testing this out today and have noticed a few errors that will need to be addressed. The first is that the CNI plugin will not initialise. If you add the configuration manually it is possible to get past this step, but we are then presented with the following error message.

I1119 15:51:27.094587    4108 server.go:187] Load network configurations: container_id:"d99d181484ef78a708d12a73c50887639aad4a202938b74cddb86ddb2f2b934a" netns:"dd6241ae-b05c-4484-a637-4e9d6c5f7e5c" ifname:"eth0" args:"K8S_POD_NAME=iis-2019-5446596bb4-jcx7k;K8S_POD_INFRA_CONTAINER_ID=d99d181484ef78a708d12a73c50887639aad4a202938b74cddb86ddb2f2b934a;IgnoreUnknown=1;K8S_POD_NAMESPACE=default" path:"c:/opt/cni/bin" network_configuration:"{\"cniVersion\":\"0.3.0\",\"name\":\"antrea\",\"type\":\"antrea\",\"deviceID\":\"\",\"dns\":{},\"ipam\":{\"type\":\"host-local\",\"subnet\":\"192.168.6.0/24\",\"gateway\":\"192.168.6.1\"},\"runtimeConfig\":{\"dns\":{\"servers\":[\"10.96.0.10\"],\"searches\":[\"default.svc.cluster.local\",\"svc.cluster.local\",\"cluster.local\"]}}}"
E1119 15:51:27.094587    4108 server_windows.go:61] Cannot get infra container ID, unexpected netNS: dd6241ae-b05c-4484-a637-4e9d6c5f7e5c, fallback to containerID
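
For reference, the args field in the request above is a semicolon-separated list of KEY=VALUE pairs. The following is a minimal Python sketch of splitting it, not Antrea code (the agent itself is written in Go); the helper name parse_cni_args is hypothetical. It shows that, in this particular request, K8S_POD_INFRA_CONTAINER_ID carries the same value as container_id.

```python
def parse_cni_args(args: str) -> dict:
    """Split a CNI args string of the form "K1=V1;K2=V2" into a dict.

    Empty segments and segments without "=" are skipped.
    """
    result = {}
    for pair in args.split(";"):
        key, sep, value = pair.partition("=")
        if sep and key:
            result[key] = value
    return result


# Values copied from the request logged above.
CONTAINER_ID = "d99d181484ef78a708d12a73c50887639aad4a202938b74cddb86ddb2f2b934a"
ARGS = (
    "K8S_POD_NAME=iis-2019-5446596bb4-jcx7k;"
    "K8S_POD_INFRA_CONTAINER_ID=" + CONTAINER_ID + ";"
    "IgnoreUnknown=1;"
    "K8S_POD_NAMESPACE=default"
)

parsed = parse_cni_args(ARGS)
print(parsed["K8S_POD_NAME"])
print(parsed["K8S_POD_INFRA_CONTAINER_ID"] == CONTAINER_ID)
```

Whether the agent should rely on this argument instead of the netns value is an open question here, not something the log settles.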

SigWindowsTools has a script to install containerd on Windows Server 2019 or later, but once this is set up, Antrea does not take over the nat network.

{
    "cniVersion": "0.2.0",
    "name": "nat",
    "type": "nat",
    "master": "Ethernet",
    "ipam": {
        "subnet": "$subnet",
        "routes": [
            {
                "GW": "$gateway"
            }
        ]
    },
    "capabilities": {
        "portMappings": true,
        "dns": true
    }
}
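
The config above still contains the $subnet and $gateway placeholders. As a rough illustration of filling them in and sanity-checking the result, here is a Python sketch. The function name render_nat_conf and the example addresses are assumptions made for illustration; they do not come from the install script.

```python
import ipaddress
import json
from string import Template

# Same structure as the nat config above, with $subnet and $gateway
# left as placeholders for string.Template.
NAT_CONF_TEMPLATE = Template("""\
{
    "cniVersion": "0.2.0",
    "name": "nat",
    "type": "nat",
    "master": "Ethernet",
    "ipam": {
        "subnet": "$subnet",
        "routes": [
            {
                "GW": "$gateway"
            }
        ]
    },
    "capabilities": {
        "portMappings": true,
        "dns": true
    }
}
""")


def render_nat_conf(subnet: str, gateway: str) -> dict:
    """Fill in the placeholders and return the parsed config.

    Raises ValueError if the subnet or gateway is malformed, or if the
    gateway address does not lie inside the subnet.
    """
    network = ipaddress.ip_network(subnet)
    if ipaddress.ip_address(gateway) not in network:
        raise ValueError(f"gateway {gateway} is not in subnet {subnet}")
    rendered = NAT_CONF_TEMPLATE.substitute(subnet=subnet, gateway=gateway)
    return json.loads(rendered)


# Hypothetical example values, for illustration only.
conf = render_nat_conf("10.0.0.0/24", "10.0.0.1")
print(json.dumps(conf["ipam"], indent=2))
```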

@perithompson perithompson added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 19, 2020
@perithompson
Contributor Author

/sig windows

@ruicao93 ruicao93 added the area/OS/windows Issues or PRs related to the Windows operating system. label Nov 20, 2020
@ruicao93
Contributor

Thanks Peri, we will investigate whether there are any gaps between Docker and containerd on Windows.

@wenyingd
Contributor

Hi @perithompson, may I know the version of kubelet?

Originally, kubelet on Windows formatted the "netNS" string as follows: 1) for the sandbox container, the netns is "none"; 2) for a workload container, the netns has the format "container:$sandbox_container_ID". So, when the value is not "none", we check for a ":" in the netNS string to find the infra container ID.

But according to the request you pasted, the netns value is neither "none" nor a string with the prefix "container:". I think we first need to confirm whether the change was introduced by kubelet or by containerd.
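
To make the check described above concrete, here is a minimal Python sketch of the logic. It is not the code in server_windows.go (Antrea is written in Go); the function name is hypothetical, and returning the request's own container ID for the "none" case is an assumption based on the sandbox container being the infra container. The fallback branch mirrors the "fallback to containerID" message in the pasted log.

```python
def infra_container_id(netns: str, container_id: str) -> str:
    """Return the sandbox (infra) container ID for a CNI request.

    Assumed convention, as described in the comment above:
      "none"                   -> request is for the sandbox container itself
      "container:<sandbox_id>" -> request is for a workload container
    Any other value (for example the GUID-style netns seen with
    containerd) falls back to the request's own container ID.
    """
    if netns == "none":
        return container_id
    prefix, sep, sandbox_id = netns.partition(":")
    if sep and prefix == "container" and sandbox_id:
        return sandbox_id
    return container_id


# Sandbox container: its own ID is the infra container ID.
print(infra_container_id("none", "sandbox-1"))
# Workload container: the ID after "container:" is the infra container ID.
print(infra_container_id("container:sandbox-1", "workload-1"))
# netns from the pasted request: neither form matches, so fall back.
print(infra_container_id("dd6241ae-b05c-4484-a637-4e9d6c5f7e5c", "d99d18"))
```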

@perithompson
Contributor Author

Hi @wenyingd, this was on v1.19.1. I used cluster-api to create the cluster and set the CRI socket. With containerd, I've been told that a nat network must be created for any pods to start; once this is done, any other plugin can be added on top.

@wenyingd
Contributor

Got it, we will double check.

@perithompson
Contributor Author

Some more information: if we create a nat network using New-HnsNetwork -Type NAT -Name nat from PrepareNode.ps1, the CNI cannot be created, as it needs the nat plugin to be working and initialised before the antrea-agent-windows pod can start up.

containerd logs

time="2020-11-20T16:27:47.744467400-08:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:windows DefaultRuntimeName:runhcs-wcow-process DefaultRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:<nil> PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} UntrustedWorkloadRuntime:{Type: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:<nil> PrivilegedWithoutHostDevices:false BaseRuntimeSpec:} Runtimes:map[runhcs-wcow-process:{Type:io.containerd.runhcs.v1 Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:<nil> PrivilegedWithoutHostDevices:false BaseRuntimeSpec:}] NoPivot:false DisableSnapshotAnnotations:false DiscardUnpackedLayers:false} CniConfig:{NetworkPluginBinDir:C:\\opt\\cni\\bin NetworkPluginConfDir:C:\\etc\\cni\\net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate:} Registry:{Mirrors:map[docker.io:{Endpoints:[https://registry-1.docker.io]}] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:0 SandboxImage:mcr.microsoft.com/oss/kubernetes/pause:1.4.0 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:false DisableHugetlbController:false IgnoreImageDefinedVolumes:false} ContainerdRootDir:C:\\ProgramData\\containerd\\root ContainerdEndpoint:\\\\.\\pipe\\containerd-containerd RootDir:C:\\ProgramData\\containerd\\root\\io.containerd.grpc.v1.cri StateDir:C:\\ProgramData\\containerd\\state\\io.containerd.grpc.v1.cri}"
time="2020-11-20T16:27:47.745455800-08:00" level=info msg="Connect containerd service"
time="2020-11-20T16:27:47.746467100-08:00" level=info msg="Get image filesystem path \"C:\\\\ProgramData\\\\containerd\\\\root\\\\io.containerd.snapshotter.v1.windows\""
time="2020-11-20T16:27:47.747462100-08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
time="2020-11-20T16:27:47.747462100-08:00" level=info msg="Start subscribing containerd event"
time="2020-11-20T16:27:47.748465800-08:00" level=info msg="Start recovering state"
time="2020-11-20T16:27:47.748465800-08:00" level=info msg=serving... address="\\\\.\\pipe\\containerd-containerd.ttrpc"
time="2020-11-20T16:27:47.748465800-08:00" level=info msg=serving... address="\\\\.\\pipe\\containerd-containerd"
time="2020-11-20T16:27:47.749463200-08:00" level=info msg="containerd successfully booted in 0.053992s"
time="2020-11-20T16:27:47.765467600-08:00" level=info msg="Start event monitor"
time="2020-11-20T16:27:47.765467600-08:00" level=info msg="Start snapshots syncer"
time="2020-11-20T16:27:47.765467600-08:00" level=info msg="Start cni network conf syncer"
time="2020-11-20T16:27:47.766455300-08:00" level=info msg="Start streaming server"
time="2020-11-20T16:28:49.765340300-08:00" level=info msg="No cni config template is specified, wait for other system components to drop the config."
time="2020-11-20T16:28:50.065717400-08:00" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-proxy-windows-zvg27,Uid:b533fe5f-4d50-4d51-8bad-050d0560ac2f,Namespace:kube-system,Attempt:0,}"

kubelet logs

time="2020-11-20T16:21:45.034336000-08:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:antrea-agent-windows-n5wpc,Uid:1809e202-0489-4491-ac9f-47368abb4cc3,Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox \"9c55be252c3af4776a75043c9e797af61cbbf5da96ad5c79fdf201781fcf5a51\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing open \\\\\\\\.\\\\pipe\\\\cnisock: The system cannot find the file specified.\""
time="2020-11-20T16:21:46.957768100-08:00" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-proxy-windows-zvg27,Uid:b533fe5f-4d50-4d51-8bad-050d0560ac2f,Namespace:kube-system,Attempt:0,}"
time="2020-11-20T16:21:47.027787600-08:00" level=error msg="Failed to destroy network for sandbox \"d4ce64f4fe8b794927b6d872f681df51b8b90e0a3d0bb1f04921e7fb8eda27d0\"" error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing open \\\\\\\\.\\\\pipe\\\\cnisock: The system cannot find the file specified.\""
time="2020-11-20T16:21:47.030782600-08:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-proxy-windows-zvg27,Uid:b533fe5f-4d50-4d51-8bad-050d0560ac2f,Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox \"d4ce64f4fe8b794927b6d872f681df51b8b90e0a3d0bb1f04921e7fb8eda27d0\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing open \\\\\\\\.\\\\pipe\\\\cnisock: The system cannot find the file specified.\""
time="2020-11-20T16:21:56.961758300-08:00" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:antrea-agent-windows-n5wpc,Uid:1809e202-0489-4491-ac9f-47368abb4cc3,Namespace:kube-system,Attempt:0,}"
time="2020-11-20T16:21:57.030221300-08:00" level=error msg="Failed to destroy network for sandbox \"e525b925fa79e3ed88bf3032542e74502567173851daaa0555ab580ce9af3a64\"" error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing open \\\\\\\\.\\\\pipe\\\\cnisock: The system cannot find the file specified.\""
time="2020-11-20T16:21:57.033224400-08:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:antrea-agent-windows-n5wpc,Uid:1809e202-0489-4491-ac9f-47368abb4cc3,Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox \"e525b925fa79e3ed88bf3032542e74502567173851daaa0555ab580ce9af3a64\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing open \\\\\\\\.\\\\pipe\\\\cnisock: The system cannot find the file specified.\""
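
When going through a longer capture of these logs, it can help to tally which pods fail sandbox creation because the cnisock pipe is missing. The following Python sketch does that; the regular expression and the function name are my own assumptions about the log shape shown above, and it expects each log entry on a single line. The sample entries are taken from the logs above.

```python
import re
from collections import Counter

# Matches the "RunPodSandbox ... failed, error" entries and captures the pod name.
FAILED_SANDBOX = re.compile(
    r"RunPodSandbox for &PodSandboxMetadata\{Name:(?P<pod>[^,]+),.*failed, error"
)


def count_cnisock_failures(lines):
    """Count failed RunPodSandbox entries per pod that mention the cnisock pipe."""
    counts = Counter()
    for line in lines:
        match = FAILED_SANDBOX.search(line)
        if match and "cnisock" in line:
            counts[match.group("pod")] += 1
    return counts


SAMPLE_LINES = [
    r'time="2020-11-20T16:21:45.034336000-08:00" level=error msg="RunPodSandbox for '
    r'&PodSandboxMetadata{Name:antrea-agent-windows-n5wpc,Uid:1809e202-0489-4491-ac9f-47368abb4cc3,'
    r'Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox '
    r'\"9c55be252c3af4776a75043c9e797af61cbbf5da96ad5c79fdf201781fcf5a51\": rpc error: code = '
    r'Unavailable desc = connection error: desc = \"transport: Error while dialing open '
    r'\\\\\\\\.\\\\pipe\\\\cnisock: The system cannot find the file specified.\""',
    r'time="2020-11-20T16:21:46.957768100-08:00" level=info msg="RunPodsandbox for '
    r'&PodSandboxMetadata{Name:kube-proxy-windows-zvg27,Uid:b533fe5f-4d50-4d51-8bad-050d0560ac2f,'
    r'Namespace:kube-system,Attempt:0,}"',
    r'time="2020-11-20T16:21:47.030782600-08:00" level=error msg="RunPodSandbox for '
    r'&PodSandboxMetadata{Name:kube-proxy-windows-zvg27,Uid:b533fe5f-4d50-4d51-8bad-050d0560ac2f,'
    r'Namespace:kube-system,Attempt:0,} failed, error" error="failed to setup network for sandbox '
    r'\"d4ce64f4fe8b794927b6d872f681df51b8b90e0a3d0bb1f04921e7fb8eda27d0\": rpc error: code = '
    r'Unavailable desc = connection error: desc = \"transport: Error while dialing open '
    r'\\\\\\\\.\\\\pipe\\\\cnisock: The system cannot find the file specified.\""',
]

for pod, failures in sorted(count_cnisock_failures(SAMPLE_LINES).items()):
    print(pod, failures)
```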

@ruicao93
Contributor

@perithompson: We tested with containerd as the runtime and found that antrea-agent cannot work with containerd for now.

Here are the problems we found in testing:

  • Containerd calls the CNI plugin with a different operation sequence and different parameters, so antrea-agent needs to adapt to the differences.
  • Containerd might require the CNI plugin to use a different hcsshim API for network setup. This problem needs more investigation.

@ruicao93
Contributor

Created an issue for containerd, because we need the containerd team's help to assess the workflow changes on Windows when moving from Docker to containerd.

containerd/containerd#4851

@ruicao93
Contributor

Opened an issue in the hcsshim repo to ask for help on how to create a vNIC in a container: microsoft/hcsshim#911

@github-actions
Contributor

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment, or this will be closed in 180 days

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 20, 2021