Quick-start tutorial runs with internal timeouts #3640
Comments
Thank you for your detailed ticket. We'll look into this.
The Docker container quick start http://docs.vespa.ai/documentation/vespa-quick-start.html uses https://github.com/vespa-engine/docker-image/blob/master/include/start-container.sh as ENTRYPOINT, and will start both vespa-configserver and vespa-services in parallel if no arguments are present (it should be given arguments in a multi-node installation). Since the configuration server is slower to start than vespa-services, you get these "could not connect" errors. They are transient, though. Here we see the configuration server is initialising bundles while configproxy (a process started by vespa-services) tries to talk to it over RPC on port 19070.
The remaining log entries will keep being printed until you have deployed an application package (http://docs.vespa.ai/documentation/cloudconfig/application-packages.html), which describes which services should be started by the sentinel process. See also #3588, which has a few pointers.
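Since the two daemons race at startup, one way to quiet the transient errors in a single-node setup is to wait for the config server's RPC port before starting services. This is only a sketch, assuming bash (for its `/dev/tcp` pseudo-device); the wrapper function and the commented start command are illustrative assumptions, not part of the official start-container.sh:

```shell
#!/usr/bin/env bash
# Hypothetical startup wrapper: block until the config server RPC port
# (19070, per the log above) accepts TCP connections, then start services.
wait_for_port() {
  local host=$1 port=$2 retries=${3:-30}
  local i
  for ((i = 0; i < retries; i++)); do
    # bash's /dev/tcp pseudo-device attempts a plain TCP connect
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage sketch (the start command is an assumption for illustration):
# wait_for_port localhost 19070 60 && /opt/vespa/bin/vespa-start-services
```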
Sorry, I think I got confused by the many different ports mentioned. While the errors about 19070 happen, the ones that keep coming over and over are about port 19090:
They also keep coming after preparing a sample application. Re Kubernetes: I'm running this in a single container locally in Minikube; I'm not yet trying out a multi-node setup. (Protip: I recommend using triple backticks when pasting chunks of raw text, otherwise your lines will be wrapped and become unreadable.)
http://localhost:8080/ApplicationStatus does return a response, though.
Edit: Ignore that last comment. That is mapped to the port of the config server. Port 8080 is not operational. |
Thanks for the update. How much memory do you have available in Minikube? From a few searches it seems 1G is the default, and that is going to be insufficient. Vespa is a beast, with a lot of Java JVMs with rather high max-heap settings, and I suspect your trouble is from insufficient memory available inside Minikube. Does it work if you enable more memory? 4G is great; 2G is sufficient for getting through the sample quick start on Docker. Thanks for the pro tip on quotes. These are what I found in the log snippets provided in the gist:
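To check whether memory really is the constraint, a few generic Linux commands can be run inside the container (a sketch; none of these are Vespa-specific, and the cgroup path assumes cgroup v1):

```shell
# Total and available memory as the kernel reports it inside the VM/container
free -m

# Memory limit imposed by the container's cgroup, if any (cgroup v1 path)
cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null || true

# Resident set size of each Java process, to gauge the JVM heap pressure
ps -e -o pid,rss,args | grep '[j]ava' || true
```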
Which is great: finally the sentinel gets configuration and tries to start the set of services.
So no connectivity issues, but then the sentinel tries to start these services, which in turn subscribe to new configuration, and this is when it starts to go downhill badly: all of them complain about not being able to subscribe to configuration. I suspect that the configuration server has been killed by the OOM killer, potentially due to memory constraints when starting all these processes.
@atombender were you able to make any progress on this issue? Thanks! |
@jobergum Sorry, I've been taking a few days off from work. I'll get back to you. But I did try starting with 4GB of RAM, with no difference in behaviour. Shouldn't the logs say if something is being forcibly killed due to memory usage? |
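For the record, an OOM kill by the kernel does leave traces, just not necessarily in Vespa's own logs. A sketch of where to look (the pod name is the one from this thread; kubectl output format may vary by version):

```shell
# Kernel log inside the Minikube VM: the OOM killer logs every kill
dmesg 2>/dev/null | grep -iE 'out of memory|killed process' || true

# Kubernetes-level signal: a container killed for exceeding its memory
# limit is reported with reason OOMKilled in the pod description
kubectl describe pod vespa-0 2>/dev/null | grep -i -B1 -A2 'oomkilled' || true
```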
Thanks for the update. Yes, I was wrong; we can see from the log that the configuration server is alive and fine. The configuration server fails to produce configuration due to this:
Inside your Docker container, what is the output of
Your manifest https://gist.githubusercontent.com/atombender/2c24a899ce8d051396ba0e97bba6822f/raw/8523e9c09834882d857267c1f78b1981c2750f99/statefulset.yml states
Data in Vespa is stored in /opt/vespa/var/db/vespa/search/, and I wonder if any of these volume settings have overwritten the config def installation files, but I'm not familiar with the syntax above.
The database volume is initially empty when Vespa starts for the first time. I'm not populating that volume with anything; all that data is written exclusively by Vespa. Unfortunately:
$ kubectl exec -it vespa-0 -- cat /opt/vespa/var/db/vespa/config_server/serverdb/serverdefs/search.config.qr-start.def
cat: /opt/vespa/var/db/vespa/config_server/serverdb/serverdefs/search.config.qr-start.def: No such file or directory
Here's a tarball of the whole
So that is the problem, then.
Without these def files the configuration server is not able to serve configuration. Please try changing the mountPath to /opt/vespa/var/db/vespa/search.
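A sketch of what that change might look like in the StatefulSet from the gist (the volume name `vespa-data` here is an assumption; use whatever the manifest actually calls it). Mounting one level deeper leaves the config server's installed .def files under /opt/vespa/var/db/vespa intact instead of shadowing them with an empty volume:

```yaml
containers:
  - name: vespa
    # ...
    volumeMounts:
      - name: vespa-data        # assumed name; match your manifest
        mountPath: /opt/vespa/var/db/vespa/search
```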
D'oh. I just assumed that this would be the data folder and that Vespa would behave correctly if it was initially empty. I've never actually encountered an application before that puts non-volatile data under a folder with that name. Does this mean the data in that directory needs to be persisted? Started up again; looks much better now.
Great, finally progress. I see that http://docs.vespa.ai/documentation/reference/files-processes-and-ports.html could use some work. We run the exact same Docker images in production, so we should be able to provide some guidance on volumes etc. in our documentation, but unfortunately little can be found on this subject today. Please have patience while we build out the documentation on this. $VESPA_HOME/var/db/vespa/search has all the index & storage data, so you want to persist that; no state for search or storage is stored outside of that directory.
Thanks, this was helpful. Closing this, since it gets me further in the tutorial.
Would you mind sending the steps needed for a Minikube version of the quickstart? |
@bratseth: Sure, given these manifests, something like this:
# Start Minikube
$ minikube start --vm-driver=xhyve --memory 4096
# Mount sample apps, assuming they've been checked out in the current dir
$ minikube mount $PWD/sample-apps:/data/vespa-sample-apps &
# Install the statefulset and its service
$ kubectl apply -f service.yml -f statefulset.yml
# Wait for success by monitoring until the pod shows it's running
$ kubectl get pod --watch
# Deploy the sample app
$ kubectl exec -it vespa-0 -- bash -c '/opt/vespa/bin/vespa-deploy prepare /vespa-sample-apps/basic-search/src/main/application/ && /opt/vespa/bin/vespa-deploy activate'
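As a follow-up to the steps above, a small readiness check can replace eyeballing `kubectl get pod --watch`. A sketch, assuming port 8080 is reachable from wherever you run it (e.g. via `kubectl port-forward vespa-0 8080:8080`):

```shell
# Return success only when the endpoint answers with HTTP 200
status_ok() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' "$1")" = "200" ]
}

# Usage sketch (would loop until Vespa reports the application as up):
#   until status_ok http://localhost:8080/ApplicationStatus; do sleep 5; done
```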
@atombender or anyone else: it seems that I need to have multiple ports open on the same domain (at least two, 80(80) and 19071), and an Ingress does not support that. Does anyone have a suggestion for how I can create the Ingress exposing the needed ports? Service:
Ingress:
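A standard Ingress routes HTTP by host and path and cannot multiplex arbitrary extra ports, so one common pattern is to keep the Ingress for the query/feed port and expose the deploy port 19071 separately, for example via a NodePort service. A sketch, with assumed names and labels:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: vespa-deploy        # assumed name
spec:
  type: NodePort
  selector:
    app: vespa              # assumed label; match your StatefulSet's pods
  ports:
    - name: deploy
      port: 19071
      targetPort: 19071
```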
Gist with log files, `ps` output, `netstat` output, Kubernetes manifest.

I don't know if this is an actual error or not, but I get lots of log entries about not being able to connect to the config server at port 19070, lines like this:

I'm starting Vespa under Kubernetes, using the exact commands here. One difference is that Kubernetes is setting the host name, but I don't see why that matters. Using curl, I can confirm that connections to port 19070 hang and eventually time out.

From what I can tell, the config server has been started. The config sentinel (PID 833 in the `ps` output), however, keeps restarting. Using `netstat`, I see that the config server process is indeed listening on port 19070. Port 19070 is operational.