
Fluentd Logging System Part 1 #652

Merged
merged 32 commits into from Apr 30, 2019

Conversation

rkooo567
Collaborator

This is part of the Logging project: #625.

Summary of the PR:

  1. Boots up a Fluentd instance inside the Clipper cluster when users turn on the flag use_centralized_log=True for DockerContainerManager (see the usage sketch after this list).
  2. The Fluentd container image is pulled from the official fluentd image. This will change in the second-phase PR of this project.
  3. Our Fluentd instance copies the default config file from clipper_admin/docker/logging/fluentd/clipper_fluentd.conf. This is done within the FluentdConfig class, which also writes the correct port number.
  4. The Fluentd conf file is stored in a temp file, like the metric conf files.
  5. Once everything is set up, the Fluentd instance centralizes all the logs within the cluster. Currently, the config is very simple: it collects all the logs to the stdout of the Fluentd instance, meaning it is not that useful yet.
  6. README.md explains how to use this feature.
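As a usage reference, a minimal sketch of enabling the flag, assuming the standard clipper_admin bootstrap flow (only the use_centralized_log keyword comes from this PR; the rest is the usual quick-start pattern):

```python
# Minimal sketch: start a Clipper cluster with centralized logging enabled.
# `use_centralized_log` is the flag added in this PR.
from clipper_admin import ClipperConnection, DockerContainerManager

clipper_conn = ClipperConnection(
    DockerContainerManager(use_centralized_log=True))
clipper_conn.start_clipper()  # also boots the Fluentd container

# All cluster logs are now forwarded to the Fluentd instance's stdout;
# inspect them with `docker logs <fluentd-container-name>`.
```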

Note

Since @simon-mo mentioned that Grafana or other logging tools could potentially be used, I tried to make this part as pluggable as possible. If we want to use a different logging system, we can just point logging_system at a different class, i.e., create a new class that has the same public functions as the Fluentd class. I didn't define an interface because I thought it was too much at this point. I will write about it in README.md later.
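To illustrate the pluggability idea, a hedged sketch of a drop-in replacement; the class and method names below are hypothetical assumptions, since this PR deliberately leaves the contract informal ("same public functions as the Fluentd class"):

```python
# Hedged sketch of a pluggable logging system. The method names are
# illustrative, not the actual public API of the Fluentd class.
class HypotheticalLoggingSystem(object):
    def __init__(self, docker_client, port):
        self.docker_client = docker_client
        self.port = port

    def start(self):
        # Pull and run the logging container, as the Fluentd class does.
        raise NotImplementedError

    def get_log_config(self):
        # Return the docker logging-driver options that other Clipper
        # containers should be started with.
        raise NotImplementedError

# DockerContainerManager would then hold
# logging_system = HypotheticalLoggingSystem(...) instead of Fluentd.
```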

Testing

  1. Tested that connect() works correctly (3 tests). 2 of the tests check that a ConnectionRefusedError is raised when the old connection and the new connection have different values for the use_centralized_log flag. You can check this in _is_valid_logging_state_to_connect within DockerContainerManager.
  2. Tested that logs from other Clipper nodes appear in the Fluentd stdout logs (i.e., docker logs of the fluentd container).
  3. Tested that model logs appear in the Fluentd stdout logs once the models are deployed (see the sketch after this list for a manual reproduction).
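A rough sketch of reproducing checks 2 and 3 by hand; the "fluentd" name filter is an assumption about how the container is named:

```python
# Hedged sketch: verify that other containers' logs show up in the
# Fluentd container's stdout. Assumes a running container whose name
# contains "fluentd" and a local docker CLI.
import subprocess

names = subprocess.check_output(
    ["docker", "ps", "--filter", "name=fluentd", "--format", "{{.Names}}"])
fluentd_name = names.decode().strip().splitlines()[0]

# Log lines from other Clipper nodes and deployed models should appear here.
print(subprocess.check_output(["docker", "logs", fluentd_name]).decode())
```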

@AmplabJenkins

Can one of the admins verify this patch?

@simon-mo simon-mo self-requested a review March 22, 2019 01:42
@simon-mo
Contributor

jenkins ok to test

@simon-mo
Contributor

jenkins add to whitelist

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1811/

@rkooo567
Collaborator Author

Okay, there are 2 issues, one in each fluentd integration test. Please let me know if you can think of any reason why they occur.

  1. integration_py2_fluentd: For some reason, it cannot import the fluentd module that I created. I will test locally with Python 2.
[integration_py2_fluentd] File "/clipper/clipper_admin/clipper_admin/docker/docker_container_manager.py", line 25, in <module>
 [integration_py2_fluentd] from clipper_admin.docker.logging.fluentd.fluentd import Fluentd
 [integration_py2_fluentd] ImportError: No module named fluentd.fluentd
  2. integration_py3_fluentd: For some reason, it cannot initialize the docker fluentd logging driver. I will test it again in a VM.

…Used requests' ConnectionError class instead of ConnectionRefusedError, which is not supported in Python 2. Added user='root' inside the fluentd container run function so that it can access the conf file that lives in a root folder within the container.
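For context, a minimal sketch of the compatibility pattern the commit describes: requests.exceptions.ConnectionError exists on both Python 2 and 3, while the built-in ConnectionRefusedError is Python 3 only (the helper name below is illustrative):

```python
# Sketch of the fix: raise requests' ConnectionError, which is importable
# on both Python 2 and 3, instead of Python 3's ConnectionRefusedError.
from requests.exceptions import ConnectionError

def _check_logging_flags(old_flag, new_flag):
    # Stand-in for the real comparison in _is_valid_logging_state_to_connect.
    if old_flag != new_flag:
        raise ConnectionError(
            "use_centralized_log differs between the running cluster "
            "and this DockerContainerManager")
```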
@rkooo567
Collaborator Author

I fixed the errors. Please test again!

@simon-mo
Contributor

Jenkins test this please

@simon-mo
Contributor

Jenkins ok to test

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1815/

@rkooo567
Collaborator Author

rkooo567 commented Mar 24, 2019

Looks like there is an issue with the PyTorch container?

 [pytorch-container] resp = super(CacheControlAdapter, self).send(request, **kw) 
 [pytorch-container] File "/usr/local/lib/python2.7/site-packages/pip/_vendor/requests/adapters.py", line 508, in send 
 [pytorch-container] raise ConnectionError(e, request=request) 
 [pytorch-container] ConnectionError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Max retries exceeded with url: /packages/93/b3/672813e65ac7605bac14a2a8e40e1a18e03bf0ac588c6a799508ee66c289/torch-1.0.1.post2-cp27-cp27mu-manylinux1_x86_64.whl (Caused by ReadTimeoutError("HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. (read timeout=15)",))

Do you think it is due to my change? Let me test deploy_pytorch_container locally in a VM.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1858/

@simon-mo
Contributor

@rkooo567 it seems there are two issues:

  1. The ports are somehow not being assigned to unbound ports. You can occupy a port with netcat -l 22424 and then try to start Clipper with fluentd (see the sketch after this list).

  2. There is some sort of infinite loop; see the console log.
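A stdlib-only sketch of the same reproduction, assuming the port 22424 from the comment above (whether Clipper actually probes that exact port is an assumption):

```python
# Hedged sketch: occupy a port with the Python stdlib instead of netcat,
# then start Clipper with fluentd and check it does not reuse the port.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("0.0.0.0", 22424))  # equivalent to `netcat -l 22424`
sock.listen(1)
input("Port 22424 is now bound; start Clipper, then press Enter...")
sock.close()
```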

@@ -459,6 +489,26 @@ def stop_all(self, graceful=True):
else:
c.kill()

def _is_valid_logging_state_to_connect(self, all_labels):
Collaborator Author

I will change the logic of this part. I will make the new Clipper connection turn on the use_centralized_log flag whenever a fluentd instance is running in the cluster, regardless of the use_centralized_log flag of the current DockerContainerManager instance. The current logic raises an error whenever the flag disagrees with the cluster state (e.g., use_centralized_log is on but there's no fluentd instance). I will change this to (see the sketch after this list):

  1. If a Fluentd instance is in the cluster and the current flag is off -> turn the current flag on.
  2. If the current flag is on but there's no fluentd instance running -> raise a ClipperException.
  3. Otherwise, behave the same as before.
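A hedged sketch of the revised check, assuming a hypothetical helper _fluentd_running() and the attribute name use_centralized_log (the actual internals of this PR may differ):

```python
# Sketch of the revised connect-time logic described above.
# `_fluentd_running` is a hypothetical stand-in for the real detection.
from clipper_admin.exceptions import ClipperException

def _is_valid_logging_state_to_connect(self, all_labels):
    fluentd_running = self._fluentd_running(all_labels)
    if fluentd_running and not self.use_centralized_log:
        # Case 1: adopt the cluster's state instead of raising.
        self.use_centralized_log = True
    elif self.use_centralized_log and not fluentd_running:
        # Case 2: centralized logging requested but unavailable.
        raise ClipperException(
            "use_centralized_log=True but no Fluentd instance is running")
    # Case 3: flags already agree; nothing to do.
```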

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1890/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1891/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1893/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1894/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1895/

@rkooo567
Collaborator Author

Yes! The tests finally passed. @simon-mo, please review the PR and leave me some comments. Also, are there other people who will be involved in the code review?

@rkooo567
Collaborator Author

@withsmilo If you have time, can you also review the PR? I would appreciate it!

@withsmilo
Collaborator

@rkooo567 Sure, I will review this PR tonight.

@simon-mo
Contributor

Sorry about the delay. I'll review this over the weekend.

or not os.path.isfile(self._file_path):
self._file_path = self.build_temp_file()

# Logging-TODO: Currently, it copies the default conf from clipper_fluentd.conf.
Contributor

Add this as an issue.

Collaborator Author

I added a comment to #625 instead (because it will be handled in PR 2 anyway, and that PR is not merged yet). If you still want me to file this as a new issue, I will do it after this PR is merged. Please let me know!

@@ -40,7 +40,7 @@ def signal_handler(signal, frame):

if __name__ == '__main__':
signal.signal(signal.SIGINT, signal_handler)
clipper_conn = ClipperConnection(DockerContainerManager())
clipper_conn = ClipperConnection(DockerContainerManager(use_centralized_log=False))
Contributor

Isn't it turned off by default?

Collaborator Author

I wanted to expose this option to users by including it in the example code. If you think it is better to remove it, I will do that! Let me know.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1910/

@rkooo567
Collaborator Author

rkooo567 commented Apr 27, 2019

@simon-mo Can you run Jenkins again? It failed on the docker-metric test, but it passes for me locally on a VM.
The log says:

[integration_py3_docker_metric] 19-04-27:19:31:52 ERROR [clipper_metric_docker.py:126] Failed to parse: http://localhost:31119/api/v1/series?match[]=clipper_mc_pred_total

@simon-mo
Contributor

Jenkins test this please

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1912/

@rkooo567
Collaborator Author

rkooo567 commented Apr 27, 2019

Hmm, I got the same error. I don't know how we fail to parse the URL; when I run urlparse in an interactive shell, it looks fine (see the sketch below). I will try to figure it out soon.
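For reference, the interactive check mentioned above, assuming Python 3's urllib.parse (urlparse tolerates the literal match[] brackets in the query string):

```python
# Sketch of the interactive check: urlparse accepts the Prometheus-style
# query URL from the error message without raising.
from urllib.parse import urlparse

url = "http://localhost:31119/api/v1/series?match[]=clipper_mc_pred_total"
parsed = urlparse(url)
print(parsed.netloc)  # localhost:31119
print(parsed.query)   # match[]=clipper_mc_pred_total
```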

@rkooo567
Collaborator Author

rkooo567 commented Apr 29, 2019

The tests should pass now. If so, I will rebase the commits.
Never mind, it seems the commits are automatically squashed once merged.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1921/

@simon-mo simon-mo merged commit ac6aa42 into ucbrise:develop Apr 30, 2019