Feature request: Send video and logs to some external system for storage, indexing and reporting #430

Closed
adrichem opened this Issue Feb 7, 2018 · 18 comments

3 participants

adrichem (Contributor) commented Feb 7, 2018

Zalenium Image Version(s): 3.8.1j
Docker Version: N/A
docker-compose, version: N/A
OS: N/A
Docker Command to start Zalenium: N/A

Expected Behavior -

Our tests produce artifacts that we want to keep for later analysis. We use things like blob storage, Elasticsearch, and Kibana to store and index them along with relevant metadata. The video recordings that Zalenium creates are currently difficult to deal with because:

  • Scraping the file system is error-prone, as the files can be overwritten by the dashboard code at any time
  • I prefer to avoid persistent volumes for containers.
  • The metadata in the dashboard HTML is interesting; however, we want to decide at run time what extra metadata is associated with each artifact.

On https://github.com/adrichem/zalenium (branch artifact-store), I have created a beta version that can:

  1. Send video and logs to some external system for storage and indexing. It's a simple HTTP POST; the dashboard is not affected and remains in place.
  2. Include user-supplied metadata in that POST request. We use the cookie mechanism, just like for the test result.
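The cookie mechanism in point 2 can be sketched as follows. This is a minimal illustration, assuming the `zaleniumMetadata` cookie name mentioned later in this thread; the Selenium driver call is shown as a comment so the snippet stays self-contained:

```python
import json

# Arbitrary, user-defined metadata; Zalenium forwards it verbatim and
# makes no assumptions about its structure.
metadata = {
    "Testcase": "login-happy-path",
    "Feature": "authentication",
    "Type": "testresult",
}

# Serialize the structure so it fits in a single cookie value.
cookie_value = json.dumps(metadata)

# In a real test this would be attached to the running session, e.g.:
# driver.add_cookie({"name": "zaleniumMetadata", "value": cookie_value})

# After the round trip, the store/indexer can recover the same structure.
assert json.loads(cookie_value) == metadata
```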

Before I open a PR, I was wondering how you feel about such a change, and whether you can point out any work that needs to be done to make it good enough for a PR.

I hope you find it useful,
Regards

Actual Behavior -

Currently Zalenium has its own dashboard that testers use to look at video recordings and log files. It does not yet allow these artifacts to be stored and indexed by an external system.

diemol (Member) commented Feb 8, 2018

Hi @adrichem,

This is quite a surprise, to be honest, and I kind of like the idea. But my main concern is that the Zalenium container is already doing a lot of things (dashboard logic, copying files locally, the nginx proxy) in addition to handling test requests, and adding more logic would make it even more complicated.

We are slowly starting to rework the dashboard, to remove some of the heavy work that is currently done by updating it directly in the core logic where test requests are handled. The solution is nothing fancy: the data the dashboard uses is exposed as JSON on an endpoint, and the dashboard reads it every X seconds.

What do you think about doing this as a separate service running in a side container? It could read that endpoint, pull the data from the Zalenium container, and put it wherever it is needed.

PS: Thanks a lot for your interest and intention to improve Zalenium.

adrichem (Contributor) commented Feb 9, 2018

Hi @diemol,

I have a few reasons why I think it's best to use the fire-and-forget HTTP POST approach as coded in the beta version:

  1. The net performance will probably be better, not worse. I agree that I introduce one extra read for each file, but that is more than compensated by the fact that users who query for files and watch the videos no longer require any resources from the Zalenium container and host machine.

  2. We get an extra opportunity for performance improvement. Instead of transferring the files from elgalu/selenium into Zalenium, we can run a curl command inside elgalu/selenium to send the files directly to the storage/indexing endpoint. This requires fewer resources and lets the work be distributed over the cluster.

  3. With the query-and-pull approach, the user's own storage/indexing components would need to query that endpoint continuously. When you have a lot of tests, this results in O(n) work instead of the O(1) of the fire-and-forget method.

  4. As a final argument, I think it is very important that the user can associate arbitrary metadata with each file. This metadata can be set at any time during test execution. With the pull-based approach, your new dashboard would need a little more complexity to remember which metadata belongs to which file.

PS: I'm willing to open-source my storage and indexing component; users could then have very nice dashboards with a simple docker-compose file.

diemol (Member) commented Feb 16, 2018

Hi @adrichem,

Sorry for the late reply, busy week.

> The net performance will probably be better, not worse. I agree that I introduce one extra read for each file, but that is more than compensated by the fact that users who query for files and watch the videos no longer require any resources from the Zalenium container and host machine.

I agree that the Zalenium container would have less work if the files were read from a different source. But the initial step is that the files need to be placed in that other source, and that is the logic I feel we should not put inside Zalenium, but in a separate service that actually does the extra read and puts the files somewhere else.

> We get an extra opportunity for performance improvement. Instead of transferring the files from elgalu/selenium into Zalenium, we can run a curl command inside elgalu/selenium to send the files directly to the storage/indexing endpoint. This requires fewer resources and lets the work be distributed over the cluster.

That's a good idea; nevertheless, the implementation might be complicated. The video and log collection logic is the final step when the test is done, and that logic is not only copying files but also renaming them based on the supplied test capabilities. Just sending the files is not enough, we still need to establish the relation between those files and the test. Maybe this one is not so feasible for now.

> With the query-and-pull approach, the user's own storage/indexing components would need to query that endpoint continuously. When you have a lot of tests, this results in O(n) work instead of the O(1) of the fire-and-forget method.

I think this depends on the implementation. Right now there is an endpoint returning a big JSON array with all the available info. This will be tuned to return either all the data or a delta based on a parameter (probably a timestamp), and I guess we can implement some simple pagination as well.

> As a final argument, I think it is very important that the user can associate arbitrary metadata with each file. This metadata can be set at any time during test execution. With the pull-based approach, your new dashboard would need a little more complexity to remember which metadata belongs to which file.

I didn't completely understand this one... do you mean having a test with extra capabilities that are used just for test metadata?
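The delta idea described above can be sketched in a few lines (the data and function names here are hypothetical; the real endpoint belongs to Zalenium's dashboard rework):

```python
# Hypothetical in-memory list standing in for the dashboard's JSON array.
tests = [
    {"name": "t1", "completedAt": 100},
    {"name": "t2", "completedAt": 200},
    {"name": "t3", "completedAt": 300},
]

def dashboard_delta(since):
    """Return only entries newer than the caller-supplied timestamp,
    so pollers do not re-download the whole list on every cycle."""
    return [t for t in tests if t["completedAt"] > since]

# A poller that remembers its last timestamp gets only the new entries.
assert dashboard_delta(150) == [tests[1], tests[2]]
assert dashboard_delta(300) == []
```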

adrichem (Contributor) commented Feb 20, 2018

Hi @diemol,

Thanks for the reply. Regarding my last point, I meant the following:

Whenever an artifact is produced by my tests (or found some time after), it is sent to the store/indexer together with a JSON object that contains metadata about the artifact and the situation that caused it. This data is useful for later analysis and also for correlating different artifacts to the same occurrence of a situation during testing.

Each JSON object will be slightly different in structure and content depending on the type of artifact, who produced it, and the progress of the test execution. In fact, as our test set grows, I expect more such structures to arise. This is why my changes to Zalenium allow the automated tests to send that JSON structure in a cookie; Zalenium then just forwards it together with the videos to the store/indexer.

diemol (Member) commented Feb 20, 2018

Hi @adrichem,

I missed that part in your fork; now I see the code where the cookie is captured and the metadata is added to the test information.

Just curious, do you have an example of that metadata?

adrichem (Contributor) commented Feb 22, 2018

Hi @diemol

Below are two examples. However, please understand that the content and structure of this metadata will change over time; I don't want Zalenium to make any assumptions about its structure.

{
    "metadata": {
        "Testcase": "xxxxx",
        "Feature": "xxxxx",
        "Date": "2018-02-15T10:53:35Z",
        "testOutcome": { "result": "NOK", "error": "Object is not displaying a checkmark" },
        "Browser": "Firefox",
        "Type": "testresult"
    }
}

{
    "metadata": {
        "Testcase": "yyyyy",
        "Feature": "yyyyy",
        "Date": "2018-02-09T12:01:49Z",
        "testOutcome": { "result": "OK", "error": "" },
        "Browser": "Firefox",
        "Type": "logfile"
    }
}
diemol (Member) commented Feb 23, 2018

Hi @adrichem,

I think we could add the metadata part, but for the rest it is maybe better to have a separate service. What do you think?

adrichem (Contributor) commented Feb 26, 2018

Hi @diemol, I think it is a good idea to split the dashboard and its storage/indexing away from the Java servlets in the Selenium hub. That will allow users to make choices like:

  1. Which node runs that part of the functionality
  2. Replacing it with their own version. In my case, I have no need for the local dashboard
wonko commented Mar 19, 2018

We would have a use case for this - we run a stateless k8s cluster (no persistent, local storage available on the pods). A solution for us would be to upload each video to an S3 bucket, and assign metadata on it.

diemol (Member) commented Mar 19, 2018

@adrichem I am sorry, I thought I replied here...

But yes, I am totally up for having this as a side service. Ideally a container running next to Zalenium doing these tasks, and we could even start that container from Zalenium (based on a flag) to simplify things for users.

@wonko, I agree, it is a good use case. Hopefully we can put things together with @adrichem.

adrichem (Contributor) commented Apr 4, 2018

Hi guys,

I've open-sourced the component for storage and indexing of the artifacts; see https://github.com/adrichem/artifact-store. It uses Azure blob storage, as we don't use S3. This should be enough for @wonko to get a basic setup running. A basic docker-compose file is included in the README.

@diemol I think there's an easy way to integrate my change into Zalenium without worrying about the extra I/O it does. How do you feel about changing my version to use either only the local or only the new remote dashboard execution path? That would save a lot of work for me and would not cause the Zalenium proxy to need extra resources.

diemol (Member) commented Apr 5, 2018

That sounds great @adrichem! Actually, I was looking forward to this :)
It would be cool to have that in Zalenium, so one can decide to store things locally or in remote storage. I'm completely open to this, thanks!

diemol (Member) commented Jun 2, 2018

@adrichem I hope you didn't change your mind :) Will you send us a PR?

adrichem (Contributor) commented Jun 5, 2018

Working on it. It got somewhat low in my stack of priorities :)

diemol (Member) commented Jun 27, 2018

This was released a moment ago, but I will keep the issue open for some days since the documentation is still missing.

diemol (Member) commented Jul 12, 2018

@adrichem I was trying to update the docs, and this is what I have:

As discussed in #430. I'm not sure how your documentation works, so I didn't update it; I hope you will do that. It needs to show that:

  1. Setting an environment variable tells Zalenium where to send the artifacts. My compose file uses this:
    - REMOTE_DASHBOARD_HOST=http://artifact-store:4000
  2. If that environment variable does not exist, the local dashboard is used and Zalenium does not send the artifacts to the store.
  3. The tester can tell Zalenium to send arbitrary JSON metadata together with the files. Here's a snippet of C# code that does that:
this.Driver.Manage().Cookies.AddCookie(new Cookie("zaleniumMetadata", JsonConvert.SerializeObject(Metadata)));

But what endpoints should the remote URL have? To send videos, logs, and metadata, something is needed over there, right?
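As a sketch of what such an endpoint could look like: the real paths and wire format are defined by the artifact-store project linked earlier in this thread, so everything below (the /artifact path, the X-Metadata header) is a hypothetical stand-in that only shows the general shape of a fire-and-forget receiver:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from threading import Thread
from urllib.request import Request, urlopen

received = []  # (metadata, artifact bytes) pairs, for demonstration only

class ArtifactHandler(BaseHTTPRequestHandler):
    """Hypothetical receiver: accepts a POST carrying the artifact in the
    body and its JSON metadata in a header. The actual endpoints are
    defined by the artifact-store project, not by this sketch."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        meta = json.loads(self.headers.get("X-Metadata", "{}"))
        received.append((meta, body))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet

# Bind to an ephemeral port and serve in the background.
server = HTTPServer(("127.0.0.1", 0), ArtifactHandler)
Thread(target=server.serve_forever, daemon=True).start()

# Fire-and-forget upload of a fake "video" with attached metadata.
meta = {"Testcase": "demo", "Type": "video"}
req = Request(
    "http://127.0.0.1:%d/artifact" % server.server_port,
    data=b"fake video bytes",
    headers={"X-Metadata": json.dumps(meta)},
    method="POST",
)
urlopen(req).read()
server.shutdown()

assert received == [(meta, b"fake video bytes")]
```

The point of the sketch is only that something on the remote side must accept the POST and persist both the bytes and the metadata; what that something is remains the user's choice.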

adrichem (Contributor) commented Jul 12, 2018

Hi @diemol

That's correct, an endpoint needs to exist, and it is likely to be something specific to each environment. Here is the example that I use:

I've open-sourced the component for storage and indexing of the artifacts; see https://github.com/adrichem/artifact-store. It uses Azure blob storage, as we don't use S3. This should be enough for @wonko to get a basic setup running. A basic docker-compose file is included in the README.

diemol added a commit that referenced this issue Jul 24, 2018

diemol (Member) commented Jul 24, 2018

Closed with #667, thanks @adrichem!

diemol closed this Jul 24, 2018

diemol added a commit that referenced this issue Jul 25, 2018
