Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds travis_time tags to logs [WIP] #199

Conversation

lukesarnacki
Copy link
Contributor

This adds travis_time:start and travis_time:end:s=0.01:u=0.01:r=0.01 tags to logs. This is marked as WIP because it is more like proof of concept which I would like to discuss. It was much harder to come up with than I expected and I would love to hear, if you also think that this is the way to go.

All commands with echo and timing are wrapped in time block and wrapped in travis_time tags:

  echo -en 'travis_time:start:\r'
  echo \$\ git\ fetch\ origin
  { time
  travis_retry git fetch origin;
  } 2> /tmp/timing.out
  travis_assert
  echo -en 'travis_time:end:`cat /tmp/timing.out`\r'

On the beginning I wanted to make time one line but it broke almost all of the tests, so for now there are new lines to avoid this. I am wondering whether I should make it one line and change matchers or leave it like that with new lines.

When command is finished, time is saved to timing.out file in u=%U:r=%E:s=%S format (it is set in environment variable TIMEFORMAT="u=%U:r=%E:s=%S"). Then on travis_time:end tag its content is added using cat command.

I am not sure if we need two tags, but I don't think that there is other way for knowing on what line command was displayed since we know time after output is printed. So basically time label in travis web would be shown where starting tag is, but time would be taken from ending tag.

I hope that I will proceed much faster with this feature from this point. Please let me know if you think that there is better way or if there is some hole in my logic. I will try to run travis locally soon and check this using travis-web (right now I am assuming that it will be possible to use it looking at how fold tags look like).

Related issue: travis-ci/travis-ci#1246

@@ -1,6 +1,8 @@
#!/bin/bash
source /etc/profile

IMEFORMAT="u=%U:r=%E:s=%S"

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

@joshk
Copy link
Contributor

joshk commented Feb 3, 2014

This is super awesome! I swear all Sarnackis are god sends!

end

def measure(code)
"{ time\n#{code};\n} 2> /tmp/timing.out "

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

@joshk
Copy link
Contributor

joshk commented Feb 3, 2014

This is very exciting and very cool. Might also be good to add a timing start marker so we know which command/s has been timed?

@lukesarnacki
Copy link
Contributor Author

@joshk, travis_time:start is in the code, I just didn't paste it in example script in my comment. I think I will push working code tomorrow :).

@lukesarnacki
Copy link
Contributor Author

Hey guys, so I pushed changed code. Let me write few words about this solution and where I need help with it.

There is Timing module which is included in Travis::Build::Shell same way as other filters. It wraps commands with travis_time_start and travis_time_finish, so example script with timing would be something like:

# (...)

travis_time_start() {
  echo -en "travis_time:start\\r"
}

travis_time_finish() {
  local result=$?
  echo -en "travis_time:finish:`cat /tmp/timing.out`\\r"
  return $result
}

# (...)

travis_time_start
{ time travis_retry bundle install 2>&3; } 3>&2 2> /tmp/timing.out
travis_time_finish
travis_assert

There are 2 main differences from what I pushed before.

  1. Redirection is little bit more complicated ({ time command 2>&3; } 3>&2 2> /tmp/timing.out). I found out that otherwise (using simple { time command } 2> /tmp/timing.out), error will be redirected to a file along with timing. Example script shows that this time it should work as expected:

    #!/bin/bash
    
    TIMEFORMAT="u=%U:r=%E:s=%S"
    
    travis_assert() {
      local result=$?
      if [ $result -ne 0 ]; then
        echo -e "assert failed"
      fi
    }
    
    travis_time_start() {
      echo "travis_time:start\\r"
    }
    
    travis_time_finish() {
      local result=$?
      echo "travis_time:end:`cat timing.out`\\r"
      return $result
    }
    
    echo \$ cat non_existing_file
    travis_time_start
    { time sleep 0.1 && cat non_existing_file 2>&3; } 3>&2 2>timing.out
    travis_time_finish
    travis_assert

    Such script would give following output:

    $ cat non_existing_file
    travis_time:start\r
    cat: non_existing_file: No such file or directory
    travis_time:end:u=0.001:r=0.105:s=0.002\r
    assert failed
    
  2. travis_assert is run after travis_time_finish. travis_time_finish will print travis_time:finish: tag, but it will also return last exit status. This is better than asserting before time tag, because if command fails, it would be also nice to see how long did it take.

To make specs work with that design I added measures_time? matcher similar to folds? and here is where I need your help guys. As far as I understood, folds? is based on name passed when folding code. Since I don't have name for travis_time, I can't test it in same way as folds? is tested. And this is needed to test travis_assert properly, when travis_time is before travis_assert.

Do you have any idea how this can be tested?

@joshk @drogus

@lukesarnacki
Copy link
Contributor Author

Btw: In a meantime I will try to add code to travis-web, to check this solution as a whole thing. Currently tests are not passing due to problem I have with writing matcher, but I think that tags are generated correctly.

@joshk
Copy link
Contributor

joshk commented Mar 12, 2014

@lukesarnacki one idea came to mind, instead of outputting the time to a file, couldn't we use date +%s to get the timestamp before running the command and then after and work out the seconds difference?

@lukesarnacki
Copy link
Contributor Author

The drawback of such solution would be that time command gives real / user / system times and we are limited to real time only using date +%s. If you think that this only real is needed, than that's fine, but I don't see any significant drawbacks of using files :).

Take a look at time man page (http://linux.die.net/man/1/time) and see if you find any of options interesting (memory? percentage of CPU?). I am not sure if this is useful at all, as I am not very familiar with travis.

@joshk
Copy link
Contributor

joshk commented Mar 12, 2014

I think only Real time is what we care about right now.

Writing to a file just feels a little odd to me, a little dirty in someways. Using date and a variable means everything is nicely self contained. Just some ideas.

@ruleant
Copy link

ruleant commented Mar 25, 2014

Hi,

Nice work. Great idea!

Some days ago, I started working on a project with some scripts to create a trend from timestamps generated during a travis build.
Currently the analysis script in my project, reads timestamps from a file that is created during the build process by a script that generates timestamps (it uses date %S), but it has to be added manually to the .travis.yml file. (see example)
If your addition/commit would generate a file with timestamps, or if the timestamps (or r/u/s times) would appear in the build log my scripts could read this to generate a trend graph.

It would be great if we could combine our efforts? Or at least make our efforts compatible. :)

I'm slightly in favor of having a seperate file with timestamps (date %S), because that's what my scripts read currently, but other things could work as well.

What do you think?

@drogus
Copy link
Contributor

drogus commented Apr 16, 2014

Writing to a file just feels a little odd to me, a little dirty in someways. Using date and a variable means everything is nicely self contained. Just some ideas.

If we don't want to use files, we could use named pipes which should be faster. Regardless what we use to get the result of time command, I'm all for using it, because we can get a lot of useful info which we can't get from our own date calculations. I would need to check in practice, but some potentially good looking options:

  • %M - Maximum resident set size of the process during its lifetime, in Kbytes.
  • %K - Average total (data+stack+text) memory use of the process, in Kbytes.
  • %W - Number of times the process was swapped out of main memory.
  • %w - Number of waits: times that the program was context-switched voluntarily, for instance while waiting for an I/O operation to complete.
  • %I - Number of file system inputs by the process.

More on time output formatting here: http://linux.die.net/man/1/time

Memory data looks especially tempting for me, besides the regular times.

@svenfuchs
Copy link
Contributor

I like the idea of being able to collect additional metrics.

@roidrage
Copy link
Contributor

roidrage commented May 1, 2014

@lukesarnacki what needs to happen for this to move forward? Would love to have this as part of our builds.

@lukesarnacki
Copy link
Contributor Author

Oops, I knew I have forgotten about something...

@roidrage current status is that code I wrote works (at least as far as I know), but tests won't pass. Please read last part of my comment #199 (comment) more or less when I write "here is where I need your help guys" ;).

The other part is using those tags in ember app, which I haven't started on yet... So any help would be appreciated.

@roidrage
Copy link
Contributor

roidrage commented May 7, 2014

@lukesarnacki that's awesome, thanks! @joshk @henrikhodne any ideas regarding the tests?

Also, regarding the web frontend side of things, we can definitely look into that next. When the code is good to go, it should be able to run without interfering with the existing user interface, no?

@lukesarnacki
Copy link
Contributor Author

I just pushed close to final version of this PR. I am a bit embarrassed, when I realise that I finally figured that out in an 20 minutes ;). I guess that I couldn't see forest for the trees before and I needed step back.

The only things that are in my opinion worth investigating a bit more are:

  1. Avoid using TIMEFORMAT - there is a difference between time command in osx and in linux and current code works same on linux and osx. But it would be safer to write:

      /usr/bin/time -f "u=%U:r=%E:s=%S" -o /tmp/timing.out  sleep 0.1

    In osx there is gtime command which should work with same options but I couldn't get it to work today. If anybody wants to try, this is the line that you are interested in:

    https://github.com/lukesarnacki/travis-build/blob/measure-commands-execution-time/lib/travis/build/shell/filters.rb#L43

  2. Adding some of the options that @drogus wrote about (Adds travis_time tags to logs [WIP] #199 (comment) ) - I think that they are definitely interesting, but I am not really linux expert, so it would be good to decide on which we want really.

Other then that - tests are passing and I think it works as it should.

@lukesarnacki
Copy link
Contributor Author

@ruleant I totally missed your comment, sorry. I will take a look at it tomorrow :) Sounds interesting!

@ruleant
Copy link

ruleant commented May 13, 2014

@lukesarnacki No problem, looking forward to your comments.
BTW : Do you have a link to an example timing.out or build log where your script is in action?

BanzaiMan added a commit to travis-ci/travis-worker that referenced this pull request May 28, 2014
@BanzaiMan
Copy link
Contributor

I brought this PR up to date and deployed to staging to test. It's not working quite right. See https://staging.travis-ci.org/BanzaiMan/travis_staging_test/jobs/391387#L14-L17

Not sure how git checkout can fail this way here (the commit is reachable), but when the command fails, the output is wrong:

$ git checkout -qf 5462b39f2865338a39d1ce4724b64335ef9eb52c
fatal: Not a git repository (or any of the parent directories): .git
travis_time:finish:u=0.001:r=0.004:s=0.001
The command "travis_time_finish" failed and exited with 128 during .

The output suggests that the filter does not play well with travis_result.

I'll look into this in more detail tomorrow.

@BanzaiMan
Copy link
Contributor

The message above comes from travis_assert. The idea is the same.

@joshk
Copy link
Contributor

joshk commented Jun 13, 2014

So whats the status of this PR? What do we need to do to finish it off?

@BanzaiMan
Copy link
Contributor

@joshk There is a serious problem with the way time command interacts with other travis_* functions. I haven't thought much about how to reconcile the difficulties, I'm afraid.

@joshk
Copy link
Contributor

joshk commented Jun 13, 2014

I almost think it would be simpler to do the timing ourselves.

eg. timestamp, run command, timestamp again, work out difference.

@ruleant
Copy link

ruleant commented Jun 13, 2014

That's more or less the way I do it in my project: log a timestamp, and calculate the time difference. With, after some parsing, these graphs as a result.

To take the example from above (pseudo-code) :

  start_time = `date +%s`
  echo -en 'travis_time:start:git.1:' start_time
  travis_retry git fetch origin;
  travis_assert
  end_time = `date +%s`
  duration = end_time - start_time
  echo -en 'travis_time:end:git.1:' end_time
  echo -en 'travis_time:duration:git.1:' duration

This would add both start/end timestamp and the duration to the logfile.

For my project those timestamps and/or duration logs would be very useful. is the location of the logfile available during after_succes/after_failure? Or should this be parsed during the notification step?

@joshk
Copy link
Contributor

joshk commented Jun 21, 2014

@henrikhodne @BanzaiMan @lukesarnacki soooo guise, what can we do to get this finished off? I would hate to see this PR gather dust and go stale. How can we work together to make magic happen?

@BanzaiMan
Copy link
Contributor

As written, I don't think we can merge this. I think that a simpler approach that @ruleant suggested is preferable. I'll have to think how that might look a little more.

@joshk
Copy link
Contributor

joshk commented Jun 21, 2014

agree!

@lukesarnacki
Copy link
Contributor Author

Hey guys, I think that I know the way to make it work with git checkout, but I found out that it won't work with functions anyway (like travis_retry). Also as you probably noticed using shell keyword time is not straightforward with its all redirections and I am not sure of all the differences and issues with gtime on osx or /usr/bin/time in linux (it is also not installed on every system).

So I would love to see more metrics but it seems that simple approach would be much better here.

Changing current code will be very easy fortunately ;).

In order to measure commands' time, variables with
timestamps are set before and after command execution.
Time difference is printed as travis tag `travis_time:finish:`
which can be used to show execution time on travis-web.
@lukesarnacki
Copy link
Contributor Author

@BanzaiMan Can you run this and check if this is ok right now? Thanks! :)

I changed it the way @ruleant suggested (using timestamps). I measure only seconds right now. We could add ms, but it won't work on OSX. I think we can go with seconds and see if we can get to miliseconds later.

Also me and @drogus noticed that \r is not enough to clear the line if next line is shorter. It works on site, though, because web log engine works differently than shell.

Try running these two scripts:

echo -en "some long line here\r"; echo -en "shorter\n"
echo -en "some long line here\r\033[0K"; echo -en "shorter\n"

@BanzaiMan
Copy link
Contributor

OK. I'll test this out tomorrow.

@lukesarnacki
Copy link
Contributor Author

Thanks a lot @BanzaiMan ! :)

@BanzaiMan
Copy link
Contributor

@lukesarnacki It has the same problem: https://staging.travis-ci.org/BanzaiMan/travis_staging_test/builds/391915#L27

travis_ci_-_free_hosted_continuous_integration_platform_for_the_open_source_community

Also, output from the command seems mangled.

@lukesarnacki
Copy link
Contributor Author

@BanzaiMan seems like I misunderstood the problem :( Thanks for help, will take a look once again.

@drogus
Copy link
Contributor

drogus commented Jun 23, 2014

I think that the problem might lay in TRAVIS_CMD, we set it after each command: https://github.com/lukesarnacki/travis-build/blob/measure-commands-execution-time/lib/travis/build/script/templates/header.sh#L134 I'm not sure what BASH_COMMAND#travis_retry does there, probably @svenfuchs could say more.

@@ -21,7 +21,7 @@ def raw(code, *args)
end

def set(var, value, options = {})
cmd "export #{var}=#{value}", options.merge(log: false)
cmd "export #{var}=#{value}", options.merge(log: false, timing: false)

This comment was marked as spam.

@joshk
Copy link
Contributor

joshk commented Jun 23, 2014

I think the issue is an order issue. If we make command assertion happen before timing then assertion will report the correct success/failure.

eg. instead of time_start:command:time_finish:assertion, we want time_start:command:assertion:time_finish

@drogus
Copy link
Contributor

drogus commented Jun 25, 2014

@joshk I think that might result in not reporting time for last fail command which may (or maybe not?) be useful, like: building a project failed, but I still want to know how long did it take.

@sarahhodne
Copy link
Contributor

We might be able to use the Bash time builtin for this:

$ trap 'TRAVIS_CMD=$TRAVIS_NEXT_CMD; TRAVIS_NEXT_CMD=${BASH_COMMAND#travis_retry }' DEBUG
$ TIMEFORMAT=$'travis_time:finish:duration=%3R'
$ time ruby -e 'sleep 3'
travis_time:finish:duration=3.047
$ echo $TRAVIS_CMD
ruby -e 'sleep 3'

We should add a \r to the end of the TIMEFORMAT, I left it off to be able to see the output more easily. We also need to print a travis_time:start line.

@sarahhodne
Copy link
Contributor

I should read the comments first… I'll see if there's a way to make it work with functions.

@sarahhodne
Copy link
Contributor

The use of time as a reserved word permits the timing of shell builtins, shell functions, and pipelines.

$ time travis_retry ruby -e 'exit (rand() * 3).to_i'

The command "ruby -e exit (rand() * 3).to_i" failed. Retrying, 2 of 3.


The command "ruby -e exit (rand() * 3).to_i" failed. Retrying, 3 of 3.

\nThe command "ruby -e exit (rand() * 3).to_i" failed 3 times.\n
travis_time:finish:duration=2.129
$ echo "$TRAVIS_CMD"
ruby -e 'exit (rand() * 3).to_i'

Looks like it works fine with functions too. I can look into updating the code and give it a spin.

@lukesarnacki
Copy link
Contributor Author

@henrikhodne it would be awesome! I haven't got time for open source for some time now and I am sad seeing this PR staying here for so long.

@sarahhodne
Copy link
Contributor

@lukesarnacki No worries! Thanks for laying all the groundwork for this.

@sarahhodne
Copy link
Contributor

Okay, I'm going to close this in favour of #263.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants