Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Degradation of GPS signal during calibration results in skipped Transmit cycles #5

Open
ve3wmb opened this issue Oct 31, 2019 · 0 comments

Comments

@ve3wmb
Copy link
Owner

ve3wmb commented Oct 31, 2019

Note below that during calibration, each iteration should take 10 seconds, with the log trailing by about 1 second, so timestamps
for each iteration should be about 11 seconds apart. Note the frequency of the iteration logged at 2:36:9 that I have put in bold font below :
It is 73599940 !! Which I can't explain. Most importantly if you look at the timestamps, that iteration took 24 seconds !
You'll note that something similar happens a few more times during this calibration cycle (highlighted in bold). The cumulative effect is that
the total time needed for the whole calibration cycle is 5 minutes and 12 seconds which is way too long.

If you note the timestamp on the SWERR that I highlighted in green bold ... it is at 2:40:1. What this is saying is that we completed calibration
and are waiting for a TELEMETRY_TIME_EV (value 4) but instead we are getting a PRIMARY_WSPR_TX_TIME_EV (value 6) which is what you would
normally expect to see from the scheduler at 40 minutes after the hour. We expect this sort of scenario to happen on startup, as the startup time is asynchronous and the scheduler is synchronous.
Once we have completed a full transmit time cycle the two should be in sync.

In this case what has happened is that the overrun on calibration has broken that synchronization for the current TX cycle.
The approach of the state machine is to simply ignore events that it is not expecting so in this case we get stuck in WAIT_TELEMETRY_ST (value 3) till the next time around
when we see a TELEMETRY_TIME_EV to unstick us. That causes us to skip both transmissions for that TX cycle.

2019-10-14 2:33:53 ** running CALIBRATION for 4 minutes **
iteration: 1 overflowCounter: 488 TCNT1: 18431
2019-10-14 2:34:15 Sampled Freq Hz x 100 : 0319999990 Old Corr factor : -8930 New Corr factor : -8940
iteration: 2 overflowCounter: 488 TCNT1: 18434
2019-10-14 2:34:26 Sampled Freq Hz x 100 : 0320000020 Old Corr factor : -8940 New Corr factor : -8930
iteration: 3 overflowCounter: 488 TCNT1: 18432
2019-10-14 2:34:37 Sampled Freq Hz x 100 : 0320000000 Old Corr factor : -8930 New Corr factor : -8930
iteration: 4 overflowCounter: 488 TCNT1: 18432
2019-10-14 2:34:48 Sampled Freq Hz x 100 : 0320000000 Old Corr factor : -8930 New Corr factor : -8930
iteration: 5 overflowCounter: 488 TCNT1: 18432
2019-10-14 2:34:59 Sampled Freq Hz x 100 : 0320000000 Old Corr factor : -8930 New Corr factor : -8930
iteration: 6 overflowCounter: 537 TCNT1: 7168
2019-10-14 2:35:11 Sampled Freq Hz x 100 : 0352000000 Old Corr factor : -8930 New Corr factor : -8920
iteration: 7 overflowCounter: 488 TCNT1: 18431
2019-10-14 2:35:22 Sampled Freq Hz x 100 : 0319999990 Old Corr factor : -8920 New Corr factor : -8930
iteration: 8 overflowCounter: 488 TCNT1: 18433
2019-10-14 2:35:33 Sampled Freq Hz x 100 : 0320000010 Old Corr factor : -8930 New Corr factor : -8920
iteration: 9 overflowCounter: 488 TCNT1: 18432
2019-10-14 2:35:45 Sampled Freq Hz x 100 : 0320000000 Old Corr factor : -8920 New Corr factor : -8920
iteration: 10 overflowCounter: 1123 TCNT1: 3066
2019-10-14 2:36:9 Sampled Freq Hz x 100 : 0735999940 Old Corr factor : -8920 New Corr factor : -8910
iteration: 11 overflowCounter: 488 TCNT1: 18429
2019-10-14 2:36:25 Sampled Freq Hz x 100 : 0319999970 Old Corr factor : -8910 New Corr factor : -8920
iteration: 12 overflowCounter: 488 TCNT1: 18431
2019-10-14 2:36:36 Sampled Freq Hz x 100 : 0319999990 Old Corr factor : -8920 New Corr factor : -8930
iteration: 13 overflowCounter: 781 TCNT1: 16383
2019-10-14 2:36:53 Sampled Freq Hz x 100 : 0511999990 Old Corr factor : -8930 New Corr factor : -8920
iteration: 14 overflowCounter: 781 TCNT1: 16382
2019-10-14 2:37:10 Sampled Freq Hz x 100 : 0511999980 Old Corr factor : -8920 New Corr factor : -8910
iteration: 15 overflowCounter: 683 TCNT1: -26628
2019-10-14 2:37:25 Sampled Freq Hz x 100 : 0447344600 Old Corr factor : -8910 New Corr factor : -8900
iteration: 16 overflowCounter: 732 TCNT1: 27646
2019-10-14 2:37:41 Sampled Freq Hz x 100 : 0479999980 Old Corr factor : -8900 New Corr factor : -8890
iteration: 17 overflowCounter: 488 TCNT1: 18432
2019-10-14 2:37:57 Sampled Freq Hz x 100 : 0320000000 Old Corr factor : -8890 New Corr factor : -8890
iteration: 18 overflowCounter: 537 TCNT1: 7167
2019-10-14 2:38:9 Sampled Freq Hz x 100 : 0351999990 Old Corr factor : -8890 New Corr factor : -8880
iteration: 19 overflowCounter: 488 TCNT1: 18431
2019-10-14 2:38:20 Sampled Freq Hz x 100 : 0319999990 Old Corr factor : -8880 New Corr factor : -8890
iteration: 20 overflowCounter: 488 TCNT1: 18430
2019-10-14 2:38:31 Sampled Freq Hz x 100 : 0319999980 Old Corr factor : -8890 New Corr factor : -8900
iteration: 21 overflowCounter: 537 TCNT1: 7168
2019-10-14 2:38:43 Sampled Freq Hz x 100 : 0352000000 Old Corr factor : -8900 New Corr factor : -8890
iteration: 22 overflowCounter: 488 TCNT1: 18431
2019-10-14 2:38:54 Sampled Freq Hz x 100 : 0319999990 Old Corr factor : -8890 New Corr factor : -8900
iteration: 23 overflowCounter: 488 TCNT1: 18431
2019-10-14 2:39:5 Sampled Freq Hz x 100 : 0319999990 Old Corr factor : -8900 New Corr factor : -8910
2019-10-14 2:39:5 >> orion PRE sm trace: curr_state: 2 curr_event: 3
2019-10-14 2:39:5 << orion POST sm trace: curr_state: 3 event_just_processed: 3 action: 0
2019-10-14 2:40:1 >> orion PRE sm trace: curr_state: 3 curr_event: 6
2019-10-14 2:40:1 ***SWERR: 2 data dump in hex: 6

Because this is working most of the time, my suspicion is that this is due to missing PPS pulses.

The ISR simply counts the number of PPS pulses it gets, stops at 10 and then tallies the value in Timer1/Counter1.
If we lose some pulses then the frequency counter will continue to run until we have seen 10 PPS pulses so that could explain the much higher frequency count values
(double the time means double the count).

After some more consideration I think that the guard timer on Calibration would solve this problem.

The issue at hand is missing transmissions, due to calibration running over its allotted time and throwing off synchronization with the scheduler, causing lost events.

This is a side effect of something which we have little control over, which is the momentary loss of PPS due to degradation of the GPS signal (we think) during a calibration cycle.

Having a few frequency measurements that are out of wack isn’t a big problem as the most damage we can do is to erroneously change our frequency by + 2.3 Hz
(correction step size is 0.1 Hz with 23 samples). What I have typically seen is that 3 or 4 measurements are off due to missing PPS pulses so this error would be more like < +0.5 Hz.

That in itself isn’t going to cause a big issue. What is the bigger problem is taking more than the allotted 4 or so minutes and causing us to miss the next event which results in holes in the transmission schedule.

A virtual guard timer (using Chrono.h) would start on reception of PPS during calibration and be cancelled when calibration is completed. We would check for expiry before each calibration iteration and fail calibration if the value was exceeded. This would ensure that we don’t lose any events and impact transmission.

Note that we have agreed that this issue does not gate Release 1. - VE3WMB

@ve3wmb ve3wmb changed the title Degradation of GPS signal during calibration result in skipped Transmit cycles Degradation of GPS signal during calibration results in skipped Transmit cycles Oct 31, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant