Resetting the whole library #31

Closed
lakeroe opened this issue May 10, 2018 · 97 comments
Comments

@lakeroe

lakeroe commented May 10, 2018

Hello,

I'm running the library on an ARM STM32F103C8T6 and this works very well, but now I've noticed a strange behaviour.

After power-on, the first received time (if clock_state >= Clock::free) is always valid and accurate, but after some hours (or days) the time starts to drift away and sometimes the date is also wrong, although the inaccuracy should be below 200 ms according to the comment in dcf77.h. This seems to happen especially in a quite noisy environment.

I assume this problem is related to: #27

Because I use this library to synchronize my ARM built-in real-time clock, I need a reliable time only once in a while. So my idea is to reset the library every time a valid time is received (clock_state >= Clock::free).
My C++ knowledge is quite limited, but I've managed to add a reset function which calls DCF77_Clock::setup() and DCF77_Local_Clock::setup(), and I call this reset function every time a valid time is received.
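
A minimal sketch of the reset idea described above. The two namespaces below are empty stand-ins so that the example compiles on its own; in a real build they would be the library's DCF77_Clock and DCF77_Local_Clock. The helper names reset_whole_library and sync_rtc_and_reset are hypothetical, and whether DCF77_Local_Clock::setup() is reachable from a sketch without modifying the library is an assumption.

```cpp
#include <cstdint>

// Stand-ins for the two library namespaces named above (assumption: the real
// ones come from dcf77.h). They only count calls so the sketch is testable.
namespace DCF77_Clock       { int setup_calls = 0; void setup() { ++setup_calls; } }
namespace DCF77_Local_Clock { int setup_calls = 0; void setup() { ++setup_calls; } }

namespace Clock {
    // Numbering follows the clock states quoted in this thread.
    enum clock_state_t : uint8_t {
        useless = 0, dirty = 1, free = 2, unlocked = 3, locked = 4, synced = 5
    };
}

// Hypothetical helper: bring both parts of the library back to their
// power-on state.
void reset_whole_library() {
    DCF77_Clock::setup();
    DCF77_Local_Clock::setup();
}

// Hypothetical helper: returns true if the decoded time was accepted and the
// library was reset afterwards.
bool sync_rtc_and_reset(Clock::clock_state_t state) {
    if (state >= Clock::free) {
        // The application would copy the decoded time into the RTC here.
        reset_whole_library();
        return true;
    }
    return false;
}
```

Using Clock::synced instead of Clock::free as the threshold would accept fewer but better-confirmed time stamps; which one fits is a judgment call for the application.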

I'm currently running a test for about 24h and this seems to work well.

Do you think this is the right approach for my problem and is the reset-function correctly implemented ?

Thanks and best regards,
lakeroe

@udoklein
Owner

udoklein commented May 10, 2018

My suspicion would be that this is caused by an unstable local oscillator. You are right that a similar issue was described in #27 in the past. However I was never able to reproduce it at all. Without any means to reproduce it I have no way to locate the issue and fix it.

I suggest that you first try to tell the library that your oscillator might be unstable, e.g. set has_stable_ambient_temperature to false.

dcf77/dcf77.h

Line 123 in 1103b38

// static const bool has_stable_ambient_temperature = false; // outdoor deployment

If this fixes the problem tell me about it. If this does not fix the problem I have no idea what is wrong. If you really need to enforce a reset of the library this is in fact done by calling DCF77_Clock::setup(). This may work around your issue. For you this might be good enough. For me this is really annoying because I still have no clue why my library fails for some people.

Most probably I would need someone who can reproduce the issue and who lives close enough to me that I could visit and instrument the hardware with suitable measurement equipment.

@lakeroe
Author

lakeroe commented May 10, 2018

Thanks for your reply.
I will do further tests and report back in a few days ...

@lakeroe
Author

lakeroe commented May 14, 2018

I've done some more testing on two different hardwares:

Board 1: Arduino clone (but definitely quartz crystal based)
Board 2: STM32F103C8 blue pill board (http://wiki.stm32duino.com/index.php?title=Blue_Pill)

For all tests I used two different "Reichelt modules" and following time as a reference: https://www.uhrzeit.org/atomuhr.php

1st test: has_stable_ambient_temperature = false (as you suggested)
I powered-up the boards and waited for the best possible signal quality (clock_state==synced). Then I wrapped the antennas in aluminium foil to disable dcf77-reception. I verified this by the non-blinking dcf77_monitor_led.
Result after about 24h
Board 1: clock deviation ~ 0 seconds, correct date, clock_state=2 (free)
Board 2: clock deviation ~ +4 seconds, correct date, clock_state=2 (free)

2nd test: has_stable_ambient_temperature = true
Same procedure as before but changed software.
Result after about 24h
Board 1: clock deviation ~ 0 seconds, correct date, clock_state=2 (free)
Board 2: clock deviation ~ +4 seconds, correct date, clock_state=2 (free)

3rd test: has_stable_ambient_temperature = true
I powered-up the boards and waited for the best possible signal quality (clock_state==synced). Then I put the antennas in an upright position to simulate worse dcf77-reception.
Result after about 22h
Board 1: clock deviation ~ +94 minutes, correct date, clock_state=3 (unlocked)
Board 2: clock deviation ~ -4 seconds, wrong date (14.7.2053), clock_state=4 (locked)
exactly 62304s after power-on the date changes from 14.5.2018 to 14.7.2053

My modest opinion:
The first two tests show similar behaviour. Although the clock deviation on the second board is quite large, it should be okay.
According to your github page a drift of up to 100 ppm is within the design margins of the library (4s in 24h is only about 46 ppm).
I think one problem is that the library stays in clock_state 2 (free) even though there's no reception for 24 hours.
One possible solution is to add a watchdog: If the input pin doesn't toggle for a specified amount of time it resets the library.
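
The watchdog idea could be sketched roughly as follows. InputWatchdog is a hypothetical helper, not part of the library; it assumes the application samples the DCF77 input pin itself and has a millisecond counter such as millis() available.

```cpp
#include <cstdint>

// Hypothetical helper: reports when the sampled input pin has not changed
// its level for at least timeout_ms milliseconds.
struct InputWatchdog {
    uint32_t timeout_ms;
    uint32_t last_change_ms;
    bool     last_level;

    InputWatchdog(uint32_t timeout, uint32_t now_ms, bool level)
        : timeout_ms(timeout), last_change_ms(now_ms), last_level(level) {}

    // Call regularly with the current pin level and the current time.
    // Returns true once the pin has been idle for the whole timeout.
    bool expired(bool level, uint32_t now_ms) {
        if (level != last_level) {
            last_level = level;
            last_change_ms = now_ms;
            return false;
        }
        // Unsigned subtraction stays correct when the counter wraps around.
        return (now_ms - last_change_ms) >= timeout_ms;
    }
};
```

When expired() returns true the application could call its reset function. Note that this only catches a completely silent input, as in the aluminium foil tests; a noisy input that still toggles would not trigger it.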

Test three shows the typical behaviour I mean. Board one shows large clock deviation and board 2 wrong date.
This test is not so meaningful as the result changes a bit every time. For more reliable results more testing is needed.
But it should be quite easy to reproduce by simulating bad dcf77-reception (put the antenna near metal objects or partly wrap it in aluminium foil).

Maybe this way it's possible for you to reproduce the problem !

To finish with something different:
Further up you wrote resetting the library is done by calling DCF77_Clock::setup().
This does NOT work for me, I also have to call DCF77_Local_Clock::setup().

Thanks and best regards,
lakeroe

@udoklein
Owner

Can you record some trace with the "Swiss Army Debug Helper" in mode Dm ? At least enough to reproduce the issue + some more hours?

With regards to the simulation of bad reception: I tried several things in the past, e.g. the standalone debug helper. So far I found no way to reliably simulate real world noise. I know that in theory a lot of the noise can be modelled, but reality is still different. Without real world data I have no chance. The only thing that I can say for sure is that I cannot reproduce the issue with my setup. And I tried quite a lot of things (e.g. putting the antenna on top of magnetic loudspeakers or on top of motors). Sometimes I just pulled the antenna for some hours. The issue is that this approach is very time consuming and I cannot reproduce the behaviour so far.

@lakeroe
Author

lakeroe commented May 14, 2018

Okay, I'll try to post a log to analyze the behaviour.

Regarding resetting the whole library:
Does it make sense to you to also call DCF77_Local_Clock::setup() ?

@udoklein
Owner

Good point. Yes, this is a good idea. Maybe I should add it as already suggested in issue #30. Well, maybe if it hits the third time ;)

@lakeroe
Author

lakeroe commented May 15, 2018

Please find a first LOG attached (this is just to give you a start).
The date is correct but the time is off by 195 seconds after about 22 hours.

BTW the procedure for this log (and the following ones) is always the same. First I put the antenna in an optimal position, then I wait for the best clock_state possible (synced) and then I put the antenna in a "non-optimal" position ...

I'm currently running further tests (both on AVR and ARM board), so hopefully I can provide a more significant LOG soon ...

dcf_2018-05-15_AVR.zip

@lakeroe
Author

lakeroe commented May 17, 2018

Just a short intermediate result:
So far I was not able to reproduce the wrong date behaviour using the "Swiss Army Debug Helper".
But I'll keep trying ...

Furthermore there are two more questions:

  1. In order to compile your library for an ARM STM32F103C8T6 running at 72MHz I added one
    #if defined(STM32F1) #endif section at the end of dcf77.cpp
    Could you just have a quick look to see if there's any obvious mistake ?
    Is there anything to change for the different CPU frequency ?
  2. In the meantime did you have a chance to look at my LOG above and do you have an explanation for the big clock deviation ?

@udoklein
Owner

I do not have an STM board to verify. However the code looks sound.

With regards to the log file, I created a little analysis helper here. It detected no anomaly.

Then I looked at it with the standalone debug helper

Clock State Statistics
  0 useless : 511
  1 dirty   : 38
  2 free    : 77918
  3 unlocked: 0
  4 locked  : 1201
  5 synced  : 6

Clock State Transition Statistics
  useless  => useless : 511
  useless  => dirty   : 1
  dirty    => dirty   : 37
  dirty    => synced  : 1
  free     => free    : 77917
  locked   => free    : 1
  locked   => locked  : 1200
  synced   => locked  : 1
  synced   => synced  : 5

Quality Factor Statistics
  0: 79630
  1: 38
  2: 6
  3: 0

Conclusion: it was synced for just 6 seconds and then the signal degraded. I have two possible explanations.

  1. The local clock is unstable or has a large drift.
  2. The phase lock "locked" to the noise and thus started to drift away

That is, for (2) there must have been some correlation of the noise with the filter kernel. This could be "easily" fixed by more advanced signal processing. The issue is that my main target platform (Atmega328 aka Arduino) has so little memory and CPU. In other words: I know how to do this on bigger and better machines but given the tiny resources I have not yet found any really good solution for this challenge.

Is the noise that you pick up real world relevant or is it just artificially introduced fading by turning the antenna? Where are you located? Is this issue really relevant in practice?

What I am aiming at: my decoder is of statistical nature. As the noise level increases I have only two options:
(1) reject the signal
(2) try to decode anyway accepting that there is a risk that it will pick up unreasonable data

I opted for (2) because I did not want to implement an arbitrary limit. If you figure out that this is unacceptable in your situation then (1) would also fail to deliver any reasonable time. Now if you want to detect whether (2) picks up questionable data, I suggest implementing a simple check for whether time starts to jump backwards or whether the decoded time drifts way too much relative to the local clock. I suggest double-checking beforehand that the local clock is definitely within the 100 ppm design limit.
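
One possible shape for such a check, as a rough sketch only. DecodedTimeChecker and its fields are hypothetical names; it assumes the application can express both the decoded time and its own local clock as seconds on a monotonic scale (for example UTC, so that daylight saving switches do not look like jumps).

```cpp
#include <cstdint>

enum class Plausibility { ok, jumped_backwards, drifted_too_far };

// Hypothetical helper: compares how far the decoded time advanced with how
// far the local clock advanced since the last accepted sample.
struct DecodedTimeChecker {
    int64_t last_decoded_s;    // decoded time of the last accepted sample
    int64_t last_local_s;      // local clock reading at that moment
    int64_t base_tolerance_s;  // fixed allowance, e.g. for rounding
    int64_t tolerance_ppm;     // allowed local clock drift, e.g. 100

    Plausibility check(int64_t decoded_s, int64_t local_s) {
        const int64_t decoded_delta = decoded_s - last_decoded_s;
        const int64_t local_delta   = local_s - last_local_s;

        if (decoded_delta < 0) {
            return Plausibility::jumped_backwards;
        }
        const int64_t allowed =
            base_tolerance_s + local_delta * tolerance_ppm / 1000000;
        int64_t difference = decoded_delta - local_delta;
        if (difference < 0) { difference = -difference; }
        if (difference > allowed) {
            return Plausibility::drifted_too_far;
        }
        // Only plausible samples become the new reference point.
        last_decoded_s = decoded_s;
        last_local_s   = local_s;
        return Plausibility::ok;
    }
};
```

With tolerance_ppm = 100 and base_tolerance_s = 2, a deviation of 4 s after 24 h would pass, while a deviation of 94 minutes would be flagged.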

@lakeroe
Author

lakeroe commented May 17, 2018

Thanks for your analysis.
As I already wrote, I waited for a complete sync and then artificially introduced the signal fading by turning the antenna. My location is near here.
For me this issue is not relevant at all, because I only need a reliable time once a day and this can be easily achieved by resetting the whole library as mentioned above.

Regarding double-checking the local clock against the 100 ppm design limit: I don't have the means to accurately measure the quartz crystal frequency, but how about the following idea:
I use the 1000 Hz Systick Timer to create a clock and compare it to a reference (www.uhrzeit.org/atomuhr.php). If it deviates less than 8.6 s in 24 h (= 100 ppm) then I assume it's okay?
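
The arithmetic behind these numbers, as a small sketch. The function names are made up for illustration; the inputs are the figures quoted in this thread.

```cpp
// Relative deviation of a clock in parts per million, given how many seconds
// it gained or lost over an elapsed time in seconds.
constexpr double drift_ppm(double deviation_s, double elapsed_s) {
    return deviation_s / elapsed_s * 1e6;
}

// Largest deviation in seconds that still stays within a ppm limit.
constexpr double allowed_deviation_s(double limit_ppm, double elapsed_s) {
    return limit_ppm * 1e-6 * elapsed_s;
}
```

For example, drift_ppm(4.0, 86400.0) is roughly 46.3, matching the "4s in 24h is only about 46 ppm" estimate, and allowed_deviation_s(100.0, 86400.0) is 8.64 s.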

@nameoftherose

nameoftherose commented May 24, 2018

No we are not close at all. Please ignore this comment
@lakeroe We are just a few kilometers apart ...
What dcf module are you using? How long does it take to sync?
How long it stays in sync per day? (in my case about 8h/d)

@udoklein
Owner

Just send me your phone number by private mail. Maybe we can have a call during the weekend. With regard to the issue, you say you use systick. Thus I assume you are running on ARM. On Saturday I will setup a test with an ARM board for 24h and see if there are any regressions. With my board I had never any issues so far as the signal quality around here is way above the limit for my library. So either there is some regression that did not catch my attention or there is something with your setup. 100 ppm should be absolutely OK, but there may be other issues. I suggest to discuss this by phone (or in person).

@udoklein
Owner

So far my tests show nothing unusual. Maybe you could run the debug helper in mode DA to see if your clock is really stable. Sometimes it is not the hardware but interrupts that mess up the timing.

@udoklein
Owner

udoklein commented May 27, 2018

After 48 hours on Arduino Due (ARM) with a Pollin Module everything looks perfectly well.

   168310, +------XXXXXXXXXXXXXXXXXXXX85-+---------+---------+2X9--7---+---------+3--------+---------X---------
confirmed_precision ?? adjustment, deviation, elapsed
0.0625 ppm, @+ , 24.8750 ppm, -1 ticks, 412 min + 38007 ticks mod 60000
57/100~25/600

Decoded time: 18-05-27 7 21:31:43 CEST ..
  18.05.27(7,0)21:32:42 MEZ 0,0 26 p(67457-40:255) s(191-0:26) m(244-5:33) h(241-0:33) wd(249-1:34) D(246-3:34) M(245-1:34) Y(250-4:34) 127,32,32,50

I have to assume that it is something with your setup. The question is what?

@lakeroe
Author

lakeroe commented May 28, 2018

Sorry for my late reply. I was not at home the last days ...

I could observe the drifting clock on both AVR and ARM based board.
One thing I noticed is to have a clean and stable power supply. When powering the whole setup by a very cheap "usb china power supply" I don't get any reception at all.

As suggested, I will run the debug helper in mode DA and report back.
Furthermore I can offer to send you a complete ARM based affected hardware for your investigations.

I'm quite busy until Sunday but next week we can have a call if you like.

@nameoftherose
If you're interested you can send me your exact location by private mail ...

@nameoftherose

@lakeroe I live in Heraclion Crete (35.332832, 25.121835), 2000km from the DCF77 transmitter. I do not have your email. I am using this library on an UNO on breadboard. My problems were due to signal fading and too much noise (from power supply and the antenna). Since December 17 it is working reasonably reliably.

@lakeroe
Author

lakeroe commented May 29, 2018

Please find another LOG attached (in DA mode).
Interestingly the date is wrong but the time is accurate.

dcf_2018-05-28_ARM.zip

@nameoftherose

nameoftherose commented May 30, 2018

What is the length of your antenna?

@lakeroe
Author

lakeroe commented May 30, 2018

I'm using this DCF77 module and the antenna is about 55 mm long with a diameter of 9 mm.
The reception quality depends very much on the orientation of the antenna ...

@nameoftherose

nameoftherose commented May 30, 2018

This is a graph of the various performance metrics produced by swiss_army_debug_helper.
[graph image: qf]
The phase_lock (marked p) is erratic. Assuming the library has been properly ported, I would say it is noise. Filter power supply - I am using LC filters - even use battery or feed from a laptop operating from its battery. Reset the clock whenever calculated week_day differs from the decoded one.
For my location (2000km from the transmitter) a 55mm antenna is too short, but you are much closer.
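
A sketch of the weekday consistency check suggested above. It assumes the DCF77 weekday numbering of 1 = Monday to 7 = Sunday and Gregorian dates; the function names are hypothetical and the weekday formula is Sakamoto's method, not code taken from the library.

```cpp
// Day of week for a Gregorian date in DCF77 numbering
// (1 = Monday ... 7 = Sunday), using Sakamoto's method.
int dcf77_weekday(int year, int month, int day) {
    static const int offsets[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (month < 3) {
        year -= 1;  // January and February count towards the previous year
    }
    const int dow = (year + year / 4 - year / 100 + year / 400
                     + offsets[month - 1] + day) % 7;  // 0 = Sunday
    return dow == 0 ? 7 : dow;
}

// True if the decoded weekday matches the one calculated from the date.
bool weekday_is_consistent(int year, int month, int day, int decoded_weekday) {
    return dcf77_weekday(year, month, day) == decoded_weekday;
}
```

Such a check cannot catch every bad date. The wrong date 14.7.2053 reported earlier in this thread falls on a Monday, just like the correct date 14.5.2018, so it would have passed if only the date fields were corrupted.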

@lakeroe
Author

lakeroe commented May 30, 2018

Thanks for your analysis, you can clearly see the dependence on the antenna orientation.
At the beginning the phase is locked for some time (antenna is horizontal and pointing towards dcf77 sender).
Then it loses the lock (antenna was turned vertical) and at the end it's locked again (because antenna is horizontal and pointing towards dcf77 sender again).

What does your antenna look like ?

@nameoftherose

I am using the Conrad module. Its antenna is 50mm.

@udoklein
Owner

Looking at the analysis of nameoftherose there is very interesting behaviour. First of all the phase lock fluctuates a lot. Then during the period with the poorest phase the month and day decoder "lock" to the noise. Once the signal gets really good (before 17:00) the other decoders increase in quality. However the mistakenly locked decoders start to converge to the proper value. Hence the quality decreases.

The easiest way to fix this mess is to not allow the "slower" decoders to decode before the fast decoders are ready. That is: require that locks can only be acquired in the order phase, seconds, minutes, hours, days, months, years.
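
The ordering rule could look roughly like this. This is only an illustration of the idea, not the library's implementation: the per-stage quality values, the threshold and the function name allowed_locks are all assumptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Decoder stages ordered from fastest to slowest:
// 0 phase, 1 seconds, 2 minutes, 3 hours, 4 days, 5 months, 6 years.
constexpr std::size_t stage_count = 7;

// A stage may only claim a lock if its own quality reaches the threshold
// and every faster stage before it was already allowed to lock.
std::array<bool, stage_count> allowed_locks(
        const std::array<uint8_t, stage_count>& quality, uint8_t threshold) {
    std::array<bool, stage_count> allowed{};
    bool faster_stages_ready = true;
    for (std::size_t i = 0; i < stage_count; ++i) {
        allowed[i] = faster_stages_ready && quality[i] >= threshold;
        faster_stages_ready = allowed[i];
    }
    return allowed;
}
```

With qualities {10, 10, 10, 2, 10, 10, 10} and a threshold of 5, the hour stage blocks itself and also the day, month and year stages, even though their own quality looks good.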

I think this would fix it. There is a catch though. The current approach allows for significantly faster startup. I think I will fix it by introducing another configuration flag. What do you think about this proposal?

With regards to using the weekday as some kind of checksum: this will unfortunately not fix it. The reason is that if your signal quality is such that this issue happens, then introducing this as a checksum will fix it for 6 out of 7 days of the week. For the remaining day the issue will still be there.

There is one more thing that I could do: I could introduce additional "flat out detection bins". The price to pay is 1-2 additional bytes of sram consumption per decoder. Thus about 12 additional bytes. This would also help to relieve this issue.

Thus the action plan would be as follows:

  1. Add a flag "optimistic decoder" which toggles the way the decoders are working. TRUE = as currently implemented, FALSE = do not allow the slow decoders to decode before the fast ones are at least marginally ready.

  2. If the flag is TRUE use the weekday information for faster transition to "synced", otherwise require improved consistency.

  3. Add a flag "flat out detector". If FALSE go ahead as today. If TRUE add additional detection bins for flat out 0 and flat out 1. (but not for minutes or hours, for those only for flat out 1). If the decoder locks to the flat out signal then void all successive decoder stages and do not allow to sync.

What do you think about this approach? In particular about the choice of the default values? Should I default to maximum robustness (paying with more memory and slower initial sync) or should I default to the optimistic setup and require that people with extra poor reception need to adapt the configuration?

@udoklein
Owner

@nameoftherose By the way: what did you use to plot the log statistics?

@udoklein
Owner

@lakeroe If you send me your hardware then I also might be able to gain more insights. Did I understand it right that you are within <300 km from Frankfurt?

One more question: why are you putting the antenna intentionally into a poor reception position? This is a very interesting approach but I would assume that the antenna is mounted in an optimal orientation and that the library will only have to tackle "other noise". What you are doing is basically raising the noise floor by more than 10 dB, maybe even 20 dB. This is great for testing but close to Frankfurt it is somewhat pointless. So I am wondering why you do this?

BTW: together with the analysis of nameoftherose I think this finally shed some light on why some people reported issues with my library that I was not able to reproduce. Obviously your test approach is a good idea :)

@udoklein
Owner

I analyzed the log and with the hints of the picture by nameoftherose the plot thickens. Your decoder module is biased to all 1. Thus during periods of bad reception it shows a different behaviour than mine. Mine biases to 0. This also explains why I was never able to figure this out. My module is biased differently. I will fix the library. However I am very busy right now. I can not promise that I will be finished in June. This might take some weeks.

@nameoftherose

nameoftherose commented May 31, 2018

The graph is created as follows:

  1. minicom creates a timestamped log
  2. the log is processed with awk -f qf.awk to produce qf.csv
  3. qf.csv is opened in gnumeric.
  4. qf.gnumeric contains links to qf.csv, when opened the graph is shown.
  5. the dates in cells A1, A2 are the graph limits, have to be adjusted.
    This could be done in any spreadsheet.

@nameoftherose

nameoftherose commented May 31, 2018

@udoklein
Yes I think your action plan is very promising.
Take your time.
Thank you.

@lakeroe
Author

lakeroe commented Jun 4, 2018

@udoklein
Regarding your questions

*) I think your plan sounds promising and we should give it a try.
Once the new software is available for sure I can run further tests and post logs.

*) Default values
I would go for maximum robustness because I think it's more important to have a reliable time than to save some bytes on RAM and faster sync.

*) My hardware
Due to the newest findings do you think it's still necessary to send my hardware to you ?
If yes, another possibility is to buy your own hardware. It's just so cheap and probably the same price as sending mine to you and back. The hardware I use is
Pollin module 5,45€ + 4,95€ (shipping)
STM32F103C8 blue pill board 1,85€ from China
Is that okay for you ?

*) Distance from Frankfurt
According to Google Maps I'm 407km away (in Austria near Salzburg).

*) Antenna orientation
I put the antenna in a poor reception position to get a quicker result.
If it's in an optimal position I might never reproduce the drifting time problem ...

*) Time schedule
No hurry and take your time. It's finished when it's finished !

@udoklein
Owner

udoklein commented Jun 4, 2018

Well, I used a Pollin module for my tests --> in my location it is differently biased. As you say most probably I will gain no insights from your hardware. However you could do me a favour. Now that we know what is most likely the issue it would be nice if we can create a log file which will reproduce the issue with my standalone debug helper. Unfortunately the log file with the issue does not. The reason is that the log does not capture the full information prior to the first sync.

There would be two options:

  1. I synthesize this information
  2. Hopefully easier: you create another log just the way you did when you reproduced the error. But this time instead of turning the antenna after the first sync give it another 20-30 minutes to acquire a better lock, then turn the antenna and wait till the issue reoccurs.

According to the theory (2) should still reproduce the error but it should also capture enough signal such that the standalone debug helper will also recreate the issue. This would help me a lot during testing.

Would this be possible for you? I am not in a hurry. If this takes 2 or 3 weeks it would still be in time for testing.

@nameoftherose

@udoklein how does the adjustment value (printed by the Swiss_Army_Debug_Helper in mode Da) relate to crystal accuracy/stability?
For a crystal with Frequency Tolerance ±10ppm and Frequency Stability Over Operating Temperature Range ±10ppm (according to manufacturer), I get an adjustment 39 - 40ppm. Is that reasonable? What adjustment values have been observed?

@udoklein
Owner

Because KST was always a little bit on the slow side when plotting I created an IPython notebook to speed up the processing of the log files. You can find it here.

@udoklein
Owner

udoklein commented Jul 15, 2018

Here are the first 4 plots I generated from lakeroe's logs:
dcf_2018-07-01_arm_aggressive

dcf_2018-07-02_arm_standard

dcf_2018-07-03_arm_conservative

dcf_2018-07-04_arm_paranoid

@udoklein
Owner

@nameoftherose what is acceptable and what not is determined by the datasheet of the crystal. Usually I observe adjustments in the range of +/- 50 ppm or less. Thus 40 ppm is OK. However the algorithm assumes that the crystal is stable. Stability is not checked or tuned by the algorithm. Thus this value gives no indication if it is stable or not.

If your signal quality is somewhat reasonable you can also check stability in mode Ds by visual inspection. If you scroll through the log file and the signal drifts at a more or less constant rate, then the crystal is stable. If the rate is not almost constant then the crystal is unstable.

@nameoftherose

nameoftherose commented Jul 15, 2018

@udoklein thank you.
Signal quality is awful!

@lakeroe
Author

lakeroe commented Jul 16, 2018

@udoklein
Regarding the stability of my oscillator I might do some more testing as you suggested (but can't really promise when).

Regarding the plots you've created from my logs: How do you interpret them ? Is your library working as expected ?

@udoklein
Owner

Looking at the plots the library works as expected. However I also see that I need more sophisticated analysis tools to gain deeper insights. This may take some time.

@lakeroe
Author

lakeroe commented Jul 24, 2018

Due to my poor oscillator long term accuracy I did another test.
On my ARM STM32F103 Blue Pill board I changed the original (probably Chinese) oscillator to this one here and also replaced the original load capacitors with 15 pF ones.

Then I repeated my test from further above. This time the clock was off by about 3 seconds after 86 hours, which equals about 10 ppm. This is an improvement by a factor of 5 and I hope this also leads to an improved and more stable short term accuracy.

Using this change I started further dcf77 receive tests with different antenna positions.

1. conservative_minute_quality

21.07.2018 09:55h Start, antenna horizontal and rotated by 90 degrees towards Frankfurt (best position)
approximately 1/2h after clock_state==synced I rotated the antenna by 90 degrees (worst position)
22.07.2018 20:00h antenna vertical (2nd best position)
23.07.2018 7:00h antenna horizontal and pointing directly towards Frankfurt again (worst position)
23.07.2018 20:25h End

-> No matter how I orientate the antenna, the time is ALWAYS accurate !!!

LOG-File: dcf_2018-07-21_ARM.zip

2. aggressive_minute_quality (old behavior)

23.07.2018 20:48h Start, antenna horizontal and rotated by 90 degrees towards Frankfurt (best position)
approximately 1/2h after clock_state==synced I rotated the antenna by 90 degrees (worst position)
24.07.2018 07:00h clock drift +3s
24.07.2018 16:38h End, decoded date and time completely wrong

LOG-File: dcf_2018-07-23_ARM.zip

-> Time is still drifting and also decoded wrong

Conclusion:
I think this experiment shows the importance of an accurate and stable oscillator and its impact.
Furthermore you can clearly see the improved library works as expected.

@nameoftherose

nameoftherose commented Jul 26, 2018

@lakeroe now that you have a known accuracy crystal, will you please tell me what the output of frequency analysis command Da is (in optimum antenna position)?
@udoklein is direct transition sync ↔ free valid?
@udoklein free → sync is (I think) valid because Quality Factor Sync Threshold: 1, but sync → free ?
By the way the weekday consistency constraint would have prevented the erroneous date time decoding. I might add it when it gets cooler ...

@lakeroe
Author

lakeroe commented Jul 26, 2018

According to another test I have to revise my opinion. Even with paranoid_minute_quality I get wrong date/time.

1. conservative_minute_quality

24.07.2018 20:20h Start, antenna horizontal and rotated by 90 degrees towards Frankfurt (best position)
approximately 1/2h after clock_state==synced I rotated the antenna by 90 degrees (worst position)
25.07.2018 07:05h clock drift +8s
25.07.2018 16:24h End, clock drift +9s

LOG: dcf_2018-07-24_ARM.zip

2. paranoid_minute_quality

25.07.2018 16:30h Start, antenna horizontal and rotated by 90 degrees towards Frankfurt (best position)
25.07.2018 17:08h clock_state==synced
25.07.2018 18:24h antenna rotated by 90 degrees (worst position)
25.07.2018 21:40h clock_state==synced, time accurate
26.07.2018 16:27h End, time and date completely wrong

LOG: dcf_2018-07-25_ARM.zip

@nameoftherose
I can't do any testing on a UNO anymore. The one I had was borrowed and I had to return it.
Here is the output of "Da" (after about 90 minutes with antenna in optimum position):

confirmed_precision ?? adjustment, deviation, elapsed
0.5000 ppm, @+ , 5.8125 ppm, -2 ticks, 17 min + 52999 ticks mod 60000
54/100~0/600

Does this answer your question ?

@nameoftherose

nameoftherose commented Jul 26, 2018

@lakeroe yes thank you! Never mind the UNO.

@udoklein
Owner

@nameoftherose The state engine is implemented in DCF77_Local_Clock::process_1_Hz_tick. The intended state transitions are documented here: https://blog.blinkenlight.net/experiments/dcf77/local-clock/ and the state diagram is here: https://blinkenlightblog.files.wordpress.com/2013/02/c13_dcf77_clock_states.png. In particular there should be no direct transition synced --> free.

However it is possible to transition from synced to useless by calling the setup method after the clock has synced.

@udoklein
Owner

@lakeroe: I was on vacation so I did not follow this thread closely. What is your current state of affairs? Does the library work as expected in the good position or does it fail also in the good position?

@nameoftherose

nameoftherose commented Jul 31, 2018

In file dcf_2018-07-23_ARM.zip line 17406 a direct transition synced → free is recorded.

@udoklein
Owner

You refer to this:

Decoded time: 18-07-23 1 21:31:56 CEST ..
  18.07.23(1,1)21:32:55 CET 1,1 4 p(9000-8644:22) s(191-27:23) m(200-124:10) h(188-122:9) wd(114-82:5) D(190-154:5) M(158-118:5) Y(130-105:3) 72,24,22,24
Clock state: synced
Tick: 2

     2656, XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX---+--------2XXXXXXXXXXXXXXXXXXXX

Decoded time: 18-07-23 1 21:31:57 CEST ..
  ??.??.??(?,?)??:??:255 CET 1,1 0 p(9000-9000:0) s(0-0:0) m(0-0:0) h(0-0:0) wd(0-0:0) D(0-0:0) M(0-0:0) Y(0-0:0) 72,24,22,255
Clock state: free
Tick: 150

This is indeed interesting but admissible. Have a look at https://github.com/udoklein/dcf77/blob/master/dcf77.h#L1570


            if (clock_state == Clock::synced || clock_state == Clock::locked) {
                // the important part is 150 < 200,
                // otherwise it will fall through to free immediately after changing to unlocked
                if (tick >= 1150) {
                    // The 1 Hz pulse was locked but now
                    // it is definitely out of phase.
                    unlocked_seconds = 1;

                    // 1 Hz tick missing for more than 1200ms
                    clock_state = Clock::unlocked;
                    Clock_Controller::phase_lost_event_handler();
                }
            }

            if (clock_state == Clock::unlocked || clock_state == Clock::free) {
                if (tick >= 1000) {
                    tick -= 1000;

That is the state engine will transition from synced to unlocked and then immediately transition from unlocked to free. This happens during one cycle. You can argue if this is a direct transition or not. The explanation is as follows: the phase quality gets very low and the algorithm can not determine anymore if it just lost the phase or if it was already out of phase.

The point is that it tries dead reckoning of the phase in state "unlocked". If this is not possible then it has to default to free. What does this imply? In state "unlocked" it can leverage the knowledge of the approximate time to still reliably synthesize the target signal, thus making it slightly easier to reacquire the signal. If this is not the case then it will more or less "cold start".

It can be argued whether there is a better way to handle the transition from synced to unlocked in order to decrease the likelihood that it defaults to free. This is a very tricky question and in my opinion not really worth the optimization effort. It could be done though.

@nameoftherose

@udoklein thank you.
Transition to free happens only in line 1603 dcf.h, and can only happen
if (unlocked_seconds > max_unlocked_seconds), which is not fulfilled in this case as unlocked_seconds = 1; in line 1576.
But it is a subtle point ...
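
To make the point concrete, here is a minimal, simplified model of the transition, not the library's actual code. The names `ClockModel`, `ClockState` and `step` are hypothetical, and the "increment `unlocked_seconds`, then compare against `max_unlocked_seconds`" step is an assumption based on the condition quoted above. It only illustrates that after `unlocked_seconds = 1` the clock can reach free within the same cycle only if `max_unlocked_seconds` is very small (for example because it was never initialized to a sensible value).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the library's clock states.
enum class ClockState { free_running, unlocked, locked, synced };

struct ClockModel {
    ClockState state;
    uint16_t tick;                  // milliseconds since the last 1 Hz pulse
    uint32_t unlocked_seconds;      // seconds spent without phase lock
    uint32_t max_unlocked_seconds;  // limit before giving up and going to free
};

// One 1 ms step of the simplified state engine (assumed structure).
inline void step(ClockModel &c) {
    ++c.tick;

    if (c.state == ClockState::synced || c.state == ClockState::locked) {
        if (c.tick >= 1150) {
            // The 1 Hz pulse is overdue: the lock is considered lost.
            c.unlocked_seconds = 1;
            c.state = ClockState::unlocked;
        }
    }

    if (c.state == ClockState::unlocked || c.state == ClockState::free_running) {
        if (c.tick >= 1000) {
            c.tick -= 1000;
            assert(c.tick < 1000);  // model invariant: at most one second overdue

            // Assumption: the second counter is advanced, then checked.
            ++c.unlocked_seconds;
            if (c.unlocked_seconds > c.max_unlocked_seconds) {
                c.state = ClockState::free_running;
            }
        }
    }
}
```

With a large `max_unlocked_seconds` the model stays in unlocked after the cycle in which the lock is lost, with `tick` at 150 (this is the "150 < 200" remark in the quoted comment). With `max_unlocked_seconds` at 0 it falls through to free in that same cycle.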

@udoklein
Owner

udoklein commented Aug 1, 2018

Yesterday was too hot in Germany. I should have spotted this on my own. You are of course right. I think this should be handled in a separate issue (#34).

@udoklein udoklein closed this as completed Aug 1, 2018
@udoklein udoklein reopened this Aug 1, 2018
@udoklein
Owner

udoklein commented Aug 1, 2018

@nameoftherose Thanks a lot for noticing this one. I fixed it by now :)

@nameoftherose

@udoklein , I should have also spotted that this was caused by the uninitialized max_unlocked_seconds, it is too hot in Crete too ...

@lakeroe
Author

lakeroe commented Aug 9, 2018

@udoklein
Sorry for my late answer but I've also been on vacation.
Concerning my current state of affairs:
I've been running a test for about 27 hours. The antenna was in a good position the whole time and I didn't touch the setup. In the beginning everything was fine.
After about 21 hours I noticed a clock drift of about 30 seconds.
After about 27 hours the clock was in sync again but the output of Da showed

confirmed_precision ?? adjustment, deviation, elapsed
0.5000 ppm, @- , 400.0000 ppm, 7829 ticks, 332 min + 54830 ticks mod 60000
0/100~0/600

before it showed

confirmed_precision ?? adjustment, deviation, elapsed
0.5000 ppm, @+ , 3.9375 ppm, 50 ticks, 20 min + 4051 ticks mod 60000
54/100~0/600

LOG-file: dcf_2018-08-08_ARM.zip

So I can't really be satisfied.
Do you still think this is a problem of my oscillator ?
Could you please also describe in a few words the difference between the displayed 0.5000 ppm and 3.9375 ppm ?

@udoklein
Owner

udoklein commented Aug 9, 2018

1 ppm = one part per million = 1 / 1 000 000 = 1e-6
Here the ppm refers to the local oscillator frequency. E.g. if your (crystal) oscillator were running at 16 MHz, then 1 ppm = 16 Hz.
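
As a small worked example of this arithmetic (a sketch only; the helper names `ppm_to_hz` and `drift_to_ppm` are made up for illustration and are not part of the library): a drift of about 30 seconds accumulated over about 21 hours, if it were spread evenly over that time, would correspond to an average frequency error of roughly 400 ppm.

```cpp
#include <cassert>

// Frequency offset in Hz that corresponds to `ppm` at a nominal frequency.
constexpr double ppm_to_hz(double nominal_hz, double ppm) {
    return nominal_hz * ppm * 1e-6;
}

// Average frequency error in ppm implied by a time drift accumulated
// over an elapsed interval (both given in seconds).
inline double drift_to_ppm(double drift_seconds, double elapsed_seconds) {
    assert(elapsed_seconds > 0.0);
    return drift_seconds / elapsed_seconds * 1e6;
}

// Example values taken from this thread:
//   ppm_to_hz(16e6, 1.0)              -> about 16 Hz
//   drift_to_ppm(30.0, 21.0 * 3600.0) -> about 397 ppm
```

For comparison, ordinary quartz crystals are typically specified in the range of tens of ppm, so an average error of this size is far outside what a healthy crystal oscillator should show.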

What you describe is a very clear indicator of a large clock drift. In my opinion your local oscillator is unstable. If you scroll through your file then you can even see (looking at the "scope" output) that your clock is stable for a while and then starts to drift at an excessively high rate. This looks a lot like a way out of spec oscillator. Unless you can bring up a really convincing argument why your oscillator is good, I would say the oscillator is at fault with a probability of >95%.

If you do not have a frequency counter to check if the oscillator is good you could benchmark it with a second Arduino like this: https://blog.blinkenlight.net/experiments/measurements/crystal-deviations/ or like that https://blog.blinkenlight.net/experiments/measurements/crystal-deviations-2/.

In my opinion this issue is a hardware issue and there will be no software fix for it at all.

@lakeroe
Author

lakeroe commented Aug 10, 2018

Because I've already tested an original Arduino Uno and a few different STM32F103C8T6 boards and they all behaved pretty much the same, I can hardly believe it's an oscillator problem.
So I checked again whether I ported the library properly to the STM32 and found the following issue:
want_high_phase_lock_resolution was set to false (10 ms sample resolution), but dcf77_process() was called every 1ms

Do you think this explains anything ?

Maybe you could also have a look at the changes I made to the library (they are marked by * lakeroe *) ...
dcf77_lakeroe.zip

If this all does not help, are you interested in finding the root cause of the problem ?
I could send you an STM32-board+programming adapter and a DCF77 receiver ...

@udoklein
Owner

The Uno is officially not supported because it does not feature a crystal oscillator for the controller. The STM32 is not officially supported either. The point is that for officially supported hardware I got a similar issue only once, and it was caused by the oscillator.

Notice that the (in)stability of the crystal oscillator is not only affected by the crystal but also by other parts, in particular the load capacitors. With cheap boards the oscillator design might be suboptimal.

If you want me to analyze the issue please send me a board + programming adapter. My standard policy is to NOT return boards I get sent for testing, in particular if they are not officially supported. If this is OK for you, send me the hardware. Otherwise I highly recommend getting a frequency counter and checking the hardware.

@nameoftherose

@lakeroe, do you have a commercial DCF clock? How does it perform relative to the board? Can you experiment with an officially supported board?
I have been running the library on an ATmega on a breadboard with a run-of-the-mill crystal and capacitors almost continuously for the last 8 months. I am located 2200 km from the transmitter. I have observed just 2 cases of hour errors, and the clock corrected itself within an hour or so. The error between clock and computer was never more than 200 ms at the end of free periods. It reaches synced state every night for 6-10 hours.
The only special thing is that the board power supply (from USB) is passed through LC filters.

@lakeroe
Author

lakeroe commented Aug 11, 2018

For now I will not do any further testing, but maybe I will check my local oscillator in the future.
The library works for me by just resetting it every time a valid time is received.

Anyway, thanks for your comments and help so far ...


5 participants