Robot lurches while trying to balance #101

Open
va3wam opened this issue Jan 13, 2021 · 5 comments
Labels: enhancement (New feature or request)

@va3wam
Owner

va3wam commented Jan 13, 2021

Implement error counter logic as follows:

Create a reporting control variable with three modes: -1 = report each counter as it increments, in real time; 0 = never report counter increments unless specifically asked; any other value = the number of seconds between timed updates to the MQTT broker.
Also, have an MQTT command that causes all health counters to be sent at once to the broker.
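
A minimal sketch of how the three reporting modes could be evaluated, assuming a millisecond clock such as Arduino's `millis()`. The names `HealthReporter`, `reportMode`, `shouldReportOnIncrement`, and `timedReportDue` are hypothetical and not taken from the project source; the MQTT publish itself is left out.

```cpp
#include <cstdint>

// Hypothetical names throughout; the issue only specifies the three modes.
constexpr int32_t REPORT_REALTIME = -1;  // report every increment as it happens
constexpr int32_t REPORT_ON_DEMAND = 0;  // report only when explicitly asked

struct HealthReporter {
    int32_t reportMode = REPORT_ON_DEMAND;  // -1, 0, or seconds between reports
    uint32_t lastReportMs = 0;              // time of the last timed report

    // Call when a counter increments: true if it should be published now.
    bool shouldReportOnIncrement() const {
        return reportMode == REPORT_REALTIME;
    }

    // Call from the main loop: true when a timed report is due.
    bool timedReportDue(uint32_t nowMs) {
        if (reportMode <= 0) {
            return false;  // timed reporting is off in modes -1 and 0
        }
        const uint32_t intervalMs = static_cast<uint32_t>(reportMode) * 1000u;
        // Unsigned subtraction stays correct across clock rollover.
        if (nowMs - lastReportMs < intervalMs) {
            return false;
        }
        lastReportMs = nowMs;
        return true;
    }
};
```

The "send all health counters at once" MQTT command would bypass both checks and publish unconditionally.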

@va3wam va3wam added the enhancement New feature or request label Jan 13, 2021
@va3wam va3wam added this to the Get Robot Balancing milestone Jan 13, 2021
@nerdoug
Collaborator

nerdoug commented Jan 15, 2021

I've added code to display the cumulative counters for fault interrupts from the left and right DRV8825 controllers. On my TWIPe clone, the right DRV generates a fault about once a second (which is also the OLED refresh rate). The other DRV hasn't generated any faults yet. Right-side faults are generated even if the bot isn't trying to balance. If you reset it while it's lying on its back, bring it up to 30 degrees from vertical where it clenches its wheels, then set it on its back again, the right-side fault counter continues to climb.

Next steps:
- Swap the left and right DRVs and see what happens.
- Recheck the Vref setting on both DRVs; it should be 0.85 V.
- Put in a new DRV to replace the one that's generating faults.
- Investigate what the software is doing to the controller while the bot is lying on its back.
- Consider possible physical pressure by the CPU console cable on the DRV right below it.

@nerdoug
Collaborator

nerdoug commented Jan 16, 2021

The right-side DRV fault counter seems to increment whenever there is activity on either motor. I've verified that Vref is 0.85 V on both DRVs, swapped them, and got the same result: faults are counted for the right-side DRV when either motor is activated. I need to verify the associations between DRV 1 & 2, left and right, and the physical and GPIO pin numbers used for fault interrupts.

From circuit board silkscreen info, and continuity testing on the board:

  • DRV2 is the one closest to a corner of the circuit board
  • DRV2 connects to the motor on TWIPe's left (same as amber pushbutton)
    (above are correctly documented in SB7D-stepper-wiring.odg)
  • DRV2's fault pin connects to GPIO pin 32, physical pin 20 on the CPU
  • DRV1's fault pin connects to GPIO pin 13, physical pin 25 on the CPU
    (above is correct in huzzah32_pins.h)
    (I've added more info to sb7D-pinouts-CPU.ods)
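
The associations above can be written down as a small lookup table, which makes it easy to check which DRV and which side a given fault GPIO belongs to. This is only an illustrative sketch: the struct and function names (`DrvWiring`, `kDrvWiring`, `findByFaultGpio`) are hypothetical, while the pin numbers are the ones listed in this comment.

```cpp
#include <cstdint>

// Wiring as recorded in this comment; names are hypothetical.
struct DrvWiring {
    uint8_t drv;           // DRV number on the silkscreen
    const char* side;      // which of TWIPe's motors it drives
    uint8_t faultGpio;     // ESP32 GPIO number of the fault line
    uint8_t faultPhysPin;  // physical pin on the CPU board
};

constexpr DrvWiring kDrvWiring[] = {
    {1, "right", 13, 25},
    {2, "left", 32, 20},
};

// Return the entry whose fault line is on the given GPIO, or nullptr if none.
inline const DrvWiring* findByFaultGpio(uint8_t gpio) {
    for (const DrvWiring& w : kDrvWiring) {
        if (w.faultGpio == gpio) {
            return &w;
        }
    }
    return nullptr;
}
```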

Reviewing the source code:

  • the gp_DRV2_FAULT gpio pin is attached to the leftDRV8825fault ISR, which is correct
  • that ISR correctly increments health.leftDRVfault
  • above is correct for the DRV1 / right as well
  • the right eye display routine (actually in updateLED() ) outputs 3 numbers on the 4th line, separated by stars:
    percent time in MQTT routines, health.leftDRVfault, health.rightDRVfault
  • it's the last number that's incrementing, i.e. the one for DRV1, right side, GPIO pin 13, physical pin 25
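
For reference, the counting and display path described above can be sketched without any hardware. `health.leftDRVfault`, `health.rightDRVfault`, and `leftDRV8825fault` are names from the source; `rightDRV8825fault`, the `Health` struct layout, and `formatStatusLine` are assumptions made for illustration. On the real board the two handlers would be attached to the fault GPIOs as interrupt service routines, which is not shown here.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Assumed layout; only the two member names come from the issue text.
struct Health {
    volatile uint32_t leftDRVfault = 0;   // faults seen on DRV2 (left motor)
    volatile uint32_t rightDRVfault = 0;  // faults seen on DRV1 (right motor)
};

Health health;

// Handlers kept minimal, as an ISR should be: increment and return.
void leftDRV8825fault() { health.leftDRVfault = health.leftDRVfault + 1; }
void rightDRV8825fault() { health.rightDRVfault = health.rightDRVfault + 1; }

// Build the 4th display line: MQTT percent, left faults, right faults,
// separated by stars.
std::string formatStatusLine(int mqttPercent, const Health& h) {
    char buf[40];
    std::snprintf(buf, sizeof buf, "%d*%lu*%lu", mqttPercent,
                  static_cast<unsigned long>(h.leftDRVfault),
                  static_cast<unsigned long>(h.rightDRVfault));
    return std::string(buf);
}
```

With this layout, a climbing last field means the handler for DRV1 (right side) is the one firing.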

So, the faults don't follow the physical DRV chip, but are always reported for DRV1. I don't see any software bugs that would cause incorrect fault counting. Thus I turn to hardware causes, and I see notations in several places that using GPIO 13 may conflict in some way with the onboard LED. I think 13 was used due to circuit board layout constraints before we started using a double-sided layout.

I'll investigate possible board mods to move the DRV1 fault line to a different CPU pin. Candidates are:
GPIO 26, physical 5
GPIO 34, physical 7 (input only pin, but that's OK)
GPIO 36, physical 9 (input only pin, but that's OK)
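
One practical point when weighing these candidates: on the ESP32, GPIO 34 through 39 are input-only and have no internal pull-up or pull-down resistors. If the fault line is an open-drain signal (as the DRV8825 datasheet describes its fault output), an external pull-up would be needed on those pins unless the board already provides one. The helper below is a hypothetical sketch of that check, not project code.

```cpp
#include <cstdint>

// ESP32: GPIO 34-39 are input-only and lack internal pull resistors.
constexpr bool isInputOnlyGpio(uint8_t gpio) {
    return gpio >= 34 && gpio <= 39;
}

// Assumption: an open-drain fault line needs a pull-up from somewhere.
// True when the internal pull-up is not available on this pin.
constexpr bool needsExternalPullup(uint8_t gpio) {
    return isInputOnlyGpio(gpio);
}
```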

@va3wam
Owner Author

va3wam commented Jan 16, 2021 via email

@va3wam
Owner Author

va3wam commented Jan 18, 2021

Mods from Doug have been applied to both robots. Initial testing looks good (at least the error counters do not climb any more). We still need to be sure that error counts do not climb during an extended PID tuning session. The fix was to stop using GPIO 13. The Huzzah32 onboard LED circuitry may have been interfering with the fault signal on that pin.

@nerdoug
Collaborator

nerdoug commented Jan 19, 2021

I tried to reproduce lurching behaviour after fixing DRV1 fault / CPU LED conflict, and didn't really come to a conclusion. However, I came across another problem that could cause lurching behaviour, which is described in this email:

From: Doug Elliott canoe.eh@gmail.com
Date: Mon, Jan 18, 2021 at 11:04 PM
Subject: Twipe Performance
To: Andrew Mitchell va3wam@gmail.com
Cc: Doug Elliott canoe.eh@gmail.com

I was starting to look at Twipe's tendency to lurch occasionally, and captured the telemetry data from a balance run to a copy of the spreadsheet.
The data shows a couple of times where the time between readIMU calls was much bigger than the expected 12 msec. I decoded the runbit info, and each time, updateLED, which is runbit 24, had just executed. When I looked at the timestamps, they were indeed a second apart. Except there was one similar case which was off by a half second. This vaguely rang a bell, and sure enough, in loop(), there's a routine call, every half second, that updates the network info in the left eye alongside the call to updateLED().

This might be a factor in our lurching behaviour, but I seem to remember that we saw lurching before I added the CPU usage display. Maybe the network display was enough to cause trouble?
Anyway, I have some ideas on how to fix this, and will try to put some code together tomorrow. 
I'll attach the spreadsheet in case you want to poke around in it. The CPU display updates are in rows 20, 102, 184 (exactly 82 rows apart).

The netinfo displays happen at each of the above, plus at rows 60 and 142 (also exactly 82 rows apart).
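
The row spacing is consistent with the timing: 82 rows at the expected 12 msec per readIMU call is 984 msec, or roughly one display refresh per second. A minimal sketch of how such late samples could be flagged automatically in captured telemetry follows; the function name `findLateSamples` and the choice of threshold are assumptions, not part of the project.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Return the indices of samples that arrived more than thresholdMs after
// the previous sample. Timestamps are in milliseconds, in capture order.
std::vector<std::size_t> findLateSamples(
        const std::vector<uint32_t>& timestampsMs, uint32_t thresholdMs) {
    std::vector<std::size_t> late;
    for (std::size_t i = 1; i < timestampsMs.size(); ++i) {
        const uint32_t gap = timestampsMs[i] - timestampsMs[i - 1];
        if (gap > thresholdMs) {
            late.push_back(i);  // this sample followed an oversized gap
        }
    }
    return late;
}
```

A threshold of about twice the nominal period (24 msec here) would ignore ordinary jitter while still catching the gaps that follow a display update.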

recording ideas before I forget them:

  • have netinfo and CPU display run in alternate half seconds
  • reduce complexity of both displays     
    • would constant width font reduce overhead to build the display?
  • see if there's a way to overwrite part of the OLED rather than complete rewrite     
    • if so, have smaller sequential display tasks, allowing IMU to be serviced between them
  • use FreeRTOS to give IMU routine priority with pre-emption
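
The first idea in the list, running the two displays in alternate half seconds, could look roughly like the sketch below. It assumes a millisecond clock such as `millis()` and that each display then refreshes once per second; `DisplayTask`, `DisplayScheduler`, and `poll` are hypothetical names.

```cpp
#include <cstdint>

enum class DisplayTask { None, NetInfo, CpuUsage };

// Hands out at most one display task per 500 ms slot, alternating between
// the two displays so they never run in the same pass through loop().
struct DisplayScheduler {
    uint32_t lastSlot = UINT32_MAX;  // sentinel: no slot serviced yet

    DisplayTask poll(uint32_t nowMs) {
        const uint32_t slot = nowMs / 500u;
        if (slot == lastSlot) {
            return DisplayTask::None;  // this half second is already handled
        }
        lastSlot = slot;
        // Even slots update netinfo, odd slots update the CPU display.
        // At the clock rollover (about 49.7 days) one task may repeat once.
        return (slot % 2u == 0u) ? DisplayTask::NetInfo : DisplayTask::CpuUsage;
    }
};
```

Calling `poll()` once per pass through `loop()` means a single IMU cycle is delayed by at most one display update rather than two.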

bal-0118-2208.zip

Cheers,
