RFC: tessel unbrick #493

Closed
johnnyman727 opened this issue Dec 17, 2015 · 42 comments · Fixed by #744

Comments

@johnnyman727
Contributor

I think we should add a command to unbrick a Tessel if either the openwrt image or the firmware image is somehow corrupted.

A corrupted openWRT image can be easily fixed with a script like this one because @kevinmehall already wrote a utility to write directly to the OpenWRT Flash from the coprocessor. We could either port it to JS or just include that linked Python file directly. We will need to place the u-boot.bin image either on our build server or ship it with the CLI.

A corrupted coprocessor image can be fixed using openocd from the OpenWRT image but this is more involved and you would need to have LAN access to OpenWRT. openocd would have to be installed with opkg (or included by default). Then we would use openocd to load an image file. Basic instructions for this can be found here.

We can do the same as the above for a corrupted bootloader by substituting the bootloader image.

I suggest the following commands:

tessel unbrick --openwrt // Unbricks the Linux bootloader and file system
tessel unbrick --firmware // Unbricks the coprocessor firmware
tessel unbrick --bootloader // Unbricks the coprocessor bootloader
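A minimal sketch of how the proposed flags could map to recovery targets, assuming a parser that turns each flag into a boolean on an `opts` object. `UNBRICK_TARGETS` and `resolveUnbrickTarget` are hypothetical names, and requiring exactly one flag per run is an assumption of this sketch, not a decided behavior.

```javascript
// Hypothetical sketch: the three proposed flags and what each one restores.
const UNBRICK_TARGETS = {
  openwrt: 'Linux bootloader (u-boot) and file system',
  firmware: 'coprocessor (SAMD21) firmware',
  bootloader: 'coprocessor (SAMD21) bootloader',
};

// Return the single requested target. Zero or several flags are rejected,
// since each flag is a separate recovery procedure (an assumption here).
function resolveUnbrickTarget(opts) {
  const requested = Object.keys(UNBRICK_TARGETS).filter((key) => opts[key] === true);
  if (requested.length !== 1) {
    throw new Error('Specify exactly one of --openwrt, --firmware or --bootloader');
  }
  return requested[0];
}

const target = resolveUnbrickTarget({ openwrt: true });
console.log(`${target}: ${UNBRICK_TARGETS[target]}`);
```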
@rwaldron
Contributor

rwaldron commented Apr 2, 2016

We should start talking about this now that boards are shipping.

@johnnyman727
Contributor Author

👍 Building this feature could be a fun first project for anyone that wants to start getting involved. I also welcome thoughts on the CLI API I wrote about above.

@Student007
Member

short read:

  • make a pulldown list of /dev/tty.usb* devices (similar to t2 list)
  • suggest brew install dterm (if Homebrew is installed; otherwise offer alternatives)
  • ship the unbrick images with the CLI (at ~/.tessel/unbrick/)
  • trigger an update afterwards
  • interactive solution: push and hold the button while plugging in USB to enter DFU mode (should work with the menu library we use)

long read:

I successfully unbricked my Tessel 2 (thanks to the docs @johnnyman727 and @kevinmehall published),
and I think the shortest solution would be to add the suggested command, ship a prepared image with the CLI, and trigger an update after successfully unbricking the Tessel 2. The problem is getting people to install dterm on macOS: a simple brew install dterm followed by looking at /dev/tty.usbmodem... is not ideal. Maybe node could read the available USB TTYs (by simply reading the /dev/tty.usb* file list) and build a menu like t2 list to choose which port to use for executing the chain of unbrick commands. The firmware part is easier with dfu-util -aFlash -d 1209:7551 -D ./tessel/backup/firmware.bin (suggested path for storing images).
But it needs to be an interactive, step-by-step solution like "please unplug your Tessel, then hold down the little button while plugging it back into USB - it will blink red... now press the special any key on your keyboard"...

@kevinmehall
Member

What does dterm/the serial console have to do with unbricking? flash.py accesses the USB device the same way as the CLI does, and if it were rewritten with node-usb, it could use the existing device detection logic in the CLI. The equivalent of dfu-util is already in the normal CLI process for a samd21 update.

I wouldn't worry about automating unbricking the samd21 bootloader using openocd. If that ever breaks when someone isn't trying to mess with the bootloader, that's a major problem. The normal update process doesn't touch it.
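A minimal sketch of the kind of device detection meant here, assuming a Tessel 2 enumerates with the same 1209:7551 vendor:product ID that the dfu-util command above targets, and that devices expose node-usb's `deviceDescriptor` fields. `isTessel` and `findTessels` are hypothetical helpers, exercised on plain stand-in objects so the sketch runs without hardware.

```javascript
// Vendor and product ID taken from the dfu-util command (-d 1209:7551).
const TESSEL_VID = 0x1209;
const TESSEL_PID = 0x7551;

// True if a node-usb style device object matches the Tessel IDs.
function isTessel(device) {
  const desc = device.deviceDescriptor;
  return desc.idVendor === TESSEL_VID && desc.idProduct === TESSEL_PID;
}

function findTessels(devices) {
  return devices.filter(isTessel);
}

// With node-usb installed, the real list would come from
// require('usb').getDeviceList(); these are stand-in objects.
const fakeDevices = [
  { deviceDescriptor: { idVendor: 0x1209, idProduct: 0x7551 } },
  { deviceDescriptor: { idVendor: 0x05ac, idProduct: 0x8262 } },
];
console.log(findTessels(fakeDevices).length);
```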

@kevinmehall
Member

Jon's instructions are for updating a device from a samd21 bootloader, firmware, and openwrt that were too old to be compatible with recent tools, and don't have much to do with recovering a device from a state that could happen in a user's hands. There are probably fewer than 5 Tessels in the world where those instructions are a reasonable thing to do.

@johnnyman727
Contributor Author

+1 to Kevin's comments.

We should either import flash.py into the CLI or port it to JS. It works as-is without requiring dterm or dfu-util; it's a different process.

@Student007
Member

@kevinmehall thanks for the clarification. I think I get the point and the task for this issue now. Could you outline how a user could brick one or all three parts, and which way is the right one to fix each? Then I'll get the point too 😄

@Student007
Member

In my case I changed some passwords on the Tessel 2, which seems to hang the t2-cli. Only dfu-util solved this.

@Student007
Member

I know there was a root password somewhere, but I couldn't find it again in the CLI code.

@johnnyman727
Contributor Author

@Student007 you could corrupt either the samd21 firmware or the mediatek firmware by sending improperly generated binaries or by interrupting the standard update process while the memory is being overwritten. We don't expose any standard tools for updating the bootloader, but you could mess it up by using a standard SWD programmer to erase the memory. If you do that, you're probably screwed.

In my case I changed some passwords on the Tessel 2, which seems to hang the t2-cli. Only dfu-util solved this.

I'm not sure what you mean here. The only thing shared on the T2 is your public SSH key. What were the symptoms you experienced?

@Student007
Member

What were the symptoms you experienced?

It hangs forever waiting for open Tessel connections while provisioning. This also happened with t2 list whenever the bricked Tessel was plugged in.

DEBUG=* t2 provision
INFO Looking for your Tessel...
discovery Will scan for USB devices +0ms
discovery:usb Device found. +3ms
discovery Connection opened: +8ms undefined
discovery Fetching name: +0ms undefined
commands:usb [ 'uci', 'get', 'system.@System[0].hostname' ] +1ms
usb_process Opening process for +5ms ucigetsystem.@System[0].hostname
discovery Timeout hit! Waiting for pending to finish... +2s

@johnnyman727
Contributor Author

@Student007 I think we're getting off topic here so I'll continue the discussion with you on Slack. I know what you ran into.

@majgis

majgis commented Apr 12, 2016

@johnnyman727 Thanks again for your help unbricking my t2, discussed here.

I am interested in working on this project, please advise.

Given that libusb is required and may not be present, I'm thinking your idea to port it to node is best, so we can skip juggling python dependencies on top of node dependencies.

@johnnyman727
Contributor Author

@majgis fantastic. For now, let's just implement the tessel unbrick --openwrt command since that's all the script you ran does.

You should be able to do it with dependencies already installed with the t2-cli. The usb module has an API nearly identical to the Python usb module used in the script, so porting it should be pretty simple. Let me know if you have any questions about that.

You'll probably want to download the u-boot and squashfs images only when the command is run, rather than adding them directly to the project, because they are ~20MB. Just fetch them from the URL I gave you on the forums for now.
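One way to download the images only when the command runs is to wrap the download in a memoized, lazy getter. This is a sketch under assumptions: `lazyImages` is a hypothetical helper, and `download` stands in for whatever actually fetches the build from the URL.

```javascript
// Sketch: fetch the large restore images only when `unbrick` runs, and at
// most once per process. `download` is a hypothetical injected function
// (for example, an HTTP GET of the build URL) that returns a Promise.
function lazyImages(download) {
  let pending = null;
  return function getImages() {
    if (!pending) {
      pending = download().catch((err) => {
        pending = null; // forget the failure so a later call can retry
        throw err;
      });
    }
    return pending;
  };
}

// Usage with a fake downloader that counts how often it is called.
let downloads = 0;
const getImages = lazyImages(() => {
  downloads += 1;
  return Promise.resolve({ uboot: Buffer.alloc(4), squashfs: Buffer.alloc(8) });
});
Promise.all([getImages(), getImages()]).then(() => console.log(downloads));
```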

Our general workflow for adding a command is to first add logic to the parser, which then delegates to the controller to actually get the task done. The controller will almost always call Tessel.get to get access to a connected Tessel (make sure it's connected via USB by setting opts.usb = true before passing opts to Tessel.get), and then tessel.connection.device will give you access to the equivalent of the dev object used in the Python script.
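A minimal sketch of that parser-to-controller flow. The `Tessel` object below is a hypothetical stub that only records the options it receives; it is not t2-cli's real `Tessel.get`, and `controller.unbrick` is an illustrative name. The sketch copies `opts` rather than mutating it, which is a design choice of the sketch.

```javascript
// Hypothetical stand-in for t2-cli's Tessel.get; not the real module.
const Tessel = {
  lastOpts: null,
  get(opts) {
    this.lastOpts = opts; // recorded so the usage below can inspect it
    // The real Tessel.get finds and connects to a device; the stub returns
    // an object with the same `connection.device` shape.
    return Promise.resolve({ connection: { device: { id: 'fake-usb-device' } } });
  },
};

// Controller: force a USB connection, then hand back the raw USB device,
// the equivalent of `dev` in the Python script.
const controller = {
  unbrick(opts) {
    const usbOpts = Object.assign({}, opts, { usb: true });
    return Tessel.get(usbOpts).then((tessel) => tessel.connection.device);
  },
};

// The parser would call the controller with the parsed CLI options.
controller.unbrick({ openwrt: true }).then((dev) => console.log(dev.id));
```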

Let me know if any of that was confusing. I'm excited that you're working on this feature and will make sure you have everything you need to succeed.

@majgis

majgis commented Apr 12, 2016

That sounds straightforward. I'll get a branch going and report back when I have questions, most likely this weekend.

@majgis

majgis commented Apr 16, 2016

Good news, I successfully rebricked my tessel. I will pretend the python script doesn't exist, so my only hope is to add this feature.

Calling Tessel.get returns an error, although the connection is created before the error occurs. I will try cutting out the sickly middleman and getting the connection directly.

@majgis

majgis commented Apr 16, 2016

Let me know if you dislike how I gained access to just the connections:
majgis@df64dd0

I will continue on now, but I can rework this step if you want something different.

@majgis

majgis commented Apr 16, 2016

@johnnyman727 would it be possible to replace the following file with a .tar.gz?
https://s3.amazonaws.com/builds.tessel.io/custom/new_build_next.zip

If yes, I can extend the logic in the update-fetch.js without having to write something separate, and it looks like I'd need to add a library if I am to stream the zip into memory as you did in that case.
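For context on the .tar.gz request: Node's built-in `zlib` module can gunzip data as it streams, while Node has no built-in zip reader. A minimal sketch of buffering a gzip stream in memory follows; `gunzipStreamToBuffer` is a hypothetical helper, and splitting the resulting tar archive into individual files is left to a tar parser, which is not shown.

```javascript
const zlib = require('zlib');
const { Readable } = require('stream');

// Decompress a gzip stream (e.g. an HTTP response body) into one Buffer,
// using only Node's built-in zlib. Tar unpacking is a separate step.
function gunzipStreamToBuffer(source) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    const gunzip = zlib.createGunzip();
    source.on('error', reject);
    gunzip.on('error', reject); // e.g. the input is not valid gzip
    gunzip.on('data', (chunk) => chunks.push(chunk));
    gunzip.on('end', () => resolve(Buffer.concat(chunks)));
    source.pipe(gunzip);
  });
}

// Usage with an in-memory stand-in for the download stream.
const compressed = zlib.gzipSync(Buffer.from('hello tessel'));
gunzipStreamToBuffer(Readable.from([compressed])).then((buf) => {
  console.log(buf.toString());
});
```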

@johnnyman727
Contributor Author

Let me know if you dislike how I gained access to just the connections:
majgis/t2-cli@df64dd0

It's not exactly how I would have done it. I think the easier way might be to just pass opts.usb = true to Tessel.get and then use the tessel.connection object. What is the error that you were getting before?

@johnnyman727 would it be possible to replace the following file with a .tar.gz?
https://s3.amazonaws.com/builds.tessel.io/custom/new_build_next.zip

Sure thing! I've uploaded it so it should be available here: https://s3.amazonaws.com/builds.tessel.io/custom/new_build_next.tar.gz

@majgis

majgis commented Apr 18, 2016

The error occurs on the call to tessel.connection.open() in discover.js:
https://github.com/tessel/t2-cli/blob/master/lib/discover.js#L35

Digging deeper, the error is "LIBUSB_ERROR_PIPE" here:
https://github.com/tessel/t2-cli/blob/master/lib/usb-connection.js#L163

I thought of returning a tessel in a partially functional state, but then I worried about correcting the error handling for all the other commands that expect it to error in this way.

Thank you for uploading the file!

@kevinmehall
Member

You'll have to diverge from the normal path before the call to setAltSetting, because that's what selects whether to put the coprocessor into the mode to communicate with the SoC or the mode to write the flash.

@majgis

majgis commented Apr 18, 2016

@kevinmehall Thank you, your comment was very helpful.

I will use an altSetting option, which can ultimately be passed to the setAltSetting method on the usbConnection, to pull out a tessel instance with the connection in the correct state.

Questions:

  1. Why is setAltSetting setting to 0 (already the default setting it looks like) before setting to 2?
    https://github.com/tessel/t2-cli/blob/master/lib/usb-connection.js#L178
  2. The value 0 for altSetting is the correct one to write to flash?
  3. The then statements following setAltSetting should be the same for an altSetting of 0 and 2:
    https://github.com/tessel/t2-cli/blob/master/lib/usb-connection.js#L133

@kevinmehall
Member

kevinmehall commented Apr 18, 2016

Why is setAltSetting setting to 0 (already the default setting it looks like) before setting to 2?

Not sure. The OS should reset it to 0 when the interface is released, but that will make sure the endpoints get reset if it is closed and re-opened without releasing the interface. Or maybe working around an OS X bug?

The value 0 for altSetting is the correct one to write to flash?

0 disables the interface, 1 is flash mode, and 2 is pipe mode. https://github.com/tessel/t2-firmware/blob/master/firmware/usb.c#L31
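The three modes can be captured as named constants. A sketch, assuming the `altSetting` option proposed earlier in this thread (threaded through to `setAltSetting`); `ALT_SETTING` and `resolveAltSetting` are hypothetical names.

```javascript
// Alternate settings of the coprocessor's USB interface, as described
// above. The constant and function names are hypothetical.
const ALT_SETTING = Object.freeze({
  DISABLED: 0, // interface disabled
  FLASH: 1,    // flash mode: write the SoC's flash directly (unbrick)
  PIPE: 2,     // pipe mode: normal communication with OpenWrt
});

// Resolve an optional `altSetting` option (an assumed option name);
// normal commands leave it unset and get pipe mode.
function resolveAltSetting(opts) {
  if (opts.altSetting === undefined) {
    return ALT_SETTING.PIPE;
  }
  if (!Object.values(ALT_SETTING).includes(opts.altSetting)) {
    throw new RangeError(`Unknown altSetting: ${opts.altSetting}`);
  }
  return opts.altSetting;
}

console.log(resolveAltSetting({}));
console.log(resolveAltSetting({ altSetting: ALT_SETTING.FLASH }));
```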

The then statements following setAltSetting should be the same for an altSetting of 0, 1, and 2:

The endpoints are the same, but the protocol is entirely different. For details on the protocol used on the interface 1 endpoints:

@johnnyman727
Contributor Author

Why is setAltSetting setting to 0 (already the default setting it looks like) before setting to 2?

Yes, this was added because OSX has a bug where it doesn't release the interface in the event of a crash or process close.
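The reset-then-select sequence can be sketched with a promise wrapper around node-usb's callback-style `Interface.setAltSetting(altSetting, callback)`. `setAltSettingAsync` and `selectMode` are hypothetical helpers, and the fake interface only records the calls so the sketch runs without a device.

```javascript
// Promise wrapper around a node-usb style, callback-based setAltSetting.
function setAltSettingAsync(iface, alt) {
  return new Promise((resolve, reject) => {
    iface.setAltSetting(alt, (err) => (err ? reject(err) : resolve()));
  });
}

// Go through 0 first so the endpoints are reset even if a previous process
// (e.g. one that crashed on OS X) never released the interface, then
// select the requested mode.
function selectMode(iface, alt) {
  return setAltSettingAsync(iface, 0).then(() => setAltSettingAsync(iface, alt));
}

// Usage with a fake interface that records the alt settings it receives.
const calls = [];
const fakeIface = {
  setAltSetting(alt, cb) {
    calls.push(alt);
    setImmediate(cb, null);
  },
};
selectMode(fakeIface, 1).then(() => console.log(calls));
```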

@johnnyman727
Contributor Author

@majgis just wanted to check if you were able to make any progress with this. Can I help with anything? We've got another person on the forums with a corrupted image and they'll need this tool as well.

@majgis

majgis commented Apr 26, 2016

I made it as far as getting a tessel instance in the correct flash mode and downloading the needed files into buffers (https://github.com/majgis/t2-cli/commits/unbrick-rb1). I'm ready to write the files to the tessel, but I'll admit I'm out of my element, learning as I go on this last part.

If someone wants to take it so it can get done faster, that is fine. Otherwise, I'll put a solid effort in this weekend.

@johnnyman727
Contributor Author

@majgis it sounds like you're on the part of the project where you should be porting script.py into JavaScript. Do you have any specific questions about that that I can help with?

@majgis

majgis commented Apr 26, 2016

Correct, script.py is ready to be ported. There are sufficient references to figure it out, and I'm sure I'll have very specific questions once I dive in, but nothing at the moment. I'll devote my weekend to it and see what I can get you by Monday, but no hard feelings if someone else wants to pound it out before then.

@johnnyman727
Contributor Author

👍 sounds good!

@majgis

majgis commented May 2, 2016

I feel like I'm close:
https://github.com/majgis/t2-cli/tree/unbrick-rb2
The logic ported from the python script is here:
https://github.com/majgis/t2-cli/blob/unbrick-rb2/lib/flash.js#L161

It gets all the way through writing, but throws an error on the reboot (same error I get when running the python script).

$ t2 unbrick --openwrt
INFO Looking for your Tessel...
INFO Connected to your Tessel.
INFO Proceeding with updating OpenWrt...
INFO Beginning download. This could take a couple minutes..
  Downloading [====================] 100% 0.0s remaining
INFO Download complete!
INFO Checking the chip id...
INFO Erasing the chip...
INFO Writing uboot...
INFO Writing mediatek factory partition...
INFO Writing squashfs...
INFO Rebooting device...
INFO   Reset the USB interface...
ERR! Error: LIBUSB_ERROR_NO_DEVICE

For now, I'm just focusing on the fact that after doing a manual reboot, it is still bricked. I ran the python script and it is still working fine, so definitely an issue with my code. I plan to step through python and node.js debuggers in tandem, unless of course you see something obvious.
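The steps in the log above run strictly one after another, and a failure should stop the chain. A minimal promise-based sketch of that sequencing (similar in spirit to the later move from `async` to promises in restore.js, but not that code): `runSteps` is a hypothetical helper, and the step bodies are placeholders for the real USB flash writes.

```javascript
// Run each step only after the previous one resolves; the first rejection
// skips all remaining steps and rejects the returned promise.
function runSteps(steps, log) {
  return steps.reduce(
    (previous, step) => previous.then(() => {
      log(step.message);
      return step.run();
    }),
    Promise.resolve()
  );
}

// Placeholder steps mirroring the log output; the real ones write flash.
const done = [];
const fakeStep = (message) => ({ message, run: () => Promise.resolve(done.push(message)) });

runSteps([
  fakeStep('Checking the chip id...'),
  fakeStep('Erasing the chip...'),
  fakeStep('Writing uboot...'),
  fakeStep('Writing mediatek factory partition...'),
  fakeStep('Writing squashfs...'),
], (msg) => console.log('INFO', msg)).then(() => console.log(done.length));
```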

@johnnyman727
Contributor Author

@majgis I didn't have much time to look into this tonight but at first glance, the code looked good (I'm impressed!). Nothing obviously wrong with it.

I tried running it on my T2 with poor results. The first time it got stuck on writing uboot and all subsequent times it got stuck while writing squashfs. I added some console output to see if it was just slow at writing the data, but it seems to stop writing data and just hang indefinitely.

I'm not sure if that's helpful or not but I'll try to look closer later this week. I think your recommendation of walking through the debugger is a good idea too.

@majgis

majgis commented May 2, 2016

Thank you. I haven't experienced the issues you described yet; it appears to have worked on my setup. I will dig into the debuggers as soon as I can and see where that leaves us.

@majgis

majgis commented May 8, 2016

I found the issue:
majgis@a96c6f2

The problem was that parseInt(true) returns NaN in JavaScript, whereas int(True) == 1 in Python, so the flags value was incorrect.
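The difference is easy to reproduce. `parseInt` converts its argument to a string first, so `parseInt(true)` parses `"true"` and returns `NaN`; bitwise operators then treat `NaN` as 0, so the flag bits silently disappear. The two-bit flag layout below is hypothetical and only illustrates the failure; `Number(bool)` (or `bool ? 1 : 0`) matches Python's `int()`.

```javascript
// parseInt stringifies its argument, so this parses "true" and gives NaN.
console.log(parseInt(true));
// NaN never equals anything, itself included, so test it with Number.isNaN.
console.log(Number.isNaN(parseInt(true)));

// Hypothetical layout for illustration only: bit 0 = erase, bit 1 = verify.
function packFlagsBuggy(erase, verify) {
  // Bitwise operators convert NaN to 0, so every flag drops out.
  return parseInt(erase) | (parseInt(verify) << 1);
}

function packFlags(erase, verify) {
  // Number(true) === 1 and Number(false) === 0, like Python's int().
  return Number(erase) | (Number(verify) << 1);
}

console.log(packFlagsBuggy(true, true));
console.log(packFlags(true, true));
```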

If you can confirm that unbricking is working on your end as well, we can move on to solving these two remaining issues:

1. The name is undefined when you do t2 list (same issue when using Python).

$ t2 list
INFO Searching for nearby Tessels...
        USB     undefined  

Is this something you want to address?

2. Restarting does not work (same issue when using Python).

INFO   Reset the USB interface...
ERR! Error: LIBUSB_ERROR_NO_DEVICE

Here is the python stack trace:

Traceback (most recent call last):
  File "/Users/mjackson/projects/tessel/recovery/restore-openwrt/flash.py", line 157, in <module>
    reset_openwrt(dev)
  File "/Users/mjackson/projects/tessel/recovery/restore-openwrt/flash.py", line 131, in reset_openwrt
    device.reset();
  File "/Library/Python/2.7/site-packages/usb/core.py", line 915, in reset
    self._ctx.backend.reset_device(self._ctx.handle)
  File "/Library/Python/2.7/site-packages/usb/backend/libusb1.py", line 893, in reset_device
    _check(self.lib.libusb_reset_device(dev_handle.handle))
  File "/Library/Python/2.7/site-packages/usb/backend/libusb1.py", line 595, in _check
    raise USBError(_strerror(ret), ret, _libusb_errno[ret])
usb.core.USBError: [Errno 19] No such device (it may have been disconnected)

@johnnyman727
Contributor Author

@majgis glad you were able to keep making progress!

The name is being printed as undefined because of this change which was presumably a workaround you made for something else? The altSetting is not 1 so it just hits Promise.resolve() without setting this.name.

I got a different error at the end about an argument needing to be a buffer. You should probably change the last argument of these control transfers ('') to be Buffers (new Buffer(0)). Also, I think we actually just have to pull the RST line low and then high - there is no need to pull it high first. If it doesn't seem to be resetting, you may need to delay the pull back high from low to ensure the mediatek actually resets.

Almost there!
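A sketch of the suggested reset: empty Buffers as the data argument, RST pulled low, a delay, then RST released high. It assumes node-usb's `device.controlTransfer(bmRequestType, bRequest, wValue, wIndex, data, callback)` and infers the meaning of the arguments from the control transfers quoted later in this thread (0x40 / 0x10, wValue = line level, wIndex = line, with 0 = RST); the 100 ms hold time and the helper names are assumptions. `Buffer.alloc(0)` is the non-deprecated equivalent of `new Buffer(0)`.

```javascript
// Values inferred from the control transfers quoted in this thread.
const REQ_TYPE_VENDOR_OUT = 0x40;
const REQ_SET_LINE = 0x10; // hypothetical name for request 0x10
const LINE_RST = 0;

// Drive one line to a level via a vendor control transfer.
function setLine(device, line, level) {
  return new Promise((resolve, reject) => {
    // Empty Buffer (not '') as the data argument of an OUT transfer.
    device.controlTransfer(REQ_TYPE_VENDOR_OUT, REQ_SET_LINE, level, line, Buffer.alloc(0),
      (err) => (err ? reject(err) : resolve()));
  });
}

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Pull RST low, hold it (assumed 100 ms) so the MediaTek really resets,
// then release it high.
function resetSoc(device, holdMs = 100) {
  return setLine(device, LINE_RST, 0)
    .then(() => delay(holdMs))
    .then(() => setLine(device, LINE_RST, 1));
}

// Usage with a fake device that records [wValue, wIndex] per transfer.
const sent = [];
const fakeDevice = {
  controlTransfer(type, req, value, index, data, cb) {
    sent.push([value, index]);
    setImmediate(cb, null);
  },
};
resetSoc(fakeDevice, 10).then(() => console.log(sent));
```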

@majgis

majgis commented May 10, 2016

Thank you for your help. The undefined name issue is fixed. I am still having no luck with the reboot.

On linux I get this error:

INFO   Reset the USB interface...
ERR! Error: LIBUSB_ERROR_NOT_FOUND

Is there any chance you have a different setup? How is it you are not seeing this error?

I pushed the additional changes you suggested, even though I can't get far enough to test them. Let me know if the reboot is now working for you.

@johnnyman727
Contributor Author

@majgis I think we can likely just pull out the command to reset the interface. I can't remember why I added it to the Python script - it was likely just a late night trying to get the test rigs working in China so we could start manufacturing.

Additionally, I'm not able to get my Tessel to reboot after toggling the RST line (or the SOC power line). I tried to emulate the boot process in firmware with the following JS:

    asyncLog.bind(null, '  Control transfer to put RST line low...'),
    device.controlTransfer.bind(device, 0x40, 0x10, 0, 0, new Buffer(0)),

    asyncLog.bind(null, '  Control transfer to put SOC line low...'),
    device.controlTransfer.bind(device, 0x40, 0x10, 0, 1, new Buffer(0)),

    asyncLog.bind(null, '  Control transfer to put SOC line high...'),
    device.controlTransfer.bind(device, 0x40, 0x10, 1, 1, new Buffer(0)),

    asyncLog.bind(null, '  Control transfer to put RST line high...'),
    device.controlTransfer.bind(device, 0x40, 0x10, 1, 0, new Buffer(0)),

but my T2 doesn't come back up. It seems like the packet sent from JS is somehow slightly different from the one sent from Python. I can run python scripts/pwr.py rst 0 && python scripts/pwr.py rst 1 or python scripts/pwr.py soc 0 && python scripts/pwr.py soc 1 from the t2-firmware repo to reboot without problems, BUT if I try that after running the JS reboot code, those commands do nothing, suggesting that the JS commands may have put either the MediaTek or the SAMD21 into a strange state.

I will have to dig deeper later this week - let me know if you make any traction.

@majgis

majgis commented May 13, 2016

I'm still stuck on device not found error.

What do you think about just adding a log message to cycle the power yourself and getting this feature merged in? At least the core feature of unbricking is rolled out and it can be improved later.

@rwaldron
Contributor

Just a heads up: you'll want to run grunt on that branch. I can tell that it won't pass :/ better to fix it up now.

@majgis

majgis commented May 13, 2016

Yes, lots of cleaning to do; I'll squash the history and rebase.

@johnnyman727
Contributor Author

@majgis Yeah I totally think it's acceptable to just print out a "please power cycle your Tessel" at the end. We should also make an issue to come back to this at some point 👍 Nice work! I'm super impressed.
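One way to implement this fallback, sketched under assumptions: try the reboot, and if it fails with one of the "device is gone" errors seen in this thread, ask the user to power cycle instead. `finishRestore` and the injected `reboot`/`log` functions are hypothetical, and matching on the error message text is an assumption about how node-usb reports these errors.

```javascript
// libusb errors reported in this thread when the device drops off the bus.
const GONE_ERRORS = ['LIBUSB_ERROR_NO_DEVICE', 'LIBUSB_ERROR_NOT_FOUND'];

// Try to reboot; if the device has already disappeared, fall back to asking
// the user to power cycle. Any other error is re-thrown as a real failure.
function finishRestore(reboot, log) {
  return reboot().then(
    () => log('Tessel rebooted.'),
    (err) => {
      const message = String(err && err.message);
      if (GONE_ERRORS.some((code) => message.includes(code))) {
        log('Please power cycle your Tessel (unplug it and plug it back in).');
        return;
      }
      throw err;
    }
  );
}

// Usage: simulate the reset failing the way it did in this thread.
finishRestore(() => Promise.reject(new Error('LIBUSB_ERROR_NO_DEVICE')), console.log);
```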

@majgis

majgis commented May 17, 2016

I'll have a PR ready tonight.

@rwaldron
Contributor

Just a heads up, the link to Spansion's datasheet has an "unsafe" redirect (I doubt it's really unsafe, but Chrome insists...). I checked the BOM and found the link to Digi-Key, which links to this pdf

rwaldron added a commit that referenced this issue Aug 18, 2016
Closes gh-493

- update-fetch.js made testable (TODO: more tests to be written)
- restore: adds -f to skip device id check
- restore: simplified controller handler
- consolidate contents of "flash.js" into "restore.js"
- restore.js: eliminate async, uses promises
- restore.js: substantial tests, but still lacking coverage in a few areas, this can be done in a follow up
- updates to jshintrc and bootstrap.js

Signed-off-by: Rick Waldron <waldron.rick@gmail.com>
5 participants