Sonoff Switch: Web server not available #24

Closed
xoseperez opened this issue Jan 3, 2017 · 93 comments

Comments

@xoseperez
Owner

Originally reported by: Pavel Eremin (Bitbucket: paveleremin, GitHub: paveleremin)


Steps to reproduce:

  1. download espurna

  2. change "platform" from "espressif8266" to "espressif8266_stage" in platformio.ini

  3. upload and uploadfs with "pio run"

  4. connect to the wifi network SONOFF_XXXXXX and run the setup

  5. try to open the web interface over the router's wifi, but it fails; something seems wrong with the web server:

Resource interpreted as Document but transferred with MIME type application/octet-stream: "http://192.168.0.104/".

The browser also offers to download a file named "download".

@xoseperez
Owner Author

Original comment by Pavel Eremin (Bitbucket: paveleremin, GitHub: paveleremin):


This is the "best result" I got from the web server:
123.JPG

@xoseperez
Owner Author

Original comment by Pavel Eremin (Bitbucket: paveleremin, GitHub: paveleremin):


  1. I turned off the router

  2. The device created a new wifi network

  3. There, the whole web UI is available 😕

I can't figure out where the problem is

@xoseperez
Owner Author

You mean you can see the web interface correctly only when connected to the AP the device creates, but not when it connects to your home wifi network?

@xoseperez
Owner Author

Original comment by Pavel Eremin (Bitbucket: paveleremin, GitHub: paveleremin):


@xoseperez correct

@xoseperez
Owner Author

That's somewhat weird. In the screenshot it looks like only the HTML got loaded, no style, no data (so no websocket connection either),...

I would gather some more info:

  • Connect it to the computer via serial and check the debug messages in the terminal
  • Check the browser requests with a debug panel (F12 under chrome, for instance) and see what happens to the rest of the requests...
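
The second check can also be scripted. What follows is only a minimal sketch, not part of ESPurna: it fetches a couple of paths and compares the `Content-Type` the device returns with the type a browser would expect. The helper names (`expected_type`, `looks_wrong`, `check`) are made up for illustration, and it assumes the web interface can be fetched without authentication.

```python
import mimetypes
import urllib.request


def expected_type(path):
    """Guess the MIME type a browser would expect for a URL path."""
    if path.endswith("/"):
        return "text/html"  # assumption: the root path serves the HTML index
    guessed, _ = mimetypes.guess_type(path)
    return guessed


def looks_wrong(path, content_type):
    """Return True when the served Content-Type does not match the path."""
    served = (content_type or "").split(";")[0].strip().lower()
    return served != expected_type(path)


def check(base_url, paths=("/", "/styles.css")):
    """Fetch each path and print the Content-Type the device returned."""
    for path in paths:
        with urllib.request.urlopen(base_url + path, timeout=10) as resp:
            ctype = resp.headers.get("Content-Type")
            flag = "MISMATCH" if looks_wrong(path, ctype) else "ok"
            print(f"{path}: {ctype} [{flag}]")
```

Calling `check("http://192.168.0.104")` would flag a response like the reported one, where the root document arrives as `application/octet-stream`.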

One question: does it happen when using the IP of the device and/or the local name (like http://hostname.local)?

@xoseperez
Owner Author

Original comment by Pavel Eremin (Bitbucket: paveleremin, GitHub: paveleremin):


@xoseperez http://{id}.local does not work for me, so I use the IP.

This is how it looks in the Edge browser:
1234.JPG

That's all I see on the COM port:

[WEBSERVER] Request: GET /

Here is the content of the "download" file I get when I try to load the URL http://192.168.0.101/styles.css

3da4 9f3b 1f0b d6f2 a5f2 7333 12d2 3f95
350b f8df e939 17f7 4493 427c 0cc6 fdbb
9f6c 617c 2f03 7903 bbe0 3f84 541c 4a7c
bfea bcbd c23b 8f03 b0e0 93cd 2c76 aa57
31de cf35 3db3 7f96 f817 9109 bc76 8580
92ef 7fc3 3b17 56c4 72f7 0b97 bac8 6cf0
e909 f1e1 0300 7ef3 fe1b 28fa fb13 00cc
4c2f 7c62 8ca6 f6c7 2d6c 5810 51b8 f4ab
cc11 578a f83d 66ba fd06 8544 2b5f 6adf
6881 15cc 8731 3a0d f697 ba62 ddc4 75ac
3fed 7bae c465 7657 0cae a1a8 1917 4a21
bf0c d4e5 c8df ad03 aaec 92d9 6eb6 ebaa
2455 f6cc de48 20eb 2e99 4c10 2ecb 1962
6aa4 69a8 ed06 f3b2 dbae e6d3 8e2c 65b9
938c f42f 8cce 4997 ee97 f1d4 dc24 3b90
6be5 22d6 5cff c61f eb37 a68d 7371 839a
202c f9f4 8cff bb82 f032 8600 8523 4730
fc8b d240 a43c 1f45 cd92 6738 e762 042b
867b c30c f838 0ff9 2c5a 6c5d 3811 4e28
4d9a 2373 7d0b 6d34 56b3 c9f5 95b7 ff01
46db cd82 7670 0000

@xoseperez
Owner Author

I think this might be due to the same problem reported here: me-no-dev/ESPAsyncWebServer#115.
Several files being requested in parallel make the uC run out of memory, and the result is a corrupted response...
The web interface is quite heavy, and I recently split it into several files to implement the "default password change" functionality (0f5a0e8).

@xoseperez
Owner Author

Original comment by Pavel Eremin (Bitbucket: paveleremin, GitHub: paveleremin):


Same issue with Sonoff POW :(
@xoseperez how can I fix it temporarily? Maybe I could comment out some lines of code?

@xoseperez
Owner Author

I'm working on this issue right now but I still cannot reproduce it reliably, i.e. it happens randomly after the device has been working fine for some hours (at least for me). When it happens, only the requests for SPIFFS files fail (the browser tries to download a binary file of no known format), but the requests for dynamically generated content work just fine (API calls, for instance). The device works as expected otherwise (MQTT, relays, LED,...). So my guess right now is still the SPIFFS handling.

@xoseperez
Owner Author

Original comment by Pavel Eremin (Bitbucket: paveleremin, GitHub: paveleremin):


Today I saw some interesting behaviour:

  1. I need to get into the web interface

  2. So I turn OFF mains on the device and turn off the router

  3. Then I turn ON mains on the device and expect to see a new wifi network like SONOFF_POW_ABCD12

  4. But it's not created ❓

  5. I repeat steps 1-3 three times and nothing

  6. I turn OFF mains on the device and connect via USB UART (3.3 volt, RX, TX, GND)

  7. Voila, the new wifi network is created

@xoseperez is this the expected behaviour? What do you think?

@xoseperez
Owner Author

Original comment by Tim K (Bitbucket: clueo8, GitHub: clueo8):


I too am experiencing this issue. The web page works on a fresh restart, but after some time it becomes unavailable. API calls (i.e. Alexa) still work (the response is slow sometimes), but the web server is inaccessible. Are there any workarounds for this? Possibly host the web server external to the device and just have it use the API to control the relays?

@xoseperez
Owner Author

This one is really hard to catch... I've been running a Sonoff TH for days without any issue :( If anyone has steps to reproduce it, I would be very grateful.

@xoseperez
Owner Author

@paveleremin Why do you turn off your router to enter the Sonoff web interface?

@xoseperez
Owner Author

Original comment by Pavel Eremin (Bitbucket: paveleremin, GitHub: paveleremin):


@xoseperez because when the Sonoff is connected to the router's AP I can't get into the web interface, because of this issue. But when it creates its own wifi network, all is OK.

@xoseperez
Owner Author

OK, I'm 90% sure it's a memory leak that exhausts the free heap and prevents the ESPAsyncWebServer library from serving big files. I'm running tests right now but I suspect it's the Alexa integration. Are you using it?

@xoseperez
Owner Author

Original comment by Pavel Eremin (Bitbucket: paveleremin, GitHub: paveleremin):


Nope. The first device (Sonoff Switch) doesn't even seem to have this option, and on the second one (Sonoff POW) I turned it off in the web interface during first-time setup.

@xoseperez
Owner Author

This is a dump from the debug log of one device I have running:

[BEAT] Free heap: 21776
[NTP] Time: 09:46:11 31/01/2017
[MQTT] Sending /test/switch/TH16/status => 1
[BEAT] Free heap: 21776
[NTP] Time: 09:51:11 31/01/2017
[MQTT] Sending /test/switch/TH16/status => 1
[BEAT] Free heap: 21776
[NTP] Time: 09:56:11 31/01/2017
[FAUXMO] Search request from 192.168.1.118
[FAUXMO] UDP response for device #0 (TH16)
[FAUXMO] Search request from 192.168.1.118
[FAUXMO] UDP response for device #0 (TH16)
[FAUXMO] UDP response for device #0 (TH16)
[FAUXMO] UDP response for device #0 (TH16)
[FAUXMO] UDP response for device #0 (TH16)
[FAUXMO] /setup.xml response for device #0 (TH16)
[FAUXMO] /setup.xml response for device #0 (TH16)
[FAUXMO] /setup.xml response for device #0 (TH16)
[FAUXMO] /setup.xml response for device #0 (TH16)
[FAUXMO] /setup.xml response for device #0 (TH16)
[FAUXMO] Search request from 192.168.1.100
[FAUXMO] UDP response for device #0 (TH16)
[FAUXMO] UDP response for device #0 (TH16)
[FAUXMO] Search request from 192.168.1.100
[FAUXMO] UDP response for device #0 (TH16)
[FAUXMO] UDP response for device #0 (TH16)
[FAUXMO] UDP response for device #0 (TH16)
[FAUXMO] UDP response for device #0 (TH16)
[FAUXMO] /setup.xml response for device #0 (TH16)
[FAUXMO] /setup.xml response for device #0 (TH16)
[FAUXMO] /setup.xml response for device #0 (TH16)
[FAUXMO] /setup.xml response for device #0 (TH16)
[FAUXMO] /setup.xml response for device #0 (TH16)
[FAUXMO] /setup.xml response for device #0 (TH16)
[MQTT] Sending /test/switch/TH16/status => 1
[BEAT] Free heap: 18784
[NTP] Time: 10:01:11 31/01/2017
[MQTT] Sending /test/switch/TH16/status => 1
[BEAT] Free heap: 18784
[NTP] Time: 10:06:11 31/01/2017
[NTP] Time: 10:11:05 31/01/2017
[MQTT] Sending /test/switch/TH16/status => 1

As you can see, after the Alexa device (an Amazon Dot in my case) did a discovery (it does it from time to time) the free heap is about 3K less (from 21776 to 18784)...

I'm opening an issue in the fauxmoESP library repo (https://bitbucket.org/xoseperez/fauxmoesp/issues/5/memory-leak). Won't close this one though.
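
To spot drops like this in a longer capture, the debug log can be scanned automatically. This is only a rough sketch under the assumption that the log keeps the `[BEAT] Free heap: N` and `[TAG] ...` line formats shown above; the function name `heap_drops` is made up for illustration.

```python
import re

HEAP_RE = re.compile(r"\[BEAT\] Free heap: (\d+)")
TAG_RE = re.compile(r"^\[(\w+)\]")


def heap_drops(lines):
    """Yield (before, after, tags) for every drop between two heap reports.

    `tags` lists the module tags (e.g. FAUXMO, MQTT) logged between the
    two [BEAT] lines, which hints at what ran while the heap shrank.
    """
    previous = None
    tags = []
    for line in lines:
        heap = HEAP_RE.search(line)
        if heap:
            current = int(heap.group(1))
            if previous is not None and current < previous:
                yield previous, current, sorted(set(tags))
            previous = current
            tags = []
        else:
            tag = TAG_RE.match(line.strip())
            if tag:
                tags.append(tag.group(1))
```

Fed with the dump above, it would report a single drop from 21776 to 18784 with FAUXMO among the tags seen in between. That hints at the culprit but does not prove it, since MQTT and NTP lines also appear in the same interval.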

@xoseperez
Owner Author

Original comment by Tim K (Bitbucket: clueo8, GitHub: clueo8):


I only really use this for the Alexa integration (no MQTT setup)... It seems to happen very frequently that the web interface becomes unreachable. I have not run the actual Alexa (Dot) Discover process in some time.

@xoseperez
Owner Author

The Amazon Dot (I guess the Echo as well) runs the discovery process from time to time on its own. No need for you to trigger it. There is a huge memory leak in the fauxmoESP library and after 10 discoveries the free heap is below 5k and the web page becomes unresponsive.

@xoseperez
Owner Author

Original comment by Tim K (Bitbucket: clueo8, GitHub: clueo8):


Is there any way to reset the device remotely, freeing up the memory to get the web server back (API, websocket)?

@xoseperez
Owner Author

Original comment by Tim K (Bitbucket: clueo8, GitHub: clueo8):


Would it be helpful to off-load jquery to a cdn instead of loading it into spiffs? For example, remove jquery and just include:


<script src="//ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>

@xoseperez
Owner Author

@clueo8 Since version 1.6.0 you can reset the board remotely using MQTT (send a message to your root topic + /action with payload "reset") or using RPC (http://yourip/rpc?apikey=XXXXX&action=reset).

@clueo8 Loading jQuery from a CDN will only work if you have an internet connection on your PC, and that is not the case when you are connected to the device in AP mode to configure it for the first time...
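
For anyone who wants to script the RPC reset, here is a minimal sketch. Only the URL pattern (`/rpc` with `apikey` and `action=reset`) comes from the comment above; the function names are made up, and sending it as a plain HTTP GET is an assumption.

```python
from urllib.parse import urlencode
import urllib.request


def reset_url(host, apikey):
    """Build the RPC reset URL for a device, following the pattern above."""
    query = urlencode({"apikey": apikey, "action": "reset"})
    return f"http://{host}/rpc?{query}"


def reset_device(host, apikey, timeout=10):
    """Send the reset request and return the HTTP status code.

    Assumption: the endpoint accepts a plain GET, as the URL suggests.
    """
    with urllib.request.urlopen(reset_url(host, apikey), timeout=timeout) as resp:
        return resp.status
```

Since the API calls reportedly keep working when the static pages do not, something like `reset_device("yourip", "XXXXX")` could serve as a stopgap until the leak is fixed.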

@xoseperez
Owner Author

Original comment by matantal (Bitbucket: matantal, GitHub: matantal):


@xoseperez
Hi Xose,
Same issue here: first setup OK, then when accessing the web server it only prompts for user and password but the page won't load.
Version 1.6, Sonoff TH16.

@xoseperez
Owner Author

@matantal Have you flashed the filesystem image?

@xoseperez
Owner Author

Original comment by matantal (Bitbucket: matantal, GitHub: matantal):


@xoseperez
Yes, flashed successfully using platformio.
This is after flashing the firmware.
Followed your wiki step-by-step.

@xoseperez
Owner Author

Not the firmware, the filesystem. You have to flash two different images on the device:

pio run -e sonoff-debug -t upload
pio run -e sonoff-debug -t uploadfs

@xoseperez
Owner Author

Original comment by matantal (Bitbucket: matantal, GitHub: matantal):


These are exactly the commands I used.
I meant I flashed twice. The uploadfs was after I flashed the firmware.

I get the same issue as in the original post.
The web server does not respond when the device is connected to an SSID.

@xoseperez
Owner Author

@matantal Can you connect your USB2UART board to the device and see the debug output when it boots and when you do a request?

@xoseperez
Owner Author

Original comment by J.D. (Bitbucket: Harry_Reutter, GitHub: Unknown):


Hi,
I ran into the same issue. I don't know if this helps, but it occurs every time I try to load the web page from my Android phone; using Chrome, Firefox, etc. doesn't make a difference.
So it goes like this:
Enter the IP of the web server... the browser is unable to load the page. If you press cancel you see the screen Pavel posted.
After that the web server is not running anymore and the browser gets a closed connection with 'ERR_INVALID_HTTP_RESPONSE'.

I could never reproduce this on my notebook using Chrome, Firefox, Edge or IE.
I'm not a web programmer, but I get the strong feeling that this is related to the way the CSS sheets are handled...

I hope this is of some help,

Cheers,
Harry

@xoseperez
Owner Author

Original comment by Xavier Smith (Bitbucket: xavier, GitHub: Unknown):


After firmware and filesystem update (1.6.0 release) with these commands


pio run -e electrodragon-debug -t upload
pio run -e electrodragon-debug -t uploadfs

this is my output from PlatformIO serial monitor:


--- Miniterm on COM8  115200,8,N,1 ---
--- Quit: Ctrl+C | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
[WEBSERVER] Request: GET /
[WEBSERVER] Request: GET /
[WEBSERVER] Request: GET /
[WEBSERVER] Request: GET /
[WEBSERVER] Request: GET /
[WEBSERVER] Request: GET /
[WEBSERVER] Request: GET /
[DHT] Error reading sensor
[WEBSERVER] Request: GET /
[DHT] Error reading sensor
[WEBSERVER] Request: GET /
[DHT] Error reading sensor
[WEBSERVER] Request: GET /
[DHT] Error reading sensor
[DHT] Error reading sensor
[DHT] Error reading sensor
[DHT] Error reading sensor

After a very long time, I get the same web interface as Pavel Eremin's "best result".

None of the buttons work: if I press one, there is no reply.

Best result with the old 1.4.0 firmware: good web interface at power on, unreachable after some time.

@xoseperez
Owner Author

Original comment by f-fish (Bitbucket: f-fish, GitHub: Unknown):


OK, good news... I can reproduce the problem.

Set up DDNS for the router's external IP address. Check: cannot ping the DNS name.
On the router, configure the DMZ IP address to point to the device's internal IP address.
Check: can ping the DDNS name. Plug and unplug the device: yep, ICMP is going to the device.
Hit the DDNS name with the web browser: never get data back. Harry, I think this is what you are seeing (or rather, not seeing).
Now to go build an AP that I can snoop the network traffic with.

Later Ferdie

@xoseperez
Owner Author

Original comment by J.D. (Bitbucket: Harry_Reutter, GitHub: Unknown):


Had the same idea this morning, I already captured the traffic with Wireshark :)
I noticed some high ports being forwarded to the source/destination IP (56000 and higher), but they seem to be random ports that most probably come from Windows... maybe you see them too.

The browser loads the index.html file... and then just stops... maybe it can't load the favicon.ico?

@xoseperez
Owner Author

Original comment by f-fish (Bitbucket: f-fish, GitHub: Unknown):


OK, I have a bridge AP working on a RPi 3...

I can also confirm the heap issue occurs with the GETs from the DDNS IP: fresh reset, then try.


[NTP] Error: NTP server not reachable
[NTP] Time: 20:23:46 18/02/2017
[WEBSERVER] Request: GET /index.html
[WEBSERVER] Request: GET /index.html
[WEBSERVER] Request: GET /index.html
[WEBSERVER] Request: GET /index.html
[WEBSERVER] Request: GET /index.html
[BEAT] Free heap: 7728
[NTP] Time: 20:28:41 18/02/2017

@xoseperez
Owner Author

Original comment by f-fish (Bitbucket: f-fish, GitHub: Unknown):


So when you hit the DDNS name enough times the heap drops, and then you need to reset the device to get it to clear memory and allow the normal internal IP to connect.

Busy looking at the tcpdumps... need to confirm a few things.

@xoseperez
Owner Author

Original comment by f-fish (Bitbucket: f-fish, GitHub: Unknown):


Seeing some corruption.

Working: from local PC to internal device IP.


22:05:19.857331 IP (tos 0x0, ttl 64, id 17730, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.1.24.52260 > 192.168.1.29.http: Flags [S], cksum 0x78c3 (correct), seq 699817116, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
22:05:19.859123 IP (tos 0x0, ttl 128, id 18, offset 0, flags [none], proto TCP (6), length 44)
    192.168.1.29.http > 192.168.1.24.52260: Flags [S.], cksum 0x9187 (correct), seq 6513, ack 699817117, win 5840, options [mss 1460], length 0
22:05:19.859362 IP (tos 0x0, ttl 64, id 17731, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.24.52260 > 192.168.1.29.http: Flags [.], cksum 0xc523 (correct), seq 1, ack 1, win 64240, length 0
22:05:19.859725 IP (tos 0x0, ttl 64, id 17732, offset 0, flags [DF], proto TCP (6), length 421)
    192.168.1.24.52260 > 192.168.1.29.http: Flags [P.], cksum 0x5bad (correct), seq 1:382, ack 1, win 64240, length 381: HTTP, length: 381
        GET / HTTP/1.1
        Host: 192.168.1.29
        Connection: keep-alive
        Upgrade-Insecure-Requests: 1
        User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
        Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
        Accept-Encoding: gzip, deflate, sdch
        Accept-Language: en-GB,en-US;q=0.8,en;q=0.6

22:05:19.881500 IP (tos 0x0, ttl 128, id 19, offset 0, flags [none], proto TCP (6), length 1500)
    192.168.1.29.http > 192.168.1.24.52260: Flags [.], cksum 0xbac7 (correct), seq 1:1461, ack 382, win 5459, length 1460: HTTP, length: 1460
        HTTP/1.1 200 OK
        Content-Length: 50971
        Content-Type: text/html
        Content-Encoding: gzip
        Content-Disposition: inline; filename="index.html"
        Last-Modified: Feb 18 2017 07:39:00 GMT
        Connection: close
        Accept-Ranges: none

22:05:19.881829 IP (tos 0x0, ttl 128, id 20, offset 0, flags [none], proto TCP (6), length 1500)
    192.168.1.29.http > 192.168.1.24.52260: Flags [P.], cksum 0xb400 (correct), seq 1461:2921, ack 382, win 5459, length 1460: HTTP
22:05:19.882768 IP (tos 0x0, ttl 64, id 17733, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.24.52260 > 192.168.1.29.http: Flags [.], cksum 0xb83e (correct), seq 382, ack 2921, win 64240, length 0
22:05:19.886734 IP (tos 0x0, ttl 128, id 21, offset 0, flags [none], proto TCP (6), length 1500)
    192.168.1.29.http > 192.168.1.24.52260: Flags [.], cksum 0x82b9 (correct), seq 2921:4381, ack 382, win 5459, length 1460: HTTP
22:05:19.888016 IP (tos 0x0, ttl 128, id 22, offset 0, flags [none], proto TCP (6), length 1500)
    192.168.1.29.http > 192.168.1.24.52260: Flags [P.], cksum 0x04e8 (correct), seq 4381:5841, ack 382, win 5459, length 1460: HTTP
22:05:19.888523 IP (tos 0x0, ttl 64, id 17734, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.24.52260 > 192.168.1.29.http: Flags [.], cksum 0xacd6 (correct), seq 382, ack 5841, win 64240, length 0
22:05:19.892255 IP (tos 0x0, ttl 128, id 23, offset 0, flags [none], proto TCP (6), length 1500)
    192.168.1.29.http > 192.168.1.24.52260: Flags [.], cksum 0xb19f (correct), seq 5841:7301, ack 382, win 5459, length 1460: HTTP
22:05:19.893589 IP (tos 0x0, ttl 128, id 24, offset 0, flags [none], proto TCP (6), length 1500)
    192.168.1.29.http > 192.168.1.24.52260: Flags [P.], cksum 0x9fd3 (correct), seq 7301:8761, ack 382, win 5459, length 1460: HT

The above goes on until the entire payload is delivered, then it moves to the next step... auth all works.

Not working: from local PC to DDNS address.


22:02:01.080890 IP (tos 0x0, ttl 51, id 6217, offset 0, flags [DF], proto TCP (6), length 52)
    DDNSIPADDr.52258 > 192.168.1.29.http: Flags [S], cksum 0x4de4 (correct), seq 3341093784, win 8192, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
22:02:01.203060 IP (tos 0x0, ttl 128, id 13, offset 0, flags [none], proto TCP (6), length 44)
    192.168.1.29.http > DDNSIPADDr.52258: Flags [S.], cksum 0x66a0 (correct), seq 6513, ack 3341093785, win 5840, options [mss 1460], length 0
22:02:01.246597 IP (tos 0x0, ttl 51, id 6218, offset 0, flags [DF], proto TCP (6), length 52)
    DDNSIPADDr.52259 > 192.168.1.29.http: Flags [S], cksum 0x2a03 (correct), seq 1151775223, win 8192, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
22:02:01.251168 IP (tos 0x0, ttl 128, id 14, offset 0, flags [none], proto TCP (6), length 44)
    192.168.1.29.http > DDNSIPADDr.52259: Flags [S.], cksum 0x42be (correct), seq 6514, ack 1151775224, win 5840, options [mss 1460], length 0
22:02:01.350564 IP (tos 0x0, ttl 51, id 6219, offset 0, flags [DF], proto TCP (6), length 40)
    DDNSIPADDr.52258 > 192.168.1.29.http: Flags [.], cksum 0x95f0 (correct), seq 1, ack 1, win 65340, length 0
22:02:01.365187 IP (tos 0x0, ttl 51, id 6220, offset 0, flags [DF], proto TCP (6), length 429)
    DDNSIPADDr.52258 > 192.168.1.29.http: Flags [P.], cksum 0x1b29 (correct), seq 1:390, ack 1, win 65340, length 389: HTTP, length: 389
        GET / HTTP/1.1
        Host: DDNSIPADDr
        Connection: keep-alive
        Upgrade-Insecure-Requests: 1
        User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
        Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
        Accept-Encoding: gzip, deflate, sdch
        Accept-Language: en-GB,en-US;q=0.8,en;q=0.6

22:02:01.418562 IP (tos 0x0, ttl 128, id 15, offset 0, flags [none], proto TCP (6), length 1492)
    192.168.1.29.http > DDNSIPADDr.52258: Flags [.], cksum 0xa3d8 (correct), seq 1:1453, ack 390, win 5451, length 1452: HTTP, length: 1452
        HTTP/1.1 200 OK
        Content-Length: 50971
        Content-Type: text/html
        Content-Encoding: gzip
        Content-Disposition: inline; filename="index.html"
        Last-Modified: Feb 18 2017 07:39:00 GMT
        Connection: close
        Accept-Ranges: none

22:02:01.418785 IP (tos 0x0, ttl 128, id 16, offset 0, flags [none], proto TCP (6), length 1492)
    192.168.1.29.http > DDNSIPADDr.52258: Flags [.], cksum 0x9875 (correct), seq 1453:2905, ack 390, win 5451, length 1452: HTTP
22:02:01.620647 IP (tos 0x0, ttl 51, id 6221, offset 0, flags [DF], proto TCP (6), length 40)
    DDNSIPADDr.52258 > 192.168.1.29.http: Flags [.], cksum 0x8913 (correct), seq 390, ack 2905, win 65340, length 0
22:02:01.714781 IP truncated-ip - 1460 bytes missing! (tos 0x0, ttl 128, id 17, offset 0, flags [none], proto TCP (6), length 2960)
    192.168.1.29.http > DDNSIPADDr.52258: Flags [P.], seq 2905:5825, ack 390, win 5451, length 2920: HTTP
22:02:02.180563 IP (tos 0x0, ttl 128, id 18, offset 0, flags [none], proto TCP (6), length 44)
    192.168.1.29.http > DDNSIPADDr.52259: Flags [S.], cksum 0x42be (correct), seq 6514, ack 1151775224, win 5840, options [mss 1460], length 0
22:02:02.352807 IP (tos 0x0, ttl 51, id 6222, offset 0, flags [DF], proto TCP (6), length 40)

Payload never gets to the browser.
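
Longer captures can be scanned for the same symptom instead of being read by eye. A rough sketch, assuming the `tcpdump -v` header format shown in the traces above; the function name and the returned tuple layout are made up for illustration.

```python
import re

# Matches the first line of each packet in `tcpdump -v` output, e.g.
# "22:02:01.714781 IP truncated-ip - 1460 bytes missing! (tos ..., length 2960)"
HEADER_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<trunc>truncated-ip - (?P<missing>\d+) bytes missing! )?"
    r"\(.*length (?P<length>\d+)\)"
)


def suspicious_packets(lines, mtu=1500):
    """Return (timestamp, ip_length, missing_bytes) for odd-looking packets.

    A packet is reported when tcpdump marked it as truncated or when its
    IP total length is larger than the given MTU.
    """
    found = []
    for line in lines:
        m = HEADER_RE.match(line.strip())
        if not m:
            continue  # indented address/flags lines and HTTP headers
        length = int(m.group("length"))
        if m.group("trunc") or length > mtu:
            found.append((m.group("ts"), length, int(m.group("missing") or 0)))
    return found
```

On the failing trace it would report only the 22:02:01.714781 packet, whose IP header announces 2960 bytes.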

Still looking

Later Ferdie

@xoseperez
Owner Author

Original comment by f-fish (Bitbucket: f-fish, GitHub: Unknown):


Maybe there is a correlation with the MTU size: 1492 set on the router vs. the 1500 MTU of Ethernet?
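
The arithmetic behind that guess, as a small sketch: for IPv4 without options, the TCP MSS is the MTU minus 20 bytes of IP header and 20 bytes of TCP header. The function names are made up; the numbers come from the traces above.

```python
IPV4_HEADER = 20  # bytes, no IP options
TCP_HEADER = 20   # bytes, no TCP options on data segments


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def segment_fits(payload_len, local_mss, peer_mss):
    """A sender should not exceed the smaller of the two advertised MSS values."""
    return payload_len <= min(local_mss, peer_mss)


# Ethernet MTU 1500 gives the 1460 advertised by the device,
# router MTU 1492 gives the 1452 seen in the SYN from the DDNS side.
print(mss_for_mtu(1500), mss_for_mtu(1492))
# The 2920-byte segment in the failing trace is larger than either value.
print(segment_fits(2920, 1460, 1452))
```

The 1452-byte segments at the start of the failing trace are consistent with this, while the truncated 2920-byte one is exactly two 1460-byte segments in size.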

@xoseperez
Owner Author

Original comment by J.D. (Bitbucket: Harry_Reutter, GitHub: Unknown):


It seems like the IP 192.168.1.29 is hitting port 52259... what OS/browser are you running?

Could you install Wireshark to monitor the traffic?

@xoseperez
Owner Author

Original comment by J.D. (Bitbucket: Harry_Reutter, GitHub: Unknown):


MTU size is so 80's :-)
Don't get hung up on these things... that's hardly ever the problem.

I think we are going in the wrong direction... basically I think we are missing something really fundamental. I admit that I am really stuck on the network thing, but maybe that is not even close to the problem. I am not even sure that we are still talking about the original issue... what about the guy who originally reported this problem? Can he confirm the things we found out so far?
Or is it too far away from the original issue?

@xoseperez
Owner Author

Original comment by f-fish (Bitbucket: f-fish, GitHub: Unknown):


So changing the MTU on the router changes the length of the data ... for the DDNS requested page.



        Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
        Accept-Encoding: gzip, deflate, sdch
        Accept-Language: en-GB,en-US;q=0.8,en;q=0.6

22:34:44.843221 IP (tos 0x0, ttl 128, id 19, offset 0, flags [none], proto TCP (6), length 1370)

1370 is the lowest MTU my router will go to.

Could it be as simple as the TCP stack on the ESP only wanting to play with sizes of 1500?

Maybe that is why the failed load has the "22:02:01.714781 IP truncated-ip - 1460 bytes missing!" line: the TCP stack could not reduce the payload to 1460?

Later Ferdie

@xoseperez
Owner Author

Original comment by J.D. (Bitbucket: Harry_Reutter, GitHub: Unknown):


Trust me, if we need to fiddle with the MTU size, something is fundamentally wrong!
It's not like we are transferring Gbits of data... this is never the issue, at least not for the last 15 years :-) ... if this is REALLY some kind of issue, a problem with the TCP/IP stack is much more likely!

@xoseperez
Owner Author

@Harry_Reutter In the dev branch everything is embedded in one single file. Even images and the favicon are included encoded in base64, so there is no other static content to download.

@F-Fish amazing investigation. I'm sorry I cannot keep up with it right now. Family weekend. Last week, while trying to catch a memory leak in the MQTT module, I used ab (apache benchmark) to benchmark the server and the numbers I got are consistent with yours: 4s average response time and struggling to handle more than 3 concurrent connections. Anyway I'm not worried about the concurrency or the response time (not the goal of the app) as much as a possible memory leak.

Can you confirm this behaviour only happens when using mDNS or an external DNS service and not when using the IP?

@xoseperez
Owner Author

Original comment by f-fish (Bitbucket: f-fish, GitHub: Unknown):


Quick google: seems like there are loads of issues with MTU sizes (too big, too small) on the ESP.

From the SDK limiting you to 1500 to guys debating buffers with large chunks of data.

Harry, I think your problem is unique in that the data first goes back into your router from your PC and then to the device, so it is probably an MTU issue.

Later Ferdie

@xoseperez
Owner Author

Original comment by f-fish (Bitbucket: f-fish, GitHub: Unknown):


Yeah I hear you Harry, could well be in the stack.

Looks like the issue occurs only when you use the external DDNS name, so the request has to go out to the router so that the router can pass it back to the internal device.

Did a quick test with mDNS on Mint: no issues.


22:58:36.585428 IP (tos 0x0, ttl 64, id 15913, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.1.22.35048 > 192.168.1.29.http: Flags [S], cksum 0x6f65 (correct), seq 419379479, win 29200, options [mss 1460,sackOK,TS val 7997246 ecr 0,nop,wscale 7], length 0
22:58:36.587146 IP (tos 0x0, ttl 128, id 12, offset 0, flags [none], proto TCP (6), length 44)
    192.168.1.29.http > 192.168.1.22.35048: Flags [S.], cksum 0x0902 (correct), seq 6513, ack 419379480, win 5840, options [mss 1460], length 0
22:58:36.587495 IP (tos 0x0, ttl 64, id 15914, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.22.35048 > 192.168.1.29.http: Flags [.], cksum 0xc57e (correct), seq 1, ack 1, win 29200, length 0
22:58:36.587862 IP (tos 0x0, ttl 64, id 15915, offset 0, flags [DF], proto TCP (6), length 360)
    192.168.1.22.35048 > 192.168.1.29.http: Flags [P.], cksum 0x1373 (correct), seq 1:321, ack 1, win 29200, length 320: HTTP, length: 320
        GET / HTTP/1.1
        Host: sodemo.local
        User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0
        Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
        Accept-Language: en-US,en;q=0.5
        Accept-Encoding: gzip, deflate
        Connection: keep-alive
        Upgrade-Insecure-Requests: 1

22:58:36.607553 IP (tos 0x0, ttl 128, id 13, offset 0, flags [none], proto TCP (6), length 1500)
    192.168.1.29.http > 192.168.1.22.35048: Flags [.], cksum 0x3242 (correct), seq 1:1461, ack 321, win 5520, length 1460: HTTP, length: 1460
        HTTP/1.1 200 OK
        Content-Length: 50971
        Content-Type: text/html
        Content-Encoding: gzip
        Content-Disposition: inline; filename="index.html"
        Last-Modified: Feb 18 2017 07:39:00 GMT
        Connection: close
        Accept-Ranges: none

22:58:36.607770 IP (tos 0x0, ttl 128, id 14, offset 0, flags [none], proto TCP (6), length 1500)
    192.168.1.29.http > 192.168.1.22.35048: Flags [P.], cksum 0x2b7b (correct), seq 1461:2921, ack 321, win 5520, length 1460: HTTP
22:58:36.608351 IP (tos 0x0, ttl 64, id 15916, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.22.35048 > 192.168.1.29.http: Flags [.], cksum 0xb322 (correct), seq 321, ack 1461, win 32120, length 0
22:58:36.608979 IP (tos 0x0, ttl 64, id 15917, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.22.35048 > 192.168.1.29.http: Flags [.], cksum 0xa206 (correct), seq 321, ack 2921, win 35040, length 0
22:58:36.612532 IP (tos 0x0, ttl 128, id 15, offset 0, flags [none], proto TCP (6), length 1500)
    192.168.1.29.http > 192.168.1.22.35048: Flags [P.], cksum 0xfa2b (correct), seq 2921:4381, ack 321, win 5520, length 1460: HTTP

And since local hosts files also worked, it does seem to point to the router dictating the MTU. I have no way of raising it past 1492 on the router, so unfortunately I cannot test the theory further.

Later Ferdie

@xoseperez
Owner Author

Original comment by f-fish (Bitbucket: f-fish, GitHub: Unknown):


Side note... just checked this on a basic Sonoff with ESPEasy R120: set the DMZ to point to that IP address, no issue, can use the DDNS name via the router. Need to do some flashing to put that behind the AP bridge, to run real traces.

But this is for another day.

Later Ferdie

@xoseperez
Owner Author

Original comment by J.D. (Bitbucket: Harry_Reutter, GitHub: Unknown):


OK, I don't know how to say this... MTU size is OSI layer stuff. This is NOT an issue with TCP/IP HTTP connections; the TCP protocol should be able to handle this, otherwise we wouldn't be able to download ripped porn over the web ;-)
We are going in the wrong direction.

I have over 20 webcams set up for access from outside... this is not a unique setup.

Again, the packet size is not an issue. Even if it is, it's a server issue, not a client/router issue.

I still strongly believe that it is a routing issue (or authentication/Java :-) ), OR the web interface is using a port other than 80 for some reason?!

@xoseperez
Owner Author

Original comment by f-fish (Bitbucket: f-fish, GitHub: Unknown):


For completeness .. this is espeasy R120 - via device IP

#!arduino

23:36:29.269882 IP (tos 0x0, ttl 64, id 1097, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.1.24.53382 > 192.168.1.61.http: Flags [S], cksum 0x4416 (correct), seq 2276470477, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
23:36:29.361208 IP (tos 0x0, ttl 128, id 14, offset 0, flags [none], proto TCP (6), length 44)
    192.168.1.61.http > 192.168.1.24.53382: Flags [S.], cksum 0x5cd6 (correct), seq 6517, ack 2276470478, win 5840, options [mss 1460], length 0
23:36:29.361467 IP (tos 0x0, ttl 64, id 1101, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.24.53382 > 192.168.1.61.http: Flags [.], cksum 0x9072 (correct), seq 1, ack 1, win 64240, length 0
23:36:29.361771 IP (tos 0x0, ttl 64, id 1102, offset 0, flags [DF], proto TCP (6), length 421)
    192.168.1.24.53382 > 192.168.1.61.http: Flags [P.], cksum 0x2304 (correct), seq 1:382, ack 1, win 64240, length 381: HTTP, length: 381
        GET / HTTP/1.1
        Host: 192.168.1.61
        Connection: keep-alive
        Upgrade-Insecure-Requests: 1
        User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
        Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
        Accept-Encoding: gzip, deflate, sdch
        Accept-Language: en-GB,en-US;q=0.8,en;q=0.6

23:36:29.372699 IP (tos 0x0, ttl 128, id 15, offset 0, flags [none], proto TCP (6), length 157)
    192.168.1.61.http > 192.168.1.24.53382: Flags [P.], cksum 0x0dff (correct), seq 1:118, ack 382, win 5459, length 117: HTTP, length: 117
        HTTP/1.1 200 OK
        Content-Type: text/html
        Content-Length: 1742
        Connection: close
        Access-Control-Allow-Origin: *

23:36:29.403336 IP (tos 0x0, ttl 64, id 1103, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.24.53382 > 192.168.1.61.http: Flags [.], cksum 0x8ef5 (correct), seq 382, ack 118, win 64123, length 0
23:36:29.409980 IP (tos 0x0, ttl 128, id 16, offset 0, flags [none], proto TCP (6), length 1500)
    192.168.1.61.http > 192.168.1.24.53382: Flags [P.], cksum 0x5bc5 (correct), seq 118:1578, ack 382, win 5459, length 1460: HTTP
23:36:29.441412 IP (tos 0x0, ttl 64, id 1105, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.24.53382 > 192.168.1.61.http: Flags [.], cksum 0x88cc (correct), seq 382, ack 1578, win 64240, length 0
23:36:29.446278 IP (tos 0x0, ttl 128, id 17, offset 0, flags [none], proto TCP (6), length 322)
    192.168.1.61.http > 192.168.1.24.53382: Flags [P.], cksum 0x329c (correct), seq 1578:1860, ack 382, win 5459, length 282: HTTP
23:36:29.446611 IP (tos 0x0, ttl 64, id 1106, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.24.53382 > 192.168.1.61.http: Flags [F.], cksum 0x88cb (correct), seq 382, ack 1860, win 63958, length 0
23:36:29.449021 IP (tos 0x0, ttl 128, id 18, offset 0, flags [none], proto TCP (6), length 40)
    192.168.1.61.http > 192.168.1.24.53382: Flags [R.], cksum 0x6bce (correct), seq 1860, ack 383, win 5840, length 0

vs external DDNS - both work for espeasy - but the payload is so much smaller.

23:40:50.377881 IP (tos 0x0, ttl 51, id 6362, offset 0, flags [DF], proto TCP (6), length 52)
    DDNSIPADDr.53519 > 192.168.1.61.http: Flags [S], cksum 0x2cb0 (correct), seq 3833006701, win 8192, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
23:40:50.380256 IP (tos 0x0, ttl 128, id 24, offset 0, flags [none], proto TCP (6), length 44)
    192.168.1.61.http > DDNSIPADDr.53519: Flags [S.], cksum 0x455f (correct), seq 6526, ack 3833006702, win 5840, options [mss 1460], length 0
23:40:50.468565 IP (tos 0x0, ttl 51, id 6363, offset 0, flags [DF], proto TCP (6), length 40)
    DDNSIPADDr.53519 > 192.168.1.61.http: Flags [.], cksum 0x74af (correct), seq 1, ack 1, win 65340, length 0
23:40:50.482551 IP (tos 0x0, ttl 51, id 6364, offset 0, flags [DF], proto TCP (6), length 429)
    DDNSIPADDr.53519 > 192.168.1.61.http: Flags [P.], cksum 0xf9e7 (correct), seq 1:390, ack 1, win 65340, length 389: HTTP, length: 389
        GET / HTTP/1.1
        Host: DDNSIPADDr
        Connection: keep-alive
        Upgrade-Insecure-Requests: 1
        User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
        Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
        Accept-Encoding: gzip, deflate, sdch
        Accept-Language: en-GB,en-US;q=0.8,en;q=0.6

23:40:50.590960 IP (tos 0x0, ttl 128, id 25, offset 0, flags [none], proto TCP (6), length 157)
    192.168.1.61.http > DDNSIPADDr.53519: Flags [P.], cksum 0xf686 (correct), seq 1:118, ack 390, win 5451, length 117: HTTP, length: 117
        HTTP/1.1 200 OK
        Content-Type: text/html
        Content-Length: 1743
        Connection: close
        Access-Control-Allow-Origin: *

23:40:50.729339 IP (tos 0x0, ttl 51, id 6365, offset 0, flags [DF], proto TCP (6), length 40)
    DDNSIPADDr.53519 > 192.168.1.61.http: Flags [.], cksum 0x732a (correct), seq 390, ack 118, win 65223, length 0
23:40:50.790758 IP (tos 0x0, ttl 128, id 26, offset 0, flags [none], proto TCP (6), length 1492)
    192.168.1.61.http > DDNSIPADDr.53519: Flags [.], cksum 0xd549 (correct), seq 118:1570, ack 390, win 5451, length 1452: HTTP
23:40:50.950883 IP (tos 0x0, ttl 51, id 6366, offset 0, flags [DF], proto TCP (6), length 40)
    DDNSIPADDr.53519 > 192.168.1.61.http: Flags [.], cksum 0x6d09 (correct), seq 390, ack 1570, win 65340, length 0
23:40:50.997627 IP (tos 0x0, ttl 128, id 27, offset 0, flags [none], proto TCP (6), length 48)
    192.168.1.61.http > DDNSIPADDr.53519: Flags [P.], cksum 0x149f (correct), seq 1570:1578, ack 390, win 5451, length 8: HTTP
23:40:51.132668 IP (tos 0x0, ttl 51, id 6367, offset 0, flags [DF], proto TCP (6), length 40)
    DDNSIPADDr.53519 > 192.168.1.61.http: Flags [.], cksum 0x6d09 (correct), seq 390, ack 1578, win 65332, length 0
23:40:51.199939 IP (tos 0x0, ttl 128, id 28, offset 0, flags [none], proto TCP (6), length 323)
    192.168.1.61.http > DDNSIPADDr.53519: Flags [P.], cksum 0x8f95 (correct), seq 1578:1861, ack 390, win 5451, length 283: HTTP
23:40:51.309953 IP (tos 0x0, ttl 51, id 6368, offset 0, flags [DF], proto TCP (6), length 40)
    DDNSIPADDr.53519 > 192.168.1.61.http: Flags [F.], cksum 0x6d08 (correct), seq 390, ack 1861, win 65049, length 0
23:40:51.404774 IP (tos 0x0, ttl 128, id 29, offset 0, flags [none], proto TCP (6), length 40)
    192.168.1.61.http > DDNSIPADDr.53519: Flags [R.], cksum 0x544e (correct), seq 1861, ack 391, win 5840, length 0
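
A rough way to read the two handshakes above, sketched in Python. Each side advertises the largest segment it is willing to receive in its SYN, and the sender should not exceed the peer's value. The two lines and the payload lengths are copied from the external DDNS trace; the helper name `advertised_mss` is made up for this illustration.

```python
import re

# Handshake lines copied from the external DDNS trace above.
SYN = ("DDNSIPADDr.53519 > 192.168.1.61.http: Flags [S], cksum 0x2cb0 (correct), "
       "seq 3833006701, win 8192, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0")
SYN_ACK = ("192.168.1.61.http > DDNSIPADDr.53519: Flags [S.], cksum 0x455f (correct), "
           "seq 6526, ack 3833006702, win 5840, options [mss 1460], length 0")


def advertised_mss(line):
    """Return the MSS option found in a tcpdump line, or None if absent."""
    match = re.search(r"\bmss (\d+)", line)
    return int(match.group(1)) if match else None


client_mss = advertised_mss(SYN)      # what the remote side can receive
device_mss = advertised_mss(SYN_ACK)  # what the device can receive

# Body payload lengths sent by the device in the same trace
# (the 117-byte segment carrying only the HTTP headers is left out).
device_segments = [1452, 8, 283]

# Segments larger than the peer's advertised MSS would be suspect.
oversized = [n for n in device_segments if n > client_mss]

print(client_mss, device_mss, oversized, sum(device_segments))
```

For this espeasy capture no segment exceeds the 1452 bytes the remote side asked for, and the three body segments add up to the Content-Length of 1743 shown in the response headers.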

Later Ferdie

@xoseperez
Owner Author

Guys. I think we should do some clean-up here. This is going way too far from the original issue.
I'd like to know if @paveleremin has tested the current dev branch to see if we have done some steps in the right direction here.

So many different (?) things in this ticket.

  1. Browser trying to download the page as a file. => I'm almost certain this is due to low free heap. Some improvements have been done here.
  2. Partial download (html but no css or js). => This should not happen since everything is in one file now.
  3. Connects to network only when connected to the board via serial => power issue?
  4. AP mode not stable. => This should also be fixed with the latest commits to dev
  5. Page not working when using external DNS service (whatever the cause might be)
  6. Free heap going down when using named requests (not using IP). => I could not reproduce this.

I'd like to open a different issue for the DNS problem. Is anyone still facing any of the other issues?

@xoseperez
Owner Author

Original comment by f-fish (Bitbucket: f-fish, GitHub: Unknown):


Retested on 1.6.4b .... same result (not that I expected a magic fix ;-) )

AFAIK only point 5 is still an issue for this thread.

5 and 6 I see as related since the device does not seem to recover that easily.

I would describe point 5 slightly differently: Page does not load when accessing via external address - be it IP or DNS, either from an on-network browser or an internet browser.

Sorry Harry - I see I missed your wireshark dump request - having some issues on the AP bridge with install fluff - will see if I can get it installed on that.

Later Ferdie

@xoseperez
Owner Author

Original comment by f-fish (Bitbucket: f-fish, GitHub: Unknown):


Since we do not have a new thread yet... here goes - and one can "clearly" see the issue.

badpacket.JPG

wireshark trace above - esp and router cannot agree on MSS

badpacket1.JPG

in the first segment the esp sends what the router can handle, based on MSS 1452

badpacket2.JPG

we get an ACK for 41 so esp sends the next stream but at the wrong MSS 1460

badpacket3.JPG

we never get an ACK back from the router - I am assuming because of segment length. So we never get to send the next bit of data.

When I look at the working internal ip address (mDNS or direct internal IP)

badpacket-working.JPG

both are happy with a MSS of 1460 and data flows.

One could probably "fix" it by reducing the max MSS on the esp ... that sounds like coding to me ;-)
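
A minimal sketch of what "reducing the max MSS" would mean, assuming the reading of the captures above is right (device sending at 1460 while the other side advertised 1452). This is plain Python arithmetic, not the ESP's TCP stack; the function names are invented for illustration and the 1743-byte body is borrowed from the espeasy trace only as a convenient number.

```python
def effective_send_mss(local_mss, peer_mss):
    """A sender should use the smaller of its own limit and the peer's advertised MSS."""
    return min(local_mss, peer_mss)


def segment_lengths(payload_len, mss):
    """Split a payload of payload_len bytes into segment sizes no larger than mss."""
    full, rest = divmod(payload_len, mss)
    return [mss] * full + ([rest] if rest else [])


body_len = 1743   # illustrative body size
local_mss = 1460  # what the device uses on the LAN
peer_mss = 1452   # what the external path advertised

# Ignoring the peer's value: the first segment is 8 bytes too large.
unclamped = segment_lengths(body_len, local_mss)

# Honouring the peer's value: every segment fits.
clamped = segment_lengths(body_len, effective_send_mss(local_mss, peer_mss))

print(unclamped)  # [1460, 283]
print(clamped)    # [1452, 291]
```

Whether the real fix belongs in the firmware's TCP settings or somewhere on the router is not something this sketch can answer; it only shows the size difference being discussed.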

Later Ferdie

@xoseperez
Owner Author

Original comment by J.D. (Bitbucket: Harry_Reutter, GitHub: Unknown):


@xoseperez
that's what i wrote earlier..i think most of the things are fixed or are follow-up errors.

Nr.5 is not related to DNS, it happens with IP also

Nr.6 is still an issue (sometimes), right now it's stable at 22k

@ ferdie
i have the same entries....but look at the port !? these seem to be random windows high ports..i agree that they are more or less dropped when coming from outside, but i am not sure that this is the cause of the problem..If so, where do the high port numbers come from ?

Regards,
Harry

@xoseperez
Owner Author

Opened issue #74 to track the problem accessing the device from outside the network. Please report further info there.

About the heap issue: I have been doing tests hitting just the index.html or the /auth resource alone (using curl and ab doing up to ten requests in a row) or both (using a browser). The free heap goes back to the previous value after the requests have been answered, using either IP or DNS (just in case). When using the browser the websocket client object uses some memory but it gets correctly freed after the browser closes.
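
The shape of that check, as a small Python sketch: read the free heap, fire a batch of requests, read it again and compare. How the heap value is obtained (serial log, status page, something else) is left open here, so the reader and the request sender are passed in as callables. `FakeDevice` and all names below are hypothetical stand-ins so the sketch runs without a board; the 22000 starting value is only an example.

```python
def heap_drop_after_requests(read_free_heap, send_request, count=10):
    """Return how many bytes of free heap were lost across `count` requests."""
    before = read_free_heap()
    for _ in range(count):
        send_request()
    after = read_free_heap()
    return before - after


class FakeDevice:
    """Stand-in for a board; leak_per_request simulates memory that is never freed."""

    def __init__(self, free_heap, leak_per_request=0):
        self.free_heap = free_heap
        self.leak_per_request = leak_per_request

    def read_free_heap(self):
        return self.free_heap

    def send_request(self):
        self.free_heap -= self.leak_per_request


healthy = FakeDevice(free_heap=22000)
leaky = FakeDevice(free_heap=22000, leak_per_request=128)

healthy_drop = heap_drop_after_requests(healthy.read_free_heap, healthy.send_request)
leaky_drop = heap_drop_after_requests(leaky.read_free_heap, leaky.send_request)

print(healthy_drop, leaky_drop)  # 0 1280
```

A drop of zero after the batch corresponds to the behaviour described above; a drop that grows with the request count would point to a leak.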

@xoseperez
Owner Author

Fixes for the pending issues in this ticket have been released with 1.6.4
