Broken URLs #10

Open
teeli opened this issue Mar 17, 2017 · 27 comments

@teeli
Owner

teeli commented Mar 17, 2017

Please report any URLs that aren't working properly (either causing errors on your bot's partyline or just not showing titles correctly) here.

Make sure you include the URL in question, any errors you see, and your configuration (any relevant software versions, e.g. eggdrop version, Tcl version, Tcl extension versions).

@teeli teeli added the bug label Mar 17, 2017
@teeli teeli self-assigned this Mar 17, 2017
@OmkAR2013

eggdrop 1.80
Tcl library: /home/eggie/tcl85/lib/tcl8.5
Tcl version: 8.5.19 (header version 8.5.19)
TLS support is enabled.
TLS library: OpenSSL 1.0.2g 1 Mar 2016

https://twitter.com/Breaking911/status/842624423358291968

[05:09:32] Tcl error [UrlTitle::handler]: can't read "meta(Content-Type)": no such element in array

@teeli
Owner Author

teeli commented Mar 17, 2017

@OmkAR2013 that should be fixed in the latest version

@OmkAR2013

OmkAR2013 commented Mar 18, 2017

I got all the previously broken URLs working. It's great! Everything works except Twitter HTTPS links.

https://twitter.com/Reuters
https://twitter.com/i/moments/842395226299760641

There's no error being displayed in the bot log, so I'm not sure what's happening.
Using newest urltitle.tcl

Any suggestions? @teeli, what setup do you have for your working bot?

CONFIG ->

I am mOOpeY, running eggdrop v1.8.1+RC2: 1 user (mem: 100k).
Configured with: '--with-tcllib=/home/moopey/local/lib/libtcl8.6.so' '--with tclinc=/home/moopey/local/include/tcl.h' '--enable-tls'
OS: Linux 4.4.0-66-generic
Process ID: 37832 (parent 1)
Tcl library: /home/moopey/local/lib/tcl8.6
Tcl version: 8.6.6 (header version 8.6.6)
Tcl is threaded.
TLS support is enabled.
TLS library: OpenSSL 1.0.2g 1 Mar 2016
IPv6 support is enabled.

tDOM - a XML/DOM/XPath/XSLT implementation for Tcl
(Version 0.8.4)

tcltls-1.7.11.tar.gz
tcllib_1_18.tar.gz
tcl8.6.6-src.tar.gz
eggdrop-1.8.1rc2.tar.gz

@knofte
Contributor

knofte commented Mar 21, 2017

We're seeing similar behavior with random URLs now.

07:57:19 <@knofte> https://casinojakten.se
07:57:21 <@servant> Title: Freespin och Bäst Bonus från de Bästa Casinon!! | casinojakten.se
07:57:27 <@knofte> https://www.sunet.se
07:57:29 <@servant> Title: SUNET | Datakommunikation & infrastruktur för forskning och utbildning
07:57:41 <@knofte> http://www.google.com
07:57:46 <@knofte> https://www.google.com
07:57:50 <@knofte> https://google.com
07:57:56 <@knofte> http://google.com
...

ii libtcl8.6:amd64 8.6.5+dfsg-2 amd64 Tcl (the Tool Command Language) v8.6 - run-time library files
ii tcl 8.6.0+9 amd64 Tool Command Language (default version) - shell
ii tcl-dev:amd64 8.6.0+9 amd64 Tool Command Language (default version) - development files
ii tcl-tls 1.6.7+dfsg-1 amd64 TLS OpenSSL extension to Tcl
ii tcl8.6 8.6.5+dfsg-2 amd64 Tcl (the Tool Command Language) v8.6 - shell
ii tcl8.6-dev:amd64 8.6.5+dfsg-2 amd64 Tcl (the Tool Command Language) v8.6 - development files
ii tcl8.6-tdbc 1.0.3-1 amd64 Tcl Database Connectivity
ii tcl8.6-tdbc-sqlite3 1.0.3-1 all Tcl Database Connectivity
ii tcllib 1.17-dfsg-1 all Standard Tcl Library

@FevLoad

FevLoad commented Apr 7, 2017

Is it fixed yet?

@teeli
Owner Author

teeli commented Aug 4, 2017

There should now be better support for HTTP(S) redirects and case-insensitive HTTP headers. Google, Twitter, etc. should work.
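
For anyone curious what case-insensitive header handling can look like, here is a minimal sketch. It is not taken from urltitle.tcl: `header_value` is a hypothetical helper name, and the only assumption is that the headers arrive as a flat name/value list, which is the shape `http::meta` returns in Tcl's http package. It is meant to show how an error like `can't read "meta(Content-Type)"` can be avoided when a server sends `content-type` in lowercase.

```tcl
# Hypothetical helper (not from urltitle.tcl): look up an HTTP header by
# name, ignoring case. `headers` is a flat name/value list, the shape
# returned by [http::meta $token].
proc header_value {headers name {default ""}} {
    foreach {key value} $headers {
        if {[string equal -nocase $key $name]} {
            return $value
        }
    }
    # Header not present at all: fall back instead of raising an error.
    return $default
}

# Example: the server sent "content-type" in lowercase.
set headers [list content-type "text/html; charset=utf-8" Server nginx]
puts [header_value $headers Content-Type]   ;# text/html; charset=utf-8
puts [header_value $headers X-Missing none] ;# none
```

Returning a default instead of reading an array element directly also covers responses that carry no Content-Type header at all.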

@voidzero

Hi @teeli,

I have a new issue with this URL:
http://blog.dilbert.com/post/164297628606/how-to-know-youre-in-a-mass-hysteria-bubble

On the partyline, I see this:

Tcl error [UrlTitle::handler]: invalid command name ""

So I added putlog statements everywhere, and this line seems to be the culprit:

set title [[$root selectNodes {//head/title/text()}] data]

Any idea?

@knofte
Contributor

knofte commented Aug 18, 2017

I get the same on that url.
Tcl error [UrlTitle::handler]: invalid command name ""

@teeli
Owner Author

teeli commented Aug 21, 2017

Apparently XPath fails to find the title on that page. I'm not sure why, but I suspect it could be because of invalid HTML structure (a stray doctype).

I should probably add some error checking and maybe a regex fallback (if that helps; I need to test).
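
To illustrate where `invalid command name ""` comes from and what a guard could look like, here is a small sketch in plain Tcl. It does not load tDOM: `select_title_node` is a hypothetical stand-in for the `selectNodes` call, and it assumes that call yields an empty result when the XPath matches nothing. `title_or_empty` is likewise an illustrative name, not a function from the script.

```tcl
# Stand-in for [$root selectNodes {//head/title/text()}] when nothing matches.
# Assumption: the real call yields an empty result in that case.
proc select_title_node {} {
    return ""
}

# Unguarded, like the line quoted above: the empty result is used as a
# command name, which is what triggers the error.
catch {[select_title_node] data} err
puts $err   ;# invalid command name ""

# Guarded: only invoke the node command when a node was actually found.
proc title_or_empty {node} {
    if {$node eq ""} {
        return ""
    }
    # selectNodes may return several nodes; use the first one.
    return [[lindex $node 0] data]
}

puts "title: '[title_or_empty [select_title_node]]'"   ;# title: ''
```

With a guard like this, the handler can fall through to a second strategy (for example a regex) when the returned title is empty.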

@teeli
Owner Author

teeli commented Aug 21, 2017

I've uploaded a new version that should fix that issue.

@voidzero

Fixed indeed. Well done. Your TCL-fu is admirable.

@lollko

lollko commented May 12, 2019

Since IMDb updated its site, there is a problem with urltitle:

21:37:09 <~lollko> https://www.imdb.com/title/tt1025100/
21:37:11 <&rss> Title: TryIMDbProFree

Is it possible to fix?

@teeli
Owner Author

teeli commented May 15, 2019

Looks like there's an inline SVG element on the page that has a <title> tag. I need to check whether it's possible to exclude those.

For reference

...
<svg width="175px" height="30px" viewBox="0 0 172 29" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<title>TryIMDbProFree</title>
<g id="tryIMDbProFree" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd">
<rect id="tryIMDbProFreeButton" stroke="#A88734" fill="#F1C241" x="1" y="1" width="170" height="28" rx="3"></rect>
<text id="tryIMDbProFreeText">
<tspan x="33" y="19">Try IMDbPro Free</tspan>
...

@teeli
Owner Author

teeli commented May 16, 2019

I've uploaded a new version that fixes the issue with title tags outside <head> when using regex parsing instead of tDOM.

Should fix the issue with the IMDB link above.
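
As a rough sketch of the idea (not the actual code in urltitle.tcl), a regex fallback can ignore titles outside `<head>` by cutting the page off at `</head>` before searching. `head_title` is a hypothetical name, and the sample page below is made up apart from the SVG title quoted earlier.

```tcl
# Hypothetical helper: return the text of the first <title> that appears
# before </head>, or "" if there is none.
proc head_title {html} {
    # Keep only what precedes </head>. If the page has no </head> at all,
    # the whole page is searched.
    if {[regexp -nocase -indices {</head\s*>} $html match]} {
        set html [string range $html 0 [expr {[lindex $match 0] - 1}]]
    }
    # Every quantifier is non-greedy so the match stops at the first
    # </title>; in Tcl the first quantifier sets the preference for the
    # whole expression.
    if {![regexp -nocase {<title[^>]*?>(.*?)</title>} $html -> title]} {
        return ""
    }
    # Collapse line breaks and runs of whitespace inside the title.
    return [string trim [regsub -all {\s+} $title " "]]
}

set page {<html><head><title>
  Example Movie
</title></head><body><svg><title>TryIMDbProFree</title></svg></body></html>}
puts [head_title $page]   ;# Example Movie
```

A page whose only `<title>` sits inside an inline SVG in the body yields an empty string here, which the caller can treat as "no title found".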

@lollko

lollko commented May 16, 2019

> I've uploaded a new version that fixes the issue with title tags outside <head> when using regex parsing instead of tDOM.
>
> Should fix the issue with the IMDB link above.

Working fine :) Thanks for your work.

@JesseMach

JesseMach commented May 26, 2019

Great work, thanks. Most links work fine but BBC News articles don't work for me. :(
(and yet BBC Sport links work fine)

[10:11:59] Connection to https://www.bbc.co.uk/news/uk-england-south-yorkshire-47623303/ failed
[10:11:59] Error: Missing host part: /news/uk-england-south-yorkshire-47623303
[10:12:07] Connection to http://www.bbc.co.uk/news/uk-england-south-yorkshire-47623303/ failed
[10:12:07] Error: Missing host part: /news/uk-england-south-yorkshire-47623303
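
One possible reading of this log, which nothing in this thread confirms: the server answers with a redirect whose Location header holds only a path, and that path is then requested as if it were a full URL. Under that assumption, a sketch of resolving the redirect target against the original URL could look like this. `absolute_location` is a hypothetical helper, and it only handles absolute URLs and absolute paths.

```tcl
# Hypothetical helper: turn a redirect target into an absolute URL.
# base     - the URL that was requested
# location - the value of the Location header
proc absolute_location {base location} {
    # Already absolute (has a scheme): use as-is.
    if {[regexp -nocase {^[a-z][a-z0-9+.-]*://} $location]} {
        return $location
    }
    # Absolute path: prepend scheme://host[:port] taken from the base URL.
    if {[string index $location 0] eq "/"
            && [string range $location 0 1] ne "//"
            && [regexp -nocase {^[a-z][a-z0-9+.-]*://[^/?#]+} $base origin]} {
        return $origin$location
    }
    # Other relative forms are left unchanged in this sketch.
    return $location
}

puts [absolute_location \
    https://www.bbc.co.uk/news/uk-england-south-yorkshire-47623303/ \
    /news/uk-england-south-yorkshire-47623303]
# prints https://www.bbc.co.uk/news/uk-england-south-yorkshire-47623303
```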

@knofte
Contributor

knofte commented Sep 10, 2019

Yo, YouTube changed earlier this year (AFAIK), which created a problem with urltitle. The same happened to youtube-dl:
Lamieur/youtube-dl@5eabe9c

For example:
Error: HTTP/1.1 429 Too Many Requests (https://www.youtube.com/watch?v=JImcvtJzIK8)

Some say forcing IPv4 for the lookup could help, but I was not successful with curl -I -4, unfortunately.

It'd be great to get YT-titles fixed again :)

@teeli
Owner Author

teeli commented Sep 10, 2019

I'll take a look and try to figure that out, but it'll probably be a more complex fix and might take more time than usual. It looks like the request itself is blocked by the YouTube servers, rather than this being just a parsing error in the script.

@knofte
Contributor

knofte commented Sep 10, 2019

Yeah, it seems like the title is only loaded after a redirect has been made. Quite an annoying feature. :)

@knofte
Contributor

knofte commented Sep 10, 2019

There is a youtube-api.tcl available for using the youtube API, perhaps that could give some hints.
(could not find a reliable link for it though)

@reelated

Not sure if it's me or not, but any page from reuters.com comes back with a blank title.

https://www.reuters.com/article/us-china-aviation-comac-insight/chinas-bid-to-challenge-boeing-and-airbus-falters-idUSKBN1Z905N
Title:

@knofte
Contributor

knofte commented Jan 13, 2020

> Not sure if it's me or not, but any page from reuters.com comes back with a blank title.
>
> https://www.reuters.com/article/us-china-aviation-comac-insight/chinas-bid-to-challenge-boeing-and-airbus-falters-idUSKBN1Z905N
> Title:

Same thing here, version 0.11.

@Ramshie

Ramshie commented Feb 14, 2020

https://www.bbc.com/news/world-us-canada-51483541 - Nothing happens, no errors in console either.

@ramsesatabusimbel

ramsesatabusimbel commented Jul 5, 2020

Twitter broke some weeks ago, nothing happens on those links.
https://twitter.com/ttnyhetsbyran/status/1279837369605160960?s=20 For example.
Tcl library: /usr/share/tcltk/tcl8.6
Tcl version: 8.6.9 (header version 8.6.9)
Tcl is threaded.
TLS support is enabled.
TLS library: OpenSSL 1.1.1d 10 Sep 2019

EDIT: Other sources tell me Twitter needs the API to work, so perhaps this is not an easy fix. It may be better to use a Twitter-specific script.

@lollko

lollko commented Jul 11, 2020

hi fellas

I tried some YouTube links, but urltitle shows me:

22:23:40 <~lollko> https://www.youtube.com/watch?v=-tDiXMeEWzw
22:23:42 <&rss> Title: YouTube

Maybe YouTube redesigned its site?

Here is my "conf" from the eggdrop:

22:37:15 <rss> Tcl library: /usr/share/tcl8.5
22:37:15 <rss> Tcl version: 8.5.13 (header version 8.5.13)
22:37:15 <rss> Tcl is threaded.
22:37:15 <rss> TLS support is enabled.
22:37:15 <rss> TLS library: OpenSSL 1.0.2k-fips  26 Jan 2017

@hjudges

hjudges commented Nov 8, 2020

Hi @teeli

Twitter links haven't been working for quite a while (no output at all). Can you have a look?

Thanks!

@angelperezleon

x.com (aka Twitter) links are still not working, @teeli.
Is anyone else fixing this?

example: https://x.com/Space_Station/status/1807824547309093239
Bot response: Title: x.com
