Fix search #14

Closed
wants to merge 5 commits into from

9ary commented May 9, 2023

Search now requires being logged in + a CSRF token.

This PR adds a CLI flag to provide an authentication cookie. The cookie must be obtained by logging in with a browser; in Firefox, it can be found in the developer toolbox under the Storage tab.

It looks like a randomly generated CSRF token works, so no complicated mechanism is required to obtain one.
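A rough sketch of what this means at the request level. It assumes Twitter's web client used the usual double-submit CSRF scheme: the `auth_token` session cookie, plus a `ct0` cookie whose value must match an `x-csrf-token` header. Those names come from how the web client behaves, not from this PR's diff, and `build_auth` is a hypothetical helper. If the server only checks that the cookie and the header agree, any random value would pass, which would explain why a random token works.

```python
import secrets


def build_auth(auth_token, csrf_token=None):
    """Return (cookies, headers) for a logged-in request.

    Hypothetical helper: the cookie and header names are assumptions about
    Twitter's web client, not taken from the PR.
    """
    if csrf_token is None:
        # Any random hex string; secrets.token_hex(16) gives 32 hex characters.
        csrf_token = secrets.token_hex(16)
    cookies = {
        "auth_token": auth_token,  # copied from a logged-in browser session
        "ct0": csrf_token,         # CSRF cookie ...
    }
    headers = {"x-csrf-token": csrf_token}  # ... echoed back in this header
    return cookies, headers


cookies, headers = build_auth("YOUR_AUTH_TOKEN")
print(cookies["ct0"] == headers["x-csrf-token"])  # True
```

With aiohttp, which twint uses for HTTP, these would typically be passed as the `cookies=` and `headers=` arguments of a request.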

Fixes #11.
Fixes #13.

@@ -108,12 +108,19 @@ def get_connector(config):

async def RequestUrl(config, init):
    logme.debug(__name__ + ':RequestUrl')
    csrf_token = random.randbytes(16).hex()  # Looks like any random string works
Author

9ary commented May 9, 2023

Just a note: randbytes requires Python 3.9+ (released late 2020). I don't know whether you want to support older versions; if so, this can be made to work, and I guess even hardcoding a fixed string might be feasible.
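One way to make the token generation work on older interpreters is to fall back to `random.getrandbits`, which long predates 3.9. This is a sketch with a hypothetical `random_hex` helper; `secrets.token_hex(16)` (Python 3.6+) would be an equally valid replacement.

```python
import random


def random_hex(nbytes=16):
    """Return 2 * nbytes random hex characters, like random.randbytes(nbytes).hex()."""
    if hasattr(random, "randbytes"):  # Python 3.9+
        return random.randbytes(nbytes).hex()
    # Older interpreters: getrandbits() plus int.to_bytes()/bytes.hex() (3.5+).
    return random.getrandbits(nbytes * 8).to_bytes(nbytes, "big").hex()


csrf_token = random_hex()
print(len(csrf_token))  # 32
```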

Contributor

Personally, I'd like to run the code on Python 3.6, though it seems there are enough options available...

Author

This is definitely feasible.
It's just that Python 3.6 is already 7 years old, and already EOL. 3.7 is going to be EOL'd in 6 weeks according to https://devguide.python.org/versions/. Debian Bullseye (current stable) ships 3.9, and I usually find that to be a reasonable reference to set the cutoff point.

Contributor

@LinqLover Python 3.6 has already reached end of support, and Python 3.7 reaches end of support on 2023-06-27 (1 month 17 days away). TWINT should not care about Python versions that have reached end of support.

Contributor

Fair point 👍

bb010g force-pushed the fix-search branch 3 times, most recently from 7540876 to 9f52444, May 10, 2023 23:23
@bb010g
Contributor

bb010g commented May 10, 2023

Tests are now capable of passing on this branch. The first two commits (including #8) take care of fixing bugs that already prevented tests from working, independently of Twitter's latest changes.

@LinqLover
Contributor

LinqLover commented May 11, 2023

That sounds great! Can you tell how stable this solution will be if you run twint regularly on a daily basis, i.e., how fast will the token/cookie expire? Will it work when you use the token/cookie to run twint on a different machine/IP address?

@9ary
Author

9ary commented May 11, 2023

> Can you tell how stable this solution will be if you run twint regularly on a daily basis, i.e., how fast will the token/cookie expire?

No idea yet, but we run a twint job every 12 hours on GitHub Actions (https://github.com/catgirl-v/cubari/actions), so we'll find out soon enough.


It's working so far.

@leonardoulloa21

leonardoulloa21 commented May 12, 2023

Is this working for you? In my case this error pops up. Any advice?

"ConnectionError: Access forbidden, try passing --auth-token."

@9ary
Author

9ary commented May 12, 2023

Yes, it's working. I'm going to need more details to help you. Did you in fact pass a valid authentication cookie as described in the OP? If so, please post a minimal example that reproduces the problem.

@leonardoulloa21

Do I need to pass a valid authentication cookie? How? I just applied the changes in this PR and tried to run my previous code, and then that error message popped up. How can I do what you recommend?

@9ary
Author

9ary commented May 12, 2023

Sounds to me like you didn't read any of the conversation in #13 and here. The error message is very clear: you need an auth token. This is the whole point of this PR: Twitter now requires login to search. Instructions are in the OP.

@luxoflux

Brilliant solution, works just fine. Thanks.

@leonardoulloa21

My bad, I thought that csrf_token = random.randbytes(16).hex() was it, but I need to replace it with my auth token, which I get from the Firefox browser, right? Because I did make that change and I'm still getting the same error ("ConnectionError: Access forbidden, try passing --auth-token."). Am I doing something wrong? Some help would be nice, please :)

@9ary
Author

9ary commented May 12, 2023

No, you don't have to modify the code. Pass the token with the --auth-token flag, or set the TWITTER_AUTH_TOKEN environment variable.

CSRF is unrelated, it's just that both changes were required to actually get it to work.
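A common way for a CLI to accept the token either as a flag or from the environment is to make the environment variable the argparse default, so an explicit `--auth-token` overrides `TWITTER_AUTH_TOKEN`. This is a sketch of that pattern, not the PR's code; the precedence shown is an assumption.

```python
import argparse
import os


def parse_args(argv=None):
    ap = argparse.ArgumentParser()
    # Fall back to the TWITTER_AUTH_TOKEN environment variable when the
    # flag is absent; an explicit --auth-token overrides it.
    ap.add_argument(
        "--auth-token",
        default=os.environ.get("TWITTER_AUTH_TOKEN"),
        help="value of the auth_token cookie from a logged-in browser session",
    )
    return ap.parse_args(argv)
```

For example, `parse_args(["--auth-token", "abc"]).auth_token` is `"abc"` whatever the environment contains, while `parse_args([])` picks up `TWITTER_AUTH_TOKEN` if it is set and returns `None` otherwise.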

@leonardoulloa21

leonardoulloa21 commented May 12, 2023


I have my code deployed in AWS Lambda with twint's library as a layer. I updated the library and set the environment variable as mentioned, but I'm still getting the same error. Locally I get the same result. I'd really appreciate some help :)

[CRITICAL] 2023-05-12T20:53:44.334Z 38205fb9-65a5-41b2-b6ce-377909a1b4e3 twint.run:Twint:Feed:noData'data'
sleeping for 1.0 secs
[CRITICAL] 2023-05-12T20:53:45.425Z 38205fb9-65a5-41b2-b6ce-377909a1b4e3 twint.run:Twint:Feed:noData'data'
sleeping for 8.0 secs
[CRITICAL] 2023-05-12T20:53:53.524Z 38205fb9-65a5-41b2-b6ce-377909a1b4e3 twint.run:Twint:Feed:noData'data'
sleeping for 27.0 secs
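The `noData'data'` suffix looks like the text of a Python `KeyError`: twint appears to log `str(e)` right after `noData`, and `str(KeyError('data'))` is `'data'`, quotes included. If that reading is right, the response JSON lacked the expected top-level `data` key (for example, an error payload instead of results). A minimal reproduction, using a hypothetical `parse_timeline` and a placeholder error payload:

```python
import json


def parse_timeline(body):
    """Hypothetical parser that expects a top-level "data" object."""
    payload = json.loads(body)
    return payload["data"]  # raises KeyError('data') if the key is missing


def log_like_twint(body):
    # Mimics the shape of the log line: a fixed prefix plus str(exception).
    try:
        parse_timeline(body)
        return "ok"
    except Exception as e:
        return "twint.run:Twint:Feed:noData" + str(e)


# Placeholder error payload (illustrative only) with no "data" key:
print(log_like_twint('{"errors": [{"message": "example"}]}'))
# twint.run:Twint:Feed:noData'data'
```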

@batmanscode

batmanscode commented May 14, 2023

Thank you for the fix @9ary, works great! 😃

Tiny request, is it possible to add a wait time to prevent rate limits?

Looks like --min-wait-time is supposed to be automatically adjusted but I still get TokenExpiryException: Rate limit exceeded

ap.add_argument("--min-wait-time", type=float, default=15,
                    help="specifiy a minimum wait time in case of scraping limit error. This value will be adjusted by twint if the value provided does not satisfy the limits constraints")
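The retry delays in the Lambda log above (1.0, 8.0, 27.0 secs) fit a cubic backoff: delay = n³ for the n-th consecutive error, raised to the minimum wait time when that is larger, as the `--min-wait-time` help text describes. The sketch below reproduces that schedule; the exponent of 3 is inferred from the log, and `backoff_delay` is a hypothetical name, not twint's code.

```python
def backoff_delay(consecutive_errors, min_wait_time=0.0, exponent=3.0):
    """Seconds to sleep before retrying after `consecutive_errors` failures.

    Assumed formula: n ** exponent, rounded to 0.1 s, but never less than
    min_wait_time (mirroring the --min-wait-time help text).
    """
    delay = round(consecutive_errors ** exponent, 1)
    return max(delay, float(min_wait_time))


print([backoff_delay(n) for n in (1, 2, 3)])                    # [1.0, 8.0, 27.0]
print([backoff_delay(n, min_wait_time=15) for n in (1, 2, 3)])  # [15.0, 15.0, 27.0]
```

Under this schedule, a larger minimum wait only stretches the first few retries; whether that is enough to stay under Twitter's rate limit is not established in this thread.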

@9ary
Author

9ary commented May 14, 2023

For what it's worth, it seems the owner of this repo is inactive, so this PR is unlikely to be merged anytime soon. We've set up a fork at https://github.com/catgirl-v/twint.

@leonardoulloa21 @batmanscode please open issues over there with the code or command line invocation that reproduces your problems. It's not practical to do all development and troubleshooting in a single PR thread.

@batmanscode

Makes sense, thanks!

@corpuzdonn

I applied all the changes to the .py files of my twint installation, but I keep getting the error below on all of my searches.

module 'random' has no attribute 'randbytes'

@batmanscode

You have to use Python 3.9 or above. It's mentioned in some of the early comments.

@corpuzdonn

corpuzdonn commented May 19, 2023

Thanks. I switched to Python 3.9, but I got 'Access forbidden, try passing --auth-token.' I saw that auth tokens were added. How can I add this to the search function?

I'm very new at this. How do I pass the auth token?

@batmanscode

Log in to Twitter in Firefox -> developer tools -> storage -> Auth token

Then wherever you're running twint, save that as an environment variable called TWITTER_AUTH_TOKEN

It should then run. Good luck

@corpuzdonn

Thanks, it's working now.

@leonardoulloa21

leonardoulloa21 commented May 22, 2023

Hey @batmanscode

Would you mind testing my code and telling me whether you get the same error message?

I'm trying to run it in a Jupyter notebook and then in AWS Lambda.

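```python
import twint
import os
import nest_asyncio

os.environ["TWITTER_AUTH_TOKEN"] = "my_token"  # placeholder for the real auth_token cookie value

nest_asyncio.apply()  # lets twint's asyncio loop run inside Jupyter's already-running loop

c = twint.Config()
c.Username = "BCPComunica"
c.Since = "2023-05-21"
c.Limit = 100
twint.run.Search(c)
```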

I'm getting this error:
CRITICAL:root:twint.run:Twint:Feed:noData'data'
sleeping for 1.0 secs

Hope you can give me a hand!

Thanks in advance

@JoelBird

> Login to twitter on Firefox -> developer tools -> storage -> Auth token

I can only find authentication tokens, and they're in the developer portal; I didn't see any 'developer tools' or 'storage' in Firefox. Which of them should I use?

@9ary
Author

9ary commented May 24, 2023

@JoelBird hopefully this is detailed enough:

  • go to twitter.com
  • log in
  • press F12, the developer toolbox will appear
  • click the storage tab
  • on the left, select cookies > https://twitter.com
  • find the cookie named auth_token
  • double-click the value and copy it
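The same value can also be read programmatically from Firefox's cookie store, a SQLite file named `cookies.sqlite` in the profile directory, with a `moz_cookies` table. The table and column names are Firefox internals and may change between versions, and the profile path below is a placeholder. Firefox locks the file while it runs, so this sketch queries a temporary copy; a very recently set cookie may not have reached the main file yet, and closing Firefox first avoids that. `read_auth_token` is a hypothetical helper, not part of twint.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def read_auth_token(cookies_db):
    """Return the twitter.com auth_token cookie from a Firefox cookies.sqlite, or None."""
    with tempfile.TemporaryDirectory() as tmp:
        # Work on a copy: Firefox keeps the live database locked.
        copy = Path(tmp) / "cookies.sqlite"
        shutil.copyfile(cookies_db, copy)
        conn = sqlite3.connect(str(copy))
        try:
            row = conn.execute(
                "SELECT value FROM moz_cookies "
                "WHERE name = 'auth_token' AND host LIKE '%twitter.com' "
                "LIMIT 1"
            ).fetchone()
        finally:
            conn.close()
    return row[0] if row else None


# Usage (placeholder path; see about:profiles for the real profile folder):
# print(read_auth_token(Path.home() / ".mozilla/firefox/XXXX.default-release/cookies.sqlite"))
```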

@marquisvictor

marquisvictor commented May 24, 2023

Hi @9ary, thanks for the fix. But for now, using the command line, only the -u parameter works; the search parameter -s doesn't. Any idea why? I'm trying to debug it here.

I'm getting CRITICAL:root:twint.run:Twint:Feed:noData'data' with twint -s pineapple
but twint -u username works fine

@corpuzdonn

I'm getting "Rate limit exceeded" errors. How do I fix this? What should I keep looping to get around it?

@batmanscode

> Would you mind testing my code and tell me if you are getting the same error message?

I'm not sure, sorry. I'm having the same issue :(

@a-annor

a-annor commented Jun 24, 2023

For those for whom this is working, would someone be able to run through

> I'm not sure, sorry. I'm having the same issue :(

Hi all, is this still working for anyone? I'm experiencing the same issue as @leonardoulloa21

@corpuzdonn

> Hi all, is this still working for anyone? I'm experiencing the same issue as @leonardoulloa21

It's working fine on my end.

Output:
1672684481071976449 2023-06-25 03:13:51 +0800 @kvafelled @kvafelled you need to wait approximately 15 calendar days.
1672663630951878656 2023-06-25 01:51:00 +0800 @kvafelled Hi, @kvafelled 👋 Just so you know, when a company cancels, voids, or refunds a purchase, it has up to 15 (calendar) days to return the money to your savings account. 🤝
...
...
...
@Maverick99210 Hi, @Maverick99210! 👋 We're sorry for the inconvenience. Please send us your DNI via DM so we can best assist you. We look forward to your message.
1668754764476346368 2023-06-14 06:58:34 +0800 @mendezt_29 Hi @mendezt_29 Send us a DM with a screenshot of what you're seeing and your DNI number. We'll be waiting.
1668710970741628928 2023-06-14 04:04:33 +0800 @PALICUYA Hi Pilar!👋 Please send us an inbox message with your DNI and the image that appears for you here 👉🏻 https://t.co/HE00YFfJez. We'll be waiting. 🤝
1668690239685267457 2023-06-14 02:42:10 +0800 @vladineitor Elsa, thank you for the information. We are reporting what happened to the team in charge so they can look into and verify it. We are very sorry for the trouble
1668687790408884224 2023-06-14 02:32:26 +0800 @vladineitor Hi, Elsa. We want to know what happened. Please give us details via DM about the issue and the location of the branch (avenue/street/number/any landmark). We'll be waiting.
1668631670076116997 2023-06-13 22:49:26 +0800 @RodrigoVinyas Hi, Rodrigo! 👋 Each of our customers' experience matters a lot to us. We'd appreciate it if you could call 01 311 9400, Mon-Fri from 7:00 a.m. to 5:00 p.m., and contact our Payment Solutions department to request a payment arrangement or payment commitment.
[!] No more data! Scraping will stop now.
found 0 deleted tweets in this search.

@batmanscode

Thanks @corpuzdonn, maybe it's a token issue from my end

I attempted a huge scrape (4 weeks via search terms) and that got rate limited. Maybe that token wasn't valid after that

Have you tried long scrapes? I saw there's a timeout parameter, but even setting it very high didn't work for me.
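A common workaround for long scrapes is to split the period into smaller date windows and run one search per window, so a failure only costs one window and you can pause between runs. Whether this actually avoids Twitter's rate limit here is untested. `date_windows` is a hypothetical helper; `Since`, `Until` and `Search` are existing `twint.Config` fields.

```python
from datetime import date, timedelta


def date_windows(start, end, days=1):
    """Yield (since, until) ISO date strings covering [start, end) in `days`-long steps."""
    current = start
    step = timedelta(days=days)
    while current < end:
        nxt = min(current + step, end)
        yield current.isoformat(), nxt.isoformat()
        current = nxt


windows = list(date_windows(date(2023, 5, 1), date(2023, 5, 29), days=7))
print(windows[0])    # ('2023-05-01', '2023-05-08')
print(len(windows))  # 4

# Hypothetical usage with twint (not run here):
# import time, twint
# for since, until in windows:
#     c = twint.Config()
#     c.Search = "pineapple"
#     c.Since, c.Until = since, until
#     twint.run.Search(c)
#     time.sleep(60)  # arbitrary pause between windows
```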

@leonardoulloa21

> It's working fine on my end.

Would you mind packaging up your twint library and sharing it with us, please? I might be doing something wrong, because I just tried it and got the same result:

CRITICAL:root:twint.run:Twint:Feed:noData'data'
sleeping for 1.0 secs

I don't think this message is related to the auth token; it has to be something else...
Thanks in advance for your time @woluxwolu

@corpuzdonn

I am actually getting the following below all of a sudden. Did something change?

CRITICAL:root:twint.get:User:Expecting value: line 1 column 1 (char 0)
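`Expecting value: line 1 column 1 (char 0)` is the message Python's `json` module raises (`json.JSONDecodeError`) when a body is empty or does not start with JSON at all, such as an HTML error page. So the log suggests Twitter returned something other than JSON for that request, though it cannot say why. A hedged sketch of a guard that reports the HTTP status and the start of the body instead (`decode_json_or_explain` is a hypothetical helper, and the 403 below is just an example input):

```python
import json


def decode_json_or_explain(status, body):
    """Parse a JSON response body, or raise an error showing what came back."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as e:
        snippet = body[:80].strip() or "<empty body>"
        raise ValueError(
            "HTTP {}: expected JSON, got {!r} ({})".format(status, snippet, e)
        ) from e


try:
    decode_json_or_explain(403, "")
except ValueError as err:
    print(err)
    # HTTP 403: expected JSON, got '<empty body>' (Expecting value: line 1 column 1 (char 0))
```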

@9ary
Author

9ary commented Jul 2, 2023

Most likely the latest Twitter changes require more API calls to be authenticated. Our scripts broke too but I'm currently on vacation. I'll have a look in a few days.

@corpuzdonn

Have there been any updates? I don't know if there were, but my output has become:

CRITICAL:root:twint.run:Twint:Feed:noData'globalObjects'

@9ary
Author

9ary commented Jul 19, 2023

The search endpoint returns 404, it looks like they've finally killed it off. This means twint will need to be reworked to use the graphql API, which is a lot more work than I'm willing to put in personally.

@corpuzdonn

I see. It's ok. Will find alternative solutions. Thanks for your hard work!

9ary closed this Jul 5, 2024
Development

Successfully merging this pull request may close these issues.

CRITICAL:root:twint.run:Twint:Feed:noData'globalObjects' No Data