This repository has been archived by the owner on Mar 30, 2023. It is now read-only.

[QUESTIONS] twint and python3 twint.py don't show the same results #309

Closed
michaelabrt opened this issue Dec 9, 2018 · 28 comments

@michaelabrt

michaelabrt commented Dec 9, 2018

Hello there,

First of all, this is a great tool. I saved so much time using it.

  • When I run the script directly with python, it displays less info than the executable does. Are there options I need to add when using the python script? For example, python3 twint.py -u username and twint -u username don't display the same results.

  • One last question, is it possible to use the --elasticsearch option without Kibana (with a custom dashboard I'll create myself)?

Thanks in advance for your help

@michaelabrt changed the title from "[QUESTIONS] About twint in general (doc, ES, etc.)" to "[QUESTIONS] twint and python3 twint.py don't show the same results" on Dec 9, 2018
@pielco11
Member

pielco11 commented Dec 9, 2018

Hi @mubartok,

python3 twint.py has been replaced by the twint command. The output is handled by twint/output.py, which has not changed in a while. Could you tell me which differences you found?

About Elasticsearch: Kibana and Elasticsearch are both part of the ELK stack, so you can use Elasticsearch without Kibana. Kibana is just a "front-end".

Best
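As a rough illustration of using Elasticsearch without Kibana, the sketch below talks to the Elasticsearch REST search endpoint directly. It assumes a node reachable at localhost:9200 and an index named twinttweets with a tweet text field (both names appear later in this thread). The helper names build_search_request and search are hypothetical and are not part of Twint.

```python
import json
import urllib.request


def build_search_request(host, index, field, text, size=10):
    """Build (but do not send) a full-text search request for Elasticsearch."""
    # A standard "match" query; host, index and field are assumptions.
    body = {"size": size, "query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"http://{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(host, index, field, text):
    """Send the request and return the _source dict of every hit."""
    request = build_search_request(host, index, field, text)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [hit["_source"] for hit in payload["hits"]["hits"]]
```

With a local node running, search("localhost:9200", "twinttweets", "tweet", "asco2018") would return the stored tweet documents as plain dictionaries, which a custom dashboard could then render however it likes.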

@michaelabrt
Author

Hi @pielco11,

Thanks for your quick answer. When I'm using the twint command, I get some stats that I don't get if I'm using python3 twint.py. Here is an example:

[screenshot 2018-12-09 at 13 55 00]

Is it possible to get exactly the same results when using python3? If so, how?

Thanks for the answer about Elasticsearch. I don't know whether it's more efficient than a classic Firebase setup, but I'll try to figure it out.

@pielco11
Member

pielco11 commented Dec 9, 2018

python3 usage is deprecated. The user info that you see can simply be returned with twint --user-full -u username. So if your version of twint is not too outdated, you should run python3 twint.py --user-full -u username

Is there any specific reason why you are still using python3?

About Elasticsearch: I use it because it's scalable, supports replicas, can index a few thousand tweets per second, and offers many other features such as a "score" in results and full-text search. If you want to run some tests, please feel free to share your benchmarks so that we can compare solutions and maybe offer other storage options.

@michaelabrt
Author

Thanks to you, I realised that I had a badly outdated version of twint. I have now downloaded the latest repo. Yes, there is a specific reason why I want to use python3: I'd like to build a standalone executable with pyinstaller so that I can call it easily from nodejs. With the outdated version, I just had to run pyinstaller twint.py -F, but now I can't make it work. Could you help me with this? I'm not a regular Python user...

By the way, thanks a lot for everything, for twint, the quick answers and the great help.

@michaelabrt
Author

About Elasticsearch, I made a quick search and I think it's wayyy more efficient than firebase for this usage ;)

@pielco11
Member

pielco11 commented Dec 9, 2018

To better integrate Twint with other technologies, we will create a "Twint API" so that you will be able to send queries over http(s). If you really need a portable executable of Twint, I suggest you wire sys.argv or even argparse into a little script, and then compile the script:

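import sys
import twint

# Build a Twint configuration from the first command-line argument.
c = twint.Config()
c.Username = sys.argv[1]  # usage: python3 script.py username

# Fetch the tweets of that user.
twint.run.Search(c)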

@michaelabrt
Author

The Twint API sounds very nice. Do you know approximately when it will be available?

I want to use a standalone executable to get rid of python and be able to run it from any computer, just by running the command.

@pielco11
Member

pielco11 commented Dec 9, 2018

Honestly, I can't give you a specific date, but almost certainly after the end of this month.

> I want to use a standalone executable to get rid of python and be able to run it from any computer, just by running the command

You could get the old Twint.py and "compile" it. Please keep in mind that the compiled twint will run only on the OS/architecture that you compiled it on. If you compile twint on GNU/Linux it will not run on Windows, not because Twint is bad but because that's how Python (and other tools) work.
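Because each build is tied to the machine it was produced on, it can help to label every executable with its OS and architecture. The following is only a small sketch using the standard platform module. The helper name build_tag and the naming scheme are hypothetical, not something Twint or pyinstaller prescribes.

```python
import platform


def build_tag(name="twint"):
    """Return a file name stem that records the current OS and architecture."""
    # platform.system() gives e.g. "Linux" or "Windows";
    # platform.machine() gives the CPU architecture reported by the OS.
    system = platform.system().lower() or "unknown"
    machine = platform.machine().lower() or "unknown"
    return f"{name}-{system}-{machine}"


print(build_tag())
```

Running this on each build machine yields a distinct stem per OS/architecture pair, which makes it harder to ship a Linux build to a Windows host by mistake.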

@michaelabrt
Author

I did what you said for the script, but now I have this:

'NoneType' object is not subscriptable [x] get.User

I also get a lot of results that have nothing to do with my request.

@michaelabrt
Author

Ok, I'll keep myself updated then 😉

@pielco11
Member

pielco11 commented Dec 9, 2018

I tried that little script and it works correctly with my query, so I guess it depends on what you are requesting. Do you still get the errors without using the compiled version? (There should be no difference, because compiling does not change anything.)

@michaelabrt
Author

Yes, I'm getting the same errors when I'm using the script.py I just created, I guess I did something wrong.

@pielco11
Member

pielco11 commented Dec 9, 2018

It's too early for conclusions. Sometimes it's just Twint not playing well with hidden tweets, or Twitter confusing itself. Could you give me your query so that we can find a solution?

@michaelabrt
Author

Sure, it doesn't work whatever the request. I'm testing with my own Twitter profile, which is empty, so it's easier to test. I'm using the following command: python3 script.py -u "michaelaboukrat".

But I get the same error with any other request, -s "...", etc.

@pielco11
Member

pielco11 commented Dec 9, 2018

import sys
import twint

# Only one positional argument is read: the username.
c = twint.Config()
c.Username = sys.argv[1]

# Fetch the tweets of that user.
twint.run.Search(c)

This is really basic and should be run as python3 script.py username; -s and other args are not handled here. If you want to handle args in a better way you should use argparse, so rather than re-inventing the wheel you can just use the old Twint.py
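For readers who want a middle ground between the bare sys.argv script and the full old Twint.py, here is a minimal argparse sketch that accepts -u and -s. It assumes the Config attributes Username and Search and the twint.run.Search entry point. The function names build_parser and main are hypothetical, and twint is imported lazily so the argument parsing can be exercised without Twint installed.

```python
import argparse


def build_parser():
    """Create a parser that understands -u/--username and -s/--search."""
    parser = argparse.ArgumentParser(description="Minimal Twint wrapper (sketch)")
    parser.add_argument("-u", "--username", help="account whose tweets to fetch")
    parser.add_argument("-s", "--search", help="search terms")
    return parser


def main(argv=None):
    """Parse the arguments, fill a twint.Config and run the search."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not (args.username or args.search):
        parser.error("give at least one of -u or -s")

    import twint  # lazy import: only needed when a search is actually run

    c = twint.Config()
    if args.username:
        c.Username = args.username
    if args.search:
        c.Search = args.search
    twint.run.Search(c)


# To use this as script.py, end the file with a call to main().
```

With that final call added, python3 script.py -u "username" would no longer treat the literal string -u as the username, which is what happens when the bare sys.argv script is given flags.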

@michaelabrt
Author

Oh yeah, I didn't even read the script... Thanks, I'll think about a solution later 🙂

Thanks for your time!

@pielco11
Member

pielco11 commented Dec 9, 2018

@mubartok great! If you need anything else, feel free to ping me!

@michaelabrt
Author

@pielco11 thanks! I have an error with ES (missing [lat] field), which I think comes from the python code (missing check). Should I open a new issue?

Here is the end of the error stack:

raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: ('1 document(s) failed to index.', [{'index': {'_index': 'twinttweets', '_type': 'items', '_id': '1067987073158836224_raw_', 'status': 400, 'error': {'type': 'mapper_parsing_exception', 'reason': 'failed to parse', 'caused_by': {'type': 'parse_exception', 'reason': 'field [lat] missing'}}, 'data': {'id': '1067987073158836224', 'conversation_id': '1067987073158836224', 'created_at': 1543462935000, 'date': '2018-11-29 04:42:15', 'timezone': 'CET', 'place': 'Legislative Assembly of Ontario', 'location': '', 'tweet': 'Pleased to welcome back @GSK, @Roche, and @NovoNordiskCA to Queens Park today. #onpoli #ForThePeople #Mississauga #Streetsville @ONgov pic.twitter.com/Mo84hImcYh – à Legislative Assembly of Ontario', 'hashtags': ['#onpoli', '#ForThePeople', '#Mississauga', '#Streetsville'], 'user_id': 1202902675, 'user_id_str': '1202902675', 'username': 'ninatangri', 'name': 'Nina Tangri', 'profile_image_url': 'https://pbs.twimg.com/profile_images/989167009001713666/ywFfPVmp.jpg', 'day': 4, 'hour': '10', 'link': 'https://twitter.com/ninatangri/status/1067987073158836224', 'retweet': None, 'essid': '', 'nlikes': 8, 'nreplies': 1, 'nretweets': 1, 'quote_url': '', 'search': '@Roche', 'near': None, 'geo_tweet': {}}}}])

@pielco11
Member

pielco11 commented Dec 9, 2018

Did you get something like this in the output?
[image]

@michaelabrt
Author

michaelabrt commented Dec 9, 2018

I got this:

[screenshot 2018-12-09 at 18 24 21]

But the first time it worked well: it created the index patterns in ES and ran normally.

@pielco11
Member

pielco11 commented Dec 9, 2018

Indexes are created only once, so I guess that some tweets are indexed but others break the code. Is this correct?

@michaelabrt
Author

Yes, I guess so. Judging by the error message, is it because some tweets don't have lat/long?

@pielco11
Member

pielco11 commented Dec 9, 2018

@mubartok that's correct. Could you give me your query so that I can see in detail how to mitigate the error?

@michaelabrt
Author

Sure, here it is: twint -s "#asco2018" -es localhost:9200

@pielco11
Member

pielco11 commented Dec 10, 2018

In that case the place is McCormick Place Lakeside Center, and you get an error because geoLocation() can't find the coordinates of that place.

So for now I'm going to add a workaround

Update your local repo with pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
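The workaround that was actually committed is not shown in this thread. One possible approach, sketched here under the assumption that geo_tweet is mapped as a geo point needing both lat and lon keys, is to drop the field from a document whenever the coordinates are missing, before the document is sent to Elasticsearch. The helper drop_empty_geo is hypothetical and not part of Twint.

```python
def drop_empty_geo(doc, geo_fields=("geo_tweet",)):
    """Return a copy of doc without geo fields that lack lat/lon coordinates."""
    cleaned = dict(doc)  # shallow copy, the caller's document is left untouched
    for field in geo_fields:
        if field not in cleaned:
            continue
        value = cleaned[field]
        has_coords = isinstance(value, dict) and "lat" in value and "lon" in value
        if not has_coords:
            # An empty {} here is what triggers "field [lat] missing".
            del cleaned[field]
    return cleaned
```

Applied to the failing document above, whose geo_tweet is an empty dictionary, this would remove the field so the rest of the tweet can still be indexed.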

@aashu4uiit

@pielco11 It's an old thread, but I ran into this problem today. Same issue. Is it already fixed? I tried the workaround mentioned in this thread, but that did not help.

Workaround: Update your local repo with pip3 install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint

Query: twint -s "#asco2018" -es localhost:9200

File "c:\users\asharma\appdata\local\programs\python\python38\lib\site-packages\elasticsearch\helpers\actions.py", line 188, in _process_bulk_chunk_success
raise BulkIndexError("%i document(s) failed to index." % len(errors), errors)
elasticsearch.helpers.errors.BulkIndexError: ('1 document(s) failed to index.', [{'index': {'_index': 'twinttweets', '_type': 'doc', 'id': '1267486697958359045_raw', 'status': 400, 'error': {'type': 'mapper_parsing_exception', 'reason': "failed to parse field [mentions] of type [keyword] in document with id '1267486697958359045_raw'. Preview of field's value: '{screen_name=Mzaarour09, name=Mazen Zaarour, MD, id=759559674270101505}'", 'caused_by': {'type': 'illegal_state_exception', 'reason': "Can't get text on a START_OBJECT at 1:849"}}, 'data': {'id': '1267486697958359045', 'conversation_id': '1267249922400755713', 'created_at': '2020-06-02 02:02:32 AUS Eastern Standard Time', 'date': '2020-06-02 02:02:32', 'timezone': '+1100', 'place': '', 'tweet': '@ADesaiMD 2nd anniversary here 🎉 I joined during #ASCO2018 because of colleague @Mzaarour09 , crazy that I joined @twitter because of @asco', 'language': 'en', 'hashtags': ['ASCO2018'], 'cashtags': [], 'user_id_str': '1003334939016867842', 'username': 'saramatarmd', 'name': 'Sara Matar', 'day': 1, 'hour': 2, 'link': 'https://twitter.com/saramatarmd/status/1267486697958359045', 'retweet': False, 'essid': '', 'nlikes': 2, 'nreplies': 0, 'nretweets': 0, 'quote_url': '', 'video': 0, 'search': '#asco2018', 'near': None, 'user_rt_id': '', 'user_rt': '', 'retweet_id': '', 'retweet_date': '', 'reply_to': [{'screen_name': 'ADesaiMD', 'name': 'Aakash Desai', 'id': '1134767467740434432'}], 'mentions': [{'screen_name': 'Mzaarour09', 'name': 'Mazen Zaarour, MD', 'id': '759559674270101505'}, {'screen_name': 'Twitter', 'name': 'Twitter', 'id': '783214'}, {'screen_name': 'ASCO', 'name': 'ASCO', 'id': '20187065'}]}}}])
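This second error is different from the earlier [lat] one: the mentions field is mapped as keyword, but each mention arrives as an object with screen_name, name and id. A minimal sketch of one way to reconcile the two, assuming it is acceptable to keep only the screen names, is shown below. The helper flatten_mentions is hypothetical, and this is not a fix confirmed by the maintainers.

```python
def flatten_mentions(doc, field="mentions", key="screen_name"):
    """Return a copy of doc where each mention object is replaced by one string."""
    cleaned = dict(doc)  # shallow copy, the caller's document is left untouched
    if field in cleaned:
        cleaned[field] = [
            item[key] if isinstance(item, dict) else item
            for item in (cleaned[field] or [])
        ]
    return cleaned
```

The alternative would be to change the index mapping so that mentions accepts objects, which requires recreating the index. The sketch above only reshapes the document to fit the existing mapping.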

@tmerien

tmerien commented Nov 22, 2020

Same issue with twint -u "username" -s localhost:9200

@noahh40

noahh40 commented Dec 18, 2020

@tmerien did you figure out a solution? I am also running into this error.
