Logic of the scripts #2
Hi Alberto,
I've just got back from work and seen your email. The script order looks good.
However, omit step 3 (00_collect_private_public_index.py). As noted in the script, it is only needed when you want
to collect all the data in one go, which is not possible without a special API key.
00_collect_steamid.py uses two string variables, vacbanned_last20 and
vaclist_last20. Both hold the URLs of websites that list the last 20 Steam IDs
entered on those sites for a VAC ban check, so the script
works only with those websites.
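However, at those websites' request, I cannot share which ones they are. To fix the script:
1. Search Google for websites that check VAC ban status.
2. On each site, locate the page that links to the last 20 (or 10) checked IDs.
3. Change the script to use those links. This should be easy with requests and some HTML parsing.
Setting vacbanned_last20 and vaclist_last20 to those links also fixes your NameError, which is raised because the variables are not defined in the script.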
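A minimal sketch of those three steps follows, assuming the chosen site shows recently checked profiles as links or text containing 64-bit Steam IDs (17-digit numbers starting with 7656119). The two URLs are placeholders, not the real sites, and fetch_html, extract_steamids, load_known, collect and the 60-second polling interval are hypothetical names and values. It uses the standard-library urllib.request; requests.get(url, timeout=10).text would work the same way.

```python
from __future__ import annotations

import re
import time
import urllib.request

# Placeholder URLs (hypothetical): replace them with the "recently checked"
# pages of the VAC-check websites you find.
vacbanned_last20 = "https://example.com/recent-checks"
vaclist_last20 = "https://example.org/last20"

# 64-bit Steam IDs are 17-digit numbers that start with 7656119.
STEAMID64_RE = re.compile(r"\b7656119\d{10}\b")


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML as text."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_steamids(html: str) -> list[str]:
    """Return the unique Steam IDs found in the page, in order of appearance."""
    return list(dict.fromkeys(STEAMID64_RE.findall(html)))


def load_known(path: str) -> set[str]:
    """Read the IDs already stored in steamids.txt (empty set if the file is missing)."""
    try:
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        return set()


def collect(path: str = "steamids.txt", interval: float = 60.0,
            max_rounds: int | None = None) -> None:
    """Poll both pages and append unseen IDs to `path` until stopped with Ctrl+C."""
    known = load_known(path)
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for url in (vacbanned_last20, vaclist_last20):
            try:
                html = fetch_html(url)
            except OSError as exc:  # network/HTTP errors: skip this page this round
                print(f"skipping {url}: {exc}")
                continue
            new_ids = [sid for sid in extract_steamids(html) if sid not in known]
            with open(path, "a") as f:
                f.writelines(sid + "\n" for sid in new_ids)
            known.update(new_ids)
        rounds += 1
        time.sleep(interval)
```

Calling collect() keeps appending new IDs to steamids.txt until you interrupt it, which matches how 00_collect_steamid.py is described as running until stopped.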
Secondly, you will need to experiment with the data-collection scripts, since
the Steam API allows only 100k requests, which is enough for only 5k-7k
Steam IDs at a time.
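Those figures imply each Steam ID costs roughly 100,000 / 7,000 ≈ 14 to 100,000 / 5,000 = 20 API calls. A minimal sketch of splitting the ID list into batches that each fit in one 100k allowance follows; the function names are hypothetical, and calls_per_id is an assumption you would measure from your own collection script.

```python
from __future__ import annotations


def ids_per_budget(budget: int = 100_000, calls_per_id: int = 15) -> int:
    """How many Steam IDs one request budget covers (rounded down)."""
    if calls_per_id <= 0:
        raise ValueError("calls_per_id must be positive")
    return budget // calls_per_id


def plan_batches(steamids: list[str], budget: int = 100_000,
                 calls_per_id: int = 15) -> list[list[str]]:
    """Split the IDs into consecutive batches whose estimated cost fits one budget."""
    size = ids_per_budget(budget, calls_per_id)
    if size == 0:
        raise ValueError("the budget does not cover even one ID")
    return [steamids[i:i + size] for i in range(0, len(steamids), size)]


# 14 to 20 calls per ID gives the 5k-7k IDs per 100k requests mentioned above.
for cost in (14, 20):
    print(cost, ids_per_budget(calls_per_id=cost))  # prints "14 7142", then "20 5000"
```

Each batch would then be passed to the data-collection script in a separate run.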
If you have any questions, please let me know.
Best,
Vinh Hang
On Tue, Jan 5, 2021 at 8:55 AM Alberto Cereser wrote:
Hello again, could you check if I am running the scripts in the right
order? As far as I understood, the recipe is the following:
1. init.py --> initialise the steamids.txt file, which lists a few IDs
of Steam users.
2. Get more data using 00_collect_steamid.py. The script keeps running
until stopped.
3. 00_collect_private_public_index.py --> collect the status of the
IDs (they can be public or private).
4. 00_collect_data.py --> collect statistics for the listed IDs which
are also public.
5. 01_clean.py --> Clean collected data, combining datasets into a
single one.
6. 02_eda.ipynb --> Do some data analysis.
7. 03_resample.py --> Resample to account for the class imbalance in our dataset.
Considered approaches: under-sampling and over-sampling.
8. 04_models.py --> Run and optimise ML methods for cheater detection.
Am I missing something? Also, when I launch 00_collect_steamid.py I get a
NameError (name 'vacbanned_last20' is not defined). Do you know how to
fix it? Thanks heaps!
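A minimal sketch of running the steps in that order, with step 3 skipped as Vinh advises, follows. It assumes the scripts sit in the current directory; because 00_collect_steamid.py runs until stopped and 02_eda.ipynb is a notebook, the runner stops at those steps so you can handle them by hand. PIPELINE and run_pipeline are hypothetical names, not part of the repository.

```python
from __future__ import annotations

import subprocess
import sys

# Steps in the order listed above, each with how the runner treats it.
PIPELINE = [
    ("init.py", "run"),
    ("00_collect_steamid.py", "manual"),            # keeps running until you stop it
    ("00_collect_private_public_index.py", "skip"),  # needs a special API key
    ("00_collect_data.py", "run"),
    ("01_clean.py", "run"),
    ("02_eda.ipynb", "manual"),                     # notebook: open and run it yourself
    ("03_resample.py", "run"),
    ("04_models.py", "run"),
]


def run_pipeline(start: str = "init.py", dry_run: bool = False) -> list[str]:
    """Run steps from `start` until the next manual step; return the scripts run."""
    names = [name for name, _ in PIPELINE]
    executed: list[str] = []
    for name, mode in PIPELINE[names.index(start):]:
        if mode == "skip":
            print(f"skipping {name}")
            continue
        if mode == "manual":
            print(f"run {name} yourself, then resume from the step after it")
            break
        if not dry_run:
            # check=True raises CalledProcessError and halts if a script fails.
            subprocess.run([sys.executable, name], check=True)
        executed.append(name)
    return executed
```

For example, run_pipeline() runs init.py and stops; after collecting IDs, run_pipeline("00_collect_data.py") runs collection and cleaning; after the EDA notebook, run_pipeline("03_resample.py") finishes the pipeline.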
In reply to Vinh Hang's message of Tue, Jan 5, 2021 at 11:56 PM:

Hi Vinh,
Thank you for your reply and for the clarification! One last question: how
should I define `sampling_df, sampling_score = sampling_dict.copy(),
result_dict.copy()` in 04_models.py?
Best,
Alberto
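This question is not answered in the thread. One hypothetical reading, sketched below, is that sampling_dict maps each resampling strategy from 03_resample.py to its resampled training set, and result_dict maps the same keys to scores filled in after model fitting. The key names, the row format, the toy data, and the random under- and over-sampling helpers are all assumptions for illustration, not the repository's code.

```python
from __future__ import annotations

import random

# Hypothetical row format: (features, label), with label 1 marking a VAC-banned account.
Row = tuple[list[float], int]


def _group_by_label(rows: list[Row]) -> dict[int, list[Row]]:
    groups: dict[int, list[Row]] = {}
    for row in rows:
        groups.setdefault(row[1], []).append(row)
    return groups


def random_undersample(rows: list[Row], seed: int = 0) -> list[Row]:
    """Keep a random subset of each class, as large as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows)
    n = min(len(g) for g in groups.values())
    out = [row for g in groups.values() for row in rng.sample(g, n)]
    rng.shuffle(out)
    return out


def random_oversample(rows: list[Row], seed: int = 0) -> list[Row]:
    """Duplicate random rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows)
    n = max(len(g) for g in groups.values())
    out: list[Row] = []
    for g in groups.values():
        out.extend(g)
        out.extend(rng.choices(g, k=n - len(g)))
    rng.shuffle(out)
    return out


# Toy imbalanced training set: 4 positives, 16 negatives.
train: list[Row] = [([float(i)], 1 if i % 5 == 0 else 0) for i in range(20)]

# One entry per resampling strategy; scores start empty and are filled after fitting.
sampling_dict = {
    "original": train,
    "under": random_undersample(train),
    "over": random_oversample(train),
}
result_dict = {name: None for name in sampling_dict}

# The line from 04_models.py: shallow copies, so assigning a new score in
# sampling_score leaves result_dict untouched (the row lists themselves are shared).
sampling_df, sampling_score = sampling_dict.copy(), result_dict.copy()
```

With pandas DataFrames, imbalanced-learn's RandomUnderSampler and RandomOverSampler (via fit_resample) could produce the resampled sets in place of the toy helpers.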