
[Q]: Is there a program which can assist in searching for specific keywords in open databases of certain Courts? #303

Closed
Jacintha777 opened this issue Mar 4, 2022 · 46 comments
Labels: "can be worked on!" (ready to be worked on, even self-assigned!), "question" (further information is requested)

Comments

@Jacintha777

Jacintha777 commented Mar 4, 2022

Contact Details

j.r.k.asarfi@tilburguniversity.edu

Shoot!

I am looking into the case law of national courts and in this regard I have to search for specific keywords in two databases. These databases are: http://www.ttlawcourts.org/index.php/law-library/search-librarys-holdings and http://rechtspraak.sr/
In the first database, the specific keywords to search for are: referral, referral jurisdiction, Article 214 RTC, Caribbean Court of Justice, CCJ. The specific keywords for the second database are in Dutch, namely: verwijzingsprocedure, Herziene Verdrag van Chaguaramas, Caribisch Hof van Justitie, verwijzing naar het Caribisch Hof van Justitie (in English: referral procedure, Revised Treaty of Chaguaramas, Caribbean Court of Justice, referral to the Caribbean Court of Justice).

The expectation is that searching these databases for these specific keywords will result in cases which are relevant for my research.

Looking forward to your reply.

Code of Conduct

  • I agree to follow this project's Code of Conduct
Jacintha777 added the "question" (further information is requested) label on Mar 4, 2022
@hannesdatta
Contributor

Hi @Jacintha777, thanks a bunch. Could you please provide a bit more detail on how the search is going to be executed?

1) With regard to ttlawcourts:

At http://www.ttlawcourts.org/index.php/law-library/search-librarys-holdings, I do not see a "keyword" field.
[screenshot]

Further, does any filter need to be used on the Documents?
[screenshot]

Further, please specify how the search results should be saved:

  • Do you require a list in Excel with these search results?

[screenshot]

  • Or do you require the scraper to download the resulting PDF document?

[screenshot]

2) With regard to rechtspraak.sr

In formulating this issue, please imagine you are instructing a Research Assistant to strictly follow a particular procedure. Without any "thinking". Just executing a procedure. That way, we can instruct a program to do the same thing. Thanks!

@Jacintha777
Author

Jacintha777 commented Mar 4, 2022

Dear @hannesdatta, with regard to your first point: click on Supreme Court (second on the left under the logo of the Judiciary of Trinidad & Tobago) and then on High Court; there you see the "search the site" option, where you can search for the keywords. I just did this with the word 'referral', and you then find cases in which 'referral' is highlighted. The same procedure can be followed for the "Court of Appeal".

@Jacintha777
Author

With regard to the filter on the documents: 'judgments' is preferred

@Jacintha777
Author

Concerning how the search results should be saved: an Excel file containing the name of the case and a sentence before and after the keyword, to determine whether the keyword is used in the context of, e.g., 'referral to the CCJ'.
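For reference, a minimal sketch of what such a keyword-in-context export could look like in Python (the naive sentence splitter, the column names, and the assumption that each case's text is already available as a string are illustrative choices, not part of the request):

```python
import re
import pandas as pd  # .to_excel() also requires openpyxl

def keyword_in_context(case_name, text, keyword, window=1):
    """Return the keyword's sentence plus `window` sentences before and after it."""
    sentences = re.split(r"(?<=[.!?])\s+", text)  # naive sentence splitter
    rows = []
    for i, sentence in enumerate(sentences):
        if keyword.lower() in sentence.lower():
            context = " ".join(sentences[max(0, i - window): i + window + 1])
            rows.append({"case": case_name, "keyword": keyword, "context": context})
    return rows

# Hypothetical input: a mapping from case name to its full text.
cases = {"Example case A": "The parties met. The court considered a referral to the CCJ. Costs were awarded."}
rows = [row for name, text in cases.items()
        for row in keyword_in_context(name, text, "referral")]
pd.DataFrame(rows).to_excel("search_results.xlsx", index=False)
```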

@Jacintha777
Author

I would also appreciate it if the scraper could download the PDF document, if that is possible of course.

@Jacintha777
Author

With regard to rechtspraak.sr: the search should be conducted at https://rechtspraak.sr/uitspraken-databank/eenvoudig-zoeken/
The link that you are referring to is a more elaborate search which requires specific information, such as the case number, which makes the search complicated as I do not have that information.

@Jacintha777
Author

With regard to the format: a "text" view like this one, https://rechtspraak.sr/sru-hvj-2020-6/, is fine and helpful for determining the context in which the keyword is used. But if it is possible to highlight the keyword in the document, that would be great (if this is possible, of course).

@Jacintha777
Author

@hannesdatta, please let me know if you require further information. Thank you

@hannesdatta
Contributor

@BilgeKasapoglu, is this something you could handle? I'd say develop it for the first site and we can then check how it performs. Please invest about 2-3 hours for now. Maybe set up a meeting with Jacintha to clarify any issues.

I would try Beautiful Soup first, by the way; Selenium may be overkill. Check the tutorials at odcm.hannesdatta.com for code snippets.
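For instance, a first Beautiful Soup sketch against the site's built-in search could look like this (the URL parameters mirror the advanced-search links on the site; the CSS selector is a placeholder that still needs to be confirmed by inspecting the result page):

```python
import requests
from bs4 import BeautifulSoup

url = "https://www.ttlawcourts.org/index.php/advance-search"
params = {"searchword": "referral", "searchphrase": "all", "limit": 20}

response = requests.get(url, params=params, timeout=30)
soup = BeautifulSoup(response.text, "html.parser")

# Placeholder selector: adjust to the actual markup of the search results.
for link in soup.select("dt.result-title a"):
    print(link.get_text(strip=True), link.get("href"))
```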

@BilgeKasapoglu

Dear @hannesdatta,

I keep getting an "SSLCertVerificationError" when I try to request the URL. Do you have any experience with such an error? Thank you

Best
Bilge

@hannesdatta
Contributor

@BilgeKasapoglu, did you try to google this error? This search result seems to be relevant. Let me know please.

https://stackoverflow.com/questions/10667960/python-requests-throwing-sslerror
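For example, one blunt workaround discussed in threads like that one is to skip certificate verification for this single host; a safer option is to point `verify` at a proper CA bundle, so treat the snippet below as a sketch only:

```python
import requests
import urllib3

# Skip certificate verification for this host only and silence the warning that
# requests emits when doing so. Prefer verify="/path/to/ca-bundle.pem" if available.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

response = requests.get("https://www.ttlawcourts.org/", verify=False, timeout=30)
print(response.status_code)
```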

@Jacintha777
Author

Jacintha777 commented Mar 14, 2022

@hannesdatta and @BilgeKasapoglu, thanks - I am following your updates with interest. I am available to meet on 15 and 16 March, so let me know.

hannesdatta added the "can be worked on!" (ready to be worked on, even self-assigned!) label on Mar 14, 2022
@hannesdatta
Contributor

@BilgeKasapoglu, let us know whether any input is required for working on this.

@hannesdatta
Contributor

@BilgeKasapoglu, please also inform Jacintha about the expected delivery date (plus allow some time for me to review the final product).

@BilgeKasapoglu

Dear @hannesdatta and @Jacintha777

I think I can work on this on Thursday, if that is okay with you. I can send it to you by Friday noon, @hannesdatta. Thank you

Best
Bilge

@Jacintha777
Author

Dear @BilgeKasapoglu, that sounds great. I look forward to the results after @hannesdatta has reviewed the final product.
Kind regards,
Jacintha

@Jacintha777
Author

Jacintha777 commented Mar 15, 2022

@hannesdatta and @BilgeKasapoglu, my apologies, I closed this issue by mistake. What I also wanted to mention: this is for the website of Trinidad and Tobago, and I am really pleased to hear from both of you that it can be worked on. I hope you are also successful with the website of Suriname (rechtspraak.sr), which is quite a challenge. Thanks. Kind regards,
Jacintha

@BilgeKasapoglu

Dear @Jacintha777,

Could you guide me on how to search for the keywords on the second website? Is it through "Zoeken"? Thank you

Best
Bilge

@BilgeKasapoglu

Dear @hannesdatta,

I scraped the first website. Usually, there are fewer than 20 results. However, "Caribbean Court of Justice" gives 50 results, and the scraper only gets the first 20. I have come across this problem before, where the results are spread across separate pages, but I never understood how to solve it. Would you please help me?

Also, how should I share the code and files with you? For now, I will send them through Microsoft Teams. Thank you

Best
Bilge

@Jacintha777
Author

Jacintha777 commented Mar 16, 2022 via email

@hannesdatta
Contributor

hannesdatta commented Mar 17, 2022

@BilgeKasapoglu - I checked your Teams message. Thanks a bunch for your work!

Please keep the communication / results etc. on GitHub (all project-related communication needs to be here, not somewhere else).

The main goal here is that @Jacintha777 can run the notebook herself. At Tilburg Science Hub, we don't do the job for our colleagues; we give them the tools so they can do it themselves. Plus, we share those tools online.

Accordingly, please:

  • annotate the jupyter notebook with markdown cells (e.g., setup, scraper for this, scraper for that)
  • annotate the jupyter notebook with comments (e.g., this is what happens here, this is how I do this, this is where stuff gets saved)
  • test the notebook on colab.google.com, because that's probably the most ideal place for @Jacintha777 to run her work ultimately (saves a lot of setup costs)
  • use functions as much as possible, see https://tilburgsciencehub.com/write/good-code (a minimal sketch follows right after this list)
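As a rough illustration of that last point (the function name, its parameters, and the selector are hypothetical), the notebook could expose a single function that only needs a different keyword per run:

```python
import requests
from bs4 import BeautifulSoup

SEARCH_URL = "https://www.ttlawcourts.org/index.php/advance-search"

def run_query(keyword, limit=20):
    """Run one search and return the result titles; only `keyword` needs changing."""
    params = {"searchword": keyword, "searchphrase": "all", "limit": limit}
    page = requests.get(SEARCH_URL, params=params, timeout=30)
    soup = BeautifulSoup(page.text, "html.parser")
    # Placeholder selector: adjust to the site's actual result markup.
    return [a.get_text(strip=True) for a in soup.select("dt.result-title a")]

for kw in ["referral", "referral jurisdiction", "Caribbean Court of Justice", "CCJ"]:
    print(kw, "->", len(run_query(kw)), "results")
```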

Let us NOT ship any Excel files - Jacintha will have to edit the search queries herself.

Please post your updated notebook here for another round of feedback. Alternatively, post the notebook on gist.github.com.

Old version:
scrapeCourts.ipynb.zip

@Jacintha777
Author

Dear @hannesdatta, thanks for this update. Please don't forget that I have zero knowledge of how to use a scraper. Therefore, guidance from @BilgeKasapoglu might be necessary. Kind regards, Jacintha

@hannesdatta
Contributor

@Jacintha777, no worries. Google Colab has a point-and-click interface, and @BilgeKasapoglu can walk you through how to use it (it's really just a matter of opening it, changing the search query, and waiting for the results to be downloaded). If you get that to run, it's way more useful for you.

@Jacintha777
Author

@hannesdatta, thanks and looking forward to the session with @BilgeKasapoglu. Kind regards, Jacintha

@BilgeKasapoglu

scrapeCourts.ipynb.zip

@Jacintha777 and @hannesdatta,

Here is the most up-to-date version of the notebook. I also created a version on Google Colab and added both of you; I hope you can see it there now. I must say the second website is difficult to work with, because it does not search for the keywords as a whole: if I search for "van huizen", it gives all the results containing "van" and "huizen", even separately. Below, I am adding some additional information for @Jacintha777 on how to get the class names on each website. Thank you
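One possible mitigation is to post-filter the site's overly broad matches and keep only results that contain the exact phrase. A minimal sketch, assuming we already have each result's title and text as strings (the two example results below are made up):

```python
def contains_phrase(text, phrase):
    """True only if the whole phrase occurs, not just its individual words."""
    return phrase.lower() in text.lower()

# Made-up examples of what the rechtspraak.sr search might return for "van huizen".
results = [
    {"title": "example case A", "text": "... de verkoop van twee huizen ..."},
    {"title": "example case B", "text": "... de verkoop van huizen in Paramaribo ..."},
]

exact_matches = [r for r in results if contains_phrase(r["text"], "van huizen")]
print([r["title"] for r in exact_matches])  # only "example case B" survives
```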

Best,
Bilge

@BilgeKasapoglu

Dear @Jacintha777 and @hannesdatta,

Below you can find additional information on scraping a website. It is about how to get the class name of the objects that we want to scrape. Let me know if anything is unclear. Thank you

Best
Bilge
scrapingAdditionalInfo.pdf

@Jacintha777
Author

@BilgeKasapoglu and @hannesdatta, thank you very much.
Today I will not be able to try out the scraping tool because of various meetings, so I will try it out over the weekend. I will let you know on Monday how it went and whether I need a session with @BilgeKasapoglu to guide me via a Teams meeting.

@hannesdatta @BilgeKasapoglu, I expected the complications with the second website, because I tried searching rechtspraak.sr myself by entering the keywords separately in "eenvoudig zoeken". So I recognize what @BilgeKasapoglu found with the words 'van' and 'huizen'. I also did a similar search on the website of Trinidad & Tobago, and there some of the keywords delivered results. So I am really looking forward to the results of the scraping tool.

Thanks and will update both of you on Monday. Have a good weekend. Kind regards, Jacintha

@Jacintha777
Author

Dear @BilgeKasapoglu, I opened the link with Notepad and from there I had no idea how to proceed. I also have another question: how do I get access to Google Colab? It would be very helpful if you could guide me through it. In this regard, I would appreciate a session via Zoom or Teams. Tomorrow I will be at the university, and I am not sure whether you also work from there. I am available for a session via Zoom or Teams on Wednesday 23, Thursday 24, or Friday 25 March. Please let me know which date and time is convenient for you. Thank you, Jacintha

@BilgeKasapoglu

Dear @Jacintha777,

Tomorrow I have a meeting with my supervisors at 3 PM, and I am trying to give them an end result for that meeting. Would it be possible for you to hold the meeting after 16:30? I will be at the university the whole day. Thank you

Best
Bilge

@Jacintha777
Author

Dear @BilgeKasapoglu, see you tomorrow after 16.30. My office is in the M-building room M312. Good luck with your meeting!

@hannesdatta
Contributor

Dear @BilgeKasapoglu, please move the scraper code to a repository where we can actually collaborate on the files. See https://github.com/tilburgsciencehub/onboarding/wiki/Workflow. Any feedback required from me at this stage?

@BilgeKasapoglu

Dear @hannesdatta,

Here is the repo: https://github.com/tilburgsciencehub/courtScraping

Bilge

@hannesdatta
Contributor

hannesdatta commented Mar 25, 2022

@BilgeKasapoglu:

  • implement looping through the search results using iterators and limits (a minimal sketch follows below): https://www.ttlawcourts.org/index.php/advance-search?searchword=court%20of%20appeal&searchphrase=all&limit=50&start=55
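A minimal pagination sketch along those lines (the CSS selector is a placeholder, and stopping at the first empty page is an assumption):

```python
import requests
from bs4 import BeautifulSoup

BASE = "https://www.ttlawcourts.org/index.php/advance-search"

def search_all_pages(keyword, page_size=50, max_pages=40):
    """Walk through the paginated results using the limit/start URL parameters."""
    results = []
    for page in range(max_pages):
        params = {
            "searchword": keyword,
            "searchphrase": "all",
            "limit": page_size,
            "start": page * page_size,
        }
        response = requests.get(BASE, params=params, timeout=30)
        soup = BeautifulSoup(response.text, "html.parser")
        hits = soup.select("dt.result-title a")  # placeholder selector
        if not hits:                             # empty page: assume we are done
            break
        results.extend({"title": a.get_text(strip=True), "url": a.get("href")}
                       for a in hits)
    return results

print(len(search_all_pages("court of appeal")))
```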

@Jacintha777 - please specify a search on the Surinamese site that produces "valid" results.

E.g., we are trying with:
[screenshot]

But the search results are quite meaningless.

Can you use these results at all?

[screenshot]

The site just produces results with "van"... not really intended, right?

@Jacintha777 please advise how to go ahead here.

@BilgeKasapoglu

Dear @hannesdatta ,

@BilgeKasapoglu is my username. I guess you have been @'ing someone else.

Best
Bilge

tilburgsciencehub deleted a comment from Bilge on Mar 25, 2022
@hannesdatta
Contributor

noted ;)

@Jacintha777
Author

@hannesdatta and @BilgeKasapoglu, the results with 'van' are indeed useless. I think the rechtspraak.sr website will not deliver any results. I just tried 'herziene verdrag Chaguaramas' and I get results that are not relevant for my research, and no results for 'Chaguaramas'. So at least I can say that I tried scraping this website, but without results. Many thanks for trying.

@BilgeKasapoglu, with regard to Trinidad and Tobago: do you get the same results as for the High Court when searching the Court of Appeal?

@BilgeKasapoglu

Dear @hannesdatta and @Jacintha777,

I modified the scraper and delivered the outcomes to Jacintha last week. We concluded that the second website was not really workable. For the first website, I created Python code to scrape it. I shared the file with you in a repository and on Google Colab.

Best
Bilge

@Jacintha777
Author

Jacintha777 commented Apr 1, 2022 via email

@Jacintha777
Author

Dear @hannesdatta and @BilgeKasapoglu,

Thank you once again.

@hannesdatta, I know you are quite busy, but I would still like to enquire about the following: how can the Python code find one case and not another? When searching the website of Trinidad and Tobago, the keyword 'referral' delivered the Jhamilly Hadeed case of 2019. However, there is another case from 2021 which should also have been detected when searching for the keyword 'referral'. Do you have an idea or a possible explanation for why one case is detected and the other is not?

Thank you and looking forward to your reply.

Kind regards,
Jacintha

@hannesdatta
Contributor

@BilgeKasapoglu, can you comment/have an idea?

@BilgeKasapoglu

Dear @Jacintha777,

Would you share the exact names of the cases? Also, is the search under the High Court or the Court of Appeal?

Thank you
Bilge

BilgeKasapoglu reopened this on Apr 1, 2022
@BilgeKasapoglu

@Jacintha777
Also, which case should have shown up in 2021? Thank you

@Jacintha777
Author

Dear @BilgeKasapoglu and @hannesdatta ,

Thank you. If I understand correctly, if the word 'referral' is not mentioned in the title or summary of the case on the website, then the scraper will not be able to find it. If that is the case, then this answers my question. The conclusion I draw from this is that the scraper cannot help me detect cases which mention 'referral' only somewhere in the full text, rather than in the title or case summary. I had hoped it could, which would have made things easier for me. Can this be fixed so that it searches everywhere on the website and not only the title and summary?

Thank you once again.

Kind regards,
Jacintha

@BilgeKasapoglu

Dear @Jacintha777,

I do not know how to fix it; I think the scraper cannot help with this. Maybe @hannesdatta knows more about this, but it is probably a limitation of the website's design. Thank you

Best
Bilge

@hannesdatta
Contributor

Hi all,
@Jacintha777, a web scraper captures text that is visible on a website. As the full case text is not visible on the pages the scraper sees, we can't capture it. Our idea was to use the site's search function (right, @BilgeKasapoglu?) to retrieve the cases for you, but the search functionality seemed very limited.

My approach would be to use broader search words (which you could change in the Jupyter notebook), download all cases, and then use the PDF/text tool we developed for you earlier to search through this data.
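A rough sketch of that second step, searching already-downloaded case files for the keywords (it assumes the PDFs sit in a local folder and that pdfminer.six is installed; it is not the exact tool mentioned above):

```python
from pathlib import Path
from pdfminer.high_level import extract_text  # pip install pdfminer.six

KEYWORDS = ["referral", "referral jurisdiction", "Caribbean Court of Justice", "CCJ"]

def scan_pdfs(folder="downloaded_cases"):
    """Report which downloaded case PDFs mention which keywords."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        text = extract_text(str(pdf)).lower()
        found = [kw for kw in KEYWORDS if kw.lower() in text]
        if found:
            print(pdf.name, "->", ", ".join(found))

scan_pdfs()
```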

Note that this is a massive undertaking that we can't facilitate at this stage.

What I would suggest is that you try to develop the code further from here.

If you want to learn coding in Python, you can also enroll in https://odcm.hannesdatta.com (starting in September), where you will learn Python and scraping.
