yashrajdesai/GSoC-2021-Report

Google Summer of Code 2021 with CDLI

About the project

The project focuses on enhancing the search and advanced search features of the CDLI framework, and on adding new search features to it, using Elasticsearch and CakePHP.

Proposal: Discovery search and advanced search features
Contributions to CDLI: Merge Requests
Weekly Blogs: Blog

Mentor: Vedant Wakalkar

Objectives and Deliverables

| # | Objectives | Associated Deliverables | Issue(s) | Pull Requests | Status |
|---|---|---|---|---|---|
| 1 | Add “Ids” and “Keywords” search fields to both simple and advanced search | Users will be able to search artifacts by specific keywords and by IDs/numbers | #314 | !317, !307 | ✔️ |
| 2 | Implementation of fuzzy queries | Fuzzy queries will yield search results in all search fields | #593 | !317 | ✔️ |
| 3 | Port requests to Elasticsearch from cURL to HttpClient | The cURL implementation is replaced with the HTTP Client | #350 | !338 | ✔️ |
| 4 | Highlight transliteration sign values in ATF display | The sign values will be highlighted in the full and compact search results pages | #347 | !354 | ✔️ |
| 5 | Enable inscription search with sign value permutation | When a user enables this feature and searches for sign values, all sign values whose sign names match those of the query will be returned | #596 | !375 | ✔️ |
| 6 | Search settings integration | Users will be able to save a specific configuration of search settings, and search results will be displayed accordingly | #540 | !332 | ✔️ |
| 7 | Input flexibility enhancements | Users will have the flexibility to search with both UTF-8 and ASCII characters | #597 | !375 | ✔️ |
| 8 | Filter search results by RTI Image, Transliterations, 3D Data | Users can apply filters such as RTI Image, Transliterations, and 3D Data to the search results | #136 | !369 | ✔️ |

Preview of objectives

1. Keywords search

  • Final outcome:
    The Keywords search field internally queries all fields of the database and returns the matching results.
  • Methodology:
    • Used the Elasticsearch query DSL in the backend.
Search results for "Vorderasiatisches Museum" in the Keywords field.
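
To illustrate what a query DSL body for an all-fields search can look like, here is a minimal sketch in Python. The report does not say which query type the project uses, so the choice of `multi_match`, the helper name `build_keyword_query`, and the `size` default are assumptions made for illustration only.

```python
import json


def build_keyword_query(keywords: str, size: int = 10) -> dict:
    """Build a search request body that matches `keywords` against every field.

    Hypothetical helper: the query type and options are assumptions,
    not the project's actual implementation.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": keywords,
                "fields": ["*"],   # "*" asks Elasticsearch to consider all eligible fields
                "type": "phrase",  # keep multi-word input such as a museum name together
                "lenient": True,   # ignore type mismatches on numeric or date fields
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the index's _search endpoint.
    print(json.dumps(build_keyword_query("Vorderasiatisches Museum"), indent=2))
```

The resulting dictionary would be serialised to JSON and posted to the search endpoint of whichever index holds the artifacts.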

2. Fuzzy ID search

  • Final outcome:
    ID search yields results even when the input query is not in the exact stored format.
  • Methodology:
    • Processed the input query with regex operations before performing the search.
The loosely formatted input "A1169" still returns the artifact with museum ID "OIM A01169" in ID search.
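
One way such regex preprocessing could work is sketched below in Python. This is not the project's actual code: the function name `id_query_to_pattern` and the specific leniency rules (optional leading zeros, optional separators, an optional collection prefix) are assumptions chosen so that the example from this report behaves as described.

```python
import re


def id_query_to_pattern(raw: str) -> re.Pattern:
    """Turn a loosely typed ID into a lenient regex (illustrative only)."""
    # Split the input into runs of letters and runs of digits,
    # e.g. "A1169" -> ["A", "1169"]; spaces and punctuation are dropped.
    runs = re.findall(r"[A-Za-z]+|[0-9]+", raw)
    if not runs:
        raise ValueError("query contains no letters or digits")

    parts = []
    for run in runs:
        if run.isdigit():
            # Ignore leading zeros typed by the user and allow any
            # number of leading zeros in the stored ID.
            parts.append("0*" + (run.lstrip("0") or "0"))
        else:
            parts.append(re.escape(run))

    # Allow optional separators between runs, and either the start of the
    # ID or a whitespace-terminated prefix (such as "OIM ") before the match.
    body = r"[\s./_-]*".join(parts)
    return re.compile(r"(?:^|.*\s)" + body + r"$", re.IGNORECASE)


if __name__ == "__main__":
    pattern = id_query_to_pattern("A1169")
    print(bool(pattern.match("OIM A01169")))
    print(bool(pattern.match("OIM A21169")))
```

A pattern of this kind could be applied in application code or translated into an Elasticsearch `regexp` query, depending on where the matching is meant to happen.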

3. Highlight inscriptions

  • Final outcome:
    The inscription query is highlighted wherever it occurs in the search results.
  • Methodology:
    • Processed the inscription text of each search result with a regex so that occurrences of the input query are highlighted.
The inscription input "muk" highlighted in search results.
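
A minimal sketch of regex-based highlighting, written in Python for illustration. The function name `highlight` and the use of a `<mark>` tag are assumptions; the report does not state which markup the project emits.

```python
import html
import re


def highlight(text: str, query: str, tag: str = "mark") -> str:
    """Wrap every occurrence of `query` in `text` with an HTML tag."""
    if not query:
        return html.escape(text)

    # Escape the query so characters such as "(" or "." match literally.
    pattern = re.compile(re.escape(query), re.IGNORECASE)

    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches, then wrap the match.
        pieces.append(html.escape(text[last:match.start()]))
        pieces.append(f"<{tag}>{html.escape(match.group(0))}</{tag}>")
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


if __name__ == "__main__":
    print(highlight("a muk b", "muk"))
```

Escaping the surrounding text before inserting tags keeps angle brackets in the inscription from being interpreted as markup.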

4. Sign Value permutation

  • Final outcome:
    Results are returned for all possible sign readings of the input sign values.
  • Methodology:
    • Added a sign names field to the database and populated it.
    • Containerised the jtf-lib library in Docker so that the framework can send it requests and receive responses.
    • Converted the input query to sign names using jtf-lib; these sign names are used together with the sign values to perform the search.
All possible sign readings of the input "muk" can be searched with the sign name "MUG".
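
The steps above can be sketched roughly as follows, in Python. Everything here is hypothetical: a small dictionary stands in for the jtf-lib service (whose request format is not described in this report), and the field names `transliteration` and `sign_names` are placeholders rather than the project's real index fields.

```python
from __future__ import annotations

# Stand-in for the jtf-lib conversion service. Only the pair named in
# this report is listed; a real lookup would come from the library.
SIGN_NAME_LOOKUP = {"muk": "MUG"}


def build_inscription_query(sign_values: list[str], permutation: bool) -> dict:
    """Build a query that searches sign values and, optionally, sign names."""
    # Always search the transliteration for the sign values as typed.
    clauses = [{"match_phrase": {"transliteration": " ".join(sign_values)}}]

    if permutation:
        names = [SIGN_NAME_LOOKUP.get(value) for value in sign_values]
        # Only add the sign-name clause when every value was resolved.
        if all(names):
            clauses.append({"match_phrase": {"sign_names": " ".join(names)}})

    # A document matches if either clause matches.
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


if __name__ == "__main__":
    print(build_inscription_query(["muk"], permutation=True))
```

Because every reading of a sign shares the same sign name, matching on the name field returns inscriptions that use any of those readings.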

5. Search Settings

  • Final outcome:
    Search settings are saved in the session, and search results are displayed accordingly.
  • Methodology:
    • Used CakePHP sessions to store the search settings and applied them to the search results.
"Museum collections" and "Period" removed from the search results by modifying the search settings.
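
The idea of storing settings in a session and applying them to results can be sketched language-neutrally. The Python below uses a plain dictionary in place of a CakePHP session, and the key `SearchSettings`, the `hidden_fields` setting, and both function names are assumptions for illustration, not the project's actual schema.

```python
from __future__ import annotations


def save_settings(session: dict, hidden_fields: list[str]) -> None:
    """Store the user's choice of hidden result fields in the session."""
    session["SearchSettings"] = {"hidden_fields": list(hidden_fields)}


def apply_settings(session: dict, results: list[dict]) -> list[dict]:
    """Return the results with every hidden field removed from each row."""
    settings = session.get("SearchSettings", {"hidden_fields": []})
    hidden = set(settings["hidden_fields"])
    return [
        {field: value for field, value in row.items() if field not in hidden}
        for row in results
    ]


if __name__ == "__main__":
    session = {}  # stands in for the per-user session store
    save_settings(session, ["Museum collections", "Period"])
    # Placeholder row; the values are not real catalogue data.
    rows = [{"Designation": "example", "Museum collections": "example", "Period": "example"}]
    print(apply_settings(session, rows))
```

Keeping the settings in the session means they persist across searches for the same visitor without requiring an account.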

6. Input flexibility enhancements

  • Final outcome:
    Users can search with both UTF-8 and ASCII characters.
  • Methodology:
    • Used a UTF-8 to ASCII mapping to convert the input into ASCII before performing the search.
The UTF-8 input "diš2" yields search results for the ASCII form "disz2".
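
A mapping of this kind can be sketched in a few lines of Python. Only the "š" to "sz" pair is confirmed by the example in this report; the other entries follow common ATF ASCII conventions and, like the function name `to_ascii`, are assumptions rather than the project's actual table.

```python
import unicodedata

# Partial, illustrative mapping. "š" -> "sz" comes from this report;
# the remaining pairs are assumed from common ATF conventions.
UTF8_TO_ASCII = {
    "š": "sz", "Š": "SZ",
    "ṣ": "s,", "Ṣ": "S,",
    "ṭ": "t,", "Ṭ": "T,",
    "ḫ": "h", "Ḫ": "H",
}

# Subscript index digits become ordinary digits.
SUBSCRIPT_DIGITS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")


def to_ascii(query: str) -> str:
    """Convert a UTF-8 transliteration query into its ASCII form."""
    # Compose characters first so "s" + combining caron is seen as "š".
    composed = unicodedata.normalize("NFC", query)
    composed = composed.translate(SUBSCRIPT_DIGITS)
    return "".join(UTF8_TO_ASCII.get(char, char) for char in composed)


if __name__ == "__main__":
    print(to_ascii("diš2"))
```

Input that is already ASCII passes through unchanged, so the same search path can serve both kinds of query.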

7. Images and Transliteration Filter

  • Final outcome:
    Search results can be filtered by images and transliterations, subject to the user's access rights.
  • Methodology:
    • Created a new index for the "Images" table.
    • Added Elasticsearch queries that filter results according to the user's access.
Search results filtered to image type "photo".
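
An access-aware filter can be expressed as a `bool` query with `filter` clauses. The Python sketch below is only an illustration: the field names `image_type` and `is_public`, and the function `build_image_filter`, are hypothetical and not taken from the project's index mapping.

```python
def build_image_filter(image_type: str, user_can_view_private: bool) -> dict:
    """Build a query that keeps only results with the given image type.

    Users without access to private material are additionally limited
    to public records. Field names are placeholders.
    """
    filters = [{"term": {"image_type": image_type}}]
    if not user_can_view_private:
        filters.append({"term": {"is_public": True}})
    # Filter clauses narrow the result set without affecting relevance scores.
    return {"query": {"bool": {"filter": filters}}}


if __name__ == "__main__":
    print(build_image_filter("photo", user_can_view_private=False))
```

Putting the access check in the query itself means restricted records never reach the results page for users who may not see them.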

To Do (Post GSoC)

  • Robust testing of all newly added features.
  • Documentation for users and developers.

Acknowledgements

Participating in GSoC was a great learning experience for me. I faced a lot of challenges on this journey, which helped me learn new skills. I would like to thank my mentors, Vedant Wakalkar and Émilie, for guiding and helping me throughout my GSoC journey. I shall always be indebted to such a welcoming organisation for helping me enhance my coding skills.
