
An attention based approach to convert Indian Sign Language to Text using simulated hand gesture data

thebrownkidd/ISL-to-text

Abstract: Data gathering is a major challenge when training artificial intelligence models for new tasks such as sign language interpretation. Very little data is available for Indian Sign Language, which makes building an interpreter for it difficult. This technique aims to reduce that challenge by simulating new data from existing examples and a random sample of hand recordings. A random sample with small variance in gestures serves as a reference; the changes it captures are applied to the gathered data to synthesize new training examples that combine the gathered gestures with the movement complexity of the random sample.

The development of artificial intelligence is largely dependent on data collection and the usability of the collected data. Most models are trained on publicly available datasets that are easy to access. Alternatively, models are trained on data that is either publicly available and can be converted into a dataset, or data gathered by other methods specifically for the model. The latter is usually a very time-, money-, and resource-intensive task. One such case arises when acquiring data for Indian Sign Language (ISL) to text conversion: the limited resources available on the internet include only one example for every phrase or alphabet in the vocabulary. This paper tries to reduce the challenge faced when gathering such data. The approach translates the patterns followed by hand landmarks during subtle movements onto gathered data samples, creating new samples in which those patterns are reproduced. This is a more space-friendly, low-complexity alternative to fine-tuning pretrained models on the few available examples.

Feeding an image of an open hand into a landmark detector outputs a set of 21 landmarks labelled Pi, where i ∈ [0, 20]. We may consider P0 to be the position of the hand; that is, in a two-dimensional coordinate system, P0 gives the coordinates (x0, y0) describing where the hand is, with the bottom-left corner of the image taken as (0, 0). We may move the hand in any direction (without affecting its rotation) by calculating the change in its P0 coordinates, ΔP0, and adding that difference to all the remaining points. Using this point of reference, we enable ourselves to exploit other patterns that occur when hands move.
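A minimal sketch of the ΔP0 translation described above, assuming the landmarks come from a 21-point hand detector such as MediaPipe Hands and are stored as a NumPy array (the README does not fix a detector or an API, so these names are illustrative):

```python
import numpy as np

def translate_hand(landmarks: np.ndarray, new_p0: np.ndarray) -> np.ndarray:
    """Move a hand to a new position without changing its gesture or rotation.

    landmarks: (21, 2) array of points P0..P20, with (0, 0) at the
               bottom-left corner of the image.
    new_p0:    (2,) array giving the desired position of P0.
    """
    delta_p0 = new_p0 - landmarks[0]   # change in the reference point P0
    return landmarks + delta_p0        # add the same offset to every Pi

# Example: shift a hand so its wrist (P0) lands at (0.5, 0.5).
hand = np.random.rand(21, 2)           # stand-in for detector output
moved = translate_hand(hand, np.array([0.5, 0.5]))
assert np.allclose(moved[0], [0.5, 0.5])
```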

  1. Method: To simulate examples based on movements, we first determine the type of translation needed. A translation can be spatial, involving movement of the hand in three dimensions while maintaining the gesture, or gestural, indicating a change in the gesture or sign made by the hand. The result of these calculations is the matrix T0,n, which contains 21 three-dimensional vectors, each representing the change in the corresponding landmark. These changes may be applied to the base example by denormalizing the values and summing them with the matrix of all landmark points in H0 (see the sketch after this list).

  2. The simulated data is then centralised and corrected for rotation so it can be used to train models that adapt better to real-life situations, as sketched below.
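A minimal sketch of steps 1 and 2, assuming T0,n and H0 are stored as (21, 3) NumPy arrays and that values were normalized by a known scale factor; the function names, the scale parameter, and the choice of P9 as the rotation reference are illustrative assumptions, not fixed by the README:

```python
import numpy as np

def apply_translation(h0: np.ndarray, t: np.ndarray, scale: float) -> np.ndarray:
    """Apply the simulated changes T0,n to the base example H0 (step 1).

    h0:    (21, 3) matrix of the base example's landmark points.
    t:     (21, 3) matrix T0,n of normalized per-landmark changes.
    scale: factor used during normalization; multiplying by it
           denormalizes T0,n before it is summed with H0.
    """
    return h0 + t * scale

def centre_and_correct_rotation(hand: np.ndarray) -> np.ndarray:
    """Centralise the simulated hand and undo in-plane rotation (step 2).

    The wrist P0 is moved to the origin, then the hand is rotated in the
    x-y plane so the P0 -> P9 direction (wrist to middle-finger base in
    common 21-point layouts) points straight up.
    """
    centred = hand - hand[0]                  # P0 becomes the origin
    dx, dy = centred[9, 0], centred[9, 1]
    angle = np.arctan2(dx, dy)                # rotation away from vertical
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c,  -s,  0.0],
                    [s,   c,  0.0],
                    [0.0, 0.0, 1.0]])
    return centred @ rot.T

# Example: simulate one new training sample from a base example.
h0 = np.random.rand(21, 3)                    # base gesture
t = (np.random.rand(21, 3) - 0.5) * 0.1       # small normalized change
sample = centre_and_correct_rotation(apply_translation(h0, t, scale=1.0))
```

Centering on P0 and fixing the in-plane rotation removes position and orientation as sources of variance, so a model trained on the simulated samples sees only the gesture itself.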