
AudioVision

Slides

This is an iOS application that aims to help blind people see the text around them. This is made possible with the help of:

  1. Speech Recognition - to take the user's voice commands and turn them into actions in the app
  2. Text Detection - to extract text from an image
  3. Image Stitching - to stitch multiple images together into one long image on which text detection can then be run
  4. Natural Language Processing - to detect and correct spelling errors in the recognized words
  5. Speech Synthesis - to turn the processed text into speech (see the sketch after this list).
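
The following is a minimal sketch of the core detect-and-speak step using Apple's Vision and AVFoundation frameworks. It is an illustrative assumption rather than the exact code in this repository; voice commands, spell correction, and image stitching are omitted.

```swift
import Vision
import AVFoundation
import UIKit

/// Recognizes text in an image and reads it aloud.
/// The synthesizer is stored as a property so it is not
/// deallocated before the speech finishes.
final class TextReader {
    private let synthesizer = AVSpeechSynthesizer()

    func readTextAloud(from image: UIImage) {
        guard let cgImage = image.cgImage else { return }

        let request = VNRecognizeTextRequest { [weak self] request, _ in
            guard let observations = request.results as? [VNRecognizedTextObservation] else { return }

            // Join the top candidate of each detected text line into one string.
            let text = observations
                .compactMap { $0.topCandidates(1).first?.string }
                .joined(separator: " ")

            // Turn the recognized text into speech.
            let utterance = AVSpeechUtterance(string: text)
            utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
            self?.synthesizer.speak(utterance)
        }
        request.recognitionLevel = .accurate

        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        try? handler.perform([request])
    }
}
```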

Text Detection

Screenshot: Screen Shot 2022-04-24 at 11 23 49 PM

Demo video: RPReplay_Final1650861001.MP4

Image Stitching

Demo video: RPReplay_Final1650861508.MP4

Steps

  • Uses an image registration request from the Vision framework to calculate an alignment transform between the two images.
  • This uses a homographic image registration mechanism.
  • A perspective transform filter applies the resulting homography to the floating image.
  • The warped image is then placed on the base image to create a single image (see the sketch after these steps).
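
Below is a rough sketch of this registration-and-warp step, assuming Vision's VNHomographicImageRegistrationRequest and Core Image's perspective transform filter. How the 3x3 warp matrix maps onto the filter's corner points is simplified here, and compositing the warped image onto the base is omitted.

```swift
import Vision
import CoreImage
import CoreImage.CIFilterBuiltins
import simd

/// Computes a homographic alignment between two overlapping images and
/// warps the floating image toward the base image's coordinate space.
func warpedFloatingImage(base: CGImage, floating: CGImage) -> CIImage? {
    // Ask Vision for the homography that aligns the floating image to the base.
    let request = VNHomographicImageRegistrationRequest(targetedCGImage: floating, options: [:])
    let handler = VNImageRequestHandler(cgImage: base, options: [:])
    try? handler.perform([request])

    guard let observation = request.results?.first as? VNImageHomographicAlignmentObservation else {
        return nil
    }
    let warp = observation.warpTransform   // 3x3 homography (simd_float3x3)

    // Map a point through the homography (homogeneous coordinates).
    func transformed(_ point: CGPoint) -> CGPoint {
        let v = warp * simd_float3(Float(point.x), Float(point.y), 1)
        return CGPoint(x: CGFloat(v.x / v.z), y: CGFloat(v.y / v.z))
    }

    // Apply the homography with a perspective transform filter by moving
    // the four corners of the floating image to their warped positions.
    let filter = CIFilter.perspectiveTransform()
    filter.inputImage = CIImage(cgImage: floating)
    let w = CGFloat(floating.width), h = CGFloat(floating.height)
    filter.bottomLeft  = transformed(CGPoint(x: 0, y: 0))
    filter.bottomRight = transformed(CGPoint(x: w, y: 0))
    filter.topLeft     = transformed(CGPoint(x: 0, y: h))
    filter.topRight    = transformed(CGPoint(x: w, y: h))

    return filter.outputImage
}
```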

About

This project is my submission for Apple's Swift Student Challenge for 2022.
