UNCOVer

UNCOVer: UNsighted COmputer Vision

Introduction

UNCOVer is an accessibility tool for blind people that is worn like a pair of sunglasses. UNCOVer will use artificial intelligence to help the user identify and locate objects and read text accurately.

UNCOVer differs from currently available products by combining three features in a single, easy-to-use package at a reasonable price within reach of many people: multiple object detection that reports each object's name and location; precise description of the object the user points at with a finger; and optical character recognition that lets the user "hear" printed text.

UNCOVer's core components consist of a single camera positioned to mimic the user's field of view. This camera will capture images of the objects and text to be analysed. To keep the interface simple, UNCOVer will feature a microphone backed by speech recognition, so the user can give commands by speaking instead of pressing buttons. All information will be delivered to the user as speech through the provided earphone.

The UNCOVer prototype will use a single Raspberry Pi 3 Model B to handle and process the information. For object detection and character recognition, UNCOVer will rely on Azure Cognitive Services, which provide reliable object and character recognition as well as speech services, to deliver the best possible experience.
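As an illustration of how the Raspberry Pi might talk to Computer Vision, the sketch below builds (but does not send) a REST request for the object detection or printed-text OCR operation using only the Python standard library. It assumes the v3.2 REST endpoints `/vision/v3.2/detect` and `/vision/v3.2/ocr` with the image posted as raw bytes; the endpoint and key are placeholders, and whether the prototype used the REST API or an Azure SDK is not stated in the original.

```python
import urllib.request

# Placeholders (assumptions): substitute your own Computer Vision resource endpoint and key.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
API_KEY = "<your-subscription-key>"


def build_vision_request(operation: str, image_bytes: bytes,
                         endpoint: str = ENDPOINT,
                         key: str = API_KEY) -> urllib.request.Request:
    """Build a POST request for a Computer Vision v3.2 operation.

    operation is "detect" (object detection) or "ocr" (printed-text recognition).
    The request is only constructed here; sending it is left to the caller.
    """
    if operation not in ("detect", "ocr"):
        raise ValueError(f"unsupported operation: {operation}")
    url = f"{endpoint.rstrip('/')}/vision/v3.2/{operation}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,              # Azure resource key
        "Content-Type": "application/octet-stream",    # body is the raw image bytes
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")
```

To actually call the service, the request could be passed to `urllib.request.urlopen` and the JSON response read with `json.load`; keeping construction separate from sending makes the request easy to inspect on the device.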

To keep the device affordable for blind people, UNCOVer will be priced relatively low, between 100 and 150 USD. UNCOVer will be distributed through local medical vendors to reach its users.

With all of these features, UNCOVer is intended to give blind people independence in identifying and locating objects and reading text, so they can enjoy life as fully as sighted people and uncover the wealth of information the world holds.

How UNCOVer Works

General Flowchart on How UNCOVer Works

When the device is turned on, it enters an initial state in which the microphone continuously records sound. The software on the Raspberry Pi communicates with the Azure Speech SDK to perform continuous speech recognition on the recorded audio stream. Each time a sentence is completed, the software matches the recognised sentence against the list of available commands.
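The command-matching step can be sketched as a small pure function. In the Azure Speech SDK for Python, a `SpeechRecognizer` running continuous recognition raises a `recognized` event for each completed utterance, with the sentence in `evt.result.text`; a handler connected to that event could pass the text to `match_command`. The normalisation rules and the task names (`describe_scene`, `read_text`) are hypothetical; only the two command phrases come from this document.

```python
import re
from typing import Optional

# Command phrases from the text; the task names they map to are hypothetical.
COMMANDS = {
    "what's in front of me": "describe_scene",
    "read text in front of me": "read_text",
}


def normalise(sentence: str) -> str:
    """Lower-case, unify curly apostrophes and drop punctuation so phrases compare cleanly."""
    sentence = sentence.lower().replace("\u2019", "'")
    sentence = re.sub(r"[^a-z' ]+", " ", sentence)
    return " ".join(sentence.split())


def match_command(recognised: str) -> Optional[str]:
    """Return the task whose command phrase occurs in the recognised sentence, else None."""
    text = normalise(recognised)
    for phrase, task in COMMANDS.items():
        if phrase in text:
            return task
    return None
```

Substring matching lets the user add filler words such as "Okay, what's in front of me?" while still triggering the command; anything unmatched returns `None` and the device simply keeps listening.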

If the sentence "What's in front of me" is found in the recognised speech, the camera captures an image in the direction the user is facing. The software then sends the image to Azure's Computer Vision API, which performs finger detection: it searches the image for the user's finger and, if one is found, returns the finger's location and orientation; otherwise it returns without this information.
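The branching for this command can be sketched with the individual steps passed in as functions, so the control flow is visible without any hardware or network access. The `Finger` record and the callables `capture`, `detect_finger`, `describe_pointed` and `describe_all` are hypothetical names standing in for the camera, the finger-detection call and the two recognition paths described in this section.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Finger:
    """Hypothetical finger-detection result: fingertip pixel position and pointing angle."""
    tip: Tuple[float, float]
    angle_deg: float


def handle_describe_scene(
    capture: Callable[[], bytes],
    detect_finger: Callable[[bytes], Optional[Finger]],
    describe_pointed: Callable[[bytes, Finger], str],
    describe_all: Callable[[bytes], str],
) -> str:
    """Run the "What's in front of me" branch and return the sentence to be spoken."""
    image = capture()
    finger = detect_finger(image)
    if finger is not None:
        # Finger found: detailed recognition of the pointed object.
        return describe_pointed(image, finger)
    # No finger: general recognition of every object in view.
    return describe_all(image)
```

Injecting the steps also makes it easy to test the flow on a desktop with fake functions before running it on the Raspberry Pi.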

If a finger is detected, UNCOVer performs detailed object recognition on the pointed object. First, the software crops the image according to the finger's orientation to discard unwanted areas and narrow the search area, and saves the finger's location as a reference point for later. The cropped image is sent to the Computer Vision API, which performs object recognition and returns complete information on the detected object(s). If no object is detected, UNCOVer says "No object is detected" and returns to the initial state. Otherwise, the program calculates the distance from each detected object to the reference point, picks the object nearest to it (the pointed object), and speaks its complete description.
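The crop and nearest-object steps might look like the sketch below. The crop rule (a square window placed ahead of the fingertip along the pointing direction) and the 300-pixel window size are hypothetical, since the original does not specify them. Distances are measured from each bounding-box centre to the reference point, assuming boxes in the Computer Vision v3.2 Detect Objects shape (`rectangle` with `x`, `y`, `w`, `h`). Because the API sees only the cropped image, its boxes are in crop coordinates, so the fingertip is shifted into the same frame before comparing.

```python
import math
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # left, top, width, height in pixels


def crop_ahead_of_finger(tip: Tuple[float, float], angle_deg: float,
                         image_size: Tuple[int, int], window: int = 300) -> Box:
    """Hypothetical crop rule: a square window centred half a window ahead of the
    fingertip along the pointing direction, shifted to stay inside the image.

    angle_deg uses image coordinates (y grows downwards): 0 = right, -90 = up.
    """
    img_w, img_h = image_size
    rad = math.radians(angle_deg)
    cx = tip[0] + math.cos(rad) * window / 2
    cy = tip[1] + math.sin(rad) * window / 2
    w = min(window, img_w)
    h = min(window, img_h)
    left = int(min(max(cx - w / 2, 0), img_w - w))
    top = int(min(max(cy - h / 2, 0), img_h - h))
    return left, top, w, h


def nearest_object(objects: List[Dict], ref: Tuple[float, float]) -> Optional[Dict]:
    """Return the object whose bounding-box centre is closest (Euclidean) to ref, or None."""
    def distance(obj: Dict) -> float:
        r = obj["rectangle"]
        return math.hypot(r["x"] + r["w"] / 2 - ref[0], r["y"] + r["h"] / 2 - ref[1])

    return min(objects, key=distance) if objects else None


def pointed_object(objects_in_crop: List[Dict], tip: Tuple[float, float],
                   crop: Box) -> Optional[Dict]:
    """Shift the fingertip into crop coordinates, then pick the nearest detected object."""
    left, top, _, _ = crop
    return nearest_object(objects_in_crop, (tip[0] - left, tip[1] - top))
```

A `None` result corresponds to the "No object is detected" case; otherwise the chosen object's fields provide the description to speak.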

If no finger is detected, UNCOVer instead performs general object recognition. The software sends the image directly to the Computer Vision API, which recognises the object(s) and returns each object's name and position. If no objects are detected, UNCOVer says "No object is detected". Otherwise, the device says each object's name and relative position.
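The original does not say how "relative position" is phrased. One plausible rule, shown purely as a hypothetical sketch, divides the frame into thirds by the horizontal centre of each bounding box and turns the result into a short spoken phrase. Object entries are assumed to follow the Computer Vision v3.2 Detect Objects shape used above.

```python
from typing import Dict, List


def horizontal_position(rect: Dict[str, int], image_width: int) -> str:
    """Hypothetical rule: left / centre / right third of the frame, by box centre."""
    centre_x = rect["x"] + rect["w"] / 2
    if centre_x < image_width / 3:
        return "on your left"
    if centre_x > 2 * image_width / 3:
        return "on your right"
    return "in front of you"


def describe_objects(objects: List[Dict], image_width: int) -> str:
    """Build the sentence spoken for general object recognition."""
    if not objects:
        return "No object is detected"
    parts = [f"{obj['object']} {horizontal_position(obj['rectangle'], image_width)}"
             for obj in objects]
    return ". ".join(parts) + "."
```

Because the camera mimics the wearer's field of view, left and right in the image correspond to the user's own left and right.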

Returning to the recognised speech: if the sentence "Read Text in front of me" is found, the camera captures an image in the direction the user is facing, and the software sends it to the Computer Vision API for character recognition. The API returns the text it found. If the image contains no text or characters, the returned text is blank and the device says "No text or character is detected". Otherwise, the device reads out all the text detected in the image.
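Turning the API response into speakable text might look like this sketch. It assumes the synchronous v3.2 OCR operation, whose JSON nests `regions`, then `lines`, then `words`, each word carrying a `text` field. The asynchronous Read API returns a different shape, so this choice of operation is an assumption.

```python
from typing import Dict

NO_TEXT_MESSAGE = "No text or character is detected"


def extract_ocr_text(result: Dict) -> str:
    """Join the words of an OCR response into one line of text per detected line."""
    lines = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            if words:
                lines.append(" ".join(words))
    return "\n".join(lines)


def text_to_speak(result: Dict) -> str:
    """Return the detected text, or the stock message when nothing was found."""
    text = extract_ocr_text(result)
    return text if text.strip() else NO_TEXT_MESSAGE
```

Keeping one output line per detected line preserves natural pauses when the text is later spoken.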

UNCOVer uses the Azure Speech Service for text-to-speech (TTS) conversion to give the user the best possible output speech. When an object or text is detected and included in the information returned by the API as described above, the string to be spoken is first parsed out of that information and then sent to the Speech Service as a TTS request. The Speech Service returns the speech as a .wav audio file, which UNCOVer plays through the earphone. When no object or text is found, a pre-recorded audio file is played instead.
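The choice between a pre-recorded clip and a TTS request can be sketched as follows. Recognised text is wrapped in SSML with XML escaping so characters such as `&` or `<` in OCR output cannot break the request. In the Azure Speech SDK for Python, such SSML could be passed to `SpeechSynthesizer.speak_ssml_async(...).get()` with an `AudioOutputConfig(filename=...)` to obtain a .wav file. The voice name and clip paths below are hypothetical choices, not taken from the original.

```python
from typing import Tuple
from xml.sax.saxutils import escape

# Hypothetical choices: any voice supported by your Speech resource, any clip paths.
VOICE = "en-US-JennyNeural"
PRERECORDED = {
    "No object is detected": "audio/no_object.wav",
    "No text or character is detected": "audio/no_text.wav",
}


def build_ssml(text: str, voice: str = VOICE) -> str:
    """Wrap text in SSML, escaping &, < and > so the XML stays well-formed."""
    return (
        "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>"
        f"<voice name='{voice}'>{escape(text)}</voice></speak>"
    )


def plan_output(message: str) -> Tuple[str, str]:
    """Return ("play", path) for a stock message, else ("synthesize", ssml) for TTS."""
    clip = PRERECORDED.get(message)
    if clip is not None:
        return "play", clip
    return "synthesize", build_ssml(message)
```

Playing local clips for the fixed "nothing found" messages avoids a network round trip in the cases where the answer is already known.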

Software Tools that Empower UNCOVer Prototype

Things that empower UNCOVer

Azure Vision Cognitive Services

Extract rich information from images to categorize and process visual data, and perform machine-assisted moderation of images.

Azure Speech Services

Swiftly convert audio to text for natural responsiveness. The Speech to Text and Text to Speech APIs are part of the Speech services.

Python

Python is used for its simplicity, versatility, and cross-platform compatibility. Azure Cognitive Services supports Python.

Raspbian OS

Raspbian OS is the default OS of the Raspberry Pi and is based on Linux. The OS offers flexibility in developing the software.

Demo of UNCOVer in Action

This video is a demonstration clip of the UNCOVer device.

The video is divided into five demonstration parts:

Part 1: Object Detection with object's name and relative location
Part 2: Describe Pointed Object
Part 3: Text Reading from Image
Part 4: Analyzing context from an Image
Part 5: Describing Image