UNCOVer Project Documentation

Intro

UNCOVer is a project to help blind users: it detects objects and tells their positions, describes the object being pointed at, reads text, and analyzes and describes an image. In this documentation you will find explanations of the code and the algorithms used.


Things Needed to use UNCOVer

Clone this Github Repository

This is simply the source code of the project.

Azure Account

You need an Azure account to access Azure Cognitive Services.


Tools and Services Used

Azure Vision Cognitive Services

Extracts rich information from images to categorize and process visual data, and performs machine-assisted moderation of images. Azure Vision is used for the object detection mode and the pointed-object mode.

Azure Speech Services

Swiftly converts audio to text for natural responsiveness. The Speech-to-Text and Text-to-Speech APIs are part of the Speech services. The Speech services are used to convert the response string into audio, which is the spoken response given by the device, and to convert the user's commands into a string that the algorithm can analyze.

Azure Custom Vision

Azure Custom Vision is used to train an index-finger detection model. UNCOVer uses this custom model to detect the user's index finger and decide which object is being pointed at using our algorithm.

Python

This program is written in the Python programming language.

Raspbian OS

We use Raspberry Pi's default OS for this project. Use your preferred OS with caution, as some dependencies may differ.

Thonny IDE

An IDE to code and debug Python on Raspberry Pi OS. Use another IDE if you wish.


Hardware that We Use:
  • Raspberry Pi 3 Model B+
  • Raspberry Battery Pack
  • Camera Module
  • Microphone
  • Earphone

Dependencies (adjust to your needs; these are for running on the Raspberry Pi):

	pip install mutagen
	pip install pillow
	pip install requests
	pip install picamera
	pip install pyAudio
	pip install playsound
	pip install SpeechRecognition


Documentation & References

A Quickstart guide on using Azure Speech Services on Python
https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/quickstart-python
A Quickstart guide on using Azure Text-to-Speech on Python
https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/quickstart-python-text-to-speech
A Quickstart guide on using Azure Custom Vision
https://azure.microsoft.com/en-us/services/cognitive-services/custom-vision-service/

Prerequisite Functions


There are a few prerequisite scripts: cameraFixingPos.py , posutil.py , SpeechRecognition.py , util.py , and vorec.py .

In this section we describe what is inside each script, how to use it, and what it is used for.




cameraFixingPos.py

First, we're going to take a look at cameraFixingPos.py . This script regulates the program's camera usage. Its task is pretty simple: it tells the Raspberry Pi camera to start previewing, waits for a set time, and takes the photo. So here it is.

cameraFixingPos.py : regulates the camera and image retrieval. It has an imageCapture() function.


	
	# Import PiCamera library to run on the Raspberry Pi
	from picamera import PiCamera
	from time import sleep
	
	# Create instance
	cam = PiCamera()
	def imageCapture():
		# Set the camera resolution , can be adjusted for different camera
		cam.resolution = (2560, 1920)
		cam.start_preview()
		sleep(4)
		
		# Capture image and save it into jpg file so it can be processed later
		cam.capture('capture/image.jpg')
		cam.stop_preview()
	
		# Return image
		return 'capture/image.jpg'
		
	imageCapture()
	

First, we initialize the PiCamera class into a variable, in this case:

cam = PiCamera()

After that, we call the imageCapture() function, which takes the photo.

cam = PiCamera()
imageCapture()

cam.resolution sets the resolution of the output image; it's basically a (w, h) configuration. For example, if we want a 1920x1080 image, we set:

cam.resolution = (1920, 1080)

The cam.start_preview() tells the camera to start working. When this function runs, we should start to see the camera's viewfinder on our screen.

After that we set the sleep() time, which acts as a timer for how long the camera waits before taking a picture. In this case, we use 4 seconds, so it's sleep(4).

Now, we capture the image using cam.capture(). Inside the parentheses, we put the desired photo path, and the picture will be saved there, for example cam.capture('capture/image.jpg').

Next, cam.stop_preview() is the opposite of cam.start_preview(). When called, it stops the camera preview.

After that we return the photo path string for later use ( return 'capture/image.jpg' ).
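
For reference, here is a minimal sketch of using imageCapture() from another script (assuming cameraFixingPos.py sits next to it and the capture/ folder exists). Note that, as written, cameraFixingPos.py also calls imageCapture() at module level, so a photo is taken as soon as the module is imported.

	# minimal usage sketch on a Raspberry Pi (the capture/ folder is assumed to exist)
	from cameraFixingPos import imageCapture

	image_path = imageCapture()    # previews for 4 seconds, then saves capture/image.jpg
	print('Photo saved at ' + image_path)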




posutil.py

Next, we're going to take a look at posutil.py . This script determines object positions, in this case the finger's position and/or other objects' positions. It has 3 functions: getFingersMiddlePos(fingerResult, shapesize), findObjectLocation(listOfObj, img_size), and findClosestObjectFromFinger(objectResults, fingerMidPos, withCoord=False).

Firstly, let's have a look at findObjectLocation().

findObjectLocation(listOfObj, img_size) determines each object's relative position in the picture.


	
	def findObjectLocation(listOfObj, img_size):
		# returned value: list of (name, orientation) tuples
		validObjs = []

		# get img dimension
		widthImg, heightImg = img_size

		for names, pos in listOfObj:
			x, _, w, _ = pos

			# middle point of each obj
			xt = (w)/2.0 + x

			# divide img scene into 3 partitions
			if xt <= widthImg/3.0:
				orientation = "left"
			elif xt <= 2.0 * widthImg/3.0:
				orientation = "center"
			else:
				orientation = "right"

			validObjs.append((names, orientation))

		return validObjs
	

First, we create an empty list (validObjs = []) which will later be our return value. The list will contain tuples of each object's name and position, e.g. ("glasses", "left").

Then, we declare variables to store the image dimensions, in this case widthImg, heightImg.

After that we iterate through listOfObj to get every object's location in the image. For each object we compute its middle point.

Once that data has been obtained, we divide the image into 3 sections: left, center, and right.

Next, we compare each object's middle point against the range of each section: if the middle point is less than widthImg/3.0, the object is on the left; if it's less than 2.0 * widthImg/3.0, it's in the center; and if it fulfills neither condition, it must be on the right.

After the comparison we append the result to our validObjs list.

After the last iteration, validObjs is returned.
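
To make the partitioning concrete, here is a small, hypothetical example (object names and coordinates are made up):

	# hypothetical example: 1920x1080 image, positions are (x, y, w, h) in pixels
	objects = [("glasses", (100, 300, 200, 150)),   # middle x = 200  -> left third
	           ("cup", (900, 400, 120, 160)),       # middle x = 960  -> center third
	           ("book", (1500, 200, 300, 400))]     # middle x = 1650 -> right third

	print(findObjectLocation(objects, (1920, 1080)))
	# [('glasses', 'left'), ('cup', 'center'), ('book', 'right')]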




Next, we'll take a look at getFingersMiddlePos(fingerResult, shapesize).

getFingersMiddlePos(fingerResult, shapesize) : determines the finger's center point location (coordinates).


	
	def getFingersMiddlePos(fingerResult, shapesize):
		if len(fingerResult) < 1:
			return []

		positions = []

		# get fingers position
		for _, fingerPos in fingerResult:
			left, top, width, height = fingerPos

			# shapesize = getImageSize(image_path)
			left = int(left * float(shapesize[0]))
			top = int(top * float(shapesize[1]))
			width = left + int(width * float(shapesize[0]))
			height = top + int(height * float(shapesize[1]))
			# print('l: {}, t: {}, w: {}, h: {}'.format(left, top, width, height))

			# middle point of finger
			xF = int((left + width) / 2.0)
			yF = int((top + height) / 2.0)
			# print('FINGER - xF: {}, yF: {}'.format(xF, yF))

			positions.append((xF, yF))

		return positions
	

First, we need the finger detection data and the image size, so this function takes fingerResult and shapesize. fingerResult is the value returned by Azure Custom Vision for the index-finger detection, in the form [("index", (left, top, width, height))], where the bounding box values are normalized (0 to 1).

After scaling those values to pixel coordinates using shapesize, we get the x position of the finger's center point as the midpoint of the left and right edges, and the y position as the midpoint of the top and bottom edges.
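
As a rough illustration (the bounding box values are made up), for a 1000x1000 image:

	# hypothetical detection: normalized (left, top, width, height) on a 1000x1000 image
	fingerResult = [("index", (0.40, 0.50, 0.10, 0.20))]
	shapesize = (1000, 1000)

	print(getFingersMiddlePos(fingerResult, shapesize))
	# [(450, 600)] -> center of the box spanning (400, 500) to (500, 700)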




Next, we're going to take a look at findClosestObjectFromFinger(objectResults, fingerMidPos, withCoord=False).

findClosestObjectFromFinger(objectResults, fingerMidPos, withCoord=False) : using the objects' locations and the finger's middle point, this function determines which object is closest to the pointing finger.


	
	def findClosestObjectFromFinger(objectResults, fingerMidPos, withCoord=False):
		# note: posutil.py needs "import math" for the distance calculation below
		xF, yF = fingerMidPos
		print('FINGER - xF: {}, yF: {}'.format(xF, yF))

		distances = []
		names = []
		coords = []

		# get object mid positions and their distance to the finger
		for objName, objPos in objectResults:
			x, y, w, h = objPos

			xObj = int(w/2 + x)
			yObj = int(h/2 + y)

			# Euclidean distance between the finger and the object's middle point
			dist = int(math.sqrt(((xF - xObj) ** 2) + ((yF - yObj) ** 2)))

			print('{} - x: {}, y: {}'.format(objName, xObj, yObj))

			distances.append(dist)
			names.append(objName)
			if withCoord is True:
				coords.append((xObj, yObj))

		# index of the object with the minimum distance to the finger
		closest = distances.index(min(distances))

		if withCoord is True:
			return (names[closest], coords[closest])
		else:
			return names[closest]
	

objectResults is a list of tuples containing each object's location data in the image. fingerMidPos is obtained using getFingersMiddlePos(fingerResult, shapesize).

By default, withCoord is False; set it to True if you also want the closest object's coordinates in the return value.

First, we get the (x,y) point of the finger, then we create empty lists that will contain distances, names, and coordinates of every object relative to the finger's position.

After that we iterate through every object's location data to find each object's distance to the finger's location using the Euclidean distance formula.

dist = int(math.sqrt(((xF - xObj) ** 2) + ((yF - yObj) ** 2)))

Then simply append it to our distances list.

Since we're looking for the closest object to the finger, we take the object with the minimum distance to the finger. Hence, we use

closest = distances.index(min(distances))

After that, we simply return the object's name, or the name together with its coordinates if withCoord=True.

if withCoord is True:
	return (names[closest], coords[closest])
else:
	return names[closest]
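
Putting the pieces together, a hypothetical call might look like this (the coordinates are made up; objectResults would normally come from the object detection response):

	# hypothetical data: object boxes in pixels (x, y, w, h), finger point from getFingersMiddlePos()
	objectResults = [("cup", (300, 400, 100, 100)),     # middle point (350, 450)
	                 ("remote", (800, 500, 120, 60))]   # middle point (860, 530)
	fingerMidPos = (420, 470)

	print(findClosestObjectFromFinger(objectResults, fingerMidPos))
	# 'cup' (distance of about 72 px, versus about 444 px for the remote)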




SpeechRecognition.py

SpeechRecognition.py wraps the Azure speech-to-text service that we use to receive commands for UNCOVer.

It contains one class named SpeechRecognition.

To use this class, it's better if you have your own Azure subscription key for the Speech services; if you don't have one, the default key will be used, but Microsoft could revoke it at any time.

SpeechRecognition() : Manages Azure Speech-to-Text requests.


	import speech_recognition as sr

	class SpeechRecognition:

		def __init__(self, token_key='DEFAULT_TOKEN_KEY'):
			self.token_key = token_key

		def recognize(self, device_index=-1):
			r = sr.Recognizer()
			if device_index == -1:
				mic = sr.Microphone()
			else:
				mic = sr.Microphone(device_index=device_index)

			with mic as source:
				print('Please be silent...')
				r.adjust_for_ambient_noise(source)
				print('Recording...')
				audio = r.listen(source)

			print('Done recording...')
			text = ''
			try:
				text = r.recognize_azure(
					audio, self.token_key, location='southeastasia'
				)
			except sr.UnknownValueError:
				print('The voice is not recognizable')
				text = ''

			return text

On initialization, it stores the Azure Speech subscription key on the object; if you have your own key, pass it in here.

Then there is the recognize(self, device_index=-1) function. It records a command using the desired audio recording device (determined by the device_index we pass; otherwise it uses the default recording device).

sr.Recognizer() comes from the speech_recognition library; we use it to pick up the audio.

sr.Microphone() is used to select the microphone and, of course, to record the sound with it.

After that we take the recorded audio containing the command and send it to the Azure Speech-to-Text service. To see the full documentation of speech_recognition, click here.

After recording the command, we create an empty string variable to hold the result from the speech-to-text service; if an error occurs, we return that empty string.
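
A minimal usage sketch, assuming you pass your own Speech subscription key (the key below is a placeholder) and a microphone is connected:

	# minimal usage sketch (placeholder key; default recording device)
	from SpeechRecognition import SpeechRecognition

	recognizer = SpeechRecognition(token_key='YOUR_SPEECH_KEY')
	command = recognizer.recognize()
	print(command)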




util.py

util.py consists of utilities that help the main algorithm work.

The first one is a function called getImageSize(image_path)

getImageSize(image_path) : returns the designated image size (width, height)


	
	def getImageSize(image_path):
		im = Image.open(image_path)
		width, height = im.size

		return (width, height)	
	

getImageSize has one required parameter, image_path, which is the path to the image whose size we want.

The next one is playSound(filepath), which plays the designated audio file.

playSound(filepath) : plays the designated audio file.


	
	def playSound(filepath):
		playsound(filepath)	
	

playSound(filepath) has one required parameter, filepath, the path of the audio file we wish to play.

util.py also contains several classes: ObjectDetection, TextToSpeech, FingerDetection, OCR, Describe, and Analyze.

First, we'll look into the first class, ObjectDetection.

The ObjectDetection class has several methods; the first one, __init__, is called on initialization. In __init__ we store our Azure API subscription key and our image path on the object so they can later be used by the object detection service.

__init__(self, subscription_key, image_path) : Stores the Azure API subscription key and image path on an object of class ObjectDetection.


	
	def __init__(self, subscription_key, image_path):

		self.subscription_key = subscription_key
		self.image_path = image_path
	



Next, we have the DetectObject(self) method. It runs the object detection service with the values we provided on initialization. You may want to change vision_base_url to your region's endpoint for better performance. This method reads our image into a byte array and sends it with the request URL to be analyzed by Microsoft Azure Cognitive Services. The result comes back in JSON format.

DetectObject(self) : Makes a request to the Azure object detection API and gets the result in JSON format.


	
	def DetectObject(self):
		print('LOG: Commencing image recognition of ' + self.image_path + '\nusing Azure Computer Vision API...')

		subscription_key = self.subscription_key

		print('LOG: Using vision subscription_key ' + subscription_key)

		assert subscription_key

		# You may want to change it with your specified region services.
		vision_base_url = ("https://southeastasia.api.cognitive.microsoft.com/"
							+ "vision/v2.0/")

		print('LOG: Using vision base url ' + vision_base_url)

		analyze_url = vision_base_url + "detect"

		print('LOG: Reading the image into a byte array...')

		# Read the image into a byte array
		image_data = open(self.image_path, "rb").read()

		# HTTP request header
		headers = {'Ocp-Apim-Subscription-Key': subscription_key,
					'Content-Type': 'application/octet-stream'}
		params = {'visualFeatures': 'Categories,Description,Color'}
		# receiving result from API
		response = requests.post(
			analyze_url, headers=headers, params=params, data=image_data)
		response.raise_for_status()

		print('LOG: Receiving JSON response...')

		self.result = response.json()

		print('LOG: JSON response received...')
	



Next, we're going to take a look at getDetectedObject(self). This method parses the JSON received from DetectObject(self), so we can extract each object's name and location.

getDetectedObject(self) : Parses the JSON result from DetectObject(self) to get object names and locations. Returns a list of tuples.


	
	def getDetectedObject(self):

		result = self.result
	
		objects_detected = []
		# parse object names from JSON response
		print('LOG: Parsing object names from JSON...')
		for dicts in result['objects']:
			object_name = dicts['object']
			object_pos = []
			for i in dicts['rectangle']:
				object_pos.append(dicts['rectangle'][i])

			objects_detected.append((object_name, object_pos))

		return objects_detected
	
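
A usage sketch for the class (the key below is a placeholder; the class and helper names are taken from this documentation of util.py and posutil.py):

	# minimal sketch: detect objects in a captured image and locate them
	from util import ObjectDetection, getImageSize
	from posutil import findObjectLocation

	detector = ObjectDetection('YOUR_VISION_KEY', 'capture/image.jpg')
	detector.DetectObject()                     # calls the Azure detect endpoint
	objects = detector.getDetectedObject()      # e.g. [('cup', [x, y, w, h]), ...]
	print(findObjectLocation(objects, getImageSize('capture/image.jpg')))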



Next, we will take a look at the second class, TextToSpeech.

Like the other classes, TextToSpeech has an __init__ function which runs at initialization.

What distinguishes TextToSpeech from ObjectDetection is the parameters it needs. While ObjectDetection requires an Azure API subscription key for the Vision services and the image path, TextToSpeech requires an Azure API subscription key for the Speech services and the text that we want to convert to audio.

__init__(self, subscription_key, text_candidate) : Stores the Azure API subscription key and the text to be converted to audio on an object of class TextToSpeech.


	
	def __init__(self, subscription_key, text_candidate):
		print('LOG: Initializing TextToSpeech object...')
		print('LOG: Using speech subscription_key ' + subscription_key)

		self.subscription_key = subscription_key

		self.tts = text_candidate

		print('LOG: Speech output: ' + text_candidate)

		self.access_token = None




Next, we're going to take a look at the get_token(self) method.

get_token(self) is used to get an authorization token so we can access the Azure Speech API.

get_token(self) : Gets a bearer authorization token from Azure Speech Cognitive Services.


	
	def get_token(self):
		print('LOG: Getting token...')

		fetch_token_url = ("https://southeastasia.api.cognitive.microsoft.com"
						+ "/sts/v1.0/issueToken")

		print('LOG: Fetching token at ' + fetch_token_url)
		# HTTP request header
		headers = {
			'Ocp-Apim-Subscription-Key': self.subscription_key
		}
		# receiving result from API
		response = requests.post(fetch_token_url, headers=headers)
		self.access_token = str(response.text)
	



Finally, the last member of the class, save_audio.

save_audio sends the request to the Azure Speech service and then saves the response to an audio file that can later be used in the main program. Usually the audio from this method is used to tell the user the results of object detection.

This method takes 2 parameters: filename and quality.

The filename parameter should be filled with the desired output audio file name, without the file extension.

quality must be a number, either 1 or 0.

  • If 1 is given, the output file format is '.wav'
  • If 0 is given, the output file format is '.mp3'

The default quality value is 0.

save_audio(self, filename, quality=0) : Sends the request and saves the resulting audio file.


	
	def save_audio(self, filename, quality=0):
		qual = (
			'audio-24khz-48kbitrate-mono-mp3',
			'riff-16khz-16bit-mono-pcm',
			'riff-24khz-16bit-mono-pcm'

		)

		extension = '.wav'
		if quality == 0:
			extension = '.mp3'

		print('LOG: Processing audio...')

		base_url = 'https://southeastasia.tts.speech.microsoft.com/'

		print('LOG: Using speech base url ' + base_url)

		path = 'cognitiveservices/v1'
		constructed_url = base_url + path
		headers = {
			'Authorization': 'Bearer ' + self.access_token,
			'Content-Type': 'application/ssml+xml',
			'X-Microsoft-OutputFormat': qual[quality],
			'User-Agent': 'YOUR_RESOURCE_NAME'
		}
		xml_body = ElementTree.Element('speak', version='1.0')
		xml_body.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-us')
		voice = ElementTree.SubElement(xml_body, 'voice')
		voice.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-US')
		voice.set(
			'name',
			'Microsoft Server Speech Text to Speech Voice (en-US, Guy24KRUS)'
		)
		voice.text = self.tts
		body = ElementTree.tostring(xml_body)

		# receiving result from API
		response = requests.post(constructed_url, headers=headers, data=body)
		if response.status_code == 200:
			print('LOG: Saving audio as ' + filename + '...')

			with open(filename + extension, 'wb') as audio:
				audio.write(response.content)
				print(
					"\nStatus code: "
					+ str(response.status_code)
					+ "\nYour TTS is ready for playback.\n"
				)

		else:
			print(
				"\nStatus code: "
				+ str(response.status_code)
				+ "\nSomething went wrong. "
				+ "Check your subscription key and headers.\n"
			)	
	
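
A usage sketch that ties get_token() and save_audio() together with the playSound() utility (the subscription key and file names below are placeholders):

	# minimal sketch: synthesize a sentence and play it back
	from util import TextToSpeech, playSound

	tts = TextToSpeech('YOUR_SPEECH_KEY', 'There is a cup on the left')
	tts.get_token()                  # fetch the bearer token first
	tts.save_audio('response')       # quality=0 by default, so this writes response.mp3
	playSound('response.mp3')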



Next, we will take a look at the FingerDetection class.

Just like the other classes, the FingerDetection class has an __init__ method.

The __init__ method in this class has 2 parameters that need to be satisfied:

  • prediction_key, fill with your own subscription key for the Azure Custom Vision API
  • image_path, the path to the image in which we want to find the finger.

__init__(self, prediction_key, image_path) : Stores the Azure Custom Vision API subscription key and the path to the image to be analyzed.


	
	def __init__(self, prediction_key, image_path):
		self.prediction_key = prediction_key
		self.image_path = image_path
	



Let's move on to the PredictImage(self) method.

This method sends a request to the Azure Custom Vision API to analyze the image we sent - it looks for the specific object that we previously trained it to detect - and then gets the result in JSON format.

PredictImage(self) : Sends the image to the Azure Custom Vision API and retrieves the result in JSON format.


	
	def PredictImage(self):
		# req_url = (
		#     'https://southeastasia.api.cognitive.microsoft.com/' +
		#     'customvision/v1.1/Prediction/{}/image/nostore'.format(
		#         self.project_id
		#     )
		# )
		req_url = (
			"https://southeastasia.api.cognitive.microsoft.com/"
			+ "customvision/v3.0/Prediction/"
			+ "47917e0f-ee76-4fc3-afe4-1eb02b94d6b0/"
			+ "detect/iterations/Iteration7/image/nostore"
		)

		image_data = open(self.image_path, 'rb').read()

		headers = {'Content-Type': 'application/octet-stream',
					'Prediction-key': self.prediction_key}

		# receiving result from API
		response = requests.post(
			req_url,
			headers=headers,
			data=image_data
		)

		response.raise_for_status()

		self.result = response.json()
	



After that, let's dive into the getPrediction(self, minProb=0) method.

This method takes the JSON retrieved by PredictImage(self) and keeps only the detected objects whose probability is higher than the minProb value (default 0).

This method returns a list of tuples consisting of the name of each detected object and its location.

getPrediction(self, minProb=0) : From the Custom Vision API result, gets the objects whose probability is higher than the minProb parameter.


	
	def getPrediction(self, minProb=0):
		result = self.result
		detected = []
		# parse object names from JSON response
		for dicts in result['predictions']:
			prob = dicts['probability']

			if prob < minProb:
				continue

			name = dicts['tagName']
			pos = []
			for i in dicts['boundingBox']:
				pos.append(dicts['boundingBox'][i])

			detected.append((name, pos))

		return detected
	



Last but not least, getPredictionJson(self).

This method returns the raw JSON result from the API call (from PredictImage).

getPredictionJson(self) : Returns the JSON object retrieved from the Custom Vision API server.


	
	def getPredictionJson(self):
		result = self.result
		return result
	
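
A usage sketch of the finger detection flow feeding into posutil (the prediction key is a placeholder, and the minProb threshold of 0.5 is just an example):

	# minimal sketch: locate the finger in a captured image
	from util import FingerDetection, getImageSize
	from posutil import getFingersMiddlePos, findClosestObjectFromFinger

	finger = FingerDetection('YOUR_PREDICTION_KEY', 'capture/image.jpg')
	finger.PredictImage()
	fingerResult = finger.getPrediction(minProb=0.5)   # e.g. [('index', [l, t, w, h])]

	img_size = getImageSize('capture/image.jpg')
	fingerMidPos = getFingersMiddlePos(fingerResult, img_size)
	# with objects from ObjectDetection.getDetectedObject():
	# print(findClosestObjectFromFinger(objects, fingerMidPos[0]))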



The OCR class performs the OCR service for the application.

Its __init__ method takes:

  • sub_key, the subscription key for the OCR service.
  • img_path, the path to the image we wish to run OCR on.

__init__(self, sub_key, img_path) : Stores the subscription key and image path for the OCR service.


	
	def __init__(self, sub_key, img_path):
		self.sub_key = sub_key
		self.img_path = img_path
	



The second method in the OCR class is PerformOCR(self).

PerformOCR(self) sends a request to Azure's OCR service and then retrieves the result in JSON format.

PerformOCR(self) : Performs the OCR service request and retrieves the result in JSON format.


	
	def PerformOCR(self):
		assert self.sub_key

		vision_base_url = ("https://southeastasia.api.cognitive.microsoft.com/"
							+ "vision/v2.0/")

		ocr_url = vision_base_url + "ocr"

		# Read the image into a byte array
		image_data = open(self.img_path, "rb").read()

		# HTTP request header
		headers = {'Ocp-Apim-Subscription-Key': self.sub_key,
					'Content-Type': 'application/octet-stream'}
		params = {'language': 'unk', 'detectOrientation': 'true'}
		# receiving result from API
		response = requests.post(
			ocr_url, headers=headers, params=params, data=image_data)
		response.raise_for_status()

		self.result = response.json()
	



The GetTexts(self) method parses the JSON retrieved in the PerformOCR(self) method. With this method we extract the resulting text from the OCR processing.

GetTexts(self) : Returns the text result of the OCR reading.


	
	def GetTexts(self):
		text = ''

		for i in self.result['regions']:
			for j in i['lines']:
				for k in j['words']:
					# print(k['text'], end=' ')
					text += (k['text'] + ' ')
				text += '. '

		return text
	
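
A usage sketch for the OCR class (the key below is a placeholder), for example to have UNCOVer read a sign out loud:

	# minimal sketch: read the text from a captured image
	from util import OCR

	reader = OCR('YOUR_VISION_KEY', 'capture/image.jpg')
	reader.PerformOCR()
	print(reader.GetTexts())     # e.g. 'EXIT . Push door to open . '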



Next, we're going to move on to the Describe class.

Of course it also has an __init__ function, which is basically the same as the OCR class's.

The __init__ function has 2 parameters that must be satisfied:

  • sub_key, the subscription key for Azure Vision Cognitive Services
  • img_path, the path to the image we want to describe.

__init__(self, sub_key, img_path) : Stores the subscription key and image path on the object.


	
	def __init__(self, sub_key, img_path):
		self.sub_key = sub_key
		self.img_path = img_path
	



The DescribeImage(self) method sends a request to the Azure Vision API to perform a descriptive analysis of the image.

DescribeImage(self) : Sends an API request to the Azure Vision API and retrieves the JSON result.



	def DescribeImage(self):
		assert self.sub_key

		vision_base_url = ("https://southeastasia.api.cognitive.microsoft.com/"
							+ "vision/v2.0/")

		desc_url = vision_base_url + "describe"

		# Read the image into a byte array
		image_data = open(self.img_path, "rb").read()

		# HTTP request header
		headers = {'Ocp-Apim-Subscription-Key': self.sub_key,
					'Content-Type': 'application/octet-stream'}
		params = {'maxCandidates': '1', 'language': 'en'}
		# receiving result from API
		response = requests.post(
			desc_url, headers=headers, params=params, data=image_data)
		response.raise_for_status()

		self.result = response.json()





The GetDescription(self) function parses the JSON result from the API request made in DescribeImage(self) and returns the text result.

GetDescription(self) : Parses the JSON result and returns the scene description text.



	def GetDescription(self):
		res = self.result

		try:
			return res['description']['captions'][0]['text']
		except IndexError:
			return 'There is no description for the scene'
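
A usage sketch for the Describe class (the key below is a placeholder):

	# minimal sketch: get a one-sentence caption of the scene
	from util import Describe

	describer = Describe('YOUR_VISION_KEY', 'capture/image.jpg')
	describer.DescribeImage()
	print(describer.GetDescription())   # e.g. 'a person sitting at a table'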




Finally, the last class in util.py, Analyze.

The Analyze class provides the image analysis function of UNCOVer.

Like the other classes, we'll start with the class's constructor/init function.

__init__(self, sub_key, img_path) : Initializes Analyze and stores sub_key and img_path as the object's attributes.



	def __init__(self, sub_key, img_path):
		self.sub_key = sub_key
		self.img_path = img_path




AnalyzeImage(self) sends the request to analyze the image at the given image path.

AnalyzeImage(self) : Sends a request to the Azure Vision API to analyze the image and gets the response in JSON format.



	def AnalyzeImage(self):
		assert self.sub_key

		vision_base_url = ("https://southeastasia.api.cognitive.microsoft.com/"
							+ "vision/v2.0/")

		analyze_url = vision_base_url + "analyze"
		visualFeatures = (
			'Brands,Color,Description,Faces'
		)

		# Read the image into a byte array
		image_data = open(self.img_path, "rb").read()

		# HTTP request header
		headers = {'Ocp-Apim-Subscription-Key': self.sub_key,
					'Content-Type': 'application/octet-stream'}
		params = {'visualFeatures': visualFeatures,
					'details': 'Celebrities,Landmarks'}
		# receiving result from API
		response = requests.post(
			analyze_url, headers=headers, params=params, data=image_data)
		response.raise_for_status()

		self.result = response.json()




The GetResult(self) function returns the dominant colors in the image, the brand of any product in the image (if any), and the age and gender of people in the image.

This function extracts these results from the JSON response produced by AnalyzeImage(self).

GetResult(self) : Parses the JSON response from AnalyzeImage(self) and returns a tuple consisting of dominant colors, detected brands, a scene description, and people information (age and gender).



	def GetResult(self):
		res = self.result

		colors = []
		# get only the dominant colors that detected
		for color in res['color']['dominantColors']:
			colors.append(color)

		brands = []
		# get the detected brands
		for brand in res['brands']:
			brands.append(brand['name'])

		# get description of the scene
		desc = res['description']['captions'][0]['text']

		faces = []
		# get age and gender of detected people
		for face in res['faces']:
			faces.append((face['age'], face['gender']))

		return ((colors, brands, desc, faces))
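
A usage sketch for the Analyze class (the key below is a placeholder); GetResult() returns the tuple (colors, brands, desc, faces):

	# minimal sketch: analyze the scene and unpack the result tuple
	from util import Analyze

	analyzer = Analyze('YOUR_VISION_KEY', 'capture/image.jpg')
	analyzer.AnalyzeImage()
	colors, brands, desc, faces = analyzer.GetResult()
	print(colors)    # e.g. ['Black', 'White']
	print(desc)      # one-sentence scene caption
	print(faces)     # e.g. [(25, 'Male')]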




vorec.py

vorec.py is basically an audio configuration. Only change this if necessary.

We're using a Raspberry Pi, so this configuration is for Raspbian OS.

vorec.py : audio config.



	import os
	from util import playSound

	# For setting audio purposes
	# Putenv is used to modify the environment 
	# Using alsa advanced linux sound architecture
	os.putenv('SDL_AUDIODRIVER', 'alsa')
	os.putenv('SDL_AUDIODEV', '/dev/audio')
	playSound('speech.mp3')