Azure Cognitive Services: Implementing Computer Vision in Azure using Python

Microsoft Azure Cognitive Services is a collection of cloud-based services that enables developers to add various AI and machine learning capabilities to their applications. These capabilities include image and facial recognition, speech and language understanding, and knowledge mining, among others. With Azure Cognitive Services, developers can build applications that can see, hear, speak, understand, and even reason, allowing them to create more intelligent and interactive experiences for their users.

One of the key benefits of using Azure Cognitive Services is that it lets developers add AI capabilities to their applications without needing expertise in machine learning or data science. This means that even those with little to no background in these fields can integrate AI into their projects. Additionally, Azure Cognitive Services is scalable, so developers can adjust the computing power and data storage they use as their applications grow and evolve.

One example of a service offered by Azure Cognitive Services is the Computer Vision API, which enables developers to add image recognition capabilities to their applications. It allows applications to automatically analyze and understand the content of images, including identifying objects, faces, and scenes. This capability can be used in a wide range of applications, such as automatically tagging photos on social media, creating smart photo albums, or even building security systems that can recognize faces and objects. In this blog post, I am going to walk you through the process of analyzing an image using the Computer Vision API.

For a more detailed explanation, feel free to refer to this 12-minute video.

Let us begin by creating a Computer Vision resource:

Create a Computer Vision resource:

Navigate to Home -> Create a resource.
Enter ‘Computer Vision’ in the search box.
Select the ‘Computer Vision’ option.
Continue to select ‘Computer Vision’ in the ‘Plan’ drop-down as well and click ‘Create’.
Select an existing resource group and name your Computer Vision resource.
Select the first option, ‘Free tier’, under the ‘Pricing tier’ drop-down.
Acknowledge the terms and conditions, leave everything else at its default, and select ‘Review + create’.
Wait for a few seconds for your resource to be deployed.
If you have done everything correctly so far, your result will look similar to the image below.

The next step is to set up a Python environment to implement our code.

Set up a Python environment:

The steps ahead assume you have Python and VS Code installed on your device. If you do not have them, you can download them from the links below.

https://www.python.org/downloads/

https://code.visualstudio.com/download

Once you have Python and VS Code installed on your device, follow the steps below:

Create a new folder and navigate to it.
Open VS Code and create a Python file in that folder.
Go to the Extensions section on the left and make sure the Python extension is installed.
Now, open the terminal in VS Code and type the following command in it. (This is the Windows form of the command; on macOS or Linux, the equivalent is ‘which python’ or ‘which python3’.)

where python

Copy the path from the output to the clipboard. If several paths are listed, copy the one for the installation you want to use.
Press Ctrl+Shift+P (Command+Shift+P on macOS) to open the Command Palette, run ‘Python: Select Interpreter’, choose ‘Enter interpreter path’, and paste the copied path. (If you still see the input box after pasting it, paste it again.)
Now that VS Code knows where to find the Python interpreter, you can enter the following commands in the terminal to install the required packages.
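To double-check which interpreter VS Code actually picked up, a short script like the one below can be run from the integrated terminal. This is only a sanity check and not part of the original walkthrough; the helper name describe_interpreter is my own.

```python
import platform
import sys


def describe_interpreter():
    """Return the path and version of the interpreter running this script."""
    return {
        "executable": sys.executable,
        "version": platform.python_version(),
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print("Interpreter path:", info["executable"])
    print("Python version:", info["version"])
```

If the printed path differs from the one you pasted into VS Code, the terminal is probably using a different Python installation than the editor.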

pip install --upgrade azure-cognitiveservices-vision-computervision

pip install pillow
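Before moving on, it can help to confirm that the packages are importable from the interpreter you selected. The sketch below assumes nothing beyond the standard library; can_import is a hypothetical helper name, and the module names listed are the ones imported by the program in this post (msrest is normally pulled in as a dependency of the Computer Vision package).

```python
import importlib


def can_import(module_name):
    """Return True if the module imports cleanly, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Module names used by the program in this post.
    modules = (
        "azure.cognitiveservices.vision.computervision",
        "msrest",
        "PIL",
    )
    for name in modules:
        status = "OK" if can_import(name) else "MISSING"
        print(f"{name}: {status}")
```

Any module reported as MISSING usually means pip installed the package into a different Python installation than the one VS Code is using.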

Once you have installed the above packages successfully, open your Python file and enter the following code in it:

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

from array import array
import os
from PIL import Image
import sys
import time

'''
Authenticate
Authenticates your credentials and creates a client.
'''
subscription_key = "PASTE_YOUR_COMPUTER_VISION_SUBSCRIPTION_KEY_HERE"
endpoint = "PASTE_YOUR_COMPUTER_VISION_ENDPOINT_HERE"

computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(subscription_key))
'''
END - Authenticate
'''

'''
Quickstart variables
These variables are shared by several examples
'''
# Images used for the examples: Describe an image, Categorize an image, Tag an image, 
# Detect faces, Detect adult or racy content, Detect the color scheme, 
# Detect domain-specific content, Detect image types, Detect objects
images_folder = os.path.join(os.path.dirname(os.path.abspath(__file__)), "images")
remote_image_url = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/landmark.jpg"
'''
END - Quickstart variables
'''


'''
Tag an Image - remote
This example returns a tag (key word) for each thing in the image.
'''
print("===== Tag an image - remote =====")
# Call API with remote image
tags_result_remote = computervision_client.tag_image(remote_image_url)

# Print results with confidence score
print("Tags in the remote image: ")
if len(tags_result_remote.tags) == 0:
    print("No tags detected.")
else:
    for tag in tags_result_remote.tags:
        print("'{}' with confidence {:.2f}%".format(tag.name, tag.confidence * 100))
print()
'''
END - Tag an Image - remote
'''
print("End of Computer Vision quickstart.")
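The program above keeps the key and endpoint as placeholder strings in the source file. If you would rather not store the key in the file, one possible alternative is to read both values from environment variables. The sketch below is an assumption on my part rather than part of the original quickstart: the variable names VISION_KEY and VISION_ENDPOINT and the helper load_credentials are hypothetical.

```python
import os


def load_credentials(env=os.environ):
    """Read the key and endpoint from environment variables.

    VISION_KEY and VISION_ENDPOINT are hypothetical names chosen for this sketch.
    """
    key = env.get("VISION_KEY")
    endpoint = env.get("VISION_ENDPOINT")
    if not key or not endpoint:
        raise RuntimeError("Set VISION_KEY and VISION_ENDPOINT before running.")
    return key, endpoint


if __name__ == "__main__":
    try:
        subscription_key, endpoint = load_credentials()
    except RuntimeError as error:
        print(error)
    else:
        print("Credentials loaded for endpoint:", endpoint)
```

The two returned values could then be passed to ComputerVisionClient in place of the hard-coded strings.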

Configure the Python program with your credentials:

Navigate to your Computer Vision resource and click on the ‘Overview’ tab.
Click on the ‘Manage keys’ option and copy the primary key.
Replace the value of the subscription_key variable in the above Python program with the key you copied.
Go back to the ‘Manage keys’ section and copy your endpoint.
Replace the value of the endpoint variable with the endpoint you just copied.
Lastly, copy the URL of any publicly accessible image on the internet and replace the value of remote_image_url with it.
These steps give your Python program access to the Computer Vision resource in your Azure account, so that it can analyze a remote image on the internet.
Now run the Python code.
If you have followed all the steps in the right sequence so far, your output in the VS Code terminal will be similar to this.

Here’s the image source I used for my Python code:

https://www.007.com/wp-content/uploads/2022/12/Castore-x-007-LS.jpg

And this is what the image looks like:

Now, if you compare the tags returned by the Computer Vision model with the above image, you can see that they describe it reasonably well. However, tagging is only one small task among the many that Azure Cognitive Services supports.
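One way to make that comparison easier is to keep only the tags the service is most confident about. The sketch below is my own addition: filter_tags and the 0.8 threshold are hypothetical choices, and the sample tags are placeholders standing in for tags_result_remote.tags, not real API output. It relies only on each tag having the name and confidence attributes used in the program above.

```python
from types import SimpleNamespace


def filter_tags(tags, min_confidence=0.8):
    """Keep tags whose confidence is at least min_confidence, highest first."""
    kept = [tag for tag in tags if tag.confidence >= min_confidence]
    return sorted(kept, key=lambda tag: tag.confidence, reverse=True)


if __name__ == "__main__":
    # Placeholder tags standing in for tags_result_remote.tags; not real API output.
    sample_tags = [
        SimpleNamespace(name="example-a", confidence=0.95),
        SimpleNamespace(name="example-b", confidence=0.40),
    ]
    for tag in filter_tags(sample_tags):
        print("'{}' with confidence {:.2f}%".format(tag.name, tag.confidence * 100))
```

In the actual program, you would call filter_tags(tags_result_remote.tags) and loop over the result instead of over the full tag list.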

Overall, Microsoft Azure Cognitive Services is a powerful tool for developers who want to add AI capabilities to their applications. With its wide range of services and easy-to-use APIs, it enables developers to quickly and easily integrate AI into their projects, allowing them to create more intelligent and interactive experiences for their users. If you want to explore further, you can also refer to Microsoft Azure’s official website for Cognitive Services below:

https://azure.microsoft.com/en-us/products/cognitive-services/#overview