In this part of the project, you will implement an Azure Function according to the following specification:
Specification
- The function will be triggered by an HTTP request.
- The HTTP request will contain the name of an uploaded file that exists in Azure Blob Storage.
- The function must support using either GET or POST to invoke it.
- The function will write information about the uploaded blob to a table in Azure Table storage.
- The function will capture the following fields for each uploaded file: a) as the partition key, the fully qualified path name of the file (that is, the name including any containing folder names); b) as the row key, the current date/time at which the Azure Function ran; c) the content type of the uploaded object; d) the size of the uploaded object in bytes. There is no need to capture the source IP address.
- The Azure table must accept multiple HTTP requests for the same file name, whether or not the uploaded file has been updated, and must keep a record of every HTTP request made for the blob.
- The function will write a log entry for each HTTP request, stating whether the write to the Azure table succeeded or failed.
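The per-request logic in this specification can be sketched in plain Python. This is a minimal sketch under stated assumptions, not a complete function: it assumes the file name arrives in a hypothetical "name" field, either as a query parameter (GET) or in a JSON body (POST), similar to the default HTTP trigger template; it assumes req behaves like azure.functions.HttpRequest (a params mapping and a get_json() that raises ValueError on a missing or invalid body); write_entity is a hypothetical stand-in for whatever actually writes to the table; and the property names ContentType and Size are illustrative choices. Using a UTC timestamp as the row key gives each request its own row, which is what lets the table keep a record of every request for the same file.

```python
import logging
from datetime import datetime, timezone
from typing import Any, Callable, Optional


def get_blob_name(req: Any) -> Optional[str]:
    """Return the blob name from the query string (GET) or JSON body (POST).

    Assumes req behaves like azure.functions.HttpRequest; the field name
    "name" is a hypothetical choice.
    """
    name = req.params.get("name")
    if name:
        return name
    try:
        body = req.get_json()
    except ValueError:  # no body, or a body that is not valid JSON
        return None
    return body.get("name") if isinstance(body, dict) else None


def build_entity(blob_name: str, content_type: str, size: int,
                 now: Optional[datetime] = None) -> dict:
    """Build the table row recorded for one HTTP request."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        # Partition keys may not contain '/', so folder separators become '+'.
        "PartitionKey": blob_name.replace("/", "+"),
        # A timestamp row key gives each request for the same file its own row.
        "RowKey": now.isoformat(),
        "ContentType": content_type,
        "Size": size,
    }


def record_request(entity: dict, write_entity: Callable[[dict], None]) -> bool:
    """Write the entity and log the outcome whether or not the write succeeds."""
    try:
        write_entity(entity)
    except Exception:
        logging.exception("Table write failed for %s", entity["PartitionKey"])
        return False
    logging.info("Table write succeeded for %s (RowKey %s)",
                 entity["PartitionKey"], entity["RowKey"])
    return True
```

Two requests landing in the same microsecond would still collide on the row key; that is unlikely during manual testing, but appending a short random suffix is one option if it matters.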
The naming of resources is up to you.
You must create and test your function using Visual Studio Code, then deploy it to a Function App on Azure for the final test runs.
What You Will Need To Do
Among the things you will need to do are the following:
- You will need to create resources necessary to run your Azure function.
- You will need to install the tools to allow you to develop functions in VS Code.
- You will need to locate and use API documentation to choose appropriate Python APIs, though some hints are provided below.
- You will need to upload the test data to a blob container, then make multiple HTTP requests to capture information about each uploaded file.
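Once the test files are uploaded, the GET and POST requests can be built with the Python standard library. In this sketch, FUNCTION_URL is a hypothetical local endpoint (Core Tools serves HTTP functions on port 7071 by default, and "BlobInfo" is a placeholder function name), and "name" is a hypothetical parameter name; substitute whatever your function actually expects.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical local endpoint; "BlobInfo" is a placeholder function name.
FUNCTION_URL = "http://localhost:7071/api/BlobInfo"


def get_request(blob_name: str, url: str = FUNCTION_URL) -> Request:
    """GET with the blob name in the query string ('name' is hypothetical)."""
    # urlencode percent-encodes '/' so folder paths survive the query string.
    return Request(f"{url}?{urlencode({'name': blob_name})}", method="GET")


def post_request(blob_name: str, url: str = FUNCTION_URL) -> Request:
    """POST with the blob name in a JSON body ('name' is hypothetical)."""
    body = json.dumps({"name": blob_name}).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")
```

Each request can then be sent with urllib.request.urlopen(req), once per uploaded file and method, so that both the GET and POST paths are exercised.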
Some Hints and Advice
Following is some additional information that should be helpful to complete the assignment. These are not absolute requirements – use them if they seem appropriate:
- You can find instructions for setting up your PC for Azure Functions development in Python here: Instructions. Installation on Linux (for example, your cloud PC) is supported. The same tooling on Windows has not been tested and will not be supported.
- Create a new resource group for this project, then create a storage account and a Function App using the Azure portal before moving to VS Code to create your function.
- Remember that storage account and Function App names must be globally unique, so choose names accordingly.
- It is much easier to test the function locally than to deploy it to Azure after every change. When debugging in VS Code, changing and saving a file triggers a hot reload of the function, which speeds up debugging.
- A Blob Storage input binding provides access to blob content as a stream. This is not what you need. You need to access metadata about the blob, not the blob content.
- Connection strings for storage accounts are accessible in Python via os.environ["<connection-name>"].
- You can generally add Python packages to your VS Code project by adding a line to requirements.txt.
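A small helper can turn a missing connection setting into a clear error instead of a bare KeyError. This sketch assumes the connection string is stored under an app setting whose name you choose, for example the AzureWebJobsStorage setting the Functions runtime already uses, or a custom name of your own. Locally, Core Tools exposes the Values section of local.settings.json as environment variables; in Azure, Function App application settings are exposed the same way.

```python
import os


def get_connection_string(setting_name: str) -> str:
    """Read a storage connection string from an app setting.

    setting_name is whatever name you gave the setting (an assumption here).
    """
    value = os.environ.get(setting_name)
    if not value:
        raise RuntimeError(
            f"App setting {setting_name!r} is missing or empty; add it to "
            "local.settings.json (local runs) or to the Function App's "
            "application settings (Azure)."
        )
    return value
```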
Hint: It can be quite challenging to figure out which Python APIs access metadata about blobs rather than the blobs themselves. The following Python statements obtain metadata about a blob. (This is not a complete solution, but it should save you time searching the online API documentation. Consider what happens if the file name provided in the HTTP request does not exist within the container.)
from azure.storage.blob import BlobServiceClient
blobsvc = BlobServiceClient.from_connection_string(connstr)  # connstr: storage account connection string
container = blobsvc.get_container_client("your-container-name")
blob = container.get_blob_client("your-blob-file-name.txt")
properties = blob.get_blob_properties()  # metadata only; the blob content is not downloaded
size = properties.size  # size in bytes
contenttype = properties.content_settings.content_type
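As the hint suggests, a missing blob needs handling: get_blob_properties() raises azure.core.exceptions.ResourceNotFoundError when the blob does not exist. This sketch wraps that call and returns None in that case. blob_client is assumed to behave like azure.storage.blob.BlobClient, and the fallback exception class exists only so the sketch runs without the Azure SDK installed.

```python
from typing import Any, Optional

try:
    # Raised by azure-storage-blob when the blob (or container) does not exist.
    from azure.core.exceptions import ResourceNotFoundError
except ImportError:  # lets this sketch run without the Azure SDK installed
    class ResourceNotFoundError(Exception):
        """Stand-in used only when azure-core is not installed."""


def describe_blob(blob_client: Any) -> Optional[dict]:
    """Return the content type and size of a blob, or None if it is missing.

    blob_client is expected to behave like azure.storage.blob.BlobClient.
    """
    try:
        properties = blob_client.get_blob_properties()
    except ResourceNotFoundError:
        return None
    return {
        "content_type": properties.content_settings.content_type,
        "size": properties.size,  # bytes
    }
```

When describe_blob returns None, the function could log the miss and return an HTTP 404 response instead of writing a row; whether failed lookups should also be recorded is a design choice the specification leaves open.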
There are also some distinct “gotchas” that you are likely to encounter while doing this project. While not an exhaustive list, here are a couple of things to work around:
Gotcha #1
Version 3.* of the Azure Functions extension bundle does not support output bindings to Azure Table storage. If you use a table output binding, you will need to modify host.json in your VS Code project to use version 2.*:
"extensionBundle": {
  "id": "Microsoft.Azure.Functions.ExtensionBundle",
  "version": "[2.*, 3.0.0)"
}
See this link for further information.
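With a table output binding, the Python function does not call the Tables API itself; it sets the binding to a JSON string describing the entity, which must include PartitionKey and RowKey. In this sketch, table_out is assumed to behave like azure.functions.Out[str] (an object with a set() method), and the binding's name in your function configuration is up to you.

```python
import json
from typing import Any


def set_table_output(table_out: Any, entity: dict) -> str:
    """Serialize entity and hand it to a Table storage output binding.

    table_out is assumed to behave like azure.functions.Out[str].
    """
    missing = [key for key in ("PartitionKey", "RowKey") if key not in entity]
    if missing:
        raise ValueError(f"entity is missing required keys: {missing}")
    payload = json.dumps(entity)
    table_out.set(payload)  # the binding writes the row after the function runs
    return payload
```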
Gotcha #2
Files can be uploaded to folders within blob storage. In that case, the name in the HTTP request will contain forward slashes. Partition keys in Azure Table storage cannot contain forward slashes, and the write will fail if the key value contains a “/”. One workaround is to replace “/” with “+” when creating partition key values. People rarely include “+” in file names, even though it is legal on Linux to do so. To reproduce the actual file path when reading the data back, replace “+” with “/”.
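The workaround can be sketched as a pair of helper functions. Besides “/”, Azure Table storage also rejects backslash, “#”, “?”, and control characters (U+0000 to U+001F and U+007F to U+009F) in key values, so this sketch checks for those too and raises ValueError, since the write would fail anyway. The function names are illustrative, and logging a warning for names that already contain “+” is one possible policy, not a requirement.

```python
import logging

# Printable characters Azure Table storage rejects in PartitionKey/RowKey values.
FORBIDDEN_KEY_CHARS = set("/\\#?")


def _is_control(ch: str) -> bool:
    """True for U+0000-U+001F and U+007F-U+009F, which keys may not contain."""
    code = ord(ch)
    return code <= 0x1F or 0x7F <= code <= 0x9F


def to_partition_key(blob_path: str) -> str:
    """Encode a blob path as a partition key by replacing '/' with '+'."""
    if "+" in blob_path:
        # Such names will not round-trip exactly through to_blob_path().
        logging.warning("Blob path %r already contains '+'", blob_path)
    key = blob_path.replace("/", "+")
    bad = sorted({ch for ch in key if ch in FORBIDDEN_KEY_CHARS or _is_control(ch)})
    if bad:
        raise ValueError(f"partition key would contain forbidden characters: {bad!r}")
    return key


def to_blob_path(partition_key: str) -> str:
    """Recover the original blob path by replacing '+' with '/'."""
    return partition_key.replace("+", "/")
```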