In this blog, we will work with DynamoDB using Boto3, the Python SDK for AWS. We will create a table, load multiple items into it, add a single item, retrieve a particular item, and finally delete the table.
Requirements:
1. Python3 –> You should already have Python installed in VS Code on your virtual machine. If not, you can follow this link: Install Python package Manager and Python Extension. After that, run this command in VS Code:
pipenv install numpy
2. Install AWS CLI –> You should already have the AWS CLI installed. If not, run these commands from the VS Code terminal or any other terminal to install it:
sudo apt update
sudo apt install awscli
We can verify the installation by checking the version:
aws --version
3. Boto3 –> If it is not already installed, use this link to install the Python SDK package Boto3 in your new VS Code project: Create a new project and Install Python SDK.
4. Configure AWS CLI –> Now we will configure the AWS CLI, if that has not been done already. Before executing the next command, keep your credentials file (CSV) at hand; it contains the Access Key ID and Secret Access Key that our python program will use to authenticate. From inside or outside VS Code, execute this command:
aws configure
It will ask for 4 things:
- Access Key Id -> Enter your access key ID.
- Secret Access Key -> Enter your secret access key.
- A Default Region -> Enter ‘us-east-2’, since we have been using this region from the beginning. If you are using some other region, enter that instead.
- A Default output format -> Enter ‘json’ for this.
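The aws configure command stores these four answers in two plain-text INI files under ~/.aws: the keys go into the credentials file, and the region and output format go into the config file, both under a [default] profile. The sketch below is a minimal, hypothetical helper (read_default_profile is not part of the AWS CLI or Boto3) that parses the text of those two files with Python's configparser to confirm that all four values were saved.

```python
import configparser


def read_default_profile(credentials_text, config_text):
    """Summarize the [default] profile from the text of the two AWS CLI files.

    Hypothetical helper for illustration. It reports only whether the keys
    are present, so the secret values are never printed.
    """
    creds = configparser.ConfigParser()
    creds.read_string(credentials_text)
    conf = configparser.ConfigParser()
    conf.read_string(config_text)
    return {
        "has_access_key": creds.has_option("default", "aws_access_key_id"),
        "has_secret_key": creds.has_option("default", "aws_secret_access_key"),
        "region": conf.get("default", "region", fallback=None),
        "output": conf.get("default", "output", fallback=None),
    }


# Possible usage on your own machine (reads the real files):
#   from pathlib import Path
#   aws_dir = Path.home() / ".aws"
#   print(read_default_profile((aws_dir / "credentials").read_text(),
#                              (aws_dir / "config").read_text()))
```

If region comes back as None, Boto3 will not know which region to connect to, so it is worth running aws configure again.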
Now, we are done with our requirements and our environment is set. We can proceed with our steps.
1. Creating a table:
We will begin by creating our database table, named “Student”, in which ‘student_id’ will be the partition key and ‘student_name’ will be the sort key; together they form the table's primary key. Create a new python file and use this code to create the table.
import boto3


def create_students_table():
    dynamodb = boto3.resource('dynamodb')
    # Table definition
    table = dynamodb.create_table(
        TableName='Student',
        KeySchema=[
            {
                'AttributeName': 'student_id',
                'KeyType': 'HASH'  # Partition key
            },
            {
                'AttributeName': 'student_name',
                'KeyType': 'RANGE'  # Sort key
            }
        ],
        AttributeDefinitions=[
            # AttributeType defines the data type. 'S' is string type.
            {
                'AttributeName': 'student_id',
                'AttributeType': 'S'
            },
            {
                'AttributeName': 'student_name',
                'AttributeType': 'S'
            },
        ],
        ProvisionedThroughput={
            # ReadCapacityUnits set to 10 strongly consistent reads per second
            'ReadCapacityUnits': 10,
            # WriteCapacityUnits set to 10 writes per second
            'WriteCapacityUnits': 10
        }
    )
    return table


if __name__ == '__main__':
    student_table = create_students_table()
    # Print table status
    print("Status:", student_table.table_status)
In this code, we first connect to DynamoDB through the Boto3 resource interface. Then we create the table “Student” with “student_id” and “student_name”, both of attribute type ‘S’, i.e. String. Provisioned Throughput sets the read and write capacity reserved for the table, here 10 read capacity units and 10 write capacity units per second.
Press F5 to execute the program. Once executed, it prints the status of the new table.
Now if you go to the DynamoDB service in your AWS console, you will see the “Student” table listed there.
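Two details of the table definition are easy to miss: every attribute named in KeySchema must also appear in AttributeDefinitions, and create_table returns before the table is ready to use. The sketch below checks the first point locally and then waits for the table with wait_until_exists(), a waiter on the Boto3 Table resource. The helper names (student_table_definition, undefined_key_attributes, create_and_wait) are hypothetical, and the dynamodb argument is assumed to be the object returned by boto3.resource('dynamodb').

```python
def student_table_definition():
    """Return the create_table arguments used for the Student table."""
    return {
        "TableName": "Student",
        "KeySchema": [
            {"AttributeName": "student_id", "KeyType": "HASH"},     # partition key
            {"AttributeName": "student_name", "KeyType": "RANGE"},  # sort key
        ],
        "AttributeDefinitions": [
            {"AttributeName": "student_id", "AttributeType": "S"},
            {"AttributeName": "student_name", "AttributeType": "S"},
        ],
        "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
    }


def undefined_key_attributes(definition):
    """List key attributes that have no entry in AttributeDefinitions."""
    defined = {a["AttributeName"] for a in definition["AttributeDefinitions"]}
    return [k["AttributeName"] for k in definition["KeySchema"]
            if k["AttributeName"] not in defined]


def create_and_wait(dynamodb, definition):
    """Create the table, then block until DynamoDB reports that it exists."""
    missing = undefined_key_attributes(definition)
    if missing:
        raise ValueError(f"Key attributes without a definition: {missing}")
    table = dynamodb.create_table(**definition)
    table.wait_until_exists()  # polls until the table is ready
    return table
```

With real credentials, this could be called as create_and_wait(boto3.resource('dynamodb'), student_table_definition()), after which the table should be ready for the loading step.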
2. Loading the data:
Now, we will add data to our table. We will make a json file, ‘data.json’, to store our student data. Our python file will then read that data and load it into the ‘Student’ table. The data in the json file is an array of student objects that will be inserted one by one into the table. The json data that we will use has 5 student objects and looks like this:
[
    {
        "student_id": "987342",
        "student_name": "Cindy",
        "email": "cindy@gmail.com",
        "department": "ITM"
    },
    {
        "student_id": "923342",
        "student_name": "Dev",
        "email": "dev.rockstar@gmail.com",
        "department": "CS"
    },
    {
        "student_id": "984356",
        "student_name": "Beth",
        "email": "beth12@gmail.com",
        "department": "ITM"
    },
    {
        "student_id": "923432",
        "student_name": "John",
        "email": "john78@gmail.com",
        "department": "CS"
    },
    {
        "student_id": "909876",
        "student_name": "Seth",
        "email": "seth_hemsworth@gmail.com",
        "department": "HCI"
    }
]
Make a new python file for this process and write the following code:
import json
from decimal import Decimal

import boto3


def load_data(student_list, dynamodb=None):
    # Use the resource passed in by the caller, or create a default one
    if dynamodb is None:
        dynamodb = boto3.resource('dynamodb')
    students_table = dynamodb.Table('Student')
    # Loop through all the items and load each
    for student in student_list:
        student_id = student['student_id']
        student_name = student['student_name']
        # Print student info
        print("Loading Students Data:", student_id, student_name)
        students_table.put_item(Item=student)


if __name__ == '__main__':
    # Open the file and read all the data in it.
    # parse_float=Decimal because Boto3 does not accept Python floats.
    with open("data.json") as json_file:
        student_list = json.load(json_file, parse_float=Decimal)
    load_data(student_list)
After pressing F5 and executing this program, you will see one “Loading Students Data:” line printed for each student.
And when you now go to your ‘Student’ table in DynamoDB in the AWS console, you can see the student data added there.
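Calling put_item once per student sends one request per item, which is fine for five students. For larger files, the Boto3 Table resource also offers batch_writer(), a context manager that buffers items and sends them in batches. The sketch below shows that variant; the function name load_students_in_batches is hypothetical, and the table argument is assumed to be dynamodb.Table('Student').

```python
def load_students_in_batches(table, student_list):
    """Write all students through the table's batch writer.

    Returns the number of items handed to the writer.
    """
    count = 0
    # The writer sends any remaining buffered items when the block exits
    with table.batch_writer() as batch:
        for student in student_list:
            batch.put_item(Item=student)
            count += 1
    return count
```

The same student_list read from ‘data.json’ can be passed in unchanged, since each item still carries its ‘student_id’ and ‘student_name’ keys.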
A key difference between DynamoDB and a relational database is that items in the same DynamoDB table can have different attributes (columns). Only the partition key and sort key must be present in every item; otherwise the program will throw an error. To show that each row can have a different set of columns, change the ‘data.json’ file to this:
[
    {
        "student_id": "987342",
        "student_name": "Cindy",
        "student_lastname": "Well",
        "department": "ITM"
    },
    {
        "student_id": "923342",
        "student_name": "Dev",
        "email": "dev.rockstar@gmail.com",
        "course": "CS",
        "GPA": "3.76"
    },
    {
        "student_id": "984356",
        "student_name": "Beth",
        "email": "beth12@gmail.com",
        "department": "ITM"
    },
    {
        "student_id": "923432",
        "student_name": "John",
        "email": "john78@gmail.com",
        "department": "CS"
    },
    {
        "student_id": "909876",
        "student_name": "Seth",
        "email": "seth_hemsworth@gmail.com"
    }
]
Execute the program again and you will see the table populated with items that have different sets of attributes.
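Since only the two key attributes are mandatory, it can help to check for them before calling put_item, so that an incomplete record is reported locally instead of failing inside the request. The following is a small sketch of such a check; missing_key_attributes and split_valid_items are hypothetical helper names, not part of Boto3.

```python
KEY_ATTRIBUTES = ("student_id", "student_name")


def missing_key_attributes(item, key_attributes=KEY_ATTRIBUTES):
    """Return the key attributes that are absent from the item."""
    return [name for name in key_attributes if name not in item]


def split_valid_items(items, key_attributes=KEY_ATTRIBUTES):
    """Separate items that carry both keys from those that do not."""
    valid, rejected = [], []
    for item in items:
        if missing_key_attributes(item, key_attributes):
            rejected.append(item)
        else:
            valid.append(item)
    return valid, rejected
```

Every object in the modified ‘data.json’ passes this check, because each one still has ‘student_id’ and ‘student_name’ even though the other fields differ.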
3. Creating and loading a single item:
In this step, we will create a single student record and add it directly to our database table. Make a new python file and write the following code:
from pprint import pprint

import boto3


def put_student(student_id, student_name, email, department, dynamodb=None):
    # Use the resource passed in by the caller, or create a default one
    if dynamodb is None:
        dynamodb = boto3.resource('dynamodb')
    # Specify the table
    students_table = dynamodb.Table('Student')
    response = students_table.put_item(
        # Data to be inserted
        Item={
            'student_id': student_id,
            'student_name': student_name,
            'email': email,
            'department': department
        }
    )
    return response


if __name__ == '__main__':
    student_resp = put_student("921189", "Abhijeet", "ahluwal5@uwm.edu", "CS")
    print("Create item successful.")
    pprint(student_resp)
Here, we are adding a new student with student_id = 921189, student_name = “Abhijeet”, email = “ahluwal5@uwm.edu” and department = “CS”. Execute the program by pressing F5, and it will print “Create item successful.” followed by the response from DynamoDB.
Go to your table in the AWS console and you will see the new item.
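Note that put_item replaces any existing item that has the same partition key and sort key. If an overwrite is not wanted, a condition expression can make the write succeed only when no such item exists. The sketch below is one way to do that; put_student_if_absent is a hypothetical name, and the table argument is assumed to be dynamodb.Table('Student').

```python
def put_student_if_absent(table, item):
    """Insert the item only if no item with the same primary key exists.

    If the item already exists, DynamoDB rejects the write and Boto3 raises
    a ClientError whose error code is 'ConditionalCheckFailedException'.
    """
    return table.put_item(
        Item=item,
        ConditionExpression="attribute_not_exists(student_id)",
    )
```

A caller can catch botocore.exceptions.ClientError around this call and inspect e.response['Error']['Code'] to tell a duplicate apart from other failures.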
4. Getting information about an item:
In this step, we will get the complete information about a particular student whose id and name we provide. Again, create a new python file for this step and write the following code:
import boto3
from botocore.exceptions import ClientError


def get_student(student_id, student_name):
    dynamodb = boto3.resource('dynamodb')
    # Specify the table to read from
    students_table = dynamodb.Table('Student')
    try:
        response = students_table.get_item(
            Key={'student_id': student_id, 'student_name': student_name})
    except ClientError as e:
        print(e.response['Error']['Message'])
        return None
    # 'Item' is absent from the response when no matching student exists
    return response.get('Item')


if __name__ == '__main__':
    student = get_student("921189", "Abhijeet")
    if student:
        print("Get Student Data Done:")
        # Print the student data
        print(student)
Execute it and the program prints “Get Student Data Done:” followed by the student's data.
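By default get_item returns every attribute of the item and uses an eventually consistent read. When only a few fields are needed, or the read must reflect a write that has just completed, get_item also accepts ProjectionExpression and ConsistentRead. A sketch under those assumptions follows; get_student_contact is a hypothetical name, and the placeholders #e and #d are used so the attribute names cannot clash with DynamoDB reserved words.

```python
def get_student_contact(table, student_id, student_name):
    """Fetch only the email and department of one student.

    Returns None when no item matches the given keys.
    """
    response = table.get_item(
        Key={"student_id": student_id, "student_name": student_name},
        ProjectionExpression="#e, #d",
        ExpressionAttributeNames={"#e": "email", "#d": "department"},
        ConsistentRead=True,  # strongly consistent read
    )
    return response.get("Item")
```

Both key attributes are still required in Key, because the table uses a partition key together with a sort key.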
5. Deleting the table:
Lastly, we will delete the table that we created in this process. Create a new python file and write the code below, which specifies the table to be deleted:
import boto3


def delete_students_table():
    dynamodb = boto3.resource('dynamodb')
    # Specify the table to be deleted
    students_table = dynamodb.Table('Student')
    students_table.delete()


if __name__ == '__main__':
    delete_students_table()
    print("Table deleted.")
After execution, the program prints “Table deleted.”
You can go to the DynamoDB tables list in the AWS Console to confirm that the table is gone.
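Deleting a table is not instantaneous: delete() returns once DynamoDB has accepted the request, and the table may still show up briefly while it is being removed. The sketch below adds wait_until_not_exists(), a waiter on the Boto3 Table resource, so that the function returns only after the table is gone. The name delete_and_wait is hypothetical, and the table argument is assumed to be dynamodb.Table('Student').

```python
def delete_and_wait(table):
    """Request deletion of the table and block until it no longer exists."""
    table.delete()
    table.wait_until_not_exists()  # polls until the table is gone
```

This is mainly useful in scripts that recreate the ‘Student’ table right afterwards, since a table with the same name cannot be created while the old one is still being deleted.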
Thank you for reading the blog. Here I am providing a zip folder that contains all the python files and data files that we used.