Copy an Object from One Bucket to Another in AWS S3

Here, we are going to see how to copy an object from one bucket to another within the same account using the AWS SDK for Python (Boto3).

Bucket policies and IAM policies

To copy an object between buckets, you must make sure that the correct permissions are configured. To copy an object between buckets in the same AWS account, you can set permissions using IAM policies. To copy an object between buckets in different accounts, you must set permissions on both the relevant IAM policies and bucket policies.

To copy objects between buckets in different accounts, follow the link below to set up the necessary policies before proceeding further.

How to create Bucket policies and IAM policies
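For the same-account case covered in this post, IAM permissions for s3:GetObject on the source bucket and s3:PutObject on the destination bucket are enough. For the cross-account case, as a rough illustration only, a bucket policy like the sketch below could be attached to the source bucket so that another account can read its objects; the account ID and bucket name here are placeholders, so adapt them to your setup.

import json
import boto3

s3_client = boto3.client('s3')

#Placeholder values: replace with your own bucket name and account ID
source_bucket = "awsbucketvn"
other_account_arn = "arn:aws:iam::111122223333:root"

#Bucket policy allowing the other account to list and read objects from the source bucket
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": other_account_arn},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            f"arn:aws:s3:::{source_bucket}",
            f"arn:aws:s3:::{source_bucket}/*"
        ]
    }]
}

s3_client.put_bucket_policy(Bucket=source_bucket, Policy=json.dumps(policy))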

Need for Boto3

Using the AWS SDK for Python (Boto3), we can create, configure, and manage AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). The SDK provides an object-oriented API as well as low-level access to AWS services, along with various functionalities to automate them.

When copying objects between buckets, the SDK lets us skip unnecessary manual work like:

  • Downloading the object from the source bucket and saving it locally.
  • Uploading it again to the destination bucket.

Moreover, it is quicker and more reliable, and we can share the object securely without copying it to an external device.
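For comparison, the manual route that the SDK's copy operation replaces would look roughly like the sketch below; the bucket and object names are placeholders, and /tmp/images.jpg is just a hypothetical temporary path.

import boto3

s3_client = boto3.client('s3')

#Placeholder names: the manual two-step alternative to a single copy call
s3_client.download_file('source-bucket', 'images.jpg', '/tmp/images.jpg')
s3_client.upload_file('/tmp/images.jpg', 'destination-bucket', 'images.jpg')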

Prerequisite: Steps to be followed

Initially, you are required to have an AWS account and some personal storage. Here we are implementing the program on a fully functional Linux development machine in the cloud, so we need the following setup for the program.

Once all the steps below are executed, the setup is ready. (Note: you can skip them if you already have the setup.)

1. Once you connect to your remote desktop, you can create a new project by entering the commands below in the terminal.

mkdir sample
cd sample
code .

2. Using pipenv, you create a virtual environment simply by installing the first Python package you need. We will install a very popular numerical package for Python called numpy. We won't really be using numpy in this example; it's just a package we can install to create the environment. Usually, when you create a new virtual environment and project, you already know which package you need to install first.

Within VS Code, open a New Terminal window, then enter the following command:

pipenv install numpy

3. Within the folder, create a new Python file. You can choose your own name; I have created sample.py. Open the terminal from VS Code and start installing the packages below.

To access the AWS services, we need the following:

  • Boto3
  • awscli
  • aws configure

4. Boto3 is the latest version of the SDK and includes several service-specific features to ease development; recent releases require Python 3. Boto3 supports all current AWS services, including Elastic Compute Cloud, DynamoDB, AWS Config, CloudWatch, and Simple Storage Service.

pipenv install boto3

5. The AWS Command Line Interface (CLI) is a set of open-source tools that lets you manage and automate AWS services using commands. It supports multiple platforms, including Windows, Linux, and macOS, and provides direct access to the public APIs of AWS services. It also makes it easy to automate tedious infrastructure tasks.

sudo apt update 
sudo apt install awscli

6.  The aws configure command is the fastest way to set up your AWS CLI installation. When you enter this command, the AWS CLI prompts you for four pieces of information:

  • Access key ID
  • Secret access key
  • AWS Region
  • Output format

Access keys consist of an access key ID and a secret access key, which are used to sign programmatic requests that you make to AWS. If you don’t have access keys, you can create them from the AWS Management Console. The only time that you can view or download the secret access key is when you create the keys. You cannot recover them later. However, you can create new access keys at any time.

aws configure

aws configure helps authenticate the user account using your security credentials. The access key and secret access key are in the .csv file downloaded from the security credentials page when creating the user in the IAM console. It looks like below:

The default region can be seen near the account name. For me it is “us-east-2”; check the region in your own account. The output format will be “json”.

You can provide all the values in the terminal.
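A typical aws configure session looks roughly like this (the access key values below are the placeholder examples from the AWS documentation, not real credentials):

aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-2
Default output format [None]: json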

7. First, we need to know which buckets we are going to use. We can get the list of buckets present in the account. Since we are using buckets from the same account, run the program below to list the buckets within the account.

import boto3

def list_bucket():
    #Create an S3 resource using the credentials from "aws configure"
    s3_res = boto3.resource('s3')
    #Iterate over all buckets in the account and print their names
    buckets = s3_res.buckets.all()
    for bucket in buckets:
        print(bucket.name)

list_bucket()

You will be able to see the list of buckets within the account.
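With the two buckets used later in this post, for example, the output would look something like this (your bucket names will differ):

awsbucketvn
destbuckvn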

8. From the above list of buckets, you can choose your source and destination bucket. I have chosen ‘awsbucketvn’ as the source bucket. To select which object to copy, you can list the objects within the source bucket.

import argparse
import boto3

def object_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("bucket", help = "Bucket Name")
    args = parser.parse_args()
    return args

def list_object(bname):
    s3_res = boto3.resource('s3')
    bucket = s3_res.Bucket(bname)
    #For loop will iterate all the objects within the bucket
    print("List of objects with the bucket")
    for obj in bucket.objects.all():
        #obj.key provide only object name
        print(obj.key)


def main():
    bucket_name = object_parser()
    list_object(bucket_name.bucket)
    
if __name__ == "__main__":
    main() 

Here we are passing the bucket name as an argument. The argparse module makes it easy to write user-friendly command-line interfaces: the program defines what arguments it requires, and argparse figures out how to parse them out of sys.argv.

In the above code, we are passing only one parameter, i.e., the source bucket. To pass the argument, run the code as:

python <filename.py> <bucketname>

It looks like the screenshot below.
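In text form, and assuming the listing script above is saved as sample.py, the run looks roughly like this (the objects in your bucket will differ):

python sample.py awsbucketvn
List of objects within the bucket
images.jpg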

9. From the above list we can choose any object to be copied to the destination bucket.

It’s a simple program: we are copying the object from the source to the destination bucket within the same account, so we don’t have to think about access policies and security, which are taken care of by “aws configure”.

import argparse
import boto3
import logging
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')

def copy_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("bucket1", help = "Bucket Name1")
    parser.add_argument("bucket2", help = "Bucket Name2")
    parser.add_argument("filename", help = "filename")
    parser.add_argument("Outputfile", help = "Ofilename")
    args = parser.parse_args()
    return args

def copy_file(source,dest,fname,outputfn):

    copy_source = {
         #Source bucket name
        'Bucket': source,
        #Source file to be copied to destination
        'Key': fname
    }
    try:
        #s3_client.copy(copy_source, destination_bucket, object_name)
        s3_client.copy(copy_source, dest, outputfn)
        print("File Copied Successfully")
    except ClientError as e:
        #Provides log message 
        logging.error(e)
        return False
    return True

def main():
    arg_val = copy_parser()
    copy_file(arg_val.bucket1, arg_val.bucket2, arg_val.filename, arg_val.Outputfile)


if __name__ == "__main__":
    main()

In copy_source, 'Bucket' is the name of the source bucket and 'Key' is the name of the object to be copied.

s3_client.copy(copy_source, destination_bucket, object_name): I have provided my destination_bucket, which should be changed to yours, and object_name can be any name (with extension) under which you want the file stored in the destination bucket.

The try/except block helps identify the type of error, for example a missing file or a credentials error.

Config (boto3.s3.transfer.TransferConfig) is an optional transfer configuration that is used when performing the copy.
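As a rough sketch, assuming the copy_file function above, the optional Config parameter could be passed like this (the threshold and concurrency values are arbitrary examples, not recommendations):

from boto3.s3.transfer import TransferConfig

#Arbitrary example values: multipart transfers above 8 MB, up to 4 threads
config = TransferConfig(multipart_threshold=8 * 1024 * 1024, max_concurrency=4)
s3_client.copy(copy_source, dest, outputfn, Config=config)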

So, here I am passing all the values using argparse, as we saw in the previous step. We are passing 4 parameters: the source and destination buckets, the filename to be copied, and the destination filename under which to store the file.

10. Run the Python file as:

python <filename.py> <sourcebucket> <destinationbucket> <filetobecopied> <destinationfilename>

In the AWS Console, initially the source bucket is awsbucketvn and the destination bucket is destbuckvn.

Among the objects within the source bucket, images.jpg is to be copied to the destination bucket.
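With these names, and assuming the copy program above is saved as copy.py and the object keeps the name images.jpg in the destination, the command would look something like:

python copy.py awsbucketvn destbuckvn images.jpg images.jpg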

The destination bucket is empty before copying.

After running the program, the object is copied to the destination bucket.

Hope you find my blog informative. Using the AWS SDK for Python (Boto3), we can automate various services similar to the above functionality. Moreover, argparse helps in passing values from the console rather than hard-coding them within the program.

Thanks for reading my blog!!!