
Delete Log Files from Amazon S3 Bucket using Scheduled AWS Lambda Function


Cloud developers use AWS Lambda functions to automate tasks in their cloud environment, such as deleting files from Amazon S3 buckets. AWS Lambda functions are a core building block of serverless solutions on the Amazon cloud. Triggering them with Amazon CloudWatch events and rules lets cloud developers and operations teams schedule the execution of these Lambda functions.

In this AWS Lambda tutorial, I want to show DevOps team members how they can create an AWS Lambda function to delete Amazon S3 bucket files using boto3 library with sample Python code and how to schedule the Lambda function execution with Amazon CloudWatch service.

In one of my Amazon S3 buckets, thousands of log files had been created automatically. I wanted to clear these log files using an AWS Lambda function. Since the Amazon S3 bucket might contain files other than the log files, I could not delete everything at once. Luckily, the log file names followed a pattern, so I could decide programmatically whether a file is a log file to delete or a data file to keep in the AWS S3 bucket.

Amazon S3 bucket containing log files to delete programmatically

I created a new AWS IAM role and edited the default execution role policy.
I allowed all S3 actions including delete action by using "*" for the specified Amazon S3 bucket resource

{
 "Sid": "VisualEditor1",
 "Effect": "Allow",
 "Action": "s3:*",
 "Resource": [
  "arn:aws:s3:::datavirtuality-cdl",
  "arn:aws:s3:::datavirtuality-cdl/*"
 ]
}
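The wildcard `s3:*` grants more than this cleanup task needs. As a hedged sketch, the S3 part of the policy could be narrowed to listing the bucket and deleting its objects. The statement is built in Python here only to keep one language across the examples; the helper name `build_cleanup_policy` and the `Sid` values are hypothetical, and the CloudWatch Logs permissions of the default execution role are not shown.

```python
import json


def build_cleanup_policy(bucket_name):
    """Return an IAM policy document limited to listing and deleting objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is granted on the bucket itself
                "Sid": "ListLogBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # s3:DeleteObject is granted on the objects inside the bucket
                "Sid": "DeleteLogObjects",
                "Effect": "Allow",
                "Action": "s3:DeleteObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


policy = build_cleanup_policy("datavirtuality-cdl")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the JSON tab of the IAM policy editor.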

While editing the IAM policy to attach to the AWS Lambda function in the Visual Editor, check and try to clear all warnings that might prevent access to the target Amazon S3 bucket or to CloudWatch Logs.

For the AWS Lambda function, I chose Python 3.8 as the programming language. The function iterates through the files in the Amazon S3 bucket and decides whether to delete each file by checking its name against a defined pattern.

If you check the first figure once more, you can see that the automatically created log files follow a basic naming pattern.
All file names start with the same prefix and have a fixed length of 54 characters.
Here is an example file name, followed by its timestamp-and-ID part and its trailing 16-character ID:
datavirtuality-cdl2020-06-18-05-30-51-EFCBD4824959E751
2020-06-18-05-30-51-EFCBD4824959E751
EFCBD4824959E751
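The naming pattern above can also be checked more strictly than by length alone. The following is a minimal sketch, assuming every log file name is the prefix followed by a `YYYY-MM-DD-hh-mm-ss` timestamp and 16 uppercase hexadecimal characters, as in the single example shown. The names `LOG_KEY_PATTERN` and `is_log_file` are hypothetical.

```python
import re

# Assumed layout: prefix + "YYYY-MM-DD-hh-mm-ss" + "-" + 16 uppercase hex characters
LOG_KEY_PATTERN = re.compile(
    r"datavirtuality-cdl\d{4}(?:-\d{2}){5}-[0-9A-F]{16}"
)


def is_log_file(key):
    """Return True when the whole key matches the assumed log file pattern."""
    return LOG_KEY_PATTERN.fullmatch(key) is not None


sample = "datavirtuality-cdl2020-06-18-05-30-51-EFCBD4824959E751"
print(len(sample), is_log_file(sample))
```

A pattern check like this avoids deleting a data file that merely happens to have a 54-character name starting with the same prefix.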

The Python code below uses the boto3 library to list up to 1000 files whose names start with a given prefix, which is the maximum that a single list_objects_v2 call returns.
Then, within the Python for loop, the files whose names are exactly 54 characters long are deleted one by one.

import json
import boto3


def lambda_handler(event, context):
    bucket_name = "datavirtuality-cdl"
    prefix = "datavirtuality-cdl"
    s3_conn = boto3.client('s3')

    # list_objects_v2 returns at most 1000 keys per call
    s3_result = s3_conn.list_objects_v2(Bucket=bucket_name, Prefix=prefix, Delimiter="/")

    # 'Contents' is missing from the response when no key matches the prefix
    file_list = [obj['Key'] for obj in s3_result.get('Contents', [])]
    print(len(file_list))

    deleted = 0
    for key in sorted(file_list):
        # Log file names are exactly 54 characters long
        if len(key) == 54:
            print(key)
            s3_conn.delete_object(Bucket=bucket_name, Key=key)
            deleted += 1

    return {
        'statusCode': 200,
        'body': json.dumps(f'Deleted {deleted} log files')
    }

Python code sample for AWS Lambda function deleting Amazon S3 files
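As an alternative to deleting files one by one, the listing can be paginated and each page deleted with a single batch request. This is only a sketch, not the code used in this tutorial: the function name `delete_log_files` is hypothetical, the client is passed in as a parameter (for example `boto3.client("s3")`), and the response of `delete_objects` is not inspected for per-key errors.

```python
def delete_log_files(s3_client, bucket_name, prefix, key_length=54):
    """Delete every key under `prefix` whose length equals `key_length`.

    Returns the number of keys sent for deletion.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # A page holds at most 1000 keys, which is also the delete_objects limit
        batch = [
            {"Key": obj["Key"]}
            for obj in page.get("Contents", [])
            if len(obj["Key"]) == key_length
        ]
        if batch:
            s3_client.delete_objects(
                Bucket=bucket_name,
                Delete={"Objects": batch, "Quiet": True},
            )
            deleted += len(batch)
    return deleted
```

Inside a Lambda handler this could be called as `delete_log_files(boto3.client("s3"), "datavirtuality-cdl", "datavirtuality-cdl")`. With a very large bucket, the function timeout still limits how many pages one execution can process.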

Before testing the Lambda function, I created a dummy test event.
Since the function needs no input parameters and does not depend on the details of a trigger, the contents of the test event are not important.

When I executed the AWS Lambda function for the first time, the execution took about 21 seconds.

After executing the AWS Lambda function manually a few times, I decided to set the Timeout to 1 minute.
I also increased the Memory to 512 MB, although the Max Memory Used was around 80 MB for each execution.
Since AWS Lambda pricing is based on the number of requests and the execution duration, the timeout matters because it caps how long a single execution can run and be billed. Allocating a large amount of memory also affects the price, but insufficient memory can lead to longer execution times.
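The trade-off between memory and duration can be made concrete with a little arithmetic, since Lambda compute is billed in GB-seconds (configured memory in GB multiplied by billed duration in seconds). The sketch below uses the 21-second duration measured above; the price per GB-second is an assumed placeholder rather than a quoted rate, request charges and any free tier are ignored, and the duration is assumed to stay the same at both memory sizes, which may not hold in practice.

```python
def gb_seconds(memory_mb, duration_seconds):
    """Billed compute for one invocation: memory in GB times duration in seconds."""
    return (memory_mb / 1024) * duration_seconds


def monthly_compute_cost(memory_mb, duration_seconds, invocations, price_per_gb_second):
    """Compute-only cost for a number of invocations at a given rate."""
    return gb_seconds(memory_mb, duration_seconds) * invocations * price_per_gb_second


# Assumed placeholder rate; check the AWS Lambda pricing page for your region
ASSUMED_PRICE_PER_GB_SECOND = 0.0000166667

# One run per minute for 30 days, each taking about 21 seconds
runs = 60 * 24 * 30

for memory_mb in (128, 512):
    per_run = gb_seconds(memory_mb, 21)
    cost = monthly_compute_cost(memory_mb, 21, runs, ASSUMED_PRICE_PER_GB_SECOND)
    print(f"{memory_mb} MB: {per_run:.3f} GB-s per run, about ${cost:.2f} per month")
```

At 512 MB each 21-second run consumes 10.5 GB-seconds, four times the 2.625 GB-seconds of a 128 MB configuration with the same duration.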

Since every execution lists and deletes at most 1000 files from the S3 bucket, I created a schedule using CloudWatch rules that calls the AWS Lambda function every minute.

The following AWS tutorial for Lambda developers shows how to schedule an AWS Lambda function to run periodically by using Amazon CloudWatch rules.

schedule AWS Lambda functions using Amazon CloudWatch rules
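The same schedule can also be created from code instead of the console. The following is a rough sketch, assuming boto3 clients for CloudWatch Events and Lambda are passed in (for example `boto3.client("events")` and `boto3.client("lambda")`). The function names, the rule name and the target ID are hypothetical, and `add_permission` raises a conflict error if the same statement ID already exists on the function.

```python
def schedule_expression(minutes):
    """Build a rate expression; the unit must be singular when the value is 1."""
    if minutes < 1:
        raise ValueError("minutes must be at least 1")
    unit = "minute" if minutes == 1 else "minutes"
    return f"rate({minutes} {unit})"


def schedule_lambda(events_client, lambda_client, rule_name,
                    function_name, function_arn, minutes=1):
    """Create a scheduled rule and attach the Lambda function as its target."""
    # Create or update the scheduled rule
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule_expression(minutes),
        State="ENABLED",
    )
    # Allow the rule to invoke the function
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    # Point the rule at the function
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```

For the once-a-minute schedule used in this tutorial, `schedule_expression(1)` produces `rate(1 minute)`.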

That is all. After the schedule is enabled, the serverless AWS Lambda function starts executing once every minute and clears the unwanted log files from the Amazon S3 bucket.





Copyright © 2004 - 2024 Eralper YILMAZ. All rights reserved.