
Connect Amazon DocumentDB from AWS Lambda Function using Python


In this AWS Lambda tutorial, I will share sample Python code that cloud developers can use to connect to Amazon DocumentDB from a Lambda function written in Python. Amazon DocumentDB is a MongoDB-compatible document database built for the cloud. Using MongoDB drivers, developers can connect their applications to an Amazon DocumentDB database to store and retrieve application data. I hope this tutorial will be helpful for Python developers who want to connect to DocumentDB from serverless solutions built with Lambda functions.

First of all, there are some prerequisites for connecting successfully from your Lambda function to a DocumentDB database.


Network Connectivity

One major requirement for connecting a Lambda function to a DocumentDB database is network connectivity. Since DocumentDB is a VPC-only service, the DocumentDB cluster is deployed within a VPC in your AWS account. You should create your Lambda functions within the same VPC, or within a VPC that is peered with the DocumentDB VPC or connected to it through a transit gateway. Routes between the subnets must be defined, and security groups (the cloud counterpart of the firewall rules we know from on-premises networks) must allow inbound connections to the cluster and outbound connections from the Lambda function over TCP port 27017, the default port for DocumentDB database connections.

If you open the details of your DocumentDB cluster, you will see the security groups attached to the cluster on the "Connectivity & security" tab. Check the configuration of the attached security group and verify that there is an inbound rule for TCP port 27017 whose source is the Lambda function's security group.

So first, create your Lambda function within a VPC and subnets that have network connectivity to the DocumentDB cluster. While creating the Lambda function, create a new security group for it. Then use this security group as the inbound connection source in the configuration of the DocumentDB security group.

Below is a screenshot of the inbound rules of my sample DocumentDB cluster security group. There is an inbound rule referencing the security group of the Lambda function as the source for TCP port 27017.

AWS Security Group inbound rule
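If you prefer to script this inbound rule instead of clicking through the console, a minimal sketch with boto3 could look like the following. The helper names and the security group IDs you pass in are hypothetical; the caller needs the ec2:AuthorizeSecurityGroupIngress permission. The rule references the Lambda security group by ID, so any function that uses that security group is allowed in, independent of its IP address.

```python
DOCDB_PORT = 27017  # default Amazon DocumentDB port


def build_ingress_permission(source_group_id, port=DOCDB_PORT):
    """Return one IpPermissions entry opening TCP `port` to members of `source_group_id`."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [
            {"GroupId": source_group_id, "Description": "Lambda to DocumentDB"}
        ],
    }


def allow_lambda_to_docdb(docdb_group_id, lambda_group_id, port=DOCDB_PORT):
    """Add the inbound rule to the DocumentDB security group (hypothetical helper)."""
    import boto3  # imported here so the builder above works without the AWS SDK

    ec2 = boto3.client("ec2")
    return ec2.authorize_security_group_ingress(
        GroupId=docdb_group_id,
        IpPermissions=[build_ingress_permission(lambda_group_id, port)],
    )
```

Usage would be a single call such as allow_lambda_to_docdb("sg-<docdb>", "sg-<lambda>") with your own group IDs.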

If you want to connect to an existing Amazon DocumentDB cluster, it may be hard to identify the VPC subnets where the target cluster and its nodes are deployed. Unfortunately, this is not obvious on the Amazon DocumentDB dashboard of the AWS Management Console, even if you drill into the cluster details. One way to see the subnet group attached to a specific DocumentDB cluster is the AWS CLI command describe-db-clusters.

Assuming that your Amazon DocumentDB cluster name is "docdb-sandbox", the following AWS CLI command displays the cluster details, including the subnet group name:

aws docdb describe-db-clusters --db-cluster-identifier docdb-sandbox
AWS CLI Command for Amazon DocumentDB Cluster details

In the output of the CLI command, you will see that the DBSubnetGroup name is "documentdb-subnet-group".


After you get the DocumentDB subnet group name, go to the Subnet groups section of the DocumentDB dashboard and check the VPC details and the subnets selected for the DocumentDB cluster nodes.

Amazon DocumentDB VPC and Subnet Group
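The same lookup can also be done from Python with boto3, which makes it easy to compare the cluster's VPC with the VPC of the Lambda function in one step. This is a sketch: the helper names are hypothetical, and it assumes credentials allowed to call docdb:DescribeDBClusters, docdb:DescribeDBSubnetGroups and lambda:GetFunctionConfiguration. The parsing helpers work on plain response dictionaries, so they can be tried without AWS access.

```python
def subnet_group_name(cluster_response):
    """Extract the subnet group name from a docdb describe_db_clusters response."""
    clusters = cluster_response.get("DBClusters", [])
    if not clusters:
        raise ValueError("no cluster found in describe_db_clusters response")
    return clusters[0]["DBSubnetGroup"]


def summarize_subnet_group(subnet_group_response):
    """Return (vpc_id, [subnet ids]) from a describe_db_subnet_groups response."""
    group = subnet_group_response["DBSubnetGroups"][0]
    subnet_ids = [subnet["SubnetIdentifier"] for subnet in group["Subnets"]]
    return group["VpcId"], subnet_ids


def lambda_vpc_id(function_config):
    """VPC of a Lambda function; None when the function is not attached to a VPC."""
    return function_config.get("VpcConfig", {}).get("VpcId") or None


def check_placement(cluster_id, function_name):
    """Collect DocumentDB and Lambda network placement (hypothetical helper)."""
    import boto3  # lazy import: the parsers above do not need the AWS SDK

    docdb = boto3.client("docdb")
    group_name = subnet_group_name(
        docdb.describe_db_clusters(DBClusterIdentifier=cluster_id)
    )
    vpc_id, subnet_ids = summarize_subnet_group(
        docdb.describe_db_subnet_groups(DBSubnetGroupName=group_name)
    )
    function_config = boto3.client("lambda").get_function_configuration(
        FunctionName=function_name
    )
    return {
        "subnet_group": group_name,
        "docdb_vpc": vpc_id,
        "docdb_subnets": subnet_ids,
        "lambda_vpc": lambda_vpc_id(function_config),
        "same_vpc": lambda_vpc_id(function_config) == vpc_id,
    }
```

A result with same_vpc set to False does not necessarily mean a problem (peering or a transit gateway may connect the VPCs), but it tells you which route tables to inspect.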

If the Lambda function and the DocumentDB cluster are in different VPCs or subnets, check the route tables as well: your AWS Lambda functions must be able to initiate connections to the DocumentDB cluster.
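Before involving PyMongo, it can help to confirm from inside the Lambda function that the cluster endpoint is reachable at all. The following is a minimal sketch using only Python's standard socket module: it opens and closes a plain TCP connection to port 27017. The handler, the event key "host", and the default endpoint are hypothetical placeholders. A False result points at routing, security groups, or DNS rather than at credentials or TLS.

```python
import socket


def can_reach(host, port=27017, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections and DNS failures are all OSError subclasses
        return False


def lambda_handler(event, context):
    # Hypothetical default endpoint; pass {"host": "..."} in the test event instead
    host = event.get(
        "host", "docdb-sandbox.cluster-aabbccddeeffgg.eu-central-1.docdb.amazonaws.com"
    )
    return {"host": host, "reachable": can_reach(host)}
```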


Create a Layer for AWS Lambda Function to use PyMongo Library

Amazon DocumentDB is MongoDB compatible, so programmers can easily adapt their MongoDB applications to Amazon DocumentDB. MongoDB drivers can be used to connect to DocumentDB too.

PyMongo is the preferred way of connecting to MongoDB, and therefore to DocumentDB, from applications developed in Python. That is why, for Python programmers building AWS Lambda functions, PyMongo is the easiest way to connect to Amazon DocumentDB from a Lambda function.

Unfortunately, the PyMongo library is not included in the Lambda Python runtime by default. To use the PyMongo distribution in a Lambda function developed in Python, developers should create a Lambda layer that includes PyMongo and its dependencies and add that layer to their Lambda function.

Please refer to the tutorial Create Lambda Layer for PyMongo To Connect DocumentDB using Python for details.

After the Lambda layer containing the PyMongo package is created, cloud developers can add it to their Lambda functions.
To add a layer to an existing Lambda function, go to the Code tab of the target Lambda function.
At the bottom of the screen, there is the "Layers" section.
Click "Add a layer".
Choose the "Custom layers" option and select the Lambda layer that includes the PyMongo distribution.
If required, select a specific version of the Lambda layer. That is all it takes to make PyMongo available in your Lambda function.

AWS Lambda Layer to connect Amazon DocumentDB database
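The console steps above can also be scripted. One detail matters here: the Layers parameter of the Lambda UpdateFunctionConfiguration API replaces the function's whole layer list, so layers that are already attached must be passed again. The sketch below (helper names hypothetical) keeps existing layers and swaps out any older version of the same layer. The layer version ARN you pass in is a placeholder for your own PyMongo layer.

```python
def layer_base(arn):
    """Strip the version from a layer version ARN.

    arn:aws:lambda:<region>:<account>:layer:<name>:<version> -> ...:layer:<name>
    """
    return arn.rsplit(":", 1)[0]


def merge_layers(current_arns, new_arn):
    """Return current layers plus new_arn, dropping other versions of the same layer."""
    base = layer_base(new_arn)
    kept = [arn for arn in current_arns if layer_base(arn) != base]
    return kept + [new_arn]


def add_layer(function_name, layer_version_arn):
    """Attach a layer version to a function without detaching its other layers."""
    import boto3  # lazy import so merge_layers can be used without the AWS SDK

    lam = boto3.client("lambda")
    config = lam.get_function_configuration(FunctionName=function_name)
    current = [layer["Arn"] for layer in config.get("Layers", [])]
    return lam.update_function_configuration(
        FunctionName=function_name,
        Layers=merge_layers(current, layer_version_arn),
    )
```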


Lambda Function Code for Connecting DocumentDB using Python

Amazon DocumentDB supports TLS connections so that clients can encrypt data in transit. TLS secures the communication between your AWS Lambda functions and the Amazon DocumentDB cluster. Amazon DocumentDB database administrators can enable or disable TLS for their cluster using cluster parameter group settings. The connection string and the Python code that Lambda developers use require slight changes depending on whether the connection uses TLS.

The certificate authority (CA) bundle that clients use to verify the cluster's certificate during the TLS handshake can be downloaded from https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem
You can find more information in the AWS documentation topic Connecting Programmatically to Amazon DocumentDB.

My approach for this example Lambda function was to download the CA bundle (pem) file required to connect to DocumentDB in advance and store it in an Amazon S3 bucket that my Lambda functions can access. A Lambda function attached to a VPC has no internet access by default, so it needs a network path to S3: you can create a VPC endpoint for Amazon S3 in the VPC where your Lambda functions are created. The endpoint provides a fast and reliable connection between your Lambda functions and the S3 location of the pem file, so the download time becomes more predictable.
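Preparing the bucket is a one-time step that can be run from a workstation with internet access. The sketch below downloads the CA bundle from the URL given earlier, checks that the response really contains PEM certificates (so that, for example, a proxy error page is not uploaded by mistake), and uploads it with boto3. The function names are hypothetical; the bucket name in the usage note is the one used later in this example.

```python
import urllib.request

# CA bundle URL from the AWS documentation mentioned above
CA_BUNDLE_URL = "https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem"
PEM_MARKER = "-----BEGIN CERTIFICATE-----"


def count_certificates(pem_text):
    """Number of PEM certificate blocks in the given text."""
    return pem_text.count(PEM_MARKER)


def stage_ca_bundle(bucket, key="global-bundle.pem", local_path="global-bundle.pem"):
    """Download the CA bundle, sanity-check it, and upload it to S3."""
    with urllib.request.urlopen(CA_BUNDLE_URL, timeout=30) as response:
        data = response.read()
    certificates = count_certificates(data.decode("utf-8", errors="replace"))
    if certificates == 0:
        raise ValueError("downloaded file does not contain any PEM certificates")
    with open(local_path, "wb") as f:
        f.write(data)

    import boto3  # lazy import: count_certificates needs no AWS SDK

    boto3.client("s3").upload_file(local_path, bucket, key)
    return certificates
```

For this example, the call would be stage_ca_bundle("bucket4layers").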

If your Lambda function runs frequently, which I expect if MongoDB-compatible DocumentDB is behind your application, you may consider storing the pem file in your layer instead of downloading it every time. Alternatively, you can attach an Amazon EFS file system and share the same pem file across all your Lambda functions.
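A lighter option that keeps the S3 approach is to download the file only when it is not already in /tmp. Lambda reuses execution environments between invocations, and /tmp survives in a reused environment, so warm invocations skip the S3 round trip. The sketch below takes the download step as a callable so the caching logic can be tried without AWS; the helper names are hypothetical. If you package the pem file in a layer instead, note that Lambda extracts layer contents under /opt, so the file would be read from there and no download is needed.

```python
import os


def ensure_local_file(local_path, download):
    """Call download(local_path) only if local_path does not exist yet."""
    if not os.path.exists(local_path):
        download(local_path)
    return local_path


def s3_downloader(bucket, key):
    """Return a download(local_path) callable backed by boto3 (hypothetical helper)."""
    import boto3  # lazy import: ensure_local_file itself needs no AWS SDK

    s3 = boto3.client("s3")
    return lambda local_path: s3.download_file(bucket, key, local_path)
```

In the handler this would be used as ensure_local_file("/tmp/global-bundle.pem", s3_downloader("bucket4layers", "global-bundle.pem")).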

For demo purposes, I use a DocumentDB database named sandbox and a sample collection named namecollection. This sample Python Lambda function downloads the certificate file from an Amazon S3 bucket, generates sample data, connects to the DocumentDB cluster, inserts the sample data into the target collection, reads it back, and closes the database connection.

Here is the sample Python code for the Lambda function.
Please note that values like the connection string, username, and password can be moved to encrypted environment variables or, better, to AWS Secrets Manager, where they are kept more securely.

One more important detail for Lambda developers:
When the certificate file is downloaded from the Amazon S3 bucket, it is saved into the /tmp directory on local storage, because /tmp is the only writable directory in the Lambda execution environment.
Please check the connection string in the following code block.
You will see that the tlsCAFile value is "/tmp/global-bundle.pem", which includes both the download directory "/tmp" and the file name.

import os
import random
import string

import boto3
import pymongo as pm
from bson import json_util  # ships with PyMongo; serializes ObjectId and other BSON types

# Sample values: replace the bucket, user name, password and cluster endpoint with your own
BUCKET_NAME = "bucket4layers"
PEM_KEY = "global-bundle.pem"
PEM_PATH = "/tmp/global-bundle.pem"  # /tmp is the only writable path in Lambda
CONNECTION_STRING = (
    "mongodb://docdbadmin:docdbadminpwd@"
    "docdb-sandbox.cluster-aabbccddeeffgg.eu-central-1.docdb.amazonaws.com:27017/"
    f"?tls=true&tlsCAFile={PEM_PATH}"
    "&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false"
)


def generate_random_string(length):
    """Return a random lowercase string used as sample data."""
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for _ in range(length))


def lambda_handler(event, context):

    # Download the pem file from S3 unless a previous invocation already saved it
    if not os.path.exists(PEM_PATH):
        try:
            s3_client = boto3.client('s3')
            s3_client.download_file(BUCKET_NAME, PEM_KEY, PEM_PATH)
        except Exception as exc:
            print(f"An exception occurred while reading pem file from S3 bucket: {exc}")
            raise  # without the pem file the TLS connection below cannot succeed

    # generate sample data
    p_name = generate_random_string(8)
    print(p_name)

    # Connect to DocumentDB
    client = pm.MongoClient(CONNECTION_STRING)
    try:
        # switch to target database
        db = client.sandbox

        # specify the target collection
        col = db.namecollection

        # insert sample document to the collection
        col.insert_one({'name': p_name})

        # read the document back from the primary: with secondaryPreferred a replica
        # might not have received the new document yet
        docread = col.with_options(
            read_preference=pm.ReadPreference.PRIMARY
        ).find_one({'name': p_name})
    finally:
        # close DocumentDB database connection
        client.close()

    return {
        'statusCode': 200,
        'body': json_util.dumps(docread)
    }
AWS Lambda function Python code for Amazon DocumentDB access
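The code above hard-codes the user name and password. Following the earlier note about AWS Secrets Manager, a minimal sketch for reading them at runtime could look like this. It assumes a secret whose SecretString is JSON with username, password, host and port keys (the layout used by the Secrets Manager secret type for Amazon DocumentDB credentials); the secret ID and helper names are hypothetical. The function's role needs secretsmanager:GetSecretValue, and because the function runs in a VPC it also needs a network path to Secrets Manager, such as an interface VPC endpoint.

```python
import json

REQUIRED_KEYS = ("username", "password", "host", "port")


def parse_docdb_secret(secret_string):
    """Parse a SecretString holding DocumentDB credentials (assumed JSON layout)."""
    secret = json.loads(secret_string)
    missing = [key for key in REQUIRED_KEYS if key not in secret]
    if missing:
        raise KeyError(f"secret is missing keys: {missing}")
    return {
        "username": secret["username"],
        "password": secret["password"],
        "host": secret["host"],
        "port": int(secret["port"]),
    }


def get_docdb_secret(secret_id):
    """Fetch and parse the secret (hypothetical helper)."""
    import boto3  # lazy import: parse_docdb_secret needs no AWS SDK

    response = boto3.client("secretsmanager").get_secret_value(SecretId=secret_id)
    return parse_docdb_secret(response["SecretString"])
```

Calling get_docdb_secret once outside the handler lets warm invocations reuse the values instead of calling Secrets Manager every time.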

Note that the connection string can be copied from the "Connectivity & security" tab of the Amazon DocumentDB cluster details screen.
Use the "Copy" button in the "Connect to this cluster with an application" section.
Then replace the tlsCAFile parameter value in the connection string with the local path of your pem file, according to your configuration.

connection string for Amazon DocumentDB
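Instead of pasting the whole string, the connection string can also be assembled from its parts, for example from environment variables. PyMongo requires the user name and password in a URI to be percent-escaped with urllib.parse.quote_plus, which matters as soon as a password contains characters such as @ or :. The builder below is a hypothetical helper that reproduces the query parameters of the sample code and drops tls and tlsCAFile when TLS is switched off.

```python
from urllib.parse import quote_plus


def build_docdb_uri(username, password, host, port=27017, tls=True,
                    ca_file="/tmp/global-bundle.pem"):
    """Build a DocumentDB connection string with escaped credentials."""
    params = []
    if tls:
        params += ["tls=true", f"tlsCAFile={ca_file}"]
    params += [
        "replicaSet=rs0",
        "readPreference=secondaryPreferred",
        "retryWrites=false",
    ]
    return "mongodb://{}:{}@{}:{}/?{}".format(
        quote_plus(username), quote_plus(password), host, port, "&".join(params)
    )
```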

If TLS is not enabled on your Amazon DocumentDB cluster, you don't need to download the pem file.
Just remove the Python code that connects to Amazon S3 and downloads the certificate file from the S3 bucket.
The connection string becomes simpler too: you can omit the tls and tlsCAFile parameters and still connect successfully to your Amazon DocumentDB cluster.
Please note that to enable or disable TLS encryption for data in transit, administrators should modify the tls parameter in the cluster parameter group of the MongoDB-compatible Amazon DocumentDB cluster.

DocumentDB cluster parameter group configuration for TLS connection
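To decide from code which connection string variant to use, you can read the cluster's tls parameter with boto3. This sketch (helper names hypothetical) looks up the cluster parameter group via describe_db_clusters and then reads its parameters with describe_db_cluster_parameters. It assumes the group fits in a single response page, which is the case for the small DocumentDB parameter groups, and it treats any value other than "disabled" as TLS being on.

```python
def tls_setting(parameters):
    """Return the value of the 'tls' parameter, or None if it is not listed."""
    for parameter in parameters:
        if parameter.get("ParameterName") == "tls":
            return parameter.get("ParameterValue")
    return None


def tls_required(parameters):
    """Assume TLS is on unless the parameter is explicitly 'disabled'."""
    return tls_setting(parameters) != "disabled"


def cluster_tls_setting(cluster_id):
    """Read the tls parameter of a cluster's parameter group (hypothetical helper)."""
    import boto3  # lazy import: the helpers above need no AWS SDK

    docdb = boto3.client("docdb")
    cluster = docdb.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"][0]
    response = docdb.describe_db_cluster_parameters(
        DBClusterParameterGroupName=cluster["DBClusterParameterGroup"]
    )
    return tls_setting(response["Parameters"])
```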

I hope this AWS Lambda tutorial is useful for Python developers who want to connect to a MongoDB-compatible Amazon DocumentDB cluster from their Lambda functions.





Copyright © 2004 - 2024 Eralper YILMAZ. All rights reserved.