
A Practical Guide to Understanding Why AWS Lambda Matters and How to Architect Scalable Solutions

Where Scalability Meets Simplicity: Building Modern Cloud Architecture Without Infrastructure Headaches


AWS Lambda has emerged as one of the most popular services in modern cloud architectures, celebrated for its ability to run code in a fully managed, serverless environment. However, many developers and architects only scratch the surface of what Lambda truly offers. They often see it as just a way to “write code that magically runs in the cloud.” In reality, Lambda is built on a sophisticated architecture that allows it to handle various workloads efficiently. It automatically scales to meet demand, ensuring high availability and reliability without the need for manual intervention.

In this article, we’ll dive deeper into the inner workings of AWS Lambda. We’ll explore how it manages instances, provisions resources, and executes functions in a truly serverless manner. For example, Lambda operates on an event-driven model, meaning it can respond to triggers from other AWS services like S3 or DynamoDB with ease. By the end of our discussion, you’ll have a comprehensive understanding of Lambda’s architecture and functionality. This knowledge will empower you to make informed optimizations and leverage Lambda more effectively in your applications, allowing you to build scalable and cost-efficient solutions tailored to your needs.

When we talk about serverless and cloud computing, it's crucial to remember that there are indeed servers running behind the scenes. However, these servers are fully managed by the cloud provider, which means they take care of provisioning, scaling, security, and isolation. This setup allows developers to concentrate on building their applications without the hassle of managing infrastructure.

Let’s explore some foundational concepts within the Lambda architecture that will enhance your understanding of how it operates.


Key Concepts in AWS Lambda

  1. Functions: At the heart of AWS Lambda is the concept of a function. A function is essentially a bundle of code that you can invoke to run specific tasks. You can trigger these functions in response to various events, such as changes in data or HTTP requests. This event-driven model allows for seamless integration with other AWS services, making it easy to build responsive applications (see the minimal handler sketch after this list).

  2. Event-Driven Architecture: Lambda is designed to respond to events automatically. Whether it’s a file being uploaded to an S3 bucket or a new record being added to a DynamoDB table, Lambda can execute your code in real-time. This capability is particularly powerful for creating applications that need to react quickly to changes or user actions.

  3. Automatic Scaling: One of the standout features of AWS Lambda is its ability to scale automatically. As demand for your application fluctuates, Lambda adjusts the number of function instances running in response to incoming requests without any manual configuration. This means you can handle sudden spikes in traffic without worrying about performance degradation.

  4. Execution Environment: Each Lambda function runs in its own isolated environment, which includes all necessary resources such as memory and CPU power. This environment is ephemeral; it starts when an event triggers your function and shuts down once the execution is complete. This design not only enhances security but also optimizes resource usage.

  5. Deployment Packages: To get started with AWS Lambda, you package your code along with any dependencies into a deployment package. This package can be uploaded directly to AWS Lambda, where it will be stored and executed as needed. You can also utilize layers to manage shared code across multiple functions efficiently.

  6. Concurrency and Limits: Lambda supports concurrent executions, allowing multiple instances of your function to run simultaneously as demand increases. However, there are limits on how many concurrent executions are allowed per account, which you should consider when designing your applications.
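To make the function concept concrete, here is a minimal sketch of a Python handler that reacts to an S3 upload event. The logging behavior is an illustrative assumption; only the event structure follows the standard S3 notification format.

import json
import urllib.parse

def lambda_handler(event, context):
    # Each S3 event notification carries one or more records.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 notifications.
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        print(f"New object uploaded: s3://{bucket}/{key}")
    # The return value is only delivered to synchronous callers.
    return {'statusCode': 200, 'body': json.dumps('processed')}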

By understanding these key concepts, you can leverage AWS Lambda more effectively in your projects, creating scalable and efficient serverless applications that respond dynamically to user needs and events. With AWS handling the heavy lifting of infrastructure management, you can focus on what really matters—writing great code!

Understanding Lambda Runtimes

A runtime in AWS Lambda is responsible for managing the execution of your function code in response to events. It handles the relay of invocation events, context information, and responses between AWS Lambda and your function. AWS supports several managed runtimes for popular programming languages, including Node.js, Python, Java, Go, and .NET Core. Each major release of these languages corresponds to a unique runtime identifier, such as nodejs20.x or python3.13, allowing developers to select the version that best fits their application requirements.

Custom Runtimes

In addition to the managed runtimes provided by AWS, developers have the option to create custom runtimes. This is particularly useful for applications that require specific language versions or less common programming languages not supported by AWS's default offerings. Custom runtimes can be implemented alongside your function code or packaged in a Lambda layer, giving developers the flexibility to tailor the execution environment to their needs.

Performance Characteristics

Different runtimes exhibit varying performance characteristics, which can impact the efficiency of your applications:

  • Startup Latency: Some runtimes, like Node.js and Python, are known for their quick startup times, making them ideal for short-lived functions such as event processing or API backends. In contrast, runtimes like Java and .NET may experience higher cold start latency due to their initialization processes but can provide robust performance for long-running tasks.

  • Resource Management: Each runtime operates within a secure and isolated execution environment that manages the resources required to run your function. AWS Lambda may reuse these execution environments across invocations, which can help reduce startup latency if the same environment is used again.

Worker Hosts and MicroVMs

AWS Lambda operates using a system of worker hosts, which are essentially EC2 Spot instances, to manage a multitude of microVMs. These microVMs, created by a virtualization technology called Firecracker, serve as the execution environments for Lambda functions. Each microVM is dedicated to a single function invocation, ensuring that execution is both isolated and secure. This architecture allows multiple worker hosts to handle concurrent invocations of the same Lambda function, providing high availability and robust load balancing across different availability zones.

Worker hosts have a maximum lease duration of 14 hours. As a worker approaches this limit, it stops receiving new invocations; the microVMs running on it are gracefully terminated, and the underlying worker instance is terminated as well. Throughout this process, AWS Lambda continuously monitors the lifecycle activities of its fleet to maintain optimal performance and resource utilization.

Firecracker

Firecracker is an innovative virtualization technology developed by Amazon Web Services (AWS) that powers serverless computing, particularly through AWS Lambda and AWS Fargate. Designed to create and manage lightweight virtual machines known as microVMs, Firecracker combines the security of traditional virtual machines with the efficiency of container technology. Its minimalist architecture, built on the Linux Kernel-based Virtual Machine (KVM), allows for rapid provisioning of microVMs—up to 150 per second—while maintaining a small codebase of around 50,000 lines, significantly reducing overhead and enhancing security. Each microVM is dedicated to a single function invocation, ensuring isolated execution environments that optimize resource allocation. Firecracker employs optimized communication interfaces called virtio to improve performance and supports multiple processor architectures, including Intel, AMD, and ARM. As an open-source project under the Apache 2.0 license, it encourages community contributions and fosters innovation. Overall, Firecracker enhances the scalability, performance, and security of serverless applications, making it a vital component in modern cloud computing.

Lambda Invocations

AWS Lambda offers a variety of invocation methods to meet different application needs, with its architecture designed to efficiently support each method. Understanding how these invocation types function is crucial for optimizing performance and scalability.

Invocation Types

Synchronous Invocation: This method is commonly used for interactive workloads, such as APIs. For example, when an API Gateway triggers a Lambda function, the function processes the request, queries a database, and returns a response directly. This approach is immediate and responsive, making it ideal for real-time data processing where waiting for a response is essential.

Asynchronous Invocation: This type is suited for scenarios where immediate feedback isn’t necessary, such as processing data uploaded to Amazon S3. In this case, the event triggers an internal queue managed by AWS Lambda, which processes the function asynchronously. This method is perfect for workloads that can tolerate some delay in processing after the triggering event occurs.
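To see the difference from the caller's side, here is a small boto3 sketch of both invocation types; the function name my-function and the payload are placeholders.

import json
import boto3

client = boto3.client('lambda')
payload = json.dumps({'orderId': '12345'}).encode()

# Synchronous: blocks until the function returns its result.
sync_resp = client.invoke(
    FunctionName='my-function',
    InvocationType='RequestResponse',
    Payload=payload,
)
print(sync_resp['Payload'].read())  # the function's response

# Asynchronous: Lambda queues the event and returns immediately.
async_resp = client.invoke(
    FunctionName='my-function',
    InvocationType='Event',
    Payload=payload,
)
print(async_resp['StatusCode'])  # 202 indicates the event was queued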

Event Source Mapping: This invocation method is particularly useful for streaming data services like Amazon Kinesis or DynamoDB Streams. In this setup, Lambda polls these sources and invokes the function based on incoming data. This efficient handling of batch processing makes it integral for applications that deal with continuous data streams.

By leveraging these distinct invocation methods, AWS Lambda allows developers to tailor their serverless applications to specific requirements, ensuring optimal performance and scalability across various workloads.

Lambda General Architecture

There are multiple ways to invoke a Lambda function, and while the invocation method varies depending on the service triggering it, the core internal architecture remains consistent. However, Lambda interacts differently with synchronous (Sync) and asynchronous (Async) invocations. To understand this better, please refer to the following architecture diagram.

At the heart of each Lambda invocation is the Frontend Service, which is the entry point for Lambda functions. When a Lambda function is invoked, the Frontend Service manages the request and directs it to the appropriate data plane services, initiating the execution process.

Lambda functions can be invoked in two primary ways: synchronously or asynchronously.

  • Synchronous Invocation: In synchronous invocations, the Frontend Service directly routes the request to a MicroVM for immediate processing.

  • Asynchronous Invocation: For asynchronous invocations, the Frontend Service places the request into an internal queue within Lambda. This internal queuing mechanism efficiently handles the distribution of queued events to available MicroVMs. The queue enables Lambda to maintain a balanced load and optimize performance, particularly during high-traffic periods or demand spikes, by ensuring smooth event distribution.

The next step is to dive deeper into the execution of synchronous vs. asynchronous invocations within Lambda’s internal architecture. But first, let’s examine the architecture of the Event Source Mapping component, an essential part of Lambda’s functionality.

Event Source Mapping Architecture

In Lambda’s architecture, event source mappings play a critical role by enabling Lambda to poll for new records or messages from specific event sources and then invoke the target Lambda function. This process is managed internally by the Lambda service, which reads incoming messages in batches and passes them to your function as a single event payload. Batching enables high-throughput message processing, allowing up to 10,000 messages per batch.

Lambda also provides a “batching window” feature that allows you to specify the maximum time, in seconds, that Lambda waits to gather records before invoking the function. This feature helps you control how Lambda aggregates messages based on time, count, or payload size, optimizing function invocations to meet your application’s performance needs.
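As a sketch of how this looks in practice, the boto3 call below creates an event source mapping for a Kinesis stream with both a batch size and a batching window; the stream ARN and function name are placeholders.

import boto3

client = boto3.client('lambda')

mapping = client.create_event_source_mapping(
    EventSourceArn='arn:aws:kinesis:us-east-1:123456789012:stream/orders',
    FunctionName='my-function',
    StartingPosition='LATEST',            # required for stream sources
    BatchSize=100,                        # max records per invocation
    MaximumBatchingWindowInSeconds=5,     # wait up to 5 s to fill a batch
)
print(mapping['UUID'])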

Synchronous Execution In Lambda

The Invoke API can be called in two modes: event mode and request-response mode.

  • Event mode queues the payload for an asynchronous invocation.

  • Request-response mode synchronously invokes the function with the provided payload and returns a response immediately.

In both cases, the function execution is always performed in a Lambda execution environment (we cover it later), but the payload takes different paths.

When Lambda receives a request-response invoke, it is passed to the invoke service directly. If the invoke service is unavailable, callers may temporarily queue the payload client-side to retry the invocation a set number of times.

If the invoke service receives the payload, it attempts to identify an available execution environment for the request and passes the payload to that execution environment to complete the invocation. If no suitable execution environment exists, one is dynamically created in response to the request. While in transit, invoke payloads sent to the invoke service are secured with TLS.

Traffic within the Lambda service (from the load balancer down) passes through an isolated internal virtual private cloud (VPC), owned by the Lambda service, within the AWS Region to which the request was sent.

Step 1: The Worker Manager communicates with a Placement Service, which is responsible for placing the workload on an available worker host (provisioning the sandbox) and returns that placement to the Worker Manager.

Step 2: The Worker Manager can then call Init to initialize the function for execution by downloading the function package from Amazon S3 (or the container image from Amazon ECR) and setting up the Lambda runtime.

Step 3: The Frontend Worker is now able to call Invoke.

Asynchronous Execution In Lambda

Event invocation mode payloads are always queued for processing before invocation. All payloads are queued for processing in an Amazon SQS queue. Queued events are always secured in-transit with TLS, and are encrypted at rest using Server-Side-Encryption (SSE).

Queued events can be stored in a shared queue but may be migrated or assigned to dedicated queues depending on a number of factors that cannot be directly controlled by customers (for example, rate of invoke, size of events, and so on).

Queued events are retrieved in batches by Lambda’s poller fleet. The poller fleet is a group of Amazon EC2 instances whose purpose is to process queued event invocations which have not yet been processed. When the poller fleet retrieves a queued event that it needs to process, it does so by passing it to the invoke service just like a customer would in a request-response mode invoke.

If the invocation cannot be performed, the poller fleet temporarily stores the event in memory on the host until it is either able to successfully complete the execution or the configured number of retry attempts has been exceeded. No payload data is ever written to disk on the poller fleet itself.

When an event fails all processing attempts, it is discarded by Lambda. The dead letter queue (DLQ) feature allows sending unprocessed events from asynchronous invocations to an Amazon SQS queue or an Amazon SNS topic defined by the customer.
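Configuring a DLQ is a single configuration call. Here is a hedged boto3 sketch with placeholder names and ARNs; note that the function's execution role also needs permission to send messages to the target queue or topic.

import boto3

client = boto3.client('lambda')

# Route events that exhaust their async retries to an SQS queue.
client.update_function_configuration(
    FunctionName='my-function',
    DeadLetterConfig={
        'TargetArn': 'arn:aws:sqs:us-east-1:123456789012:my-function-dlq'
    },
)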

Step 1: The Application Load Balancer forwards the invocation to an available Frontend which places the event onto an internal queue (SQS).

Step 2: A set of pollers assigned to this internal queue is responsible for polling it and passing the event to a Frontend synchronously. From there, the event follows the synchronous invocation call pattern shown earlier.

Lambda Runtimes

We’ve covered how AWS Lambda can be invoked and how it interacts with these invocation methods, providing insight into Lambda’s architecture and its ability to handle different invocation types. Now, let’s take a closer look at Lambda’s MicroVMs to understand how Lambda executes your code and manages user requests and events.

Lambda supports a variety of programming languages through the use of runtimes. A runtime is a language-specific environment that facilitates the communication of invocation events, context information, and responses between Lambda and the function code. You can choose from AWS-provided runtimes or create custom ones to suit your application’s unique needs.

Each major version of a supported programming language has a unique runtime identifier, such as nodejs20.x or python3.13. Additionally, if you define a function as a container image, you can select both a runtime and a Linux distribution when creating the container image.

Lambda Runtime API

AWS Lambda provides an HTTP API for custom runtimes to receive invocation events from the Lambda service and send response data back within the Lambda execution environment (which runs on a MicroVM).

To create an API request URL, runtimes get the API endpoint from the AWS_LAMBDA_RUNTIME_API environment variable, add the API version, and add the desired resource path.

Example:

curl "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next"

Let's explore the API methods provided by the Runtime API.

Next invocation

Path: /runtime/invocation/next

Method: GET
The runtime sends this message to Lambda to request an invocation event. The response body contains the payload from the invocation, which is a JSON document that contains event data from the function trigger. The response headers contain additional data about the invocation.

Do not set a timeout on the GET request as the response may be delayed. Between when Lambda bootstraps the runtime and when the runtime has an event to return, the runtime process may be frozen for several seconds.

The request ID tracks the invocation within Lambda. Use it to specify the invocation when you send the response.

Invocation response

Path: /runtime/invocation/AwsRequestId/response

Method: POST

After the function has run to completion, the runtime sends an invocation response to Lambda. For synchronous invocations, Lambda sends the response to the client.

Example success request

REQUEST_ID=156cb537-e2d4-11e8-9b34-d36013741fb9
curl "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/response"  -d "SUCCESS"

Initialization error

If the function returns an error or the runtime encounters an error during initialization, the runtime uses this method to report the error to Lambda.

Path: /runtime/init/error

Method: POST

Example initialization error request

ERROR="{\"errorMessage\" : \"Failed to load function.\", \"errorType\" : \"InvalidFunctionException\"}"
curl "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/init/error" -d "$ERROR" --header "Lambda-Runtime-Function-Error-Type: Unhandled"

Invocation error

If the function returns an error or the runtime encounters an error, the runtime uses this method to report the error to Lambda.

Path: /runtime/invocation/AwsRequestId/error

Method: POST

Example error request

REQUEST_ID=156cb537-e2d4-11e8-9b34-d36013741fb9
ERROR="{\"errorMessage\" : \"Error parsing event data.\", \"errorType\" : \"InvalidEventDataException\"}"
curl "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/error" -d "$ERROR" --header "Lambda-Runtime-Function-Error-Type: Unhandled"

Lambda Execution Environment

Now that we’ve explored how Lambda interacts with the Runtime API within the execution environment, which handles function invocations and reports back to the Lambda service, it’s time to understand the lifecycle of this execution environment.

When Lambda invokes your function, it operates within an execution environment, a secure, isolated runtime environment that manages the resources required for function execution. This environment also supports the lifecycle of the function’s runtime and any associated extensions.

  • Runtime and Extensions Communication: Within the execution environment, the function’s runtime communicates with Lambda using the Runtime API. Any external extensions you add to your function interact with Lambda through the Extensions API, and these extensions can receive log messages and telemetry data using the Telemetry API.

  • Configuration and Resource Management: When creating a Lambda function, you define configurations such as memory allocation and maximum execution time. Lambda uses this configuration to set up the execution environment, ensuring that resources are allocated as specified.

  • Environment Sharing: The function’s runtime and any external extensions run as separate processes within the execution environment, but they share permissions, resources, credentials, and environment variables, enabling smooth interaction while maintaining isolation.

Lambda optimizes performance by reusing execution environments when possible. If an environment from a previous invocation is available, Lambda reuses it, reducing the overhead of creating a new environment. Otherwise, Lambda will initialize a fresh execution environment as needed.

Execution Environment Lifecycle events

Each phase starts with an event that Lambda sends to the runtime and to all registered extensions. The runtime and each extension indicate completion by sending a Next API request. Lambda freezes the execution environment when the runtime and each extension have completed and there are no pending events.

The lifecycle of an AWS Lambda execution environment can be divided into several key phases, each of which plays a crucial role in preparing, running, and managing the function. Understanding these phases can help you optimize function performance and make informed decisions about Lambda configurations.

Init phase

In the Init phase, Lambda performs the following tasks:

  • Start all extensions (Extension init)

  • Bootstrap the runtime (Runtime init)

  • Run the function’s static code (Function init)

  • Run any beforeCheckpoint runtime hooks (Lambda SnapStart with Java only)

The Init phase ends when the runtime and all extensions signal that they are ready by sending a Next API request. The Init phase is limited to 10 seconds. If these tasks are not completed within 10 seconds, Lambda retries the Init phase at the time of the first function invocation, with the configured function timeout.

When you use provisioned concurrency, Lambda initializes the execution environment when you configure the provisioned concurrency settings for a function. Lambda also ensures that initialized execution environments are always available in advance of invocations. You may see gaps between your function’s invocation and initialization phases. Depending on your function’s runtime and memory configuration, you may also see variable latency on the first invocation on an initialized execution environment.

For functions using on-demand concurrency, Lambda may occasionally initialize execution environments ahead of invocation requests. When this happens, you may also observe a time gap between your function’s initialization and invocation phases. We recommend that you not take a dependency on this behavior.
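For reference, provisioned concurrency is configured per published version or alias. A minimal boto3 sketch with placeholder names:

import boto3

client = boto3.client('lambda')

client.put_provisioned_concurrency_config(
    FunctionName='my-function',
    Qualifier='live',                     # must be a version or alias
    ProvisionedConcurrentExecutions=50,   # environments kept initialized
)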

Failures during the Init phase

If a function crashes or times out during the Init phase, Lambda emits error information in the INIT_REPORT log.

Example — INIT_REPORT log for timeout

INIT_REPORT Init Duration: 1236.04 ms Phase: init Status: timeout

Example — INIT_REPORT log for extension failure

INIT_REPORT Init Duration: 1236.04 ms Phase: init Status: error Error Type: Extension.Crash

If the Init phase is successful, Lambda doesn't emit the INIT_REPORT log—unless SnapStart is activated. SnapStart functions always emit INIT_REPORT.

Invoke phase

When a Lambda function is invoked in response to a Next API request, Lambda sends an Invoke event to the runtime and to each extension.

The function’s timeout setting limits the duration of the entire Invoke phase. For example, if you set the function timeout as 360 seconds, the function and all extensions need to complete within 360 seconds. Note that there is no independent post-invoke phase. The duration is the sum of all invocation time (runtime + extensions) and is not calculated until the function and all extensions have finished executing.

The invoke phase ends after the runtime and all extensions signal that they are done by sending a Next API request.

Failures during the invoke phase

If the Lambda function crashes or times out during the Invoke phase, Lambda resets the execution environment. The following diagram illustrates Lambda execution environment behavior when there's an invoke failure:

  • The first phase is the INIT phase, which runs without errors.

  • The second phase is the INVOKE phase, which runs without errors.

  • At some point, suppose your function runs into an invoke failure (such as a function timeout or runtime error). The third phase, labeled INVOKE WITH ERROR, illustrates this scenario. When this happens, the Lambda service performs a reset. The reset behaves like a Shutdown event. First, Lambda shuts down the runtime, then sends a Shutdown event to each registered external extension. The event includes the reason for the shutdown. If this environment is used for a new invocation, Lambda re-initializes the extension and runtime together with the next invocation.

  • Note that the Lambda reset does not clear the /tmp directory content prior to the next init phase. This behavior is consistent with the regular shutdown phase.

  • The fourth phase represents the INVOKE phase immediately following an invoke failure. Here, Lambda initializes the environment again by re-running the INIT phase. This is called a suppressed init. When suppressed inits occur, Lambda doesn’t explicitly report an additional INIT phase in CloudWatch Logs. Instead, you may notice that the duration in the REPORT line includes an additional INIT duration + the INVOKE duration.

  • The fifth phase represents the SHUTDOWN phase, which runs without errors.

Shutdown phase

When Lambda is about to shut down the runtime, it sends a Shutdown event to each registered external extension. Extensions can use this time for final cleanup tasks. The Shutdown event is a response to a Next API request.

Duration: The entire Shutdown phase is capped at 2 seconds. If the runtime or any extension does not respond, Lambda terminates it via a signal (SIGKILL).

After the function and all extensions have been completed, Lambda maintains the execution environment for some time in anticipation of another function invocation. However, Lambda terminates execution environments every few hours to allow for runtime updates and maintenance — even for functions that are invoked continuously. You should not assume that the execution environment will persist indefinitely.

When the function is invoked again, Lambda thaws the environment for reuse. Reusing the execution environment has the following implications:

  • Objects declared outside of the function’s handler method remain initialized, providing additional optimization when the function is invoked again. For example, if your Lambda function establishes a database connection, instead of reestablishing the connection, the original connection is used in subsequent invocations. We recommend adding logic in your code to check if a connection exists before creating a new one (see the sketch after this list).

  • Each execution environment provides between 512 MB and 10,240 MB, in 1-MB increments, of disk space in the /tmp directory. The directory content remains when the execution environment is frozen, providing a transient cache that can be used for multiple invocations. You can add extra code to check if the cache has the data that you stored.

  • Background processes or callbacks that were initiated by your Lambda function and did not complete when the function ended resume if Lambda reuses the execution environment. Make sure that any background processes or callbacks in your code are complete before the code exits.
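A common pattern that exploits both forms of reuse is sketched below, with an illustrative table name and cache path: the client is created once per execution environment, and /tmp acts as a transient cache.

import os
import boto3

# Created once per execution environment and reused on warm invocations.
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('orders')          # placeholder table name

CACHE_FILE = '/tmp/reference-data.json'

def lambda_handler(event, context):
    # Repopulate the /tmp cache only on a cold (or recycled) environment.
    if not os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, 'w') as f:
            f.write('{}')                 # stand-in for an expensive download
    resp = table.get_item(Key={'orderId': event['orderId']})
    return resp.get('Item', {})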

Storage and state in Execution environments

Execution environments are never reused across different function versions or customers, but a single environment can be reused between invocations of the same function version. This means data and state can persist between invocations. Data and/or state may continue to persist for hours before it is destroyed as a part of normal execution environment lifecycle management.

For performance reasons, functions can take advantage of this behavior to improve efficiency by keeping and reusing local caches or long-lived connections between invocations. Inside an execution environment, these multiple invocations are handled by a single process, so any process-wide state (such as a static state in Java) can be available for future invocations to reuse, if the invocation occurs on a reused execution environment.

Each Lambda execution environment also includes a writeable filesystem, available at /tmp. This storage is not accessible or shared across execution environments. As with the process state, files written to /tmp remain for the lifetime of the execution environment. This allows expensive transfer operations, such as downloading machine learning (ML) models, to be amortized across multiple invocations. Functions that don’t want to persist data between invocations should either not write to /tmp, or delete their files from /tmp between invocations. The /tmp directory is encrypted at rest.

If you want to persist data to the file system outside of the execution environment, consider integrating Lambda with Amazon EFS. Please refer to Using Amazon EFS with Lambda.

Lambda Control Plane & Security

When Lambda runs a function on your behalf, it manages both provisioning and configuring the underlying systems necessary to run your code. This enables your developers to focus on business logic and writing code, not administering and managing underlying systems.

The Lambda service is split into the control plane and the data plane. Each plane serves a distinct purpose in the service. The control plane provides the management APIs (for example, CreateFunction, UpdateFunctionCode, PublishLayerVersion, and so on), and manages integrations with all AWS services. Communications to the Lambda control plane are protected in-transit by TLS. Customer content stored within Lambda's control plane is encrypted at rest using AWS KMS keys, which are designed to protect the content from unauthorized disclosure or tampering.

The data plane is Lambda’s invoke API that triggers the invocation of Lambda functions. When a Lambda function is invoked, the data plane allocates an execution environment on an AWS Lambda Worker (or simply worker, a type of Amazon EC2 instance) to that function version, or chooses an existing execution environment that has already been set up for that function version, which it then uses to complete the invocation.

Execution role

Each Lambda function must also be configured with an execution role, which is an IAM role that is assumed by the Lambda service when performing control plane and data plane operations related to the function. The Lambda service assumes this role to fetch temporary security credentials which are then available as environment variables during a function’s invocation. For performance reasons, the Lambda service will cache these credentials, and may re-use them across different execution environments which use the same execution role.

To adhere to the principle of least privilege, Lambda recommends that each function have its own unique role, configured with the minimum set of permissions it requires.
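As a hedged illustration, the boto3 sketch below creates a role that can only write its own CloudWatch logs; the role and policy names are placeholders.

import json
import boto3

iam = boto3.client('iam')

# Trust policy that lets the Lambda service assume the role.
trust = {
    'Version': '2012-10-17',
    'Statement': [{'Effect': 'Allow',
                   'Principal': {'Service': 'lambda.amazonaws.com'},
                   'Action': 'sts:AssumeRole'}],
}
iam.create_role(RoleName='my-function-role',
                AssumeRolePolicyDocument=json.dumps(trust))

# Grant only what this particular function needs: writing its logs.
policy = {
    'Version': '2012-10-17',
    'Statement': [{'Effect': 'Allow',
                   'Action': ['logs:CreateLogGroup',
                              'logs:CreateLogStream',
                              'logs:PutLogEvents'],
                   'Resource': 'arn:aws:logs:*:*:*'}],
}
iam.put_role_policy(RoleName='my-function-role',
                    PolicyName='least-privilege-logging',
                    PolicyDocument=json.dumps(policy))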

The Lambda service may also assume the execution role to perform certain control plane operations, such as those related to creating and configuring elastic network interfaces (ENIs) for VPC functions, sending logs to Amazon CloudWatch Logs, sending traces to AWS X-Ray, or other non-invoke related operations.

Deployment in AWS Lambda

AWS Lambda offers two primary deployment methods for functions, each catering to different application sizes and requirements.

Deployment Options:

  • ZIP Deployment: This method suits smaller functions with limited dependencies. It is straightforward but constrained by a size limit of 50 MB zipped (250 MB unzipped), making it less suitable for more extensive applications.

  • Container Image Deployment: For larger applications, Lambda supports container images up to 10 GB. This increased capacity is ideal for applications that need larger libraries or more significant dependencies.

AWS Lambda incorporates several architectural optimizations around deployment and invocation performance:

1) Invocation Constraint in Firecracker

Lambda uses Firecracker for creating MicroVMs, each handling one invocation at a time. This model means a single instance cannot simultaneously process multiple requests, which is a consideration for high-throughput applications.

2) Caching as a Performance Enhancement

Lambda employs a three-tiered caching system to improve function performance:

  • L1 Cache (Local Cache on Worker Host): Located directly on the worker host, this cache allows for quick access to frequently used data, essential for speeding up function invocations.

  • L2 Cache (Shared Across Worker Hosts and Customers): This shared cache holds common data across different Lambda functions and customers, optimizing performance by reducing redundant data fetching.

  • L3 Cache (S3 Bucket Managed by AWS): The L3 cache, for less frequently accessed data, provides efficient long-term storage in an S3 bucket, reducing retrieval times.

3) Optimizing Container Deployment

To maximize caching benefits, especially with container images, it’s advisable to strategically structure container layers. Place stable elements like the operating system and runtime in base layers, and put frequently changing business logic in upper layers. This setup allows for more efficient caching of static components, speeding up the Lambda function’s loading process.

Conclusion

AWS Lambda is a sophisticated service that abstracts much of the underlying complexity and infrastructure management. However, gaining a deep understanding of its inner workings can be invaluable. By knowing what happens behind the scenes, you can better appreciate how AWS manages scaling, resource allocation, and function execution to ensure seamless performance.

In this post, we explored Lambda’s invocation methods, core architecture, and the execution environment lifecycle. This knowledge empowers you to make more informed choices and optimize your Lambda functions, giving you greater control over their behavior, efficiency, and scalability in your applications.