Designing a serverless recommender in AWS

Danilo S Brambila
6 min read · Jan 14, 2021

In this post I sketch the architecture of a recommender in AWS. Recently, AWS released a managed recommender service that should, in principle, handle the whole infrastructure of such an application. However, building it “from scratch” is far more interesting, since it requires hooking up quite a few different AWS services to put together a champion architecture.

To be more practical, let’s define more precisely what the recommender should do:

1- Ingest session data in real time

2- Combine the session information with another table stored in S3 which, let’s say, holds general customer information.

3- Update the recommender model with a frequency that is acceptable for the business case being tackled.

4- Serve the customer with the proper recommendations calculated by the recommender model.

While we always need to ingest the session data in real time, model updates could be triggered once a day, week, or month, so that the items recommended to a customer are determined using historical data. This could be changed to a real-time model update, but that would require a lot more effort from the engineering team and considerably more expensive resources. Below I describe a general architecture overview and important considerations when designing the system. Later on I will discuss, in more detail but by no means comprehensively, each of the elements of such a system.

A “serverless” Batch Recommender in AWS

Architecture overview

Session data is stored on the fly in DynamoDB and exported to S3 using AWS Data Pipeline. Athena is then invoked to do the necessary ETL and combine the session information with the customer data. The new data is then stored in an S3 bucket together with the historical training data, and SageMaker is invoked to rerun the model with the updated data. SageMaker can be configured to provide an endpoint for serving recommendations back to the customer. How often the model is updated really depends on the business case, but once a day is probably enough to fulfill most requirements.
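As a rough sketch of how the daily retraining could be scheduled, an EventBridge (CloudWatch Events) rule can invoke a Lambda that kicks off the export, ETL and training steps. The rule name, schedule and function ARN below are placeholders, not part of the original design.

```python
import boto3

events = boto3.client("events")

# Hypothetical daily schedule that triggers the batch retraining pipeline.
events.put_rule(
    Name="daily-recommender-retrain",      # placeholder rule name
    ScheduleExpression="rate(1 day)",      # once a day, as discussed above
    State="ENABLED",
)

# Point the rule at a Lambda that starts the DynamoDB export, Athena ETL and
# SageMaker training steps (the function ARN is a placeholder).
events.put_targets(
    Rule="daily-recommender-retrain",
    Targets=[{
        "Id": "retrain-lambda",
        "Arn": "arn:aws:lambda:eu-west-1:123456789012:function:retrain-recommender",
    }],
)
```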

If, however, the model needs to be updated on the fly, the data processing part of the architecture described above needs to be adapted. In this case Amazon Kinesis could be used to feed the new data to an EMR cluster, which would also read the customer data available in the S3 bucket for the necessary ETL. It also makes sense to integrate the update of the recommender model into the data processing framework running on the EMR cluster, as this would benefit from the large number of nodes in the cluster and avoid delays between the ingestion of new session data and the update of the model. The results could then be inserted again into a DynamoDB table, from which an endpoint would be created to serve customers with the proper recommendations. I would like, though, to restrict this post to the batch recommender, since the streaming one is a lot more complex and requires a quite different flavor of services to get up and running.
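For completeness, ingesting a session event into the streaming variant is a single boto3 call; the stream name and event payload below are made-up examples.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Hypothetical session event pushed into a Kinesis stream for the streaming variant.
session_event = {"customer_id": "c-123", "item_id": "i-987", "action": "click"}

kinesis.put_record(
    StreamName="session-events",                  # placeholder stream name
    Data=json.dumps(session_event).encode("utf-8"),
    PartitionKey=session_event["customer_id"],    # partition by customer id
)
```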

Serverless

In my opinion, serverless is the way to go for cloud solutions. The whole idea of a serverless cloud solution is to rely on the expertise of the cloud provider to maintain the infrastructure necessary for running the pipeline. Self-managed solutions (either built in a private datacenter or using “bare metal” services from cloud providers) might be cheaper in the long term if you consider only computing resources; however, maintaining that infrastructure requires paying a handful of competent data engineers to debug any issues that might arise. At AWS or any other cloud provider there is a dedicated team for each of the services offered, with huge expertise in how to run, maintain and fix the infrastructure, since an issue that occurred for company X will probably also occur for companies A, B, C and D.

1- Data Storage

DynamoDB is a great “serverless” key-value and document database that can scale to petabytes of data. In DynamoDB the underlying infrastructure is handled by AWS, except for the I/O capacity of the table, which you provision yourself; the throughput a table can sustain grows with the number of partitions used to store it. In case of hot keys, i.e., particular customers that are unusually active, I suggest activating DAX, a caching layer for DynamoDB, which can both reduce the latency of requests and decrease the number of I/O calls on the DynamoDB table. DAX is applicable to both read and write operations. For reads, DAX keeps hot keys in a cache that is checked first, before fetching the data from DynamoDB. For writes, DAX uses a write-through strategy in which incoming items are written to DynamoDB and then propagated to the cache of the cluster nodes. For this particular task we would need a Lambda hooked up to an API Gateway to insert and update values in the DynamoDB table in a secure way. Of course, a smart partitioning of the data should be chosen: partitioning by customer id might be enough to avoid data skew and bring the read/write latency of the DB to a minimum.
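A minimal sketch of such a Lambda handler, assuming an API Gateway proxy integration and a table keyed by customer id plus a timestamp (both the table name and the event shape are assumptions):

```python
import json
import boto3

# Table name is a placeholder; partition key is the customer id, sort key a timestamp.
TABLE = boto3.resource("dynamodb").Table("customer-sessions")

def handler(event, context):
    """Invoked by API Gateway with a single session event in the request body."""
    body = json.loads(event["body"])
    TABLE.put_item(
        Item={
            "customer_id": body["customer_id"],   # partition key
            "event_ts": body["timestamp"],        # sort key
            "item_id": body["item_id"],
            "action": body["action"],
        }
    )
    return {"statusCode": 200, "body": json.dumps({"status": "ok"})}
```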

2- Data processing

AWS Data Pipeline can readily be used to extract snapshots from DynamoDB and store them in S3. The join between the two datasets (customer and session data) could then be done with AWS Athena or Glue, which are both serverless data processing services within the AWS ecosystem. Athena is built on Presto, a SQL engine developed by Facebook a few years ago. Glue, among other features, allows you to write code in Scala or Python and use Apache Spark for distributed ETL workloads. Athena should be the service of choice if the data processing is essentially SQL; Glue, on the other hand, is a better fit for ETL patterns that differ considerably from plain SQL or that involve a lot of analytics.
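As an illustration, the join could be submitted to Athena with a single boto3 call; the database, table names and output location below are hypothetical.

```python
import boto3

athena = boto3.client("athena")

# Hypothetical join between the exported session snapshot and the customer table.
query = """
SELECT s.customer_id, s.item_id, s.action, c.age_group, c.country
FROM sessions s
JOIN customers c ON s.customer_id = c.customer_id
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "recommender_db"},  # placeholder database
    ResultConfiguration={"OutputLocation": "s3://recommender-training-data/etl/"},
)
```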

3- Recommender model and serving

AWS SageMaker is a powerful service for building machine learning jobs. The jobs can be defined in Jupyter notebooks, the main tool data scientists use for development. Moreover, SageMaker provides an extensive library of machine learning algorithms that can support data scientists in the development of models. Last but not least, SageMaker can be configured to automatically create an endpoint, which makes the results of the machine learning model readily available to the consumer, in our case a customer waiting for useful recommendations.
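Below is a minimal sketch of training and deploying a model with the SageMaker Python SDK, using the built-in factorization machines algorithm as a stand-in for the actual recommender. The IAM role, bucket paths and hyperparameters are placeholders, and the conversion of the training data into the RecordIO-protobuf format the algorithm expects is omitted.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder IAM role

# Built-in factorization machines image, a common choice for recommenders.
image = image_uris.retrieve("factorization-machines", session.boto_region_name, version="1")

estimator = Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://recommender-training-data/models/",  # placeholder bucket
    sagemaker_session=session,
)
estimator.set_hyperparameters(feature_dim=10000, num_factors=64, predictor_type="regressor")

# Train on the data prepared by the Athena/Glue step (already converted to the
# RecordIO-protobuf format the algorithm expects), then expose an endpoint.
estimator.fit({"train": "s3://recommender-training-data/train/"})
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```

The resulting endpoint is what the application queries to serve recommendations back to the customer.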

4- Security

It is of course essential to protect your application from attacks. Fortunately, AWS makes it easy to follow security best practices with only a few extra configurations:

1- Always encrypt the data both in-flight and at rest

2- Enable Server Access Logging on the bucket where the data is stored. This allows you to audit the logs for external connections accessing your data (see the sketch after this list).

3- Isolate the recommender from other resources in your AWS account by hosting it in a dedicated VPC.

4- Protect your API Gateway against common attacks using AWS WAF, enable CORS, and use SSL certificates to avoid man-in-the-middle attacks. Doing so will secure your application against the vast majority of simple attacks.
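As a sketch of points 1 and 2, default encryption and Server Access Logging can be switched on for the training-data bucket with boto3; the bucket names below are placeholders.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "recommender-training-data"  # placeholder bucket name

# Encrypt everything at rest by default with SSE-KMS.
s3.put_bucket_encryption(
    Bucket=BUCKET,
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}]
    },
)

# Turn on Server Access Logging so every request against the bucket is recorded.
s3.put_bucket_logging(
    Bucket=BUCKET,
    BucketLoggingStatus={
        "LoggingEnabled": {"TargetBucket": "recommender-access-logs", "TargetPrefix": "s3/"}
    },
)
```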

Conclusions

And that’s it! As you can see, setting up a recommender requires hooking up several AWS services and a quite broad understanding of many interesting topics: databases, data processing, ML modelling and, of course, networking/security to make your application ironclad against attacks. As mentioned, some organizations might need a real-time update of the recommender model, but this is a rare and interesting case (check this paper for further details!).

Have you designed a recommender for an organization? What was your approach?
