This is a hands-on post where I show how to create an SFTP server in AWS using the AWS Transfer Family service. For security, the SFTP server only accepts connections from an allowlist of IPs, and each user can access only a limited set of files stored in an S3 bucket. Furthermore, the server runs in a dedicated VPC for better isolation of the cloud resources.
The infrastructure is defined with CloudFormation JSON templates. Alternatives for building it would be Terraform or the AWS CDK.
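To make the setup described above more concrete, here is a minimal sketch of what such a template could look like, generated from Python so the IP allowlist stays in one place. The logical names (`SftpVpc`, `SftpSubnet`, `SftpSecurityGroup`, `SftpServer`), the CIDR ranges, and the `build_template` helper are all hypothetical placeholders, not taken from the post's actual templates. The sketch also leaves out the pieces a working deployment would still need, such as internet routing, the S3 bucket, and the per-user IAM roles.

```python
import json

# Hypothetical allowlist: documentation-only IP ranges used as placeholders.
ALLOWED_CIDRS = ["203.0.113.10/32", "198.51.100.0/24"]


def build_template(allowed_cidrs):
    """Return a CloudFormation template (as a dict) for an SFTP server in its own VPC."""
    # One ingress rule per allowed CIDR, opening only the SFTP port (22).
    ingress = [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "CidrIp": cidr}
        for cidr in allowed_cidrs
    ]
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Dedicated VPC and subnet (address ranges are assumptions).
            "SftpVpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {"CidrBlock": "10.0.0.0/16"},
            },
            "SftpSubnet": {
                "Type": "AWS::EC2::Subnet",
                "Properties": {
                    "VpcId": {"Ref": "SftpVpc"},
                    "CidrBlock": "10.0.0.0/24",
                },
            },
            # Security group that restricts access to the allowlisted IPs.
            "SftpSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Allow SFTP only from listed IPs",
                    "VpcId": {"Ref": "SftpVpc"},
                    "SecurityGroupIngress": ingress,
                },
            },
            # Transfer Family server with a VPC-hosted endpoint.
            "SftpServer": {
                "Type": "AWS::Transfer::Server",
                "Properties": {
                    "Protocols": ["SFTP"],
                    "IdentityProviderType": "SERVICE_MANAGED",
                    "EndpointType": "VPC",
                    "EndpointDetails": {
                        "VpcId": {"Ref": "SftpVpc"},
                        "SubnetIds": [{"Ref": "SftpSubnet"}],
                        "SecurityGroupIds": [
                            {"Fn::GetAtt": ["SftpSecurityGroup", "GroupId"]}
                        ],
                    },
                },
            },
        },
    }


template = build_template(ALLOWED_CIDRS)
print(json.dumps(template, indent=2))
```

Generating the JSON from a small function is just one design choice; a hand-written template with the same resources would work equally well.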
Networking infrastructure setup
The JSON below shows…
In this post I sketch the architecture of a recommender in AWS. Recently, AWS released a recommender service that should, in principle, handle the whole infrastructure of such an application. However, building it “from scratch” is far more interesting, since it requires wiring together quite a lot of different AWS services to create a champion architecture.
To be more practical, let’s define more precisely what the recommender should do:
1- Ingest session data in real time
2- Combine the session information with another table stored in S3 which, let’s say, holds general customer information.
3- Update the…
I have seen several posts and tutorials on Delta Lake using “Hello World” kind of examples, where everything works wonderfully. However, as most of you know, the performance of data processing technologies changes drastically as the amount of data they handle increases. That’s why I decided to evaluate Delta Lake in the wild, using a real-world production Spark job that processes around 100 GB of data. Here, I am going to share with you how Delta Lake helped me fulfill the new requirements for the job, but also the disappointments I had along the way.
Imagine that you are looking for the book “Crime and Punishment” by Fyodor Dostoevsky in the city library. All you need to do is go to the “International literature” section, if you are not in Russia, and look for the shelf with authors whose surnames start with “D”. This simple query would not be so easy, though, if for some strange reason the librarians had decided to organize their sections by the date of acquisition of each book. In that case, not only would you have no idea where to look for the book, but also, the copies…
Data Engineer, father, retired physicist.