Introduction
This project was done as part of a course at SMU, CS301 IT Solution Architecture. The module is part of the SMU-X curriculum, and the participating sponsor for this project is Ascenda Loyalty.
The course focuses on the analysis, design, and implementation of an IT solution, through which business requirements, software qualities, and solution elements are transformed into implementable artefacts.
The main deliverable for this project is a system that enables user authentication, so that our client application can exchange data with other services securely.
The main responsibilities of the team members are:
- Dong Xian - Backend (JWT), CI-CD, Cloud Architecture
- Timothy - Backend (Enrollment/MFA), Cloud Architecture
- Shan Mei - Frontend, Backend (Bank SSO)
- Dai Wei - Frontend, Topic Research
- Yi Xin - Frontend, Cloud Architecture, Terraform
Stakeholders
The team has identified four main stakeholders who will use the service:
- Bank partners who will provide customer user data for developers to administer enrollment
- Bank customers who are end users using the rewards platform to perform point redemptions
- Administrators who will handle any user requests for the application, as well as manage user roles and permissions
- Loyalty program partners who will provide the gifts for point exchange
Requirements
There are two broad categories of requirements: functional and non-functional. The following sections briefly cover what was done to achieve each.
Functional requirements
The functional requirements can be broken down into these key features:
- Enrollment - registration and logging in
Enrollment sequence diagram
Example of JWT
- User Authentication via hosted login or Single Sign On (SSO)
Hosted login view
Hosted login sequence diagram
SSO login View
SSO login sequence diagram
- Multi-factor Authentication (MFA) and Password Recovery - implemented with Amazon SES
Password recovery view
MFA email
MFA view
Password recovery sequence diagram
- Authorisation and Access Control Management
Access control management sequence diagram
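As a rough illustration of the JWT mentioned above, here is a minimal HS256 signing and verification sketch using only the Python standard library. This is illustrative only: the claim names and secret are hypothetical, and our actual service's algorithm and key management may differ (e.g. asymmetric keys published via JWKS).

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT spec requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Produce a compact HS256 JWT: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"


def verify_jwt(token: str, secret: bytes) -> dict:
    """Check the signature and expiry, then return the claims."""
    header, payload, signature = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid signature")
    # Restore the padding stripped during base64url encoding before decoding.
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims


# Hypothetical claims: a short-lived token for a bank customer.
token = sign_jwt({"sub": "user-123", "role": "customer",
                  "exp": int(time.time()) + 900}, b"demo-secret")
assert verify_jwt(token, b"demo-secret")["sub"] == "user-123"
```

The three dot-separated segments are what you see when pasting a token into a JWT debugger: a base64url header, a base64url claims payload, and the signature over both.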
Non-functional requirements
The non-functional requirements can be broken down into these key areas:
- Interoperability
- Resilience & Disaster Recovery
- Scalability
- Data Security
More on the requirements will be elaborated in the subsequent sections.
Proposed Budget
These are the services we used from AWS. The justifications are elaborated in the next section (see Key Architectural Decisions).
AWS Elastic Container Service (ECS)
- ECS is used for the authorisation and web application servers, with a minimum of 2 instances across 2 AZs and auto scaling up to 10 instances each.
- 2-10 x Fargate 0.25 vCPU / 0.5 GB at $11.25 each for the authorisation servers, and 2-10 x Fargate 1 vCPU / 2 GB at $44.98 each for the web application servers, giving a minimum total cost of $112.46.
AWS Application Load Balancer (ALB)
- 2 load balancers per region, used to distribute traffic across ECS instances.
- 2 x load balancers at $18.40 each, coming to a total cost of $36.80.
AWS Aurora (MySQL)
- 1 Master with 1 Read Replica in a multi AZ setup.
- A minimum of 0.5 ACUs with autoscaling, at $85.61 each, giving a total cost of $171.22.
AWS Cloudfront
- 2 CloudFront distributions per region: one for the web application and one for the authorisation server. Assuming the free tier suffices, the total cost is $0.
AWS Web Application Firewall (WAF)
- 1 firewall per region, used to protect against malicious attackers. Assuming 5 rules per firewall, this gives a total cost of $20.
Route 53
- A reliable and cost-effective DNS service to route end users to Internet applications. Assuming the free tier suffices, the cost is $0.
Total cost = $340.48 USD per month (the sum of the minimum-capacity figures above).
Key Architectural Decisions
These are some key architectural decisions we made, and the alternatives we considered.
Building our own authorisation server
- Rather than go with AWS Cognito, we chose to build and run our own server to better manage costs at scale. The cost of running Cognito would be drastically higher than running on ECS once the number of monthly active users exceeds 70,000.
- Running our own server also allows for greater customisation and flexibility in the authentication process.
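The crossover point above can be sanity-checked with some back-of-the-envelope arithmetic. The Cognito figures below (a 50,000-MAU free tier, then $0.0055 per MAU) are assumptions taken from AWS's published pricing tiers at the time, not numbers from the original report:

```python
# Rough cost-comparison sketch; the Cognito pricing constants are assumptions
# for illustration, and the ECS figure is our minimum Fargate footprint from
# the budget section above.
COGNITO_PRICE_PER_MAU = 0.0055   # assumed price per billable monthly active user
COGNITO_FREE_TIER = 50_000       # assumed free-tier MAUs
ECS_AUTH_MONTHLY = 112.46        # minimum ECS cost, roughly flat with MAU count


def cognito_monthly_cost(mau: int) -> float:
    """Cognito cost for a given number of monthly active users."""
    billable = max(0, mau - COGNITO_FREE_TIER)
    return billable * COGNITO_PRICE_PER_MAU


for mau in (50_000, 70_000, 100_000):
    print(f"{mau:>7} MAU: Cognito ~${cognito_monthly_cost(mau):.2f} "
          f"vs ECS ~${ECS_AUTH_MONTHLY:.2f}")
```

Under these assumptions Cognito costs about $110/month at 70,000 MAU, i.e. roughly the same as the fixed ECS footprint, and grows linearly from there, which is consistent with the break-even claim above.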
Aurora vs DynamoDB
- The alternative to Aurora would be DynamoDB. Aurora's auto scaling and auto failover features help ensure high availability and make the database easy for developers to maintain. Auto scaling also keeps costs low under light load.
ECS vs EC2
- We opted for ECS Fargate (a serverless compute solution) over EC2 and other serverless options such as Amplify or Lambda. Aside from its higher cost, Amplify has restrictions such as the inability to control its AZ placement. Since our application is expected to handle up to 100 requests per second, Lambda could also become quite expensive.
- ECS provides a good middle ground, letting us control what we require while leaving the hardware to AWS. However, we had to handle container builds and deployment ourselves, which took some time.
Cloudfront vs API Gateway
- We chose CloudFront because using API Gateway would require additional work to map each endpoint, which we lacked the time to do. Since CloudFront also comes with caching, endpoints such as JWKS and our static pages can be cached, reducing calls to our Internet-facing Application Load Balancers.
Development View
The project was conducted mainly with Scrum and Agile principles. We held weekly meetings to maintain project alignment.
Aside from that, Continuous Integration / Continuous Deployment (CI/CD) processes were also implemented. The team used GitHub Actions to automate the testing, building, and deployment processes. Here is a simple view of our CI/CD pipeline.
GitHub Actions workflow
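A workflow along these lines captures the test-build-deploy flow described above. The job names, branch, and scripts here are hypothetical, not our exact workflow file:

```yaml
# Hypothetical sketch of the pipeline; names and steps are illustrative.
name: ci-cd
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test                        # run unit tests
  build-and-deploy:
    needs: test                               # deploy only if tests pass
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t auth-server .    # build the container image
      - run: ./scripts/push-and-deploy.sh     # push to ECR and update ECS
```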
Infrastructure as Code (IaC) was also implemented via Terraform. Benefits include:
- Reproducing infrastructure environments with identical steps, reducing inconsistencies between deployments
- Automating the provisioning and management of resources
- Handling resource dependencies intelligently, creating or modifying resources in the correct order
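For a flavour of what this looks like, here is a Terraform sketch of the ECS service and its scaling bounds from the budget section. The resource names and referenced cluster are hypothetical, not our actual configuration:

```hcl
# Illustrative Terraform sketch; resource names and values are hypothetical.
resource "aws_ecs_service" "auth" {
  name            = "auth-server"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.auth.arn
  desired_count   = 2                # minimum of 2 tasks across 2 AZs
  launch_type     = "FARGATE"
}

resource "aws_appautoscaling_target" "auth" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.auth.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 10            # auto scaling up to 10 tasks
}
```

Because Terraform resolves the references between these resources, it creates the cluster, task definition, service, and scaling target in the correct order, which is the dependency handling mentioned above.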
Availability View
The application is deployed in a multi-AZ environment.
- ALB distributes traffic across ECS Fargate containers
- If a Fargate container fails, the ALB will automatically route traffic to another ECS Fargate container
- Amazon Aurora replicates data across AZs
Architectural diagram and failover
ECS is scaled horizontally, with an Active-Active node configuration.
- Failure detection via active heartbeats
- Failover via the ALB
- Session state storage - client-side sessions
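The heartbeat-based detection and ALB failover above are configured through the load balancer's target group health checks. A sketch in Terraform, with hypothetical names and thresholds:

```hcl
# Illustrative ALB health-check configuration; names and values are hypothetical.
resource "aws_lb_target_group" "auth" {
  name        = "auth-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "ip"                  # Fargate tasks register by IP

  health_check {
    path                = "/health"   # heartbeat endpoint exposed by each task
    interval            = 15          # seconds between probes
    healthy_threshold   = 2
    unhealthy_threshold = 3           # mark unhealthy after 3 failed probes
  }
}
```

Once a task fails enough consecutive probes, the ALB stops routing traffic to it and the remaining healthy tasks absorb the load, which is the failover behaviour described above.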
Amazon Aurora is also scaled horizontally, with the same Active-Active node configuration.
Aside from that, the team conducted load testing with Apache JMeter to ensure that the application can handle the traffic specified in the requirements. The results are shown below.
JMeter Load Testing results
Screenshot of JMeter summary table results
Screenshot of JMeter summary report results
Security View
Here are the potential threats to our assets, and the possible mitigation controls:
Servers
- Threat - DDoS attacks or server exploits, which may lead to server outages and application unavailability
- Mitigation - CloudFront with WAF automatically mitigates DDoS attacks at the network and application layers
Data in transit
- Threat - man-in-the-middle (MITM) attacks between client and server, hijacking user credentials
- Mitigation - enable SSL/TLS (HTTPS) for the entire application
Data at rest
- Threat - SQL Injection / exploit APIs in web server, leading to unauthorised access or tampering of data such as passwords or user records
- Mitigation - block malicious payloads with WAF, hash user passwords, validate inputs, and disable public access to the database
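The password-hashing mitigation above can be sketched with PBKDF2 from the Python standard library. The helper names and iteration count are illustrative; our service's exact scheme may differ:

```python
import hashlib
import hmac
import os

# Minimal password-hashing sketch; 600k iterations follows common PBKDF2-SHA256
# guidance, but the exact parameters here are illustrative assumptions.
ITERATIONS = 600_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the plaintext password."""
    salt = os.urandom(16)  # a fresh random salt per user defeats rainbow tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

With this scheme, a database leak exposes only salted, slow-to-brute-force digests rather than reusable credentials.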
Personally Identifiable Information (PII)
Systems security
Performance View
Below are our strategies to address performance requirements of the project.
ECS Auto Scaling
- To cope with expected traffic peaks at specific times of the day, we rely on ECS auto scaling to adjust capacity
Route read requests to database replica
- To reduce the computational workload on the primary database, we route read-only requests to the read replica
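Aurora exposes separate writer and read-only cluster endpoints, so read/write splitting can be as simple as choosing an endpoint per statement. A deliberately simplistic sketch (the endpoint strings are hypothetical, and a real router must also handle cases like `SELECT ... FOR UPDATE` and transactions):

```python
# Hypothetical Aurora endpoints; real values come from the cluster configuration.
WRITER_ENDPOINT = "auth-db.cluster-xyz.ap-southeast-1.rds.amazonaws.com"
READER_ENDPOINT = "auth-db.cluster-ro-xyz.ap-southeast-1.rds.amazonaws.com"

# Statements that can safely be served by the replica.
READ_ONLY_PREFIXES = ("SELECT", "SHOW", "DESCRIBE")


def pick_endpoint(sql: str) -> str:
    """Send read-only statements to the replica; everything else to the writer."""
    if sql.lstrip().upper().startswith(READ_ONLY_PREFIXES):
        return READER_ENDPOINT
    return WRITER_ENDPOINT


assert pick_endpoint("SELECT id FROM users WHERE email = %s") == READER_ENDPOINT
assert pick_endpoint("UPDATE users SET mfa_enabled = 1") == WRITER_ENDPOINT
```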
Caching in CloudFront
- Static content will be retrieved from the closest point of presence. This allows for lower latency when serving static content
Conclusion
This marks the end of the project.
Overall, this was one of the more challenging projects I have done in SMU. It was my first time directly working and deploying code onto the cloud with AWS, allowing me to learn how different services work with each other.
Definitely the most technical project by far, and I thoroughly enjoyed it! Shoutout to everyone in the team for being super helpful, easygoing, and patient with one another despite most of us bidding alone for the module.
Thanks for reading :-)