CS6650 Building Scalable Distributed Systems
This is conceptually pretty simple. We’re going to implement the doPost() and doGet() methods so that they write to and read from a database, respectively. As always, simple things are not that simple in this course!
You only need to make minor changes to your client from assignment 1 (part 2), unless of course it has unnecessary synchronization. If you had low throughput in assignment 1, revisit your design and implementation with your instructors. There will be a reason ;)
The required change is to print out the following after the test has finished, along with the other statistics:
number of successful requests
number of failed requests
We of course want to see the latter as zero, or maybe a handful at most for each test.
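One way to collect these two counts without adding the kind of synchronization that hurts throughput is a pair of contention-friendly counters shared by all client threads. The sketch below is only an illustration: the class name `RequestStats`, its methods, and the choice to treat any 2xx status as a success are assumptions, not part of the assignment.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper (not part of the assignment): thread-safe tallies of request outcomes.
final class RequestStats {
    private final LongAdder successful = new LongAdder();
    private final LongAdder failed = new LongAdder();

    // Call once per completed request. Counting 2xx as success is an assumption;
    // adapt it to whatever retry and failure policy your client uses.
    void record(int httpStatus) {
        if (httpStatus >= 200 && httpStatus < 300) {
            successful.increment();
        } else {
            failed.increment();
        }
    }

    long successful() { return successful.sum(); }

    long failed() { return failed.sum(); }

    // Text to print with the other statistics once all threads have finished.
    String summary() {
        return "Number of successful requests: " + successful() + System.lineSeparator()
             + "Number of failed requests: " + failed();
    }
}
```

Each client thread would call `record(status)` after a request completes, and the main thread would print `summary()` after joining all threads. `LongAdder` is used rather than a `synchronized` counter because it is designed for frequent updates from many threads.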
You need to deploy, design and implement a database that enables you to:
Persist new album information, including the image and JSON-supplied data, during the doPost() method
Retrieve album information by primary key in the doGet() method.
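The two requirements above amount to a key-value access pattern: one write that creates a record and returns its key, and one read by that key. A minimal sketch of that boundary follows, assuming hypothetical names (`Album`, `AlbumStore`, `InMemoryAlbumStore`) and assuming the JSON-supplied data is kept as a string. The in-memory class is only a stand-in for local testing and does not meet the persistence requirement; a real implementation would sit behind the same interface and talk to your chosen database.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical record shape: the JSON-supplied data as a string plus the raw image bytes.
final class Album {
    final String profileJson;
    final byte[] image;

    Album(String profileJson, byte[] image) {
        this.profileJson = profileJson;
        this.image = image;
    }
}

// Hypothetical storage boundary that the servlet methods would call.
interface AlbumStore {
    String put(Album album);              // used by doPost(); returns the new primary key
    Optional<Album> get(String albumId);  // used by doGet(); empty if the key is unknown
}

// In-memory stand-in for local testing only. It does NOT persist writes.
final class InMemoryAlbumStore implements AlbumStore {
    private final ConcurrentMap<String, Album> rows = new ConcurrentHashMap<>();

    @Override
    public String put(Album album) {
        // A random UUID as the primary key is an assumption; a database-generated key works too.
        String albumId = UUID.randomUUID().toString();
        rows.put(albumId, album);
        return albumId;
    }

    @Override
    public Optional<Album> get(String albumId) {
        return Optional.ofNullable(rows.get(albumId));
    }
}
```

Keeping the servlet code dependent only on the interface makes it easier to swap databases later if the first choice turns out to be the bottleneck.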
You are free to choose any database you like that gives you the necessary safety guarantees (i.e. you can’t lose writes) and hopefully high performance. Obvious choices include:
AWS RDS (MySQL or PostgreSQL). You can initially deploy on a free tier instance to keep costs low.
MongoDB: There are managed services you can use but costs/latencies may be prohibitive. Installing MongoDB on its own instance could be a straightforward starting point
DynamoDB: Easy to access, fast. Cheap? Depending on how you configure it. Be careful and see additional notes below.
Others - talk to us …
Bear in mind you have a balanced workload - 50% write and 50% read. This should inform your data model design.
Use the same three workloads for your client as in assignment 1, and see what throughput you can achieve.
One free tier server for your servlets will probably get pretty busy, so you will want to introduce load balancing.
You can set up AWS Elastic Load Balancing using either Application or Network load balancers. Enable load balancing with 2 free tier EC2 instances and see what effect this has on your performance. Depending on your database, you may have to allocate connections to each server so that you don’t exceed maximum connections.
A tutorial here should help. Remember to create AWS templates for your instances.
Run your client against the load balanced servlets and see what effect it has on overall throughput.
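On the earlier point about allocating connections to each server: if every servlet instance keeps its own connection pool, the pool sizes have to add up to less than the database's connection limit. The arithmetic can be sketched as below. The class and method names are hypothetical, and the numbers in the usage note are illustrative rather than limits of any particular database.

```java
// Hypothetical helper: splits a database connection limit evenly across servlet servers.
final class ConnectionBudget {

    // Returns the largest per-server pool size that keeps the total within the limit.
    // reservedConnections leaves headroom for admin or monitoring sessions (an assumption).
    static int poolSizePerServer(int dbMaxConnections, int reservedConnections, int serverCount) {
        if (serverCount <= 0) {
            throw new IllegalArgumentException("serverCount must be positive");
        }
        int usable = dbMaxConnections - reservedConnections;
        if (usable < serverCount) {
            throw new IllegalArgumentException("not enough connections for every server");
        }
        return usable / serverCount; // integer division rounds down, so the total never exceeds usable
    }
}
```

For example, with an illustrative limit of 60 connections, 4 held in reserve and 2 load balanced servers, `ConnectionBudget.poolSizePerServer(60, 4, 2)` gives 28 connections per server. Check your own database's actual limit before choosing values.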
Somewhere you will probably have a bottleneck that you can try to address - use the available monitoring tools to find it. Then think about how to remove it, e.g.:
Database bottleneck - increase capacity (e.g. bigger server, higher throughput configuration)
Servlet bottleneck - increase capacity (e.g. more load balanced free tier VMs, beefed up instances)
There are a lot of variables here, so do your best. See if you can increase the throughput for the 30 Thread Group client configuration.
URL for your code repo
A short description of your data model (5 points) - Please state size of image used if not using the stock image, and also Database/File storage solution.
Output windows for the 3 client configuration tests run against a single server/DB (5 points)
Output windows for the 3 client configuration tests run against two load balanced servers/DB (15 points)
Output window for optimized server configuration for client with 30 Thread Groups. Briefly describe what configuration changes you made and what % throughput improvement you achieved (15 points)
For items 3, 4 and 5, “output window” means an output window similar to the one from A1 client part II, which contains
A table of the results for each stage for items 3, 4 and 5, and an overall table comparing the results across 3, 4 and 5.
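The % throughput improvement asked for in the optimized configuration item can be computed relative to the baseline run. A small sketch, with a hypothetical class name and made-up throughput figures in the usage note:

```java
// Hypothetical helper for reporting the change between two test runs.
final class ThroughputComparison {

    // Percentage change of the optimized throughput relative to the baseline throughput.
    // Both values must use the same unit, e.g. requests per second.
    static double percentImprovement(double baselineThroughput, double optimizedThroughput) {
        if (baselineThroughput <= 0) {
            throw new IllegalArgumentException("baseline throughput must be positive");
        }
        return (optimizedThroughput - baselineThroughput) / baselineThroughput * 100.0;
    }
}
```

With made-up figures, a baseline of 2000 requests/second and an optimized run at 2500 requests/second gives `percentImprovement(2000, 2500)`, which is 25.0. A negative result means the change made things slower.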
Other optional but highly recommended to have in your submission:
For those interested in DynamoDB
If you are interested in using DynamoDB, here are some hopefully useful resources:
DynamoDB Pricing: Pay attention to the difference between “provisioned capacity mode” and “on-demand capacity mode”. As to how to set billing mode in code, check here