CS6650 Building Scalable Distributed Systems
This is part 1 of a series of assignments that will guide you through designing a scalable distributed system running on AWS.
We’re going to implement a simple music service that stores data about albums. In this assignment we’ll just build the simple API and a client that we can use to test performance.
The API specification is built using OpenAPI 3.0, and the Swagger toolset. It’s here. You’ll need to implement the endpoints that are defined for POST and GET.
In this assignment, you are just implementing API stubs that:
You should implement the API as:
Test the API with POSTMAN or an equivalent HTTP testing tool. You should first test these APIs on servers running on your laptop (i.e. localhost). Here’s an image you can use for testing.
Once the servers are running locally, deploy them to an AWS free tier instance running in Oregon, on which you have installed Java/Tomcat v8/9.0 (not 10).
If you cross compile your Go server on your laptop (instructions in labs), you should just be able to transfer the executable and run it on AWS.
Now the fun starts. We want a client that can exert request loads on our servers so we can compare their performance when running on AWS.
First, just write a trivial client to test connectivity and get to grips with client APIs. See the Client API Implementation section below for some details/options.
Once you have the client API calls working (just like in POSTMAN), write a Java program that:
Once all threads have completed, your program should output in the command window:
If the client receives a 5XX response code (Web server error), or a 4XX response code (from your servlet), it should retry the request up to 5 times before counting it as a failed request. This probably means your server or network is down, or very overloaded.
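The retry rule above can be sketched roughly as follows. This is a minimal sketch, not the required design: RetryingSender and sendWithRetry are hypothetical names, the request is abstracted as an IntSupplier that returns the HTTP status code, and "up to 5 times" is read here as 5 attempts in total (an assumption; adjust if you read it as 5 retries after the first attempt).

```java
import java.util.function.IntSupplier;

/** Hypothetical helper: repeats a request until it succeeds or attempts run out. */
final class RetryingSender {
    // Assumption: "up to 5 times" is interpreted as 5 attempts in total.
    static final int MAX_ATTEMPTS = 5;

    /**
     * Calls sendOnce (which performs one request and returns its HTTP status
     * code) until the status is below 400 or MAX_ATTEMPTS is reached.
     * Returns true on success, false if the request counts as failed.
     */
    static boolean sendWithRetry(IntSupplier sendOnce) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            int status = sendOnce.getAsInt();
            if (status < 400) {
                return true; // neither a 4XX nor a 5XX response
            }
        }
        return false; // every attempt returned 4XX or 5XX
    }
}
```

Each client thread could wrap its POST and GET calls in sendWithRetry and increment a shared failure counter whenever it returns false.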
Note that testing on the College network might trigger firewall rules that block your requests. You may need to do proper load testing at home, or a coffee shop, or bar, depending on your state of mind ;)
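One possible shape for the load-generating part of the client is sketched below. It is only a sketch under assumptions: ThreadGroupRunner is a hypothetical class, the parameter names threadGroupSize, numThreadGroups and delay are borrowed from the test configuration mentioned later in this document, and the code assumes that groups are started one after another with the delay between them. The work each thread does is passed in as a Runnable.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical runner: starts groups of threads and measures wall time. */
final class ThreadGroupRunner {
    /**
     * Starts numThreadGroups groups of threadGroupSize threads, pausing
     * delayMs between groups (assumed meaning of "delay"), runs task once
     * in every thread, waits for all of them, and returns the wall time in ms.
     */
    static long run(int threadGroupSize, int numThreadGroups, long delayMs, Runnable task)
            throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        long start = System.currentTimeMillis();
        for (int g = 0; g < numThreadGroups; g++) {
            for (int t = 0; t < threadGroupSize; t++) {
                Thread thread = new Thread(task);
                thread.start();
                threads.add(thread);
            }
            if (g < numThreadGroups - 1) {
                Thread.sleep(delayMs); // no pause needed after the last group
            }
        }
        for (Thread thread : threads) {
            thread.join(); // wait until every thread has finished its requests
        }
        return System.currentTimeMillis() - start;
    }

    /** Throughput in requests per second for a completed run. */
    static double throughputPerSecond(long totalRequests, long wallTimeMs) {
        return totalRequests * 1000.0 / wallTimeMs;
    }
}
```

A thread pool (ExecutorService) would work just as well; plain threads are used here only to keep the sketch short.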
Use your client to load test the Java and Go servers running on an AWS instance. You should run 3 test loads for each server, running the client on your laptop i.e.:
Plot the throughput for the tests and compare the performance of the Go and Java implementations.
With your load generating client working wonderfully, we want to now instrument the client so we have deeper insights into the performance of the system.
To this end, for each (POST/GET) request:
Once all threads/thread groups are completed, calculate for both POST and GET:
The client should calculate these and display them in the output window in addition to the output from the previous step, and then cleanly terminate.
You want to do all the processing of latencies to generate the statistics in your client after the test completes.
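As an illustration of that post-test processing, here is a minimal sketch that computes mean, median, p99 and max from the recorded latencies once all threads are done. LatencyStats is a hypothetical class name, latencies are assumed to be in milliseconds, and the nearest-rank definition of a percentile is one choice among several.

```java
import java.util.Arrays;

/** Hypothetical summary of latencies, computed once after the test completes. */
final class LatencyStats {
    final double meanMs;
    final long medianMs;
    final long p99Ms;
    final long maxMs;

    LatencyStats(long[] latenciesMs) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no latencies recorded");
        }
        long[] sorted = latenciesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        meanMs = Arrays.stream(sorted).average().getAsDouble();
        medianMs = percentile(sorted, 50);
        p99Ms = percentile(sorted, 99);
        maxMs = sorted[sorted.length - 1];
    }

    /** Nearest-rank percentile: smallest value with at least p% of samples at or below it. */
    static long percentile(long[] sorted, int p) {
        // Integer ceiling of p * n / 100, computed in long to avoid overflow.
        int rank = (int) (((long) p * sorted.length + 99) / 100);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

You would build one LatencyStats for the POST latencies and one for the GET latencies, so the timed section of the client only has to append numbers to a list or array.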
Carefully compare wall times with identical test loads from Step 3. If you see more than a 5% degradation, you need to look carefully at your implementation and make it more efficient, as you do not want to reduce the load exerted on the server.
Repeat Step 3 above, plotting results and comparing the Go and Java performance. The results should look similar to Step 3.
For each test include a command window image showing your client output and the calculated results for each test.
You need to plot a graph that shows the average throughput of requests for the period (wall time) of your tests. To do this, create a chart that has:
x-axis values: unit is seconds, from 0 to test wall time, with intervals of one second
y-axis values: unit is throughput/second, showing the number of requests completed in each second of the test
You can create the chart programmatically or just dump the values into a spreadsheet and go for your life! Just show one chart for the (threadGroupSize = 10, numThreadGroups = 30, delay = 2) test configuration for either server.
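The values for the chart described above can be produced by bucketing request completion times into one-second intervals. The following is a minimal sketch: ThroughputSeries and perSecond are hypothetical names, and it assumes you recorded the test start time, the test end time and each request's completion time in milliseconds.

```java
/** Hypothetical helper: turns completion timestamps into per-second counts. */
final class ThroughputSeries {
    /**
     * Returns counts where counts[i] is the number of requests that completed
     * during second i of the test (second 0 starts at testStartMs).
     */
    static int[] perSecond(long testStartMs, long testEndMs, long[] completionTimesMs) {
        int seconds = (int) ((testEndMs - testStartMs) / 1000) + 1;
        int[] counts = new int[seconds];
        for (long t : completionTimesMs) {
            int bucket = (int) ((t - testStartMs) / 1000);
            if (bucket >= 0 && bucket < seconds) {
                counts[bucket]++; // timestamps outside the test window are ignored
            }
        }
        return counts;
    }
}
```

Writing each index and count as a line of CSV gives you the x and y values to paste into a spreadsheet.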
Submit your work to Canvas Assignment 1 as a pdf document. The document should contain:
the URL for your git repo. Make sure that the code for client part 1 and client part 2 is in separate folders in your repo
a 1-2 page description of your client design. Include major classes, packages, relationships, whatever you need to convey concisely how your client works.
Client (Part 1) - A plot showing the throughput for the tests, comparing the two servers. This should also include a screenshot of your output window with your wall time and throughput for each of the 6 tests.
Client (Part 2) - run the client as per Part 1, showing the output window for each run with the specified performance statistics listed at the end, and a plot comparing the two servers.
The plot of your throughput over time for a single test
Server implementations working (10 points)
Client design description (5 points) - clarity of description, good design practices used
Client Part 1 - (10 points) - Output window showing best throughput. Somewhere around 2k/sec should be a minimum target at higher loads
Client Part 2 - (10 points) - 5 points for throughput within ~5% of Client Part 1. 5 points for calculations of mean/median/p99/max/throughput (as long as they are sensible).
Step 6: Plot of throughput over time (5 points)
First you need to get a Java client to call your server APIs. You can generate a client API from the Swagger specification page. Look at:
Export-Client SDK-Java options (web page, top right)
Unzip the client and follow the instructions in the README to incorporate the generated code in your client project.
The generated code contains classes and methods for calling the server APIs.
Write a simple test that calls the API before proceeding, to establish that you have connectivity. The examples in the README and documentation are your friends ;).
To connect to your remote server on EC2 you need to call an ApiClient method (hint - setBasePath(…)). It’s recommended to create an instance of ApiClient per thread in a multithreaded environment to avoid any potential issues. You then pass this ApiClient object to your specific xxxApi object. Again, look at the constructors for xxxApi objects. (Note - you should never modify generated code - in this case you don’t need to!)
If you don’t want to figure out the Swagger client, you can use the Java 11 HTTP client classes or the Apache Java HTTP API. These are both pretty straightforward, especially with AI to help!
You need to modify your POM to add the following dependencies:
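For the Java 11 HTTP client option, a connectivity check might look roughly like the sketch below. ConnectivityCheck and its methods are hypothetical names, and the "/albums/{id}" path is an assumed placeholder; use the paths defined in the API specification.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical connectivity check built on java.net.http (Java 11+). */
final class ConnectivityCheck {
    /** Builds a GET for one album. The "/albums/" path is an assumption, not from the spec. */
    static HttpRequest albumGet(String baseUrl, String albumId) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/albums/" + albumId))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    /** Sends the request, discards the body, and returns the HTTP status code. */
    static int statusOf(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }
}
```

A typical call would be statusOf(HttpClient.newHttpClient(), albumGet("http://localhost:8080", "1")), with the base URL replaced by wherever your server is listening. Reusing one HttpClient for many requests lets it keep connections open between calls.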
<dependency>
    <groupId>javax.xml.bind</groupId>
    <artifactId>jaxb-api</artifactId>
    <version>2.2.11</version>
</dependency>
<dependency>
    <groupId>com.sun.xml.bind</groupId>
    <artifactId>jaxb-core</artifactId>
    <version>2.2.11</version>
</dependency>
<dependency>
    <groupId>com.sun.xml.bind</groupId>
    <artifactId>jaxb-impl</artifactId>
    <version>2.2.11</version>
</dependency>
<dependency>
    <groupId>javax.activation</groupId>
    <artifactId>activation</artifactId>
    <version>1.1.1</version>
</dependency>
<dependency>
    <groupId>javax.annotation</groupId>
    <artifactId>javax.annotation-api</artifactId>
    <version>1.3.2</version>
</dependency>
If you are thinking of using the stub server generated from the Swagger web page for the API, here is one way to get it running locally.
First way: set up a module import
In the go folder where the router.go file resides, run the go mod init and go mod tidy commands to create and update the go.mod file
In the folder where the main.go file resides,
change the import sw "./go" to something like sw "example.com/router"
run the go mod edit -replace example.com/router=/go command. This tells Go to look in the /go folder when resolving references to "example.com/router"
run the go mod tidy command
It is highly recommended that you go through the tutorial below to understand how to import another module from your local machine.
https://go.dev/doc/tutorial/create-module
Alternative way: place all files in the same module (folder)
Move all the .go files under the /go folder into the same folder as the main.go file
Change the package name of all the .go files to match the package name of the main.go file
Run the go mod init and go mod tidy commands
Run main.go
If you opt for this route, your folder might grow very large, with lots of files, as you add more functions and structs to the same module. That may not be a tidy way to do things.
Note that the generated stub server doesn’t contain any logic. You will probably still need to do additional work to adapt it to your needs, especially when you need to process the multipart file from the request.
The stub server also does not use the gin framework, so it may be slower.
The alternative is to code the server from scratch based on the API.
Sometimes there’s an issue building the GSON jar file into a servlet, such that when the servlet is deployed it fails because GSON is missing.
Try adding the gson jar to your project from the Maven website
For IntelliJ you also need to:
Compile and deploy … and cross your fingers and toes!!
For more details check this out
Worth a read for random number generation in your client ;)
Your client threads want to keep a connection open and send many requests. Check out this stackoverflow post to delve into the complexities of how to do it properly.
You can calculate approximate percentiles efficiently using histograms. Google around - there are lots of sites that explain the approach. I haven’t tried this implementation, but it sounds promising.
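To give a feel for the histogram idea, here is a minimal sketch (not the implementation linked above). LatencyHistogram is a hypothetical class that keeps one counter per millisecond of latency, clamps anything above an assumed maximum into the last bucket, and walks the counters to find a percentile. It uses AtomicLongArray so several client threads can record into it at once.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical fixed-width histogram: bucket i counts latencies of i ms. */
final class LatencyHistogram {
    private final AtomicLongArray buckets; // last bucket also holds overflow values

    LatencyHistogram(int maxMs) {
        buckets = new AtomicLongArray(maxMs + 1);
    }

    /** Records one latency; values outside [0, maxMs] are clamped into range. */
    void record(long latencyMs) {
        int i = (int) Math.min(Math.max(latencyMs, 0), buckets.length() - 1);
        buckets.incrementAndGet(i);
    }

    /** Returns the latency (ms) at or below which at least p% of samples fall. */
    long percentile(int p) {
        long total = 0;
        for (int i = 0; i < buckets.length(); i++) {
            total += buckets.get(i);
        }
        if (total == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        long rank = Math.max(1, (p * total + 99) / 100); // integer ceiling of p% of total
        long seen = 0;
        for (int i = 0; i < buckets.length(); i++) {
            seen += buckets.get(i);
            if (seen >= rank) {
                return i;
            }
        }
        return buckets.length() - 1;
    }
}
```

Memory use depends only on maxMs, not on the number of requests, and no sorting is needed. The result is approximate in the sense that it is limited to the bucket width and to the clamped maximum.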