The objective of the Digital Services Programme was to deliver joined-up digital services that met Bristol City Council’s customers' needs and wants, with the ultimate aim of providing “digital services so good that people prefer to use them”.
The platform was based on an integrated set of technologies and systems using open data, open standards and open-source (where possible). Information had to be shared seamlessly whilst reducing overall cost.
The platform consisted of a Java front end running on a Java portal server, with TIBCO middleware linking to several internal and external applications, including Liferay, Salesforce and other third-party infrastructure and applications such as Capita Payments.
The assignment was to develop an open-source performance testing framework: first to test the performance of the system at the current stage of the project, and then to integrate with the Agile development methodology in use, so that regular performance tests could be run in step with the Continuous Integration deployment of code. We recommended JMeter as the open-source performance testing tool, alongside the Jenkins Continuous Integration server, to deliver the assignment.
- Ensure the desired number of users could simultaneously access the platform
- Determine if existing infrastructure could accommodate the initial externally-facing modules for public access
- Ascertain if the system could handle anticipated levels of traffic AND deliver an acceptable level of performance
- Identify bottlenecks - the overall system under test included several constituent systems with multiple interfaces
- Deliver a robust and scalable performance testing framework into the Agile project delivery process
Step 1: Planning
We developed a comprehensive test plan to agree scope, transactions, user numbers, environments, testing objectives, entry and exit criteria, test data, reporting structure, defect management processes, schedule and contact details for all personnel on the project. This included obtaining agreement on all key scenarios to be tested as a part of the project, such as Unregistered Users Getting Resident and Visitor Parking Permits, Unregistered User Applying for Concessionary Travel Bus Pass, and Parking Renewals.
Test data volumes were then calculated and obtained - in this case, a list of addresses and postcodes. All of the scenarios were dependent on postcode and address data, which were listed in text files to be accessed by JMeter during the tests. Once the test plan had been agreed and the data had been provided, automated scripts were generated for all scenarios in JMeter, before being integrated into the overall Framework. In parallel, the test environment and load generators were set up and tested.
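As a rough illustration of the data preparation described above, the sketch below writes address and postcode pairs to a plain-text file of the kind that JMeter's CSV Data Set Config element can read. The column order, the function name and the sample rows are assumptions made for illustration; the data files actually used on the project are not reproduced in this case study.

```python
import csv
import io

# Placeholder rows invented for illustration only (not project data).
SAMPLE_ROWS = [
    ("1 Example Street", "bs1 1aa "),
    ("2 Sample Road", "BS2 2BB"),
]


def write_test_data(rows, out):
    """Write address,postcode records, one per line, with no header row."""
    writer = csv.writer(out, lineterminator="\n")
    for address, postcode in rows:
        # Trim stray whitespace and upper-case the postcode so that every
        # virtual user submits consistently formatted input.
        writer.writerow([address.strip(), postcode.strip().upper()])
    return len(rows)


# An in-memory buffer stands in for the text file on the load generator.
buffer = io.StringIO()
written = write_test_data(SAMPLE_ROWS, buffer)
print(written, "records written")
print(buffer.getvalue())
```

Keeping one record per line means each virtual user can pick up a different address on each iteration, which avoids every user hammering the same postcode.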
The final step was to ensure that server monitors were in place and working, so that problems could be identified and bottlenecks pinpointed. JMeter performance monitors and shell scripts were put in place on key servers to gather the necessary statistics: JMeter listeners captured the test results, scripts were run on the Linux servers, and perfmon counters were configured and run on the Windows servers. Ant was integrated with JMeter to generate summary reports with additional data such as page hits, transactions and error codes.
Step 2: Test Execution
A set of tests was run initially on the application, a subset of which could be run automatically overnight via the Jenkins server:
- Debug Test: 10 users over 15 minutes – a smoke test to ensure the scripts remained valid and that all interfaces were in place. Unless this test ran successfully, the other tests would not be executed.
- Normal Load, Parking Module: 50 users over 30 minutes – a mix of pre-registered and unregistered users carrying out normal user activities on the system, applying for and renewing parking permits.
- Normal Load, Concessionary Travel Module: 50 users over 30 minutes – a peak load of users requesting concessionary travel vouchers only, in order to check this module of the system individually.
- Normal Load, Parking Module: 50 users over 30 minutes – pre-registered users only applying for new and renewal parking permits. This scenario was developed to distinguish between the full activity of users registering from scratch and then applying for parking permits, and applying for parking permits alone using pre-registered accounts.
- Peak Load, All Scenarios: 200 Users, 1 hour – All scenarios combined together and tested simultaneously, in order to simulate real-world use of the system at peak volumes.
- Soak Test, All Scenarios: 120 users, 2 hours – All scenarios combined together at a realistic volume and run over a longer period in order to identify any issues that occur over time, particularly memory leaks.
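The test set above, and the rule that nothing else runs unless the Debug Test passes, can be sketched as data plus a small gate. This is a minimal illustration rather than the project's framework: `run_scenario` is a hypothetical callable that would launch one JMeter run and report success, and the bracketed suffixes on the two Parking Module names are added here only to tell them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    users: int
    minutes: int


# User counts and durations are taken from the list above.
DEBUG = Scenario("Debug Test", 10, 15)
LOAD_TESTS = [
    Scenario("Normal Load, Parking Module (mixed users)", 50, 30),
    Scenario("Normal Load, Concessionary Travel Module", 50, 30),
    Scenario("Normal Load, Parking Module (pre-registered)", 50, 30),
    Scenario("Peak Load, All Scenarios", 200, 60),
    Scenario("Soak Test, All Scenarios", 120, 120),
]


def run_suite(run_scenario):
    """Run the Debug Test first; run the load tests only if it passes.

    Returns the names of the scenarios that were actually executed.
    """
    executed = [DEBUG.name]
    if not run_scenario(DEBUG):
        return executed  # smoke test failed, so skip everything else
    for scenario in LOAD_TESTS:
        run_scenario(scenario)
        executed.append(scenario.name)
    return executed


# Stand-in runners: one that always fails and one that always succeeds.
print(run_suite(lambda scenario: False))
print(len(run_suite(lambda scenario: True)), "scenarios executed")
```

Gating on the smoke test keeps a broken script or a missing interface from wasting a one- or two-hour run.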
What happened next?
Initial tests generated many 500 errors and slow transaction times when requesting permits or travel vouchers, even at relatively low levels of load. Configuration changes were initially made to the Parking Module and performance did improve, before the Concessionary Travel (CT) module was integrated into the system. On implementation, the CT module performed well at normal levels of use, giving the team confidence that the integration could proceed. Once this element was proven in isolation, load scenarios on both the Parking and CT modules could be executed. Initial tests again generated 500 errors, although fewer, which indicated a bottleneck with a third-party component, the postcode lookup.
These problems continued into the Stress Tests, where the initial tests were ramped up to 300 concurrent users and produced further 500 errors and, subsequently, 503 errors. At this point, the decision was taken to scale down the stress test to 200 users, as this was in line with the number of users expected to access the system once it was live.
The 503 errors were the result of a proxy server going into ‘protective mode’ once the number of 500 errors increased, in order to protect downstream services. Changes to the proxy server were made at this point to reduce these errors and streamline the process, allowing slightly longer response times for transactions including postcode and vehicle lookup. The Soak Test was then run, and initially showed high average and maximum response times for key transactions, although the transactions completed successfully.
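The ‘protective mode’ behaviour resembles a circuit breaker, and a toy version shows why 503 errors appeared only after the 500 errors had built up. This is a sketch under stated assumptions, not the configuration of the proxy used on the project: the class name, the failure threshold and the reset-on-success rule are all invented for illustration.

```python
class ProtectiveProxy:
    """Toy proxy that stops forwarding after repeated backend 500 errors."""

    def __init__(self, backend, max_failures=3):
        self.backend = backend            # callable returning an HTTP status
        self.max_failures = max_failures  # assumed threshold, not a real setting
        self.failures = 0                 # consecutive 500s seen so far

    def handle(self):
        if self.failures >= self.max_failures:
            # Protective mode: answer 503 without touching the backend.
            return 503
        status = self.backend()
        if status == 500:
            self.failures += 1
        else:
            self.failures = 0  # a healthy response clears the count
        return status


# A backend that always fails trips the breaker on the fourth request.
proxy = ProtectiveProxy(lambda: 500, max_failures=3)
print([proxy.handle() for _ in range(5)])  # [500, 500, 500, 503, 503]
```

A real proxy would normally leave protective mode again after a timeout or a successful probe; that recovery step is omitted here to keep the sketch short.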
After each test an Interim Test report was generated, which included details of transaction summaries, server stats, transactions per second, hits per second and graphs, with a summary of the test result and any observations/recommendations. Once all the tests were completed, a final report was produced, summarising the performance test phase as a whole.
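The transaction summaries in such a report can be derived from JMeter's results log. The sketch below assumes the results were saved in CSV format with a header row that includes the `elapsed`, `label`, `responseCode` and `success` columns; the sample rows and transaction labels are invented, and the Ant-based reporting used on the project is not reproduced here.

```python
import csv
import io
from collections import Counter, defaultdict

# Invented sample in the assumed CSV layout (not project results).
SAMPLE_JTL = """timeStamp,elapsed,label,responseCode,success
1,200,Postcode Lookup,200,true
2,400,Postcode Lookup,500,false
3,100,Home Page,200,true
"""


def summarise(jtl_file):
    """Return per-transaction timing statistics and response-code counts."""
    elapsed = defaultdict(list)
    codes = Counter()
    for row in csv.DictReader(jtl_file):
        elapsed[row["label"]].append(int(row["elapsed"]))
        codes[row["responseCode"]] += 1
    summary = {
        label: {
            "count": len(times),
            "avg_ms": sum(times) / len(times),
            "max_ms": max(times),
        }
        for label, times in elapsed.items()
    }
    return summary, codes


summary, codes = summarise(io.StringIO(SAMPLE_JTL))
for label, stats in summary.items():
    print(label, stats)
print("Response codes:", dict(codes))
```

Counting response codes separately from timings makes it easy to see whether slow averages coincide with a rise in 500 or 503 errors.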
Jenkins was configured to run the JMeter file and was scheduled to run every day at 23:00 for the selected scenario, which was usually based on a low number of users for 13-15 minutes as a performance smoke test. If any issues were identified by the overnight test, it was then possible the next morning to run additional scenarios to further isolate the problems so fixes could be put in place.
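A scheduled job of this kind typically invokes JMeter in non-GUI mode. The sketch below only builds the command line, using JMeter's `-n` (non-GUI), `-t` (test plan), `-l` (results log) and `-J` (property) options. The file names and the `users` and `duration` property names are assumptions; they work only if the test plan reads those properties, for example via `${__P(users)}`.

```python
def jmeter_command(plan, results, users, minutes):
    """Build a non-GUI JMeter command line for one scenario."""
    return [
        "jmeter",
        "-n",                          # run without the GUI
        "-t", plan,                    # test plan (.jmx) to execute
        "-l", results,                 # file to log sample results to
        f"-Jusers={users}",            # assumed property name
        f"-Jduration={minutes * 60}",  # assumed property name, in seconds
    ]


# Overnight smoke test: 10 users for 15 minutes (hypothetical file names).
command = jmeter_command("smoke.jmx", "smoke.jtl", users=10, minutes=15)
print(" ".join(command))
```

In Jenkins, a "Build periodically" schedule of `0 23 * * *` would correspond to a daily run at 23:00, with the job executing a command like the one printed above.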
Once the tests were complete, Bristol City Council took ownership of the automated scripts and scenarios used in the testing, as well as the integration with Jenkins in the development environment. The tests are kept as a re-usable resource so they can be re-run when necessary, for example if a change is made to the environment.
The implementation of JMeter and Jenkins at Bristol City Council allowed a series of tests to be built and run against an evolving platform to provide the Council’s key services online. The tests were structured to gradually prove each element of the system individually, at increasing volumes, before modules were integrated together and the performance of the solution as a whole could be measured. Enhancements and configuration changes were made along the way, so that the Council could be confident that the new platform would stand up to the expected load, at normal and peak volumes.