ChatGPT and Software Testing: Understanding Context

Friday, 15 September 2023

The rise of ChatGPT is a subject of much discussion in the software testing industry, with a myriad of articles on the topic. In this blog, we take stock of current thinking and look at two practical applications of the tool in testing: documentation and test automation.

Some voices of concern imagine a future where AI-driven tools replace human testers, leading to an industry dominated by machines. Yet a more pragmatic consensus is emerging, one that casts AI as a collaborator rather than a replacement. ChatGPT and similar models are undeniably advanced, but they cannot replicate the creativity, intuition, and curiosity that a seasoned tester brings to a project. Instead, they act as enhancers, fine-tuning processes and providing insights. In parallel, the same tools offer improvements to the development process. AI-assisted code review, suggestions, and improvements can help level up junior and mid-level developers, which should bring further benefits to overall software quality.

One theme runs throughout this article: the importance of context. Context is the key to truly amplifying ChatGPT's utility in many areas, but especially in testing. It can include:

- a set of user requirements
- the number of users expected to access an app at peak, broken down by business process
- application versions
- details of test environments
- defect logs
- HTML and field identifiers
- expected outcomes

At a granular unit level, context might encompass the specifications of a module, its expected behaviour, or even integration touchpoints. Given this, ChatGPT can draft an outline for automation scripts. By feeding it richer information, we open the door to more informed, precise, and valuable AI-driven insights in software testing. In this blog, we focus on two personas, the test manager and the test analyst, with one example use of the technology for each. The examples highlight the importance of setting context and offer some ideas on the sort of prompts that can quickly deliver benefits for both groups.

Test Managers: Strategy Design

A foundational aspect of a Test Manager's role is to design and implement test strategies for specific projects, programmes or even entire organisational units. By providing ChatGPT with explicit project details and requirements, a Test Manager can get a head start on documentation. For instance, a Test Manager responsible for a major migration project can share the project context with ChatGPT. The AI, in turn, can suggest:

  • Key focus areas based on common pitfalls and challenges of such migrations
  • Recommendations on testing tools and environments
  • Suggestions for phased testing, given the context
  • A useful outline format for the strategy that can be expanded on

This is worth trying, as it shows how much of the initial work can be sped up. The results also improve considerably as more context is provided.

Getting the AI to ask questions about the details of the implementation can hugely improve the quality of the output. This prompts ChatGPT to request additional information, which the test manager should already have to hand or can go and fetch. For example:
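As an illustration only, the sketch below shows one way to structure such a request. It assumes a team keeps project details as simple key/value notes. The helper name `build_strategy_prompt` and the example project details are hypothetical. The point is the shape of the prompt: explicit context first, then an instruction asking ChatGPT to raise questions before it drafts anything.

```python
def build_strategy_prompt(project: dict[str, str]) -> str:
    """Assemble a test-strategy prompt from project details (hypothetical helper)."""
    # One line per known fact, so gaps in the context are easy to spot before sending.
    context = "\n".join(f"- {field}: {value}" for field, value in project.items())
    return (
        "Acting as an experienced test manager, draft an outline test strategy "
        "for the project described below.\n"
        f"Project context:\n{context}\n"
        "Before drafting, ask me any questions about the implementation "
        "that would improve the quality of the strategy."
    )


# Hypothetical project details, for illustration only.
example = build_strategy_prompt({
    "Project type": "CRM data migration",
    "Peak concurrent users": "500",
    "Test environments": "SIT, UAT",
})
print(example)
```

The answers ChatGPT receives to its questions can then be fed back in the same conversation before it produces the outline.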


There are lots of other ways ChatGPT can help with test management, including:

  • Resource planning
  • Historical data analysis for test outcomes and defect summaries to predict likely hotspots for future defects
  • Risk analysis and risk-based testing
  • Test planning
  • Developing matrix summaries of defect data for test completion reports
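A defect summary matrix is also a useful cross-check on AI-generated figures. Computing the counts directly from an exported defect log makes it easy to confirm that a ChatGPT summary adds up. The sketch below makes three assumptions:

- the log is exported as records with hypothetical `module` and `severity` fields
- severity uses a four-level scale
- the sample log is invented for illustration

Sorting by total gives a crude first view of hotspots, not a prediction.

```python
from collections import Counter

SEVERITIES = ["Critical", "High", "Medium", "Low"]  # assumed severity scale


def defect_matrix(defects: list[dict[str, str]]) -> list[list]:
    """Build a module-by-severity count matrix, busiest modules first."""
    counts = Counter((d["module"], d["severity"]) for d in defects)
    body = []
    for module in {d["module"] for d in defects}:
        row = [counts[(module, sev)] for sev in SEVERITIES]
        # Severities outside SEVERITIES are not counted in this sketch.
        body.append([module, *row, sum(row)])
    body.sort(key=lambda r: (-r[-1], r[0]))  # highest totals first; ties alphabetical
    return [["Module", *SEVERITIES, "Total"], *body]


# Hypothetical defect log, for illustration only.
log = [
    {"module": "Login", "severity": "High"},
    {"module": "Payments", "severity": "Critical"},
    {"module": "Login", "severity": "Low"},
]
for row in defect_matrix(log):
    print(*row, sep="\t")
```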


Test Engineers & QA Analysts: Test Automation

Test automation is a great example of the power of ChatGPT and how it can help today's test teams. Several AI use cases are already feeding into improvements in testing tools, self-healing regression packs and object recognition to name but two. For large language models, one of the most compelling is simplifying the creation of automated tests. Tests described in natural language can be turned into code in the framework and language of choice. This also opens up the possibility of less technical testers automating more quickly and working across multiple languages. It also makes tests easier to port between, for example, web and mobile.

For ChatGPT to generate scripts that interact with UI elements, it needs to understand the structure and unique identifiers associated with each component. This returns us to our context theme. Good context ensures that generated scripts locate and interact with the right parts of the application. Below, we look at an example of testing a UI. Testing APIs is also possible and highly beneficial, and we will cover it in a dedicated article.



When testing a GUI, the context can be provided up front in the form of field names, data types, element IDs and so on. Alternatively, the HTML from a webpage can be entered directly, which is surprisingly effective.
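When HTML is pasted in, it can help to trim it to the parts the model needs. The sketch below uses Python's standard-library `html.parser` to reduce a page to its form controls. It prints them in the same "Element Description / Identifier" style as the prompt below. It assumes the relevant controls carry `id` attributes, and the sample markup (including `login_button`) is hypothetical. Trimming like this also helps keep prompts within the token limits discussed later.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Record the id and a human-readable label for each form control."""

    CONTROL_TAGS = {"input", "select", "textarea", "button"}

    def __init__(self):
        super().__init__()
        self.fields: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.CONTROL_TAGS:
            return
        attributes = dict(attrs)
        element_id = attributes.get("id")
        if not element_id:
            return  # controls without an id are skipped in this sketch
        # Prefer visible or accessible text; fall back to the tag and its type.
        label = (attributes.get("placeholder") or attributes.get("aria-label")
                 or f"{tag} ({attributes.get('type') or 'no type'})")
        self.fields.append((label, element_id))


def summarise_form(html: str) -> str:
    collector = FormFieldCollector()
    collector.feed(html)
    collector.close()  # flush any buffered input
    return "\n".join(
        f'Element Description: "{label}", Identifier:- ID: "{element_id}"'
        for label, element_id in collector.fields
    )


sample = """
<form>
  <input id="username_input" type="text" placeholder="Username">
  <input id="password_input" type="password" placeholder="Password">
  <button id="login_button" type="submit">Log in</button>
</form>
"""
print(summarise_form(sample))
```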

Something like the prompt below can be used to provide context so that ChatGPT can prompt for the information it needs:

Prompt: “acting as a test automation specialist, I want you to provide test automation code using Selenium with Python for a login screen for an online membership portal:

Element Description: "Username input field", Identifier:- ID: "username_input"

Element Description: "Password input field", Identifier:- ID: "password_input"

usernames to be tested are user1, user2, user3 and passwords should be up to 12 characters long and will be provided separately.

Assume Selenium WebDriver is already installed, set up with all necessary modules. Please suggest any improvements to my prompt and ask any questions that will improve the quality of your output.”

This resulted in ChatGPT requesting clarification on the following points:

  1. URL of the Login Page: We need to know the URL where the login page is hosted to navigate to it.
  2. Expected Outcome: Once we input the username and password, what's the expected outcome? For instance, is there a success message we should verify, or should we check if we have navigated to a certain page?
  3. Passwords for Testing: You've mentioned the usernames to be tested, but the exact passwords must be provided. If they are sensitive, you can use placeholders in the example and then replace them in your actual tests.
  4. Browser Preference: Which browser (Chrome, Firefox, etc.) would you like to use for this testing? For this example, I'll assume Chrome.

Supplying this information produced a Python automation script that made a good start. The same prompt could just as easily generate it in other languages or automation frameworks. Again following the instructions to the letter, ChatGPT also suggested several ways I could have improved the prompt. Its main advice was to be as specific as possible, especially about expected outcomes. An excerpt of the output is provided below:
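As an illustration only, and not ChatGPT's actual output, a script of this kind might look like the sketch below. It assumes Selenium 4 and answers the four clarification points with hypothetical values:

- a placeholder login URL
- a submit button ID of `login_button`
- a post-login element ID of `welcome_banner` as the expected outcome
- passwords supplied separately through `PASSWORD_USER1`-style environment variables

The helpers take the driver as a parameter, so they can be exercised without a browser.

```python
import os

LOGIN_URL = "https://portal.example.com/login"  # hypothetical URL
SUBMIT_BUTTON_ID = "login_button"                # hypothetical ID
SUCCESS_ELEMENT_ID = "welcome_banner"            # hypothetical post-login element
USERNAMES = ["user1", "user2", "user3"]          # from the prompt
MAX_PASSWORD_LENGTH = 12                         # from the prompt


def get_password(username: str) -> str:
    """Read a separately supplied password, e.g. PASSWORD_USER1 (hypothetical naming)."""
    password = os.environ.get(f"PASSWORD_{username.upper()}", "")
    if not password or len(password) > MAX_PASSWORD_LENGTH:
        raise ValueError(f"Missing or over-length password for {username}")
    return password


def attempt_login(driver, username: str, password: str) -> bool:
    """Fill in and submit the login form; report whether the success element appeared."""
    driver.get(LOGIN_URL)
    # "id" is the string value of selenium.webdriver.common.by.By.ID.
    driver.find_element("id", "username_input").send_keys(username)
    driver.find_element("id", "password_input").send_keys(password)
    driver.find_element("id", SUBMIT_BUTTON_ID).click()
    # With an implicit wait set, find_elements polls until a match appears or the
    # wait expires, then returns an empty list instead of raising an exception.
    return len(driver.find_elements("id", SUCCESS_ELEMENT_ID)) > 0


def run_all(driver, usernames=USERNAMES) -> dict[str, bool]:
    """Try each username in turn and return a pass/fail result per user."""
    results = {}
    for name in usernames:
        results[name] = attempt_login(driver, name, get_password(name))
        driver.delete_all_cookies()  # crude reset between users (assumption)
    return results
```

To run it against a real browser, follow these steps:

1. Create a driver with `webdriver.Chrome()`, using Chrome as ChatGPT assumed.
2. Call `driver.implicitly_wait(10)` so that the success check waits for the page.
3. Pass the driver to `run_all`.
4. Call `driver.quit()` when finished.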


Summary of Setting Context and Interaction Limits with ChatGPT

Every software application exists within a specific context, be it the user base, its functional domain, or the operating environment. As we have seen, when leveraging ChatGPT for software testing, the context in which a query is posed or a task is assigned is crucial. It determines the accuracy, relevance, and efficiency of the response. Getting the most out of the AI often means going back and forth, requesting clarifications and modifying responses until the requester has the output they need. It is, though, important to understand that the interaction has limits, and how these limits work.

Each interaction with ChatGPT is measured in tokens, which count towards a limit. This includes both text in and text out. A token corresponds to a chunk of text; a rough rule of thumb is that one token is about four characters of English text. The current free version of ChatGPT is limited to 4096 tokens, while the paid version doubles this to 8192. Beyond these limits, earlier parts of the conversation drop out of what the model can take into account. Responses can then slow down and become noticeably less accurate and less predictable, so it is worth being aware of these limitations and adopting a strategy to deal with them. A good first step is the tokenizer, which gives a clearer understanding of how this works. It is available from the OpenAI website and can be found at the following location:
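The rule of thumb turns into simple arithmetic. The sketch below uses only the four-characters-per-token approximation; real counts from OpenAI's tokenizer, or its `tiktoken` library, will differ. It shows that a 20,000-character page of HTML is already about 5,000 tokens. That is over the free limit before ChatGPT has written a word of its reply.

```python
import math

FREE_LIMIT, PAID_LIMIT = 4096, 8192  # limits quoted above (September 2023)


def estimate_tokens(text: str) -> int:
    """Rough estimate using the four-characters-per-token rule of thumb."""
    return math.ceil(len(text) / 4)


def fits(prompt: str, reply_tokens: int, limit: int = FREE_LIMIT) -> bool:
    """Input and output share one budget, so both count against the limit."""
    return estimate_tokens(prompt) + reply_tokens <= limit


page_html = "x" * 20_000  # stand-in for a 20,000-character page of HTML
print(estimate_tokens(page_html))         # 5000
print(fits(page_html, 1000))              # False: over the free limit
print(fits(page_html, 1000, PAID_LIMIT))  # True: 6000 <= 8192
```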

So if you plan to paste a large amount of text into the tool for analysis, such as a set of user requirements or the HTML from a webpage, you can run into these limits very quickly. If the aim is to generate test cases, you may achieve nothing meaningful. A better approach is to segment the information and provide subsets in individual queries. This also gives the user the opportunity to interact further and refine the output to better achieve their objectives. For example, high-level context could be set concisely, followed by requirements statements module by module, with the user joining the results together. At the moment, the result will not be a completely reliable set of data for a test execution phase. It is, however, a very useful start and a basis for further edits, potentially saving a significant amount of time.
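The segmentation approach can be sketched as follows, assuming requirements are already grouped by module and reusing the four-characters-per-token estimate. The following are all hypothetical:

- the function name
- the per-chunk budget of 1,500 tokens, chosen to leave room for shared context and the reply
- the sample requirements

```python
def chunk_by_module(requirements: dict[str, list[str]], max_tokens: int = 1500) -> list[str]:
    """Split requirement statements into per-module chunks under a rough token budget."""
    max_chars = max_tokens * 4  # four-characters-per-token rule of thumb
    chunks = []
    for module, statements in requirements.items():
        current, count = f"Module: {module}\n", 0
        for statement in statements:
            line = f"- {statement}\n"
            # Close the chunk if this line would overflow it; a chunk always
            # keeps at least one statement, even an oversized one.
            if count and len(current) + len(line) > max_chars:
                chunks.append(current)
                current, count = f"Module: {module} (continued)\n", 0
            current += line
            count += 1
        chunks.append(current)
    return chunks


shared_context = "Context: online membership portal; generate functional test cases."  # hypothetical
requirements = {  # hypothetical requirements
    "Login": ["Users can log in with username and password.",
              "Accounts lock after three failed attempts."],
    "Search": ["Results appear within two seconds."],
}
queries = [f"{shared_context}\n\n{chunk}" for chunk in chunk_by_module(requirements)]
for query in queries:
    print(query, end="\n---\n")
```

Each query can then be sent separately. The tester then joins and reviews the generated test cases.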



In summary, we have explored just two of the ways ChatGPT can help in software testing; there are many more. The importance of context is clear, as is the importance of how we interact with the model.

Prompt engineering is an important skill, even an art. Understanding how queries and interactions can be adjusted to suit the use case belongs in every software tester's toolkit. Engineering the best prompts to assist with testing is a larger, rapidly evolving topic, and one we will all benefit from investing some time in.


Jonathan Binks - Head of Delivery
Prolifics Testing UK
