Expanding a Software Tester’s Toolkit with Applied AI
Artificial intelligence is no longer a distant promise for the future — it’s actively reshaping business and industry today. Few areas feel this impact more than software testing, where the demand for speed, precision and innovation grows daily. For today’s testers, embracing AI isn’t just an opportunity; it’s a critical step to stay competitive in the ever-evolving world of quality assurance.
At ICS, we’ve made AI integration a strategic priority, and leadership has created an environment where exploring the latest AI solutions isn’t just encouraged — it’s an exciting shared mission. As the company moves forward with AI adoption, I was inspired to get involved and look for ways to enhance how we handle visual software testing. In this blog, I share my experience as a software tester building my first AI-powered screen-comparison agent, highlighting how curiosity and a willingness to experiment can lead to meaningful improvements in testing processes.
Building a Foundation in AI
My journey into AI was a blend of excitement and a bit of uncertainty. I wasn’t quite sure where to begin, but I was eager to unlock the possibilities it holds. A knowledge-sharing session at ICS gave me valuable insights and the guidance I needed to take the first steps. From there, I dove into beginner-friendly online courses, with DeepLearning.AI’s ‘AI for Everyone’ by Andrew Ng serving as a key resource that helped me truly grasp the fundamentals.
Armed with this new understanding and a growing fascination, I was ready to find a small, manageable project to apply what I’d learned and start turning theory into practice.
Identifying the Perfect Use Case: Visual Testing with AI
My thoughts turned to a recent project that involved visual testing, which consisted of evaluating the visible output of an application and comparing that output against the results expected by design.
Ensuring that an app or website matches its original design mockups is a common assignment for software testers, but one that can be challenging. Minor inconsistencies — shifted buttons, subtle pixel differences or misaligned text — can slip through manual review, resulting in unwanted “design drift.” This is a typical pain point for testers relying on laborious comparisons of design files with implemented screens.
This seemed like a perfect opportunity to put my newly acquired AI skills to work and see how AI could make visual comparisons faster and more accurate.
My exploration began by taking two images, one representing the original design and the other showing the implemented screen, and uploading them into ChatGPT. I was curious to see how well the model could analyze visual data and identify differences between the two. The results were striking. It became clear to me how much potential AI holds for tackling tasks that often challenge manual reviewers.
Building an AI Agent with Python and Phidata
The natural next step was to automate the screen-comparison test with a Python script that could connect to an LLM and perform the same task. My research turned up plenty of Python AI frameworks capable of integrating with various LLMs.
The one that best fit my requirements was Phidata, an open-source framework designed to help developers easily build AI agents using models like OpenAI’s GPT-4, Claude or open-source LLMs while keeping the coding overhead minimal through a ‘low-code’ approach.
Before I could get started, I had to address a few prerequisites:
- Python 3.8+
- Phidata library (installed via pip)
- Python-dotenv (for managing environment variables)
- OpenAI API credentials (stored in a .env file)
To access an LLM, I registered for an OpenAI account and generated an API key, which I stored securely using the .env setup. Development was done in Visual Studio Code as the IDE.
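For anyone following along, the setup amounted to installing the libraries and keeping the key in a .env file next to the script. The snippet below is a minimal sketch of how the key gets loaded at runtime rather than hard-coded; the file layout is simply the convention I used:

```python
# Installed beforehand with: pip install phidata python-dotenv
# The .env file next to the script contains one line: OPENAI_API_KEY=<your key>
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env so the key never appears in the source code
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY not found -- check the .env file")
```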
With my environment ready, I moved on to the next step: creating an AI agent in Python. My agent was designed to accept the following parameters:
- LLM Model
- Prompt (instructions on how to compare the images)
- Image files (the screens to compare)
Once initiated, the agent connects to the selected LLM and passes along the images and the prompt. The LLM then analyzes the provided visuals in the context of the prompt, performs the comparison and returns the result in a clear, human-readable format.
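As a concrete illustration, here is a minimal sketch of such an agent built on Phidata’s Agent class with a vision-capable OpenAI model. The prompt text and image file names are placeholders of my own, and how local image files are passed to the model can differ between Phidata versions, so treat this as a sketch rather than a drop-in script:

```python
from dotenv import load_dotenv
from phi.agent import Agent
from phi.model.openai import OpenAIChat

load_dotenv()  # makes OPENAI_API_KEY from .env available to the OpenAI client

# The prompt is effectively the test oracle: it tells the LLM what to look for.
COMPARE_PROMPT = (
    "Compare the first image (the design mockup) with the second image "
    "(the implemented screen). List every difference in layout, alignment, "
    "spacing, colour and text, then state whether the screens match."
)

# A vision-capable model is required, since the agent reasons over images.
agent = Agent(model=OpenAIChat(id="gpt-4o"), markdown=True)

# Hypothetical file names; depending on the Phidata version, local paths may
# need to be supplied as URLs or base64-encoded image data instead.
agent.print_response(
    COMPARE_PROMPT,
    images=["design_mockup.png", "implemented_screen.png"],
)
```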
After ensuring the basic image-comparison agent worked as expected, I extended the script to support multiple files by dynamically picking up the input files from a folder and comparing them sequentially.
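The batch version followed the same pattern. The sketch below reuses the agent and prompt from the previous snippet and assumes a folder convention I made up for illustration: a designs folder and a screenshots folder with matching file names.

```python
from pathlib import Path

design_dir = Path("designs")      # assumed folder of design mockups
screen_dir = Path("screenshots")  # assumed folder of implemented screens

for design_file in sorted(design_dir.glob("*.png")):
    screen_file = screen_dir / design_file.name
    if not screen_file.exists():
        print(f"Skipping {design_file.name}: no matching screenshot found")
        continue
    print(f"\n=== Comparing {design_file.name} ===")
    agent.print_response(
        COMPARE_PROMPT,
        images=[str(design_file), str(screen_file)],
    )
```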
This not only automates previously manual review steps but also leverages AI to deliver far more nuanced assessments, such as detecting subtle UI differences or inconsistencies that often escape the naked eye. The biggest advantage was how much faster visual testing became, which proved the value of this solution to me.
Embracing AI in Quality Assurance
This journey into AI has been eye-opening, showing me firsthand how powerful and practical AI can be in transforming the daily work of software testers. What started as a simple experiment quickly turned a routine challenge into an opportunity to learn, innovate and streamline our processes.
The key takeaway? You don’t need to be an expert to begin — starting small and staying curious can lead to remarkable breakthroughs. By embracing AI thoughtfully and iteratively, software testers can unlock new efficiencies, greater accuracy and deeper insights, setting the stage for continuous improvement and future-ready quality assurance.
For more on AI agents, read our blog Building a Device Driver From Scratch – with an AI Wingman.