Today, websites and applications are vital tools for communication, commerce, education, and more. Ensuring that these digital platforms are accessible to everyone, regardless of disability or impairment, is both a social responsibility and a business imperative for companies in the digital economy. Web accessibility and accessibility testing are a cornerstone, ensuring that everyone, including individuals with disabilities, can effectively perceive, understand, navigate, and interact with digital content. However, achieving comprehensive web accessibility is not without its challenges. In this article, we explore accessibility testing, why it’s essential, its challenges for companies and digital product teams, and the benefits of incorporating web accessibility testing into your organization’s application development lifecycle.

Web Accessibility Challenges

Let’s delve into some common challenges organizations face in pursuing web accessibility. Understanding and overcoming these challenges is crucial to incorporating web accessibility into your application design, development, and testing processes.

Lack of Awareness: Many organizations and web developers may not fully grasp the importance of web accessibility or comprehend its far-reaching implications, leading to a lack of prioritization and resources.
Inadequate Training: Developers and content creators often lack sufficient training in accessible design practices, hindering effective implementation.
Complexity of Guidelines: The Web Content Accessibility Guidelines (WCAG) can be intricate and challenging to interpret, especially for those not well-versed in the technical aspects of web development.
Rapid Technological Advancements: The fast-paced nature of web technology development introduces new features and technologies that may not inherently prioritize accessibility.
Legacy Systems and Content: Older websites or applications may not have been designed with accessibility in mind, necessitating complex and resource-intensive retrofitting efforts.
Budget Constraints: Some organizations perceive web accessibility implementation as an added cost, leading to resistance to allocating resources for necessary changes.
Inaccessible Third-Party Content: Websites often incorporate third-party content, such as plugins or widgets, which may lack accessibility considerations, contributing to overall inaccessibility.
Mobile Accessibility: Ensuring accessibility across various devices, especially mobile devices, presents challenges due to differences in screen sizes and interaction methods.
Testing and Monitoring: Regular testing and monitoring for accessibility compliance may be overlooked, allowing issues to go unnoticed or unaddressed.
Legal and Compliance Issues: The increasing awareness of legal requirements for web accessibility has resulted in a surge of lawsuits. Organizations may face legal consequences if their digital assets are not accessible.
User Diversity: Web users exhibit diverse needs and abilities. Meeting the needs of all users, including those with disabilities, poses a continuous and evolving challenge.

Despite these challenges, organizations can prioritize and overcome them through comprehensive accessibility testing. Accessibility testing ensures that digital content is designed to be inclusive, allowing individuals with disabilities to access and use websites, applications, and other digital platforms effectively.

What Is Accessibility Testing?
Accessibility testing evaluates digital content (websites, web applications, mobile apps, documents, and more) to determine accessibility and usability for individuals with disabilities. It ensures that people with disabilities can effectively perceive, understand, navigate, and interact with digital content. Accessibility testing focuses on several aspects, including:

Visual Impairment: Ensuring content is readable by screen readers and presented with proper text alternatives for images and graphic elements.
Hearing Impairment: Providing captions or transcripts for multimedia content and ensuring that audio content is accessible.
Motor Impairment: Ensuring digital interfaces are operable through keyboard commands or assistive technologies for individuals with limited dexterity.
Cognitive Disabilities: Creating content that is easy to understand and navigate for users with cognitive challenges.
Adaptability: Ensuring content is responsive and adaptable to various assistive devices or user preferences.

Accessibility testing typically includes manual and automated testing, using tools and software to evaluate compliance with established accessibility standards like the Web Content Accessibility Guidelines (WCAG) maintained by the World Wide Web Consortium (W3C). A minimal example of such an automated check appears after the benefits list below.

Founding Principles of Accessibility Testing

The four principles that form the basis of the accessibility methodology, as defined by the W3C, are Perceivable, Operable, Understandable, and Robust (POUR). Refer to the diagram below for more details.

[Diagram: the four principles of accessibility (POUR) as defined by the W3C]

Why Is Accessibility Testing More Important Than Ever?

The potential business impact of accessibility testing can’t be overstated. According to The Drum, “65% of consumers with a disability have abandoned a purchase due to poor accessibility. According to an evaluation of over a million websites carried out by accessibility specialists WebAIM, 98% don’t meet basic accessibility needs.” Shockingly, companies with non-compliant websites are anticipated to lose $6.9 billion annually to competitors with compliant sites. These figures underscore the crucial need for accessibility testing, not just as a legal requirement but as a strategic imperative for organizations aiming to maximize their reach, user engagement, and financial performance.

[Chart: the most common WCAG failures for websites]

Benefits of Accessibility Testing

By prioritizing accessibility testing, organizations comply with legal standards and unlock opportunities for broader audience reach, improved brand reputation, and enhanced overall user experience. Below are some of the benefits of accessibility testing:

Inclusive User Experience: Accessibility testing ensures that digital content is designed to be inclusive, allowing individuals with disabilities to access and use websites, applications, and other digital platforms.
Legal Compliance: It also helps organizations comply with laws and regulations related to digital accessibility, reducing the risk of legal challenges and penalties.
Enhanced Brand Reputation: Prioritizing accessibility contributes to a positive brand image, demonstrating an organization’s commitment to inclusivity and social responsibility.
Expanded Audience Reach: Accessible digital content reaches a broader audience, including individuals with disabilities, older adults, and users with diverse devices or assistive technologies.
Improved SEO Performance: Accessibility standards and guidelines align with UX design and SEO best practices, improving digital experiences, making digital content more discoverable, and enhancing search engine rankings.
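To give a flavor of what the automated side of accessibility testing looks like in practice, here is a minimal sketch using Selenium WebDriver with Deque's open-source axe-core engine via its Java bindings (the com.deque.html.axe-core:selenium artifact). The class and method names reflect that library as I understand it and should be checked against its current documentation; the URL is a placeholder.

```java
import com.deque.html.axecore.results.Results;
import com.deque.html.axecore.results.Rule;
import com.deque.html.axecore.selenium.AxeBuilder;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import java.util.List;

public class AccessibilityScan {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            // Placeholder URL: point this at the page under test.
            driver.get("https://example.com");

            // Run the axe-core WCAG rule set against the rendered page.
            Results results = new AxeBuilder().analyze(driver);

            // Each violation carries the rule id, impact level, and a help message.
            List<Rule> violations = results.getViolations();
            violations.forEach(v ->
                    System.out.printf("%s [%s]: %s%n", v.getId(), v.getImpact(), v.getHelp()));

            // In a CI pipeline this would typically become an assertion,
            // e.g. failing the build whenever the violations list is not empty.
        } finally {
            driver.quit();
        }
    }
}
```

Automated scanners of this kind catch only the machine-detectable subset of WCAG failures, so they complement rather than replace manual checks with screen readers, keyboard-only navigation, and real users.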
If you’ve noticed that your development costs are higher than you initially anticipated or it takes longer to release new products, it may very well be a sign that your testing cycle needs optimization. The Software Testing Life Cycle (STLC) is a crucial component of modern development processes, and optimizing it can help address a number of issues, from lowering the cost of software development and speeding up time to market to enhancing the overall quality of your product. If this is something you’ve been thinking about, look no further. In this article, we’ll discuss the key stages of software testing, methodologies, best optimization methods, and strategies that can help you streamline your test life cycle. Let’s dive into it!

What Is the Software Testing Life Cycle (STLC)?

The software testing life cycle (STLC) is a sequence of verification and validation activities carried out in the course of software development to ensure that the software under test is functioning properly and aligns with the requirements set out before the development team. This process is iterative and collaborative and is essential to the development and release of high-quality software. During the life cycle, QA engineers may use a variety of tests alongside each other, including:
Unit testing;
Regression testing;
Exploratory testing;
Parallel testing;
Performance testing;
Automation testing, and more.

The STLC is adaptable and flexible, which means testers can choose and combine these testing methods depending on the requirements and goals of the project. Ultimately, this allows companies to maintain a high level of software quality while efficiently managing development and testing resources.

STLC vs. SDLC: What Is the Difference?

Although these terms may seem very similar, they are not. The Software Development Life Cycle (SDLC) is a systematic approach to software development that includes all the phases and activities required to build a software product. The software testing life cycle, on the other hand, is a subset of the SDLC. It focuses specifically on testing a software product and accompanies the development of software throughout all of its stages. Below, we outline the main differences between these two processes so you can better understand how they are approached. In general, while the SDLC is a comprehensive software development process, the STLC is responsible for creating a test plan to evaluate and ensure software quality using various testing tools. Both processes are important for delivering successful software, but they serve different purposes.

Entry and Exit Criteria

Most likely, you’ve heard about entry and exit criteria in the STLC. However, since some people confuse these terms, we’re going to quickly explain them. In essence, everything is as simple as it sounds. Entry criteria are conditions that must be met before entering a specific testing phase, while exit criteria define the conditions that must be satisfied for a testing phase to conclude. For example, entry criteria may include identifying requirements for testing, setting up the required test environments, and having a documented test plan. Conversely, exit criteria require certain actions to signal the completion of the testing phase, such as defect reports, updates on test results, certain test metrics, etc. Ideally, one phase should begin only when the previous one comes to an end, but in the imperfect world that we live in, this doesn’t always happen.
Regardless, understanding these checkpoints is important for maintaining testing structure and effectively managing test risks. Further down, we’ll take a look at the important phases of testing, along with their entry and exit criteria.

The 6 Phases of the STLC

It’s time to dive deep into the phases of testing. In general, there are six phases in the STLC. However, this number may vary depending on the chosen methodology (we’ll get there later in the article). It may also vary based on the complexity of the project itself. For example, if your website was created solely for the purpose of marketing activities, it may not need to go through all the testing life cycle phases; in most cases, it will be enough to test it only partially. On the other hand, if it’s a banking app we’re talking about, thorough testing is a must. So, here are the key phases of a typical STLC:

1. Requirement Analysis

The first step in the software testing life cycle is requirement analysis. During this stage, QA engineers closely collaborate with their own department as well as other key members of the team. The goal of this collaboration is to study business objectives, features to be designed and supported, and stakeholder requirements, including functional and non-functional specifications. In case there are questions or doubts left, they may seek clarification from the tech and business specialists of the project.

Entry Criteria:
Defining the types of tests to be performed;
Choosing test environments;
Gathering information about the development and testing priorities;
Preparing the RTM (Requirement Traceability Matrix) document for the project;
Carrying out a feasibility analysis for test automation (in case the QA team decides to automate certain tests).

Exit Criteria:
Creation of the RTM and test strategy;
Approval of the test automation feasibility report.

2. Test Planning

In the second STLC phase, the QA team assesses the resources and effort needed to carry out the project based on the data collected and processed in the requirement analysis phase. The key objective of this phase is to provide the project team with documentation outlining the organization, approach, and execution of testing throughout the project, including the testing schedule and possible test limitations.

Entry Criteria:
Defining requirements and scope of the project;
Developing a test strategy;
Determining roles and responsibilities;
Preparing hardware and software requirements for the test environment;
Preparing documentation necessary to launch the project.

Exit Criteria:
Finalizing allocation of resources and the test plan document;
Approval of the test strategy document and test plan.

3. Test Case Development

With a solid test plan and strategy in place, the team can move on to the next step — the design and development process. This phase involves the creation, verification, and rework of test cases and test automation scripts based on the data from the test plan. The team also prepares test data to flesh out the details of the structured tests they will run. All test cases and scripts created in this phase will be continuously maintained and updated to test new and existing features.

Entry Criteria:
Creating test cases and test automation scripts (in case the QA team decides to automate certain tests);
Reviewing and writing test cases and test automation scripts;
Creating test data.
Exit Criteria: Developing and approving test cases, test data, and test automation scripts; Finalizing test design document. 4. Test Environment Setup The purpose of the test environment stage is: To provide the QA team with a setting where they can exercise new and changed code provided by the development team; To locate possible faults and errors; To contact the responsible developer, providing them with a detailed test report. Different types of testing require distinct types of test environments, and the choice of testing methods is directly linked to their complexity. In some cases, tests can be run sequentially, while in others, they may need to be run in parallel (some or all at once). When setting up a test environment, the QA team considers a whole range of parameters, such as hardware, software, frameworks, test data, and network configurations, to name a few. These parameters are then adjusted depending on a particular test case. Entry Criteria: Setting up the test environment; Trying it out by conducting a series of smoke tests. Exit Criteria: The test environment is all set up and ready to go. 5. Test Execution The next part of the STLC process is the testing itself. At this stage, the QA team executes all of the test cases and test automation scripts they have prepared in test environments. The software testing process includes all kinds of functional and non-functional tests, during which software testers identify bugs and provide detailed testing reports to the project team. After developers make the necessary fixes, the QA team runs a series of retests to make sure that all detected defects are fixed. Entry Criteria: Carrying out test cases based on the testing strategy documents; Recording test results and metrics; Re-testing fixes provided by the development team; Tracking every log defect and error until they are resolved. Exit Criteria: Detailing testing reports; Updating test results; Completing the RTM with execution status. 6. Test Cycle Closure The final stage of the software testing life cycle phases involves several test activities, such as collecting test metrics and completing test reports. The QA team summarizes the results of their work in a test closure report, providing data on the types of testing performed, processes followed, the number of test cycles carried out, etc. This document ends the STLC life cycle. Entry Criteria: Assessing the cycle completion; Preparing test metrics; Preparing a detailed test closure report. Exit Criteria: Preparation and approval of the test closure report. From this point on, the project team strategizes for the application’s support and release. This includes analyzing all the testing artifacts and building a test strategy for the application’s further growth and expansion. Software Development Methodologies and the STLC Life Cycle As we’ve mentioned earlier, STLC may have various phases of testing based on the methodology used. Let’s talk about it in detail by taking a closer look at the two most popular models — Waterfall and agile. Waterfall Methodology The Waterfall model is the oldest and most popular methodology used. You’d be surprised, but even now, over 56% of companies follow this model to create software products, which shows that it’s still widely adopted across a vast number of projects. The beauty of the Waterfall model is its simplicity and linear approach. Every phase here strictly follows one after the other, providing a great level of predictability. 
However, the flip side of the coin is that it is rather difficult to go back if any issues are discovered at later stages. Therefore, it’s best suited for short-term projects with well-defined requirements. The typical software testing life cycle in this model consists of the following phases: 1. Requirement analysis. In the Waterfall model, the testing phase begins after the development phase is completed. The testing team is focused entirely on gathering and analyzing the requirements to ensure they are clear, complete, and testable. 2. System design. During the next stage, QA engineers work on creating detailed test design documents and test cases that meet the specifications of the software design. 3. Implementation. Further on, the team works on refining and finalizing test cases, which is an important step for the next stage where the code is completed and it’s time to test the product. 4. Testing. This phase includes unit testing, integration testing, system testing, and user acceptance testing, each of which verifies different aspects of the software quality. 5. Deployment. Once testing is successful, the software is deployed into the production environment. 6. Maintenance. The last stage is maintenance. This is an ongoing process during which developers deal with any post-production issues or necessary enhancements. The QA team may need to retest the software a few times until the detected issues are resolved. Agile Methodology Agile has emerged as an alternative to the Waterfall model. It addresses its limitations, offering a flexible and iterative approach to software development. According to surveys, at least 71% of businesses in the USA are adopting agile, while 29% of organizations have already been using this methodology for 1-2 years. Agile offers a lot of advantages that have played a crucial role in its widespread adoption. The most important of them is the ability to quickly respond to market changes and fix bugs on the spot, eliminating costly rework at the last stage of an SDLC. Besides, since agile encourages incremental development, it significantly accelerates time-to-market, allowing companies to gain a competitive advantage in the market. Let’s look at the testing procedure in agile development: 1. Planning. Like any testing, agile testing begins with planning. However, unlike the Waterfall model, in agile, the planning is more dynamic and iterative. The team prepares test plans in sprints, collaboratively defining user stories or features to be tested in the upcoming iteration. 2. Test design. Test design in agile is carried out simultaneously with development. The testing team creates test cases and acceptance criteria as features and user stories are being developed. 3. Testing. Various types of testing, such as usability testing, exploratory testing, and regression testing, are executed throughout the STLC in agile to ensure software quality. Each of them has its own purpose and is executed in different testing scenarios. 4. Deployment. Agile projects often employ CI/CD practices, allowing for automated testing and quick deployment of new features. This minimizes the risk of defects at later stages and speeds up product launches. 5. Review. The next step is the review stage, where the QA team evaluates the results and designs strategies to improve the development and testing processes. 6. Launch. Finally, testers and developers plan for the release of the product. 
They decide which features and user stories are going to be included in the release and what testing activities must be completed to ensure the product meets the requirements.

To summarize, the distinct difference between the Waterfall and agile models is that in agile, testing isn’t a separate phase but an integral part of the development process that is performed continuously throughout the project, right up to the launch. Conversely, in the Waterfall model, testing is run as a sequential phase that begins only after development is complete. Beyond that, the testing life cycle follows the same flow.

Best Practices To Improve the Software Test Cycle

Now that we’ve covered the essential phases of testing and their activities and deliverables, as well as the specifics related to the software methodology used, it’s time to move on to the best practices that can help you optimize your software test cycle. By optimizing your testing stages, you can achieve significant improvements in your flow, from reduced time-to-market and quicker launches in upcoming releases to improved software quality overall.

Start With a Testing Strategy

One of the first steps to achieving a successful STLC is defining a testing strategy. The strategy should outline the scope of testing, budgets available, testing deadlines, and testing objectives. To help you create a well-defined test strategy, consult with all stakeholders, developers, and QA engineers on the team.

“Focus on the requirement analysis phase. A lot of defects and logical errors can be detected at this step, eliminating them from slipping into further stages of testing.”
Mykhailo Tomara, QA Team Lead

Develop Test Plans

Once the strategy is formed, the next step is to create a test plan. Unlike the strategy, test plans are live documents that should be regularly reviewed and updated. Typically, a test plan covers:
Project description;
Test strategy and approach;
Features/parts in scope;
Features/parts out of scope;
Test environment description;
Specific testing activities;
Resources;
Schedules;
Deliverables;
Team roles and responsibilities;
Pass and fail criteria.

During this stage, it’s also necessary to specify and configure test environments and devices and define which tools will be used. Most often, teams use testing tools like Katalon Studio or Selenium, depending on the specific project’s requirements. When it comes to test cases, it’s highly recommended that the team covers not only expected scenarios but also edge cases to ensure extensive coverage. In addition, testing conditions should be as close to real life as possible. This will give you confidence that the product is ready to be released to the public.

Prepare Test Cases

Test cases are an important part of the testing process that helps certify your software product. They act as a checkpoint to ensure that your product meets the set standards and quality. Therefore, it’s important to write them with attention to detail. To improve your test case development phases, start by identifying the purpose of testing and user requirements. Testers must have a clear understanding of why the product is being developed in the first place and what features it must have to deliver on customer expectations. It’s also important to write test cases on time. The best time is in the early stages of testing, during either the requirement analysis phase or the test design phase.
It is at this point that QA engineers can evaluate whether the test cases meet the requirements and make adjustments quickly. For test cases to be effective, avoid overcomplicating things. Each case should be easy to understand and execute, with a single, clearly defined expected result. This approach not only makes it easier for testers to evaluate software performance but also leaves no ambiguity.

“It’s important to update test coverage regularly. Some testing cases may need to be added, while others may become irrelevant over time. Additionally, consider cross-reviewing test cases and checklists with different QA engineers, if possible. This approach will help identify blind spots, uncover new insights, and ensure a more effective testing process.”
Mykhailo Tomara, QA Team Lead

Incorporate a Shift-Left Approach

Shift-left testing is one of the recent trends in software development. This approach emphasizes early and continuous testing throughout the development cycle, allowing for early detection of bugs. As a result, if any serious issues are found, they can be fixed in the initial stages of the software development process rather than waiting for the last phase, where the cost of fixing bugs multiplies tenfold. The strength of the shift-left approach is that it focuses on problem prevention rather than fixing bugs. By encouraging testers to participate in testing closer to the beginning of the development pipeline and to execute smaller tests more often, it allows teams to quickly gather initial feedback and take proactive steps to improve their test plans. Shift-left testing doesn’t always mean executing tests early in development, though. Quite often, it means involving testers in discussions with key business users so they can figure out the requirements from a testing perspective and ensure they know what to look for when coding begins.

Conduct Formal Technical Reviews

To minimize bugs and defects at later stages of software development, it’s a good practice to regularly conduct formal technical reviews (FTRs). The idea behind FTRs is to review a product once it has reached a reasonably mature state, while it is still early enough in development to prevent major errors. Participants are typically assigned the roles of reviewers, producers, and speakers. In the end, they all draw up a final report that outlines the results of the meeting, including what was reviewed, who took part in the review, and what decisions were made.

Introduce Code Quality Metrics

You can improve the quality of your software testing by implementing code quality metrics to help your team track success. These metrics can be any indicator that best fits your workflow and allows you to effectively assess code quality. Here are examples of metrics that can be employed when developing software:

Reliability: This metric can describe the number of times the code failed and passed during tests.
Security: Code security can refer to the code’s resistance against potential vulnerabilities and threats.
Maintainability: You can measure code maintainability by evaluating the number of lines. In general, the more lines the code has, the harder it is to adapt it to new requirements.
Testability: This metric can be used to outline the testing technologies used to test the product, the documentation attached, and the ease with which new test cases can be added and executed.
Performance: Your code’s ability to respond and execute actions within a certain interval of time can help you measure its performance efficiency.
Usability: Usability can be verified through exploratory testing and measured by satisfaction levels.

Implement Automation

According to 35% of surveyed companies, manual testing takes up most of the time within the STLC. To address this challenge and enhance the efficiency of your testing processes, it’s crucial to implement automation. Automated testing allows you to execute tests in parallel, significantly speeding up testing and improving test coverage. It also reduces the chance of human error, contributing to the accuracy of the results. Examples of cases where automation can be particularly beneficial include:
Regression testing;
Cross-browser testing;
Complex, multi-step workflows;
Load and performance testing.

Create Comfortable Work Conditions for the Team

It goes without saying that in order for the team to be high-performing, people should have comfortable work conditions and know exactly what they’re expected to do. With this in mind, it’s important to assign roles and responsibilities to the QA team during the planning stage. Typically, there are three roles: QA lead, manual engineer, and automation tester. Aside from this, you should invest some time in building strong and trustful relationships with your team. Great teams don’t just happen. You need to be open in communication and create an environment where your team feels valued and respected. Respect and recognize the individual strengths and contributions of everyone on the team. Also, support and provide opportunities for professional development. By investing in the skills of your team members, you not only amp up their capabilities but also show them your commitment to their growth. Last but not least, always stay on top of the trends. Technology is constantly evolving, and staying current with the changes is essential for both your team’s success and the quality of your testing efforts.

Conclusion

To cut to the chase, testing is an important part of the software development process that ensures the quality of the product. However, if it’s not optimized, it can also be a very time-consuming activity, slowing down your product launches and burning through the company’s budget. Therefore, optimizing the STLC can be a strategic move that impacts the success of your projects. Testing may occur at different phases of software development: it can be run during a post-production phase or throughout all stages of the software development process, depending on the project’s development methodology. Regardless, the need for thorough, well-planned testing remains constant. Follow the steps we’ve covered in this article, and you’ll be able to build better software, keep your customers happier, and leave your competitors behind.
My most-used Gen AI trick is the summarization of web pages and documents. Combined with semantic search, summarization means I waste very little time searching for the words and ideas I need when I need them. Summarization has become so important that I now use it as I write to ensure that my key points show up in ML summaries. Unfortunately, it’s a double-edged sword: will reliance on deep learning lead to an embarrassing, expensive, or career-ending mistake because the summary missed something, or worse, because the summary hallucinated? Fortunately, many years as a technology professional have taught me the value of risk management, and that is the topic of this article: identifying the risks of summarization and the (actually pretty easy) methods of mitigating those risks.

Determining the Problem

For most of software development history, it was pretty easy to verify that our code worked as required. Software and computers are deterministic, finite state automata, i.e., they do what we tell them to do (barring cosmic rays or other sources of Byzantine failure). This made testing for correct behavior simple. Every possible unit test case could be handled by assertEquals(actual, expected), assertTrue, assertSame, assertNotNull, assertTimeout, and assertThrows. Even the trickiest dynamic string methods could be handled by assertTrue(string.contains(a) && string.contains(b) && string.contains(c) && string.contains(d)).

But that was then. We now have large language models, which are fundamentally random systems. Not even the full alphabet of contains(a), contains(b), or contains(c) is up to the task of verifying the correct behavior of Gen AI when the response to an API call can vary by an unknowable degree. Neither JUnit nor NUnit nor PyUnit has assertMoreOrLessOK(actual, expected). And yet, we still have to test these Gen AI APIs and monitor them in production. Once your Gen AI feature is in production, traditional observability methods will not alert you to any of the potential failure modes described below.

So, the problem is how to ensure that the content returned by Gen AI systems is consistent with expectations, and how can we monitor these systems in production? For that, we have to understand the many failure modes of LLMs. Not only do we have to understand them, we have to be able to explain them to our non-technical colleagues - before there’s a problem. LLM failure modes are unique and present some real challenges to observability. Let me illustrate with a recent example from OpenAI that wasn’t covered in the mainstream news but should have been. Three researchers from Stanford University and UC Berkeley had been monitoring ChatGPT to see if it would change over time, and it did.

Problem: Just Plain Wrong

In one case, the investigators repeatedly asked ChatGPT a simple question: Is 17,077 a prime number? Think step by step and then answer yes or no. ChatGPT responded correctly 98% of the time in March of 2023. Three months later, they repeated the test, but ChatGPT answered incorrectly 87% of the time! It should be noted that OpenAI released a new version of the API on March 14, 2023. Two questions must be answered: Did OpenAI know the new release had problems, and if so, why did they release it? If they didn’t know, then why not? This is just one example of the challenges you face in monitoring Generative AI. Even if you have full control of the releases, you have to be able to detect outright failures. The researchers have made their code and instructions available on GitHub, which is highly instructive.
They have also added some additional materials and an update. This is a great starting point if your use case requires factual accuracy.

Problem: General Harms

In addition to accuracy, it’s very possible for Generative AI to produce responses with harmful qualities such as bias or toxicity. HELM, the Holistic Evaluation of Language Models, is a living and rapidly growing collection of benchmarks. It can evaluate more than 60 public or open-source LLMs across 42 scenarios, with 59 metrics. It is an excellent starting point for anyone seeking to better understand the risks of language models and the degree to which various vendors are transparent about the risks associated with their products. Both the original paper and code are freely available online. Model collapse is another potential risk; if it happens, the results will be known far and wide. Mitigation is as simple as ensuring you can return to the previous model. Some researchers claim that ChatGPT and Bard are already heading in that direction.

Problem: Model Drift

Why should you be concerned about drift? Let me tell you a story. OpenAI is a startup; the one thing a startup needs more than anything else is rapid growth. The user count exploded when ChatGPT was first released in December of 2022. Starting in June of 2023, however, the user count started dropping and continued to drop through the summer. Many pundits speculated that this had something to do with student users of ChatGPT taking the summer off, but commentators had no internal data from OpenAI, so speculation was all they could do. Understandably, OpenAI has not released any information on the cause of the drop.

Now, imagine that this happens to you. One day, usage stats for your Gen AI feature start dropping. None of the other typical business data points to a potential cause. Only 4% of customers tend to complain, and your complaints haven’t increased. You have implemented excellent API and UX observability; neither response time nor availability shows any problems. What could be causing the drop? Do you have any gaps in your data?

Model drift is the gradual change in LLM responses due to changes in the data, the language model, or the cultures that provide the training data. The changes in LLM behavior may be hard to detect when looking at individual responses. Data drift refers to changes in the input data a model processes over time. Model drift refers to changes in the model’s performance over time after it has been deployed and can result in:

Performance degradation: the model’s accuracy decreases on the same test set due to data drift.
Behavioral drift: the model makes different predictions than it originally did, even on the same data.

However, drift can also refer to concept drift, which leads to models learning outdated or invalid conceptual assumptions, leading to incorrect modeling of the current language. It can cause failures on downstream tasks, like generating appropriate responses to customer messages.

And the Risks?

So far, the potential problems we have identified are failure and drift in the Generative AI system’s behavior, leading to unexpected outcomes. Unfortunately, it is not yet possible to categorically state what the risks to the business might be, because nobody can determine beforehand what the possible range of responses might be with non-deterministic systems.
You will have to anticipate the potential risks on a Gen AI use-case-by-use-case basis: is your implementation offering financial advice or responding to customer questions for factual information about your products? LLMs are not deterministic; a statement that, hopefully, means more to you now than it did three minutes ago. This is another challenge you may have when it comes time to help non-technical colleagues understand the potential for trouble. The best thing to say about risk is that all the usual suspects are in play (loss of business reputation, loss of revenue, regulatory violations, security).

Fight Fire With Fire

The good news is that mitigating the risks of implementing Generative AI can be done with some new observability methods. The bad news is that you have to use machine learning to do it. Fortunately, it’s pretty easy to implement. Unfortunately, you can’t detect drift using your customer prompts - you must use a benchmark dataset.

What You’re Not Doing

This article is not about detecting drift in a model’s dataset - that is the responsibility of the model’s creators, and the work to detect that kind of drift is serious data science. If you have someone on staff with a degree in statistics or applied math, you might want to attempt to detect drift using the method (maximum mean discrepancy) described in this paper: Uncovering Drift In Textual Data: An Unsupervised Method For Detecting And Mitigating Drift In Machine Learning Models.

What Are You Doing?

You are trying to detect drift in a model’s behavior using a relatively small dataset of carefully curated text samples representative of your use case. Like the method above, you will use discrepancy, but not for an entire set. Instead, you will create a baseline collection of prompts and responses, with each prompt-response pair sent to the API 100 times, and then calculate the mean and variance for each prompt. Then, every day or so, you’ll send the same prompts to the Gen AI API and look for excessive variance from the mean. Again, it’s pretty easy to do.

Let’s Code!

Choose a language model to use when creating embeddings. It should be as close as possible to the model being used by your Gen AI API. You must have complete control over this model’s files, all of its configurations, and all of the supporting libraries that are used when embeddings are created and when similarity is calculated. This model becomes your reference - the equivalent of the 1 kg sphere of pure silicon that serves as a global standard of mass.

Java Implementation

The how-do-I-do-this-in-Java experience for me, a 20-year veteran of Java coding, was painful until I sorted out the examples from the Deep Java Library (DJL). Unfortunately, DJL has a very limited list of native language models available compared to Python. Though over-engineered, the Java code is almost as pithy as Python. Three pieces are needed: the setup of the LLM used to create sentence embedding vectors, the code that creates the embedding vectors for two texts, and the function that calculates their semantic similarity.
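Here is a condensed sketch of those three pieces, plus the baseline statistics used in the next section, written against DJL's Hugging Face model zoo. The sentence-transformers model URL, the float[] output type, and the outlier threshold are assumptions to adapt to whichever reference model you pin; treat this as a sketch rather than a reproduction of the original listing.

```java
import ai.djl.inference.Predictor;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public class SemanticDrift {

    // Setup: load a sentence-embedding model through DJL. The model URL below and the
    // float[] output type are assumptions; pin whichever model you choose as your reference.
    static Predictor<String, float[]> loadEmbedder() throws Exception {
        Criteria<String, float[]> criteria = Criteria.builder()
                .setTypes(String.class, float[].class)
                .optModelUrls("djl://ai.djl.huggingface.pytorch/sentence-transformers/all-MiniLM-L6-v2")
                .optEngine("PyTorch")
                .build();
        ZooModel<String, float[]> model = criteria.loadModel();
        return model.newPredictor();
    }

    // Create the embedding vectors for two texts and compare their semantic similarity.
    static double similarity(Predictor<String, float[]> embedder, String a, String b) throws Exception {
        return cosine(embedder.predict(a), embedder.predict(b));
    }

    // The function that calculates the semantic similarity: cosine of the two vectors.
    static double cosine(float[] u, float[] v) {
        double dot = 0, normU = 0, normV = 0;
        for (int i = 0; i < u.length; i++) {
            dot += u[i] * v[i];
            normU += u[i] * u[i];
            normV += v[i] * v[i];
        }
        return dot / (Math.sqrt(normU) * Math.sqrt(normV));
    }

    // Baseline statistics for one prompt: mean and standard deviation of the similarities
    // between its repeated baseline responses and the reference response.
    static double[] meanAndStdDev(double[] similarities) {
        double mean = 0;
        for (double s : similarities) mean += s;
        mean /= similarities.length;
        double variance = 0;
        for (double s : similarities) variance += (s - mean) * (s - mean);
        variance /= similarities.length;
        return new double[] {mean, Math.sqrt(variance)};
    }

    // Flag a new response whose similarity to the reference falls outside mean +/- k * stdDev.
    static boolean excessiveVariance(double latestSimilarity, double mean, double stdDev, double k) {
        return Math.abs(latestSimilarity - mean) > k * stdDev;
    }
}
```

Loading the model is the expensive part, so create the Predictor once per detection run and reuse it for every prompt-response pair; a threshold of k = 2 or 3 standard deviations is a reasonable starting assumption to tune against your own baseline.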
Put It All Together

As mentioned earlier, the goal is to be able to detect drift in individual responses. Depending on your use case and the Gen AI API you’re going to use, the number of benchmark prompts, the number of responses that form the baseline, and the rate at which you sample the API will vary. The steps go like this:

1. Create a baseline set of prompts and Gen AI API responses that are strongly representative of your use case: 10, 100, or 1,000. Save these in Table A.
2. Create a baseline set of responses: for each of the prompts, send it to the API 10, 50, or 100 times over a few days to a week, and save the text responses. Save these in Table B.
3. Calculate the similarity between the baseline responses: for each baseline response, calculate the similarity between it and the corresponding response in Table A. Save these similarity values with each response in Table B.
4. Calculate the mean, variance, and standard deviation of the similarity values in Table B and store them in Table A.
5. Begin the drift detection runs: perform the same steps as in step 1 every day or so. Save the results in Table C.
6. At the end of each detection run, calculate the similarity between the responses in Table C and the baseline responses in Table A. When all the similarities have been calculated, look for any outside the original variance.
7. For those responses with excessive variance, review the original prompt, the original response from Table A, and the latest response in Table C. Is there enough of a difference in the meaning of the latest response? If so, your Gen AI API model may be drifting away from what the product owner expects; chat with them about it.

Result

The data, when collected and charted, should look something like this: the chart shows the result of a benchmark set of 125 prompts sent to the API 100 times over one week - the Baseline samples. The mean similarity for each prompt was calculated and is represented by the points in the Baseline mean plot. The latest run of the same 125 benchmark samples was sent to the API yesterday, and their similarity was calculated against the baseline mean values - the Latest samples. The responses of individual samples that vary quite a bit from the mean are reviewed to see if there is any significant semantic discrepancy with the baseline response. If that happens, review your findings with the product owner.

Conclusion

Non-deterministic software will continue to be a challenge for engineers to develop, test, and monitor until the day that the big AI brain takes all of our jobs. Until then, I hope I have forewarned and forearmed you with clear explanations and easy methods to keep you smiling during your next Gen AI incident meeting. And, if nothing else, this article should help you to make the case for hiring your own data scientist. If that’s not in the cards, then… math?
This article provides a structured approach to creating and updating a regression test suite. What kinds of tests should be in a regression test suite? Which regression tests should be run, how do you respond to regression tests that fail, and how does a regression test suite evolve? These questions and other considerations are explored in a step-by-step manner. I will first explore the basic dynamics and considerations of regression testing. Then I will provide a set of steps that can help bring long-term software stability from regression testing.

Nuts and Bolts of Regression Testing

Let's assume that we made a couple of changes in our software code, any kind of changes. How can we be confident that these changes will not negatively affect our code overall? One way to achieve confidence is to perform thorough regression testing: write and execute tests to check and explore how our code behaves after our changes. So, the more tests we write and execute, the more confident we will be? Yes, but there are practical costs to be considered as well — the time, effort, and money required to write, execute, and maintain a regression test suite. Including every test possible results in a regression test suite that is too large to manage — and it’s challenging to run as often as changes are made to the software. If the regression tests do not finish in a timely way, the development process is disrupted. It is well worth throwing money at this problem in terms of additional computational resources and/or new hires to execute tests. At some point, however, the added value of adding a test may not be worth the added expenditure of the resources needed to execute it. On the other hand, a test suite that is too small will not cover the functionality of the software sufficiently well, and too many bugs will be passed on to the users. Adding a small number of tests to a regression test suite is usually simple. Even if the marginal cost of each additional test is quite small, cumulatively, the test suite can become unwieldy. Removing tests from a regression test suite may also create problems: what if a customer reports a bug that one of the removed tests would have found?

Test Case Selection Techniques

Selecting the right test cases involves identifying directly and indirectly affected test cases. We should know, at a minimum, which features are used the most by our customers, which tests cover important functionality, and which tests fail often. Other selection techniques include linear equations, symbolic execution, path analysis, data flow analysis, program dependence graphs, system dependence graphs, modification analysis, cluster identification, slicing, graph walks, and modified entity analysis. When it comes to choosing a selection technique, it is useful to think in terms of the following criteria:

Inclusiveness

Inclusiveness refers to the extent to which a regression test selection technique includes tests that are likely to expose faults introduced by recent changes to the software. A technique is considered more inclusive if it effectively identifies and selects tests that cover modified or affected parts of the code. Inclusiveness is vital to ensure that the selected tests provide thorough coverage of the changes made since the last test cycle. Unsafe techniques have inclusiveness of less than 100%.

Precision

Precision measures the ability of a regression test selection technique to exclude tests that are unnecessary for the current testing objectives.
A precise technique should minimize the inclusion of tests that do not contribute to detecting faults related to recent modifications. This criterion aims to prevent over-testing, which can lead to longer test execution times and resource inefficiency. Efficiency Efficiency evaluates the computational and time resources required to perform regression testing using a particular technique. An efficient technique should be able to quickly identify and select the relevant subset of tests while minimizing the overall testing time. This is especially crucial for large software systems with extensive test suites where faster testing cycles are desirable to support agile development practices. Generality Generality assesses the applicability of a regression test selection technique across various software testing scenarios and domains. A more general technique can be used in a wide range of practical situations without significant customization. It should not be overly specialized for a specific type of software or testing context, making it adaptable to different development projects. In what follows, the four steps of regression testing are explored. We start by identifying the modified code under test. The tests that need to be executed are identified followed by a step for balancing the test suite’s size. It’s all about our test execution results. How extensive did we test, how fast did our tests run, and how confident are we that our test results provide a true picture of the system under test? Software stability in the long term can follow as we get better at each of the four steps. Step 1: Identify Modified Code Determine the specific parts of the software that have been modified since the last regression test cycle. This can be achieved through version control systems and change tracking mechanisms. This step is the foundation for the subsequent regression testing steps. Version Control and Change Tracking To identify modified code, we can use version control systems and change tracking mechanisms. Version control systems like Git, SVN, or Mercurial keep a historical record of changes made to the software's source code. Developers use these systems to commit changes along with descriptive commit messages. Change tracking mechanisms can also include issue tracking systems like JIRA or bug databases. Analyze Commit History In a version control system, we can examine the commit history to see what changes have been made to the codebase. Each commit typically includes information about which files were modified, what lines of code were added, deleted, or modified, and a description of the changes made. By analyzing this commit history, we can pinpoint the specific code that has been altered. Identify Modified Files and Code Sections Based on the commit history, we can identify the modified files and the sections of code within those files that have undergone changes. This may include functions, classes, methods, or even individual lines of code. It's essential to be as granular as possible in identifying the modified code. Document Changes It's helpful to document the nature of the changes. Are these modifications bug fixes, new features, enhancements, or other types of changes? Understanding the nature of the changes can guide our regression testing strategy. Collaboration With Development Team Collaboration between the testing and development teams is crucial during this step. 
Testers should communicate with developers to get a clear understanding of the changes and their impact on the software's functionality. Traceability Establish traceability between the identified modified code and the corresponding requirements or user stories. This helps ensure that the modifications align with the intended functionality and that our regression tests adequately cover these changes. By the end of Step 1, we should have a comprehensive list of the code that has been modified, along with details about the changes. This information serves as the basis for selecting the relevant tests in Step 2, ensuring that we focus our regression testing efforts on the areas of the software that are most likely to be affected by recent modifications. This targeted approach is the backbone of our regression testing’s structure. It is essential for efficient and effective regression testing. Step 2: Select Relevant Tests Once we have identified the modified code, the next step is to select the relevant tests to include in our regression test suite. This step is critical to ensure that we thoroughly test the changes made to the software. Coverage Criteria The first part of this step involves evaluating coverage criteria to determine which types of tests should be included in our regression test suite. Coverage criteria help us define the scope of our testing efforts. Two common coverage criteria are: Node Coverage (or Method Call Coverage) Node coverage focuses on identifying methods or functions that are never invoked in the modified code. This criterion is essential for ensuring that all parts of our codebase are exercised, which can help uncover dead code or unused functionality. Structural Coverage Structural coverage goes a step further by analyzing which code paths are affected by the modifications. This criterion considers not only whether methods are called but also the specific execution paths within those methods. Techniques like statement coverage, branch coverage, and path coverage fall under this category. It helps ensure that not only every method is invoked but also that different execution branches and scenarios are tested. Selection of Tests For each modification identified in Step 1, we need to select tests that directly or indirectly exercise the modified code. Directly Affected Tests Identify the tests that directly cover the modified code. These are the tests that specifically target the functions or methods that have been changed. Running these tests helps ensure that the modifications are working as intended and haven't introduced new bugs. Indirectly Affected Tests Some changes may have ripple effects on other parts of the software. Indirectly affected tests are those that may not directly exercise the modified code but interact with it in some way. For example, if a change in one module affects the output of another module, tests for the latter module should also be considered. Test Adequacy It's crucial to assess the adequacy of our selected tests. Ask yourself if these tests provide sufficient coverage of the modified code. Consider the complexity of the changes and the potential impact on the software's behavior. In some cases, we may need to create new tests specifically tailored to the changes. Documentation Keep thorough documentation of which tests are selected for each modification. This documentation ensures transparency and allows for easy tracking of test coverage for different code changes. 
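To make Steps 1 and 2 more tangible, here is a small, illustrative sketch that pulls the list of modified files from version control and maps them to directly affected test classes. It shells out to git diff --name-only (a standard git option) and assumes a Maven-style layout in which a class under src/main/java has a matching *Test class under src/test/java; both the layout and the naming convention are assumptions made for illustration, not part of the original article.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class RegressionTestSelector {

    // Step 1: identify modified files between two revisions via the version control system.
    static List<String> changedFiles(String fromRev, String toRev) throws Exception {
        Process process = new ProcessBuilder("git", "diff", "--name-only", fromRev, toRev).start();
        List<String> files = new ArrayList<>();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                files.add(line.trim());
            }
        }
        process.waitFor();
        return files;
    }

    // Step 2: select directly affected tests. Assumed convention:
    // src/main/java/com/acme/Checkout.java -> src/test/java/com/acme/CheckoutTest.java
    static List<Path> directlyAffectedTests(List<String> changedFiles) {
        List<Path> tests = new ArrayList<>();
        for (String file : changedFiles) {
            if (!file.startsWith("src/main/java/") || !file.endsWith(".java")) {
                continue; // non-Java or non-production changes need their own mapping rules
            }
            Path candidate = Path.of(file
                    .replace("src/main/java/", "src/test/java/")
                    .replace(".java", "Test.java"));
            if (Files.exists(candidate)) {
                tests.add(candidate);
            }
        }
        return tests;
    }

    public static void main(String[] args) throws Exception {
        // Compare the current work against the main branch (revision names are placeholders).
        List<String> changed = changedFiles("origin/main", "HEAD");
        directlyAffectedTests(changed).forEach(System.out::println);
        // Indirectly affected tests (callers of the changed code) still require a
        // dependency analysis or coverage map; this sketch covers only the direct case.
    }
}
```

In practice, the selected test classes would be handed to the build tool's test filter rather than simply printed, and indirectly affected tests would be added on top of the direct mapping.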
By the end of Step 2, we should have a well-defined regression test suite that includes the necessary tests to validate the modified code effectively. This focused approach to test selection ensures that our testing efforts are comprehensive, helping us catch regressions and defects early in the development cycle. Step 3: Balance Test Suite Size While it's essential to select tests that adequately cover the modified code, it's equally important to avoid including every possible test in the regression test suite. Managing a massive test suite can become time-consuming and resource-intensive. The third step in regression testing can focus on managing the size of our regression test suite effectively. It's essential to strike a balance between thorough testing and practicality. Avoid Including Every Possible Test Including every conceivable test in our regression test suite is generally not feasible. As our software evolves, the number of tests can grow exponentially, making it impractical to execute them all within a reasonable timeframe. Running an exhaustive set of tests could significantly slow down the testing process, making it difficult to keep up with the pace of development. The optimal size of our regression test suite should be determined. This size can be based on factors like resources availability, time constraints, risks, development process and prioritization. Available Resources Consider the hardware, software, and team members available for testing. Limited resources may restrict the size of our test suite. Time Constraints Be aware of project deadlines and release schedules. We should aim to complete regression testing within the available time while ensuring adequate coverage. Risk Assessment Evaluate the criticality of the modified code and the potential impact of defects. Highly critical code changes may require more extensive testing, while less critical changes can be covered with a smaller test suite. Development Process Considerations The choice of test suite size should align with the development process. In Agile methodologies, where changes are frequent, regression tests are typically executed more often (e.g., after each sprint or iteration). Therefore, the test suite size for each regression cycle may be smaller to keep testing agile and responsive to changes. In more traditional development processes, where changes are less frequent and releases occur less often, regression tests may be conducted less frequently. In such cases, we might have larger test suites that cover a broader range of functionality. Prioritization Consider prioritizing tests based on factors such as critical business functionality, frequently used features, or areas with a history of defects. This can help ensure that the most critical parts of the software are thoroughly tested even if we have limitations on the test suite size. Documentation Our decisions regarding test suite size should be documented. This documentation will serve as a guideline for our testing strategy and provide transparency to all stakeholders. Balancing the size of our regression test suite is essential for efficient testing. It allows us to focus our testing efforts where they matter most while ensuring that we can complete regression testing within our project's constraints. By tailoring our test suite size to our specific context, we can strike the right balance between thoroughness and practicality in regression testing. 
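As a toy illustration of the prioritization idea above, the sketch below scores each test on criticality, usage frequency, and defect history, then keeps the highest-scoring tests that fit into a fixed execution-time budget. The RegressionTest record, the weights, and the budget are hypothetical values chosen for illustration; real inputs would come from your test management and defect-tracking tools.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SuiteBalancer {

    // Hypothetical per-test metadata; in practice this would come from your test
    // management tool, usage analytics, and defect-tracking history.
    record RegressionTest(String name, int criticality, int usageFrequency,
                          int pastDefectsFound, double runtimeMinutes) {

        // Illustrative weighting; tune the weights to your own risk assessment.
        double priorityScore() {
            return 3.0 * criticality + 2.0 * usageFrequency + 1.5 * pastDefectsFound;
        }
    }

    // Keep the highest-priority tests that fit into the time budget for one regression cycle.
    static List<RegressionTest> selectWithinBudget(List<RegressionTest> candidates, double budgetMinutes) {
        List<RegressionTest> sorted = candidates.stream()
                .sorted(Comparator.comparingDouble(RegressionTest::priorityScore).reversed())
                .toList();
        List<RegressionTest> selected = new ArrayList<>();
        double used = 0;
        for (RegressionTest t : sorted) {
            if (used + t.runtimeMinutes() <= budgetMinutes) {
                selected.add(t);
                used += t.runtimeMinutes();
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<RegressionTest> suite = List.of(
                new RegressionTest("CheckoutFlowTest", 5, 5, 3, 12.0),
                new RegressionTest("ProfileSettingsTest", 2, 3, 0, 8.0),
                new RegressionTest("ReportingExportTest", 3, 1, 2, 25.0));
        // Suppose this regression cycle has 30 minutes of execution budget.
        selectWithinBudget(suite, 30.0).forEach(t -> System.out.println(t.name()));
    }
}
```

A greedy cut-off like this is deliberately simple; the point is that the factors listed above can be turned into an explicit, reviewable policy rather than an ad hoc choice.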
Step 4: Execute Tests and Handle Results With a balanced regression test suite at hand, we can now execute it and evaluate our test results. Tests that Fail If one or more regression tests fail, investigate whether the failure is due to a fault in the software modification or an issue within the regression test itself. Did the test fail for the right reasons or for the wrong reasons? The right reason for a test to fail is that it found a bug. One wrong reason is that there is no bug and the test fails because of how it is written or executed. Additional work is required in either case. Tests that Pass If no regression tests fail, then we should be able to answer the following question confidently. Do our tests pass for the right reasons or for the wrong reasons? One right reason for this to happen is that the tests exercise parts of the code that function properly. However, a test could also pass because it actually tests nothing: it is an old test that was not maintained properly, no longer tests what it was intended to test, and happens to pass accidentally. Test Automation Regression testing is the area where test automation delivers its greatest benefit. Ideally, we would like to ensure that all regression tests are automated. If this were always feasible and practical, regression test execution would take no manual effort and would be repeatable at any time. Unfortunately, there may be automated regression tests that fail for unknown reasons, some of them regularly and others intermittently. Some tests will execute fast, others will be slow, and others may be fast in some runs and slow in others. Problems like these can be solved, but if we don't solve them, they only get worse as the number of regression tests increases. Test automation is most valuable when it is used in a continuous integration/continuous delivery (CI/CD) pipeline. If our project follows a CI/CD pipeline, we may have the opportunity to automate and streamline regression testing. Smaller, focused test suites can be run more frequently as part of the CI/CD process, catching regressions early in the development cycle. Keep in mind that automated testing has the same goal as manual testing: to give us a clear picture of whether the system under test behaves as expected. Just as we should be confident in our manual testing results, we should also be confident in our automated testing results. When an automated regression test suite finishes execution, we should be confident that the test results depict the true picture of the system under test. The more confident we are, the less time we will spend debugging the results of our automated tests, identifying real bugs, or fixing tests that are useless. Wrapping Up To adapt to software changes, we must first recognize that a regression test suite suitable for one version of the software may not suffice for subsequent versions. The first step is to identify what changed in our code. We must account for different types of code changes, such as new features, improvements to existing features, bug fixes, refactorings, and performance improvements. Even when functionality remains unchanged, we must reassess the regression test suite's adequacy, especially if there's code restructuring. The second step is to identify what tests to include in our regression test suite. We can select all relevant tests that cover the code changes identified in Step 1.
Regression test suites should be created or modified as needed to incorporate new tests that cover altered functionality or code paths. Bear in mind that there are no perfect metrics. For example, node coverage can be a useful metric, but it is possible to have high node coverage and still have gaps in the test suite: a test suite may cover all the nodes in a program yet miss important input values or execution paths. The third step is to balance our test suite's size. This becomes very important as the number of tests grows and/or the time it takes for test execution to finish becomes an obstacle. Obsolete tests that are no longer relevant due to software changes can be removed. Once test execution has finished, it is important to scrutinize our test results. Once we are confident that our test results are trustworthy, we can share our findings with the appropriate stakeholders.
Mocking, in a broader software development and testing context, is a technique used to simulate the behavior of certain components or objects in a controlled manner. It involves creating fake or mock objects that imitate the behavior of real objects or components within a software system. Mocking is often used in various stages of software development, including testing, to isolate and focus on specific parts of a system while ignoring the complexities of its dependencies. Mocking allows developers and testers to isolate specific parts of a system for testing without relying on the actual implementation of external components, services, or modules. Benefits of Mocking A few of the benefits of Mocking are: Simulation of Dependencies: Mocking involves creating mock objects or functions that imitate the behavior of real components or services that a software system relies on. These mock objects or functions provide predefined responses to interactions, such as method calls or API requests. Isolation: Mocking helps in isolating the unit or component being tested from the rest of the system. This isolation ensures that any issues or failures detected during testing are specific to the unit under examination and not caused by external factors. Speed and Efficiency: Using mock objects or functions can expedite the testing process because they provide immediate and predictable results. There is no need to set up and configure external services or wait for actual responses. Reduced Risk: Since mocking avoids the use of real external dependencies, there is a reduced risk of unintended side effects or data corruption during testing. The test environment remains controlled and predictable. Speed: Real dependencies might involve time-consuming operations or external services like databases, APIs, or network calls. Mocking these dependencies can significantly speed up the testing process because you eliminate the need to perform these real operations. Reduced Resource Usage: Since mocks don’t use real resources like databases or external services, they reduce the load on these resources during testing, which can be especially important in shared development and testing environments. Mocking API In Cypress? Mocking APIs in Cypress is a powerful technique for simulating APIs and external services in your tests. This allows you to create controlled environments for testing different scenarios without relying on the actual services. To mock an API in Cypress, you can use the cy.intercept() command. This command intercepts HTTP requests and returns a predefined response. Cypress’s cy.intercept() method empowers developers and testers to intercept and manipulate network requests, allowing them to simulate various scenarios and responses, thus making it an indispensable tool for testing in dynamic and unpredictable environments. By crafting custom responses or simulating error conditions, you can comprehensively assess your application’s behavior under diverse conditions. To use cy.intercept(), you must first specify the method and URL of the request to intercept. You can also specify other parameters, such as the request body and headers. The second argument to cy.intercept() is the response that you want to return. 
For example, the following code mocks the response to a GET request to the /api/users endpoint: cy.intercept('GET', '/api/users', { statusCode: 200, body: [ { id: 1, name: 'John Doe', email: 'john.doe@example.com', }, { id: 2, name: 'Jane Doe', email: 'jane.doe@example.com', }, ], }); Example Mocking API Data in Cypress Step 1 Create a Scenario To Automate To gain deeper insights, we'll automate a specific scenario against the demo application at https://angular.realworld.io/. Below are the steps we are going to automate for mocking the data: Visit the website at https://angular.realworld.io/. Upon opening the page, two requests are triggered in the Network tab: one for Tags and the other for Articles. Intercept the Tags request, and instead of the original list of Tags, insert two new tags: "Playwright" and "QAAutomationLabs". Make sure to verify that these tags are displayed correctly in the user interface. Intercept the Articles request, and instead of the original list of articles, provide just one article with modified details. You should change the username, description, and the number of likes. Afterward, confirm that these modifications are accurately reflected in the user interface. Before automating the above steps, let's create the data that we want to mock. Step 2 Create Data To Mock the API Create two files named mockTags.json and mockArticles.json. 1. mockTags.json { "tags":[ "Cypress", "Playwright", "SLASSCOM" ] } 2. mockArticles.json { "articles":[ { "title":"Hi qaautomationlabs.com", "slug":"Hi - qaautomationlabs.com", "body":"qaautomationlabs", "createdAt":"2020-09-26T03:18:26.635Z", "updatedAt":"2020-09-26T03:18:26.635Z", "tagList":[ ], "description":"SLASSCOM QUALITY SUMMIT 2023", "author":{ "username":"Kailash Pathak", "bio":null, "image":"https://static.productionready.io/images/smiley-cyrus.jpg", "following":false }, "favorited":false, "favoritesCount":1000 } ], "articlesCount":500 } Step 3 Create Script Let's create the script for mocking the API data. describe("API Mocking in Cypress using cy.intercept Method ", () => { beforeEach(() => { cy.visit("https://angular.realworld.io/"); cy.intercept("GET", "https://api.realworld.io/api/tags", { fixture: "mockTags.json", }); cy.intercept( "GET", "https://api.realworld.io/api/articles?limit=10&offset=0", { fixture: "mockArticles.json" } ); }); it("Mock API Tags, and then validate on UI", () => { cy.get(".tag-list", { timeout: 1000 }) .should("contain", "Cypress") .and("contain", "Playwright"); }); it("Mock the Article feed, and then validate on UI", () => { cy.get("app-favorite-button.pull-xs-right").contains("10"); cy.get(".author").contains("Kailash Pathak"); cy.get(".preview-link > p").contains("SLASSCOM QUALITY SUMMIT 2023"); }); }); Code Walkthrough Let me break down the code step by step: describe("API Mocking in Cypress using cy.intercept Method", () => { ... }): This is a test suite description. It defines a test suite titled "API Mocking in Cypress using cy.intercept Method." beforeEach(() => { ... }): This is a hook that runs before each test case in the suite. It sets up the initial conditions for the tests. cy.visit("https://angular.realworld.io/");: It opens the web page at https://angular.realworld.io/ using Cypress. cy.intercept("GET", "https://api.realworld.io/api/tags", { fixture: "mockTags.json" });: This line intercepts a GET request to the tags endpoint and responds with data from the fixture file "mockTags.json." It mocks the API call to retrieve tags.
cy.intercept("GET", "https://api.realworld.io/api/articles?limit=10&offset=0", { fixture: "mockArticles.json" });: Similar to the previous line, this intercepts a GET request to "article" and responds with data from the fixture file "mockArticles.json." It mocks the API call to retrieve articles. it("Mock API Tags, and then validate on UI", () => { ... }): This is the first test case. It verifies that the mocked tags are displayed on the UI. cy.get(".tag-list", { timeout: 1000 })...: It selects an element with the class "tag-list" and waits for it to appear for up to 1000 milliseconds. Then, it checks if it contains the tags "Cypress" and "Playwright." it("Mock the Article feed, and then validate on UI", () => { ... }): This is the second test case. It verifies that the mocked articles are displayed correctly on the UI. cy.get("app-favorite-button.pull-xs-right").contains("10");: It selects an element with the class "app-favorite-button.pull-xs-right" and checks if it contains the text "10." cy.get(".author").contains("Kailash Pathak");: It selects an element with the class "author" and checks if it contains the text "Kailash Pathak." cy.get(".preview-link > p").contains("SLASSCOM QUALITY SUMMIT 2023");: It selects an element with the class "preview-link" and checks if it contains the text "SLASSCOM QUALITY SUMMIT 2023." Step 4 Execute the Script Run the command yarn Cypress Open. Default Data displaying in the site: The below tags are displayed by default in the site for Tags. Below Feed are displayed by default in the site for Tags. Data After Mocking the Data In the screenshot below, you can see the tags that we have provided in mockTags.json replaced with the default tags. In the screenshot below, you can see the Feed that we have provided in mockArticles.json replaced with the default Feeds. Conclusion In conclusion, mocking API responses in Cypress is a powerful technique for testing your application’s frontend behavior in a controlled and predictable manner. It promotes faster, more reliable, and isolated testing of various scenarios, helping you catch bugs and ensure your application works as intended. Properly maintained mock data and well-structured tests can be invaluable assets in your testing strategy.
The topic of Serverless testing is a hot one at the moment. There are many different approaches and opinions on how best to do it. In this post, I'm going to share some advice on how we tackled this problem, what the benefits are of our approach, and how things could be improved. The project in question is Stroll Insurance, a fully Serverless application running on AWS. In previous posts, we have covered some of the general lessons learned from this project, but in this post, we are going to focus on testing. For context, the web application is built with React and TypeScript, which makes calls to an AppSync API that makes use of Lambda and DynamoDB data sources. We use Step Functions to orchestrate the flow of events for complex processing like purchasing and renewing policies, and we use S3 and SQS to process document workloads. The Testing 'Triangle' When the project started, it relied heavily on unit testing. This isn't necessarily a bad thing, but we needed a better balance between getting features delivered and maintaining quality. Our testing 'pyramid' was really a triangle: an abundance of unit tests, very few E2E tests, and nothing in between. This worked really well for the initial stages of the project, but as the product and AWS footprint grew in complexity, we could see that a number of critical parts of the application had no test coverage. Specifically, we had no tests for direct service integrations used by AppSync resolvers and Step Functions, or for event-driven flows like document processing. These underpin critical parts of the application. If they stop working, people will be unable to purchase insurance! One problem that we kept experiencing was that unit tests would continue to pass after a change to a Lambda function, but subsequent deployments would fail. Typically, this was because the developer had forgotten to update the permissions in CDK. As a result, we created a rule that everyone had to deploy and test their changes locally first, in their own AWS sandbox, before merging. This worked, but it was an additional step that could easily be forgotten, especially when under pressure or time constraints. Balancing the Triangle So, we agreed that it was time to address the elephant in the room: where are the integration tests? Our motivation was simple: there is a lack of confidence when we deploy to production, meaning we perform a lot of manual checks before we deploy, and sometimes these don't catch everything. This increases our lead time and reduces our deployment frequency. We would like to invert this. The benefits for our client were clear: features go live quicker, reducing time to market while still maintaining quality; it gives them a competitive edge; it shortens the feedback loop, enabling iteration on ideas over a shorter period of time; critical aspects of the application are tested; issues can be diagnosed quicker; complex bugs can be reproduced; and the risk of lost business is reduced. Integration testing can mean a lot of different things to different teams, so our definition was this: an integration test in this project is defined as one that validates integrations with AWS services (e.g., DynamoDB, S3, SQS) but not third parties; those should be mocked out instead. Breaking Down the Problem We decided to start small by first figuring out how to test a few critical paths that had caused us issues in the past.
We made a list of how we "trigger" a workload: S3 → SQS → Lambda; DynamoDB Stream → SNS → Lambda; SQS → Lambda; Step Function → Lambda. The pattern that emerged was that we have an event that flows through a messaging queue, primarily SQS and SNS. There are a number of comments we can make about this: There's no real business logic to test until a Lambda function or a State Machine is executed, but we still want to test that everything is hooked up correctly. We have the most control over the Lambda functions, and it will be easier to control the test setup in there. We want to be able to put a function or a State Machine into "test mode" so that it will know when to make mocked calls to third parties. We want to keep track of test data that is created so we can clean it up afterward. Setting the Test Context One of the most critical parts of the application is how we process insurance policy documents. This has enough complexity to let us develop a good pattern for writing our tests so that other engineers could build upon it in the future. This was the first integration test we were going to write. The flow is like this: the file is uploaded to the S3 bucket; the event is placed onto an SQS queue with a Lambda trigger; the Lambda function reads the PDF metadata and determines who the document belongs to; it fetches some data from a third-party API relating to the policy and updates a DynamoDB table; and the file is moved to another bucket for further processing. We wanted to assert that the file no longer exists in the source bucket, that the DynamoDB table was updated with the correct data, and that the file exists in the destination bucket. This would be an incredibly valuable test. Not only does it verify that the workload is behaving correctly, but it also verifies that the deployed infrastructure is working properly and that it has the correct permissions. For this to work, we needed to make the Lambda Function aware that it was running as part of a test so that it would use a mocked response instead. The solution that we came up with was to attach some additional metadata, an is-test flag, to the object when it was uploaded at the start of the test case. If the S3 object is moved to another bucket as part of its processing, then we also copy its metadata, so the flag is never lost, even in more complex or much larger end-to-end workflows. The Middy Touch Adding the is-test flag to our object metadata gave us our way of passing some kind of test context into our workload. The next step was to make the Lambda Function capable of discovering the context and then using that to control how it behaves under test. For this, we used Middy. If you're not familiar, Middy is a middleware framework specifically designed for Lambda functions. Essentially, it allows you to wrap your handler code up so that you can do some before and after processing. I'm not going to do a Middy deep dive here, but the documentation is great if you haven't used it before. We were already using Middy for various things, so it was a great place to do some checks before we executed our handler. The logic is simple: in the before phase of the middleware, check for the is-test flag in the object's metadata, and if true, set a global test context so that the handler is aware it's running as part of a test.
In the after phase (which is triggered after the handler is finished), clear the context to avoid any issues for subsequent invocations of the warmed-up function: TypeScript export const S3SqsEventIntegrationTestHandler = (logger: Logger): middy.MiddlewareObj => { // this happens before our handler is invoked. const before: middy.MiddlewareFn = async (request: middy.Request<SQSEvent>): Promise<void> => { const objectMetadata = await getObjectMetadata(request.event); const isIntegrationTest = objectMetadata.some(metadata => metadata["is-test"] === "true"); setTestContext({isIntegrationTest}); }; // this happens after the handler is invoked. const after: middy.MiddlewareFn = (): void => { setTestContext({isIntegrationTest: false}); }; return { before, after, onError }; }; Here's the test context code. It follows a simple TypeScript pattern to make the context read-only: TypeScript export interface TestContext { isIntegrationTest: boolean } const _testContext: TestContext = { isIntegrationTest: false }; export const testContext: Readonly<TestContext> = _testContext; export const setTestContext = (updatedContext: TestContext): void => { _testContext.isIntegrationTest = updatedContext.isIntegrationTest; }; I think this is the hardest part about solving the Serverless testing “problem.” I believe the correct way to do this is in a real AWS environment, not a local simulator, and just making that deployed code aware that it is running as part of a test is the trickiest part. Once you have some kind of pattern for that, the rest is straightforward enough. We then built upon this pattern for each of our various triggers, building up a set of middleware handlers for each trigger type. For our S3 middleware, we pass the is-test flag in an object's metadata, but for SQS and SNS, we pass the flag using message attributes. A Note on Step Functions By far the most annoying trigger to deal with was Lambda Functions invoked by a State Machine task. There is no easy way of passing metadata around each of the states in a State Machine - a global state would be really helpful (but would probably be overused and abused by people). The only thing that is globally accessible by each state is the Context Object. Our workaround was to use a specific naming convention when the State Machine is executed, with the execution name included in the Context Object and therefore available to every state in the State Machine. For State Machines that are executed by a Lambda Function, we can use our testContext to prefix all State Machine executions with "IntegrationTest-". This is obviously a bit of a hack, but it does make it easy to spot integration test runs from the execution history of the State Machine. We then make sure that the execution name is passed into each Lambda Task and that our middleware is able to read the execution name from the event. (Note that $$ provides access to the Context Object). Another difficult thing to test with Step Functions is error scenarios. These will often be configured with retry and backoff functionality, which can make tests too slow to execute. Thankfully, there is a way around this, which my colleague, Tom Bailey, has covered in a great post. I would recommend giving that a read. Mocking Third-Party APIs We're now at a point where a Lambda Function is being invoked as part of our workload under test. That function is also aware that it's running as part of a test. The next thing we want to do is determine how we can mock the calls to our third-party APIs. 
There are a few options here: Wiremock: You could host something like wiremock in the AWS account and call the mocked API rather than the real one. I've used Wiremock quite a bit, and it works really well, but can be difficult to maintain as your application grows. Plus, it's another thing that you have to deploy and maintain. API Gateway: Either spin up your own custom API for this or use the built-in mock integrations. DynamoDB: This is our current choice. We have a mocked HTTP client that, instead of making an HTTP call, queries a DynamoDB table for a mocked response, which has been seeded before the test has run. Using DynamoDB gave us the flexibility we needed to control what happens for a given API call without having to deploy a bunch of additional infrastructure. Asserting That Something Has Happened Now, it's time to determine if our test has actually passed or failed. A typical test would be structured like this: TypeScript it("should successfully move documents to the correct place", async () => { const seededPolicyData = await seedPolicyData(); await whenDocumentIsUploadedToBucket(); await thenDocumentWasDeletedFromBucket(); await thenDocumentWasMovedToTheCorrectLocation(); }); With our assertions making use of the aws-testing-library: TypeScript async function thenDocumentWasMovedToTheCorrectLocation(): Promise<void> { await expect({ region, bucket: bucketName, }).toHaveObject(expectedKey); } The aws-testing-library gives you a set of really useful assertions with built-in delays and retries. For example: Checking an item exists in DynamoDB: TypeScript await expect({ region: 'us-east-1', table: 'dynamo-db-table', }).toHaveItem({ partitionKey: 'itemId', }); Checking an object exists in an S3 bucket: TypeScript await expect({ region: 'us-east-1', bucket: 's3-bucket', }).toHaveObject( 'object-key' ); Checking if a State Machine is in a given state: TypeScript await expect({ region: 'us-east-1', stateMachineArn: 'stateMachineArn', }).toBeAtState('ExpectedState'); It's important to note that because you're testing in a live, distributed system, you will have to allow for cold starts and other non-deterministic delays when running your tests. It certainly took us a while to get the right balance between retries and timeouts. While at times it has been flakey, the benefits of having these tests far outweigh the occasional test failure. Running the Tests There are two places where these tests get executed: developer machines and CI. Each developer on our team has their own AWS account. They are regularly deploying a full version of the application and run these integration tests against it. What I really like to do is get into a test-driven development flow where I will write the integration test first and make my code changes, which will be hot-swapped using CDK, and then run my integration test until it turns green. This would be pretty painful if I was waiting on a full stack to deploy each time, but Hot Swap works well at reducing the deployment time. On CI, we run these tests against a development environment after a deployment has finished. It Could Be Better There are a number of things that we would like to improve upon in this approach. Temporary environments: We would love to run these tests against temporary environments when a Pull Request is opened. Test data cleanup: Sometimes, tests are flaky and don't clean up after themselves properly. We have toyed with the idea of setting a TTL on DynamoDB records when data is created as part of a test. 
Run against production: We don't run these against production yet, but that is the goal. Open source the middleware: I think more people could make use of the middleware than just us, but we haven't got around to open-sourcing it yet. AWS is trying to make it better: Serverless testing is a hot topic at the moment. AWS has responded with some great resources. Summary While there are still some rough edges to our approach, the integration tests really helped with the issues we have already outlined and can be nicely summarised with three of the four key DORA metrics: Deployment frequency: The team's confidence increased when performing deployments, which increased their frequency. Lead time for changes: Less need for manual testing reduced the time it takes for a commit to make it to production. Change failure rate: Permissions errors no longer happen in production, and bugs are caught sooner in the process. The percentage of deployments causing a failure in production was reduced.
The test pyramid (testing pyramid, test automation pyramid) was originally published in the famous book by Mike Cohn: Succeeding with Agile: Software Development Using Scrum (Cohn, 2010). The original figure shows unit tests at the base, service tests in the middle, and UI tests at the top. The concept is simple: you should write more unit tests than service tests and only a few UI tests. The reason behind this is that UI tests are slow and brittle. There have been many modifications to the original version: adding manual tests at the top of the pyramid, renaming 'service' to 'integration', renaming 'UI' to 'e2e', and adding more layers such as 'API' or 'component'. An article that considers several alternatives is (Roth 2019). However, the main problem is that this concept considers only some aspects of testing and cannot account for the progress of test (design) automation. In this article, we delve into a comprehensive approach to test design and test automation, focusing on the primary objective of testing, i.e., bug detection rather than execution speed optimization. Therefore, let's explore the effectiveness of test cases. Mutation Testing To measure the efficiency of test cases, we employ mutation testing, a technique that involves testing the tests themselves by introducing slight modifications to the original code, creating multiple mutants. A robust test dataset should be capable of distinguishing the original code from all carefully selected mutants. In mutation testing, we intentionally inject faults into the code to assess the reliability of our test design. A dependable test dataset must effectively "eliminate" all mutants. A test eliminates a mutant when there are discernible differences in behavior between the original code and the mutant. For instance, if the original code is y = x, and a mutant emerges as y = 2 * x, a test case like x = 0 fails to eliminate the mutant, whereas x = 1 succeeds in doing so (a short code sketch of this idea appears later in this section). Unfortunately, the number of potential mutants is excessively high. However, a significant reduction can be achieved by concentrating on efficient first-order mutants. In the realm of first-order mutants, modifications are limited to a single location within the code. Conversely, second-order mutants involve alterations at two distinct locations, and during execution, both modifications come into play. An investigation by Offutt demonstrated that when all first-order mutants are effectively eliminated, only an exceptionally minute fraction of second-order mutants remain unaddressed. This implies that if a test set is capable of exterminating all first-order mutants, it can also address second-order mutants with efficacy ranging from 99.94% to 99.99%, as per Offutt's empirical study. It's essential to note that our consideration solely pertains to non-equivalent mutants, signifying that test cases exist for each mutant's elimination. We can further reduce the mutants, but first, we should consider the efficiency of the test cases. We consider test case efficiency with respect to the reduced mutation set. It's only a very slight restriction as the imprecision is less than 0.1%. A test case is unreliable if it cannot find any bug in any mutants, i.e., it doesn't eliminate any mutants. A test case is superfluous if there are other test cases that eliminate the same mutants. A test case T1 substitutes test case T2 if it eliminates all the mutants that T2 eliminates and at least one more. A test case T1 is stronger than T2 if it eliminates more mutants than T2, but T1 doesn't substitute T2. A test set is quasi-reliable if it eliminates all the mutants.
A test set is reliable if it finds any defect in the code. A quasi-reliable test set is very close to a reliable test set. A test set is quasi-optimal if it is quasi-reliable and consists of no more test cases than any other quasi-reliable test set. A test case is deterministic if it either passes or fails for all executions. A test case is non-deterministic if it both passes and fails for some executions. Flaky tests are non-deterministic. Non-deterministic test cases should be improved to become deterministic or should be deleted. Now, we can reduce the mutant set. A mutant is superfluous if no test case in a quasi-reliable test set eliminates only this mutant. For example, if only test case T1 eliminates this mutant, but T1 also eliminates another mutant, then we can remove this mutant. The reduced mutant set is called the optimal mutant set. Ideal Mutant Set In this manner, assuming we possessed flawless software, we could create an ideal mutant set from which we could derive a test set that is quasi-reliable. It's of paramount significance that the number of test cases required to detect nearly all defects (99.94% or more) would not exceed the count of optimal mutants. The author, with his co-author Attila Kovács, developed a website and a mutation framework with mutant sets that are close to optimal (i.e., there are only very few or zero superfluous mutants but no missing mutants). The following list shows the code size and the number of mutants in the near-optimum mutant sets (in the code, each parameter is on a different line): Pizza ordering: 142 LOC, 30 reliable mutants. Tour competition: 141 LOC, 24 reliable mutants. Extra holiday: 57 LOC, 20 reliable mutants. Car rental: 49 LOC, 15 reliable mutants. Grocery: 33 LOC, 26 reliable mutants. You can see that the code size and the number of mutants correlate, except for the Grocery app. We believe that the number of optimum mutants in the mutant sets is (close to) linear in the code size, which means a reliable test set could be developed. Unfortunately, developing an optimal mutant set is difficult and time-consuming. Don't Use the Test Pyramid or its Alternatives Why is this 'artificial mutant creation' important? We argue that during test automation, we should optimize the test design to find as many defects as we can but avoid superfluous tests. Since the tests should eliminate the mutants, it is an attribute of the system whether a test that eliminates a given mutant is a unit, an integration, or a system (e2e) test. We should optimize the test design for each level separately; that's why there cannot be a pre-described shape of test automation. You can argue that you can add more unit test cases as they are cheap to execute. However, execution is not the only cost: tests also have to be designed and coded, and the difficult part is calculating the expected results, which can be time-consuming and error-prone. In e2e tests, the outputs (results) only have to be checked instead of calculated, which is much easier (Forgacs and Kovacs, 2023). Another problem is maintenance: while maintaining e2e tests is cheap (see the book above), maintaining unit tests is unfortunately expensive (Ellims et al. 2006). OK, but if most defects could be found by unit testing, then the test pyramid would still be appropriate. However, that's not true. Runeson et al. (2006) showed in a case study that unit tests detected only 30-60% of the defects. In addition, Berling and Thelin (2004) showed that for different programs, the ratio of bug detection for different test levels is different.
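To connect this back to the earlier y = x example, here is a minimal, hypothetical JUnit sketch of a first-order mutant and the test case that eliminates it; this code is only an illustration and is not taken from the authors' mutation framework.
Java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class Doubler {
    // Original code: y = x.
    static int original(int x) {
        return x;
    }

    // First-order mutant: a single change at one location (x becomes 2 * x).
    static int mutant(int x) {
        return 2 * x;
    }
}

class MutantEliminationTest {

    // x = 0 does NOT eliminate the mutant: both versions return 0,
    // so their behaviors cannot be distinguished.
    @Test
    void zeroInputCannotTellTheVersionsApart() {
        assertEquals(Doubler.original(0), Doubler.mutant(0));
    }

    // x = 1 eliminates the mutant: this assertion passes against the original
    // but would fail against the mutant, which returns 2.
    @Test
    void nonZeroInputEliminatesTheMutant() {
        assertEquals(1, Doubler.original(1));
    }
}
A quasi-reliable test set is simply one that contains such a distinguishing test case for every non-equivalent mutant.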
Because the detection ratios differ between levels, test design should be carried out for each level independently. Don't design fewer e2e test cases than needed, or your system's quality will remain low and the bug-fixing costs will be higher than the costs of the missing test design and execution. Don't design more unit tests than needed, either; your costs will increase significantly without improving quality. But how can we decrease the number of unit tests? If you find a bug with a unit test and the bug could also have been detected by several other unit test cases, then you have included superfluous tests, and you should remove them. Conclusion We showed that prescribing any fixed shape for test automation is faulty. The ratio of test cases at the different test levels is not an input but an output, determined by your system and the test design techniques selected based on risk analysis. We can conclude other things as well. As the number of quasi-reliable test cases is probably linear with the code, it's enough to apply linear test design techniques. In this way, let's apply each-transition (0-switch) testing instead of n-switch testing, where n > 0. Similarly, in most cases, avoid using combinatorial testing, such as all-pairs testing, as a large part of the tests will be superfluous. Instead, we should develop more efficient linear test design techniques (see Forgacs and Kovacs, 2023). There are cases when you still need to use a stronger test design: if a defect may cause more damage than the whole SDLC cost, then you should apply stronger methods. However, most systems do not fall into this category.
During project and product development, software engineering teams need to make architectural decisions to reach their goals. These decisions can be technical or process-related. Technical: Deciding to use JBOSS Data Grid as a caching solution vs Amazon Elasticache or deciding to use the AWS Network Load Balancer (NLB) vs AWS Application Load Balancer (ALB). Process: Deciding to use a Content Management portal for sharing documents or project-related artifacts. Making these decisions is a time-consuming and difficult process, and it's essential that teams justify, document, and communicate these decisions to relevant stakeholders. Three major anti-patterns often emerge when making architectural decisions: No decision is made at all out of fear of making the wrong choice. A decision is made without any justification, and most of the time, people don’t understand why it was made and the use case or the scenario that has been considered. This results in the same topic being discussed multiple times. The decision isn’t captured in an architectural decision repository, so team members forget or don’t know that the decision was made. What Is an ADR? An Architecture Decision Record (ADR) is a document that captures a decision, including the context of how the decision was made and the consequences of adopting the decision. When Will You Write an ADR? ADRs are typically written when a significant architectural decision needs to be made, such as when selecting a new technology, framework, or design pattern or when making a trade-off between different architectural goals, such as performance, scalability, and maintainability. ADRs are also useful for documenting decisions that have already been made to ensure that the rationale behind them is clear to all members of the development team. ADRs also ensure that you are aligned with the organization’s IT strategies. ADRs typically include information such as the problem being addressed, the options considered, the decision made, the reasons behind the decision, and any relevant technical details. They may also include any implications or risks associated with the decision, as well as any future work that may be required because of the decision. Writing ADRs can help to promote transparency and collaboration within a development team, as well as provide a valuable resource for future developers who may need to understand the reasoning behind past decisions. Best Practices When Writing an ADR When writing an Architecture Decision Record (ADR), it's important to follow some best practices to ensure that the ADR is clear, useful, and easy to understand. Here are some best practices for writing an ADR: Start with a clear title: The title of the ADR should be clear and concise and should summarize the decision being made. Define the problem: Begin the ADR by clearly defining the problem or challenge that the decision is addressing. This helps to provide context for the decision and ensures that everyone understands the problem being solved. Describe the decision: Clearly describe the decision that has been made, including the alternatives considered and the reasons for selecting the chosen option. This should include any trade-offs or compromises that were made, as well as any technical details that are relevant. Explain the rationale: Provide a clear and detailed explanation of the rationale behind the decision. This should include any relevant business or technical considerations, as well as any risks or potential drawbacks. 
Document any implications: Document any implications of the decision, including any dependencies on other parts of the system, any impacts on performance or scalability, and any risks or issues that may arise because of the decision. Keep it concise: ADRs should be concise and easy to read. Avoid including unnecessary information or technical jargon and focus on providing clear and concise explanations of the decision-making process and its rationale. Keep it up-to-date: ADRs should be kept up-to-date as the project progresses. If new information or considerations arise that impact the decision, the ADR should be updated to reflect these changes. By following these best practices, ADRs can provide a clear and useful record of important architectural decisions and help to ensure that everyone on the team is aligned and informed about the reasoning behind those decisions. Example ADR Now that we have defined what an ADR is and the best practices to be followed when writing an ADR let’s try and put those best practices in writing an ADR. For writing an example ADR, we will try and document one of the solutions described in the blog, migrating unstructured data (files) from on-premises storage to AWS. In the blog, there are three scenarios and a solution for each of those scenarios. For this ADR example, we will pick the solution for Migrating from NAS to AWS using AWS DataSync. Plain Text Title: Migrating from NAS to AWS using AWS DataSync Status: Accepted Date: 6th October 2023 Context: Application A picks up incoming files from an Application X, processes them and generates data files that are 50–300 GB. That, then, becomes the input for another Application Y to consume. The data is shared by means of an NFS Storage accessible to all three applications. Application A is being migrated to AWS and the Applications X and Y continue to remain on-premises. We used AWS Elastic File System (EFS) to replace NFS on AWS. However, that makes it difficult for the applications to read/write from a common storage solution, and network latency slows down Application X and Application Y Decision: We will use AWS DataSync Service to perform the initial migration of nearly 1 TB of data from the on-premises NFS storage to AWS EFS AWS DataSync can transfer data between any two network storage or object storage. These could be network file systems (NFS), server message block (SMB) file servers, Hadoop distributed file systems (HDFS), self-managed object storage, AWS Snowcone, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Windows File Server file systems, Amazon FSx for Lustre file systems and Amazon FSx for OpenZFS file systems. To solve the need for the applications to read/write from a common storage solution and address the network latency involved during read/write operations across the Direct Connect, we scheduled a regular synchronization of the specific input and output folders using the AWS DataSync service between the NFS and EFS. This means that all three applications look at same set of files after the sync is complete. Consequences: Positive • No fixed/upfront cost and only $0.0125 per gigabyte (GB) for data transferred. Negative • Syncs can be scheduled at minimum one-hour intervals. This soft limit can be modified for up to 15-minutes intervals, however, that leads to performance issues and subsequent sync schedules getting queued up, which forms a loop. • Bidirectional Syncs were configured to run in a queued fashion. 
That is, only one-way sync can be executed at a time. Applications will have to read the files after the sync interval is completed. In our case, files are generated only one time per day, so this challenge was mitigated by scheduling the read/writes in a timely fashion. • AWS DataSync Agent (virtual appliance) must be installed on a dedicated VM on-premises. Compliance: Notes: Author(s): Rakesh Rao and Santhosh Kumar Ramabadran Version: 0.1 Changelog: 0.1: Initial proposed version While the above is one format, an ADR can be created in any format agreed with the stakeholders. It could be as simple as a Word document, a spreadsheet, or a presentation. When Will You Not Write an ADR? While Architecture Decision Records (ADRs) can be helpful in documenting important architectural decisions, there may be some cases where writing an ADR is not necessary or appropriate. Here are a few examples: Minor decisions: If a decision has minimal impact on the architecture of the system or is relatively straightforward, it may not be necessary to write an ADR. For example, if a team decides to update a library to a newer version, and the update is expected to have little impact on the overall architecture, an ADR may not be necessary. Temporary decisions: If a decision is expected to be temporary or is only applicable to a specific context or situation, it may not be necessary to write an ADR. For example, if a team decides to implement a temporary workaround for a bug, and the workaround is not expected to be a long-term solution, an ADR may not be necessary. Routine decisions: If a team makes routine decisions that are not particularly significant or require little discussion or debate, an ADR may not be necessary. For example, if a team decides to follow an established design pattern or uses a commonly used technology, an ADR may not be necessary. Existing documentation: If the decision has already been documented elsewhere, such as in project requirements or design documentation, it may not be necessary to create an ADR specifically for that decision. Ultimately, the decision of whether to write an ADR depends on the significance of the decision and the context in which it is being made. If the decision has a significant impact on the architecture of the system, involves trade-offs or alternatives, or is likely to have long-term implications, it is generally a good idea to create an ADR to document the decision-making process. Alternatives to ADR While Architecture Decision Records (ADRs) are a common and effective way to document important architectural decisions, there are several alternative approaches that can be used depending on the specific context and needs of a project. Here are a few alternatives to ADRs: Code comments: One simple alternative to ADRs is to use code comments to document architectural decisions directly within the codebase. This can be a useful approach for smaller projects or for teams that prefer a more lightweight approach to documentation. However, code comments can become difficult to manage and may not provide enough context or detail for more complex decisions. Design documents: Design documents can provide a more comprehensive and detailed way to document architectural decisions. These documents can include diagrams, flowcharts, and other visual aids to help explain the architecture of a system. However, design documents can be time-consuming to create and may become outdated as the project evolves. 
Wikis or knowledge bases: Wikis or knowledge bases can be used to document architectural decisions in a more flexible and searchable way than ADRs. This approach can be particularly useful for large or complex projects, as it allows teams to easily find and reference information related to specific architectural decisions. However, wikis and knowledge bases can also become difficult to manage and may require additional effort to keep up-to-date. Meetings and discussions: Another approach to documenting architectural decisions is to hold regular meetings or discussions to review and document decisions. This approach can be useful for teams that prioritize face-to-face communication and collaboration but may not be as effective for remote teams or those with members in different time zones. Ultimately, the best approach to documenting architectural decisions depends on the specific needs and context of a project. Teams should consider factors such as project size, team size, and communication preferences when deciding which approach to use.
When doing unit tests, you have probably found yourself in the situation of having to create objects over and over again. To do this, you must call the class constructor with the corresponding parameters. So far, nothing unusual, but most probably, there have been times when the values of some of these fields were irrelevant for testing or when you had to create nested "dummy" objects simply because they were mandatory in the constructor. All this has probably generated some frustration at some point and made you question whether you were doing it right or not; if that is really the way to do unit tests, then it would not be worth the effort. That is to say, typically, a test must have a clear objective. Therefore, it is expected that within the SUT (system under test) there are fields that really are the object of the test and, on the other hand, others are irrelevant. Let's take an example. Let's suppose that we have the class "Person" with the fields Name, Email, and Age. On the other hand, we want to do the unit tests of a service that, receiving a Person object, tells us if this one can travel for free by bus or not. We know that this calculation only depends on the age. Children under 14 years old travel for free. Therefore, in this case, the Name and Email fields are irrelevant. In this example, creating Person objects would not involve too much effort, but let's suppose that the fields of the Person class grow or nested objects start appearing: Address, Relatives (List of People), Phone List, etc. Now, there are several issues to consider: It is more laborious to create the objects. What happens when the constructor or the fields of the class change? When there are lists of objects, how many objects should I create? What values should I assign to the fields that do not influence the test? Is it good if the values are always the same, without any variability? Two well-known design patterns are usually used to solve this situation: Object Mother and Builder. In both cases, the idea is to have "helpers" that facilitate the creation of objects with the characteristics we need. Both approaches are widespread, are adequate, and favor the maintainability of the tests. However, they still do not resolve some issues: When changing the constructors, the code will stop compiling even if they are fields that do not affect the tests. When new fields appear, we must update the code that generates the objects for testing. Generating nested objects is still laborious. Mandatory and unused fields are hard coded and assigned by default, so the tests have no variability. One of the Java libraries that can solve these problems is "EasyRandom." Next, we will see details of its operation. What is EasyRandom? EasyRandom is a Java library that facilitates the generation of random data for unit and integration testing. The idea behind EasyRandom is to provide a simple way to create objects with random values that can be used in tests. Instead of manually defining values for each class attribute in each test, EasyRandom automates this process, automatically generating random data for each attribute. This library handles primitive data types, custom classes, collections, and other types of objects. It can also be configured to respect specific rules and data generation restrictions, making it quite flexible. 
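To picture what the Object Mother and Builder helpers mentioned above usually look like in practice, here is a small, hypothetical sketch. The Person fields follow the example in this article, but the constructor, helper names, and default values are invented for illustration.
Java
// Assumed minimal Person class based on the article's example.
class Person {
    private final String name;
    private final String email;
    private final int age;

    Person(String name, String email, int age) {
        this.name = name;
        this.email = email;
        this.age = age;
    }

    int getAge() {
        return age;
    }
}

// Object Mother: a factory of ready-made, meaningful test objects.
class PersonMother {
    static Person aChild() {
        return new Person("Any Name", "any@email.com", 10);
    }

    static Person anAdult() {
        return new Person("Any Name", "any@email.com", 40);
    }
}

// Builder: defaults for the irrelevant fields, overriding only what the test needs.
class PersonBuilder {
    private String name = "Any Name";
    private String email = "any@email.com";
    private int age = 30;

    PersonBuilder withAge(int age) {
        this.age = age;
        return this;
    }

    Person build() {
        return new Person(name, email, age);
    }
}
A test can then write new PersonBuilder().withAge(10).build() and ignore the rest, but, as noted above, both helpers still have to be updated by hand whenever the Person constructor changes, and the "irrelevant" values never vary.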
Here is a basic example of how EasyRandom can be used to generate a random object: Java public class EasyRandomExample { public static void main(String[] args) { EasyRandom easyRandom = new EasyRandom(); Person randomPerson = easyRandom.nextObject(Person.class); System.out.println(randomPerson); } } In this example, Person is a dummy class, and easyRandom.nextObject(Person.class) generates an instance of Person with random values for its attributes. As can be seen, the generation of these objects does not depend on the class constructor, so the test code will continue to compile, even if there are changes in the SUT. This would solve one of the biggest problems in maintaining an automatic test suite. Why Is It Interesting? Using the EasyRandom library for testing your applications has several advantages: Simplified random data generation: It automates generating random data for your objects, saving you from writing repetitive code for each test. Facilitates unit and integration testing: By automatically generating test objects, you can focus on testing the code's behavior instead of worrying about manually creating test data. Data customization: Although it generates random data by default, EasyRandom also allows you to customize certain fields or attributes if necessary, allowing you to adjust the generation according to your needs. Reduced human error: Manual generation of test data can lead to errors, especially when dealing with many fields and combinations. EasyRandom helps minimize human errors by generating consistent random data. Simplified maintenance: If your class requirements change (new fields, types, etc.), you do not need to manually update your test data, as EasyRandom will generate them automatically. Improved readability: Using EasyRandom makes your tests cleaner and more readable since you do not need to define test values explicitly in each case. Faster test development: By reducing the time spent creating test objects, you can develop tests faster and more effectively. Ease of use: Adding this library to our Java projects is practically immediate, and it is extremely easy to use. Where Can You Apply It? This library will allow us to simplify the creation of objects for our unit tests, but it can also be of great help when we need to generate a set of test data. This can be achieved by using the DTOs of our application and generating random objects to later dump them into a database or file. Where it is not recommended: this library may not be worthwhile in projects where object generation is not complex or where we need precise control over all the fields of the objects involved in the test. How To Use EasyRandom Let's see EasyRandom in action with a real example, environment used, and prerequisites. Prerequisites Java 8+ Maven or Gradle Initial Setup Inside our project, we must add a new dependency. The pom.xml file would look like this: XML <dependency> <groupId>org.jeasy</groupId> <artifactId>easy-random-core</artifactId> <version>5.0.0</version> </dependency> Basic Use Case The most basic use case has already been seen before. In this example, values are assigned to the fields of the person class in a completely random way. Obviously, when testing, we will need to have control over some specific fields. Let's see this as an example. Recall that EasyRandom can also be used with primitive types. Therefore, our example could look like this. 
Java
public class PersonServiceTest {

    private final EasyRandom easyRandom = new EasyRandom();
    private final PersonService personService = new PersonService();

    @Test
    public void testIsAdult() {
        Person adultPerson = easyRandom.nextObject(Person.class);
        adultPerson.setAge(18 + easyRandom.nextInt(80));
        assertTrue(personService.isAdult(adultPerson));
    }

    @Test
    public void testIsNotAdult() {
        Person minorPerson = easyRandom.nextObject(Person.class);
        minorPerson.setAge(easyRandom.nextInt(17));
        assertFalse(personService.isAdult(minorPerson));
    }
}
As we can see, this way of generating test objects protects us from changes in the "Person" class and allows us to focus only on the field we are interested in. We can also use this library to generate lists of random objects.
Java
@Test
void generateObjectsList() {
    EasyRandom generator = new EasyRandom();
    // Generate a list of 5 Person objects
    List<Person> persons = generator.objects(Person.class, 5)
        .collect(Collectors.toList());
    assertEquals(5, persons.size());
}
This test, in itself, is not very useful. It is simply to demonstrate the ability to generate lists, which could be used to dump data into a database. Generation of Parameterized Data Let's see now how to use this library to have more precise control in generating the object itself. This can be done by parameterization. Set the value of a field. Let's imagine the case that, for our tests, we want to keep certain values constant (an ID, a name, an address, etc.). To achieve this, we would have to configure the initialization of objects using "EasyRandomParameters" and locate the parameters by their name. Let's see how:
Java
EasyRandomParameters params = new EasyRandomParameters();
// Assign a value to the field by means of a lambda function
params.randomize(named("age"), () -> 5);
EasyRandom easyRandom = new EasyRandom(params);
// The object will always have an age of 5
Person person = easyRandom.nextObject(Person.class);
Of course, the same could be done with collections or complex objects. Let's suppose that our Person class contains an Address object inside and that, in addition, we want to generate a list of two persons. Let's see a more complete example:
Java
EasyRandomParameters parameters = new EasyRandomParameters()
    .randomize(Address.class, () -> new Address("Random St.", "Random City"));
EasyRandom easyRandom = new EasyRandom(parameters);
return Arrays.asList(
    easyRandom.nextObject(Person.class),
    easyRandom.nextObject(Person.class)
);
Suppose now that a person can have several addresses. This would mean the "Address" field will be a list inside the "Person" class. With this library, we can also make our collections have a variable size. This is something that we can also do using parameters.
Java
EasyRandomParameters parameters = new EasyRandomParameters()
    .randomize(Address.class, () -> new Address("Random St.", "Random City"))
    .collectionSizeRange(2, 10);
EasyRandom easyRandom = new EasyRandom(parameters);
// The object will have a list of between 2 and 10 addresses
Person person = easyRandom.nextObject(Person.class);
Setting Pseudo-Random Fields As we have seen, setting values is quite simple and straightforward. But what if we want to control the randomness of the data? We want to generate random names of people, but still names and not just strings of unconnected characters. This same need is perhaps clearer when we are interested in having randomness in fields such as email, phone number, ID number, card number, city name, etc.
Setting Pseudo-Random Fields

As we have seen, setting fixed values is simple and straightforward. But what if we want to control the randomness of the data itself? Say we want to generate random names of people, but real-looking names rather than strings of unconnected characters. This need is even clearer for fields such as email addresses, phone numbers, ID numbers, card numbers, or city names.

For this purpose, it is useful to combine EasyRandom with other data generation libraries. One of the best known is Faker. Combining both libraries, we could get code like this:

Java

EasyRandomParameters params = new EasyRandomParameters();

// Generate a number between 0 and 17
params.randomize(named("age"), () -> Faker.instance().number().numberBetween(0, 17));

// Generate random but realistic-looking names
params.randomize(named("name"), () -> Faker.instance().name().fullName());

EasyRandom easyRandom = new EasyRandom(params);
Person person = easyRandom.nextObject(Person.class);

There are many more parameters that let us control how objects are generated.

Closing

EasyRandom is a library that should be part of your toolkit if you write unit tests, because it makes those tests easier to maintain. In addition, and although it may seem strange, introducing some controlled randomness into tests is not necessarily a bad thing: in a way, it automatically generates new test cases and increases the probability of finding bugs in the code.
"Is it working now?" asked the Product Owner. "Well... I hope so. You know this bug, it can not really be reproduced locally, therefore I can not really test if it works now. The best I can do is deploy it to prod and wait." The answer did not make the Product Owner particularly happy, but he also knew the bug appears only when an API called by his application has a quick downtime exactly at the time when the user clicks on a specific button. The daily stand-up, which was the environment of the small conversation, went on, and nobody wanted to dedicate much time or attention to the bug mentioned - except for Jack, the latest addition to the team, who was concerned about this "hit deploy and roll the dice" approach. Later that day, he actually reached out to Bill - the one who fixed the bug. "Can you tell me some details? Can not we write some unit tests or so?" "Well, we can not. I actually did not really write much code. Still, I have strong faith, because I added @Retryable to ensure the API call is being re-tried if it fails. What's more, I added @Cacheable to reduce the amount of calls fired up against the API in the first place. As I said in the daily, we can not really test it, but it will work on prod." With that Bill wanted to close this topic and focus on the new task he picked up, but Jack was resistant: "I would still love to have automated tests on that," stated Jack. "On what? You should not unit-test Spring. Those guys know what they are doing." "Well, to be honest, I am not worried about Spring not working. I am worried about us not using it the right way." The Challenge This is the point: when we can abandon Jack and Bill, as we arrived at in the main message of this article, I have seen the following pattern multiple times. Someone resolves an issue by utilizing some framework-provided, out-of-the-box functionality. In many cases, it is just applying an annotation to a method or a class and the following sequence happens: The developer argues there is nothing to write automated tests for, as it is a standard feature of the framework that is being used. The developer might or might not at least test it manually (and like in the example above, the manual test might happen on a test environment, or might happen only on a prod environment). At some point, it breaks, and half of the team does not know why it is broken, the other half does not know why it used to work at all. Of course, this scenario can apply to any development, but my observation is that framework-provided features (such as re-try something, cache something, etc.) are really tempting the developers to skip writing automated tests. On a side note, you can find my more generic thoughts on testing in a previous article. Of course, I do not want to argue for testing a framework itself (although no framework is bug-free, you might find actual bugs within the framework). But I am strongly arguing for testing that you are using the framework properly. In many cases it can be tricky, therefore, in this tutorial, you will find a typically hard-to-test code, tips about how to rework and test it, and the final reworked version of the same code. 
Code That Is Hard To Test

Take a look at the following example:

Java

@Slf4j
@Component
public class Before {

    @Retryable
    @Cacheable("titlesFromMainPage")
    public List<String> getTitlesFromMainPage(final String url) {
        final RestTemplate restTemplate = new RestTemplate();
        log.info("Going to fire up a request against {}", url);
        final var responseEntity = restTemplate.getForEntity(url, String.class);
        final var content = responseEntity.getBody();
        final Pattern pattern = Pattern.compile("<p class=\"resource-title\">\n(.*)\n.*</p>");
        final Matcher matcher = pattern.matcher(content);
        final List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group(1).trim());
        }
        log.info("Found titles: {}", result);
        return result;
    }
}

It is fair to say this is tricky to test. Probably your best shot would be to set up a mock server to respond to your call (for example, by using WireMock), as follows:

Java

@WireMockTest
public class BeforeTest {

    private static final Before BEFORE_INSTANCE = new Before();

    @Test
    public void testCall(final WireMockRuntimeInfo wmRuntimeInfo) {
        stubFor(get("/test-url").willReturn(ok(
                "<p class=\"resource-title\">\nFirst Title\n.*</p><p class=\"resource-title\">\nOther Title\n.*</p>")));
        final var titles = BEFORE_INSTANCE.getTitlesFromMainPage("http://localhost:" + wmRuntimeInfo.getHttpPort() + "/test-url");
        assertEquals(List.of("First Title", "Other Title"), titles);
    }
}

Many average developers would be happy with this test, especially after noticing that it produces 100% line coverage. And some of them would entirely forget to add @EnableCaching...

... or to add @EnableRetry...

... or to create a CacheManager bean.

That would not only lead to multiple rounds of deployment and manual testing; if the developers are not ready to admit (even to themselves) that it is their mistake, excuses like "Spring does not work" are going to be made.

Let's Make Life Better!

Although my plan is to describe code changes, the point is not only to have nicer code but also to help developers be more reliable and lower the number of bug tickets. Let's not forget that a couple of unforeseen bug tickets can ruin even the most carefully established sprint plans and can seriously damage the reputation of the development team. Just think about the experience the business has:

They got something delivered that does not work as expected.
The new (in-progress) features are likely not to be delivered on time because the team is busy fixing bugs from previous releases.

Back to the code: you can easily identify a couple of problems. The only method in the class does multiple things (i.e., it has multiple responsibilities), and the test entirely ignores the fact that the class serves as a Spring bean and actually depends on Spring's features. Those features only take effect if the application context enables them and provides a cache backend - exactly the pieces that are so easily forgotten, as sketched below.
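Here is a minimal sketch of that configuration, assuming a Spring Boot application with spring-retry and Spring's caching support on the classpath; the class name and the simple in-memory cache are illustrative choices, not part of the original code:

Java

import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.concurrent.ConcurrentMapCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.retry.annotation.EnableRetry;

// Without @EnableRetry and @EnableCaching, the @Retryable and @Cacheable
// annotations on the business method are silently ignored.
@Configuration
@EnableRetry
@EnableCaching
public class RetryAndCacheConfig {

    @Bean
    public CacheManager cacheManager() {
        // A CacheManager bean provides the backing store; here a simple
        // in-memory cache named "titlesFromMainPage" is used.
        return new ConcurrentMapCacheManager("titlesFromMainPage");
    }
}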
So let's rework the code, step by step:

1. Rework the class to have more methods, each with fewer responsibilities. Wherever you depend on behavior brought in by annotations, I suggest having a method that serves only as a proxy to another method: it will make your life considerably easier when you write tests to find out whether you used the annotation properly.
2. Step 1 probably left you with one public method (called by the actual business callers) and a group of private methods (called by each other and by the public method). Give those methods default (package-private) visibility. This enables you to call them from classes in the same package - such as your unit test class.
3. Split your unit tests based on which aspect is being tested. Although in many cases it is straightforward to have exactly one test class per business class, nothing restricts you from having multiple test classes for the same business class.
4. Define what you are expecting. For example, in the test methods where you want to ensure that retry works, you do not care about the actual call (how the business result is created is tested elsewhere anyway). The expectations there are: if I call the method and it throws an exception once, it is called again; if it fails X times, an exception is thrown. You can define similar expectations against the cache: you expect subsequent calls to the public method to lead to only one call of the internal method.

Final Code

After performing Steps 1 and 2, the business class becomes:

Java

@Slf4j
@Component
public class After {

    @Retryable
    @Cacheable("titlesFromMainPage")
    public List<String> getTitlesFromMainPage(final String url) {
        return getTitlesFromMainPageInternal(url);
    }

    List<String> getTitlesFromMainPageInternal(final String url) {
        log.info("Going to fire up a request against {}", url);
        final var content = getContentsOf(url);
        final var titles = extractTitlesFrom(content);
        log.info("Found titles: {}", titles);
        return titles;
    }

    String getContentsOf(final String url) {
        final RestTemplate restTemplate = new RestTemplate();
        final var responseEntity = restTemplate.getForEntity(url, String.class);
        return responseEntity.getBody();
    }

    List<String> extractTitlesFrom(final String content) {
        final Pattern pattern = Pattern.compile("<p class=\"resource-title\">\n(.*)\n.*</p>");
        final Matcher matcher = pattern.matcher(content);
        final List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group(1).trim());
        }
        return result;
    }
}

On a side note: you could, of course, split the original class into multiple classes (sketched below), for example:

One proxy class that only carries @Retryable and @Cacheable (containing only the getTitlesFromMainPage method)
One class that only focuses on REST calls (containing only the getContentsOf method)
One class responsible for extracting the titles from the HTML content (containing only the extractTitlesFrom method)
One class that orchestrates fetching and processing the HTML content (containing only the getTitlesFromMainPageInternal method)

I am convinced that, although the scope of each class would be even stricter in that case, the overall readability and understandability of the code would suffer from having many classes with only 2-3 lines of business code.
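Purely for illustration, that alternative split might look roughly like the sketch below. The class names are my own, not from the original article, and imports are omitted for brevity as in the listings above; the article itself argues against going this far for so little logic:

Java

// Proxy class: only the framework annotations live here.
@Component
public class TitlesProxy {

    private final TitlesOrchestrator orchestrator;

    public TitlesProxy(final TitlesOrchestrator orchestrator) {
        this.orchestrator = orchestrator;
    }

    @Retryable
    @Cacheable("titlesFromMainPage")
    public List<String> getTitlesFromMainPage(final String url) {
        return orchestrator.getTitlesFromMainPageInternal(url);
    }
}

// Orchestrator: wires fetching and extraction together.
@Component
class TitlesOrchestrator {

    private final PageFetcher pageFetcher;
    private final TitleExtractor titleExtractor;

    TitlesOrchestrator(final PageFetcher pageFetcher, final TitleExtractor titleExtractor) {
        this.pageFetcher = pageFetcher;
        this.titleExtractor = titleExtractor;
    }

    List<String> getTitlesFromMainPageInternal(final String url) {
        return titleExtractor.extractTitlesFrom(pageFetcher.getContentsOf(url));
    }
}

// REST call only.
@Component
class PageFetcher {

    String getContentsOf(final String url) {
        return new RestTemplate().getForEntity(url, String.class).getBody();
    }
}

// HTML title extraction only.
@Component
class TitleExtractor {

    List<String> extractTitlesFrom(final String content) {
        final Matcher matcher = Pattern
                .compile("<p class=\"resource-title\">\n(.*)\n.*</p>")
                .matcher(content);
        final List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group(1).trim());
        }
        return result;
    }
}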
Steps 3 and 4 lead you to the following test classes:

Java

@ExtendWith(MockitoExtension.class)
public class AfterTest {

    @Spy
    private After after = new After();

    @Test
    public void mainFlowFetchesAndExtractsContent() {
        doReturn("contents").when(after).getContentsOf("test-url");
        doReturn(List.of("title1", "title2")).when(after).extractTitlesFrom("contents");

        assertEquals(List.of("title1", "title2"), after.getTitlesFromMainPage("test-url"));
    }

    @Test
    public void extractContent() {
        final String htmlContent = "<p class=\"resource-title\">\nFirst Title\n.*</p><p class=\"resource-title\">\nOther Title\n.*</p>";
        assertEquals(List.of("First Title", "Other Title"), after.extractTitlesFrom(htmlContent));
    }
}

Java

@WireMockTest
public class AfterWireMockTest {

    private final After after = new After();

    @Test
    public void getContents_firesUpGet_andReturnsResultUnmodified(final WireMockRuntimeInfo wmRuntimeInfo) {
        final String testContent = "some totally random string content";
        stubFor(get("/test-url").willReturn(ok(testContent)));

        assertEquals(testContent, after.getContentsOf("http://localhost:" + wmRuntimeInfo.getHttpPort() + "/test-url"));
    }
}

Java

@SpringBootTest
public class AfterSpringTest {

    @Autowired
    private EmptyAfter after;

    @Autowired
    private CacheManager cacheManager;

    @BeforeEach
    public void reset() {
        after.reset();
        cacheManager.getCache("titlesFromMainPage").clear();
    }

    @Test
    public void noException_oneInvocationOfInnerMethod() {
        after.getTitlesFromMainPage("any-test-url");
        assertEquals(1, after.getNumberOfInvocations());
    }

    @Test
    public void oneException_twoInvocationsOfInnerMethod() {
        after.setNumberOfExceptionsToThrow(1);
        after.getTitlesFromMainPage("any-test-url");
        assertEquals(2, after.getNumberOfInvocations());
    }

    @Test
    public void twoExceptions_threeInvocationsOfInnerMethod() {
        after.setNumberOfExceptionsToThrow(2);
        after.getTitlesFromMainPage("any-test-url");
        assertEquals(3, after.getNumberOfInvocations());
    }

    @Test
    public void threeExceptions_threeInvocationsOfInnerMethod_andThrows() {
        after.setNumberOfExceptionsToThrow(3);
        assertThrows(RuntimeException.class, () -> after.getTitlesFromMainPage("any-test-url"));
        assertEquals(3, after.getNumberOfInvocations());
    }

    @Test
    public void noException_twoPublicCalls_oneInvocationOfInnerMethod() {
        assertEquals(0, ((Map) cacheManager.getCache("titlesFromMainPage").getNativeCache()).size());

        after.getTitlesFromMainPage("any-test-url");
        assertEquals(1, after.getNumberOfInvocations());
        assertEquals(1, ((Map) cacheManager.getCache("titlesFromMainPage").getNativeCache()).size());

        after.getTitlesFromMainPage("any-test-url");
        assertEquals(1, after.getNumberOfInvocations());
        assertEquals(1, ((Map) cacheManager.getCache("titlesFromMainPage").getNativeCache()).size());
    }

    @TestConfiguration
    public static class TestConfig {

        @Bean
        public EmptyAfter getAfter() {
            return new EmptyAfter();
        }
    }

    @Slf4j
    public static class EmptyAfter extends After {

        @Getter
        private int numberOfInvocations = 0;

        @Setter
        private int numberOfExceptionsToThrow = 0;

        void reset() {
            numberOfInvocations = 0;
            numberOfExceptionsToThrow = 0;
        }

        @Override
        List<String> getTitlesFromMainPageInternal(String url) {
            numberOfInvocations++;
            if (numberOfExceptionsToThrow > 0) {
                numberOfExceptionsToThrow--;
                log.info("EmptyAfter throws exception now");
                throw new RuntimeException();
            }
            log.info("EmptyAfter returns empty list now");
            return List.of();
        }
    }
}

Note that the usage of the various test frameworks stays separated: the class that actually verifies whether the Spring features are used correctly runs with the Spring test context but is not aware of
WireMock, and vice versa. There is no "dangling" extra configuration in a test class that is used by only a fraction of its test methods.

On a side note regarding AfterSpringTest: using @DirtiesContext on the class could be an alternative to manually resetting the cache in the reset() method, but doing the clean-up manually performs better. My advice on this question:

Do a manual reset if the scope of what needs resetting is small (normally the case in unit tests).
Reset via annotation if many beans are involved or cleaning the context would require complex logic (the case in many integration and system tests).

You can find the complete code on GitHub.

Final Thoughts

After all the reworking and the extra test classes, what would happen now if you deleted @EnableRetry or @EnableCaching from the configuration? What would happen if someone deleted or even modified @Retryable or @Cacheable on the business method? Go ahead and try it out! Or trust me when I say the unit tests would fail. And what would happen if a new member joined the team to work on such code? Based on the tests, they would know the expected behavior of the class.

Tests are important. Quality tests help you produce code that others can understand better, help you be more reliable, and help you identify bugs faster. Tests can be tricky, and tests can be hard to write. But never forget: if someone says, "That cannot be tested," in the overwhelming majority of cases it only means, "I don't know how to test it, and I don't care enough to figure it out."