An overview of software testing



August 03, 2015

"Software testing" is a term used to refer to a variety of methods, tools, and practices for verifying that a software application works, at many different levels.

All of us in the web development industry do some sort of software testing (even if the testing we do is manual and ad hoc, e.g.: refreshing a webpage after making a change to ensure the code you just wrote works). In this blog post, I will attempt to discuss why you might want to move away from ad hoc testing to more-formal testing, try to dispel some myths about formal testing, and give a high-level overview of the different ways that software can be tested, and the different levels of software testing that exist.

First: a disclaimer — I am not a software testing expert, and to a large extent, I wrote this to help wrap my head around software testing. Also, trying to fit complex, multi-faceted things into neat boxes is difficult (and/or controversial), so be aware that others might classify testing differently. If you find out mistakes or agree/disagree with my categorizations, please let me know in the comments below.

This blog post assumes that:

  • you work in the software development industry, and,
  • you've done manual, ad hoc tests at some point (and have an idea of why they are important).

What do you mean by formal tests?

This blog article considers "formal" tests to be tests which are recorded in some way so that other people (i.e.: not the original developer) or software can run them at any time.

Why test formally?

Software testing is an important tool with benefits such as:

  • a "safety net" to ensure each subsequent change to a program doesn't cause a regression or loss-of-functionality,
  • a "sanity check" to ensure that the code written meets the expectations / requirements of the client and your peers,
  • a way to ensure that your code is not too tightly coupled to a particular set of assumptions or use-case (this evolves naturally as you learn to write testable code),
  • a collaboration tool to ensure that someone else's work doesn't inadvertently break your work and vice-versa,
  • a portability check to ensure that code works outside of your specific environment's assumptions, and even,
  • a way to ensure that time is not wasted over-engineering solutions to problems.

Depending on the nature of your work and clients, manual ad hoc testing might be sufficient for a while.

However, as projects, development time, product lifecycles, and/or consequences-of-failure grow in size, complexity, and budget, the benefits listed above become increasingly important to limit the risk of client dissatisfaction, going overbudget, getting stuck in development/bug-fixing hell, and/or the risk of producing software that requires a high level of skilled maintenance to continue functioning.

Formalizing these tests (even manual tests) by recording them in some way allows you to run them the same way every time you run them, ensuring you don't forget something important. It also allows your co-workers to run them if you're busy, or computers to run them if they're automated. And, formal manual test cases can usually be converted to automated tests with little effort.

Myths of formal testing

Myth: Writing formal software tests isn't worth the cost.
While it certainly takes a while for new testers to get used to how testing works, and it is possible to write too many tests, even a little bit of testing can greatly reduce development time later in the development cycle, both before the product is launched, and during maintenance later on. Acceptance and refinement tests specifically (see below) can also help to give all involved parties a sense of what needs to be focused on, reducing project management time, wasted development time, and the risk of producing a product that clients (or in the case of a website, the general public) doesn't want or cannot use.
Myth: Formal software tests must be written only by software developers.
While developers must write certain tests, and are typically capable of writing most (if not all) types of tests, there are certain types of software tests which are better written by product owners, clients, testers, developers, and designers.
Myth: Most tests should be written after the system is (mostly) functionally-complete.
While the Waterfall software development model states this, the reality is that the earlier you start testing, the earlier you find and correct bugs, missing functionality, architecture problems, and usability issues.

A brief note about test-driven development (TDD)

A powerful way to write software is using Test-Driven Development (TDD). Simply put, TDD is the practice of writing tests first (so your intents are clearly stated), then writing the code to meet the requirements of the test. Test-driven development is applicable to all areas of software testing, except, perhaps, certain types of refinement tests.

Test-driven development has two distinct advantages:

  1. it ensures that everything you write from the point when you start using TDD is well-tested ("has good test coverage"), and,
  2. if you follow the TDD principle to stop developing when the tests pass, you can ensure that you don't waste time over-engineering solutions to problems.

I highly recommend reading Test-Driven Development By Example by Kent Beck (Addison-Wesley, 2003).

Classifying tests

All types of software tests can be classified as white-box, black-box, or grey-box:

White-box tests
Clear-box tests
Tests written to test the internal structure of something, as opposed to it's functionality. Named because you typically look at the code you're testing, and write your tests to target it.
Black-box tests
Tests written to test the functionality of something, as opposed to it's internal structure. Named because you typically (pretend that) you know nothing about the implementation of something, and write your tests to do what you'd expect it to do.
Grey-box tests
A combination of white- and black-box testing, used to test for improper structure or usage defects.

Levels of software testing

I group software testing into five main levels,

  1. Unit tests,
  2. Integration tests,
  3. Acceptance tests,
  4. Refinement tests, and,
  5. System tests.

Generally, the number of tests decrease; but the coverage scope, run-time, effort and cost of the tests increase as you go down the list (i.e.: in a well tested system, there are many unit-tests, each covering a small scope (e.g.: a single function), but few system tests, each covering a wide scope (e.g.: many objects, functions, and subsystems working together as-intended)).

To help you keep track of all these things, I've put together a software testing cheat sheet for you to download. The cheatsheet's source code is on Github, so feel free to improve it!

Unit tests

Unit tests are typically white-box tests to cover edge-cases and code branches (code paths) in a limited scope, typically on a per-function level. Each function typically has one or more unit tests that focus on ensuring each code path functions in the expected manner. Small changes to the code under test typically require changes to the unit tests to incorporate the new changes and ensure new code paths are covered.

Unit tests are typically short and quickly-written by the software developer who is writing the code under test, using carefully-chosen test parameters (typically edge-cases); by checking pre- and post-conditions, state-changes, and output; and by mocking dependencies to ensure that the test only covers the function under test.

Ideally, unit tests are written before the function(s) they test (i.e.: following Test Driven Development principles). If writing and performing unit tests is required for new work, your last chance to do so is before committing or submitting your code for code review.

Key technologies for unit testing are:

  • phpUnit, RSpec, Jasmine, QUnit, Unit.js, etc. (sometimes collectively referred to as xUnit because many follow a naming pattern of <language abbreviation>Unit).
    • Note that SimpleTest in Drupal 6 contrib and in Drupal 7 core can be used to do unit tests, but it's not very good at it, so it's been replaced with phpUnit in Drupal 8.
  • Mock objects (e.g.: Hamcrest).

... note that many of the key technologies can also be used for integration and acceptance (behaviour) testing.

Other names for unit tests are:

  • component tests

Some books on unit tests and mock objects that I've read and recommend are:

Integration tests

Integration tests are typically grey-box tests to cover interactions between objects and interfaces, on a limited scope, typically on a per-object or per-subsystem level. Each object and subsystem typically has one or more integration tests that focus on ensuring the object/subsystem interacts with it's dependencies in the expected manner. Small changes to the code under test might require changes to the integration tests, especially when they change the way an object/subsystem interacts with other objects/subsystems.

Integration tests are typically medium-length and quickly-written by the software developer who is writing the object/subsystem code, by checking success and failure states, how they interact with other components (through mock objects), what they accept as input, what they produce as outputs. Mocking dependencies, shared resources and inter-process-communication is common practice at this level.

Ideally, integration tests are written before the object(s) they test (i.e.: following TDD principles). If writing and performing integration tests is required for new work, your last chance to do so is before committing or submitting your code for code review.

Key technologies for integration testing are:

... again, because unit tests and integration tests are so similar, they share a lot of the same technologies.

Other names for integration tests are:

  • integration & testing (I&T)

Some books on unit tests and mock objects that I've read and recommend are:

Acceptance tests

Acceptance tests are typically black-box tests to cover how a user moves through a system on a broader scope, typically at the user-interface level. Each user-interface display typically has a number of acceptance tests that focus on ensuring the user can complete a certain set of tasks using that display, that the display functions correctly, and that it meets the client's requirements. Small changes to the code under test only require changes to the acceptance tests if the changes affect the user interface.

Acceptance tests are typically short and written by the product owner, client, tester, or in some cases, the developer, by looking at the user story / ticket and outlining the steps the user must take to complete the requested functionality.

Ideally, acceptance tests are written by the client, or product owner before the UI is developed and any work is done. If writing and performing acceptance tests is required for new work, your last chance to do so is before committing or submitting your code for code review.

Ideally, parameters for accessibility testing (i.e.: level of conformance) and browser testing (i.e.: list of supported browsers) should be defined by the client before any work is done; and the latest these should be defined is before any work is started.

Key technologies for acceptance testing are:

  • Manual tests (using manual test cases).
  • The Gherkin domain-specific language.
    • Tests written in Gherkin are turned into mouse clicks, keystrokes, etc. by interpreters (e.g.: Behat, Cucumber, SpecFlow).
      • These interpreters forward these mouse clicks, keystrokes, etc. through controllers (e.g.: Mink, Webdriver, CasperJS) to browsers.
        • Controllers like Mink or Webdriver then control:
    • Behavioural output from browsers is usually forwarded back to the Gherkin interpreters for analysis.
    • Visual output from browsers can be forwarded to visual regression tools (e.g.: the Galen framework).
  • SimpleTest in Drupal 6 contrib and in Drupal 7 core.
  • Visual difference tools (e.g.: Shoov).
  • Accessibility testing, both automated (e.g.: Quail) and manual (e.g.: VoiceOver, JAWS).
  • Browser testing (i.e.: ensuring it works in each browser), either automatically (typically performed with the Gherkin stack above), or manually (using a manual test case with local browsers, browsers in virtual machines, or browsers in the cloud).

Other names for acceptance tests are:

  • conformance tests
  • behaviour tests
  • functional tests
  • User Acceptance Tests (UAT)
  • validation tests

Refinement tests

Refinement tests are usually black-box tests to determine if a proposed UI is:

  • useful (does the system do what the user wants to do)
  • usable (can the user figure out how to get their tasks done)
  • aesthetically pleasing (does the UI look good)
  • identifiable (is the purpose of the system clear)
  • inspirational (does the system inspire the users to use it), and,
  • valuable (does the user value the system and it's purpose)

... on a broad scope, at the user-interface level. Each design stage / refinement typically undergoes a series of tests to ensure the design provides a good end-user experience. Changes to the code under test don't normally require changes to the refinement tests unless a desired UI element is deemed too costly to complete.

Refinement tests are typically written by a designer or UX expert by presenting a UI design to users and gathering data on how they experience it, and may be repeated as the design progresses. Beginning refinement testing before any real (non-prototype) code is implemented often reduces the number of design / UI changes later on in the development cycle, as the biggest usability problems can be identified as early as possible.

Ideally, refinement tests are written before the UI is developed and any work is done. If writing and performing refinement tests is required for new work, your last chance to do so is before committing or submitting your code for code review.

Key technologies for refinement testing are:

  • Visual difference tools (e.g.: Shoov).
  • A/B testing frameworks (e.g.: Acquia Lift).
  • Accessibility testing, both automated (e.g.: Quail) and manual (e.g.: VoiceOver, JAWS).
  • Talking to actual users.
  • Observing how users use the system.

Other names for refinement tests are:

  • UX tests
  • Usability tests
  • Business-value tests
  • Return On Investment (ROI) tests
  • End-user tests
  • Field tests

Some books on refinement tests that I've read and recommend are:

System testing

System tests are typically black-box tests to cover the system as a whole and catch regressions, on an extremely broad scope, at the user-interface level. The system usually has a number of system tests, each of which may focus on an important aspect of the system. Small changes to the code under test don't normally require changes to the system tests.

System tests are typically written by the tester, developer, and product owner, to by writing regression tests, setting performance acceptability thresholds, stress-testing the system, ensuring the system is compatible with the systems it's designed to work with (e.g.: where it will be deployed), and whether the system can recover from unexpected, serious problems outside of it's control (recovery tests).

Ideally, system tests are written or their parameters are chosen before any development work is done. If writing and performing system tests is required for new work, your last chance to do this is after integrating code together, usually before a demo to the client or a beta test.

One thing to note about system testing is that the results of the tests aren't always a strict pass/fail, and therefore aren't always actionable. In this case, trends are more important: you should expect software that just got a whole bunch of new features to be a bit slower; but if it's a lot slower, then you should probably examine the new code and schedule some time to refactor.

Key technologies for system testing are:

  • Analysis tools (e.g.: New Relic)
  • Load-testing tools (e.g.: Siege)
  • Manual tests

Other names for system tests are:

  • End-user tests
  • Field tests
  • Regression tests
  • Performance tests
  • Failure tests

Bonus: Smoke tests

Smoke tests are extremely-small, preliminary, typically ad-hoc tests to reveal major defects that would typically preclude someone's ability to run other tests. For example:

  • code that does not compile because of a syntax error prevents all tests from being run, therefore, syntax linters could be considered a type of smoke test;
  • if a file added in the latest deployment does not exist after the deployment, or the file is empty (and it isn't supposed to be), that's usually a good indication that a deployment failed.

Rumor has it that the term "smoke tests" is a calque from the discipline of electrical engineering, where, if you plug something in and it starts to smoke, that typically indicates that something is wrong and would prevent further testing.

Other names for smoke tests are:

  • confidence tests
  • sanity tests
  • intake tests

In conclusion

Hopefully, this blog post will help you to understand some of the terminology that surrounds software testing.

Special thanks to my friend Anand Sharma, who is just as passionate about software quality, and has been a big help and influence in helping me learn software testing. He also reviewed the software testing cheat sheet for me. Also, thanks to Patrick Connelly for the great tools he built and conversations we've had about continuous integration and software quality in general.

Some books on this subject that I've read and strongly recommend are:

If you're looking to implement formal software testing at your organization, don't be discouraged by the fact that it takes a while to learn how to write good tests: it's worth the investment.

Next week, I'll write a bit about automated testing, continuous integration, and continuous delivery.

Add new comment

Plain text

  • No HTML tags allowed.
  • Lines and paragraphs break automatically.
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.