Checking your app’s paid services without losing money

Published in Bumble Tech, Jun 30, 2020

Checking paid services is a crucial part of the testing process for the Badoo and Bumble apps. Our applications are integrated with 70 payment providers in 250 countries. This means that a bug in any one of these integrations can seriously affect revenue, and in unpredictable ways.

In this article, you will learn about the methods we use to test paid services, their limitations, and the stage at which each method is most efficient. The content will be useful for QAs, developers and product managers who already work on projects integrated with payment providers, as well as for those about to start such an integration. If you are also interested in automated testing of iOS apps, my colleague Viktar Karanevich will soon publish an article on that topic; we will keep you updated.

The examples involving Apple reflect the situation in the first half of 2020. In autumn 2020, Apple announced the release of StoreKit Testing in Xcode, which should eliminate the problems with this provider's sandbox. Even so, the approach based on mocks, fakes and stubs is too fundamental to abandon, because we don't know which provider will lead the industry in future or how it will handle sandboxes.

Specifics of billing testing

The goal of most businesses is to generate revenue. In our applications, revenue comes from credits/coins (the app's internal currency) and premium subscriptions. To support our paid features, we have integrated more than 70 payment providers. Choosing the best provider in each case depends on multiple factors: platform, country, device, mobile operator and many more. Given the sheer number of payment providers we use, testing our paid features is unavoidably complex.

There are two reasons why testing paid features is a top priority.

1. Bugs in paid services can affect revenue or reputation

Bugs in paid services damage reputation. Paying users are more vulnerable to, and less tolerant of, app bugs. Public reviews and comments in the Apple AppStore or Google Play from users who have hit a bug in a payment service tend to be highly emotional.

Loss of reputation can easily turn into substantial financial loss. Once a company starts charging users, it is legally required to respect their rights as consumers. There are three ways a company can lose money because of bugs in paid services.

Refunds. Say one of your users discovers a bug in a service you are charging them for. They report it to your Support team, which investigates and confirms that it is indeed a bug. In such a case, the company will likely initiate a refund, which reduces income. This is the least harmful way for the company to lose money.

Chargebacks. Now suppose that, instead of contacting your Support team, the user goes directly to the bank, card issuer or payment provider that processed the charge. The danger here is greater than a simple loss of revenue: after a few chargebacks, the company can be fined and suddenly have its credit rating downgraded. This, in turn, may lead the payment provider to raise its service fees.

Claims. In the worst case, the company may face legal claims with potentially severe consequences. Read about such a claim here.

2. Testing of integrations requires in-depth knowledge

Teams starting an integration with payment providers typically face the same set of problems. Without knowing all the possible billing scenarios, they risk missing important nuances when designing how the app should respond to notifications from payment providers, and the results can be unpredictable. The possible outcomes of payment processing are shown in the figure below.

There are three basic types of case here: error, success and money return. But each type has details that your app needs to handle differently.

Payment processing errors are either minor or critical. For example, a minor error would be a notification about insufficient balance, while a critical error is when the payment method is blocked. In the first case, you can ask the user to pay later, but in the second case you need to establish why the payment was blocked. It might be because the transaction was flagged as fraudulent, in which case you should be cautious about this user's other activities.

Money returns (as described above) can be initiated either by the app owner (refunds) or by the card issuer/provider (chargebacks). It makes sense to be cautious about a user's activity after a chargeback, since chargebacks are a very common means of fraud.
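The error and money-return cases above boil down to a mapping from provider outcomes to app reactions. Here is a minimal Python sketch of such a mapping. The outcome and reaction names are hypothetical, because every real provider uses its own codes, and the reactions shown are one possible policy rather than ours.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical, provider-neutral outcome categories."""
    SUCCESS = "success"
    INSUFFICIENT_BALANCE = "insufficient_balance"  # minor error
    METHOD_BLOCKED = "method_blocked"              # critical error
    REFUND = "refund"                              # money return initiated by the app owner
    CHARGEBACK = "chargeback"                      # money return initiated by issuer/provider


class Reaction(Enum):
    GRANT_SERVICE = "grant_service"
    ASK_TO_PAY_LATER = "ask_to_pay_later"
    INVESTIGATE_AND_WATCH_USER = "investigate_and_watch_user"
    REVOKE_SERVICE = "revoke_service"
    REVOKE_AND_WATCH_USER = "revoke_and_watch_user"


# Every outcome gets an explicit reaction, decided up front.
REACTIONS = {
    Outcome.SUCCESS: Reaction.GRANT_SERVICE,
    Outcome.INSUFFICIENT_BALANCE: Reaction.ASK_TO_PAY_LATER,
    # A blocked method may mean the transaction was flagged as fraudulent.
    Outcome.METHOD_BLOCKED: Reaction.INVESTIGATE_AND_WATCH_USER,
    Outcome.REFUND: Reaction.REVOKE_SERVICE,
    # Chargebacks are a common means of fraud, so the user is watched as well.
    Outcome.CHARGEBACK: Reaction.REVOKE_AND_WATCH_USER,
}

# Fail fast if a new outcome is added without deciding how to handle it.
missing = set(Outcome) - set(REACTIONS)
assert not missing, f"unclassified provider outcomes: {missing}"


def react(outcome: Outcome) -> Reaction:
    """Look up the app's reaction to a provider outcome."""
    return REACTIONS[outcome]
```

The module-level check means that an unclassified outcome stops the code from loading at all. Otherwise it would surface later, in production.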

Successful payments have their nuances as well. They can be divided into one-off payments and subscriptions. One-off payments are either consumable or non-consumable. For example, Badoo Credits (see above) are a consumable purchase, while superpowers for your game avatar that stay active for a certain time period are a non-consumable one. Subscriptions, on the other hand, comprise events such as start, renewal and cancellation. They may also include a trial period or a grace period, and may involve partial billing. Depending on the integration, subscriptions are either internally managed or externally managed.
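As an illustration of the two kinds of one-off purchase, here is a minimal Python sketch of how each could be granted. It assumes an in-memory `Account`. All the names are hypothetical, and the time-limited unlock follows the avatar example above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Account:
    credits: int = 0
    # feature name -> moment the time-limited unlock expires
    unlocks: dict = field(default_factory=dict)


def grant_consumable(account: Account, credits: int) -> None:
    """Consumable (e.g. credits): added to a balance and spent later."""
    account.credits += credits


def grant_time_limited(account: Account, feature: str, now: datetime, duration: timedelta) -> None:
    """Non-consumable, as in the avatar example: the feature is active for a fixed period."""
    account.unlocks[feature] = now + duration


def is_active(account: Account, feature: str, now: datetime) -> bool:
    """True while the unlock exists and has not yet expired."""
    expiry = account.unlocks.get(feature)
    return expiry is not None and now < expiry
```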

Internally managed subscriptions are those provided by credit card payment providers or PayPal. After the initial payment, your system receives a token that allows you to make further charges without asking the user for their payment details again.

Externally managed subscriptions are provided by the Apple AppStore, Google Wallet and numerous mobile providers. The provider charges the user itself and only notifies your app about the purchase or the subscription status.
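The difference between the two models comes down to who initiates each charge. Here is a minimal Python sketch of that difference. The class names are hypothetical, and `charge` stands in for whatever call bills a stored token.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InternallyManagedSubscription:
    """Card/PayPal style: we keep a token and initiate every renewal ourselves."""
    token: str
    active: bool = True

    def renew(self, charge: Callable[[str], bool]) -> bool:
        # `charge` is a hypothetical call that bills the stored token
        # without asking the user for payment details again.
        self.active = charge(self.token)
        return self.active


@dataclass
class ExternallyManagedSubscription:
    """AppStore style: the provider bills the user; we only mirror the status it reports."""
    status: str = "active"

    def on_provider_notification(self, new_status: str) -> None:
        # We never initiate a charge here; we just record what the provider tells us.
        self.status = new_status
```

With the internal model, our own code decides when to charge and can test that path directly. With the external model, our only input is the provider's status, which is why the rest of the process is harder to test.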

In the figure below, the purple elements are the most obvious cases, which many systems implement first. The other cases are often only considered much later because iterative development is applied incorrectly to billing, and this can cause problems.

Bugs in paid services need to be identified before release because they can do a company very serious damage. If a bug in a paid service slips through, it immediately attracts attention. It is essential to highlight, investigate and fix it and, most importantly, to contact the affected users and reassure them. This last point is very often forgotten. However, testing paid services is a much more complex engineering task than testing other features, because it involves interacting with a system that has many unknown variables.

Technical issues in billing testing

Let’s consider the issues surrounding the integration of Badoo with the Apple AppStore. Because AppStore subscriptions are externally managed entirely on the provider side, the app can only request or receive their current state. We chose this provider deliberately: for us, it has the most complex type of subscription and covers the full variety of cases found in integrations with payment providers. Let’s look at a one-off payment first.

Step 1. The user requests a service; the app finds that the user doesn't have enough credits and must pay.

Step 2. The payment provider flow starts.

Step 3. The provider shows the payment form to the user.

Step 4. The user enters their payment details.

Step 5. The provider completes the transaction and issues a receipt containing purchase information (e.g. date, service, status).

Step 6. The app enriches the receipt and sends it to the server.

Step 7. The server processes the receipt, generates a push notification and sends it to the app.

Step 8. The app shows the push notification to the user.

The testing problem here is that steps 3, 4 and 5 are all executed on the payment provider's side, so our tests cannot check them directly. These steps can end in many different ways, so the process is not as linear as the figure suggests. Instead, we need to consider the different branches, each of which can trigger a different reaction from our app.

Figure 4. Branching of one-off payment processes.
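Here is a minimal Python sketch of the one-off flow, with the provider treated as a black box. It assumes a simplified `Receipt` with a free-text status, and all the names are hypothetical. The provider is passed in as a function, so a test can swap in different branches.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Receipt:
    """Step 5: what the provider hands back (fields are hypothetical)."""
    product_id: str
    status: str  # e.g. "purchased", "cancelled", "failed"


def server_process(receipt: Receipt, user_id: str) -> str:
    """Step 7: the server processes the receipt and builds the push text."""
    if receipt.status == "purchased":
        return f"{user_id}: {receipt.product_id} activated"
    if receipt.status == "cancelled":
        return f"{user_id}: purchase cancelled"
    # Any other branch is treated as a failure here; a real system needs finer handling.
    return f"{user_id}: payment failed, please try again"


def buy(user_id: str, product_id: str, provider: Callable[[str], Receipt]) -> str:
    """Steps 1-8 of a one-off payment with the provider as a black box."""
    # Steps 2-5 happen inside `provider`; our tests cannot see what it does.
    receipt = provider(product_id)
    # Step 6: a real app would enrich the receipt (user, device, ...) before sending it.
    push = server_process(receipt, user_id)
    # Step 8: the app shows the push to the user; here we just return it.
    return push
```

For example, `buy("u1", "credits_100", lambda p: Receipt(p, "cancelled"))` exercises the cancellation branch without touching a real provider. Making the provider replaceable like this is what later allows the external dependency to be reduced.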

The initial payment for a subscription is the same as a one-off payment, but subsequent processing is different.

Figure 5. Processing an externally managed subscription following a one-off payment

Step 9. AppStore subscriptions are externally managed, which means the user can manage them asynchronously: cancel a subscription, change its period or request a chargeback.

Step 10 shows that AppStore can change the subscription status: renew it, close it, or put it into a grace period.

Step 11 is specific to providers such as AppStore and Google Wallet: our app sends a token (the receipt received for the previous payment) to the provider's server.

Step 12. The provider validates the token and, if it is valid, responds with the current subscription state. The outcome of this step depends on steps 9 and 10.

Step 13 is common to all payment providers: the provider sends a notification whenever the subscription state changes. This step makes steps 11 and 12 unnecessary, but Apple only implemented it in autumn 2018, so we still need to support steps 11 and 12 for backward compatibility.

Step 14. The server sends a push notification based on the current subscription state.

The diagram below shows all the states.

Orange blocks mark the process elements that our tests cannot check and that contain many unknown variables.
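Steps 9 to 14 can be read as a state machine that the server keeps in sync with the provider. Here is a minimal Python sketch. The state and event names are hypothetical and cover only the transitions named above. Any unmapped combination raises an error instead of being silently ignored.

```python
from enum import Enum


class SubState(Enum):
    ACTIVE = "active"
    GRACE_PERIOD = "grace_period"
    CANCELLED = "cancelled"  # will not renew, but the paid period has not ended yet
    EXPIRED = "expired"


# Hypothetical provider events (steps 9 and 10) and the state each one leads to.
TRANSITIONS = {
    (SubState.ACTIVE, "renewed"): SubState.ACTIVE,
    (SubState.ACTIVE, "billing_failed"): SubState.GRACE_PERIOD,
    (SubState.ACTIVE, "user_cancelled"): SubState.CANCELLED,
    (SubState.GRACE_PERIOD, "renewed"): SubState.ACTIVE,
    (SubState.GRACE_PERIOD, "grace_ended"): SubState.EXPIRED,
    (SubState.CANCELLED, "period_ended"): SubState.EXPIRED,
}

# Step 14: push text for each state (wording is illustrative).
PUSH_TEXT = {
    SubState.ACTIVE: "Your subscription is active",
    SubState.GRACE_PERIOD: "Please update your payment method",
    SubState.CANCELLED: "Your subscription will not renew",
    SubState.EXPIRED: "Your subscription has ended",
}


def apply_event(state: SubState, event: str) -> SubState:
    """Steps 12-13: update our copy of the state from a provider response or notification."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        # An unmapped combination is exactly the kind of branch tests should surface.
        raise ValueError(f"unhandled event {event!r} in state {state.value!r}") from None


def push_for(state: SubState) -> str:
    """Step 14: the push text sent for the current subscription state."""
    return PUSH_TEXT[state]
```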

Paid service testing approaches

As we’ve seen, the main technical challenge in testing paid services is their interaction with systems that our tests cannot fully check and that have many unknown variables. To reduce the impact of this, we can use a combination of three methods: real payments, sandboxes, and reducing external dependencies.

Real payments

Real payment has only one advantage: it’s immediately obvious when the integration isn’t working. :) Any bug revealed during a real payment is unquestionably a real problem that real users may well encounter. But this method has no other advantages. On the contrary, it’s very expensive: you spend real money and also use up other resources, such as time (which can often be the greater cost).

It’s a mistake to think that the money spent on this kind of testing will ever be fully returned to the company: every test payment costs up to 40% of its value in service charges taken by the payment provider. Overseas test payments incur additional losses due to the currency spread, because some banks convert the charge at one exchange rate and make any refund at a less favourable one. You also have to wait for subscription periods to end in order to check renewals or expiry.
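Here is a back-of-the-envelope Python sketch of that cost. It assumes that the provider keeps its fee (up to 40%, as stated above) even when the payment is refunded. The currency-spread figure passed in is a hypothetical input, not a measured value.

```python
def real_payment_test_loss(price: float, provider_fee: float, fx_spread: float = 0.0) -> float:
    """Estimate the money lost on one real test payment that is later refunded.

    provider_fee: share of the price kept by the provider (the text cites up to 0.40).
    fx_spread:    assumed share lost when a foreign-currency charge is refunded at a
                  less favourable exchange rate (0.0 for same-currency payments).
    """
    for name, value in (("provider_fee", provider_fee), ("fx_spread", fx_spread)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return price * (provider_fee + fx_spread)
```

For example, 100 refunded test payments of 10 units each, with a 40% fee and an assumed 2% spread, lose about 100 × 10 × 0.42 = 420 units. That figure does not include the time spent.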

Sandboxes

Sandboxes are a great idea. They provide all the functionality of the real payment provider without the need to spend actual money, and some even let you control subscription durations. And because the provider maintains the sandbox, using it requires no monetary outlay on our side.

To make time-based services testable, sandboxes compress time in different ways. For example, the AppStore sandbox uses the following time map:

Google Wallet uses the following time map by default; it can be configured in the merchant console:

Unlike Apple’s sandbox, Google’s sandbox can also check cases such as trial periods and grace periods using time mapping.

Sandboxes close subscriptions in different ways: the AppStore sandbox closes a subscription after its 5th renewal, while Google Wallet lets you close subscriptions from the merchant console or via the Play Store.
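Here is a minimal Python sketch of how a time map and a renewal cap combine to give the length of a full sandbox lifecycle. The durations are the accelerated values Apple documented for the AppStore sandbox around 2020 and may have changed since. The five-renewal cap comes from the text above.

```python
from datetime import timedelta

# Accelerated AppStore sandbox durations (as documented by Apple around 2020;
# check the current documentation before relying on them).
APPSTORE_SANDBOX_DURATION = {
    "1 week": timedelta(minutes=3),
    "1 month": timedelta(minutes=5),
    "2 months": timedelta(minutes=10),
    "3 months": timedelta(minutes=15),
    "6 months": timedelta(minutes=30),
    "1 year": timedelta(hours=1),
}

# Per the text, the AppStore sandbox closes a subscription after its 5th renewal.
SANDBOX_MAX_RENEWALS = 5


def sandbox_lifecycle(period: str, renewals: int = SANDBOX_MAX_RENEWALS) -> timedelta:
    """Sandbox time covering the initial period plus `renewals` renewals."""
    if not 0 <= renewals <= SANDBOX_MAX_RENEWALS:
        raise ValueError("the sandbox stops renewing after the 5th renewal")
    return APPSTORE_SANDBOX_DURATION[period] * (1 + renewals)
```

On these figures, a monthly subscription runs its whole sandbox life in about 30 minutes, compared with six months of real time.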

Providers also maintain their sandboxes differently. Our experience shows that of the 70 payment providers integrated into Magiclab Apps, only two have sandboxes that are both stable and complete: Adyen and PayPal. Other providers have sandboxes that are stable but incomplete (Google Wallet) or unstable and incomplete (Apple AppStore, Fortumo). And some providers have no sandbox at all, nor any plans to build one.

Figure 7. Stability and functionality of different sandboxes

We have already seen that real payment testing is expensive and inefficient, and that it is hard to cover all scenarios using sandboxes. So let’s look at three methods of reducing external dependencies (mocks, fakes and stubs) and what they have to offer.

A billing mock is a predefined response from your own system to requests with specific parameters. For example, a request to an SMS payment provider using the number +44–1111–111–11–11 should be intercepted before it is sent and processed as a successful payment, while a request using the number +44–1111–111–11–12 should be intercepted and processed as an error response, “Insufficient balance”.
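Here is a minimal Python sketch of such a mock. It intercepts requests for the two predefined numbers before anything reaches the provider. The numbers are written with ASCII hyphens. `send_to_provider`, the request shape and the `mocks_enabled` switch are all hypothetical.

```python
from dataclasses import dataclass

# Predefined test numbers from the text and the response each one mocks.
MOCKED_NUMBERS = {
    "+44-1111-111-11-11": {"status": "success"},
    "+44-1111-111-11-12": {"status": "error", "message": "Insufficient balance"},
}


@dataclass
class SmsPaymentRequest:
    phone: str
    amount: int  # in minor units, e.g. pence


def send_to_provider(request: SmsPaymentRequest) -> dict:
    """Placeholder for the real call to the SMS payment provider."""
    raise RuntimeError("the real provider is not reachable in this sketch")


def charge(request: SmsPaymentRequest, mocks_enabled: bool) -> dict:
    # The mock intercepts the request before it leaves our system.
    if mocks_enabled and request.phone in MOCKED_NUMBERS:
        return dict(MOCKED_NUMBERS[request.phone])  # copy so callers can't alter the table
    return send_to_provider(request)
```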

A fake imitates notifications so that they appear to have been generated by the real provider. An integration with a provider involves a limited set of notification types. Knowing these types, we can generate notifications that our system will treat as if they came from the real provider.
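Here is a minimal Python sketch of a fake notification generator. It assumes a JSON notification body. The notification type names and fields are hypothetical and would have to match the real provider's format exactly for the entry point to accept them.

```python
import json
import uuid

# The limited set of notification types our integration understands (hypothetical names).
NOTIFICATION_TYPES = {"INITIAL_BUY", "DID_RENEW", "CANCEL", "REFUND", "DID_FAIL_TO_RENEW"}


def make_fake_notification(notification_type: str, user_id: str, product_id: str) -> str:
    """Build a JSON body shaped like the provider's real notifications."""
    if notification_type not in NOTIFICATION_TYPES:
        raise ValueError(f"unknown notification type: {notification_type}")
    return json.dumps({
        "notification_type": notification_type,
        "transaction_id": str(uuid.uuid4()),  # fresh id so it is not treated as a duplicate
        "user_id": user_id,
        "product_id": product_id,
    })
```

Restricting the generator to the known types keeps the fakes within the set of signals the real provider can actually send.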

A stub redirects the user to a screen with a set of possible reactions from our system instead of sending a request to the provider. This page should present all possible reactions of our system, and the user chooses one of them instead of a request being sent.
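Here is a minimal Python sketch of a stub. It renders one button per possible reaction in place of the provider's payment form, then turns the tester's choice into an ordinary result. The reaction list, the `/billing-stub/` path and the result shape are hypothetical.

```python
import html

# Hypothetical list of every reaction our system supports for a purchase.
STUB_REACTIONS = ["success", "insufficient_balance", "payment_method_blocked", "user_cancelled"]


def render_stub_page(product_id: str) -> str:
    """HTML shown instead of the provider's payment form: one button per reaction."""
    buttons = "\n".join(
        f'  <button name="reaction" value="{r}">{r}</button>' for r in STUB_REACTIONS
    )
    action = html.escape(f"/billing-stub/{product_id}", quote=True)
    return f'<form method="post" action="{action}">\n{buttons}\n</form>'


def handle_stub_choice(product_id: str, reaction: str) -> dict:
    """Turn the tester's choice into the same result shape a provider call would produce."""
    if reaction not in STUB_REACTIONS:
        raise ValueError(f"unsupported reaction: {reaction}")
    return {"product_id": product_id, "status": reaction}
```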

Although all three methods avoid real charges, they can’t be called cheap, because they require analytics (creating a map of all possible stages of our integration), development (changing the system code) and maintenance. And since they only model real payments, mocks, fakes and stubs have clear limitations that need to be taken into account when taking this route.

Let’s return for a moment to Figure 3, which shows a one-off payment. Steps 3, 4 and 5 are the key integration steps. When implementing mocks, fakes or stubs, we target one of these three steps: the request to the provider, processing by the provider, or the provider’s response.

Mocks and stubs model the ‘send request’ step, while fakes model the ‘response’ step. The other steps are cut out.

Figure 11. How mocks, fakes and stubs can be depicted on payment diagrams.

Cutting steps out of the model creates risks (e.g. missing a bug in a cut step), while modelling every step makes each method more expensive. So, in practice, we use a combination of methods. With mocks + fakes, a request with a predefined number is intercepted, and the system sends a fake notification of successful payment to its own entry point. With stubs + fakes, choosing a reaction on the stub sends a fake notification to the entry point that triggers the desired reaction in our system. For security reasons, we recommend enabling both combinations only in the development environment and never letting them reach production.
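Here is a minimal Python sketch of the mocks + fakes combination with a development-only guard. `process_notification` is modelled loosely on the `processNotification` method mentioned in the samples below, and the `APP_ENV` variable and test number are hypothetical. The important part is the guard, which keeps the hooks inert outside development.

```python
import os

FAKE_SUCCESS_NUMBER = "+44-1111-111-11-11"  # hypothetical predefined test number
processed = []  # stands in for the side effects of the real business logic


def process_notification(notification: dict) -> None:
    """Our real entry point for provider notifications, left unchanged."""
    processed.append(notification)


def billing_test_hooks_enabled() -> bool:
    # Hypothetical environment flag: the hooks must never be active in production.
    return os.environ.get("APP_ENV") == "development"


def start_sms_payment(phone: str, product_id: str) -> str:
    if billing_test_hooks_enabled() and phone == FAKE_SUCCESS_NUMBER:
        # Mock: capture the request instead of sending it to the provider...
        # ...then fake: feed a success notification into our own entry point.
        process_notification({"type": "payment_success", "phone": phone, "product_id": product_id})
        return "faked"
    return "sent_to_provider"  # the real provider call would go here
```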

Let’s illustrate the idea of mocks, fakes and stubs with some PHP, JS and HTML samples. These samples are not working code; they simply illustrate the idea.

Let’s assume that our system receives the following notification at its entry point:

The server code that processes these notifications comprises the following methods.

In this case, a mock means modifying the processNotification method.

A fake means adding a method to the server code that generates fake notifications. To check the system’s actions, you generate a test notification using this method.

A stub means replacing the paywall with a custom form.

None of the methods described is universal. Deciding when to use each method involves three criteria:

Case coverage: how easily all possible cases can be reproduced using the method;
E2e suitability: how applicable the method is for end-to-end checks;
Cheapness: the full cost of the method, including development time and maintenance.

Real payments can cover only a limited number of cases in a reasonable time: checking an annual subscription takes a year. However, this is the only method that allows true e2e checks. It is also quite expensive, because it involves spending real money.

Sandboxes vary in the coverage they provide, but the average sandbox doesn’t let us cover all cases. Sandboxes model the full e2e flow, but their test results cannot be fully trusted because providers run different code in production and in the sandbox. In terms of cost, sandboxes are the cheapest method because we do not maintain them; the provider does.

Fakes, mocks and stubs are the most flexible methods and can cover all cases, but they are useless for e2e testing. Nor are they cheap, because we have to write and maintain the code.

Choosing the most appropriate method

Let’s consider a classic testing pyramid. Its base consists of a large number of small, cheap cases, and besides being cheap, these cases need to cover all the cases in our system. At the top of the pyramid, coverage is less essential, but this is where we need to be able to check e2e cases.

Thus, we have the following:

the base of the pyramid: fakes, mocks, stubs
the top of the pyramid: sandboxes and real payments

Figure 12. Match of testing stage and testing methods on the testing pyramid

Anti-patterns in choosing methods

What can happen if the ratio of tests does not match the pyramid described above? Here are some anti-patterns we have encountered in payment testing at Bumble.

Real payments at the base of the pyramid

We set up a real debit card for testing purposes and topped it up regularly. Only a few people were given the card details. However, the card issuer registered thousands of transactions from this card within a minute. Naturally, they immediately blacklisted and blocked the card. It turned out that one of our QA engineers had used the card details in autotests for some basic cases. Consequently, we were left without automated or manual testing for some time, because the card had been blocked.

Sandboxes at the top and base of the pyramid

The first problem we faced due to our dependence on sandboxes was their instability. For example, we had two incidents in which the AppStore sandbox didn’t work for two weeks. As a result, we couldn’t test any AppStore payments, because at that time all our testing relied solely on the sandbox.

The second issue is the limited scope of sandboxes. Certain cases, such as partial billing, grace periods or refunds, can’t be checked in most sandboxes. As a result, functionality is only partially covered unless you also use other methods.

Using sandboxes in tests at the base of the pyramid also revealed infrastructure problems. For example, in the AppStore the receipt stores the entire purchase history of an account within one application. For one user alone, the receipt grew to almost 1 GB! Naturally, not every test system could process such large receipts.

External dependency reduction at the top of the pyramid

For one mobile provider, we used a combination of mocks and fakes without checking real payments. As a result, when one of the operators served by this provider changed its notification format, our tests returned false-positive results. To reproduce a real payment, we needed a SIM card from a country on another continent, and of course we couldn’t get one in a reasonable time to check the payment.

To cover E2E cases in such instances, it’s essential to evaluate risks and analyse real notifications from payment providers, but this is the subject of another article.

Conclusion

Paid services should be tested especially carefully, because even minor bugs can have unpredictable consequences.
When working on a provider integration, start by building a map of all the provider’s possible responses. You can then iterate, making your system’s reactions more sophisticated, but correctly classifying all the possible signals from the provider is essential.
A payment provider will always be a system with many unknown variables, so diversify your testing methods. The best practice is a combination: mocks, stubs and fakes to cover all functionality with small, cheap cases, and sandboxes and real payments to check e2e.
If you use methods of reducing external dependencies (fakes, mocks, stubs), be aware that they are only models: they involve approximations and carry risks. These risks should be quantified and mitigated with real payments or other checks.

Thank you!