Jun 1, 2020 by Alexandru Mocanu
Managing Service Contracts in Microservice Architectures: Part II
Handling Service Contract Changes
In the first part of this series, we looked at various service contracts from a static perspective, without concerning ourselves with why, when, or how they change. However, as business requirements evolve, so do service contracts. Frequently, a single business requirement translates into contract changes across multiple services. It is important to ensure that these changes are implemented and deployed in the “right” order: one that minimizes the overall downtime of the system while maintaining data consistency and the independent deployability of services.
The first thing to keep in mind is that not all changes are created equal; to develop a solid change handling strategy, it is important to distinguish between breaking and non-breaking changes. Semantic versioning, in particular, is a great tool for this job.
Semantic Versioning and Breaking Changes
Semantic versioning is a simple set of rules for constructing version numbers. A semantic version expresses the impact of changes in a succinct yet easy-to-grasp manner.
In its most basic form, a semantic version consists of three numbers separated by dots. For example, in version 1.2.3:
- The major version (1) must be incremented when releasing changes that are not backwards compatible (breaking changes);
- The minor version (2) must be incremented when releasing backwards compatible features;
- The patch version (3) must be incremented when releasing backwards compatible bug fixes.
In microservice architectures, each service is versioned independently, using all three version numbers, as follows:
- Always increment the major version of the service when there are breaking changes in the service contract.
- Always increment the minor/patch version of the service when delivering new features and bug fixes that don’t break the existing service contract.
On the other hand, a service contract should be versioned using only the major version of the owner service. This is because, from the point of view of the contract consumers, only breaking changes produce an impact, while non-breaking changes can be safely “ignored.”
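The versioning rules above can be sketched in a few lines of Python. This is only an illustration: the `SemVer` class, its method names and the change labels (`"breaking"`, `"feature"`, `"fix"`) are assumptions made for this example. Resetting the lower-order numbers to zero on a bump follows the Semantic Versioning specification rather than anything stated in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemVer:
    """A basic three-part semantic version (hypothetical helper)."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "SemVer":
        # Expects exactly three dot-separated integers, e.g. "1.2.3".
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump(self, change: str) -> "SemVer":
        # Breaking change -> major; new feature -> minor; bug fix -> patch.
        if change == "breaking":
            return SemVer(self.major + 1, 0, 0)
        if change == "feature":
            return SemVer(self.major, self.minor + 1, 0)
        if change == "fix":
            return SemVer(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown change type: {change}")

    def contract_version(self) -> str:
        # The contract is versioned by the major number only.
        return f"v{self.major}"

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


service = SemVer.parse("1.2.3")
print(service.bump("fix"), service.bump("fix").contract_version())
print(service.bump("breaking"), service.bump("breaking").contract_version())
```

Note that only the `"breaking"` bump changes the value returned by `contract_version()`, which is why consumers can ignore minor and patch releases.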
This is not to say that the owner of a service contract is excused from documenting and communicating even minor and patch changes to all interested parties. Here are some ways to accomplish this:
- Involve the API consumer teams in the design decisions (API-First approach);
- Provide API documentation in a standardized format (e.g. OpenAPI/Swagger for REST APIs, JSON Schema for Command/Event APIs);
- Provide a sandbox for consumer teams to try out the updated API before release;
- Provide a changelog when the new version is released.
Another thing to note is that the contract version must be, in one way or another, part of the contract itself:
- REST APIs should include their version in the URI path.
- Command APIs should make their version part of the queue identifier.
- Event APIs should publish the API version in a dedicated field inside the message header (envelope).
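As a rough sketch, the three conventions above might look like this. The path prefix, the queue naming scheme and the `apiVersion` header field are all hypothetical choices made for this example; the article only requires that the version appear in the URI path, the queue identifier and the message header, respectively.

```python
import json


def rest_path(major: int, resource: str) -> str:
    # REST API: the contract version is part of the URI path.
    return f"/api/v{major}/{resource}"


def command_queue(service: str, major: int) -> str:
    # Command API: the contract version is part of the queue identifier.
    return f"{service}.commands.v{major}"


def event_envelope(event_type: str, major: int, payload: dict) -> str:
    # Event API: the contract version travels in a dedicated header field.
    envelope = {
        "header": {"type": event_type, "apiVersion": major},
        "payload": payload,
    }
    return json.dumps(envelope)


print(rest_path(1, "users"))
print(command_queue("email-sender", 2))
print(event_envelope("BookingCreated", 2, {"bookingId": 7}))
```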
We have already defined breaking changes as “changes that are not backwards compatible.” For service contracts, this could either mean causing errors in the consumers’ code or changing an expected behavior. Because of their disruptive potential, you should avoid breaking changes altogether if possible. When they become unavoidable, plan ahead and group several breaking changes together in the same release, to minimize overhead.
Here are some examples of breaking changes:
- Removing/renaming a REST endpoint (it breaks the consumers)
- Adding a mandatory field to a command (it breaks expected behavior; in the absence of this field, the command is not executed)
- Removing/renaming an event field (it breaks the consumers)
On the other hand, the following are not breaking changes:
- Adding a REST endpoint
- Adding a new command type to a Command API or a new event type to an Event API
- Adding new fields to REST responses or to existing events
Breaking changes are not always as easy to identify as in the examples above. Mistaking a breaking change for a non-breaking one can be particularly disruptive, so you should always consider each change carefully before classifying it as one or the other.
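For message fields, the examples above can be turned into a simple first-pass check. The sketch below is a simplification under stated assumptions: a schema is modeled as a mapping from field name to a "required" flag, and the `direction` parameter (an invention of this example) says whether the contract owner consumes the message (`"inbound"`, e.g. commands) or produces it (`"outbound"`, e.g. events and REST responses). It also treats an optional inbound field becoming mandatory as breaking, which goes slightly beyond the examples listed above. It does not detect behavioral changes, so it cannot replace a careful manual review.

```python
from typing import Mapping


def is_breaking(old: Mapping[str, bool], new: Mapping[str, bool], direction: str) -> bool:
    """Classify a field-level schema change (hypothetical helper).

    old/new map field name -> True if the field is mandatory.
    A rename shows up as one removed field plus one added field.
    """
    if direction == "outbound":
        # Consumers may rely on any field the owner used to send.
        removed = old.keys() - new.keys()
        return bool(removed)
    if direction == "inbound":
        # Existing producers do not send newly mandatory fields.
        newly_required = [
            name for name, required in new.items()
            if required and not old.get(name, False)
        ]
        return bool(newly_required)
    raise ValueError(f"unknown direction: {direction}")


# Adding a mandatory field to a command: breaking.
print(is_breaking({"to": True}, {"to": True, "replyTo": True}, "inbound"))
# Adding a field to an event: not breaking.
print(is_breaking({"id": True}, {"id": True, "price": True}, "outbound"))
```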
Handling Breaking Changes
Minor and patch releases have no impact on consumers, so deploying them is just a matter of replacing the old service version with the new one. On the other hand, deploying major releases requires a solid strategy in order to eliminate potential issues. It becomes even more challenging when a service exposes multiple contract types (e.g. REST+Event, REST+Command).
Even though there isn’t a single, one-size-fits-all “recipe,” it’s always good to keep in mind a few general principles:
- Services must remain independently deployable—this is especially important when they are developed by different teams. Consumer services should be able to switch to the new contract version as they see fit and, as they do so, the stability of the system must not be impacted.
- Deploying a new service version must not impact data consistency. One way to achieve this is to have a single source of data for all deployed versions of the same service.
- When deploying a new service contract, the direction of the data flow should always be considered. For example, sending events of a new type before subscribers are prepared to process them can lead to data loss.
Armed with this knowledge, let us now take a look at some breaking change scenarios for the most commonly encountered service types.
REST Provider Service
To illustrate this scenario, we will use the User service from our previous booking platform example. The User service is a REST-only provider with three clients: the Messaging service, the User Alerts service and a front-end application.
Let’s assume an initial state where User v1.2.3 provides REST API v1. At this point, a new major version (v2) of the User service contract is needed.
User v2 is released, but, unlike a minor or patch version, it cannot replace User v1 directly. Instead, User v2 must be deployed alongside User v1—having the two versions run in parallel for a while gives clients the chance to migrate to v2 gradually. To ensure that all clients migrate in a timely fashion, a cutoff date can be negotiated by all impacted teams.
Now that User v2 is up and running, clients can migrate to the new contract one by one, with virtually no downtime for the overall system.
There is a period of time when some clients (User Alerts and front-end) have already migrated to v2, while others (Messaging) continue to use v1. To maintain data consistency, both User service versions should employ the same data sources. Otherwise, data created by a client via the v1 contract would not be available to clients using the v2 contract and vice versa.
Also note that, during this phase, User v1 may still receive patches (v1.2.4, v1.2.5, etc.).
Step 4: Decommission the old version
After all clients have migrated to REST API v2, User v1 may be safely decommissioned.
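The shared data source requirement can be illustrated with a toy example. Everything specific here is invented for illustration: the in-memory `USERS` dictionary stands in for a single database, and the breaking change (v1 exposes one `name` field, v2 splits it into `firstName` and `lastName`) is a hypothetical one, not taken from the booking platform.

```python
# One store shared by both service versions; a stand-in for a single database.
USERS = {}


def create_user_v1(user_id, name):
    """v1 contract (hypothetical): the client sends a single 'name' field."""
    first_name, _, last_name = name.partition(" ")
    USERS[user_id] = {"first_name": first_name, "last_name": last_name}


def get_user_v1(user_id):
    record = USERS[user_id]
    full_name = f"{record['first_name']} {record['last_name']}".strip()
    return {"id": user_id, "name": full_name}


def create_user_v2(user_id, first_name, last_name):
    """v2 contract (hypothetical): the name is split into two fields."""
    USERS[user_id] = {"first_name": first_name, "last_name": last_name}


def get_user_v2(user_id):
    record = USERS[user_id]
    return {
        "id": user_id,
        "firstName": record["first_name"],
        "lastName": record["last_name"],
    }


# A client still on v1 creates a user; a client already on v2 can read it.
create_user_v1(1, "Ada Lovelace")
print(get_user_v2(1))
# The reverse direction works as well.
create_user_v2(2, "Alan", "Turing")
print(get_user_v1(2))
```

Because both versions translate to and from the same stored representation, data written through one contract stays visible through the other for the whole migration period.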
Command Processor Service
The handling of breaking changes in a command processor service follows the same steps as the REST provider case, but it is still worth a quick look as it’s a frequently encountered scenario.
We will use the Email Sender service from the booking platform example. Email Sender receives commands from two producers, the Invoice service and the User Notification service.
Step 1: Plan and implement a new major version
Let’s assume an initial state where Email Sender v1.2.3 provides Command API v1. At this point, a new major version (v2) of the Email Sender service contract becomes necessary.
Email Sender v2 is deployed alongside Email Sender v1 to give producers the chance to migrate to v2 gradually. Note that the new version has its own inbound queue.
With Email Sender v2 up and running, producers migrate to the new version one by one. During this phase, Email Sender v1 may still receive patches (v1.2.4, v1.2.5, etc.).
After all producers have migrated to Command API v2, Email Sender v1 may be safely decommissioned.
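The migration of producers between the two inbound queues can be modeled roughly as follows. The in-memory deques stand in for real message queues, and the function names are hypothetical. The extra condition that the v1 queue must be empty before decommissioning is an assumption of this sketch, added so that no pending v1 command is lost; the article does not state it explicitly.

```python
from collections import deque

# One inbound queue per contract version (in-memory stand-ins for real queues).
QUEUES = {1: deque(), 2: deque()}

# Contract version currently used by each producer.
PRODUCER_VERSION = {"invoice": 1, "user-notification": 1}


def send(producer, command):
    # Each producer writes to the queue matching the contract it uses.
    QUEUES[PRODUCER_VERSION[producer]].append(command)


def migrate(producer):
    # The producer switches to Command API v2.
    PRODUCER_VERSION[producer] = 2


def can_decommission_v1():
    # Safe only when nobody produces v1 commands and none are left unprocessed.
    all_migrated = all(version == 2 for version in PRODUCER_VERSION.values())
    return all_migrated and not QUEUES[1]


send("invoice", {"to": "a@example.com"})   # lands in the v1 queue
migrate("invoice")
migrate("user-notification")
print(can_decommission_v1())               # a v1 command is still pending
QUEUES[1].popleft()                        # Email Sender v1 processes it
print(can_decommission_v1())
```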
REST Provider + Event Publisher Service
A more interesting breaking change scenario is when the service provides both a REST API and an Event API. (The scenario for Command API + Event API is similar.)
We will use the Booking service from the booking platform example. This service has two REST clients (front-end applications) and two Event API subscribers (the User Alerts service and the User Notification service).
Booking v1.2.3 currently provides REST API v1 and Event API v1. A new major version (v2) of the Booking service contract is needed.
In the previous scenarios, the new service version could be deployed as soon as it was ready. This time, the contract must be released before the service is deployed. Let’s see why. (By “release the contract,” we mean making the contract specification known to all consumers.)
As we saw in the REST Provider scenario, REST clients should be allowed to migrate to the new contract version on their own terms. In this case, however, as soon as Booking v2 starts receiving v2 requests, it will also start generating v2 events. To avoid data loss and subscriber errors, v2 events must not be emitted until all subscribers are ready to process them, and v1 events must not be ignored while Booking v1 is still emitting them. By delaying the v2 service deployment, we give event subscribers the chance to update their code.
The new version of the Booking service will run alongside the old one. Both of them will use the same event topic. User Alerts and User Notification are going to receive both v1 and v2 events while the migration of the REST clients is taking place, so they will need to handle both v1 and v2 events at the same time. Since this is the case, there is no particular advantage in having a new event topic for v2.
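A subscriber that handles both event versions on the same topic could dispatch on the version field in the message header, roughly as sketched below. The envelope layout, the `apiVersion` field name and both payload shapes are assumptions made for this example.

```python
def handle_v1(payload):
    # Hypothetical v1 payload: flat fields.
    return f"booking {payload['bookingId']} is {payload['status']}"


def handle_v2(payload):
    # Hypothetical v2 payload: fields nested under 'booking', 'status' renamed.
    booking = payload["booking"]
    return f"booking {booking['id']} is {booking['state']}"


# One handler per supported contract version.
HANDLERS = {1: handle_v1, 2: handle_v2}


def on_event(event):
    """Route an event from the shared topic to the handler for its version."""
    version = event["header"]["apiVersion"]
    handler = HANDLERS.get(version)
    if handler is None:
        # Fail loudly instead of silently dropping data.
        raise ValueError(f"unsupported event API version: {version}")
    return handler(event["payload"])


v1_event = {"header": {"apiVersion": 1}, "payload": {"bookingId": 7, "status": "confirmed"}}
v2_event = {"header": {"apiVersion": 2}, "payload": {"booking": {"id": 8, "state": "confirmed"}}}
print(on_event(v1_event))
print(on_event(v2_event))
```

Once the old Booking version is decommissioned, removing v1 support amounts to deleting `handle_v1` and its entry in `HANDLERS`.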
As soon as all the subscribers are ready to process v2 events, Booking v2.0.0 can be deployed.
REST clients can start migrating to the new REST API version. During this phase, the Booking service will be emitting both v1 and v2 events, so all subscribers must be able to handle both types of events.
Finally, after all REST clients have migrated to the new service contract, the old Booking service version can be decommissioned. From now on, the Booking service will produce only v2 events, so User Alerts and User Notification should remove the code for handling v1 events.
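The ordering constraints of this scenario can be summarized as a small decision function. This is a sketch only: the function name, the readiness flags and the client and subscriber identifiers are hypothetical, and in practice this state would come from coordination between the teams rather than from code.

```python
def next_step(subscribers_ready, v2_deployed, clients_on_v2):
    """Return the next rollout action for the Booking v2 contract.

    subscribers_ready: subscriber name -> True once it handles v1 and v2 events
    v2_deployed:       True once Booking v2 runs alongside Booking v1
    clients_on_v2:     REST client name -> True once it uses REST API v2
    """
    if not v2_deployed:
        # v2 events must not be emitted before every subscriber can process them.
        if all(subscribers_ready.values()):
            return "deploy Booking v2 alongside v1"
        return "wait for subscribers to support v2 events"
    # v1 must keep running while any REST client still depends on it.
    if all(clients_on_v2.values()):
        return "decommission Booking v1"
    return "wait for REST clients to migrate"


print(next_step(
    {"user-alerts": True, "user-notification": False},
    False,
    {"web-app": False, "mobile-app": False},
))
```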
When designing a microservice architecture, it is important to identify the employed communication models and assign contract ownership accordingly. However, this is just the first step. To manage such a platform in the long run, you need well-defined practices for dealing with change. Applying these practices takes rigorous discipline and commitment from all service teams, but it has the benefit of reducing team interdependence and making the platform more maintainable and easier to evolve.
My thanks to Cosmin Lazăr and Andrei Mălinaș for their help in shaping the ideas presented in this article!