We are always seeking ways to improve the developer experience here at Rapid. In our ongoing search for greater scalability, flexibility, and performance in our development environment, we recently completed a huge undertaking with widespread benefits: making the switch from a GraphQL monolith to Apollo Federation.
Anyone who has been through this transition before knows it’s not for the faint of heart. But the outcome is worth the effort.
In this post, we will share some background information about GraphQL and Apollo Federation, how the experience played out for us, and why we’re happy we made the move.
GraphQL and Apollo Federation, explained
You’ve likely heard of GraphQL and Apollo Federation through the developer grapevine, but may not have a clear picture of what they are or how they work. Apollo Federation, in a nutshell, is an open architecture for building a supergraph: it combines multiple GraphQL services, called subgraphs, into a single API schema.
Apollo Federation provides a way to break down large, complex API schemas into smaller, more manageable services that can be developed and maintained independently. That means backend developers get flexibility and service isolation while the end user experience remains unaltered: clients continue to consume the supergraph as a single API, in the same fashion as before.
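To make the idea concrete, here is a minimal sketch of how a router can answer one query from several independently owned services. It is a toy illustration only, not Apollo's router or its composition algorithm, and every name in it (`usersSubgraph`, `reviewsSubgraph`, `resolveUser`) is hypothetical.

```typescript
// Toy illustration only: this is NOT Apollo's router or composition algorithm.
// All names (usersSubgraph, reviewsSubgraph, resolveUser) are hypothetical.

type Entity = Record<string, unknown>;

// A "subgraph" owns some fields of an entity and resolves them by key.
interface Subgraph {
  name: string;
  fields: string[];
  resolve(id: string): Entity;
}

const usersSubgraph: Subgraph = {
  name: "users",
  fields: ["name"],
  resolve: (id) => ({ name: `user-${id}` }),
};

const reviewsSubgraph: Subgraph = {
  name: "reviews",
  fields: ["reviewCount"],
  resolve: (id) => ({ reviewCount: id.length }),
};

// The "router" asks only the subgraphs that own a requested field,
// then merges the partial results into a single response.
function resolveUser(id: string, requested: string[], subgraphs: Subgraph[]): Entity {
  const result: Entity = { id };
  for (const subgraph of subgraphs) {
    const owned = subgraph.fields.filter((f) => requested.includes(f));
    if (owned.length === 0) continue; // skip subgraphs the query does not touch
    const partial = subgraph.resolve(id);
    for (const field of owned) result[field] = partial[field];
  }
  return result;
}

const user = resolveUser("42", ["name", "reviewCount"], [usersSubgraph, reviewsSubgraph]);
console.log(user);
```

The caller sees one response object and never needs to know which service supplied which field, which is the property that lets teams own their services separately.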
GraphQL itself is a query language for APIs, developed by Facebook in 2012 and open source since 2015, that allows clients to request exactly the data they need and nothing more. As a developer using GraphQL, you can easily:
- Describe your data
- Ask for what you want
- Reduce the number of calls
- Receive predictable data
So, if you’re using the GraphQL query language for APIs, moving to Apollo Federation is a way to simplify the workflow when dealing with complex APIs.
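As a rough illustration of "ask for what you want" and "receive predictable data", the sketch below projects a record down to the fields a client requested. It is a toy field projection, not a GraphQL executor, and the `Api` record and its fields are made up for the example.

```typescript
// Illustration only: a toy field projection, not a GraphQL executor.
// The Api record and its fields are invented for this example.

interface Api {
  id: string;
  name: string;
  category: string;
  subscribers: number;
}

const api: Api = { id: "a1", name: "Weather", category: "Data", subscribers: 1200 };

// Return only the requested fields, the way a GraphQL selection set
// limits the response to what the client asked for.
function select<T extends object, K extends keyof T>(source: T, fields: K[]): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const field of fields) out[field] = source[field];
  return out;
}

// The response shape mirrors the request: two fields asked for, two returned.
const response = select(api, ["name", "subscribers"]);
console.log(response);
```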
Why did we decide to use Apollo Federation?
Rapid’s previous GraphQL instance was problematic for several reasons. We had one codebase shared by multiple teams all working together to develop features, and that same codebase was built into the single artifact that we deployed to our environments.
With 30 or so developers working simultaneously on the same code, we ended up with a lot of collisions in our deployments.
If, for example, one of our developers pushed broken code, all of the teams were blocked from making any further progress. In addition, if we were trying to deploy, say, multiple features alongside that broken code, we had to roll all of them back to the previous stable version.
And, when it came to domain ownership and standards, we had:
- No clear ownership
- Multiple teams with different coding standards
- Confusion over who is responsible for code review
Overall, our GraphQL instance was a development bottleneck for us.
By contrast, Apollo Federation encourages a design principle called “separation of concerns,” which enables different teams to work on different products and features within a single graph, without interfering with each other. We knew that Apollo Federation would enable our developers to work on independent GraphQL services, each with their own schema and data source. And by separating the codebase, our developers would be better equipped to iterate faster and experience a more seamless workflow.
Moving to Apollo Federation
Moving to Apollo Federation is a complex process, so we knew we needed to tackle it in phases.
Here are the steps we took:
- Create the Federation service and infrastructure: We incorporated new services as part of the GraphQL API configuration, which allowed us to start working with Federation. The services each have their own GraphQL schema and data source.
- Refactor the API gateway: We converted our old API gateway so that it could run either standalone or, when necessary, as a single subgraph alongside multiple microservices, all composed by the graph router. This allowed us to:
  - Break the gateway monolith into multiple services
  - Enable teams to continue working seamlessly
  - Put the Apollo Federation infrastructure in place
  - Deploy without any downtime
- Connect the gateway as a subgraph to Federation: We were able to move some of the logic-related graphs to a separate service by having the API gateway consumed as a subgraph. We also did this incrementally by using the Federation infrastructure to serve both the API gateway graph and the section of the graph that was migrated to another service.
- Release gradually via test and deploy: The first step was to use the Federation gateway in-house only so we could test the process and find and fix any bugs. Then we did a gradual release: at first, just 1% of our users were routed through Federation, then 5%, 10%, 20%, and so on, until all users were using the Federation gateway. We kept a close eye on every step, monitoring progress and addressing any issues. At the same time, we began breaking the API gateway service into subgraphs.
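A percentage-based release like the one in the last step can be sketched as follows. This post does not describe how users were actually assigned to each stage, so the hash-based bucketing here is an assumption, and the names `bucketFor` and `routeFor` are hypothetical.

```typescript
// Sketch under assumptions: the post does not say how users were bucketed.
// Hash-based bucketing and all names here are hypothetical.

// Map a user ID to a stable bucket in [0, 100) using an FNV-1a-style hash
// over the string's UTF-16 code units.
function bucketFor(userId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash % 100;
}

type Target = "federation" | "legacy-gateway";

// Users whose bucket falls below the rollout percentage go to Federation.
function routeFor(userId: string, rolloutPercent: number): Target {
  return bucketFor(userId) < rolloutPercent ? "federation" : "legacy-gateway";
}

const ids = Array.from({ length: 1000 }, (_, i) => `user-${i}`);

// Raising the percentage only ever adds users; nobody flips back to the old path.
for (const percent of [1, 5, 10, 20, 100]) {
  const onFederation = ids.filter((id) => routeFor(id, percent) === "federation").length;
  console.log(`${percent}% rollout: ${onFederation} of ${ids.length} sample users on Federation`);
}
```

Because the bucket depends only on the user ID, a given user gets the same route on every request at a given stage, which makes it easier to trace any issue back to the new gateway.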
As a big project with a lot of risk, this was a huge team effort. We made sure this core change to our development environment was handled with the utmost care, and we’re thankful that the whole team was willing to join in.
How it’s going now
Since making the switch to Apollo Federation, our development process has vastly improved. Because we broke down a large, complex API into smaller, more manageable services, our developers are able to work more effectively. Developers in one department can work on code without being hindered by what’s going on with the code in a different department. This has been especially helpful for our global workforce — developers in different time zones than our US team no longer need to wait until US-based teams are available.
That means we get greater scalability, flexibility, and performance, which in turn helps Rapid deliver features to its end users more quickly and with higher quality.
Mission accomplished.