Recently, we launched a complete revamp of the RapidAPI Hub. The launch of Hub V2 included a redesigned UI and performance improvements that make the user experience faster. One of the major changes was implementing GraphQL as our API gateway.
We spent a lot of time weighing the pros and cons of how we should implement our gateway and ultimately decided to use GraphQL. GraphQL definitely isn’t a silver bullet for all your problems, but here’s a breakdown of why we decided to use it, how we implemented it, and our results.
What is GraphQL and How Does it Compare to REST?
GraphQL is an open-source query language used to deliver data to mobile and web applications. Unlike most traditional REST APIs, a GraphQL API typically exposes a single endpoint to retrieve all the required data for your app.
Facebook developed GraphQL in 2012 and it has continued to grow in popularity since it was first introduced. The biggest strengths of GraphQL include:
- The ability to control exactly what information you receive from the server, and receive data in a predictable structure.
- The ability to request data from multiple sources in a single request.
GraphQL can replace more traditional RESTful APIs. A RESTful API architecture strongly ties the type of resource to the method used to retrieve it, so there is a separate endpoint to request each resource type. In other words, you have less control over exactly what data you are fetching because the object structure is hardcoded into the server. This forces you to over-fetch data in many situations.
For example, we might want to fetch data about a company and its founder from an API.
If we wanted to retrieve only the company name, company location, and founder name using REST, we would have to make two separate requests to retrieve all the data we need.
The first request would be to GET /company, which would retrieve all the information stored about a company. In the response of the first request, there would be a founder_id, which is then passed to the GET /founder request. Again, the second request would return all the data associated with the founder.
If the only information required is the company name, company location, and founder name, then time and resources are wasted fetching the additional data, as seen below:

    GET /company?name=RapidAPI

    {
      "company": {
        "id": 10,
        "name": "RapidAPI",
        "location": "San Francisco",
        "industry": "Software",
        "founder_id": 1
      }
    }

    GET /founder?founder_id=1

    {
      "founder": {
        "id": 1,
        "name": "Iddo Gino",
        "title": "CEO",
        "twitter": "@iddogino"
      }
    }
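To put numbers on the over-fetching, here is a small sketch in plain JavaScript. It assumes an in-memory stand-in for the two REST endpoints (the `restGet` helper and the `db` object are hypothetical, not part of any real API) and counts how many fields come back compared with the three fields we actually need.

```javascript
// Hypothetical in-memory stand-in for the REST back end described above.
const db = {
  '/company?name=RapidAPI': {
    company: { id: 10, name: 'RapidAPI', location: 'San Francisco', industry: 'Software', founder_id: 1 },
  },
  '/founder?founder_id=1': {
    founder: { id: 1, name: 'Iddo Gino', title: 'CEO', twitter: '@iddogino' },
  },
};

let roundTrips = 0;

// Simulates GET <path>: every call counts as one round trip to the server.
function restGet(path) {
  roundTrips += 1;
  return db[path];
}

// Request 1: fetch the whole company object to learn the founder_id.
const { company } = restGet('/company?name=RapidAPI');
// Request 2: fetch the whole founder object using that id.
const { founder } = restGet(`/founder?founder_id=${company.founder_id}`);

// The three fields the client actually wanted.
const needed = { name: company.name, location: company.location, founderName: founder.name };

const fieldsFetched = Object.keys(company).length + Object.keys(founder).length;
const fieldsNeeded = Object.keys(needed).length;

console.log(`${roundTrips} round trips, ${fieldsFetched} fields fetched, ${fieldsNeeded} needed`);
// prints "2 round trips, 9 fields fetched, 3 needed"
```

The second request cannot start until the first one returns, because the `founder_id` is only known from the first response.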
With GraphQL, we can simplify this quite a lot. The first step is to define the schema of our API:
    type Founder {
      id: Int
      name: String
      title: String
      twitter: String
    }

    type Company {
      id: Int
      name: String
      location: String
      industry: String
      founder: Founder
    }

The schema for a GraphQL API defines how the data, populated from back-end data stores, is formatted and nested.
Then, as part of the request to GraphQL, we can decide exactly what data our application requires from this schema. For our example, we only need the company name, company location, and founder name. This is the structure of our query and response:
    query:

    {
      company {
        name,
        location,
        founder {
          name
        }
      }
    }

    response:

    {
      "company": {
        "name": "RapidAPI",
        "location": "San Francisco",
        "founder": {
          "name": "Iddo Gino"
        }
      }
    }

As you can see, using GraphQL allows us to request data from multiple resources with one query instead of several. Since we only requested the company name, company location, and founder name, that is the only data returned. All the other data (company industry, founder twitter, etc.) is excluded since we did not specifically request it.
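The "return only what was asked for" behavior can be illustrated without any GraphQL library. The sketch below is a simplified model, not how a GraphQL server is actually implemented: `select` is a hypothetical helper that walks a selection object shaped like the query above and copies only the requested fields out of the full records.

```javascript
// Full records as the back end stores them (same data as the REST example).
const founders = { 1: { id: 1, name: 'Iddo Gino', title: 'CEO', twitter: '@iddogino' } };
const companyRecord = { id: 10, name: 'RapidAPI', location: 'San Francisco', industry: 'Software', founder_id: 1 };

// Join the founder onto the company so the data matches the Company type.
const source = { company: { ...companyRecord, founder: founders[companyRecord.founder_id] } };

// Hypothetical helper: copy only the fields named in `selection`.
// `true` means "return this scalar field"; a nested object means "recurse".
function select(data, selection) {
  const result = {};
  for (const [field, sub] of Object.entries(selection)) {
    result[field] = sub === true ? data[field] : select(data[field], sub);
  }
  return result;
}

// Mirrors the query: { company { name, location, founder { name } } }
const selection = { company: { name: true, location: true, founder: { name: true } } };
const response = select(source, selection);

console.log(JSON.stringify(response));
// prints {"company":{"name":"RapidAPI","location":"San Francisco","founder":{"name":"Iddo Gino"}}}
```

Fields such as `industry` and `twitter` exist in the source data but never reach the response, because the selection does not name them.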
The way data is requested from an API is one of the fundamental differences between REST and GraphQL.
RapidAPI’s Problem Statement
When we started the redesign of the RapidAPI Hub, we knew that an API gateway was vital to help standardize all our internal microservices and simplify the architecture of our application. The problem that wasn’t quite as clear was whether we should choose REST or GraphQL for our gateway.
The first thing we had to consider was that RapidAPI uses a complex data schema throughout the hub. One example of this is the API object. The API object includes simple information like the name, owner, description, etc. All this information is fetched on the homepage as part of the API collections:
    {
      api {
        name,
        logo,
        shortDescription,
        rating,
        latency,
        uptime
      }
    }

With the simplified object structure above, it would be easy to fetch any number of APIs and display them on the homepage all at once. This isn’t the reality of RapidAPI’s data schema though. When we dive deeper into the schema, it includes a lot of nested objects and a large amount of data that needs to be pulled in at once.
For the API resource mentioned above, the schema holds not only the simple data shown above but also complex information like owner information, endpoints, billing plans and billing plan versions, open discussion topics, and more. A single API can have hundreds of endpoints, and for each endpoint, all the header and parameter data also needs to be pulled in.
    {
      api {
        name
        logo
        shortDescription
        rating
        latency
        uptime
        endpoints {
          group
          method
          name
          route
          description
          params {
            name
            type
            body
          }
        }
      }
    }
This is just a small look into the complex data structure that we use to make RapidAPI work. On top of it being a complex structure, all the data is pulled in from a number of different microservices. These characteristics were vital for us to consider when we were selecting between REST and GraphQL.
GraphQL vs REST Decision-Making Process
Through our research, we generally found RESTful API gateways are great when an application has a simple data structure or only a few sequential requests are needed to fetch all the required data. However, we mentioned earlier that RapidAPI’s data structure is very complex and all the data is pulled in through many requests to our services. If we used a REST API gateway, we would need to make separate requests for each type of data (API version, Endpoints, Parameters, Payload, Billing Plans, Billing Plan Versions, etc.).
In this scenario, if we were using REST, we would opt for a lazy implementation that only fetches the data required at the time. This isn’t the ideal user experience, since users would see additional loading as they navigate through the website. With GraphQL, we don’t need to make multiple round trips to fetch data. With a single request to the server, we can retrieve all the initial data required for the user’s entire session.
GraphQL also makes API clients less dependent on the server. This means the server does not need to hardcode the size or shape of data, allowing the clients to select what data they should get when they make a particular API call. Considering all these factors, we decided to choose GraphQL for our API gateway.
Our GraphQL Gateway Project Structure
When we first started to look at GraphQL, we discovered lots of starter kits and examples for how we should structure the project. However, the real-world implementation requires custom setup and company-specific infrastructure. For us, it was also a completely new paradigm, so we wanted it to be generic and open for extensions.
At RapidAPI, we typically structure our NodeJS backend code as a set of independent modules and minimize the coupling between them. These modules are imported during the server start/bootstrap time and injected into the system.
Since a modular approach worked the best for us in the past, we decided to go with a similar approach in our GraphQL project. With this in mind, we decided to go with GraphQL Yoga (see its repository on GitHub).
Our Basic File Structure:
Module Structure:
Each module has its own entry point – index.js file which exports the following:
.graphql – GraphQL schema that defines an object structure
    extend type Query {
      getApibyId(apiId: String): Api
    }

    type Api {
      id: String!
      name: String
      version: ApiVersion
      ...
    }

.resolvers.js – Essentially handler functions that know how to resolve data for a specific property defined in the .graphql schema
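    // Root query resolver. The original snippet references getApibyId without showing it;
    // this definition is an assumption that reuses the Api loader from the request context.
    const getApibyId = (root, { apiId }, ctx) => {
      return ctx.loaders.Api.apiLoader.load(apiId);
    };

    // Resolves the nested `version` field for an already-resolved Api object
    const version = (api, args, ctx) => {
      return ctx.loaders.ApiVersion.apiVersionLoader.load(api.id);
    };

    module.exports = {
      Query: { getApibyId },
      Api: { version }
    };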
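To see how a resolver map like this gets used, the following sketch models field resolution in plain JavaScript. It is a deliberately simplified stand-in for what a GraphQL server does internally: `resolveField` and the in-memory `ctx` data are hypothetical, and a real server drives this walk from the schema and the incoming query.

```javascript
// Hypothetical in-memory data standing in for the real services and loaders.
const ctx = {
  apis: { api_1: { id: 'api_1', name: 'Example API' } },
  versions: { api_1: { id: 'v_1', name: '1.0' } },
};

// Same shape as the exported resolver map: { TypeName: { fieldName: fn } }.
const resolvers = {
  Query: { getApibyId: (root, args, context) => context.apis[args.apiId] },
  Api: { version: (api, args, context) => context.versions[api.id] },
};

// Simplified model of field resolution: use the resolver registered for
// Type.field if there is one, otherwise read the property off the parent.
function resolveField(typeName, fieldName, parent, args, context) {
  const typeResolvers = resolvers[typeName] || {};
  const resolver = typeResolvers[fieldName];
  return resolver ? resolver(parent, args, context) : parent[fieldName];
}

// The root field runs first and returns the parent object...
const api = resolveField('Query', 'getApibyId', {}, { apiId: 'api_1' }, ctx);
// ...which is then passed as the first argument to each child field.
const name = resolveField('Api', 'name', api, {}, ctx); // no resolver: plain property read
const version = resolveField('Api', 'version', api, {}, ctx); // custom resolver

console.log(name, version.name);
// prints "Example API 1.0"
```

This is why the module only needs a resolver for `version`: simple fields like `id` and `name` are read directly from the object returned by the parent resolver.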
.service.js – The actual code/business logic. Usually has a bunch of HTTP calls or loader invocations
.loaders.js – Data loaders for solving the GraphQL n+1 problem, performance improvements, and sometimes memoization
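    const DataLoader = require('dataloader');
    const _ = require('lodash');
    const apiService = require('./Api.service');

    // Creates a new DataLoader that batches individual load(id) calls into one service call
    const apiLoader = (cookies, headers) => {
      return new DataLoader((ApiIds) => {
        return apiService.getApisByIds(ApiIds).then(apis => {
          // Index the results by id, then return them in the same order as the requested ids
          const apiMap = _.keyBy(apis, 'id');
          return ApiIds.map(apiId => apiMap[apiId]);
        });
      });
    };

    module.exports = (cookies, headers) => {
      return {
        apiLoader: apiLoader(cookies, headers)
      };
    };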
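Two details of the loader are easy to miss: batching turns N back-end calls into one, and a DataLoader batch function must return an array of the same length and in the same order as the keys it was given. The sketch below shows both in plain, synchronous JavaScript. The `apiService` here is a hypothetical in-memory stand-in that counts calls, and `batchLoad` only imitates the re-ordering step; it does not use the real `dataloader` package.

```javascript
// Hypothetical service that records how many back-end calls are made.
const rows = [
  { id: 'a', name: 'Alpha' },
  { id: 'b', name: 'Beta' },
  { id: 'c', name: 'Gamma' },
];
let calls = 0;
const apiService = {
  getApiById(id) { calls += 1; return rows.find((row) => row.id === id); },
  // Returns rows in storage order and silently skips unknown ids.
  getApisByIds(ids) { calls += 1; return rows.filter((row) => ids.includes(row.id)); },
};

const wanted = ['c', 'a', 'missing'];

// n+1 style: one back-end call per id.
calls = 0;
const oneByOne = wanted.map((id) => apiService.getApiById(id));
const callsOneByOne = calls;

// Batched style: one call for all ids, then re-align the results to the keys.
function batchLoad(ids) {
  const found = apiService.getApisByIds(ids);
  const byId = new Map(found.map((row) => [row.id, row]));
  // Same length and order as `ids`; unknown ids become undefined.
  return ids.map((id) => byId.get(id));
}
calls = 0;
const batched = batchLoad(wanted);
const callsBatched = calls;

console.log(callsOneByOne, callsBatched);
// prints "3 1"
console.log(batched.map((row) => (row ? row.name : null)));
// prints [ 'Gamma', 'Alpha', null ]
```

Without the re-ordering step, the service's storage order (`a`, `c`) would be handed back for keys `c`, `a`, `missing`, and each caller would receive the wrong record.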
index.js – The entry point that is imported in server bootstrap time and exposes the above files
    module.exports = {
      resolvers: require('./Api.resolvers'),
      typeDefs: require('../../utils/graphqlSchemaLoader')('Api/Api.graphql'),
      service: require('./Api.service'),
      loaders: require('./Api.loaders')
    }
Server Bootstrap and Dynamic Module Loading
All our modules follow the same naming convention and export the same interface. We can load/require all the modules by simply looping through the folders, taking each folder name as the module name, and creating four hash maps: schemas, resolvers, loaders, and services. Each key in a map matches the module/folder name, and the value is the actual imported JS module/file.
Once we finish mapping them, we can inject the maps into the GraphQL context (ctx). However, we want to inject them on every request instead of just doing it once during the server start. That way, we can initialize our loaders with request-specific information such as cookies and headers. Most importantly, we create a new loader instance per request to make sure that caching is applied only during the request/response cycle and not across all requests.
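The folder loop described above could look roughly like the sketch below. It is a minimal illustration under the assumption that every sub-folder contains an `index.js` exporting `typeDefs`, `resolvers`, `loaders`, and `service`, as in the module entry point shown earlier. The `loadModules` function name is hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Builds the four maps by treating every sub-folder of `modulesDir` as a
// module whose index.js exports { typeDefs, resolvers, loaders, service }.
function loadModules(modulesDir) {
  const maps = { schemas: {}, resolvers: {}, loaders: {}, services: {} };

  for (const entry of fs.readdirSync(modulesDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue; // skip stray files

    const moduleName = entry.name; // the folder name doubles as the map key
    // Requiring a directory by absolute path loads its index.js
    const mod = require(path.resolve(modulesDir, moduleName));

    maps.schemas[moduleName] = mod.typeDefs;
    maps.resolvers[moduleName] = mod.resolvers;
    maps.loaders[moduleName] = mod.loaders;
    maps.services[moduleName] = mod.service;
  }

  return maps;
}
```

A call such as `loadModules(path.join(__dirname, 'modules'))` (the `modules` directory name is an assumption) would then return the four maps, and adding a new module only requires adding a new folder that follows the convention.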
    const { GraphQLServer } = require('graphql-yoga');
    const { merge } = require('lodash'); // assumption: merge is lodash's deep merge

    const server = new GraphQLServer({
      typeDefs: typeDefs.join(' '), // typeDefs - concatenated string
      resolvers: merge({}, resolvers), // mapped resolvers
      context: async (context) => {
        // a callback invoked on each request that creates a new ctx object
        const requestCtx = {
          services,
          loaders: Object.keys(loaders).reduce((acc, next) => {
            // new loader instance with injected cookies and headers
            acc[next] = loaders[next](context.request.cookies, context.request.headers);
            return acc;
          }, {})
        };
        return requestCtx;
      },
      middlewares: [loggingMiddleware]
    });

    server.start(serverOptions, ({ port }) => console.log(`Started on port ${port} , process ${process.pid}`));
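The reason for building loaders inside the per-request callback can be checked with a small stand-in. The sketch below does not use graphql-yoga or the `dataloader` package: `makeApiLoader` is a hypothetical, synchronous loader with its own cache, and `createContext` repeats the same `reduce` over loader factories. It shows that a cache hit only happens within one request, and that one user's cached result never leaks into another user's request.

```javascript
// Stand-in for a loader factory: each call returns a loader with its own cache.
// (A real DataLoader instance also memoizes loads per key, per instance.)
let backendCalls = 0;
function makeApiLoader(cookies, headers) {
  const cache = new Map();
  return {
    load(id) {
      if (!cache.has(id)) {
        backendCalls += 1; // pretend this is an HTTP call made with `headers`
        cache.set(id, { id, requestedBy: headers.user });
      }
      return cache.get(id);
    },
  };
}

// Map of module name -> loader factory, as produced by the module loading step.
const loaders = {
  Api: (cookies, headers) => ({ apiLoader: makeApiLoader(cookies, headers) }),
};

// Same reduce as in the server setup: build fresh loader instances per request.
function createContext(request) {
  return {
    loaders: Object.keys(loaders).reduce((acc, name) => {
      acc[name] = loaders[name](request.cookies, request.headers);
      return acc;
    }, {}),
  };
}

const ctxA = createContext({ cookies: {}, headers: { user: 'alice' } });
const ctxB = createContext({ cookies: {}, headers: { user: 'bob' } });

// Within one request, the second load of the same id is served from the cache.
ctxA.loaders.Api.apiLoader.load('api_1');
ctxA.loaders.Api.apiLoader.load('api_1');
const callsAfterRequestA = backendCalls;

// A different request has its own cache, so it never sees alice's result.
const seenByBob = ctxB.loaders.Api.apiLoader.load('api_1').requestedBy;

console.log(callsAfterRequestA, backendCalls, seenByBob);
// prints "1 2 bob"
```

If the loaders were created once at server start instead, the second request would be served alice's cached object, which is exactly the cross-request caching the per-request context avoids.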
The Trend Towards GraphQL
Since Facebook invented GraphQL, it is no surprise that it continues to be a popular choice for social networking sites. Facebook, Twitter, Instagram, and Pinterest all reported using GraphQL in their stack. Other companies that have explained their choice to use GraphQL include Airbnb and the New York Times.
After our research, we decided to use GraphQL instead of a RESTful API gateway because GraphQL provided a flexible solution and matched our application’s needs. As GraphQL continues to become more popular, it is important to remember that it isn’t the right solution for every situation. We hope this breakdown of our decision-making process and implementation helps you decide for yourself whether a GraphQL or a RESTful API gateway is the better fit for your project.