The complete GraphQL Security Guide: Fixing the 13 most common GraphQL Vulnerabilities to make your…

Stefan

Hello World! My name is Stefan and I am a part of the growth team here at WunderGraph. The article was written by our CEO / CTO Jens Neuse. Enjoy!

It’s 2021, GraphQL is on its rise to become a big player in the API Ecosystem. That’s perfect timing to talk about how to make your GraphQL APIs secure and ready for production.

So here’s my thesis: GraphQL is inherently insecure. I’ll prove this throughout the article and propose solutions. One of the solutions will require some radical change in the way we’re thinking about GraphQL, but it will come with a lot of benefits that go way beyond just security.

The 13 most common GraphQL Vulnerabilities

1. Parsing a GraphQL Operation vs. parsing a URL

Why? Why is GraphQL so much more vulnerable than e.g. REST? Let’s compare a URL against a GraphQL Operation. According to Wikipedia, the concept of the URL was first published in 1994, that’s 27 years ago. If we search the same source for the birth of GraphQL, we find September 2014, which makes it around 7 years old.

This gives parsing URLs an advantage of 20 years over parsing GraphQL Operations. Quite the head start!

Next, let’s have a look at the ANTLR grammar for both.

The grammar for parsing a URL is 86 lines. The grammar for parsing a GraphQL document is 325 lines.

So, it’s fair to say that the GraphQL language is around 4 times more complex than the one defining a URL. If we factor in both variables, it’s obvious that there must be a lot more experience and expertise in parsing URLs than parsing GraphQL operations.

But why is this even a problem? Recently, a friend of mine analyzed some popular libraries to see how fast they are at parsing GraphQL queries. It made me happy to see that my own library was performing quite well. At the same time, I was surprised that some libraries didn’t accept the test Operations while others were able to parse them.

What does this mean for us? The person who performed the benchmarks hand-picked a number of GraphQL libraries and ran a few benchmarks. This was enough to find some bugs. What if we picked all GraphQL libraries and frameworks and tested them against numerous GraphQL Operations?

Keep in mind that we’re still talking about simply parsing the Operations. What if we add building a valid AST into the equation? What if we add executing the Operations as well? We almost forgot about validating Operations, a topic in itself.

A few years ago, a small group of people started an amazing open source project: CATS, the GraphQL Compatibility Acceptance Test. It’s quite a mouthful, but the idea is brilliant: build a tool so that different GraphQL implementations can prove that they work as intended. Unfortunately, the project’s last commit is from 2018.

Alright, parsing a URL seems simple and well understood. Parsing GraphQL Operations is a nightmare. You should not trust any GraphQL library without heavy testing, including fuzzing.

We’re all humans. Building a GraphQL library is complex. I’m the owner of an implementation written in Go. It’s not easy, it’s a lot of code. A lot of code means a lot of potential for bugs.

And don’t get me wrong, this is not about hand-written parsers vs. parsers generated from a grammar. Turning a string into an AST is just one small piece of the puzzle. There are plenty of opportunities left for bugs.

2. Normalizing GraphQL Queries can potentially leak fields

You don’t have to normalize a URL. If you can parse it in your language of choice, it’s valid, otherwise it’s not.

A different story with GraphQL. Here’s an example:

A lot of foo! Let’s normalize the Query.

That’s a lot less foo, nice! I could have made it more complicated with more fragments, nesting, etc… What’s the point?

How can we prove that all libraries and frameworks normalize the Query correctly? What happens if something goes wrong here? It might give an attacker an opportunity to ask for fields which they are not allowed to use. Maybe there’s a hidden field, and by wrapping it with a weird inline fragment @skip combo, we’re able to query it.

As long as we’re not able to prove that it’s impossible, I’d consider it possible. Prove me wrong!

To summarize: No Normalization for URLs. More nightmares for GraphQL.

4. GraphQL Denial of Service Attacks

If you want to rate-limit a REST API user, all you have to do is store their IP in an in-memory store, e.g. Redis, and rate limit them with your algorithm of choice, e.g. a sophisticated window rate limiter. Each request counts as one request. This sounds trivial, but it matters in the context of GraphQL.
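For comparison with what follows, the per-IP pattern described above fits in a few lines. This is a minimal sketch of a sliding-window limiter, assuming an in-process dictionary instead of Redis. The class name, the limit and the window size are illustrative assumptions, not something taken from a specific library.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        # client key (e.g. an IP address) -> timestamps of accepted requests
        self.hits = defaultdict(deque)

    def allow(self, client, now):
        timestamps = self.hits[client]
        # Forget requests that have fallen out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window=10)
print(limiter.allow("203.0.113.7", now=0))  # first request in the window
print(limiter.allow("203.0.113.7", now=1))  # second request in the window
print(limiter.allow("203.0.113.7", now=2))  # third request is rejected
```

Note that the limiter only counts requests. It knows nothing about how expensive a single request is, which is exactly the gap the rest of this section is about.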

With GraphQL on the other hand, you cannot apply the same pattern. One single Operation is enough to bring the GraphQL Server to a halt.

Here are a few examples of how to build a denial of service attack with GraphQL:

Moving back and forth, forever.

Simply ask for a lot of foos:

How about exploiting N+1 problems?

Each layer of nesting asks for more nested data, hence exponential growth of execution complexity.

A few things you should consider:

Usually, GraphQL operations come in the form of a JSON over an HTTP POST request. This JSON could look like this:

The first thing you should do is to limit the amount of JSON bytes you’re accepting. How large can your largest Operations be? A few Kilobytes? Megabytes?
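A byte limit like this is cheap to enforce before any JSON or GraphQL parsing happens. The following is a minimal sketch, assuming the raw request body is already available as bytes. The function name and the 16 KB default are hypothetical and should be derived from your own traffic.

```python
import json

MAX_BODY_BYTES = 16 * 1024  # assumed limit, pick one based on your real Operations


def read_operation(raw_body: bytes, max_bytes: int = MAX_BODY_BYTES) -> dict:
    """Reject oversized bodies before spending any time on parsing."""
    if len(raw_body) > max_bytes:
        raise ValueError("request body too large")
    payload = json.loads(raw_body)
    if not isinstance(payload, dict) or not isinstance(payload.get("query"), str):
        raise ValueError("expected a JSON object with a 'query' string")
    return payload


payload = read_operation(b'{"query": "{ foo }", "variables": {}}')
print(payload["query"])
```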

Next, when parsing the Operation, how many Nodes are too many Nodes? Do you accept any amount of Nodes in a Query? If you have analytics running on your system, maybe take the largest Query, add a margin on top and set the limit there?

On the topic of the maximum number of Nodes when parsing an Operation: does your framework of choice actually allow you to limit the number of Nodes it’ll read?

Next, let’s talk about the options you have when the Operation is parsed.

You can calculate the “complexity” of the Operation. You could “walk” through the AST and apply some sort of algorithm to detect the complexity of the Operation. One way to define complexity is for example the nesting.

Here’s a Query with nesting of 1:

This Query has nesting of 2:

This algorithm is a good start. However, it has some downsides. Nesting alone is not a good indicator of complexity.
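The nesting measure described above can be approximated without a full parser. The sketch below scans the Operation text and tracks the depth of selection sets, skipping comments, string literals and anything inside parentheses (arguments and variable definitions). It is an illustration under simplifying assumptions: it does not expand fragments, so a real implementation should run on the normalized AST of your GraphQL library instead.

```python
def max_nesting(query: str) -> int:
    """Return the deepest selection-set nesting found in the query text."""
    depth = max_depth = parens = 0
    i, n = 0, len(query)
    while i < n:
        ch = query[i]
        if ch == "#":  # comment: skip to the end of the line
            while i < n and query[i] != "\n":
                i += 1
            continue
        if query.startswith('"""', i):  # block string: skip to the closing quotes
            end = query.find('"""', i + 3)
            i = n if end == -1 else end + 3
            continue
        if ch == '"':  # regular string: skip it, honouring backslash escapes
            i += 1
            while i < n and query[i] != '"':
                i += 2 if query[i] == "\\" else 1
            i += 1
            continue
        if ch == "(":
            parens += 1
        elif ch == ")":
            parens -= 1
        elif ch == "{" and parens == 0:  # braces inside (...) are input objects
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}" and parens == 0:
            depth -= 1
        i += 1
    return max_depth


print(max_nesting("{ foo }"))
print(max_nesting("{ foo { bar } }"))
```

A server could reject any Operation whose nesting exceeds a configured maximum before executing it.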

To better understand the complexity, you’d have to look at the possible number of nodes a field can return. This is similar to EXPLAIN ANALYZE in SQL. It gives you some estimates of how the Query Planner thinks the Query will be executed. Keep in mind that these estimations can go completely wrong.

So, estimating is not bad, but you should also look at the real number of returned nodes during execution.

Companies with public GraphQL APIs, like GitHub, have implemented quite sophisticated rate limiting algorithms. They take into account the number of nodes returned by each field and give you some limits based on their calculations.

Here’s an example Query from their explanation:

There’s one important thing we can learn from them in terms of GraphQL Schema design. If you have a field that returns a list, make sure there is a mandatory argument to limit the number of items returned, e.g. first, last, skip, etc. Only then is it actually possible to calculate the complexity before executing the Operation.
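To see why the mandatory limit argument matters, here is a minimal sketch of such a calculation. It assumes the Operation has already been parsed into a nested structure in which every list field carries its first argument. The dictionary layout and the field names are illustrative assumptions, not GitHub’s actual implementation.

```python
def estimated_nodes(fields, parents=1):
    """Upper bound on the number of nodes an Operation can return.

    Each list field can return `first` items for every parent node,
    so the counts multiply as we go deeper.
    """
    total = 0
    for field in fields:
        count = parents * field["first"]
        total += count + estimated_nodes(field.get("fields", []), count)
    return total


# Hypothetical Operation: 50 repositories, 10 issues each, 60 labels per issue.
operation = [
    {
        "name": "repositories",
        "first": 50,
        "fields": [
            {"name": "issues", "first": 10, "fields": [{"name": "labels", "first": 60}]}
        ],
    }
]

print(estimated_nodes(operation))  # 50 + 500 + 30000
```

Without the first argument on every list field, the multiplication above has nothing to work with, and the estimate can only be made after execution.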

Additionally, you’d also want to think about the user experience of your API. It’s going to be a poor user experience if GraphQL Operations randomly fail because there’s too much data coming back from a list field for some instances.

At the end of the post, we’ll pick up this topic again and talk about an even better approach, an approach that works well for both the API provider and the consumer.

5. GraphQL SQL Injection Vulnerability

This one should be well known, but it should still be part of the list.

Let’s have a look at a simple resolver using graphql-js:

A Query for this resolver might look like this:

In case of a badly written implementation of db.loadHumanByID, the SQL statement could look like this:

In case of the “happy” path, the SQL statement will be rendered like this:

Now, let’s try a simple attack 😈:

In case of our attack, the SQL statement looks slightly different:

As 1=1 is always true, this would return all users. You might have noticed that the function can only return a single user, not a list of users, but for illustration purposes, I think it's clear that we have to deal with the issue.

What can we do about this?

The solution to the problem is not really GraphQL-specific. You should always validate the inputs. For database access, use prepared statements or an ORM that abstracts away the database layer so that you’re not able to inject arbitrary logic into the statement by design.

Either way, don’t trust user inputs. It’s not enough to check if it’s a string.
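The difference between string concatenation and a prepared statement can be sketched with Python’s built-in sqlite3 module. The table, the rows and the two loader functions are hypothetical stand-ins for whatever sits behind your resolver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE humans (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO humans VALUES (?, ?)", [("1", "Luke"), ("2", "Leia")])


def load_human_unsafe(human_id):
    # Vulnerable: the input becomes part of the SQL text itself.
    sql = "SELECT id, name FROM humans WHERE id = '" + human_id + "'"
    return conn.execute(sql).fetchall()


def load_human_safe(human_id):
    # Parameterized: the input is passed as data and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM humans WHERE id = ?", (human_id,)
    ).fetchall()


attack = "1' OR '1'='1"
print(load_human_unsafe(attack))  # every row in the table
print(load_human_safe(attack))    # no row has this literal id
```

Both functions receive a perfectly valid GraphQL string, which is why the type check alone does not help.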

8. Relay Global Object Identification Vulnerability

Your GraphQL Server Framework might implement the Relay Global Object Identification specification. This spec is an extension to your GraphQL schema to make it compatible with the Relay client, the client developed and used by Facebook.

What’s the problem with this spec? Let’s have a closer look at what it allows us to do:

The Relay spec defines that each Node in a Graph must be accessible through a globally unique identifier. Usually, this ID is the base64 encoded combination of the __typename and the id fields of a node. With the node returned, you're able to use fragments to ask for specific node fields.
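To make the ID format concrete, here is a minimal sketch that assumes the common "TypeName:id" layout. The helper names are hypothetical, and your framework may use a different separator or encoding.

```python
import base64


def to_global_id(typename, local_id):
    """Encode a type name and a local id into an opaque-looking global ID."""
    return base64.b64encode(f"{typename}:{local_id}".encode()).decode()


def from_global_id(global_id):
    """Reverse the encoding; anyone can do this, no secret is involved."""
    typename, _, local_id = base64.b64decode(global_id).decode().partition(":")
    return typename, local_id


print(to_global_id("User", 1))     # VXNlcjox
print(from_global_id("VXNlcjox"))  # ('User', '1')
```

Because the encoding is reversible and not secret, a client can construct IDs for types and rows it was never handed and pass them to the node field.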

This means, even if your Server is completely secure, by enabling the Relay Extension, you’re opening up another attack vector.

At this point, it should be clear that protecting the edges is a cat and mouse game which is not in your favor.

A better solution to solve the problem is by protecting the node itself. So, whenever we enter the resolver for the type User, we should check if the currently authenticated user is allowed to request the fields.

As you can see, you have to make decisions very early on when designing your GraphQL Schema as well as the Database Schema to be able to protect nodes properly. Whenever you enter a node, you must be able to answer the question if the currently logged in user is allowed to see a field or not.
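One way to picture protecting the node itself is to route every path that can reach a User through a single loader that applies the access check. The sketch below is an illustration only: the in-memory data, the friendship-based policy and all function names are assumptions.

```python
class Forbidden(Exception):
    """Raised when the viewer may not see the requested node."""


# Hypothetical data standing in for a database.
USERS = {
    "1": {"id": "1", "name": "Alice", "friends": ["2"]},
    "2": {"id": "2", "name": "Bob", "friends": ["1"]},
    "3": {"id": "3", "name": "Carol", "friends": []},
}


def can_view(viewer_id, user):
    # Assumed policy: you may see yourself and your friends.
    return viewer_id == user["id"] or viewer_id in user["friends"]


def load_user(viewer_id, user_id):
    """Single entry point for User nodes; the check cannot be bypassed."""
    user = USERS.get(user_id)
    if user is None or not can_view(viewer_id, user):
        raise Forbidden(f"user {user_id} is not visible")
    return user


def resolve_user_by_id(viewer_id, user_id):
    return load_user(viewer_id, user_id)


def resolve_node(viewer_id, typename, local_id):
    # The Relay node field ends up in the same loader as every other path.
    if typename == "User":
        return load_user(viewer_id, local_id)
    raise Forbidden(f"type {typename} is not exposed")


print(resolve_user_by_id("1", "2")["name"])
```

Whether a request arrives through a root field, an edge or the node field, it reaches the same check.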

So, the question arises if this logic should really sit in the resolver. If you ask the creators of GraphQL, their answer would be “no”. As they’ve already solved the problem in a layer below the resolvers, the data access layer or their “Entity (Ent) Framework”, they didn’t address the issue with GraphQL. This is also the reason why authorization is completely missing from GraphQL.

That being said, solving the problem a layer below is not the only valid solution. If done right, it can be completely fine to solve the problem from within the resolvers.

Before we move on, you should have a look at the excellent entgo framework and its architecture. Even if you’re not going to use Golang to build your API layer, you can see how much thought and experience went into the design of the framework. Instead of scattering authorization logic across your resolvers, you’re able to define policies at the data layer and there’s no way to circumvent them. The access policy is part of the data model. You don’t have to use a framework like entgo, but keep in mind that you’d then have to solve this complex problem on your own.

Again, we’ll revisit this vulnerability later to find a much simpler solution.

9. GraphQL Gateway / Proxying Vulnerability

A lot of GraphQL servers are also API Gateways or Proxies to other APIs. Injecting GraphQL arguments into sub-requests is another possible threat we have to deal with.

Let’s recall the schema from above:

Let’s imagine this Schema is implemented using a REST API with the GraphQL API as an API Gateway in front. The resolver for the userByID field could look like this:

Now, let’s not fetch the user but two of their friends! Here’s the Query (totally valid):

This results in the following GET requests:

Why is this possible? The ID Scalar should be serialized as a string. While "7" is a valid string, so is "7/friends/1".

To solve the problem, you have to validate the input. As the GraphQL type system is only validating if the input is a number or a string, you need to go one step further. If you’re accepting strings as input, e.g. because you’re using a UUID or GUID, you have to make sure you’ve validated them before usage.

How can we fix it?

Again, we need to validate the inputs. WunderGraph offers you a simple way to configure JSON Schema validation for all inputs. This is possible, because WunderGraph is keeping your Operations entirely on the server. But we’ll come to that later.

Everybody else should make sure to validate any input before using it in their resolvers.
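As a minimal sketch of such validation, the function below only accepts numeric IDs and percent-encodes the value before it becomes part of a path. The URL layout and the ID format are assumptions; adapt the pattern to UUIDs or whatever format your IDs really have.

```python
import re
from urllib.parse import quote

# Assumed ID format: 1 to 18 decimal digits.
NUMERIC_ID = re.compile(r"[0-9]{1,18}")


def user_url(base, user_id):
    """Build the upstream URL only for IDs that match the expected format."""
    if not NUMERIC_ID.fullmatch(user_id):
        raise ValueError("invalid user id")
    # Encoding is a second line of defence: "/" would become "%2F".
    return f"{base}/user/{quote(user_id, safe='')}"


print(user_url("https://example.com/api", "7"))
```

With this check in place, an input such as "7/friends/1" is rejected before any request is sent to the upstream API.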

10. GraphQL Introspection Vulnerability

GraphQL Introspection is amazing! It’s the ability of a GraphQL server to tell clients everything about the GraphQL Schema. Tools like GraphiQL and GraphQL Playground use the introspection Query to then be able to give the user autocompletion functionalities. Without Introspection and the Schema, tools like these wouldn’t exist. At the same time, introspection also has a few downsides.

The GraphQL schema can contain sensitive information. There’s a possibility that your GraphQL schema is leaking internal information or fields that are only used internally. Maybe one of your teams is working on a new MVP which is not yet launched. Your competitors might be scraping your GraphQL API using the introspection Query. Whenever there’s a change in the schema, they could immediately see this using a diff.

What can we do about this? Most guides advise you to disable the Introspection Query in Production. That is, you’ll allow it during development but disallow introspection Queries when deploying to production.

However, due to the friendliness of some GraphQL framework implementations, including the graphql-js reference implementation, disabling introspection doesn’t really solve the issue. Keep in mind that every implementation depending on the graphql-js reference is also affected by this.

So, if disabling introspection doesn’t help, what else can we do about it? If your API is only used by your internal staff, you can protect the execution of introspection Queries with an authentication middleware. This way, you would add a layer of authentication in front of the GraphQL execution. Obviously, this only works for APIs that always require authentication because otherwise users would not be able to make a single request.

If you’re building an app that can be used by users without authentication, the proposed solution doesn’t work.
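For the authenticated case, the middleware check could look roughly like the sketch below. It is a deliberately simple text-based check with a hypothetical function name. It can produce false positives when a string argument happens to contain one of the field names, so an AST-based check is preferable if your framework exposes one.

```python
import re

# Matches the introspection entry points, but not the harmless __typename.
INTROSPECTION = re.compile(r"\b__(?:schema|type)\b")


def allow_operation(query: str, authenticated: bool) -> bool:
    """Let introspection through only for authenticated callers."""
    if INTROSPECTION.search(query) and not authenticated:
        return False
    return True


print(allow_operation("{ __schema { types { name } } }", authenticated=False))
print(allow_operation("{ user { __typename name } }", authenticated=False))
```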

To sum up, by disabling introspection at runtime, you’re making it a bit more complicated to introspect the schema, but with most frameworks it’s still possible.

The next vulnerability will also take advantage of this issue. The ultimate catch-all solution will be presented at the end.

13. GraphQL Excessive Errors Vulnerability

This is another common issue with GraphQL APIs. GraphQL has a nice and expressive way of returning errors. However, some frameworks are by default just a bit too informative.

Here’s an example of a response from an API that is automatically generated on top of a database:

This error message is quite expressive, it seems like it’s coming from a SQL database and it’s about a violation of a unique key constraint.

While helpful to the developer of the application, it’s actually giving way too much information to the API consumer.

This message could be written to the logs, if there are any. It seems like an app user tried to create content with an ID that already exists.

In a properly designed GraphQL API, this actually doesn’t have to be an error at all. A better way to design this API would be to return a union that covers all possible cases, like e.g. success, conflict, etc… But that’s just a general problem with generated APIs.

In any case, generated or not, there should always be a middleware at the very top of your HTTP Server that catches verbose errors like this and removes them from the response. If possible, don’t just use the generic “errors” response object. Instead, make use of the expressive type system and define types for all possible outcomes of an operation.
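Such a middleware can be sketched as a function that runs on every response before it leaves the server. The details go to the log, and the client only receives a generic message with a reference. The function name, the logger name and the sample error are assumptions for illustration.

```python
import logging
import uuid

logger = logging.getLogger("graphql.errors")


def mask_errors(response: dict) -> dict:
    """Replace verbose error messages with a generic one, keep details in the log."""
    errors = response.get("errors")
    if not errors:
        return response
    masked = []
    for err in errors:
        ref = uuid.uuid4().hex[:8]  # lets support find the matching log line
        logger.error("GraphQL error %s: %r", ref, err)
        safe = {"message": f"Internal server error (ref {ref})"}
        if "path" in err:
            safe["path"] = err["path"]
        masked.append(safe)
    return {**response, "errors": masked}


raw = {
    "data": {"createUser": None},
    "errors": [
        {
            "message": 'duplicate key value violates unique constraint "users_pkey"',
            "path": ["createUser"],
        }
    ],
}
print(mask_errors(raw)["errors"][0]["path"])
```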

REST APIs have a rich system of HTTP status codes to indicate the result of an operation. GraphQL allows you to use Interface and Union type definitions so that API consumers can easily handle API responses. It’s very hard to programmatically analyze an error message. It’s just a string which could change any time.

By creating Union and Interface types for responses, you can cover all outcomes of an operation explicitly. An API consumer is then able to switch case over the __typename field and properly handle the "known error".
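On the consumer side, handling such a union could look like the following sketch. The type names PostCreated and PostConflict are hypothetical members of an assumed result union and do not come from a real schema.

```python
def handle_create_post(result: dict) -> str:
    """Branch on __typename instead of parsing an error string."""
    kind = result["__typename"]
    if kind == "PostCreated":
        return f"created post {result['id']}"
    if kind == "PostConflict":
        return "a post with this ID already exists"
    # Unknown members should be handled explicitly, e.g. after a schema change.
    raise ValueError(f"unexpected result type: {kind}")


print(handle_create_post({"__typename": "PostCreated", "id": "42"}))
print(handle_create_post({"__typename": "PostConflict"}))
```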

Summary & Vulnerability Checklist

Another long blog post comes to an end. Let’s recap! We’ve covered 13 of the most common GraphQL vulnerabilities.

Here’s a Checklist if you want to go through all of them.

  • Parsing Vulnerabilities
  • Normalization Issues
  • Operation Validation Errors
  • Denial of Service Attacks
  • GraphQL SQL Injections
  • Authentication Vulnerabilities
  • GraphQL Authorization Traversal Attacks
  • Relay Global Object Identification Vulnerability
  • GraphQL Gateway / Proxying Vulnerability
  • GraphQL Introspection Vulnerability
  • Generated GraphQL APIs Vulnerability
  • GraphQL CSRF Vulnerability
  • GraphQL Excessive Errors Vulnerability

That’s a lot of issues to solve before going to production. Please don’t take this lightly. If you look at HackerOne, you can see the issue is real.

So, we want to get the benefits of GraphQL, but going through this whole list is just way too much work. Is there a better way of doing GraphQL? Is there a way of doing GraphQL differently so that we’re not affected by all the issues?

The answer to this question is Yes! All you have to do is to adjust your view on GraphQL.

Solving the 13 most common GraphQL Vulnerabilities for private APIs

Most of us are using GraphQL APIs internally. This means, the developers who use the GraphQL API are in the same organization as the people who provide the API. Additionally, I’m assuming that we’re not changing our GraphQL Operations at runtime.

All this boils down to one root cause: API clients are allowed to send arbitrary GraphQL Operations to the server.

All this is completely avoidable, adds no value and only creates harm. It’s absolutely fine to allow developers within a secured environment to send arbitrary GraphQL Operations. However, most apps don’t change their GraphQL Operations in production, so why allow it at all?

Let’s have a look at the Architecture you’re most familiar with.

A GraphQL client talks GraphQL to a GraphQL Server.

Now, let’s make a small change to the architecture to fix all 13 problems.

Instead of talking GraphQL between Client and Server, we’re talking RPC, JSON-RPC more specifically. The Server then handles Authentication, Authorization, Caching, etc… for us and forwards Requests to the origin servers.

We haven’t invented this though. It’s not something new. Companies like Facebook, Medium, Twitter, and others are doing it.

What we’ve done is not just make it possible and fix the problems listed above. We’ve created an easy-to-use developer experience. We’ve assembled everything in one place. You don’t have to install numerous dependencies.

Let’s break down the solution a bit more, so you can fully understand how we’re able to solve all the vulnerabilities.

Solving Vulnerabilities related to Parsing, Normalizing and Validating GraphQL Operations

The most secure code is the code that doesn’t have to run at all. All code has bugs. To fix bugs, we have to write more code, which means we’re introducing even more bugs.

So, how can we replace GraphQL with RPC over the wire?

During development, the developers define all Operations that are required for the Application. At the time when the app is ready to be deployed, we’ll parse, normalize and validate all Operations. We’ll then generate JSON-RPC Endpoints for each Operation.

As mentioned, we’ve normalized the Operations. This allows us to treat all inputs (the variables) like a JSON object. We can then parse the variable types and get a JSON Schema for the input. Additionally, we can parse the response schema of the GraphQL Query. This gives us a second JSON Schema. These two will be quite handy later.

By doing all this, which happens automatically, we’ll get two things:

  1. A number of JSON-RPC Endpoints
  2. JSON Schema definitions for the inputs and response objects of all Endpoints

By doing this at “deployment time”, we don’t have to do it during the execution again. We’re able to “pre-compile” an execution tree. All that’s left at runtime is to inject the variables and execute the tree.

We’ve borrowed this idea from SQL Database Systems, it’s quite similar to “Prepared Statements”.
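The idea can be sketched as a registry that is filled at deployment time and only looked up at runtime. This is an illustration under simplifying assumptions, not WunderGraph’s actual implementation: the registry layout, the type-based variable check (a stand-in for real JSON Schema validation) and the execute callback are all hypothetical.

```python
# Filled at deployment time, after parsing, normalizing and validating.
OPERATIONS = {
    "UserByID": {
        "query": "query UserByID($id: ID!) { userByID(id: $id) { id name } }",
        "variables": {"id": str},  # stand-in for a JSON Schema
    }
}


def handle_rpc(operation_name, variables, execute):
    """Runtime entry point: clients send a name and variables, never a query."""
    operation = OPERATIONS.get(operation_name)
    if operation is None:
        raise KeyError(f"unknown operation: {operation_name}")
    expected = operation["variables"]
    if set(variables) != set(expected):
        raise ValueError("unexpected or missing variables")
    for name, expected_type in expected.items():
        if not isinstance(variables[name], expected_type):
            raise ValueError(f"variable {name} has the wrong type")
    return execute(operation["query"], variables)


def fake_execute(query, variables):
    # Placeholder for the pre-compiled execution tree.
    return {"data": {"userByID": {"id": variables["id"], "name": "Jens"}}}


print(handle_rpc("UserByID", {"id": "7"}, fake_execute))
```

Since the query text never travels over the wire, there is nothing for an attacker to reshape; only the variables remain as input.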

Ok, this means we’ve solved three problems.

Unfortunately, we’ve also introduced a new problem! There are no easy-to-use clients that could make use of our JSON-RPC API.

Luckily, we’ve extracted two JSON-Schemas per Endpoint. If we feed those into a code-generator, we’re able to generate fully type-safe clients in any language.

These clients are not only very small, but also super efficient as they don’t have to do much.

So, in the end, we’ve not only solved three problems but also made our application more performant.

As another side effect, you’re also able to generate forms from these JSON Schema definitions. It’s fully integrated into WunderGraph.

Solving the 13 most common GraphQL Vulnerabilities for public APIs

Alright, you’ve seen that by changing our architecture and evolving our understanding of GraphQL, we’re able to mitigate almost all of the issues that are coming from GraphQL itself. What’s left is the biggest vulnerability of all systems, the people who build them, us, the developers!

Okay, we’re almost done. We’ve left out one small group of APIs. We’ve been talking about private APIs for almost the entire article, but we did it for a good reason: probably 99% of the APIs are private.

But what about partner and public APIs? What if we don’t want to put an RPC layer in front of our API? What if we want to directly expose GraphQL to our API consumers?

Companies like Shopify and GitHub are doing this. What can we learn from them?

Shopify currently has 1,273 resolved reports. They’ve paid $1,656,873 in bounties to hackers, with a range of $500 to $50,000 per bounty.

Twitter resolved a total of 1,364 issues with a total of $1,424,389.

Snapchat only paid out $439,067 with 389 reports resolved.

GitLab paid an astounding $1,743,639 with a total of 845 issues resolved.

These bounties are not just related to GraphQL, but all the companies listed above can be found on the list of reported GraphQL issues.

There’s a total of 70 reports on GraphQL with lots of bounties paid out. If you search on other bug bounty websites, you’ll probably find more.

Companies like GitHub have people who build specialized infrastructure to better understand how their GraphQL API is being used. I was pleased to meet the amazing Claire Knight and listen to her talk at the last GraphQL Summit, it’s been quite some time…

I’ve presented you all this data to be able to make two points.

First, do you really need to expose your GraphQL API? If not, excellent! You’re able to apply all the solutions from this article.

If you absolutely want to expose a GraphQL API, make sure you have the expertise it takes to do so. You should have security experts in house or at least hire them, and you should be doing regular audits and pen-testing. Don’t take this lightly!