Tomasz Szkaradek
Head of Development
2021-06-30

GraphQL Ruby. What about performance?

GraphQL, like any technology, has its problems. Some of them result directly from its architecture, while others are the same ones we see in any other application. The solutions, however, are quite different.

To present the problem, let's assume the following application architecture:

https://drive.google.com/file/d/1N4sWPJSls0S8FFHbpHCUVHBNBpEuSsyz/view

Here is the corresponding GraphQL query. We fetch all links, together with the user who posted each link and all the links that user has added to the system:

{
  allLinks {
    id
    url
    description
    createdAt
    postedBy {
      id
      name
      links {
        id
      }
    }
  }
}

The log below shows the classic N+1 problem with associations:

  Link Load (0.4ms)  SELECT "links".* FROM "links" ORDER BY created_at DESC
  ↳ app/controllers/graphql_controller.rb:5:in `execute'
  User Load (0.3ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ?  [["id", 40], ["LIMIT", 1]]
  ↳ app/controllers/graphql_controller.rb:5:in `execute'
  Link Load (0.3ms)  SELECT "links".* FROM "links" WHERE "links"."user_id" = ?  [["user_id", 40]]
  ↳ app/controllers/graphql_controller.rb:5:in `execute'
  User Load (0.1ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ?  [["id", 38], ["LIMIT", 1]]
  ↳ app/controllers/graphql_controller.rb:5:in `execute'
  Link Load (0.1ms)  SELECT "links".* FROM "links" WHERE "links"."user_id" = ?  [["user_id", 38]]
  ↳ app/controllers/graphql_controller.rb:5:in `execute'
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ?  [["id", 36], ["LIMIT", 1]]
  ↳ app/controllers/graphql_controller.rb:5:in `execute'
  Link Load (0.1ms)  SELECT "links".* FROM "links" WHERE "links"."user_id" = ?  [["user_id", 36]]
  ↳ app/controllers/graphql_controller.rb:5:in `execute'
  User Load (0.1ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ?  [["id", 34], ["LIMIT", 1]]
  ↳ app/controllers/graphql_controller.rb:5:in `execute'
  Link Load (0.2ms)  SELECT "links".* FROM "links" WHERE "links"."user_id" = ?  [["user_id", 34]]
  ↳ app/controllers/graphql_controller.rb:5:in `execute'
  User Load (0.1ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ?  [["id", 32], ["LIMIT", 1]]

In this case, it works exactly like this piece of code: Link.all.map(&:user).map(&:links).

The solution seems obvious: Link.includes(user: :links).map(&:user).map(&:links). But will it really work? Let's check!
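To make the difference in query counts concrete, here is a minimal plain-Ruby sketch that runs without Rails. FakeDB, Link and User are hypothetical stand-ins that only count the "queries" they receive; the naive version mirrors Link.all.map(&:user).map(&:links), and the preloaded version mirrors the separate IN (...) queries that includes(user: :links) issues. With 20 links posted by 20 different users, the naive version issues 1 + 2 × 20 = 41 queries, while the preloaded one issues 3.

```ruby
# Toy in-memory "database" that counts queries. Link, User and FakeDB are
# hypothetical stand-ins for illustration, not ActiveRecord classes.
Link = Struct.new(:id, :user_id)
User = Struct.new(:id, :name)

class FakeDB
  attr_reader :query_count

  def initialize(users:, links:)
    @users = users
    @links = links
    @query_count = 0
  end

  # SELECT "links".* FROM "links"
  def all_links
    @query_count += 1
    @links
  end

  # SELECT "users".* FROM "users" WHERE "users"."id" = ?
  def find_user(id)
    @query_count += 1
    @users.find { |u| u.id == id }
  end

  # SELECT "links".* FROM "links" WHERE "links"."user_id" = ?
  def links_for_user(user_id)
    @query_count += 1
    @links.select { |l| l.user_id == user_id }
  end

  # SELECT "users".* FROM "users" WHERE "users"."id" IN (...)
  def find_users(ids)
    @query_count += 1
    @users.select { |u| ids.include?(u.id) }
  end

  # SELECT "links".* FROM "links" WHERE "links"."user_id" IN (...)
  def links_for_users(user_ids)
    @query_count += 1
    @links.select { |l| user_ids.include?(l.user_id) }
  end
end

# 20 users, each with one link.
def build_db
  users = (1..20).map { |i| User.new(i, "user#{i}") }
  links = users.map { |u| Link.new(u.id, u.id) }
  FakeDB.new(users: users, links: links)
end

# Like Link.all.map(&:user).map(&:links): two extra queries per link.
def naive(db)
  db.all_links.map do |link|
    user = db.find_user(link.user_id)
    [user, db.links_for_user(user.id)]
  end
end

# Like Link.includes(user: :links): three queries regardless of row count.
def preloaded(db)
  links = db.all_links
  users = db.find_users(links.map(&:user_id).uniq).to_h { |u| [u.id, u] }
  links_by_user = db.links_for_users(users.keys).group_by(&:user_id)
  links.map do |link|
    user = users[link.user_id]
    [user, links_by_user.fetch(user.id, [])]
  end
end

naive_db = build_db
naive(naive_db)
puts naive_db.query_count # 41

eager_db = build_db
preloaded(eager_db)
puts eager_db.query_count # 3
```

Both functions return the same data; only the number of round trips differs.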

To verify the fix, I changed the GraphQL query so that it requests only a few fields and no associations:

{
  allLinks {
    id
    url
    description
    createdAt
  }
}

Unfortunately, the log shows that even though the query asks for neither the user nor the user's links, we still load them from the database. These queries are redundant, and with a more complex data structure this approach quickly becomes inefficient.

Processing by GraphqlController#execute as */*
  Parameters: {"query"=>"{\n  allLinks {\n    id\n    url\n    description\n    createdAt\n  }\n}", "graphql"=>{"query"=>"{\n  allLinks {\n    id\n    url\n    description\n    createdAt\n  }\n}"}}
  Link Load (0.3ms)  SELECT "links".* FROM "links" ORDER BY created_at DESC
  ↳ app/controllers/graphql_controller.rb:5:in `execute'
  User Load (0.3ms)  SELECT "users".* FROM "users" WHERE "users"."id" IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)  [["id", 40], ["id", 38], ["id", 36], ["id", 34], ["id", 32], ["id", 30], ["id", 28], ["id", 26], ["id", 24], ["id", 22], ["id", 20], ["id", 18], ["id", 16], ["id", 14], ["id", 12], ["id", 10], ["id", 8], ["id", 6], ["id", 4], ["id", 2]]
  ↳ app/controllers/graphql_controller.rb:5:in `execute'
  Link Load (0.3ms)  SELECT "links".* FROM "links" WHERE "links"."user_id" IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)  [["user_id", 2], ["user_id", 4], ["user_id", 6], ["user_id", 8], ["user_id", 10], ["user_id", 12], ["user_id", 14], ["user_id", 16], ["user_id", 18], ["user_id", 20], ["user_id", 22], ["user_id", 24], ["user_id", 26], ["user_id", 28], ["user_id", 30], ["user_id", 32], ["user_id", 34], ["user_id", 36], ["user_id", 38], ["user_id", 40]]
  ↳ app/controllers/graphql_controller.rb:5:in `execute'
Completed 200 OK in 39ms (Views: 0.7ms | ActiveRecord: 0.9ms | Allocations: 8730)

In GraphQL, such problems are solved differently: data is loaded lazily and in batches, and only once a field that needs it actually appears in the query. One of the most popular libraries for this is https://github.com/Shopify/graphql-batch/.
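The idea can be illustrated with a minimal, framework-free sketch. BatchLoader and LazyValue below are hypothetical classes, loosely modelled on the concept behind graphql-batch but not its API: load only records a key and returns a placeholder, and the first time any placeholder's value is needed, all pending keys are fetched with one call, the equivalent of a single WHERE id IN (...) query. If no value is ever requested, no query runs at all.

```ruby
# Hypothetical lazy batch loader for illustration; not graphql-batch's API.
class LazyValue
  def initialize(loader, key)
    @loader = loader
    @key = key
  end

  # Resolving any value triggers one batched fetch for all pending keys.
  def value
    @loader.resolve(@key)
  end
end

class BatchLoader
  attr_reader :batches

  # The block receives an array of keys and must return a Hash key => value.
  def initialize(&fetch)
    @fetch = fetch
    @pending = []
    @cache = {}
    @batches = 0
  end

  # Registers a key and returns a placeholder; nothing is queried yet.
  def load(key)
    @pending << key unless @cache.key?(key) || @pending.include?(key)
    LazyValue.new(self, key)
  end

  def resolve(key)
    flush unless @cache.key?(key)
    @cache[key]
  end

  private

  def flush
    return if @pending.empty?

    @batches += 1
    results = @fetch.call(@pending)
    @pending.each { |k| @cache[k] = results[k] } # missing keys become nil
    @pending = []
  end
end

USERS = { 2 => "Alice", 4 => "Bob" }.freeze
queries = []

user_loader = BatchLoader.new do |ids|
  queries << ids # stands in for SELECT ... WHERE id IN (ids)
  USERS.slice(*ids)
end

# Resolvers only register keys; no query has run yet.
pending = [2, 4, 2].map { |id| user_loader.load(id) }
puts queries.size # 0

# The first access runs one query for every pending key.
names = pending.map(&:value)
p names   # ["Alice", "Bob", "Alice"]
p queries # [[2, 4]]
```

Duplicate keys are collapsed, and already loaded keys are served from the cache, so each record is fetched at most once per loader.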

Unfortunately, setting it up is not as hassle-free as it may seem. The data loaders live in the repository's examples directory, https://github.com/Shopify/graphql-batch/tree/master/examples; we need the RecordLoader and AssociationLoader classes from there. As usual, add gem 'graphql-batch' to the Gemfile, then enable it in the schema and add the loaders:

# graphql-ruby/app/graphql/graphql_tutorial_schema.rb
class GraphqlTutorialSchema < GraphQL::Schema
  query Types::QueryType
  mutation Types::MutationType
  use GraphQL::Batch
  ...
end

Then we use the loaders in our types:

# graphql-ruby/app/graphql/types/link_type.rb
module Types
  class LinkType < BaseNode
    field :created_at, DateTimeType, null: false
    field :url, String, null: false
    field :description, String, null: false
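    field :posted_by, UserType, null: false
    field :votes, [Types::VoteType], null: false

    # graphql-ruby calls a type method named after the field (posted_by),
    # so the loader is used instead of object.user and users are batched.
    def posted_by
      Loaders::RecordLoader.for(User).load(object.user_id)
    end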
  end
end

# graphql-ruby/app/graphql/types/user_type.rb
module Types
  class UserType < BaseNode
    field :created_at, DateTimeType, null: false
    field :name, String, null: false
    field :email, String, null: false
    field :votes, [VoteType], null: false
    field :links, [LinkType], null: false

    def links
      Loaders::AssociationLoader.for(User, :links).load(object)
    end
  end
end

Thanks to the loaders, the data is batched, and the users and their links are fetched with just two simple SQL queries:

Started POST "/graphql" for ::1 at 2021-06-16 22:40:17 +0200
   (0.1ms)  SELECT sqlite_version(*)
Processing by GraphqlController#execute as */*
  Parameters: {"query"=>"{\n  allLinks {\n    id\n    url\n    description\n    createdAt\n    postedBy {\n      id\n      name\n      links {\n        id\n      }\n    }\n  }\n}", "graphql"=>{"query"=>"{\n  allLinks {\n    id\n    url\n    description\n    createdAt\n    postedBy {\n      id\n      name\n      links {\n        id\n      }\n    }\n  }\n}"}}
  Link Load (0.4ms)  SELECT "links".* FROM "links"
  ↳ app/controllers/graphql_controller.rb:5:in `execute'
  User Load (0.9ms)  SELECT "users".* FROM "users" WHERE "users"."id" IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)  [["id", 2], ["id", 4], ["id", 6], ["id", 8], ["id", 10], ["id", 12], ["id", 14], ["id", 16], ["id", 18], ["id", 20], ["id", 22], ["id", 24], ["id", 26], ["id", 28], ["id", 30], ["id", 32], ["id", 34], ["id", 36], ["id", 38], ["id", 40]]
  ↳ app/graphql/loaders/record_loader.rb:12:in `perform'
  Link Load (0.5ms)  SELECT "links".* FROM "links" WHERE "links"."user_id" IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)  [["user_id", 2], ["user_id", 4], ["user_id", 6], ["user_id", 8], ["user_id", 10], ["user_id", 12], ["user_id", 14], ["user_id", 16], ["user_id", 18], ["user_id", 20], ["user_id", 22], ["user_id", 24], ["user_id", 26], ["user_id", 28], ["user_id", 30], ["user_id", 32], ["user_id", 34], ["user_id", 36], ["user_id", 38], ["user_id", 40]]
  ↳ app/graphql/loaders/association_loader.rb:46:in `preload_association'
Completed 200 OK in 62ms (Views: 1.3ms | ActiveRecord: 1.8ms | Allocations: 39887)

Other libraries solve this problem as well, for example:

https://github.com/exAspArk/batch-loader#basic-example

Complexity of queries

N+1 queries are not the only problem. In GraphQL, clients can freely request more fields and nest them as deeply as the schema allows, which can sometimes be too much for the server. How do we deal with it? We can limit the complexity of queries, but to do so we also need to assign a cost to fields. By default, every field costs 1. A different cost is set with the complexity: option, for example: field :links, [LinkType], null: false, complexity: 101. For the limit to actually work, we also need to set a maximum complexity in the schema:

class GraphqlTutorialSchema < GraphQL::Schema
  query Types::QueryType
  mutation Types::MutationType
  use GraphQL::Batch
  max_complexity 100
  ...
end
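A simplified model helps to see how the numbers add up. The sketch below is an assumption-based approximation, not graphql-ruby's analyzer: each field scores its own cost plus the scores of its selected children, costs are keyed by field name only, and the selection is a hand-written Hash rather than a parsed query. Under that model, the allLinks query from the beginning of the article scores 10, but with complexity: 101 on links it scores 110 and is rejected by a limit of 100. Since that one field already costs more than the limit, any query selecting links would be rejected.

```ruby
# Simplified static complexity check. Assumption: a field scores its own cost
# (1 unless overridden) plus the scores of its selected children, roughly how
# graphql-ruby scores plain, non-connection fields. Costs are keyed by field
# name only, which is a simplification.
DEFAULT_COST = 1

# A selection maps field names to a nested selection, or nil for scalar fields.
def complexity(selection, costs)
  selection.sum do |field, children|
    costs.fetch(field, DEFAULT_COST) + (children ? complexity(children, costs) : 0)
  end
end

def check_complexity!(selection, costs, max_complexity)
  score = complexity(selection, costs)
  if score > max_complexity
    raise ArgumentError, "complexity #{score} exceeds the limit of #{max_complexity}"
  end

  score
end

# The allLinks query from the beginning of the article.
ALL_LINKS_QUERY = {
  "allLinks" => {
    "id" => nil, "url" => nil, "description" => nil, "createdAt" => nil,
    "postedBy" => { "id" => nil, "name" => nil, "links" => { "id" => nil } }
  }
}.freeze

puts check_complexity!(ALL_LINKS_QUERY, {}, 100) # 10

begin
  check_complexity!(ALL_LINKS_QUERY, { "links" => 101 }, 100)
rescue ArgumentError => e
  puts e.message # complexity 110 exceeds the limit of 100
end
```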

Tracing

GraphQL processes queries differently, so tracing them is not as simple as profiling a typical Rails request. Unfortunately, rack-mini-profiler or a regular SQL log will not tell us everything, nor will they show which part of the query is responsible for a given slice of time. For GraphQL-Ruby, we can use the commercial solutions listed at https://graphql-ruby.org/queries/tracing, or prepare our own tracer. The snippet below shows a simple local tracer.

# lib/my_custom_tracer.rb
class MyCustomTracer < GraphQL::Tracing::PlatformTracing
  self.platform_keys = {
    'lex' => 'graphql.lex',
    'parse' => 'graphql.parse',
    'validate' => 'graphql.validate',
    'analyze_query' => 'graphql.analyze_query',
    'analyze_multiplex' => 'graphql.analyze_multiplex',
    'execute_multiplex' => 'graphql.execute_multiplex',
    'execute_query' => 'graphql.execute_query',
    'execute_query_lazy' => 'graphql.execute_query_lazy'
  }

  def platform_trace(platform_key, key, _data, &block)
    start = ::Process.clock_gettime ::Process::CLOCK_MONOTONIC
    result = block.call
    duration = ::Process.clock_gettime(::Process::CLOCK_MONOTONIC) - start
    observe(platform_key, key, duration)
    result
  end

  def platform_field_key(type, field)
    "graphql.#{type.graphql_name}.#{field.graphql_name}"
  end

  def platform_authorized_key(type)
    "graphql.authorized.#{type.graphql_name}"
  end

  def platform_resolve_type_key(type)
    "graphql.resolve_type.#{type.graphql_name}"
  end

  def observe(platform_key, key, duration)
    return if key == 'authorized'

    # String#yellow is not core Ruby; append .yellow only if a gem such as colorize is installed.
    puts "platform_key: #{platform_key}, key: #{key}, duration: #{(duration * 1000).round(5)} ms"
  end
end
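The essence of platform_trace is timing a block with the monotonic clock. As a hedged sketch, the same pattern can be tried outside GraphQL with a small hypothetical PhaseTimer class (not part of graphql-ruby) that accumulates durations per key and prints them slowest first. CLOCK_MONOTONIC is used because, unlike wall-clock time, it is not affected by system clock adjustments.

```ruby
# PhaseTimer is a hypothetical helper, not part of graphql-ruby. It reuses the
# pattern from platform_trace: time a block with the monotonic clock.
class PhaseTimer
  attr_reader :durations

  def initialize
    @durations = Hash.new(0.0) # key => accumulated seconds
  end

  # Returns the block's result, like platform_trace does.
  def trace(key)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    # Runs even when the block raises, so failing phases are still recorded.
    @durations[key] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end

  # One line per key, slowest first, in milliseconds.
  def report
    @durations.sort_by { |_key, secs| -secs }.map do |key, secs|
      format("%-10s %8.3f ms", key, secs * 1000)
    end
  end
end

timer = PhaseTimer.new
tokens = timer.trace("lex") { "{ allLinks { id } }".split }
timer.trace("execute") { sleep(0.01) }
puts timer.report
```

Accumulating per key, rather than overwriting, matters for phases such as field resolution that run many times within one request.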

Installation is also very simple: register the tracer in the schema with tracer(MyCustomTracer.new), as in the example below:

# graphql-ruby/app/graphql/graphql_tutorial_schema.rb
class GraphqlTutorialSchema < GraphQL::Schema
  query Types::QueryType
  mutation Types::MutationType
  use GraphQL::Batch
  tracer(MyCustomTracer.new)
  ...
end

The output from such tracing looks like this:

Started POST "/graphql" for ::1 at 2021-06-17 22:02:44 +0200
   (0.1ms)  SELECT sqlite_version(*)
Processing by GraphqlController#execute as */*
  Parameters: {"query"=>"{\n  allLinks {\n    id\n    url\n    description\n    createdAt\n    postedBy {\n      id\n      name\n      links {\n        id\n      }\n    }\n  }\n}", "graphql"=>{"query"=>"{\n  allLinks {\n    id\n    url\n    description\n    createdAt\n    postedBy {\n      id\n      name\n      links {\n        id\n      }\n    }\n  }\n}"}}
platform_key: graphql.lex, key: lex, duration: 0.156 ms
platform_key: graphql.parse, key: parse, duration: 0.108 ms
platform_key: graphql.validate, key: validate, duration: 0.537 ms
platform_key: graphql.analyze_query, key: analyze_query, duration: 0.123 ms
platform_key: graphql.analyze_multiplex, key: analyze_multiplex, duration: 0.159 ms
  Link Load (0.4ms)  SELECT "links".* FROM "links"
  ↳ app/graphql/graphql_tutorial_schema.rb:21:in `platform_trace'
platform_key: graphql.execute_query, key: execute_query, duration: 15.562 ms
  ↳ app/graphql/loaders/record_loader.rb:12:in `perform'
  ↳ app/graphql/loaders/association_loader.rb:46:in `preload_association'
platform_key: graphql.execute_query_lazy, key: execute_query_lazy, duration: 14.12 ms
platform_key: graphql.execute_multiplex, key: execute_multiplex, duration: 31.11 ms
Completed 200 OK in 48ms (Views: 1.2ms | ActiveRecord: 2.0ms | Allocations: 40128)

Summary

GraphQL is no longer a new technology, but solutions to its problems are not fully standardized unless they are built into the library itself. Adopting it in a project opens up many opportunities for working with the frontend, and I personally consider it a step up in quality compared to what a REST API offers.
