Introducing Speakeasy Suggest - Automatic OpenAPI Spec Maintenance

Introducing Speakeasy Suggest - Automatic OpenAPI Spec Maintenance

How we built an OpenAPI LLM agent.

Speakeasy Suggest, our AI-powered tool for automatically improving OpenAPI specs, is available now through the Speakeasy CLI or as a Github workflow. Provide your own OpenAI API key, or simply use ours by default. We’d love to hear your feedback in our Slack community!

Stay tuned for more posts on Suggest output as Github PRs when configuring it as a workflow, interesting results on a variety of specs, and its downstream effects on Speakeasy-generated SDKs!

✨ We plan on open sourcing the core of our LLM-based agent for general-purpose use. Its primary function will be to serve as a JSON document transformer that receives a custom validation function, list of errors, and original document as inputs.

Background

At Speakeasy, we’ve been automating the creation of high-quality developer surfaces for API companies around the world. These include SDKs across multiple languages and even Terraform providers. Today, our generation logic is built on top of OpenAPI, requiring users to supply an OpenAPI schema to effectively use our tooling and generate these surfaces.

Unfortunately, OpenAPI is a complicated specification with lots of potential for misconfiguration and unnecessary repetition. For example, the following schema operations contain duplicate operationIDs (i.e., listDrinks), duplicate inline schema objects, and no examples.

...
/drinks:
    get:
      operationId: listDrinks
      summary: Get a list of drinks.
      description: Get a list of drinks, if authenticated this will include stock levels and product codes otherwise it will only include public information.
      security:
        - {}
      tags:
        - drinks
      parameters:
        - name: drinkType
          in: query
          description: The type of drink to filter by. If not provided all drinks will be returned.
          required: false
          schema:
            $ref: "#/components/schemas/DrinkType"
      responses:
        "200":
          description: A list of drinks.
          content:
            application/json:
              schema:
                type: array
                items:
                  type: object
                  properties:
                    name:
                      description: The name of the drink.
                      type: string
                    type:
                      $ref: "#/components/schemas/DrinkType"
                    price:
                      description: The price of one unit of the drink in US cents.
                      type: number
                    stock:
                      description: The number of units of the drink in stock, only available when authenticated.
                      type: integer
                      readOnly: true
                    productCode:
                      description: The product code of the drink, only available when authenticated.
                      type: string
                  required:
                    - name
                    - price
...
/drink/{name}:
  get:
    operationId: listDrinks
    summary: Get a drink.
    description: Get a drink by name, if authenticated this will include stock levels and product codes otherwise it will only include public information.
    tags:
      - drinks
    parameters:
      - name: name
        in: path
        required: true
        schema:
          type: string
    responses:
      "200":
        description: A drink.
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  description: The name of the drink.
                  type: string
                type:
                  $ref: "#/components/schemas/DrinkType"
                price:
                  description: The price of one unit of the drink in US cents.
                  type: number
                stock:
                  description: The number of units of the drink in stock, only available when authenticated.
                  type: integer
                  readOnly: true
                productCode:
                  description: The product code of the drink, only available when authenticated.
                  type: string
              required:
                - name
                - price
...

As a result of this complexity, many companies create a spec to benefit from the vast ecosystem of OpenAPI tooling (e.g. docs providers), but they don’t have resources or processes in place to maintain it as a first-class resource.

At Speakeasy, we’ve worked with many users’ specs to ensure they are compliant, clean, and accurately depict the state of their API. By leveraging our OpenAPI expertise and custom spec extensions, we’ve been able to create specs that produce idiomatic and human-readable SDKs for dozens of companies. But wouldn’t it be great if we could automate this? Enter Speakeasy Suggest.

Building Speakeasy Suggest

What is it?

Speakeasy Suggest is our LLM-based agent that, given an OpenAPI document, automatically suggests fixes, applies them, and outputs the modified spec. Using AI, we’re able to offload the burden of spec management from API producers. Suggest can be invoked through both the Speakeasy CLI (outputs modified spec to the local filesystem) and our GitHub Action (creates PR) today.

Applying Suggest on the invalid spec above, we produce a valid document with unique operationIDs, re-use of common objects where applicable, and better examples.

components:
  schemas:
    ...
    Drink:
      type: object
      properties:
        name:
          description: The name of the drink.
          type: string
        type:
          $ref: "#/components/schemas/DrinkType"
        price:
          description: The price of one unit of the drink in US cents.
          type: number
        stock:
          description: The number of units of the drink in stock, only available when authenticated.
          type: integer
          readOnly: true
        productCode:
          description: The product code of the drink, only available when authenticated.
          type: string
      required:
        - name
        - price
      example:
        name: Old Fashioned
        type: cocktail
        price: 1000
        stock: 50
        productCode: EX123
...
/drinks:
    get:
      operationId: listDrinks
      summary: Get a list of drinks.
      description: Get a list of drinks, if authenticated this will include stock levels and product codes otherwise it will only include public information.
      security:
        - {}
      tags:
        - drinks
      parameters:
        - name: drinkType
          in: query
          description: The type of drink to filter by. If not provided all drinks will be returned.
          required: false
          schema:
            $ref: "#/components/schemas/DrinkType"
      responses:
        "200":
          description: A list of drinks.
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: "#/components/schemas/Drink"
        ...
/drink/{name}:
  get:
    operationId: getDrink
    summary: Get a drink.
    description: Get a drink by name, if authenticated this will include stock levels and product codes otherwise it will only include public information.
    tags:
      - drinks
    parameters:
      - name: name
        in: path
        required: true
        schema:
          type: string
        example: CocaCola
    responses:
      "200":
        description: A drink.
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/Drink"
      ...
...

For the remainder of this post, we’d like to delve deeper into the decision-making and engineering challenges behind Speakeasy Suggest. To do that, we’ll start from the beginning.

Good Input = Good Output

Just as a polished OpenAPI spec may be used to generate high-quality SDKs, an LLM requires input with sufficient specificity and detail to produce useful suggestions for a document. It became quickly obvious to us that dumping a spec and naively asking an LLM to “make it better” would not be fruitful.

We found that our in-house validator (i.e. speakeasy validate from our CLI) would be effective at adding a more specific, deterministic context to our LLM prompt. Our validator is built using vacuum, enabling us to specify custom rules. Since these rules are defined by us, they output fixed error messages and line numbers for an array of spec issues. Below is a table highlighting a few of the errors, the suggestions we’d hope to receive from the LLM when including them in the prompt, and explanations of the resulting effects on Speakeasy-generated SDKs when applying those suggestions to the OpenAPI spec.

Validation ErrorSuggestionExplanation
duplicate operationId fields within same tagChange value of offending operationId'sCompiles without method names conflicting to get a functional SDK
duplicate inline object schemasMove to shared componentMuch shorter and more human-looking class names
missing examplesAdd examples for parameter, request bodies, responses, etc.Generated usage snippets are more human-looking

Each of the validation errors above is present in our invalid spec. So let’s examine the generated Go SDK code before and after running Speakeasy Suggest on the document.

Before

 // ListDrinks - Get a drink.
// Get a drink by name, if authenticated this will include stock levels and product codes otherwise it will only include public information.
func (s *drinks) ListDrinks(ctx context.Context, request operations.ListDrinksRequest) (*operations.ListDrinksResponse, error) {
  ...
  switch {
    case httpRes.StatusCode == 200:
      switch {
      case utils.MatchContentType(contentType, `application/json`):
        var out *operations.ListDrinks200ApplicationJSON
        if err := utils.UnmarshalJsonFromResponseBody(bytes.NewBuffer(rawBody), &out); err != nil {
          return nil, err
        }

        res.ListDrinks200ApplicationJSONObject = out
      ...
      }
    ...
  }
  return res, nil
}

// ListDrinks - Get a list of drinks.
// Get a list of drinks, if authenticated this will include stock levels and product codes otherwise it will only include public information.
func (s *drinks) ListDrinks(ctx context.Context, request operations.ListDrinksRequest) (*operations.ListDrinksResponse, error) {
  ...
  switch {
    case httpRes.StatusCode == 200:
      switch {
      case utils.MatchContentType(contentType, `application/json`):
        var out []operations.ListDrinks200ApplicationJSON
        if err := utils.UnmarshalJsonFromResponseBody(bytes.NewBuffer(rawBody), &out); err != nil {
          return nil, err
        }

        res.ListDrinks200ApplicationJSONObjects = out
      ...
      }
    ...
  }
  return res, nil
}

After

// GetDrink - Get a drink.
// Get a drink by name, if authenticated this will include stock levels and product codes otherwise it will only include public information.
func (s *drinks) GetDrink(ctx context.Context, request operations.GetDrinkRequest) (*operations.GetDrinkResponse, error) {
  ...
  switch {
    case httpRes.StatusCode == 200:
      switch {
      case utils.MatchContentType(contentType, `application/json`):
        var out *shared.Drink
        if err := utils.UnmarshalJsonFromResponseBody(bytes.NewBuffer(rawBody), &out); err != nil {
          return nil, err
        }

        res.Drink = out
      ...
      }
    ...
  }
  return res, nil
}

// ListDrinks - Get a list of drinks.
// Get a list of drinks, if authenticated this will include stock levels and product codes otherwise it will only include public information.
func (s *drinks) ListDrinks(ctx context.Context, request operations.ListDrinksRequest) (*operations.ListDrinksResponse, error) {
  ...
  switch {
    case httpRes.StatusCode == 200:
      switch {
      case utils.MatchContentType(contentType, `application/json`):
        var out []shared.Drink
        if err := utils.UnmarshalJsonFromResponseBody(bytes.NewBuffer(rawBody), &out); err != nil {
          return nil, err
        }

        res.Drinks = out
      ...
      }
    ...
  }
  return res, nil
}

The code in “Before” would not even compile due to duplicate method names from the conflicting operationIDs. As such, we wouldn’t even generate the SDK without throwing a validation error but have done so here simply to show the differences in output. Also, notice the stark difference in the generated class names the successful responses are being deserialized to (i.e. ListDrinks200ApplicationJSON vs. Drink).

Let’s also examine the differences in the README’s usage snippets your users will reference in their implementation.

Before

...
func main() {
  s := sdk.New(
      sdk.WithSecurity(shared.Security{
          APIKey: "",
      }),
  )

  ctx := context.Background()
  res, err := s.Drinks.ListDrink(ctx, operations.ListDrinkRequest{
      Name: "Willie Gulgowski DVM",
  })
  if err != nil {
      log.Fatal(err)
  }

  if res.ListDrink200ApplicationJSONObject != nil {
      // handle response
  }
}
...

After

...
func main() {
  s := sdk.New(
      sdk.WithSecurity(shared.Security{
          APIKey: "",
      }),
  )

  ctx := context.Background()
  res, err := s.Drinks.GetDrink(ctx, operations.GetDrinkRequest{
      Name: "CocaCola",
  })
  if err != nil {
      log.Fatal(err)
  }

  if res.Drink != nil {
      // handle response
  }
}
...

In addition to better method names and more human-readable classnames, the example value for the drink name parameter is much more relevant. This makes the generated SDK easier for end-users to consume and integrate into their applications.

The validation errors we’ve applied suggestions for are just a small sample of the improvements we can make to an OpenAPI spec. By including them in the prompt, we’ve enabled a feedback loop where even non-critical errors that don’t block SDK generation are encouraged to be added if addressing them improves the quality of the SDK output.

Let’s delve into how we receive these suggestions from the LLM in a format that allows us to apply them to the spec.

Structured Output Parser

In our initial implementation of Suggest, we were only focused on reporting suggestions instead of applying them. Unfortunately, this requires a human in the loop to copy-paste the suggestions themselves. To avoid potential errors and automate the spec management, we needed to be involved in the process of applying the suggestions to our input document as well. This would enable us to revalidate the document upon each applied fix and ensure that overall document correctness was being improved.

Modifying an OpenAPI schema requires us to receive structured output from the LLM that we can apply to the document. We decided to use JSON patch for the format of the suggestion itself. Using langchain-js, this involved defining a custom agent with an output parser specifying the zod schema we pass in to ensure we get LLM output in the right format.

// zod schema
export const SUGGESTION_SCHEMA = z.object({
  suggested_fix: z.string().describe(
    `a simple English description of the suggested JSON patch.
      For example, the JSON patch [{"op": "replace", "path": "/paths/~1pet/post/operationId", "value": "addPet"}] could be described as "Replace the operationId of the POST /pet endpoint with addPet`
  ),
  json_patch: z.string().describe(`suggested fix in the form of a JSON Patch`),
  reasoning: z
    .string()
    .describe(
      "a brief explanation of why you think the suggested fix will resolve the error"
    ),
});

...

// output parser which verifies LLM's final response is formatted according to the zod schema
class JsonOutputParser<T> extends AgentActionOutputParser {
  lc_namespace = ["langchain", "agents", "custom_llm_agent"];
  schema: z.ZodType<T>;

  constructor(schema: z.ZodType<T>) {
    super();

    this.schema = schema;
  }

  async parse(text: string): Promise<AgentAction | AgentFinish> {
    console.log("\n" + text);

    const answerMatch = /```json(.*)```/gms.exec(text);
    if (answerMatch) {
      try {
        const json = JSON.parse(answerMatch[1].trim());

        // This will throw an error if the json does not match the schema.
        this.schema.parse(json);

        return {
          log: "Got answer",
          returnValues: json,
        };
      } catch (e) {
        console.log(
          "Answer is invalid JSON or does not match schema. Trying again..."
        );
        return {
          tool: "logger",
          toolInput: `Error: ${e}`,
          log: `Final answer is invalid. It does not conform to the requested output schema. Output schema: ${JSON.stringify(
            zodToJsonSchema(this.schema)
          )}`,
        };
      }
    }
    ...
  }
  ...
}

This ensured responses were in a form that was easy for us to extract the JSON patch from. However, we soon found that our agent would hallucinate invalid JSON patches that referenced keys that didn’t exist in the OpenAPI document. This required augmenting our agent with another tool to verify the validity of these patches by checking if we could apply them.

export class JsonPatchValidator extends DynamicTool {
  static description = `Use this to verify that a given JSON patch is valid.
Always use this tool to check the validity of your final suggestion.
If your suggestion is invalid, use the error message to try to fix it.
Do not return anything this tool says is invalid as your answer.
This tool will return "valid" if the patch is valid, or an error message if it is not.

Example input to this tool:
  [{{ "op": "replace", "path": "/baz", "value": "boo" }}]

Example output: "valid", "error: path /baz does not exist"`;

  constructor(spec: JSON) {
    super({
      name: "json_patch_valid",
      func: async (input) => {
        input = removeQuotes(input);

        try {
          apply_patch(spec, input);
          return "valid";
        } catch (e: any) {
          return `error: ${e}`;
        }
      },
      description: JsonPatchValidator.description,
    });
  }
}

💡For a yaml spec, Speakeasy Suggest takes extra caution by converting it to JSON before applying the patch and converting it back to yaml while preserving the original key ordering.

We’ll discuss several of the tools we equipped our agent within the next section, where we detail the decision-making process of moving forward with a custom agent.

Benchmarking Different Approaches

There were two main approaches we tested — Custom Agent vs. Retrieval Augmented Generation (RAG). In order to not exceed GPT-4’s token limits, we provide a small range of lines surrounding the error from the spec to the LLM prompt. As a result, both approaches need to be capable of enabling the LLM to gather information from the full spec in order to return valid JSON patches. The primary differentiator is how this is achieved:

  • An agent utilizes tools to enable the LLM to execute custom logic to traverse the spec

  • In RAG, we use external data to augment our prompts. In our case, this involves the standard method of using a vector store to persist embeddings of the spec, which the LLM can quickly index when trying to answer the prompt.

Agent with Custom Tools

Directly from the langchain docs: “In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.” Each action an agent can take corresponds to one of the tools defined by us that executes a custom function. Since we’re using GPT-4 as our LLM, this takes advantage of OpenAI’s function calling. Our tools take the OpenAPI spec as input and are capable of performing a range of operations on it.

Searching for line numbers that contain a particular string

The LLM may need to locate regions of the spec that are not in close proximity to the validation error but contain a specific key.

WARN validation warn: [line 63] oas3-unused-component - `#/components/schemas/Pets` is potentially unused or has been orphaned

Using model gpt-4-0613
Running agent...

The error message indicates that the schema 'Pets' is potentially unused or has been orphaned. This means that it is not referenced anywhere in the document. To confirm this, I will use the document_text_searcher tool to search for any references to 'Pets' in the document. If there are no references, the schema 'Pets' can be removed from the document. If there are references, they need to be replaced with the correct schema.
Action: document_text_searcher
Action Input: 'Pets'

From the LLM logs, we can see that the document_text_searcher tool searches the document for Pets to see if any usage of this component exists across the schema. The wonderful thing about agents is that the result of this action can be fed as input to other tools before returning a final answer (see get_line_range below).

Retrieving specific line ranges

This is helpful since our validation errors - and the output of other tools - report the offending line number. The LLM uses this to retrieve the region of the spec it needs more information from or where most fixes are likely to be applied.

Following the output of document_text_searcher above, the LLM receives appropriate line numbers to conclude its search and validate the final answer.

The 'Pets' schema is referenced on lines 5 and 99. Line 99 is the definition of the 'Pets' schema itself, so it doesn't count as a usage. I need to check line 5 to see how 'Pets' is used. I will use the get_line_range tool to retrieve lines 1-20 for context.
Action: get_line_range
Action Input: '1-20'

The 'Pets' schema is not used in lines 1-20. It seems that the 'Pets' schema is indeed unused. I will suggest a JSON patch to remove the 'Pets' schema from the document. I will also validate the JSON patch using the json_patch_valid tool.
Action: json_patch_valid
Action Input: [{"op": "remove", "path": "/components/schemas/Pets"}]

The JSON patch is valid. I will now provide the final answer.
Final Answer:
{
  "suggested_fix": "Remove the unused 'Pets' schema from the components.",
  "json_patch": "[{\"op\": \"remove\", \"path\": \"/components/schemas/Pets\"}]",
  "reasoning": "The 'Pets' schema is not referenced anywhere in the document, making it unused and can be safely removed."
}

Resolving $ref definitions

Many OpenAPI specs contain $ref keys that point to definitions living in different parts of the document. The LLM should be able to locate the original definition from this when it needs to use it to suggest a fix. This is useful when we want to suggest an example for a component in the OpenAPI spec.

INFO    validation hint: [line 104] missing-examples - Missing example for component. Consider adding an example

Using model gpt-4-0613
Running agent...

The error message indicates that there is a missing example for a component. Looking at the surrounding context, it seems that the error is on line 162, which corresponds to the "Pets" schema. This schema is an array of "Pet" objects. To fix this error, we need to add an example for the "Pets" schema. The example should be an array of "Pet" objects. We can use the "Pet" schema to create a valid example.

Action: ref_resolver
Action Input: #/components/schemas/Pet

The "Pet" schema requires an "id" and a "name", both of which are strings. An example "Pet" could be {"id": 1, "name": "Fido", "tag": "dog"}. We can use this to create an example for the "Pets" schema, which is an array of "Pet" objects. The JSON patch to add this example would be [{"op": "add", "path": "/components/schemas/Pets/example", "value": [{"id": 1, "name": "Fido", "tag": "dog"}]}]. Let's validate this patch.

Action: json_patch_valid
Action Input: [{"op": "add", "path": "/components/schemas/Pets/example", "value": [{"id": 1, "name": "Fido", "tag": "dog"}]}]

The JSON patch is valid. Therefore, we can suggest this patch as a fix for the error. The patch adds an example to the "Pets" schema, which is an array of "Pet" objects. The example is a valid "Pet" object, as it includes the required "id" and "name" properties.

Final Answer:
{
  "suggested_fix": "Add an example to the 'Pets' schema. The example should be an array of 'Pet' objects. An example 'Pet' object could be {'id': 1, 'name': 'Fido', 'tag': 'dog'}.",
  "json_patch": "[{\"op\": \"add\", \"path\": \"/components/schemas/Pets/example\", \"value\": [{\"id\": 1, \"name\": \"Fido\", \"tag\": \"dog\"}]}]",
  "reasoning": "The error message indicates that there is a missing example for a component. The 'Pets' schema is an array of 'Pet' objects, so the example should also be an array of 'Pet' objects. The 'Pet' schema requires an 'id' and a 'name', so the example 'Pet' object includes these properties."
}

By equipping the agent with the necessary toolkit to explore the spec JSON, we are able to traverse the document without exceeding the TPM (tokens per minute) limit. However, since each initial query to the agent can chain through a sequence of actions before returning a final answer, we wanted to assess another approach that would limit the number of OpenAI calls.

Retrieval Augmented Generation (OpenAPI Spec Embeddings in Chroma)

We use embeddings to store the spec for quick similarity searches without requiring the full contents to be passed into the prompt, enabling us to respect the token limit. Specifically, we used RetrievalQAChain with the OpenAPI spec embedded in Chroma.

Results

We compared the execution time, number of OpenAI calls, and the accuracy of both approaches above. Although RAG seemed to perform fine on most simple errors, complex tasks such as moving duplicate inline schemas to shared components seemed impossible. At first, the returned JSON patch for such a task seems reasonable.

INFO {"error":"validation hint: [line 22] duplicate-schemas - Identical inline object schemas found: [line 22] - [line 31] conflicts with [line 107] - [line 116]. Consider moving to a shared component"}

Asking for a Suggestion!

[{"op": "add", "path": "/components/schemas/Error", "value": {"type": "object", "required": ["code", "message"], "properties": {"code": {"type": "integer", "format": "int32"}, "message": {"type": "string"}}}}, {"op": "replace", "path": "/paths/~1pets/get/parameters/0/schema", "value": {"$ref": "#/components/schemas/Error"}}, {"op": "replace", "path": "/components/schemas/Error", "value": {"$ref": "#/components/schemas/Error"}}]

But upon further review, we found it added the shared component and then swapped out its own schema definition with the $ref, thinking it was one of the duplicate schemas itself despite having a clear line number to check against. Since RAG can’t gather more context via function calls, it would use a similarity search to suggest fixes for same-name fields on the wrong lines. Due to this behavior, this approach wasn’t sufficient for returning JSON patches with the correct paths.

The agent fared much better on all validation errors ranging across a broad span of complexity. Despite the fact that it took 2-3x as long to execute, the agent’s ability to “reason” through a sequence of actions via the execution of custom functions proved to be critical for good suggestion quality. The ability to equip the agent with any new tool we could think of makes it more flexible and future-proof for our use-case than embedding the OpenAPI schema. This is the most important factor in ensuring Suggest is a useful product, as OpenAPI specs vary greatly in shape, purpose, and functionality, much like the tools and people who produce them.

Conclusion

Even after settling on the agent approach, suggestion quality, rate limits, and execution time were all variables that still needed to be addressed. Our team has been tackling these roadblocks by implementing parallelization, passing in additional guidance to the prompt for specific validation errors, and benchmarking various LLMs post fine-tuning.

Want to try it for yourself? Head over to the Speakeasy Suggest docs, or just copy paste this workflow file into your own github repo to get started!