Treediff

By treediff | Updated 2 months ago | Text Analysis

Diff and Merge Text (and other data)

This API combines simple usage with powerful options to create comparisons that are human-readable or that facilitate machine post-processing.

This tutorial walks through all API calls and options using examples:

  • Compare 2 texts
  • Adjust the comparison for your use-case
  • 3-way diffs
  • 3-way merge
  • Compare 2 arbitrary sequences
  • 3-way diff and merge for sequences
  • Treediff and other future plans

Getting started: Compare 2 texts.

Here is a quick browser-based setup for our walk-through.

<!DOCTYPE html>
<html>
  <head>
    <script>
      function compare() {
        // API endpoint (fetch needs an absolute URL, so include the scheme)
        const host = 'treediff.p.rapidapi.com'
        const url = `https://${host}/diff`

        // Actual data
        const body = JSON.stringify({
          head: 'Current Text',
          compare: 'New Text',
        })

        // Headers and authorization
        const headers = {
          'content-type': 'application/json',
          'x-rapidapi-host': host,
          'x-rapidapi-key': '<YOUR_API_KEY>',
        }

        // API call, and logging the response to the page.
        fetch(url, { method: 'POST', headers, body })
          .then((response) => response.json())
          .then(function render(data) {
            document.getElementById('result').textContent = JSON.stringify(
              data,
              null,
              2
            )
          })
          .catch((error) => {
            console.error(error)
          })
      }
    </script>
  </head>
  <body onload="compare()">
    <pre style="white-space: pre-wrap"><code id="result"></code></pre>
  </body>
</html>

When we issue the above request, the API returns the following response.
It is optimized for human readability:

[
  {
    "value": "Current",
    "type": "delete"
  },
  {
    "value": "New",
    "type": "insert"
  },
  {
    "value": " Text",
    "type": "equal"
  }
]
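The segments in this response can be folded back into both input texts, which is a handy sanity check when post-processing. The following is a minimal sketch, assuming every segment has the value/type shape shown above; reconstruct is a hypothetical helper and not part of the API:

```javascript
// Hypothetical helper: rebuilds both inputs from a 2-way diff response.
// Assumes each segment looks like
// { value: string, type: 'delete' | 'insert' | 'equal' }.
function reconstruct(diff) {
  let head = ''
  let compare = ''
  for (const { value, type } of diff) {
    // 'delete' segments exist only in head, 'insert' segments only in
    // compare, and 'equal' segments exist in both.
    if (type !== 'insert') head += value
    if (type !== 'delete') compare += value
  }
  return { head, compare }
}

const sample = [
  { value: 'Current', type: 'delete' },
  { value: 'New', type: 'insert' },
  { value: ' Text', type: 'equal' },
]

console.log(reconstruct(sample))
// { head: 'Current Text', compare: 'New Text' }
```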

If you’ve just been looking for a text comparison algorithm with human-readable output, you’re all set at this point!

Adjust the comparison

You can adjust the comparison by providing an options object in the request body.

This example request body lists all options and their defaults:

{
  "head": "Quicq skuirrel slow fox",
  "compare": "Quick squirrel quick fire",
  "options": {
    "diff_cleanup": "semantic",
    "diff_edit_cost": 4,
    "diff_timeout": 1
  }
}

The options are:

diff_cleanup

diff_cleanup is the most important option. It selects between semantic, efficient, and none.
Default: semantic.

Staying with the same example, using the notation d̶e̶l̶e̶t̶e̶, i̲n̲s̲e̲r̲t̲, equal:

  • semantic produces a human-readable comparison: Quicq̶k̲ sk̶q̲uirrel s̶l̶o̶w̶ ̶f̶o̶x̶q̲u̲i̲c̲k̲ ̲f̲i̲r̲e̲
  • efficient optimizes the comparison’s compute impact for machine processing: Quicq̶ ̶s̶k̶k̲ ̲s̲q̲uirrel s̶l̶o̶w̶ ̶f̶o̶x̶q̲u̲i̲c̲k̲ ̲f̲i̲r̲e̲
    (for details, see ‘diff_edit_cost’)
  • none doesn’t apply any postprocessing: Quicq̶k̲ sk̶q̲uirrel s̶l̶o̶w̶fo̶x̶i̲r̲e̲

diff_edit_cost

The diff_edit_cost option controls how diff_cleanup = 'efficient' works and only affects that mode.

diff_edit_cost sets a cost for creating a new edit as compared to adding extra characters into an existing edit.
For example: A value of 4 means we will prefer to expand the length of an existing edit by three characters if that eliminates another edit.

In other words: Using diff_cleanup = 'efficient', we compute the smallest count of edits under the condition that we accept up to diff_edit_cost characters into a single edit in case that saves us other edits.

Tip: Try setting diff_edit_cost to 8 in the above example to receive fewer edits with more characters each!

diff_timeout

diff_timeout is available on the ULTRA and MEGA plans.
By default, the diff spends at most one second on each phase, which is more than enough for most use-cases.
This option lets you increase that limit to up to 6 seconds, which helps the diff find an optimal solution for very long and random inputs.
Diff results are always valid, even if a timeout occurs.
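The three options can be collected in a small request-body builder. This is a sketch only: buildDiffBody and its checks are hypothetical client-side conveniences based on the values described above (three cleanup modes, a timeout of up to 6 seconds), and the API itself may validate differently:

```javascript
// Hypothetical helper: builds a /diff request body with the documented
// defaults and rejects values the tutorial text does not describe.
const CLEANUP_MODES = ['semantic', 'efficient', 'none']

function buildDiffBody(head, compare, options = {}) {
  const {
    diff_cleanup = 'semantic',
    diff_edit_cost = 4,
    diff_timeout = 1,
  } = options

  if (!CLEANUP_MODES.includes(diff_cleanup)) {
    throw new RangeError(
      `diff_cleanup must be one of: ${CLEANUP_MODES.join(', ')}`
    )
  }
  // Assumption: 6 seconds is the upper bound stated for diff_timeout.
  if (typeof diff_timeout !== 'number' || diff_timeout > 6) {
    throw new RangeError('diff_timeout must be a number of at most 6')
  }

  return JSON.stringify({
    head,
    compare,
    options: { diff_cleanup, diff_edit_cost, diff_timeout },
  })
}

// Example: the "efficient" mode with a higher edit cost.
const body = buildDiffBody('Quicq skuirrel slow fox', 'Quick squirrel quick fire', {
  diff_cleanup: 'efficient',
  diff_edit_cost: 8,
})
console.log(body)
```

Note that diff_edit_cost is sent in every mode here, although it only has an effect when diff_cleanup is 'efficient'.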

Learn more

If you’d like to know more details on how the comparison algorithm works, here is a deep read on the algorithm and these options.

3-way diffs

Three-way diffs (“diff3”) are requested in the exact same way as the 2-way diffs described above. Just add a base parameter to the request body:

{
  "head": "The fox jumps over the lazy dog",
  "compare": "The quick brown fox jumps over the dog",
  "base": "The fox jumps over the dog",
  "options": {
    "diff_cleanup": "semantic"
  }
}

The meaning of the parameters is:

  • head: the current value
  • compare: the value to compare against
  • base: the common ancestor value that both head and compare have originated from.

Here is the result of a diff call with this body:

[
  {
    "content": {
      "base": "The ",
      "head": "The ",
      "compare": "The "
    }
  },
  {
    "content": {
      "base": "",
      "head": "",
      "compare": "quick brown "
    }
  },
  {
    "content": {
      "base": "fox jumps over the ",
      "head": "fox jumps over the ",
      "compare": "fox jumps over the "
    }
  },
  {
    "content": {
      "base": "",
      "head": "lazy ",
      "compare": ""
    }
  },
  {
    "content": {
      "base": "dog",
      "head": "dog",
      "compare": "dog"
    }
  }
]

The 2-way diff algorithm described above runs first, using the diff_* options, and then passes its results to the diff3 algorithm.

3-way diffs can have conflicts. If one of the objects in the response is a conflict, it will be labeled with conflict: true:

{
  "conflict": true,
  "content": {
    "base": "fix",
    "head": "fax",
    "compare": "fox"
  }
}
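For post-processing, it can help to label each chunk by the side that changed it. A minimal sketch, assuming text chunks shaped like the responses above; classifyChunk and its labels are hypothetical and not part of the API:

```javascript
// Hypothetical helper: labels a diff3 chunk by which side changed.
// Assumes { conflict?: true, content: { base, head, compare } } with
// string values (=== would not work for array-valued chunks).
function classifyChunk({ conflict, content }) {
  const { base, head, compare } = content
  if (conflict) return 'conflict'
  if (head === base && compare === base) return 'unchanged'
  if (compare === base) return 'head changed'
  if (head === base) return 'compare changed'
  // Reached only when both sides differ from base without a conflict flag.
  return 'both changed'
}

const chunks = [
  { content: { base: 'The ', head: 'The ', compare: 'The ' } },
  { content: { base: '', head: '', compare: 'quick brown ' } },
  { content: { base: '', head: 'lazy ', compare: '' } },
  { conflict: true, content: { base: 'fix', head: 'fax', compare: 'fox' } },
]

console.log(chunks.map(classifyChunk))
// [ 'unchanged', 'compare changed', 'head changed', 'conflict' ]
```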

A word on 3-way diffs

diff3 is a straightforward algorithm, but it depends on the two-way diffs passed to it and is therefore sensitive to changes in the diff options, as anyone who has dealt with surprising git merge conflicts can tell!

3-way merge

Most use-cases won’t require a 3-way diff in its raw form, except for troubleshooting.
Instead, most of the time we’re interested in the 3-way merge.

Starting from the previous diff3 call, just direct that same request to the /merge endpoint.
In our snippet from the top this is achieved by changing this line:

const url = `https://${host}/merge`
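Since the walk-through switches between the two endpoints, the fetch arguments can also be assembled by one small function. This is a sketch under assumptions: buildRequest is a hypothetical helper, and it assumes the HTTPS scheme and the same headers as the snippet at the top:

```javascript
// Hypothetical helper: assembles the fetch arguments for /diff or /merge.
const HOST = 'treediff.p.rapidapi.com'

function buildRequest(endpoint, payload, apiKey) {
  if (endpoint !== 'diff' && endpoint !== 'merge') {
    throw new RangeError(`unknown endpoint: ${endpoint}`)
  }
  return {
    url: `https://${HOST}/${endpoint}`,
    init: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-rapidapi-host': HOST,
        'x-rapidapi-key': apiKey,
      },
      body: JSON.stringify(payload),
    },
  }
}

const request = buildRequest(
  'merge',
  {
    head: 'The fox jumps over the lazy dog',
    compare: 'The quick brown fox jumps over the dog',
    base: 'The fox jumps over the dog',
  },
  '<YOUR_API_KEY>'
)
console.log(request.url)

// Sending it performs a network call, so it is left commented out here:
// fetch(request.url, request.init)
//   .then((response) => response.json())
//   .then(console.log)
```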

The 3-way merge result for the jumping fox example is then simply:

[{ "content": "The quick brown fox jumps over the lazy dog" }]

Like 3-way diffs, 3-way merges can have conflicts.
Let’s produce a conflict by using this request body:

{
  "head": "Our version",
  "compare": "Your version",
  "base": "Original version"
}

The response has the conflicted merge as follows:

[
  {
    "conflict": true,
    "content": { "base": "Original", "head": "Our", "compare": "Your" }
  },
  {
    "content": " version"
  }
]
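A conflicted merge still has to be turned into a single text somehow. One simple policy is to always prefer one side. The sketch below assumes the response shape shown above (a string in content for merged chunks, an object for conflicts); resolveMerge is a hypothetical helper and only one of several possible strategies:

```javascript
// Hypothetical helper: flattens a text /merge response into one string,
// taking the preferred side ('head', 'compare' or 'base') for conflicts.
function resolveMerge(chunks, prefer = 'head') {
  if (!['head', 'compare', 'base'].includes(prefer)) {
    throw new RangeError(`unknown side: ${prefer}`)
  }
  return chunks
    .map((chunk) => (chunk.conflict ? chunk.content[prefer] : chunk.content))
    .join('')
}

const merged = [
  {
    conflict: true,
    content: { base: 'Original', head: 'Our', compare: 'Your' },
  },
  { content: ' version' },
]

console.log(resolveMerge(merged, 'head')) // "Our version"
console.log(resolveMerge(merged, 'compare')) // "Your version"
```

Interactive tools would typically show the conflicting chunks to the user instead of picking a side silently.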

Tip: Try the same body with this added option to see how the conflict depends on the diff algorithm:

{ "options": { "diff_cleanup": "none" } }

You can see how the merge turns out “wrong” when less semantic meaning is maintained in the diff! (Post this to the /diff endpoint to see why the “iginal” went missing.)

Compare arbitrary sequences

Going back to our /diff endpoint:

const url = `https://${host}/diff`

Comparing lists of arbitrary values works just like comparing text, though without cleanup functions or options.

Item equality is assessed using the lodash/isEqual function.

An example request body:

{
  "head": [{ "id": 4 }, { "something": "else" }, 0],
  "compare": [{ "id": 4 }, { "something": "else" }, { "id": 7 }]
}
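Deep equality means that items are compared by structure, not by object reference, so the two { "id": 4 } items above count as equal. The following is a simplified stand-in to illustrate the idea for JSON values only; it is not lodash's implementation, and deepEqual is a hypothetical name:

```javascript
// Simplified stand-in for lodash/isEqual, limited to JSON values
// (null, booleans, numbers, strings, arrays and plain objects).
function deepEqual(a, b) {
  if (a === b) return true
  if (
    typeof a !== 'object' ||
    typeof b !== 'object' ||
    a === null ||
    b === null
  ) {
    return false
  }
  // An array never equals a plain object, even with matching keys.
  if (Array.isArray(a) !== Array.isArray(b)) return false
  const keysA = Object.keys(a)
  const keysB = Object.keys(b)
  if (keysA.length !== keysB.length) return false
  return keysA.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(b, key) && deepEqual(a[key], b[key])
  )
}

console.log({ id: 4 } === { id: 4 }) // false: two different object references
console.log(deepEqual({ id: 4 }, { id: 4 })) // true: same structure
console.log(deepEqual(0, { id: 7 })) // false
```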

3-way sequence comparison

Of course, 3-way comparing and merging work for arbitrary values as well. Using the merge endpoint:

const url = `https://${host}/merge`

Merging this example:

{
  "head": [{ "a": "b" }, { "add": "this" }, "a text", 1, 2],
  "compare": [{ "a": "b" }, "a text", 1, 2],
  "base": [{ "a": "b" }, "a text", 100]
}

leads to this result:

[
  {
    "content": [
      { "a": "b" },
      {
        "add": "this"
      },
      "a text",
      1,
      2
    ]
  }
]

Treediff and other future plans

Did you come here for our “treediff” name?

We internally run a three-way diff algorithm for deeply structured text (ordered trees).

The diff part implements concepts similar to ChangeDistiller (paper), and our own additional concepts to make it more robust for structured free text.
The merge part implements concepts related to 3DM (paper).

It will take a lot of work to generalize and expose this algorithm here. You can let us know that it’s worth it by sponsoring our MEGA plan and sending us your use-case, so we can check if we could support it.