Metrx Factory

By metrx21 | Updated 2 days ago | Sports

Plotting a scores probability matrix


Over the last few years, the media have declared Expected Goals (xG) to be one of the most significant metrics for evaluating match results and predicting upcoming games. The xG stat has proved to be a good indicator of how many goals a team is likely to score, and a benchmark for estimating the total number of goals and team handicaps.

In fact, it remains a fairly rough approximation because it is simply a sum of probabilities. Football (soccer) is a game shaped by a multitude of actions that happen by chance. Given an xG value of 1.5, one might assume that a team's chances of scoring one goal or two goals in a game are equal. This is usually not the case, because it depends on the underlying score distribution, which in turn is interdependent with the opponent's rating.
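To make this concrete, here is a minimal sketch that assumes goals follow a Poisson distribution with the xG value as its mean. This is a common textbook simplification and not necessarily the model behind the API. It also ignores the interdependency with the opponent mentioned above.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals, ASSUMING goals are Poisson-distributed with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

xg = 1.5  # the xG value used as an example in the text
p_one = poisson_pmf(1, xg)
p_two = poisson_pmf(2, xg)

print(f"P(exactly 1 goal)  = {p_one:.1%}")  # 33.5%
print(f"P(exactly 2 goals) = {p_two:.1%}")  # 25.1%
```

Even under this simple assumption, one goal is noticeably more likely than two, although 1.5 sits exactly halfway between them.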

Exact score probabilities provide insight into this estimate. They allow us to determine the range of results most likely to happen, as well as the chances that a certain number of goals will be scored.


Let’s demonstrate this with a match whose final result met the overall expectation: Paris SG’s 3-0 victory against Bordeaux in the 2021/22 French Ligue 1 season. To fetch score details for this match, we call the API’s Get Match Metrics endpoint, which is the central operation for match-based figures.

Step 1

We define the overall procedure for our use case and request the team we’re interested in. This is accomplished with a simple call to the Get Teams endpoint.


BASE_URL = 'https://metrx-factory.p.rapidapi.com/v1'
HEADERS = {
  'x-rapidapi-host': 'metrx-factory.p.rapidapi.com',
  'x-rapidapi-key': 'your-app-key'
}

START
   team = get_team('Paris S')
   IF team
     probs = get_probabilities(team['id'], DATE('20220313'))
     IF probs
       plot_results(probs)
     ENDIF
   ENDIF
END

FUNCTION get_team(nameLike)
  uri = HTTP.uri(BASE_URL, '/teams', '?nameLike=', nameLike)
  json = PARSER.json(HTTP.request('GET', uri, HEADERS))
  IF json['success']
    return json['result'][0]
  ELSE
    PRINTER.error('Failed to get team: ', json['error'])
    return
  ENDIF
END
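For readers who prefer something executable, the pseudocode above could be sketched in Python roughly as follows, using only the standard library. The base URL, header names, `nameLike` parameter and the `success` / `result` / `error` fields are taken from the pseudocode. The helper names (`team_url`, `first_team`) are hypothetical, and the extra check for an empty result list is an addition.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://metrx-factory.p.rapidapi.com/v1"
HEADERS = {
    "x-rapidapi-host": "metrx-factory.p.rapidapi.com",
    "x-rapidapi-key": "your-app-key",  # placeholder, replace with your own key
}

def team_url(name_like: str) -> str:
    """Build the Get Teams URL with a properly encoded query string."""
    query = urllib.parse.urlencode({"nameLike": name_like})
    return f"{BASE_URL}/teams?{query}"

def first_team(payload: dict):
    """Return the first matching team, or None on failure or empty result."""
    if not payload.get("success"):
        print("Failed to get team:", payload.get("error"))
        return None
    results = payload.get("result") or []
    return results[0] if results else None

def get_team(name_like: str):
    """Perform the HTTP call (needs network access and a valid key)."""
    request = urllib.request.Request(team_url(name_like), headers=HEADERS)
    with urllib.request.urlopen(request) as response:
        return first_team(json.load(response))
```

Keeping URL building and response handling in separate functions makes both easy to check without calling the API.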

Step 2

We query result probabilities using the SP metric projection. A projection is a content filter that controls which data the endpoint returns.

As we only know the date on which the match took place, we set a match start range of one day. This should uniquely identify the desired match. Alternatively, we could call the operation with a unique match identifier, but that would require an additional query of the Get Matches endpoint first. The date range therefore saves us an API call.

Finally, the probabilities collection is part of the goals expectancy within the response model:


FUNCTION get_probabilities(teamId, date)
  uri = HTTP.uri(BASE_URL, '/match-metrics', '?teamId=', teamId, '&minStart=', FORMAT(date), '&maxStart=', FORMAT(ADD(date,'d', 1)), '&projection=SP')
  json = PARSER.json(HTTP.request('GET', uri, HEADERS))
  IF json['success']
    return json['result'][0]['scores']['goals']['probabilities']
  ELSE
    PRINTER.error('Failed to get scores probabilities: ', json['error'])
    return
  ENDIF
END
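The one-day start window can be sketched in Python as below. The parameter names come from the pseudocode above. The ISO date format is an assumption, since the source leaves `FORMAT(date)` unspecified, and the team id `123` in the usage line is purely hypothetical.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE_URL = "https://metrx-factory.p.rapidapi.com/v1"

def match_metrics_url(team_id, match_day: date) -> str:
    """Build a Get Match Metrics URL covering exactly one day of match starts."""
    params = {
        "teamId": team_id,
        # ASSUMPTION: dates are sent in ISO 8601 form (YYYY-MM-DD)
        "minStart": match_day.isoformat(),
        "maxStart": (match_day + timedelta(days=1)).isoformat(),
        "projection": "SP",  # scores probabilities projection, as in the text
    }
    return f"{BASE_URL}/match-metrics?{urlencode(params)}"

# Hypothetical team id; the real one comes from the Get Teams response.
url = match_metrics_url(123, date(2022, 3, 13))
print(url)
```

Using `timedelta` for the upper bound avoids manual month and year rollover handling.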

Step 3

The above API call returns all possible results in which neither team scores more than eight goals (and whose probability is not below 0.1%, by default). We can therefore build a grid with nine columns and nine rows, each entry representing a correct score, and plot these numbers as a heat map of result probabilities. If a result is not covered in the response, we simply print a placeholder, knowing that its value is below the default minimum.


FUNCTION plot_results(probabilities)
  FOR h = -1 TO 8
    FOR a = -1 TO 8
      IF h>=0 AND a>=0
        p = probabilities[home=h, away=a]['probability']
        IF p
          PRINTER.cell(FORMAT(p, '%'))
        ELSE
          PRINTER.cell('<0.1%')
        ENDIF
      ELSEIF h>=0
        PRINTER.cell(h)
      ELSEIF a>=0
        PRINTER.cell(a)
      ELSE
        PRINTER.cell('Result')
      ENDIF
    ENDFOR
    PRINTER.lb()
  ENDFOR
END
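As a rough Python counterpart, the grid can be built from a dictionary keyed by `(home, away)`. The sketch assumes each entry carries `home`, `away` and `probability` fields as in the pseudocode, and that probabilities are fractions between 0 and 1. The sample values are made up for illustration and are not API output.

```python
def score_grid(probabilities, max_goals=8, placeholder="<0.1%"):
    """Return a (max_goals+1) x (max_goals+1) grid of formatted probabilities.

    Rows are home goals, columns are away goals. Scores missing from the
    input are shown with the placeholder.
    """
    lookup = {(p["home"], p["away"]): p["probability"] for p in probabilities}
    grid = []
    for h in range(max_goals + 1):
        row = []
        for a in range(max_goals + 1):
            value = lookup.get((h, a))
            row.append(placeholder if value is None else f"{value:.1%}")
        grid.append(row)
    return grid

# Made-up sample entries, NOT real API data.
sample = [
    {"home": 3, "away": 0, "probability": 0.125},
    {"home": 2, "away": 0, "probability": 0.25},
]
grid = score_grid(sample)
for row in grid:
    print("\t".join(row))
```

A dictionary lookup keeps the missing-score case explicit, which the pseudocode handles implicitly.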

Figure: Scores probability matrix

More statistics

From these figures we can also derive distributions for individual team goals and for total goals.

In this case we have to accumulate even the small probabilities of higher scores to get exact results. We therefore set the SPM (= probability minimum) configuration parameter to zero, so that the response also includes all scores below the default threshold.


FUNCTION get_probabilities(teamId, date)
  uri = HTTP.uri(BASE_URL, '/match-metrics',
                 '?teamId=', teamId, '&minStart=', FORMAT(date), '&maxStart=', FORMAT(ADD(date,'d', 1)), '&projection=SP', '&configuration=SPM:0')
  ...
END

A) Team goals

Plotting team goals is straightforward, as we only need to sum the probabilities per team score (e.g. Bordeaux’s probability of scoring exactly one goal equals the sum of p(0-1), p(1-1), p(2-1), p(3-1) and so on). You might object that this approach does not include results like 9-1 or 10-1, since the team score is limited by the endpoint. That’s true, but for most games such high team scores have insignificant probabilities.


FUNCTION plot_team_goals(probabilities)
  teamNames = ARRAY['Home team', 'Away team']
  teamProbs = ARRAY[2][9] of FLOAT
  FOR p IN probabilities
    teamProbs[0][p['home']] += p['probability']
    teamProbs[1][p['away']] += p['probability']
  ENDFOR
  PRINTER.cell('Team goals')
  FOR i = 0 TO 8
    PRINTER.cell(i)
  ENDFOR
  PRINTER.lb()
  FOR i = 0 TO 1
    PRINTER.cell(teamNames[i])
    FOR p IN teamProbs[i]
      PRINTER.cell(FORMAT(p, '%'))
    ENDFOR
    PRINTER.lb()
  ENDFOR
END
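The same marginal sums can be sketched in Python. The field names follow the pseudocode, probabilities are assumed to be fractions between 0 and 1, and the sample entries are invented for illustration only.

```python
def team_goal_distribution(probabilities, max_goals=8):
    """Sum score probabilities per team: index k holds P(team scores exactly k)."""
    home = [0.0] * (max_goals + 1)
    away = [0.0] * (max_goals + 1)
    for p in probabilities:
        home[p["home"]] += p["probability"]
        away[p["away"]] += p["probability"]
    return home, away

# Invented sample covering three scores, NOT real API data.
sample = [
    {"home": 0, "away": 0, "probability": 0.25},
    {"home": 1, "away": 0, "probability": 0.5},
    {"home": 1, "away": 1, "probability": 0.25},
]
home, away = team_goal_distribution(sample)
print("Home:", [f"{p:.0%}" for p in home])
print("Away:", [f"{p:.0%}" for p in away])
```

In this sample the home team scores exactly one goal with probability 0.5 + 0.25, which is the kind of row-wise sum described above.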

B) Total goals

This limitation matters more when accumulating total goals, because totals add up the scores of both teams. We therefore include the probability not covered by the API response, which equals 1 minus the sum of the covered probabilities. As a consequence, the likelihood that at least a certain number of goals will be scored equals 1 minus the summed probabilities of all scores whose total is below that threshold (e.g. for two goals that is 1 - (p(0-0) + p(1-0) + p(0-1))).


FUNCTION plot_total_goals(probabilities)
  lowEqProbs = ARRAY[9] of FLOAT
  FOR p IN probabilities
    total = p['home'] + p['away']
    FOR t = total TO 8
      lowEqProbs[t] += p['probability']
    ENDFOR
  ENDFOR
  PRINTER.cell('Total goals')
  FOR i = 0 TO 8
    PRINTER.cell(i)
  ENDFOR
  PRINTER.lb()
  PRINTER.cell('greater equals')
  PRINTER.cell(FORMAT(1.0, '%'))
  FOR i = 1 TO 8
    p = 1 - lowEqProbs[i - 1]
    PRINTER.cell(FORMAT(p, '%'))
  ENDFOR
  PRINTER.lb()
  PRINTER.cell('lower equals')
  FOR i = 0 TO 8
    p = lowEqProbs[i]
    PRINTER.cell(FORMAT(p, '%'))
  ENDFOR
  PRINTER.lb()
END
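A compact Python sketch of the same accumulation follows. It mirrors the pseudocode: "at most t" sums the covered scores, while "at least t" is computed as the complement so that uncovered probability is counted on the high side. Field names are taken from the pseudocode and the sample data is invented.

```python
def total_goal_tables(probabilities, max_total=8):
    """Return (at_least, at_most) lists indexed by total goals 0..max_total."""
    at_most = [0.0] * (max_total + 1)
    for p in probabilities:
        total = p["home"] + p["away"]
        # A score with this total counts towards every threshold t >= total.
        for t in range(total, max_total + 1):
            at_most[t] += p["probability"]
    # P(total >= 0) is 1; otherwise take the complement of P(total <= t-1).
    at_least = [1.0] + [1.0 - at_most[t - 1] for t in range(1, max_total + 1)]
    return at_least, at_most

# Invented sample with four equally likely scores, NOT real API data.
sample = [
    {"home": 0, "away": 0, "probability": 0.25},
    {"home": 1, "away": 0, "probability": 0.25},
    {"home": 0, "away": 1, "probability": 0.25},
    {"home": 1, "away": 1, "probability": 0.25},
]
at_least, at_most = total_goal_tables(sample)
print("at least 2 goals:", at_least[2])
print("at most 1 goal:  ", at_most[1])
```

Scores whose total exceeds `max_total` never enter `at_most`, so they only show up through the complement, which matches the reasoning in the paragraph above.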

And here are all the stats we have obtained so far from the SP metric projection alone:

Figure: Scores probability matrix

In contrast to the example above, let’s look at a match with an extraordinary result: Liverpool’s win over Dortmund in 2016, which ended in a 4-3 victory after the Reds recovered from a 1-3 deficit. Plotting the result probabilities shows that this score can be expected only once in 400 games.

Figure: Scores probability matrix

Of course, no distribution will show this result to be the most probable one, at least not in football (soccer), which is a low-scoring sport. The good news is that you now have all these numbers at your fingertips for properly assessing matches that have not even started.