Extract Table - DocumentDev

By documentdev | Updated 2 months ago | Text Analysis

Extracting Tables from PDFs

How to Extract Tables from a PDF using Extract Table by Document.dev

Requirements:

  • Python 3
  • requests library (install via python -m pip install requests)
  1. This tutorial and demo use a file from this dataset called “us-002.pdf”. Copy that file to your working directory.

Example Input Table from us-002.pdf

  2. Create a blank file called test.py in your working directory.
  3. Insert the following code, replacing YOUR_KEY with your x-rapidapi-key, which is listed on the API's RapidAPI page under Endpoints -> Headers -> X-RapidAPI-Key.
import requests

url = "https://extract-table-documentdev.p.rapidapi.com/extracttable"

headers = {
    'content-type': "application/octet-stream",
    'pages': "2",  # optional header: which page(s) to scan
    'x-rapidapi-key': "YOUR_KEY",
    'x-rapidapi-host': "extract-table-documentdev.p.rapidapi.com"
}

# Send the PDF as the raw request body; the with-block closes the file afterwards
with open('us-002.pdf', 'rb') as payload:
    response = requests.post(url, data=payload, headers=headers)

print(response.text)
  4. That’s it! Run py test.py (or python test.py) in your terminal at the root of the workspace. You will receive a response structured similarly to the one below. Note: pages is an optional header that defaults to 1. It accepts a number from 1 to 10 inclusive, or the word “all”.
{"tables": [{"stats": {"accuracy": 99.18, "whitespace": 4.44, "order": 1, "page": 2}, "titleEstimate": " Table 2: NNI Budget, by Agency, 2009\u20132011\n(dollars in millions)", "data": {"Agency": {"DOE": {"2009 Actual": "332.6", "2009 Recovery": "293.2", "2010 Estimated": "372.9", "2011 Proposed": "423.9"}, "NSF": {"2009 Actual": "408.6", "2009 Recovery": "101.2", "2010 Estimated": "417.7", "2011 Proposed": "401.3"}, "HHS/NIH": {"2009 Actual": "342.8", "2009 Recovery": "73.4", "2010 Estimated": "360.6", "2011 Proposed": "382.4"}, "DOD": {"2009 Actual": "459.0", "2009 Recovery": "0.0", "2010 Estimated": "436.4", "2011 Proposed": "348.5"}, "DOC/NIST": {"2009 Actual": "93.4", "2009 Recovery": "43.4", "2010 Estimated": "114.4", "2011 Proposed": "108.0"}, "EPA": {"2009 Actual": "11.6", "2009 Recovery": "0.0", "2010 Estimated": "17.7", "2011 Proposed": "20.0"}, "HHS/NIOSH": {"2009 Actual": "6.7", "2009 Recovery": "0.0", "2010 Estimated": "9.5", "2011 Proposed": "16.5"}, "NASA": {"2009 Actual": "13.7", "2009 Recovery": "0.0", "2010 Estimated": "13.7", "2011 Proposed": "15.8"}, "HHS/FDA": {"2009 Actual": "6.5", "2009 Recovery": "0.0", "2010 Estimated": "7.3", "2011 Proposed": "15.0"}, "DHS": {"2009 Actual": "9.1", "2009 Recovery": "0.0", "2010 Estimated": "11.7", "2011 Proposed": "11.7"}, "USDA/NIFA": {"2009 Actual": "9.9", "2009 Recovery": "0.0", "2010 Estimated": "10.4", "2011 Proposed": "8.9"}, "USDA/FS": {"2009 Actual": "5.4", "2009 Recovery": "0.0", "2010 Estimated": "5.4", "2011 Proposed": "5.4"}, "CPSC": {"2009 Actual": "0.2", "2009 Recovery": "0.0", "2010 Estimated": "0.2", "2011 Proposed": "2.2"}, "DOT/FHWA": {"2009 Actual": "0.9", "2009 Recovery": "0.0", "2010 Estimated": "3.2", "2011 Proposed": "2.0"}, "DOJ": {"2009 Actual": "1.2", "2009 Recovery": "0.0", "2010 Estimated": "0.0", "2011 Proposed": "0.0"}, "TOTAL": {"2009 Actual": "1,701.5", "2009 Recovery": "511.3", "2010 Estimated": "1,781.1", "2011 Proposed": "1,761.6"}}}}]} 
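The nested "data" object in the response above can be flattened into one record per table row. The following is a minimal sketch, not part of the service itself: the helper names table_rows and to_number are hypothetical, and it assumes every table uses the same nesting as this sample (index column, then row label, then column name). The schema only promises that "data" is an object, so other documents may need different handling.

```python
import json

# Trimmed copy of the sample response (two of the sixteen rows).
SAMPLE = r'''
{"tables": [{"stats": {"accuracy": 99.18, "whitespace": 4.44, "order": 1, "page": 2},
  "titleEstimate": " Table 2: NNI Budget, by Agency, 2009\u20132011\n(dollars in millions)",
  "data": {"Agency": {
    "DOE": {"2009 Actual": "332.6", "2009 Recovery": "293.2", "2010 Estimated": "372.9", "2011 Proposed": "423.9"},
    "TOTAL": {"2009 Actual": "1,701.5", "2009 Recovery": "511.3", "2010 Estimated": "1,781.1", "2011 Proposed": "1,761.6"}}}}]}
'''


def to_number(text):
    """Convert a cell such as "1,701.5" to a float; return it unchanged if that fails."""
    try:
        return float(text.replace(",", ""))
    except (AttributeError, ValueError):
        return text


def table_rows(table):
    """Flatten one table's "data" object into a list of row dicts.

    Assumes the nesting seen in the sample: {index column: {row label: {column: cell}}}.
    """
    rows = []
    for index_name, entries in table["data"].items():   # e.g. "Agency"
        for label, cells in entries.items():             # e.g. "DOE"
            row = {index_name: label}
            for column, text in cells.items():
                row[column] = to_number(text)
            rows.append(row)
    return rows


doc = json.loads(SAMPLE)  # with a live call this would be: doc = response.json()
first = doc["tables"][0]
title = first["titleEstimate"].strip()
rows = table_rows(first)

print("page", first["stats"]["page"], "accuracy", first["stats"]["accuracy"])
for row in rows:
    print(row)
```

Cells arrive as strings, so the thousands separator is stripped before conversion. Anything that does not parse as a number is kept as text rather than dropped.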

The full response schema is shown below:

{
	"type": "object",
	"properties": {
		"tables": {
			"type": "array",
			"items": {
				"type": "object",
				"properties": {
					"stats": {
						"type": "object",
						"properties": {
							"accuracy": {
								"type": "number"
							},
							"whitespace": {
								"type": "number"
							},
							"order": {
								"type": "integer"
							},
							"page": {
								"type": "integer"
							}
						}
					},
					"titleEstimate": {
						"type": "string"
					},
					"data": {
						"type": "object"
					}
				}
			}
		}
	}
}
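Before relying on a response, it can help to confirm that it has the shape the schema describes. Below is a hedged sketch of such a check in plain Python; check_response_shape is a hypothetical helper, not something the service provides. Because the schema lists no required properties, the check only verifies the types of fields that are present, and it treats "integer" as a Python int, which is slightly stricter than JSON Schema.

```python
def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_integer(value):
    return isinstance(value, int) and not isinstance(value, bool)


# Expected types of the "stats" fields, taken from the schema above
STATS_CHECKS = {
    "accuracy": is_number,
    "whitespace": is_number,
    "order": is_integer,
    "page": is_integer,
}


def check_response_shape(doc):
    """Return a list of problems; an empty list means the shape matches."""
    if not isinstance(doc, dict):
        return ["response is not an object"]
    tables = doc.get("tables", [])
    if not isinstance(tables, list):
        return ["tables is not an array"]
    problems = []
    for i, table in enumerate(tables):
        where = f"tables[{i}]"
        if not isinstance(table, dict):
            problems.append(f"{where} is not an object")
            continue
        stats = table.get("stats", {})
        if not isinstance(stats, dict):
            problems.append(f"{where}.stats is not an object")
        else:
            for key, is_ok in STATS_CHECKS.items():
                if key in stats and not is_ok(stats[key]):
                    problems.append(f"{where}.stats.{key} has the wrong type")
        if "titleEstimate" in table and not isinstance(table["titleEstimate"], str):
            problems.append(f"{where}.titleEstimate is not a string")
        if "data" in table and not isinstance(table["data"], dict):
            problems.append(f"{where}.data is not an object")
    return problems


good = {"tables": [{"stats": {"accuracy": 99.18, "whitespace": 4.44, "order": 1, "page": 2},
                    "titleEstimate": "Table 2", "data": {}}]}
bad = {"tables": [{"stats": {"page": "2"}, "data": []}]}

good_problems = check_response_shape(good)
bad_problems = check_response_shape(bad)
print(good_problems)
print(bad_problems)
```

For stricter validation, a dedicated JSON Schema validator could be run against the schema directly; the hand-written check here just avoids an extra dependency.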