By Microburbs 2 | Updated a month ago | Location

Gathering Real Estate sale prices with the Microburbs API using Python and Pandas

Introduction

The Microburbs API is a great tool for collecting data on a range of topics relating to areas within Australia. One such use is gathering the sale prices of different types of properties in various suburbs. In this tutorial you will learn how to get these sale prices for all suburbs within the local government area (lga) of Melbourne.

Getting Started

Before beginning to use the Microburbs API, ensure you are subscribed to it on Rapid API. To subscribe, go to the Pricing page of the API on Rapid API and select the plan that best suits your needs. The Basic plan should be sufficient to get some data, although a plan with access to Level 2 data unlocks far more of the real estate data.
For the purposes of this tutorial I shall be running my code in a Jupyter Notebook, although the code works just as well as a normal Python script.
For this tutorial we shall be using two Python libraries:

  • pandas: A data analysis tool which we will use to store our data in a dataframe.
  • requests: A library used to make GET requests to the Microburbs API.

Ensure both libraries are imported at the top of your Notebook:
import pandas as pd
import requests

Next, you need to include the required headers in the script so your requests to the API can go through. Add the headers:

headers = {
	"X-RapidAPI-Key": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
	"X-RapidAPI-Host": "microburbs-test.p.rapidapi.com"
}

Replace the Xs in X-RapidAPI-Key with your key to the API. This key can be found on the Endpoints page under “Header Parameters”.

Making The First Request

The first request we shall make to the API gets the names of the suburbs we want the sale prices for. I shall be getting the names of all suburbs in the lga of Melbourne, although your program could get the sale data for suburbs in any lga you wish to investigate. To do this we shall use the areas endpoint, which takes three parameters:

  • area_name: The name of the area whose sub-areas we want to get. In this case we shall pass in Melbourne.
  • region: The region level the passed-in area belongs to. This can be very important, as area names may be repeated across multiple region levels. In this case we are looking at the lga level.
  • scope: The region level we want the results to be a part of. We want the sals (suburbs and localities) that are within the lga of Melbourne.

You can copy the code snippet from the areas endpoint, paste it into your code, then change the query parameters to the appropriate values. You can delete the headers variable, as we have already defined it above.

url = "https://microburbs-test.p.rapidapi.com/areas/"
querystring = {"scope":"sal","area_name":"Melbourne","region":"lga"}
response = requests.request("GET", url, headers=headers, params=querystring)
print(response.json())

To extract just the sal names from the response, use:

areas = [x["resulting_area"] for x in response.json()["results"]]

Now that we have the list of suburbs we wish to look at, we need to find out what figures we can request from the API.
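If you are unsure what that comprehension is doing, here is the same extraction run against a hand-written sample payload. The "results" and "resulting_area" keys match the real response; the suburb names are made up for illustration:

```python
# A hand-written sample shaped like the /areas/ response (suburb names are illustrative)
sample = {
    "results": [
        {"resulting_area": "Carlton", "region_level": "sal"},
        {"resulting_area": "Docklands", "region_level": "sal"},
    ]
}

# The same comprehension as above, run against the sample instead of a live response
areas = [x["resulting_area"] for x in sample["results"]]
print(areas)  # ['Carlton', 'Docklands']
```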

Topics, Sections and Fields

The Microburbs database is structured as follows:

  1. Every individual figure, such as the median price of a 4 bedroom house in Sydney, belongs to a specific field; in this case it may be the median_sale_price_house_4_bedrooms field.
  2. Every field is part of a section, which groups together multiple related fields. median_sale_price_house_all_bedrooms and median_sale_price_house_3_bedrooms are part of the “Median sale price” section.
  3. Every section is part of a topic, which is a broader categorisation. The “Median sale price” section is part of the “Real Estate” topic.

Hence, when querying the API we first need to get the names of the topics, sections and fields so we can extract the relevant figures.
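The hierarchy above can be pictured as a nested mapping. This is only an illustrative slice using the names from this tutorial; the real catalogue comes from the API itself:

```python
# An illustrative slice of the hierarchy -- the real catalogue comes from the API
catalogue = {
    "Real Estate": {                    # topic
        "Median sale price": [          # section
            "median_sale_price_house_all_bedrooms",   # fields
            "median_sale_price_house_3_bedrooms",
        ],
    },
}

# Walk topic -> section -> field to see the full path of each figure
paths = [f"{topic} > {section} > {field}"
         for topic, sections in catalogue.items()
         for section, fields in sections.items()
         for field in fields]
print(paths[0])  # Real Estate > Median sale price > median_sale_price_house_all_bedrooms
```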
Firstly, to get the names of the topics we can use the topics endpoint.

url = "https://microburbs-test.p.rapidapi.com/topics"
response = requests.request("GET", url, headers=headers)
print(response.json())

The response will show us the names of all the topics Microburbs uses. As we wish to gather data on real estate prices, we shall look deeper into the topic “Real Estate”.
Now, to get the sections within that topic, we can use the sections endpoint with a query parameter to filter by topic.

url = "https://microburbs-test.p.rapidapi.com/sections/"
querystring = {"topic":"Real Estate"}
response = requests.request("GET", url, headers=headers, params=querystring)
print(response.json())

Scanning this response we can see plenty of potentially useful sections, but in particular the “Median sale price” section is where we will find the data we are interested in. Now we have to get the names of the fields within this section, which we can do with the fields endpoint, filtered by section. A topic parameter is required by the fields endpoint, so include “Real Estate” as the topic in addition to “Median sale price” as the section.

url = "https://microburbs-test.p.rapidapi.com/fields/"
querystring = {"topic":"Real Estate", "section": "Median sale price"}
response = requests.request("GET", url, headers=headers, params=querystring)
print(response.text)

The response includes the details of each field returned, including its name, id, region level and description. Importantly, a field is unique to a specific region level: for example, the median_sale_price_house_all_bedrooms figures are available for areas at region level sal but not for areas at region level lga. The field’s id is what we shall use when querying the API for figures, and the field’s name is what we shall store to use as the column names of our dataframe.
Notice, though, that not all the fields returned relate to median prices; some are the growth of prices. To filter out these fields, and also fields not at region level sal, we can do:

fields = list(response.json())
fields = [field for field in fields if field["region_level"] == "sal" and "Growth" not in field["name"]]

We shall also extract the field_ids into their own list to use later:
field_ids = [field["id"] for field in fields]
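To make the filtering concrete, here is the same logic applied to a couple of hand-written field records. The keys mirror those described above, but the ids and names are made up for illustration:

```python
# Hand-written field records -- the keys mirror the real response,
# the ids and names are made up for illustration
sample_fields = [
    {"id": 101, "name": "Median sale price house (all bedrooms)", "region_level": "sal", "premium": 1},
    {"id": 102, "name": "Growth in median sale price (house)", "region_level": "sal", "premium": 2},
    {"id": 103, "name": "Median sale price house (3 bedrooms)", "region_level": "lga", "premium": 1},
]

# Keep sal-level fields whose names do not mention Growth
kept = [f for f in sample_fields
        if f["region_level"] == "sal" and "Growth" not in f["name"]]
kept_ids = [f["id"] for f in kept]
print(kept_ids)  # [101]
```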

Getting Figures

When querying for real estate figures we shall use the Figures latest endpoint, as we are only interested in the latest median sale prices.
Now that we have the ids of the fields and the names of the areas we want figures for, we can start getting them into a dataframe. First, create the dataframe:
real_estate_df = pd.DataFrame(columns=["sal"]+ [name["name"] for name in fields])
This creates a dataframe where the first column will be the name of the suburb the figures relate to and the remaining columns are the various values of the median prices in that suburb.
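As a small offline illustration of the row-building pattern used in the loop below, here is one made-up row being appended with pd.concat. The column names are placeholders standing in for the real field names:

```python
import pandas as pd

# Made-up column names standing in for the real field names
columns = ["sal", "Median sale price house", "Median sale price unit"]
demo_df = pd.DataFrame(columns=columns)

# One suburb becomes one row; None marks a figure the API could not supply
row = ["Carlton", 850000, None]
demo_df = pd.concat([demo_df, pd.DataFrame([row], columns=columns)], ignore_index=True)
print(demo_df.shape)
```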
We now need to query the API to get a figure for every field and every area name. To do this we use for loops and requests to build each row of our dataframe.

url_levels = ["https://microburbs-test.p.rapidapi.com/figures/latest/1/",
              "https://microburbs-test.p.rapidapi.com/figures/latest/2/",
              "https://microburbs-test.p.rapidapi.com/figures/latest/3/"]

for area in areas:
    area_details = [area]
    for field in fields:
        querystring = {"area_name": area, "field": field["id"]}
        # Choose the figures URL matching the field's level (premium is 1, 2 or 3)
        if field["premium"] in (1, 2, 3):
            level = field["premium"] - 1
        else:
            level = 0
        response = requests.request("GET", url_levels[level], headers=headers, params=querystring)
        if response.status_code != 200:
            # No figure available for this field in this area
            area_details.append(None)
            continue
        value = response.json()["result"]["value"]
        area_details.append(value)
    real_estate_df = pd.concat([real_estate_df, pd.DataFrame([area_details], columns=["sal"] + [name["name"] for name in fields])])
print(real_estate_df)

Let's break this code down into parts.
We start by specifying which URLs we can use to query the API. Depending on the level of figures your plan provides, you may only want to include the level 1 figures URL.
url_levels = ["https://microburbs-test.p.rapidapi.com/figures/latest/1/" , "https://microburbs-test.p.rapidapi.com/figures/latest/2/", "https://microburbs-test.p.rapidapi.com/figures/latest/3/"]

Next we loop through each of our different suburbs. The area_details list will temporarily store all the figures data we get for each suburb before we append it to the rows of our dataframe.

for area in areas:
    area_details = [area]
Now we need to go through each field and query the database for a figure. We also need to check the level of the field we are going to query and make sure we use the correct url for that query. The parameters of our query are the area name and the field id.

for field in fields:
    querystring = {"area_name": area, "field": field["id"]}
    if field["premium"] in (1, 2, 3):
        level = field["premium"] - 1
    else:
        level = 0
    response = requests.request("GET", url_levels[level], headers=headers, params=querystring)

After getting a response we need to account for the fact that data can sometimes be missing or unavailable. In this case we can look at the status code of the response; if it is not 200 (the code indicating data was successfully found), we know we need an alternative plan. To ensure the final dataframe has the correct shape, we append None to area_details to indicate no data was available for this column, then continue on to the next field.

if response.status_code != 200:
    value = None
    area_details.append(value)
    continue

If we did get a successful response code, we proceed by adding the value of the figure to area_details:

value = response.json()["result"]["value"]
area_details.append(value)

Finally, once every field has been requested for an area we add the area_details as a row to our dataframe.
real_estate_df = pd.concat([real_estate_df, pd.DataFrame([area_details], columns=["sal"]+ [name["name"] for name in fields])])

Congratulations, you now have a dataframe containing median sale prices for suburbs in Melbourne. This code can easily be adapted to gather a wealth of data from all across Australia, such as population, education and environmental data, just by changing the sections, topics and areas we looked at.
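For instance, pointing the script at a different dataset is mostly a matter of changing the query parameters. The topic and section names below are hypothetical; check the topics and sections endpoints for the real names before using them:

```python
# Hypothetical parameters for a different investigation -- verify the actual
# topic, section and area names against the API's endpoints first
sections_query = {"topic": "Demographics"}
fields_query = {"topic": "Demographics", "section": "Population"}
areas_query = {"scope": "sal", "area_name": "Sydney", "region": "lga"}

print(areas_query["area_name"])  # Sydney
```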