Binubuo

By codemonth | Updated 14 days ago | Data

Creating custom datasets

Moving beyond fetching single generator data points

Perhaps you have already tried using Binubuo to generate random/synthetic data by calling one of the 100+ generators available. So what is the next step?

Well, if you are like most people using synthetic data, you will probably want to start creating more advanced datasets that involve multiple data types, so that they match your objects, schema or table structure.

Luckily, that is possible too: either call the Quick Fetch endpoint to get data immediately or, if you want more customisation and reusability, POST a JSON schema definition to the /data/ endpoint.

Creating your first dataset

Let us imagine that we need synthetic order data for a webshop. Looking at our requirements, we find out that we need the following data for each row:

  1. order id
  2. order date
  3. customer id
  4. customer country
  5. customer name
  6. email
  7. product name
  8. product price

If we map that to individual generators it could look like this:

  1. order id = /generator/generics/large_number
  2. order date = /generator/generics/near_date
  3. customer id = /generator/generics/medium_number
  4. customer country = /generator/location/country
  5. customer name = /generator/person/full_name
  6. email = /generator/computer/email
  7. product name = /generator/consumer/nonfood_item
  8. product price = /generator/generics/small_amount

So if you just want your data right now, Quick Fetch is by far the fastest option. Since we already know the generators we need, we can simply call the Quick Fetch endpoint with the cols parameter set to a comma-separated list of the generator names.

# First we set up the headers with the correct key.
# REMEMBER to change the key to your actual RapidAPI key for the example to work.
$headers=@{}
$headers.Add("x-rapidapi-host", "binubuo.p.rapidapi.com")
$headers.Add("x-rapidapi-key", "your_own_key_here")
$response = Invoke-RestMethod -Uri 'https://binubuo.p.rapidapi.com/data/custom/quick_fetch?cols=large_number,near_date,medium_number,country,full_name,email,nonfood_item,small_amount' -Method GET -Headers $headers

And here is an example of what that data could look like:

{
  "quick_fetch": [
    {
      "C1": 7774407178,
      "C2": "2021-08-26T05:58:35Z",
      "C3": 15225,
      "C4": "TR",
      "C5": "Christian Bennett",
      "C6": "kihog@bepgu.org",
      "C7": "Oviyon Printed Mens V-neck T-Shirt",
      "C8": 84.14
    },
    {
      "C1": 5416998195,
      "C2": "2022-06-27T08:03:52Z",
      "C3": 15954,
      "C4": "FO",
      "C5": "Alexandra Brown",
      "C6": "pe@cesa.com",
      "C7": "WikkiStix Big Count Box Molding & Sculpting Sticks",
      "C8": 56.99
    },
    {
      "C1": 1147845835,
      "C2": "2021-10-08T10:26:03Z",
      "C3": 64448,
      "C4": "SO",
      "C5": "Hailey James",
      "C6": "kem@kaha.gov",
      "C7": "Marula Oil by Leven Rose Pure Organic, Extra Virgin, Cold Pressed, All Natural Face, Dry Skin and Body Moisturizer and Damaged Hair Treatment 1 oz",
      "C8": 204.25
    },
    {
      "C1": 2427007865,
      "C2": "2022-07-31T13:28:06Z",
      "C3": 60909,
      "C4": "SM",
      "C5": "Allison Perry",
      "C6": "jetin@mopdoko.mil",
      "C7": "Stile Collection Bella IV Shoulder Bag (Black)",
      "C8": 636.16
    },
    {
      "C1": 3307108395,
      "C2": "2022-04-10T22:58:48Z",
      "C3": 12126,
      "C4": "KY",
      "C5": "Nolan Sanchez",
      "C6": "bup@cifhi.gov",
      "C7": "estheSKIN No.102 Aloe Vera Modeling Mask Powder for Professional Facial Treatment, 35 Oz.",
      "C8": 6.25
    }
  ]
}
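For a quick experiment, the generic C1 to C8 names can also be relabelled on the client side. The following is a minimal sketch, not a Binubuo feature: the column names in $names are our own choice, and the embedded JSON is just one row copied from the sample above so that the snippet runs without calling the API. With a real call, $response would come from Invoke-RestMethod, which already parses the JSON.

```powershell
# One row in the shape of the Quick Fetch sample response above.
$json = @'
{ "quick_fetch": [ { "C1": 7774407178, "C2": "2021-08-26T05:58:35Z",
    "C3": 15225, "C4": "TR", "C5": "Christian Bennett",
    "C6": "kihog@bepgu.org",
    "C7": "Oviyon Printed Mens V-neck T-Shirt", "C8": 84.14 } ] }
'@
$response = $json | ConvertFrom-Json

# Our own (illustrative) names, in the same order as the cols parameter.
$names = 'order_id', 'order_date', 'customer_id', 'customer_country',
         'customer_name', 'email', 'product_name', 'product_price'

$orders = @(foreach ($row in $response.quick_fetch) {
    $renamed = [ordered]@{}
    for ($i = 0; $i -lt $names.Count; $i++) {
        # Quick Fetch labels the columns C1, C2, ... in request order.
        $renamed[$names[$i]] = $row."C$($i + 1)"
    }
    [pscustomobject]$renamed
})
```

This keeps the call itself unchanged, but the mapping has to be kept in sync with the cols list by hand.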

If you want more rows (50 in this example), simply add the rows parameter to your call:

# First we set up the headers with the correct key.
# REMEMBER to change the key to your actual RapidAPI key for the example to work.
$headers=@{}
$headers.Add("x-rapidapi-host", "binubuo.p.rapidapi.com")
$headers.Add("x-rapidapi-key", "your_own_key_here")
$response = Invoke-RestMethod -Uri 'https://binubuo.p.rapidapi.com/data/custom/quick_fetch?cols=large_number,near_date,medium_number,country,full_name,email,nonfood_item,small_amount&rows=50' -Method GET -Headers $headers
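The two Quick Fetch calls above differ only in their query string, so the URL can be assembled from a list of generator names. This is a minimal sketch: the helper name New-QuickFetchUri and its parameters are hypothetical and not part of Binubuo; it only rebuilds the URLs used in this tutorial.

```powershell
# Hypothetical helper (not part of Binubuo): builds a Quick Fetch URL
# from a list of generator names and an optional row count.
function New-QuickFetchUri {
    param(
        [Parameter(Mandatory)] [string[]] $Generators,
        [int] $Rows = 0
    )
    $base = 'https://binubuo.p.rapidapi.com/data/custom/quick_fetch'
    # cols is a comma-separated list of generator names.
    $uri = '{0}?cols={1}' -f $base, ($Generators -join ',')
    # Only append rows when a positive count was requested.
    if ($Rows -gt 0) { $uri += "&rows=$Rows" }
    return $uri
}

$generators = 'large_number', 'near_date', 'medium_number', 'country',
              'full_name', 'email', 'nonfood_item', 'small_amount'
$uri = New-QuickFetchUri -Generators $generators -Rows 50
```

The resulting $uri can be passed to Invoke-RestMethod with the same headers as before.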

So as you can see, Quick Fetch is a very fast way to get grouped data in larger quantities, but as you can perhaps also see, there are a few things we could improve. The generic column names (C1, C2…) could be replaced by the actual column names (order_id, order_date…), and the data would be more realistic if the email addresses bore some resemblance to the customer names.

For this we need to create an actual custom dataset endpoint. We do this by defining our dataset according to a specific JSON format.

Binubuo JSON dataset schema

As you can see from the documentation above, there are many different ways to create the data (automatically incrementing numbers, dates and times, for instance), but for now we will just build a simple example using the generators.

If you want to see one more demo, you can look at this blog entry: Custom Datasets - Creating your own data

So if we use the JSON schema as documented, our schema will look like this:

{
    "columns": [
        {
            "column_name": "order_id"
            , "column_datatype": "number"
            , "column_type": "generated"
            , "generator": "large_number"
        }, {
            "column_name": "order_date"
            , "column_datatype": "date"
            , "column_type": "generated"
            , "generator": "near_date"
        }, {
            "column_name": "customer_id"
            , "column_datatype": "number"
            , "column_type": "generated"
            , "generator": "medium_number"
        }, {
            "column_name": "customer_country"
            , "column_datatype": "string"
            , "column_type": "generated"
            , "generator": "country"
        }, {
            "column_name": "customer_name"
            , "column_datatype": "string"
            , "column_type": "generated"
            , "generator": "full_name"
        }, {
            "column_name": "email"
            , "column_datatype": "string"
            , "column_type": "generated"
            , "generator": "email"
        }, {
            "column_name": "product_name"
            , "column_datatype": "string"
            , "column_type": "generated"
            , "generator": "nonfood_item"
        }, {
            "column_name": "product_price"
            , "column_datatype": "number"
            , "column_type": "generated"
            , "generator": "small_amount"
        }
    ]
}
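Because every column in the schema above follows the same pattern, the JSON can also be generated instead of typed by hand. The following is a sketch under the assumption that only the four keys shown above are needed; the $columns list and variable names are our own, and the output uses ConvertTo-Json's default layout rather than the leading-comma style above.

```powershell
# Column name, datatype and generator, taken from the schema above.
$columns = @(
    @('order_id',         'number', 'large_number'),
    @('order_date',       'date',   'near_date'),
    @('customer_id',      'number', 'medium_number'),
    @('customer_country', 'string', 'country'),
    @('customer_name',    'string', 'full_name'),
    @('email',            'string', 'email'),
    @('product_name',     'string', 'nonfood_item'),
    @('product_price',    'number', 'small_amount')
)

$schema = [ordered]@{
    columns = @(
        foreach ($c in $columns) {
            [ordered]@{
                column_name     = $c[0]
                column_datatype = $c[1]
                column_type     = 'generated'
                generator       = $c[2]
            }
        }
    )
}

# An explicit depth comfortably covers schema -> columns -> column.
$schemaJson = $schema | ConvertTo-Json -Depth 4
Set-Content -Path '.\order_schema.json' -Value $schemaJson
```

Adding or removing a column then only means editing one line in $columns.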

So now we have the JSON schema. Let us save that in a file called order_schema.json. To create the dataset, we need to supply the name of the dataset as a parameter and the schema definition in the body. Let us call the dataset orders.

# First we set up the headers with the correct key.
# REMEMBER to change the key to your actual RapidAPI key for the example to work.
$headers=@{}
$headers.Add("x-rapidapi-host", "binubuo.p.rapidapi.com")
$headers.Add("x-rapidapi-key", "your_own_key_here")
# -Raw reads the whole file as a single string instead of an array of lines.
$body=(Get-Content ".\order_schema.json" -Raw)
$response = Invoke-WebRequest -Uri 'https://binubuo.p.rapidapi.com/data/?schemaname=orders' -Method POST -Body $body -Headers $headers -ContentType 'application/octet-stream'
Write-Host $response.Content

So if we execute the above script in PowerShell, we will create the dataset and should get a response that looks like the one below:

{"dataset_name": "orders", "dataset_url_path": "custom/[some_random_hashed_string]/orders"}

What is important to note here is that every account has its own unique part of the URL for calling custom datasets. This value does not change, and it has to be included whenever you call one of your custom datasets.
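The dataset URL can be derived from the creation response rather than copied by hand. This sketch assumes that the URL is the /data/ base followed by dataset_url_path, which matches the URL used to fetch the dataset in this tutorial; the variable names are our own, and the sample body keeps the placeholder hash. In the real script, $content would be $response.Content.

```powershell
# Sample response body in the shape shown above; the hash is a placeholder.
$content = '{"dataset_name": "orders", "dataset_url_path": "custom/[some_random_hashed_string]/orders"}'
$created = $content | ConvertFrom-Json

# Assumption: dataset URL = /data/ base + dataset_url_path.
$datasetUri = 'https://binubuo.p.rapidapi.com/data/{0}?rows={1}' -f $created.dataset_url_path, 5

# The account-specific part is the middle segment of the path.
$accountHash = ($created.dataset_url_path -split '/')[1]
```

Storing $accountHash once is enough, since the value stays the same for every dataset on the account.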

So we now have a full dataset available to call and create the synthetic data for us:

# First we set up the headers with the correct key.
# REMEMBER to change the key to your actual RapidAPI key for the example to work.
$headers=@{}
$headers.Add("x-rapidapi-host", "binubuo.p.rapidapi.com")
$headers.Add("x-rapidapi-key", "your_own_key_here")
$response = Invoke-RestMethod -Uri 'https://binubuo.p.rapidapi.com/data/custom/[unique_hash_replace_with_own]/orders?rows=5' -Method GET -Headers $headers

And here is an example of what the data looks like:

[
	{
		"order_id": 4836921792,
		"order_date": "2021-09-04T06:51:29Z",
		"customer_id": 51772,
		"customer_country": "BJ",
		"customer_name": "Ayden Bryant",
		"email": "Ayden_Bryant@tilid.net",
		"product_price": 59.86
	},
	{
		"order_id": 6249277253,
		"order_date": "2021-11-16T10:06:55Z",
		"customer_id": 20945,
		"customer_country": "NI",
		"customer_name": "Scarlett Jackson",
		"email": "Scarlett.Jackson@juswo.com",
		"product_name": "Logitech G933 Artemis Spectrum – Wireless RGB 7.1 Dolby and DST Headphone Surround Sound Gaming Headset – PC",
		"product_price": 624.37
	},
	{
		"order_id": 7752968889,
		"order_date": "2021-10-06T17:36:37Z",
		"customer_id": 34485,
		"customer_country": "VC",
		"customer_name": "Tyler Rivera",
		"email": "Tyler.Rivera@lockot.int",
		"product_name": "Body Wave Brazilian Hair 3 Bundles with Closure(18 20 22+16 Inch Middle Part) 9A Unprocessed Human hair Bundles Body Wave Hair Extensions 4X4 Lace Natural Black Hair For Black Women Double Weft",
		"product_price": 771.69
	},
	{
		"order_id": 2567227373,
		"order_date": "2021-10-16T05:55:53Z",
		"customer_id": 90676,
		"customer_country": "BJ",
		"customer_name": "Serenity Rogers",
		"email": "Serenity.Rogers@hudwe.int",
		"product_name": "DreamWave - Tremor Portable Bluetooth Speaker - Green,Black",
		"product_price": 662.76
	},
	{
		"order_id": 831253201,
		"order_date": "2022-04-02T10:40:37Z",
		"customer_id": 47672,
		"customer_country": "BZ",
		"customer_name": "Amelia Miller",
		"email": "Amelia_Miller@sokulni.net",
		"product_price": 801.45
	}
]

As you can see, we now have the correct column names, and there is a relation between the customer names and the email addresses.

You can find much more documentation on the Binubuo Documentation pages and the Binubuo Blog pages.

If you have any issues or want some help, just ping Binubuo at support@binubuo.com
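If the rows are going into a CSV file, note that some rows in the sample above have no product_name. ConvertTo-Csv and Export-Csv take their columns from the first object they receive, so it is safer to list the columns explicitly. A minimal sketch, using two rows from the sample trimmed to a few columns:

```powershell
# Two rows in the shape of the sample above; the first has no product_name.
$json = @'
[
  { "order_id": 4836921792, "customer_name": "Ayden Bryant",
    "email": "Ayden_Bryant@tilid.net", "product_price": 59.86 },
  { "order_id": 2567227373, "customer_name": "Serenity Rogers",
    "email": "Serenity.Rogers@hudwe.int",
    "product_name": "DreamWave - Tremor Portable Bluetooth Speaker - Green,Black",
    "product_price": 662.76 }
]
'@
$orders = $json | ConvertFrom-Json

# Listing the columns keeps product_name even though the first row lacks it.
$csv = $orders |
    Select-Object order_id, customer_name, email, product_name, product_price |
    ConvertTo-Csv -NoTypeInformation
```

With a real call, $orders would be the result of Invoke-RestMethod, and Export-Csv could replace ConvertTo-Csv to write the file directly.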