API

Upload, Refit, and Attribute

Update models with new data in Alviss AI using the API.

Upload, Refit, and Attribute Tutorial

This tutorial covers a basic scenario for continuous updates in Alviss AI: uploading new data, creating a new dataset by extending the active one, refitting an existing model on the updated dataset, and creating an attribution. This workflow keeps models current with fresh data in ongoing projects. We'll walk through the process step by step, using Python with the requests library to interact with the Alviss AI API. Each step includes the corresponding Python snippet.

Prerequisites

  • You need a valid access token from the Alviss AI platform (see the Authentication section in the main API docs).
  • Know your team ID, project ID, and an existing model ID to refit.
  • Have the file with new data ready (e.g., a CSV or other supported format for datasets like "Sales").
  • Define model-specific details like country, region, and grouping based on your project.
  • Install the required Python library if it is not already present: pip install requests (requests is a third-party package, not part of the standard library).

Step 1: Import Necessary Libraries

Import time for handling delays in polling and requests for making HTTP API calls.

import time
import requests

Step 2: Set Up Variables

Define the base API URL, your access token, team ID, project ID, and the local path to the file with new data. Also, specify the existing model ID and model details like country, region, and grouping. Replace placeholders like <SET ME> with actual values.

url = "https://app.alviss.io/api/v1/api"
token = "<SET ME>"
team_id = "<SET ME>"
project_id = "<SET ME>"
file_path = "<SET ME>"

# Model refit specifics
existing_model_id = 3
model_country = "SWE"
model_region = "all"
model_grouping = "all"

Step 3: Prepare Authentication Headers

Create a headers dictionary with the Authorization Bearer token to authenticate all API requests.

headers = {"Authorization": "Bearer " + token}
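
If you prefer not to pass headers into every call, a requests.Session can carry them for you. This is an optional convenience, not something the Alviss AI API requires; the sketch below reuses the token variable defined above.

session = requests.Session()
session.headers.update({"Authorization": "Bearer " + token})

# Subsequent calls inherit the header automatically, e.g.:
# response = session.get(team_project_url + "/datasets/active")

The remaining steps stick with the plain headers dictionary so each snippet stands on its own.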

Step 4: Construct the Project URL

Build the base URL for the team and project-specific endpoints by formatting the team ID and project ID into the URL string.

team_project_url = url + f"/projects/{team_id}/{project_id}"

Step 5: Upload the New Data File

Send a POST request to the /datauploads endpoint. Include the dataset name (e.g., "Sales") as a query parameter and attach the file using the files parameter, opening it in a with block so the handle is closed after the request. This initiates the upload and returns an upload_id in the response.

with open(file_path, "rb") as f:
    response = requests.post(
        team_project_url + "/datauploads",
        headers=headers,
        params={"dataset_name": "Sales"},
        files={"file": f},
    )
upload_id = response.json().get("upload_id")
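
Note that requests does not raise an exception for HTTP error responses by default, so a failed upload would only show up later as a missing upload_id. Calling raise_for_status() after each request is a cheap safeguard that applies equally to every snippet in this tutorial:

# Fail fast on 4xx/5xx responses instead of silently parsing an error body.
response.raise_for_status()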

Step 6: Poll for Upload Completion

Use a while loop to repeatedly send GET requests to check the status of the upload using the upload_id. Print progress messages and the response for debugging, and sleep for 2 seconds between checks. Break when the status is "complete".

while True:
    print("Waiting for upload to complete")
    response = requests.get(
        team_project_url + f"/datauploads/{upload_id}",
        headers=headers,
    )
    if response.json().get("Status") == "complete":
        break
    print(response.json())
    time.sleep(2)
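
Steps 9, 13, and 15 repeat this same poll-and-sleep pattern, differing only in the endpoint and the expected status string ("complete" for uploads and datasets, "completed" for models and attributions). If you want to avoid the repetition, a small helper like the sketch below keeps the flow readable and adds an upper bound so a stuck job cannot loop forever. The helper and its max_attempts default are our own convention, not part of the Alviss AI API.

def wait_for_status(resource_url, done_status, max_attempts=300):
    """Poll a resource URL until its Status field matches done_status."""
    for _ in range(max_attempts):
        response = requests.get(resource_url, headers=headers)
        body = response.json()
        if body.get("Status") == done_status:
            return body
        print(body)
        time.sleep(2)
    raise TimeoutError(f"No {done_status!r} status from {resource_url}")

# Equivalent to the loop above:
# wait_for_status(team_project_url + f"/datauploads/{upload_id}", "complete")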

Step 7: Get the Active Dataset ID

Send a GET request to the /datasets/active endpoint to retrieve the ID of the currently active dataset, which will be extended with the new upload.

response = requests.get(
    team_project_url + "/datasets/active",
    headers=headers,
)
active_dataset_id = response.json()["IId"]

Step 8: Create a New Dataset

Send a POST request to the /datasets endpoint to create a new dataset by combining the new upload ID with the active dataset ID. This extends the existing data. The response returns a new dataset_id.

response = requests.post(
    team_project_url + "/datasets",
    headers=headers,
    json={"upload_ids": [upload_id], "dataset_ids": [active_dataset_id]},
)
dataset_id = response.json().get("IId")

Step 9: Poll for Dataset Completion

Use a while loop to repeatedly send GET requests to check the status of the new dataset. Print progress messages and the response for debugging, and sleep for 2 seconds between checks. Break when the status is "complete".

while True:
    print("Waiting for dataset to complete")
    response = requests.get(
        team_project_url + f"/datasets/{dataset_id}",
        headers=headers,
    )
    if response.json().get("Status") == "complete":
        break
    print(response.json())
    time.sleep(2)

Step 10: Retrieve Dataset Dates

Send a POST request to the /datasets/{dataset_id}/dates endpoint with the dataset name as a query parameter and model-specific filters in the JSON payload. This returns a list of dates (data_dates) that the next step splits into training and evaluation ranges.

response = requests.post(
    team_project_url + f"/datasets/{dataset_id}/dates",
    headers=headers,
    params={"dataset_name": "Sales"},
    json=[
        {
            "country_code": model_country,
            "region_code": model_region,
            "grouping": model_grouping,
        }
    ],
)
data_dates = response.json()
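
The split in the next step assumes data_dates is in chronological order. If you are unsure whether the endpoint guarantees ordering, sorting defensively costs little (this assumes the dates are strings in a lexicographically sortable format such as ISO 8601):

data_dates = sorted(data_dates)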

Step 11: Split Dates for Training and Evaluation

Calculate a separator index at 75% of the length of data_dates, then slice the list into train_dates and eval_dates for model refitting.

separator_index = int(len(data_dates) * 0.75)
train_dates = data_dates[:separator_index]
eval_dates = data_dates[separator_index:]
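
As a concrete illustration with made-up dates: for a list of eight dates, int(8 * 0.75) is 6, so the first six dates go to training and the remaining two to evaluation.

# Hypothetical example with eight ISO-formatted dates:
example_dates = ["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22",
                 "2024-01-29", "2024-02-05", "2024-02-12", "2024-02-19"]
split = int(len(example_dates) * 0.75)  # 6
print(example_dates[:split])  # first six dates -> training
print(example_dates[split:])  # last two dates -> evaluation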

Step 12: Refit the Model

Send a POST request to the /models/{existing_model_id}/refit endpoint with details about the new dataset, dates, and model parameters in the JSON payload. This starts the refit process and returns a new model_id.

response = requests.post(
    team_project_url + f"/models/{existing_model_id}/refit",
    headers=headers,
    json={
        "model_detail": {
            "dataset_id": dataset_id,
            "train_dates": train_dates,
            "eval_dates": eval_dates,
        },
        "model_param": {"epochs": 500, "learning_rate": 0.001, "samples": 3},
    },
)
model_id = response.json()["IId"]

Step 13: Poll for Model Refit Completion

Use a while loop to repeatedly send GET requests to check the status of the refitted model. Print progress messages and the response for debugging, and sleep for 2 seconds between checks. Break when the status is "completed".

while True:
    print("Waiting for model to complete")
    response = requests.get(
        team_project_url + f"/models/{model_id}",
        headers=headers,
    )
    if response.json().get("Status") == "completed":
        break
    print(response.json())
    time.sleep(2)

Step 14: Create an Attribution

Send a POST request to the /attributions endpoint with the new model ID, dataset ID, date range, and samples in the JSON payload. This creates an attribution and returns an attribution_id.

response = requests.post(
    team_project_url + "/attributions",
    headers=headers,
    json={
        "model_id": model_id,
        "dataset_id": dataset_id,
        "start_date": data_dates[0],
        "end_date": data_dates[-1],
        "samples": 20,
    },
)
attribution_id = response.json()["IId"]

Step 15: Poll for Attribution Completion

Use a while loop to repeatedly send GET requests to check the status of the attribution. Print progress messages and the response for debugging, and sleep for 2 seconds between checks. Break when the status is "completed".

while True:
    print("Waiting for attribution to complete")
    response = requests.get(
        team_project_url + f"/attributions/{attribution_id}",
        headers=headers,
    )
    if response.json().get("Status") == "completed":
        break
    print(response.json())
    time.sleep(2)

Full Example Code

import time
import requests

url = "https://app.alviss.io/api/v1/api"
token = "<SET ME>"
team_id = "<SET ME>"
project_id = "<SET ME>"
file_path = "<SET ME>"

headers = {"Authorization": "Bearer " + token}
team_project_url = url + f"/projects/{team_id}/{project_id}"

# Model refit with new dataset
existing_model_id = 3
model_country = "SWE"
model_region = "all"
model_grouping = "all"

with open(file_path, "rb") as f:
    response = requests.post(
        team_project_url + "/datauploads",
        headers=headers,
        params={"dataset_name": "Sales"},
        files={"file": f},
    )
upload_id = response.json().get("upload_id")
while True:
    print("Waiting for upload to complete")
    response = requests.get(
        team_project_url + f"/datauploads/{upload_id}",
        headers=headers,
    )
    if response.json().get("Status") == "complete":
        break
    print(response.json())
    time.sleep(2)

# get active dataset
response = requests.get(
    team_project_url + "/datasets/active",
    headers=headers,
)
active_dataset_id = response.json()["IId"]

response = requests.post(
    team_project_url + "/datasets",
    headers=headers,
    json={"upload_ids": [upload_id], "dataset_ids": [active_dataset_id]},
)
dataset_id = response.json().get("IId")
while True:
    print("Waiting for dataset to complete")
    response = requests.get(
        team_project_url + f"/datasets/{dataset_id}",
        headers=headers,
    )
    if response.json().get("Status") == "complete":
        break
    print(response.json())
    time.sleep(2)

response = requests.post(
    team_project_url + f"/datasets/{dataset_id}/dates",
    headers=headers,
    params={"dataset_name": "Sales"},
    json=[
        {
            "country_code": model_country,
            "region_code": model_region,
            "grouping": model_grouping,
        }
    ],
)

data_dates = response.json()
separator_index = int(len(data_dates) * 0.75)
train_dates = data_dates[:separator_index]
eval_dates = data_dates[separator_index:]

response = requests.post(
    team_project_url + f"/models/{existing_model_id}/refit",
    headers=headers,
    json={
        "model_detail": {
            "dataset_id": dataset_id,
            "train_dates": train_dates,
            "eval_dates": eval_dates,
        },
        "model_param": {"epochs": 500, "learning_rate": 0.001, "samples": 3},
    },
)
model_id = response.json()["IId"]
while True:
    print("Waiting for model to complete")
    response = requests.get(
        team_project_url + f"/models/{model_id}",
        headers=headers,
    )
    if response.json().get("Status") == "completed":
        break
    print(response.json())
    time.sleep(2)

# run attribution
response = requests.post(
    team_project_url + "/attributions",
    headers=headers,
    json={
        "model_id": model_id,
        "dataset_id": dataset_id,
        "start_date": data_dates[0],
        "end_date": data_dates[-1],
        "samples": 20,
    },
)
attribution_id = response.json()["IId"]

while True:
    print("Waiting for attribution to complete")
    response = requests.get(
        team_project_url + f"/attributions/{attribution_id}",
        headers=headers,
    )
    if response.json().get("Status") == "completed":
        break
    print(response.json())
    time.sleep(2)