Datasets
Organize data in Alviss AI with Datasets that group compatible uploads into immutable collections for models, simulations, predictions, and more.
Overview
Datasets in Alviss AI are collections that merge selected uploads into a unified structure, serving as the foundation for building models, running simulations and predictions, and generating insights. By grouping related data files (e.g., Sales, Media, Brand), Datasets enable consistent analysis and ensure that every action, such as training a model or creating attributions, is traceable back to the exact data used.
Any platform action (e.g., model fitting, simulation runs) logs the associated Dataset, allowing you to review what a Dataset has been used for. This traceability supports auditing, reproducibility, and collaboration within your team.
We treat Datasets as immutable—any update (e.g., adding new uploads or variables) creates a new Dataset rather than modifying the existing one. This immutability ensures strong traceability, so you always know the precise data behind each action or insight.
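The immutability rule can be illustrated with a small Python sketch. This is not the Alviss AI API: the `Dataset` class, its fields, and the `extend` method are hypothetical names chosen only to show the idea that an update produces a new object and leaves the original untouched.

```python
from dataclasses import dataclass, field
from typing import Tuple
import uuid


@dataclass(frozen=True)
class Dataset:
    """Hypothetical immutable Dataset; fields cannot be reassigned after creation."""
    uploads: Tuple[str, ...]
    variables: Tuple[str, ...] = ()
    # Each Dataset gets its own identifier so actions can reference it exactly.
    dataset_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def extend(self, uploads=(), variables=()):
        # Never modify in place: build and return a brand-new Dataset.
        return Dataset(
            uploads=self.uploads + tuple(uploads),
            variables=self.variables + tuple(variables),
        )


# Illustrative file and variable names only.
v1 = Dataset(uploads=("sales.csv", "media.csv"))
v2 = v1.extend(uploads=["brand.csv"], variables=["temperature"])
```

In this sketch `v1` still lists only its two original uploads after `extend` is called, while `v2` carries a different identifier, so anything that recorded `v1.dataset_id` keeps pointing at exactly the data it used.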
In short, a Dataset groups the chosen uploaded files together so they can be used for training models and drawing insights, which makes uploaded data easier to track and use.
New Dataset
After uploading your data files via Uploads, create a Dataset to organize and activate them for use. In the Dataset creation flow, you can:
- Select which uploads to include, based on their data types and compatibility.
- Extend an existing Dataset with additional uploads, which is especially useful for incorporating new data sources or updating with recent observations (this creates a new Dataset per the immutability rule).
- Add external variables, such as macro indicators from Macro or Weather. Choose the variables of interest, and they will be added to your Dataset automatically.
To create a new Dataset:
- Navigate to Data > Datasets in the side menu.
- Click Create New Dataset.
- Choose files from your Data Uploads list.
- Optionally extend an existing Dataset or add external variables.
- Review and confirm—the platform validates for consistency (e.g., matching periodicity).
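As a rough illustration of what a consistency check such as "matching periodicity" could look like, here is a minimal Python sketch. It is an assumption made for illustration, not the platform's actual validation logic: the `check_periodicity` function, the upload names, and the periodicity labels are all hypothetical.

```python
def check_periodicity(uploads):
    """Return the shared periodicity, or raise ValueError if uploads disagree.

    `uploads` maps a hypothetical upload name to its periodicity label,
    for example {"sales": "weekly", "media": "weekly"}.
    """
    if not uploads:
        raise ValueError("No uploads selected")
    found = set(uploads.values())
    if len(found) > 1:
        # More than one distinct label means the uploads cannot be merged as-is.
        raise ValueError(f"Mismatched periodicity: {sorted(found)}")
    return found.pop()


shared = check_periodicity({"sales": "weekly", "media": "weekly"})
```

A selection that mixes, say, weekly and daily uploads would raise `ValueError` in this sketch, which is the kind of situation where inspecting the underlying uploads becomes necessary.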
It is possible to extend an existing Dataset, which is especially useful when you want to add a new data source or update your data with new observations.
Active Dataset
For a Dataset to serve as the default in your project (e.g., for visualizations or new models), it must be activated. The Active Dataset is automatically used in features like the Activities dashboard and selected by default for model training, attributions, simulations, and other insights.
To activate a Dataset:
- During creation: Enable the activation option in the flow.
- From the Dataset list: Click the lightning bolt icon next to the desired Dataset.
- From the Dataset details page: Select the activation button.
You can switch the Active Dataset at any time, but only one can be active per project. View the details page for a Dataset to see its usage history, included uploads, variables, and traceability logs.
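The "only one Active Dataset per project" rule can be pictured as a single pointer that is overwritten on each activation. The Python sketch below is a hypothetical model, not the Alviss AI API; `Project`, `add_dataset`, and `activate` are names invented for this example.

```python
class Project:
    """Hypothetical project that tracks at most one active Dataset."""

    def __init__(self):
        self.dataset_ids = []
        self.active_id = None  # no Dataset is active until one is activated

    def add_dataset(self, dataset_id, activate=False):
        self.dataset_ids.append(dataset_id)
        if activate:
            # Mirrors enabling the activation option during creation.
            self.activate(dataset_id)

    def activate(self, dataset_id):
        if dataset_id not in self.dataset_ids:
            raise KeyError(dataset_id)
        # A single slot: activating one Dataset replaces the previous one.
        self.active_id = dataset_id


project = Project()
project.add_dataset("ds-1", activate=True)
project.add_dataset("ds-2")
project.activate("ds-2")
```

Because the active Dataset is stored in one slot, switching to `ds-2` implicitly deactivates `ds-1` without deleting it, so it stays available in the list.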
For more on using Datasets in workflows, see Models, Simulations, or Predictions. If validation fails during creation, use the [File Debugger](./Uploads#debugging uploads) on underlying uploads.