Dataset Versions

Dataset Immutability

Changes to Open Data Blend Datasets are always published as new dataset versions. Once a dataset version has been created, it never changes: it is immutable.

Every data file change is versioned. This means that all data file updates are reflected through new data file versions, and new data file versions result in new dataset versions.

The URLs for data files always point to a specific data file version. This allows you to share these links knowing that they will always download the same data. The dataset version URLs are currently only surfaced through the Open Data Blend Dataset API and are visible in the snapshot_path property.

Example of the Snapshot Path for a Dataset

{
    "profile": "data-package",
    "name": "open-data-blend-prescribing",
    "title": "Prescribing",
    "description": "NHS England prescriptions that have been dispensed in the UK",
    ...
    "snapshot_path": "https://packages.opendatablend.io/v1/open-data-blend-prescribing/20210715T154432Z/datapackage.json",
    ...
}
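As a minimal sketch, the version of a dataset can be read from its snapshot_path. The helper name snapshot_version is hypothetical, and the code assumes that the second-to-last URL segment is a UTC timestamp in the form shown in the example above (20210715T154432Z); that format is inferred from the example rather than documented here.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def snapshot_version(snapshot_path: str) -> datetime:
    """Parse the version timestamp embedded in a snapshot_path URL."""
    # Assumption: the second-to-last path segment is a UTC timestamp
    # formatted like 20210715T154432Z, as in the example above.
    segments = urlparse(snapshot_path).path.strip("/").split("/")
    parsed = datetime.strptime(segments[-2], "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Trimmed-down metadata containing only the properties used here.
metadata = {
    "name": "open-data-blend-prescribing",
    "snapshot_path": "https://packages.opendatablend.io/v1/open-data-blend-prescribing/20210715T154432Z/datapackage.json",
}

version = snapshot_version(metadata["snapshot_path"])
print(version.isoformat())
```

Recording this value alongside an analysis makes it easy to see which dataset version the results were based on.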

The Open Data Blend Datasets page and Open Data Blend Catalogue endpoint always point to the latest versions of datasets.

Reproducible Analysis and Research

A pleasant side-effect of Dataset Immutability is that you can produce analysis or research using an Open Data Blend Dataset knowing that others can reproduce the same result at a later point in time.

Bundling the data with the analysis or research output isn't always practical, especially when the data files are very large. If you are using the Open Data Blend Dataset UI, you can do the following to get the dataset and data file version URLs:

Click the 'Get metadata' button on the Dataset page.

Then save the Open Data Blend Dataset API response (datapackage.json).

Note that the versioned URL value can be seen in the snapshot_path property.

The snapshot_path property value is like a permalink but for Open Data Blend Datasets. Data file URLs always point to a specific version.

If you are using the Open Data Blend Dataset API, simply make a dataset request as usual and save the response to a long-term data store for future reference.
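One way to keep the API response is sketched below. The function names fetch_metadata and archive_metadata and the file naming scheme are hypothetical, and dataset_url stands for whichever Dataset API URL you already request; only the Python standard library is used.

```python
import json
import urllib.request
from pathlib import Path


def fetch_metadata(dataset_url: str) -> dict:
    """Request the dataset metadata (datapackage.json) and decode it."""
    with urllib.request.urlopen(dataset_url, timeout=30) as response:
        return json.load(response)


def archive_metadata(metadata: dict, directory: str) -> Path:
    """Write the metadata to disk so the versioned URLs are kept."""
    # Hypothetical naming scheme: <dataset name>.datapackage.json
    target = Path(directory) / f"{metadata['name']}.datapackage.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target


# Example usage (performs a network request, so it is left commented out):
# metadata = fetch_metadata(dataset_url)
# archive_metadata(metadata, "analysis/metadata")
```

Storing the file in the same repository or archive as the analysis keeps the snapshot_path and the versioned data file URLs together with the results they produced.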

Dataset versions will remain available for at least 24 months from the date they are superseded. We strongly recommend that you download and keep a local copy of any dataset versions and data file(s) where you need to guarantee that your analysis or research remains reproducible beyond this.
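Keeping a local copy of the data files could be sketched as follows. This assumes that the saved metadata follows the Frictionless Data Package layout suggested by its "profile" value, with a "resources" list whose entries hold the data file URL in a "path" property; that layout is an assumption and is not confirmed on this page. The function names resource_urls and download_all are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def resource_urls(metadata: dict) -> list[str]:
    """Collect the data file URLs listed in the dataset metadata."""
    # Assumption: each entry in "resources" has a "path" holding the file URL.
    urls = []
    for resource in metadata.get("resources", []):
        path = resource.get("path")
        if isinstance(path, str) and path.startswith(("http://", "https://")):
            urls.append(path)
    return urls


def download_all(metadata: dict, directory: str) -> list[Path]:
    """Download every listed data file into the given directory."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in resource_urls(metadata):
        # Caution: files that share a base name would overwrite each other.
        target = target_dir / Path(urlparse(url).path).name
        with urllib.request.urlopen(url, timeout=60) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(target)
    return saved
```

Because data file URLs always point to a specific version, the files downloaded from a saved datapackage.json match the dataset version used in the original analysis for as long as that version remains available.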
