Datasets
An Open Data Blend Dataset is a collection of analytics-ready data files packaged with rich metadata.
You can access Open Data Blend Datasets in two ways:
Open Data Blend Dataset UI:
  - Targets the broader community of information workers.
  - Suited to ad-hoc dataset acquisition and evaluation.
  - Reading the documentation is optional.
Open Data Blend Dataset API:
  - Targets technical individuals who are comfortable with code.
  - Suited to integrating datasets into a broader solution.
  - Reading the documentation is recommended.
Data Files
To maximise the accessibility and usefulness of the data that we publish, we support the three most popular open data file formats:
  - Compressed (Gzip) CSV
  - Apache ORC
  - Apache Parquet
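Compressed (Gzip) CSV files can be read in Python using only the standard library. The sketch below is a minimal illustration that assumes UTF-8 encoded files with a header row. The function name `read_gzip_csv` and the sample column names are hypothetical and are not part of any Open Data Blend API. Reading ORC or Parquet from Python usually relies on a third-party library such as pyarrow, so those formats are not shown here.

```python
import csv
import gzip
import io


def read_gzip_csv(source):
    """Read a Gzip-compressed CSV (file path or binary file object) into a list of dicts.

    Assumes UTF-8 text and a header row; every value is returned as a string.
    """
    # Mode "rt" decompresses and decodes to text in one step.
    # newline="" lets the csv module handle line endings itself.
    with gzip.open(source, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Usage: build a tiny in-memory sample (hypothetical columns) and read it back.
sample = gzip.compress(b"year,value\n2020,1\n2021,2\n")
rows = read_gzip_csv(io.BytesIO(sample))
print(rows)  # [{'year': '2020', 'value': '1'}, {'year': '2021', 'value': '2'}]
```

For a file on disk, pass its path instead of a file object, since `gzip.open` accepts either.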
The data file format you choose will often depend on the tools and platforms you intend to use it with.
Below are some example format choices for common platforms and tools:
Platform or Tool    Supported Formats    Chosen Format
----------------    -----------------    -------------
Apache Spark        CSV, ORC, Parquet    Parquet
Apache Hive         CSV, ORC, Parquet    ORC
Presto              CSV, ORC, Parquet    ORC
Power BI Desktop    CSV, Parquet         CSV
Python              CSV, ORC, Parquet    Parquet
R                   CSV, Parquet         Parquet
Tableau Desktop     CSV                  CSV
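The right format depends on the whole set of tools in use, not on any single one. A quick way to narrow the options is to intersect the supported formats of every tool you plan to use. The following sketch copies the Supported Formats column into a dictionary. `SUPPORTED_FORMATS` and `shared_formats` are hypothetical names chosen for illustration. The result only tells you which formats are compatible with every tool. It does not tell you which one to pick.

```python
# Supported formats per platform or tool, copied from the table (illustrative only).
SUPPORTED_FORMATS = {
    "Apache Spark": {"CSV", "ORC", "Parquet"},
    "Apache Hive": {"CSV", "ORC", "Parquet"},
    "Presto": {"CSV", "ORC", "Parquet"},
    "Power BI Desktop": {"CSV", "Parquet"},
    "Python": {"CSV", "ORC", "Parquet"},
    "R": {"CSV", "Parquet"},
    "Tableau Desktop": {"CSV"},
}


def shared_formats(tools):
    """Return the formats supported by every tool in `tools`, sorted by name.

    Raises KeyError for a tool that is not listed in SUPPORTED_FORMATS.
    """
    # Start from all published formats and keep only those each tool supports.
    shared = {"CSV", "ORC", "Parquet"}
    for tool in tools:
        shared &= SUPPORTED_FORMATS[tool]
    return sorted(shared)


print(shared_formats(["Apache Spark", "Power BI Desktop"]))  # ['CSV', 'Parquet']
print(shared_formats(["Python", "Tableau Desktop"]))  # ['CSV']
```

When more than one format remains, the final pick comes down to tool-specific preference. The Chosen Format column shows this: Apache Hive and Presto both support all three formats, and ORC is chosen for them.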