# Datasets

An Open Data Blend Dataset is a collection of analytics-ready data files packaged with rich metadata.

You can access Open Data Blend Datasets in two ways:

* **Open Data Blend Dataset UI:**
  * Targets the broader community of information workers
  * For ad-hoc dataset acquisition and evaluation
  * Reading the documentation is optional

* **Open Data Blend Dataset API:**
  * Targets technical individuals who are comfortable with code
  * For integrating datasets with a broader solution
  * Reading the documentation is recommended

## Data Files

To maximise the accessibility and usefulness of the data that we publish, we support the three most popular open data file formats:

* Compressed (Gzip) CSV
* Apache ORC
* Apache Parquet

{% hint style="info" %}
Uncompressed CSV data files are only available as previews containing the first 100 rows. The full versions of CSV data files are always Gzip compressed to reduce download times and save disk space.
{% endhint %}
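
Because the full CSV data files are Gzip compressed, they can be read as a stream without first decompressing them to disk. The following is a minimal Python sketch using only the standard library. The file name `example.csv.gz`, its columns, and the helper `read_gzip_csv` are hypothetical; the snippet writes a tiny sample file so that it runs on its own, and it is not taken from an actual Open Data Blend dataset.

```python
import csv
import gzip
import os
import tempfile


def read_gzip_csv(path, limit=None):
    """Return rows from a Gzip-compressed CSV file as a list of dicts.

    If limit is given, stop after that many rows (handy for a quick preview).
    """
    rows = []
    # "rt" opens the compressed stream in text mode; newline="" is what the
    # csv module expects so that it can handle line endings itself.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for index, row in enumerate(csv.DictReader(handle)):
            if limit is not None and index >= limit:
                break
            rows.append(row)
    return rows


# Hypothetical sample data, written only so that the sketch is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "example.csv.gz")
with gzip.open(sample_path, mode="wt", encoding="utf-8", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["id", "name"])
    writer.writerows([[1, "alpha"], [2, "beta"], [3, "gamma"]])

# Read only the first two rows of the compressed file.
preview = read_gzip_csv(sample_path, limit=2)
print(preview)
```

Passing `limit` keeps memory use small when you only want to inspect the first few rows of a large file.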

The data file formats you choose will usually depend on the tools and platforms that you intend to use them with.

{% hint style="info" %}
A Gzip CSV or ORC data file is typically 40-50% smaller than the corresponding Parquet version. Only the ORC and Parquet data files are optimal for interactive analytical workloads.
{% endhint %}

The table below shows example format choices for some common platforms and tools:

| Platform or Tool | Supported Formats | Chosen Format |
| ---------------- | ----------------- | ------------- |
| Apache Spark     | CSV, ORC, Parquet | Parquet       |
| Apache Hive      | CSV, ORC, Parquet | ORC           |
| Presto           | CSV, ORC, Parquet | ORC           |
| Power BI Desktop | CSV, Parquet      | CSV           |
| Python           | CSV, ORC, Parquet | Parquet       |
| R                | CSV, Parquet      | Parquet       |
| Tableau Desktop  | CSV               | CSV           |

{% hint style="info" %}
In the table above 'CSV' refers to both uncompressed and compressed (Gzip) CSV data files.
{% endhint %}
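
If you automate dataset downloads, the table above could be expressed as a simple lookup. The sketch below is illustrative only: `FORMAT_CHOICES` and `choose_format` are hypothetical names, and the fallback rule (use the first supported format that is available when the preferred one is not) is an assumption rather than documented behaviour.

```python
# Supported formats per tool and the example choice, copied from the table above.
FORMAT_CHOICES = {
    "Apache Spark": (("CSV", "ORC", "Parquet"), "Parquet"),
    "Apache Hive": (("CSV", "ORC", "Parquet"), "ORC"),
    "Presto": (("CSV", "ORC", "Parquet"), "ORC"),
    "Power BI Desktop": (("CSV", "Parquet"), "CSV"),
    "Python": (("CSV", "ORC", "Parquet"), "Parquet"),
    "R": (("CSV", "Parquet"), "Parquet"),
    "Tableau Desktop": (("CSV",), "CSV"),
}


def choose_format(tool, available):
    """Pick a data file format for a tool from the formats a dataset offers.

    Returns the table's example choice when it is available. Otherwise falls
    back to the first supported format that is available (an assumed rule),
    or None if there is no overlap.
    """
    supported, preferred = FORMAT_CHOICES[tool]
    if preferred in available:
        return preferred
    for candidate in supported:
        if candidate in available:
            return candidate
    return None


print(choose_format("Apache Hive", {"CSV", "ORC", "Parquet"}))
print(choose_format("R", {"CSV", "ORC"}))
```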

