# Loading Data Files in R

## Supported Formats

R can load the following data file formats:

* Compressed CSV (`.csv.gz`)
* Parquet (`.parquet`)

## Download the Data Files

Download and save the data files to a suitable location. In the examples that follow, the data has been saved to `C:\data`.

{% hint style="info" %}
Although you could load data files directly from the data file URLs, this is not recommended because you may quickly hit usage limits or incur additional costs. We always recommend saving the files locally or to cloud storage first using the Open Data Blend Dataset UI, Open Data Blend Dataset API, or [Open Data Blend for Python](https://github.com/opendatablend/opendatablend-py).
{% endhint %}
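The examples in this guide hard-code Windows paths with escaped backslashes. As a minimal sketch of an alternative, the paths can be built with base R's `file.path()`, which joins components with forward slashes (R accepts these on Windows). The variable names `data_dir`, `date_csv_path`, and `date_parquet_path` are illustrative and not part of the original examples.

```r
# Base directory holding the downloaded files (the location assumed in this guide)
data_dir <- "C:/data"

# file.path() joins the components using "/" as the separator
date_csv_path <- file.path(data_dir, "date", "date.csv.gz")
date_parquet_path <- file.path(data_dir, "date", "date.parquet")

print(date_csv_path)
print(date_parquet_path)
```

Keeping the base directory in one variable means only a single line needs to change if the files are moved.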

## Loading Compressed (Gzip) CSV Data Files

The following steps show how to load compressed (Gzip) CSV data files in R.

Read the entire compressed (Gzip) CSV data file directly into a data frame.

```r
df_date <- read.csv("C:\\data\\date\\date.csv.gz")
```
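For larger files it can help to preview a few rows, or to declare column types up front, before reading everything. The sketch below illustrates both using the standard `nrows` and `colClasses` arguments of `read.csv()`. So that it runs anywhere, it first writes a small gzip-compressed CSV to a temporary file; the sample data and column names (`date_key`, `label`) are hypothetical and do not reflect the real data files.

```r
# Create a small gzip-compressed CSV in a temporary location (illustrative data only)
tmp_csv <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp_csv, "w")
write.csv(data.frame(date_key = 1:5, label = letters[1:5]), con, row.names = FALSE)
close(con)

# Preview the first three rows; read.csv() decompresses the file transparently
df_preview <- read.csv(tmp_csv, nrows = 3)

# Read the whole file with explicit column types
df_typed <- read.csv(tmp_csv, colClasses = c("integer", "character"))

str(df_typed)
```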

## Loading Parquet Data Files

The following steps show how to load Parquet data files in R.

Install the `arrow` package.

```r
install.packages("arrow")
```
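Calling `install.packages()` reinstalls the package each time it runs, so a script that is executed repeatedly may prefer to check whether `arrow` is already available. The following is one possible sketch using base R's `requireNamespace()`; the variable name `arrow_available` is illustrative.

```r
# TRUE if the arrow package is installed and loadable, FALSE otherwise
arrow_available <- requireNamespace("arrow", quietly = TRUE)

if (!arrow_available) {
  # Prompt for installation rather than installing silently
  message("The 'arrow' package is not installed. Run install.packages(\"arrow\") first.")
}
```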

Load the `arrow` package.

```r
library(arrow)
```

Read the Parquet data file into a data frame.

```r
df_date <- read_parquet("C:\\data\\date\\date.parquet")
df_anonymised_mot_test_result_info <- read_parquet("C:\\data\\anonymised_mot_test_result_info\\anonymised_mot_test_result_info.parquet")
```

Read a subset of the columns from the Parquet data file into a data frame.

```r
df_mot_results_2017 <- read_parquet("C:\\data\\anonymised_mot_test_result\\anonymised_mot_test_result_2017.parquet", col_select = c("drv_anonymised_mot_test_date_key", "drv_anonymised_mot_test_result_info_key"))
```

{% hint style="info" %}
When working with larger data files, it is good practice to read only the required columns because doing so reduces read times, memory footprint, and processing times.
{% endhint %}
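Choosing columns requires knowing their names. One way to list them without loading the rows, sketched here under the assumption that the installed `arrow` build includes dataset support, is to open the file lazily with `open_dataset()` and inspect its names. The example writes a small temporary Parquet file so that it is self-contained; the sample data and column names are hypothetical.

```r
library(arrow)

# Write a small illustrative Parquet file to a temporary location
tmp_parquet <- tempfile(fileext = ".parquet")
write_parquet(
  data.frame(
    date_key = 1:3,
    result_key = c(10L, 20L, 30L),
    note = c("a", "b", "c")
  ),
  tmp_parquet
)

# Open the file lazily and list its column names without reading the rows
ds <- open_dataset(tmp_parquet)
available_columns <- names(ds)
print(available_columns)

# Read only the columns of interest into a data frame
df_subset <- read_parquet(tmp_parquet, col_select = c("date_key", "result_key"))
```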

## Using R for Data Analysis

Guidance on how to analyse data in R is beyond the scope of this documentation.

You may find the following helpful:

* [The R Manuals](https://cran.r-project.org/manuals.html)
* [Introduction to dplyr](https://dplyr.tidyverse.org/articles/dplyr.html)


