Open Data Blend Docs


Loading Data Files in R


Last updated 1 year ago


Supported Formats

R can load the following data file formats:

  • Compressed CSV (.csv.gz)

  • Parquet (.parquet)

Download the Data Files

Download and save the data files to a suitable location. In the examples that follow, the data has been saved to C:\data.

Although you could load data files directly from the data file URLs, this is not recommended because you may quickly hit usage limits or incur additional costs. We recommend first saving the files locally or to cloud storage using the Open Data Blend Dataset UI or the Open Data Blend Dataset API.
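If you script the download yourself, a small helper that skips files already on disk avoids repeated requests against the usage limits. The following is a minimal sketch, not an official client: the function name download_if_missing and the URL in the commented usage line are hypothetical placeholders, and you would substitute a real data file URL obtained from the Dataset UI or Dataset API.

```r
# Hypothetical helper: download a data file only if it is not already saved locally.
download_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # Create the target folder if needed.
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    # mode = "wb" keeps binary files (.csv.gz, .parquet) intact on Windows.
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Example usage (placeholder URL, replace with a real data file URL):
# download_if_missing("https://example.com/path/to/date.csv.gz",
#                     "C:\\data\\date\\date.csv.gz")
```

Running the helper a second time with the same destination does nothing, so it is safe to leave in a script that is executed repeatedly.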

Loading Compressed (Gzip) CSV Data Files

Use the steps below as a guide to loading compressed (Gzip) CSV data files in R.

Read the entire compressed (Gzip) CSV data file directly into a data frame.

df_date <- read.csv("C:\\data\\date\\date.csv.gz")
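For larger compressed CSV files it can help to preview a few rows first and to declare column types up front. The sketch below uses only base R and builds a tiny throwaway file so that it runs on its own; the column names date_key and year are illustrative assumptions, not the actual columns of any Open Data Blend data file.

```r
# Build a tiny gzip-compressed CSV in a temporary folder (illustrative data only).
demo_path <- file.path(tempdir(), "demo_date.csv.gz")
con <- gzfile(demo_path, "w")
write.csv(data.frame(date_key = 1:3, year = c(2017L, 2017L, 2018L)),
          con, row.names = FALSE)
close(con)

# Preview only the first two rows; read.csv decompresses the file transparently.
df_preview <- read.csv(demo_path, nrows = 2)

# Read the whole file with explicit column types instead of letting R guess them.
df_typed <- read.csv(demo_path,
                     colClasses = c(date_key = "integer", year = "integer"))

str(df_typed)
```

Declaring colClasses spares R the type inference pass and avoids surprises such as keys being read as the wrong type.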

Loading Parquet Data Files

Use the steps below as a guide to loading Parquet data files in R.

Install the arrow package.

install.packages("arrow")

Load the arrow package.

library(arrow)

Read the Parquet data file into a data frame.

df_date <- read_parquet("C:\\data\\date\\date.parquet")
df_anonymised_mot_test_result_info <- read_parquet("C:\\data\\anonymised_mot_test_result_info\\anonymised_mot_test_result_info.parquet")

Read a subset of the columns from the Parquet data file into a data frame.

df_mot_results_2017 <- read_parquet(
  "C:\\data\\anonymised_mot_test_result\\anonymised_mot_test_result_2017.parquet",
  col_select = c("drv_anonymised_mot_test_date_key", "drv_anonymised_mot_test_result_info_key")
)

When working with larger data files, it is good practice to read only the columns you need. Parquet is a columnar format, so unselected columns are not loaded, which reduces read times, memory footprint, and processing times.
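The effect of column selection on memory can be checked with a small experiment. This is a rough sketch that assumes the arrow package is installed; the demo data and its column names (test_date_key, result_info_key, notes) are made up for illustration and do not correspond to a real data file.

```r
library(arrow)

# Illustrative data standing in for a real data file (made-up column names).
demo <- data.frame(
  test_date_key = 1:1000,
  result_info_key = rep(1:5, times = 200),
  notes = paste("free text value", 1:1000)
)
demo_path <- file.path(tempdir(), "demo_result.parquet")
write_parquet(demo, demo_path)

# Read every column, then only the two key columns.
df_all <- read_parquet(demo_path)
df_subset <- read_parquet(demo_path,
                          col_select = c("test_date_key", "result_info_key"))

# Compare the in-memory size of the two data frames.
print(object.size(df_all), units = "Kb")
print(object.size(df_subset), units = "Kb")
```

The difference grows with the width of the columns you leave out, so wide text columns are usually the first candidates to drop.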

Using R for Data Analysis

Guidance on how to analyse data in R is beyond the scope of this documentation.

You may find the following helpful:

  • Open Data Blend for Python

  • The R Manuals

  • Introduction to dplyr