Modelling Conventions
All Open Data Blend datasets are created using dimensional modelling techniques to optimise the data for analysis. We carefully transform the source data into facts (i.e. tables with columns containing measurable values, such as the number of prescription items) and dimensions (i.e. tables with columns containing descriptive values, such as the name of a prescribed medicine).
There are many resources that explain what these modelling practices are and why they exist. Rather than reinventing the wheel, we recommend looking through the comprehensive material published by the Kimball Group.
We use the following column prefixes in our data files:
Prefix    Description
nlm_      Metadata column*
nll_      Licence information column*
src_      Source column
srk_      Source column that is also a natural key
drv_      Derived column*
drk_      Derived column that is also a natural key*
*The values in these columns either originate from Open Data Blend or are derived from source columns by Open Data Blend.
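As a rough illustration of how the prefixes above might be used when exploring a data file, the following Python sketch maps a column name to the meaning of its prefix. The function name `describe_column` and the column name `src_items` are hypothetical and used only for illustration; they are not part of any Open Data Blend tooling.

```python
# Prefix meanings copied from the table above.
PREFIX_MEANINGS = {
    "nlm_": "Metadata column",
    "nll_": "Licence information column",
    "src_": "Source column",
    "srk_": "Source column that is also a natural key",
    "drv_": "Derived column",
    "drk_": "Derived column that is also a natural key",
}


def describe_column(column_name):
    """Return the meaning of the column's prefix, or None if no known prefix matches."""
    for prefix, meaning in PREFIX_MEANINGS.items():
        if column_name.startswith(prefix):
            return meaning
    return None


# Hypothetical column names, used only to exercise the lookup.
for name in ["drv_prescribing_practice_key", "src_items", "unknown_column"]:
    print(name, "->", describe_column(name))
```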
All column names are lower case and alphanumeric (no special characters), with words separated by underscores. This ensures the broadest level of compatibility across tools and platforms.
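The naming rule can be expressed as a regular expression. The sketch below is one possible reading of it, not an official validator: it assumes that names consist of lower-case ASCII letters and digits in groups separated by single underscores, so leading, trailing, and doubled underscores are rejected. That stricter interpretation is an assumption. The function name `is_valid_column_name` is hypothetical.

```python
import re

# Assumed interpretation: groups of lower-case letters or digits,
# separated by single underscores.
COLUMN_NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_valid_column_name(name):
    """Return True if the whole name matches the assumed naming pattern."""
    return COLUMN_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_column_name("drv_prescribing_practice_key"))  # lower case with underscores
print(is_valid_column_name("Drv_Practice_Key"))              # contains upper-case letters
print(is_valid_column_name("drv-practice-key"))              # contains hyphens
```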
Acronyms are expanded into their constituent words unless the acronym is very well known and unambiguous. Even then, it may still be expanded.
Foreign key references are implied. It's safe to assume that whenever two tables each have a column whose name starts with drv_ and ends with _key (e.g. drv_prescribing_practice_key), these columns are intended to be joined.
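To make the implied-join convention concrete, here is a minimal Python sketch that finds the columns two tables could be joined on. It assumes the join columns carry the same name in both tables, which is an interpretation of the convention rather than something stated explicitly. The function name `implied_join_keys` and every column name other than drv_prescribing_practice_key are hypothetical.

```python
def implied_join_keys(left_columns, right_columns):
    """Return the shared column names that start with 'drv_' and end with '_key'."""
    shared = set(left_columns) & set(right_columns)
    return sorted(
        name for name in shared
        if name.startswith("drv_") and name.endswith("_key")
    )


# Hypothetical column lists for a fact table and a dimension table.
fact_columns = ["drv_prescribing_practice_key", "drv_date_key", "src_items"]
dimension_columns = ["drv_prescribing_practice_key", "src_practice_name"]

print(implied_join_keys(fact_columns, dimension_columns))
```

The returned names can then be passed as the join columns to whichever tool is being used to query the data.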