This vignette presents an example analysis that might resemble a real-world study in pharmacoepidemiology. For a quick look at the functions and utilities available in doseminer, see the Introduction to doseminer vignette.
Let’s import an example dataset containing prescriptions in free-text form. The data include product codes (prodcode) identifying the drugs prescribed; patient identifiers (patid); the start date of the prescription (date); the total quantity of drug prescribed (qty); and the free text (text) containing the dosage instructions for the medication.
Strictly speaking, doseminer uses only the free text, but combining its output with the other variables lets us make inferences about patients’ drug exposure.
data(cprd, package = 'doseminer')
str(cprd)
#> 'data.frame': 714 obs. of 6 variables:
#> $ id : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ patid : int 1359598 1359598 1359598 1359598 1359598 1359598 1359598 1359598 1359598 1359598 ...
#> $ date : Date, format: "2006-06-05" "2007-01-29" ...
#> $ prodcode: int 53 86 86 86 86 86 86 86 86 86 ...
#> $ qty : int 60 100 100 100 100 100 100 100 100 100 ...
#> $ text : chr "TAKE 1 OR 2 3 TIMES/DAY" "" "" "" ...
Next, we extract dosage information from the text. To avoid redundant computation, we remove duplicates, so each unique text string is processed only once. The results can then be joined back to the original prescriptions data, using the raw column from the output data frame.
The doseminer function extract_from_prescription() only takes a character vector as input (not yet a single-column data frame), so you should extract the text data as a vector before processing, for example with dplyr's pull() or, as here, with base R subsetting.
library(doseminer)
free_text <- with(cprd, text[!duplicated(text) & nchar(text) > 0])
extracted <- extract_from_prescription(free_text)
head(extracted)
#> raw freq itvl dose unit optional
#> 1 TAKE 1 OR 2 3 TIMES/DAY 3 1 1-2 <NA> 0
#> 2 TAKE 1 OR 2 FOUR TIMES A DAY WHEN REQUIRED 4 1 1-2 <NA> 1
#> 3 TAKE 1 OR 2 AS DIRECTED <NA> <NA> 1-2 <NA> 0
#> 4 TAKE 1 OR 2 4 TIMES/DAY AS REQUIRED 4 1 1-2 <NA> 1
#> 5 TAKE ONE TWICE DAILY 2 1 1 <NA> 0
#> 6 TAKE 1 OR 2 EVERY 6HRLY WHEN REQUIRED 4 1 1-2 <NA> 1
Now, we can relate the extracted prescription information back to the original dataset.
dosages <- merge(extracted, cprd, by.x = 'raw', by.y = 'text', all.x = TRUE)
head(dosages)
#> raw freq itvl dose
#> 1 1 OR 2 FOUR TIMES A DAY FOR FOR FOR FOR PAIN WHEN REQUIRED 4 1 1-2
#> 2 1 OR 2 FOUR TIMES A DAY FOR FOR FOR FOR PAIN WHEN REQUIRED 4 1 1-2
#> 3 1 OR 2 FOUR TIMES A DAY FOR FOR FOR FOR PAIN WHEN REQUIRED 4 1 1-2
#> 4 1 OR 2 FOUR TIMES A DAY FOR FOR FOR FOR PAIN WHEN REQUIRED 4 1 1-2
#> 5 1 OR 2 FOUR TIMES A DAY FOR FOR FOR FOR PAIN WHEN REQUIRED 4 1 1-2
#> 6 1 OR 2 FOUR TIMES A DAY FOR FOR FOR FOR PAIN WHEN REQUIRED 4 1 1-2
#> unit optional id patid date prodcode qty
#> 1 <NA> 1 625 17339378 2012-11-28 86 100
#> 2 <NA> 1 629 17339378 2013-04-02 86 100
#> 3 <NA> 1 627 17339378 2013-01-23 86 100
#> 4 <NA> 1 624 17339378 2012-10-05 86 100
#> 5 <NA> 1 633 17339378 2013-09-20 86 100
#> 6 <NA> 1 618 17339378 2012-04-12 86 100
The original data provided the total quantity of drug and the start date, but not an end date. Using the information that doseminer infers about daily dose, we can estimate the number of days the patient can go at that average dose before they run out of medication. Hence we estimate a window of time that a patient was taking (exposed to) the drug, which can be used to determine if adverse events (e.g. fractures, given as separate data) occurred during drug exposure or not.
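As a minimal sketch of that last step, the base R snippet below flags whether an adverse event falls inside any of a patient's exposure windows. The exposure and events data frames, their column names and all the dates are hypothetical toy values rather than part of the cprd dataset, and the sketch assumes at most one event per patient.

```r
# Hypothetical exposure windows (one row per prescription)
exposure <- data.frame(
  patid = c(1, 1, 2),
  start = as.Date(c("2012-01-01", "2012-03-01", "2012-02-01")),
  end   = as.Date(c("2012-01-17", "2012-03-17", "2012-02-10"))
)

# Hypothetical adverse events (assumed: at most one per patient)
events <- data.frame(
  patid      = c(1, 2),
  event_date = as.Date(c("2012-03-05", "2012-06-01"))
)

# Pair each event with every exposure window for the same patient
pairs <- merge(events, exposure, by = "patid")

# TRUE when the event date lies within that particular window
pairs$in_window <- pairs$event_date >= pairs$start &
  pairs$event_date <= pairs$end

# An event counts as exposed if it falls inside any window for that patient
exposed_any <- tapply(pairs$in_window, pairs$patid, any)
exposed_any
```

Here patient 1's event falls inside their second window, while patient 2's event occurs after their only window has ended.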
You might notice that some data are missing, either because the information isn’t explicitly mentioned in the prescription text or because the text itself was missing. In general, there is a range of methods one might use to impute or exclude such values. The topic is beyond the scope of doseminer, but it is the focus of an upcoming package called DrugPrepCPRD, which explores the ‘multiverse’ of possible imputation decisions.
For now, we will either (a) ignore incomplete prescriptions (complete case analysis) or (b) replace missing values with the mean for that patient and drug.
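The two options can be sketched in base R as follows. The rx data frame and its values are hypothetical, and freq is assumed to have already been converted to a number; this is an illustration of the idea, not a doseminer feature.

```r
# Hypothetical toy data: one row per prescription
rx <- data.frame(
  patid    = c(1, 1, 1, 2, 2),
  prodcode = c(86, 86, 86, 53, 53),
  freq     = c(4, NA, 2, 1, NA)
)

# (a) Complete case analysis: drop prescriptions with a missing frequency
complete_cases <- rx[!is.na(rx$freq), ]

# (b) Mean imputation: compute the mean of the observed values within each
# patient-drug group, then use it only where the original value is missing
group_mean <- ave(rx$freq, rx$patid, rx$prodcode,
                  FUN = function(x) mean(x, na.rm = TRUE))
rx$freq_imputed <- ifelse(is.na(rx$freq), group_mean, rx$freq)
rx
```

Note that a patient-drug group with no observed values at all has no mean to borrow, so option (b) cannot fill those rows and they would need a different rule.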
In other scenarios, you might see a range of doses, frequencies or intervals: for example “take 1-2” or “every 2-3 hours”. Again, you can choose how to summarise these values: taking the minimum, maximum or mean. If a dose is optional, you might want to include the value zero in this range. You should ensure your results are robust to this decision (again: see DrugPrepCPRD).
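One possible way to encode this choice is a small helper like the one below. summarise_range() is a hypothetical function written for this vignette, not part of doseminer, and it handles a single string at a time. The "mean" option is taken to be the midpoint of the range.

```r
# Hypothetical helper (not part of doseminer): collapse a range such as
# "1-2" to a single number
summarise_range <- function(x, method = c("mean", "min", "max"),
                            optional = FALSE) {
  method <- match.arg(method)
  # Split "1-2" into c(1, 2); a single value such as "3" stays as 3
  bounds <- as.numeric(strsplit(x, "-", fixed = TRUE)[[1]])
  # An optional dose might not be taken at all, so extend the range to zero
  if (optional) bounds <- c(0, bounds)
  switch(method,
         mean = mean(range(bounds)),  # midpoint of the range
         min  = min(bounds),
         max  = max(bounds))
}

summarise_range("1-2")                            # 1.5
summarise_range("1-2", "max")                     # 2
summarise_range("1-2", "mean", optional = TRUE)   # 1, midpoint of 0 and 2
```

Running the analysis once per method gives a quick check of how sensitive the estimated exposure windows are to this decision.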
The length of a prescription, in days, is defined as the total quantity of drug (qty) divided by the average number of units administered per day. In turn, the average number of units per day is calculated as the dose in each sitting, multiplied by the daily frequency (freq) and divided by the interval between ‘dose-days’ (itvl).
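As a worked example of this arithmetic, take the first merged prescription shown earlier: 100 units, a dose of 1-2 taken 4 times a day, starting on 2012-11-28. Summarising the dose range by its midpoint (1.5) is an assumption made for illustration.

```r
qty  <- 100   # total quantity prescribed
dose <- 1.5   # assumed midpoint of the range "1-2" per sitting
freq <- 4     # sittings per dose-day
itvl <- 1     # days between dose-days

# Average number of units taken per day
daily_dose <- dose * freq / itvl   # 6

# Days until the prescription runs out at that rate
duration <- qty / daily_dose       # about 16.7

# Estimated end of the exposure window
start_date <- as.Date("2012-11-28")
end_date   <- start_date + duration
end_date
```

Under these assumptions the patient takes 6 units a day, so 100 units last just under 17 days.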
Here is one way of estimating drug exposure windows for these data.
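library(dplyr)
library(tidyr)
library(ggplot2)
dosages %>%
  # Split a dose range such as "1-2" into numeric lower and upper bounds
  separate(dose, c('min_dose', 'max_dose'), sep = '-',
           convert = TRUE, fill = 'right') %>%
  mutate(
    # Midpoint of the range, or the single value when there is no upper bound
    dose = coalesce((min_dose + max_dose) / 2, min_dose),
    # Treat a missing interval as one day (i.e. taken every day)
    itvl = replace_na(as.numeric(itvl), 1),
    freq = as.numeric(freq),
    # Average number of units taken per day
    daily_dose = freq * dose / itvl,
    # Start date plus the number of days the supply lasts
    end_date = date + qty / daily_dose
  ) %>%
  # One horizontal bar per prescription, grouped by patient
  ggplot() +
  aes(y = as.factor(patid), xmin = date, xmax = end_date) +
  geom_errorbarh(height = .5) +
  ylab('patient ID')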