Title: | Extract Drug Dosages from Free-Text Prescriptions |
---|---|
Description: | Utilities for converting unstructured electronic prescribing instructions into structured medication data. Extracts drug dose, units, daily dosing frequency and intervals from English-language prescriptions. Based on Karystianis et al. (2015) <doi:10.1186/s12911-016-0255-x>. |
Authors: | David Selby [aut, cre] |
Maintainer: | David Selby <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.3 |
Built: | 2025-02-21 03:29:19 UTC |
Source: | https://github.com/selbosh/doseminer |
Clean up raw prescription freetext
clean_prescription_text(txt)
txt | a character vector |
a character vector the same length as txt
clean_prescription_text(example_prescriptions)
A dataset containing product codes, patient identifiers, quantities, dates and free-text dose instructions, similar to data provided by the Clinical Practice Research Datalink (CPRD).
cprd
An object of class data.frame with 714 rows and 6 columns.
Variables in the data include
record identifier
patient identifier
date of start of prescription
product code; identifier for the prescribed medication
total quantity of medication prescribed
free text prescribing instructions
A named character vector. Names represent patterns to match dose units and values represent standardised names for those units.
drug_units
An object of class character of length 28.
Use with a function like str_replace_all to standardise a freetext prescription. Used internally in extract_from_prescription.
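As a rough illustration of how a named pattern vector of this kind can be applied, the sketch below uses base R only. The toy_units vector and the standardise_units helper are hypothetical stand-ins, not the contents of drug_units or part of doseminer. Each name is a regular expression and each value its replacement, applied in turn, much as str_replace_all does when given a named vector.

```r
# Hypothetical pattern -> standard-name pairs (NOT the real drug_units).
toy_units <- c(
  "\\btab(let)?s?\\b"              = "tab",
  "\\b(ml|millilit(re|er)s?)\\b"   = "ml",
  "\\bcap(sule)?s?\\b"             = "cap"
)

# Apply each pattern in order, replacing every match with its standard name.
standardise_units <- function(txt, replacements) {
  for (i in seq_along(replacements)) {
    txt <- gsub(names(replacements)[i], replacements[[i]], txt, perl = TRUE)
  }
  txt
}

standardised <- standardise_units(
  c("take 2 tablets", "5 millilitres twice daily", "one capsule"),
  toy_units
)
print(standardised)
```

Because the replacements run in sequence, the order of the vector can matter when one pattern's output could match a later pattern.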
Adapted from CPRD common dosages
example_cprd
An object of class character of length 28.
Various examples of how prescription data may be represented in free text.
example_prescriptions
An object of class character of length 27.
A function used internally in extract_from_prescription to parse the dosage units, such as millilitres, tablets, grams and so on. If there are multiple units mentioned in a string, only the first is returned.
extract_dose_unit(txt)
txt | a character vector |
A character vector the same length as txt, containing standardised units, or NA if no units were found in the prescription.
A simple wrapper around str_replace_all and str_extract.
Based on add_dose_unit.py from the original Python/Java algorithm.
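The "first unit only, otherwise NA" behaviour described above can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: first_unit and its default unit names are hypothetical, and the input is assumed to have been standardised already.

```r
# Return the first recognised unit in each string, or NA if none is present.
# `units` is a hypothetical list of already-standardised unit names.
first_unit <- function(txt, units = c("tab", "ml", "cap")) {
  pattern <- paste0("\\b(", paste(units, collapse = "|"), ")\\b")
  m <- regexpr(pattern, txt, perl = TRUE)   # position of first match, -1 if none
  out <- rep(NA_character_, length(txt))
  out[which(m > 0)] <- regmatches(txt, m)   # fill in only the strings that matched
  out
}

units_found <- first_unit(c("2 tab or 10 ml", "apply sparingly"))
print(units_found)
```

The result has the same length as the input, so it can be stored alongside the original text.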
This is the main workhorse function for the doseminer package. Pass in a character vector of prescribing instructions and it will extract structured dosage information.
extract_from_prescription(txt)
txt | A character vector of freetext prescriptions |
To avoid redundant computation, it is recommended to remove duplicate elements from the input vector. The results can be joined back to the original data using the raw column.
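One way to follow this advice is sketched below. The helper extract_unique and the toy_extractor function are hypothetical and exist only to keep the example self-contained; in practice the extractor argument would be extract_from_prescription, assuming its output keeps the input text in a raw column as described here.

```r
# Parse each distinct instruction once, then expand back to the original order.
# `extractor` must return a data.frame with a `raw` column holding its input.
extract_unique <- function(txt, extractor) {
  parsed <- extractor(unique(txt))
  out <- parsed[match(txt, parsed$raw), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy stand-in for the real extractor (hypothetical, for illustration only).
toy_extractor <- function(x) {
  data.frame(raw = x, n_chars = nchar(x), stringsAsFactors = FALSE)
}

instructions <- c("take 1 tablet daily", "2 puffs bd", "take 1 tablet daily")
result <- extract_unique(instructions, toy_extractor)
print(result)
```

Using match rather than merge keeps the rows in the same order as the original vector, which makes it straightforward to bind the result onto the source data.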
A data.frame with seven columns:
the input character vector
a residual character vector of 'non-extracted' text. For debugging.
number of doses administered per day
number of days between doses
quantity of medication in each dose
unit of measurement of medication, if any
integer. Can the dose be zero? 1 if yes, otherwise 0
extract_from_prescription(example_prescriptions)
Convert hourly to daily frequency
hourly_to_daily(txt)
txt | String of the form 'every n hours' |
An equivalent string of the form 'x / day'
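The arithmetic behind this conversion is 24 divided by the number of hours. A minimal sketch, assuming the input is exactly of the form 'every n hours' and that the output spacing matches 'x / day' (the helper name every_n_hours_to_daily is hypothetical, not the package function):

```r
# Convert 'every n hours' to 'x / day', where x = 24 / n.
every_n_hours_to_daily <- function(txt) {
  n <- as.numeric(sub("^every (\\d+(?:\\.\\d+)?) hours?$", "\\1", txt, perl = TRUE))
  paste(24 / n, "/ day")
}

daily <- every_n_hours_to_daily(c("every 4 hours", "every 6 hours"))
print(daily)
```

Input that does not fit the assumed form yields NA with a coercion warning in this sketch.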
A named character vector. Names represent Latin terms and values the English translations. Used for converting terms like "q4h" into "every 4 hours", which can then be parsed into a dosage frequency/interval.
latin_medical_terms
An object of class character of length 47.
Use with a function like str_replace_all to translate a prescription from Latin to English (thence to numbers).
https://en.wikipedia.org/wiki/List_of_abbreviations_used_in_medical_prescriptions
stringr::str_replace_all('Take two tablets q4h', latin_medical_terms)
Replaces written phrases like "2 x 5" with their arithmetic result (i.e. 10)
multiply_dose(axb)
axb | A string expression of the form 'A x B', where A and B are numeric |
An equivalent string giving the product of A and B.
If A is a range of values, a range of values is returned.
Used internally within extract_from_prescription.
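The idea can be sketched in a few lines of base R. The helper multiply_simple is hypothetical, and it assumes a range is written with a hyphen (for example '1-2 x 5'); the package's own handling may differ.

```r
# Multiply 'A x B'; if A is a hyphenated range, multiply both ends.
multiply_simple <- function(axb) {
  parts <- strsplit(axb, " x ", fixed = TRUE)[[1]]
  a <- as.numeric(strsplit(parts[1], "-", fixed = TRUE)[[1]])  # one value or two
  b <- as.numeric(parts[2])
  paste(a * b, collapse = "-")
}

single  <- multiply_simple("2 x 5")
ranged  <- multiply_simple("1-2 x 5")
print(c(single, ranged))
```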
For internal use in words2number. When passed as a replacement to a function like str_replace_all, it turns the string into an arithmetic expression that can be evaluated to give an integer representation of the named number.
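To make the mechanism concrete, the sketch below rewrites a number phrase as an arithmetic expression and evaluates it. The toy_numbers table covers only the words in this one phrase and is a hypothetical stand-in, not the contents of numb_replacements; to_expression is likewise an illustrative helper.

```r
# Hypothetical word -> arithmetic-fragment table (NOT the real numb_replacements).
toy_numbers <- c(
  "-"             = " ",     # split hyphenated numbers such as 'forty-two'
  "\\band\\b"     = "",
  "\\bhundred\\b" = "*100",
  "\\bforty\\b"   = "+40",
  "\\bone\\b"     = "+1",
  "\\btwo\\b"     = "+2"
)

# Apply each replacement in order to build an expression string.
to_expression <- function(txt, replacements) {
  for (i in seq_along(replacements)) {
    txt <- gsub(names(replacements)[i], replacements[[i]], txt, perl = TRUE)
  }
  txt
}

expr  <- to_expression("one hundred and forty-two", toy_numbers)
value <- eval(parse(text = expr))   # evaluates +1 *100 +40 +2
print(value)
```

Operator precedence does the work here: the multiplier binds to the preceding digit before the remaining terms are added.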
numb_replacements
An object of class character of length 49.
Lifted from Ben Marwick's words2number package and converted into a named vector (previously a chain of gsub calls).
Does not yet fully support decimals, fractions or mixed fractions. Some limited support for 'half' expressions, e.g. 'one and a half'.
https://github.com/benmarwick/words2number
## Not run:
stringr::str_replace_all('one hundred and forty-two', numb_replacements)
## End(Not run)
A regex pattern to identify natural language English number phrases, such as "one hundred and fifty" or "thirty-seven". Used internally by replace_numbers to identify substrings to replace with their decimal representation.
regex_numbers
An object of class character of length 1.
This is a PCRE (Perl type) regular expression, so it must be used with perl = TRUE in base R regex functions. The packages stringr and stringi are based on the alternative ICU regular expression engine, so they cannot use this pattern.
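The sketch below shows the perl = TRUE calling convention with a small pattern that relies on a PCRE-specific construct, a (?(DEFINE)...) group with a named subroutine call. The pattern is a hypothetical miniature written for this example and is far simpler than regex_numbers itself.

```r
# A tiny PCRE pattern: define a named group once, then call it as a subroutine.
pattern <- "(?(DEFINE)(?<small>one|two|three))\\b(?&small)\\b"

txt <- "take two tablets"

# perl = TRUE selects the PCRE engine in base R regex functions.
found <- regmatches(txt, regexpr(pattern, txt, perl = TRUE))
print(found)
```

The same convention applies to grepl, sub, gsub and gregexpr when working with patterns of this type.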
There is limited support for fractional expressions like "one half". The original pattern did not support expressions like "a thousand", but it has been adapted to offer (experimental) support for this. Phrases like "million" or "thousand" with no prefix will not match.
https://www.rexegg.com/regex-trick-numbers-in-english.html
Uses numb_replacements to match parts of a string corresponding to numbers, then invokes words2number to convert these substrings to numeric. The rest of the string (the non-number words) is left intact.
replace_numbers(string)
string | A character vector. Can contain numbers and other text |
Works on non-negative integer numbers under one billion (one thousand million). Does not support fractions or decimals (yet).
A character vector the same length as string, with words replaced by their decimal representations.
words2number, for use on cleaned text that does not contain any non-number words
replace_numbers('Two plus two equals four')
replace_numbers('one hundred thousand dollars!')
replace_numbers(c('A vector', 'containing numbers', 'like thirty seven'))
Convert weekly interval to daily interval
weekly_to_daily(Dperweek)
Dperweek | String of the form 'n / week' |
An equivalent string of the form 'x / day'
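A minimal sketch of one plausible reading of this conversion, assuming 'n / week' means n doses per week, so that the daily figure is n divided by 7. The helper per_week_to_per_day is hypothetical, and the package's exact output format may differ.

```r
# Convert 'n / week' to 'x / day', where x = n / 7.
per_week_to_per_day <- function(txt) {
  n <- as.numeric(sub("^(\\d+(?:\\.\\d+)?) / week$", "\\1", txt, perl = TRUE))
  paste(n / 7, "/ day")
}

per_day <- per_week_to_per_day(c("7 / week", "14 / week"))
print(per_day)
```

Values of n that are not multiples of 7 produce a decimal fraction in this sketch, which would need rounding or further handling downstream.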
Convert English names of numbers to their numerical values
words2number(txt)
txt | A character vector containing names of numbers (only). |
A named numeric vector of the same length as txt.
Originally adapted from the words2number package by Ben Marwick.
words2number('seven')
words2number('forty-two')
words2number(c('three', 'one', 'twenty two thousand'))