Package 'doseminer' reference manual

Title:	Extract Drug Dosages from Free-Text Prescriptions
Description:	Utilities for converting unstructured electronic prescribing instructions into structured medication data. Extracts drug dose, units, daily dosing frequency and intervals from English-language prescriptions. Based on Karystianis et al. (2015) <doi:10.1186/s12911-016-0255-x>.
Authors:	David Selby [aut, cre], Belay Birlie Yimer [ctb], Ben Marwick [ctb]
Maintainer:	David Selby <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.3
Built:	2025-02-21 03:29:19 UTC
Source:	https://github.com/selbosh/doseminer

Clean up raw prescription freetext

Description

Clean up raw prescription freetext

Usage

clean_prescription_text(txt)

Arguments

txt

a character vector

Value

a character vector the same length as txt

Examples

clean_prescription_text(example_prescriptions)

Sample electronic prescribing dataset

Description

A dataset containing product codes, patient identifiers, quantities, dates and free-text dose instructions, similar to data provided by the Clinical Practice Research Datalink (CPRD).

Usage

cprd

Format

An object of class data.frame with 714 rows and 6 columns.

Details

Variables in the data include:

id: record identifier
patid: patient identifier
date: date of start of prescription
prodcode: product code; identifier for the prescribed medication
qty: total quantity of medication prescribed
text: free text prescribing instructions

Medication dosage units

Description

A named character vector. Names represent patterns to match dose units and values represent standardised names for those units.

Usage

drug_units

Format

An object of class character of length 28.

Details

Use with a function like str_replace_all to standardise a freetext prescription. Used internally in extract_from_prescription.
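
As a rough illustration of the named-vector approach described above, the sketch below standardises units with stringr::str_replace_all. The vector unit_patterns is a small hypothetical stand-in with the same shape as drug_units (names are regular expressions, values are standardised names); the actual patterns and standard names in the package may differ.

```r
library(stringr)

# Hypothetical stand-in for drug_units: names are regex patterns,
# values are the standardised unit names to substitute.
unit_patterns <- c(
  "\\bmls?\\b"          = "ml",
  "\\btab(let)?s?\\b"   = "tab",
  "\\bcap(sule)?s?\\b"  = "cap"
)

rx <- c("take 2 tablets daily", "5 mls twice a day", "1 capsule at night")

# With a named vector, each pattern/replacement pair is applied in turn.
standardised <- str_replace_all(rx, unit_patterns)
print(standardised)
```

Passing drug_units in place of unit_patterns should follow the same mechanism, since str_replace_all treats any named character vector as a set of pattern/replacement pairs.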

Example freetext prescriptions

Description

Adapted from CPRD common dosages

Usage

example_cprd

Format

An object of class character of length 28.

Example freetext prescriptions

Description

Various examples of how prescription data may be represented in free text.

Usage

example_prescriptions

Format

An object of class character of length 27.

Extract units of dose from freetext prescriptions.

Description

A function used internally in extract_from_prescription to parse the dosage units, such as millilitres, tablets, grams and so on. If there are multiple units mentioned in a string, only the first is returned.

Usage

extract_dose_unit(txt)

Arguments

txt

a character vector

Value

A character vector the same length as txt, containing standardised units, or NA if no units were found in the prescription.

A simple wrapper around str_replace_all and str_extract. Based on add_dose_unit.py from the original Python/Java algorithm.

Extract dosage information from free-text English-language prescriptions

Description

This is the main workhorse function for the doseminer package. Pass in a character vector of prescribing instructions and it will extract structured dosage information.

Usage

extract_from_prescription(txt)

Arguments

txt

A character vector of freetext prescriptions

Details

To avoid redundant computation, it is recommended to remove duplicate elements from the input vector. The results can be joined back to the original data using the raw column.
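
The deduplicate-then-join workflow recommended above might look like the following sketch, which uses the package's cprd sample data. It assumes that the text column of cprd holds the prescribing instructions and that the raw column of the result reproduces the input strings exactly, as the documentation states; the object names unique_text, parsed and joined are illustrative only.

```r
library(doseminer)

# Parse each distinct instruction once rather than once per record.
unique_text <- unique(cprd$text)
parsed <- extract_from_prescription(unique_text)

# Join the structured results back onto the full dataset.
# `text` (in cprd) and `raw` (in the parsed output) hold the same strings.
joined <- merge(cprd, parsed, by.x = "text", by.y = "raw", all.x = TRUE)
head(joined)
```

Using all.x = TRUE keeps every original record even if an instruction could not be matched to a parsed row.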

Value

A data.frame with seven columns:

raw: the input character vector
output: a residual character vector of 'non-extracted' text, intended for debugging
freq: number of doses administered per day
itvl: number of days between doses
dose: quantity of medication in each dose
unit: unit of measurement of medication, if any
optional: integer. Can the dose be zero? 1 if yes, otherwise 0

Examples

extract_from_prescription(example_prescriptions)

Convert hourly to daily frequency

Description

Convert hourly to daily frequency

Usage

hourly_to_daily(txt)

Arguments

txt

String of the form 'every n hours'

Value

An equivalent string of the form 'x / day'
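
The conversion amounts to dividing 24 by the number of hours. A minimal base R sketch of that arithmetic follows, using a hypothetical helper hours_to_per_day. This is not the package's implementation, and the exact output formatting of hourly_to_daily may differ.

```r
# Hypothetical helper illustrating the arithmetic only:
# 'every n hours' corresponds to 24 / n doses per day.
hours_to_per_day <- function(txt) {
  # Pull out the number of hours from a string like 'every 4 hours'.
  n <- as.numeric(sub("^every (\\d+) hours?$", "\\1", txt))
  paste(24 / n, "/ day")
}

hours_to_per_day("every 4 hours")  # "6 / day"
hours_to_per_day("every 12 hours") # "2 / day"
```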

List of Latin medical and pharmaceutical abbreviations

Description

A named character vector. Names represent Latin terms and values the English translations. Used for converting terms like "q4h" into "every 4 hours", which can then be parsed into a dosage frequency/interval.

Usage

latin_medical_terms

Format

An object of class character of length 47.

Details

Use with a function like str_replace_all to translate a prescription from Latin to English (thence to numbers).

Source

https://en.wikipedia.org/wiki/List_of_abbreviations_used_in_medical_prescriptions

Examples

stringr::str_replace_all('Take two tablets q4h', latin_medical_terms)

Evaluate a multiplicative plaintext expression

Description

Replaces written phrases like "2 x 5" with their arithmetic result (i.e. 10)

Usage

multiply_dose(axb)

Arguments

axb

A string expression of the form 'A x B' where A, B are numeric

Value

An equivalent string giving the product of A and B. If A is a range of values, a range of values is returned.
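
To make the range behaviour concrete, here is a small base R sketch of the idea. The helper multiply_sketch is hypothetical and is not the package's code; in particular, writing a range as 'low-high' is an assumption about the input format.

```r
# Hypothetical illustration of evaluating 'A x B', where A may be a range.
multiply_sketch <- function(axb) {
  parts <- strsplit(axb, " x ", fixed = TRUE)[[1]]
  # Assumed range notation: 'low-high'. A single number yields one value.
  a <- as.numeric(strsplit(parts[1], "-", fixed = TRUE)[[1]])
  b <- as.numeric(parts[2])
  # Multiply each end of the range and rebuild the string.
  paste(a * b, collapse = "-")
}

multiply_sketch("2 x 5")   # "10"
multiply_sketch("1-2 x 5") # "5-10"
```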

Dictionary of English names of numbers

Description

For internal use in words2number. When passed as a replacement to a function like str_replace_all, it turns the string into an arithmetic expression that can be evaluated to give an integer representation of the named number.
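
The mechanism can be sketched with a toy dictionary. The vector toy_numbers below is hypothetical and far smaller than numb_replacements, and the real replacement rules may be organised differently; the point is only to show how substituting words for digits and operators yields an expression that R can evaluate.

```r
# Toy dictionary (hypothetical): number words map to digits or operators.
toy_numbers <- c(
  "one"      = "1",
  "two"      = "2",
  "forty"    = "40",
  " hundred" = "*100",
  " and "    = "+",
  "-"        = "+"
)

expr <- "one hundred and forty-two"

# Apply each replacement in order, treating names as literal strings.
for (word in names(toy_numbers)) {
  expr <- gsub(word, toy_numbers[[word]], expr, fixed = TRUE)
}

print(expr)                       # "1*100+40+2"
value <- eval(parse(text = expr)) # 142
print(value)
```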

Usage

numb_replacements

Format

An object of class character of length 49.

Details

Lifted from Ben Marwick's words2number package and converted into a named vector (previously a chain of gsub calls).

Note

Does not yet fully support decimals, fractions or mixed fractions. Some limited support for 'half' expressions, e.g. 'one and a half'.

Source

https://github.com/benmarwick/words2number

Examples

## Not run: 
stringr::str_replace_all('one hundred and forty-two', numb_replacements)

## End(Not run)

Regular expression to match numbers in English

Description

A regex pattern to identify natural language English number phrases, such as "one hundred and fifty" or "thirty-seven". Used internally by replace_numbers to identify substrings to replace with their decimal representation.

Usage

regex_numbers

Format

An object of class character of length 1.

Details

This is a PCRE (Perl type) regular expression, so it must be used with perl = TRUE in base R regex functions. The packages stringr and stringi are based on the alternative ICU regular expression engine, so they cannot use this pattern.
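
The following sketch shows what perl = TRUE changes in practice. The toy pattern is hypothetical and much simpler than regex_numbers; it uses a PCRE-only construct (a named subpattern declared in a (?(DEFINE)...) group and called with (?&name)) purely as an example of syntax that needs the PCRE engine. Whether regex_numbers relies on this particular construct is an assumption.

```r
# Toy PCRE pattern (hypothetical, not regex_numbers itself):
# declares a named subpattern once, then calls it with (?&small).
toy_pattern <- "(?(DEFINE)(?<small>one|two|three))\\b(?&small)\\b"

x <- "take two tablets"

# perl = TRUE selects the PCRE engine in base R regex functions.
m <- regmatches(x, regexpr(toy_pattern, x, perl = TRUE))
print(m)
```

The same perl = TRUE argument would be supplied when passing regex_numbers to regexpr, gregexpr, grepl or gsub.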

Note

There is limited support for fractional expressions like "one half". The original pattern did not support expressions like "a thousand", but it has been adapted to offer (experimental) support for this. Phrases like "million" or "thousand" with no prefix will not match.

Source

https://www.rexegg.com/regex-trick-numbers-in-english.html

Replace English number phrases with their decimal representations

Description

Uses regex_numbers to match parts of a string corresponding to numbers, then invokes words2number to convert these substrings to numeric values. The rest of the string (the non-number words) is left intact.

Usage

replace_numbers(string)

Arguments

string

A character vector. Can contain numbers and other text

Details

Works on non-negative integer numbers under one billion (one thousand million). Does not support fractions or decimals (yet).

Value

A character vector the same length as string, with words replaced by their decimal representations.

Examples

replace_numbers('Two plus two equals four')
replace_numbers('one hundred thousand dollars!')
replace_numbers(c('A vector', 'containing numbers', 'like thirty seven'))

Convert weekly interval to daily interval

Description

Convert weekly interval to daily interval

Usage

weekly_to_daily(Dperweek)

Arguments

Dperweek

String of the form 'n / week'

Value

An equivalent string of the form 'x / day'

Convert English names of numbers to their numerical values

Description

Convert English names of numbers to their numerical values

Usage

words2number(txt)

Arguments

txt

A character vector containing names of numbers (only).

Value

A named numeric vector of the same length as txt.

Source

Originally adapted from the words2number package by Ben Marwick.

Examples

words2number('seven')
words2number('forty-two')
words2number(c('three', 'one', 'twenty two thousand'))
