Package 'doseminer'

Title: Extract Drug Dosages from Free-Text Prescriptions
Description: Utilities for converting unstructured electronic prescribing instructions into structured medication data. Extracts drug dose, units, daily dosing frequency and intervals from English-language prescriptions. Based on Karystianis et al. (2016) <doi:10.1186/s12911-016-0255-x>.
Authors: David Selby [aut, cre], Belay Birlie Yimer [ctb], Ben Marwick [ctb]
Maintainer: David Selby <[email protected]>
License: MIT + file LICENSE
Version: 0.1.3
Built: 2024-08-25 03:24:59 UTC
Source: https://github.com/selbosh/doseminer

Help Index


Clean up raw prescription freetext

Description

Clean up raw prescription freetext

Usage

clean_prescription_text(txt)

Arguments

txt

a character vector

Value

a character vector the same length as txt

Examples

clean_prescription_text(example_prescriptions)

Sample electronic prescribing dataset

Description

A dataset containing product codes, patient identifiers, quantities, dates and free-text dose instructions, similar to data provided by the Clinical Practice Research Datalink (CPRD).

Usage

cprd

Format

An object of class data.frame with 714 rows and 6 columns.

Details

Variables in the data include

id

record identifier

patid

patient identifier

date

date of start of prescription

prodcode

product code; identifier for the prescribed medication

qty

total quantity of medication prescribed

text

free text prescribing instructions


Medication dosage units

Description

A named character vector. Names represent patterns to match dose units and values represent standardised names for those units.

Usage

drug_units

Format

An object of class character of length 28.

Details

Use with a function like str_replace_all to standardise a freetext prescription. Used internally in extract_from_prescription.
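A minimal sketch of that usage, following the same pattern as the latin_medical_terms example in this manual. The input sentence is invented for illustration, and the result depends on the patterns stored in drug_units, so no expected output is shown.

```r
library(doseminer)

# Hypothetical free-text instruction, invented for illustration
instruction <- "Take 5 millilitres twice daily"

# Names of drug_units act as patterns, values as their replacements
standardised <- stringr::str_replace_all(instruction, drug_units)
standardised
```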


Example freetext prescriptions

Description

Adapted from CPRD common dosages

Usage

example_cprd

Format

An object of class character of length 28.

See Also

example_prescriptions


Example freetext prescriptions

Description

Various examples of how prescription data may be represented in free text.

Usage

example_prescriptions

Format

An object of class character of length 27.

See Also

example_cprd


Extract units of dose from freetext prescriptions.

Description

A function used internally in extract_from_prescription to parse the dosage units, such as millilitres, tablets, grams and so on. If there are multiple units mentioned in a string, only the first is returned.

Usage

extract_dose_unit(txt)

Arguments

txt

a character vector

Value

A character vector the same length as txt, containing standardised units, or NA if no units were found in the prescription.

A simple wrapper around str_replace_all and str_extract. Based on add_dose_unit.py from the original Python/Java algorithm.
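As a brief illustration, the function can be applied to the bundled example_prescriptions vector. This only pairs each input with whatever unit is found; the actual values depend on the package's unit dictionary and are not shown here.

```r
library(doseminer)

# One standardised unit (or NA) per prescription
units <- extract_dose_unit(example_prescriptions)

# Show each prescription next to the unit extracted from it
head(data.frame(text = example_prescriptions, unit = units))
```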

See Also

extract_from_prescription


Extract dosage information from free-text English-language prescriptions

Description

This is the main workhorse function for the doseminer package. Pass in a character vector of prescribing instructions and it will extract structured dosage information.

Usage

extract_from_prescription(txt)

Arguments

txt

A character vector of freetext prescriptions

Details

To avoid redundant computation, it is recommended to remove duplicate elements from the input vector. The results can be joined back to the original data using the raw column.

Value

A data.frame with seven columns:

raw

the input character vector

output

a residual character vector of 'non-extracted' text, useful for debugging

freq

number of doses administered per day

itvl

number of days between doses

dose

quantity of medication in each dose

unit

unit of measurement of medication, if any

optional

integer flag: 1 if the dose can be zero (i.e. the dose is optional), otherwise 0

Examples

extract_from_prescription(example_prescriptions)
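
The deduplication advice under Details could look roughly like the sketch below, using the bundled cprd dataset. It assumes the function returns one row per input element, and it uses base merge() for the join; any other join function would do equally well.

```r
library(doseminer)

# Parse each distinct instruction once instead of every row
unique_text <- unique(as.character(cprd$text))
parsed <- extract_from_prescription(unique_text)

# Join the structured fields back to the full dataset via the raw column
cprd_parsed <- merge(cprd, parsed, by.x = "text", by.y = "raw", all.x = TRUE)
```

Note that merge() sorts the result by the key column by default, so the row order of cprd_parsed will generally differ from that of cprd.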

Convert hourly to daily frequency

Description

Convert hourly to daily frequency

Usage

hourly_to_daily(txt)

Arguments

txt

String of the form 'every n hours'

Value

An equivalent string of the form 'x / day'
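
The underlying arithmetic is presumably 24 / n doses per day. The base R sketch below illustrates that idea only: sketch_hourly_to_daily is a hypothetical helper written for this example, not the package's implementation, and the exact output of hourly_to_daily may differ.

```r
# Hypothetical helper: 'every n hours' becomes '24/n / day'
sketch_hourly_to_daily <- function(txt) {
  pattern <- "^every ([0-9]+) hours?$"
  matched <- grepl(pattern, txt)
  # Pull out n from the strings that fit the expected form
  n <- as.numeric(sub(pattern, "\\1", txt[matched]))
  out <- rep(NA_character_, length(txt))
  out[matched] <- paste(24 / n, "/ day")
  out
}

sketch_hourly_to_daily(c("every 4 hours", "every 8 hours", "at night"))
# "6 / day" "3 / day" NA
```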


List of Latin medical and pharmaceutical abbreviations

Description

A named character vector. Names represent Latin terms and values the English translations. Used for converting terms like "q4h" into "every 4 hours", which can then be parsed into a dosage frequency/interval.

Usage

latin_medical_terms

Format

An object of class character of length 47.

Details

Use with a function like str_replace_all to translate a prescription from Latin to English (thence to numbers).

Source

https://en.wikipedia.org/wiki/List_of_abbreviations_used_in_medical_prescriptions

Examples

stringr::str_replace_all('Take two tablets q4h', latin_medical_terms)

Evaluate a multiplicative plaintext expression

Description

Replaces written phrases like "2 x 5" with their arithmetic result (i.e. 10)

Usage

multiply_dose(axb)

Arguments

axb

A string expression of the form 'A x B' where A, B are numeric

Value

An equivalent string giving the product of A and B. If A is a range of values, a range of values is returned.
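
To make the range behaviour concrete, here is a base R sketch of the idea. sketch_multiply is a hypothetical helper, and it assumes ranges are written as 'low-high' with a hyphen; the formats accepted and returned by multiply_dose itself may differ.

```r
# Hypothetical helper: evaluate 'A x B', where A may be a range 'low-high'
sketch_multiply <- function(axb) {
  stopifnot(length(axb) == 1L)
  parts <- strsplit(axb, " x ", fixed = TRUE)[[1]]
  # Split A on the hyphen so that both ends of a range are multiplied
  a <- as.numeric(strsplit(parts[1], "-", fixed = TRUE)[[1]])
  b <- as.numeric(parts[2])
  paste(a * b, collapse = "-")
}

sketch_multiply("2 x 5")    # "10"
sketch_multiply("1-2 x 5")  # "5-10"
```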

See Also

Used internally within extract_from_prescription


Dictionary of English names of numbers

Description

For internal use in words2number. When passed as a replacement to a function like str_replace_all, it turns the string into an arithmetic expression that can be evaluated to give an integer representation of the named number.

Usage

numb_replacements

Format

An object of class character of length 49.

Details

Lifted from Ben Marwick's words2number package and converted into a named vector (previously a chain of gsub calls).

Note

Does not yet fully support decimals, fractions or mixed fractions. Some limited support for 'half' expressions, e.g. 'one and a half'.

Source

https://github.com/benmarwick/words2number

Examples

## Not run: 
stringr::str_replace_all('one hundred and forty-two', numb_replacements)

## End(Not run)
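
The idea of turning a phrase into an arithmetic expression can be illustrated with a toy dictionary. The five entries below are invented for this one phrase and are not the contents of numb_replacements; the sketch uses a plain gsub loop in base R rather than str_replace_all.

```r
# Toy dictionary (hypothetical): just enough entries for one phrase
toy_replacements <- c(
  "one"      = "1",
  " hundred" = "*100",
  " and "    = "+",
  "forty"    = "40",
  "-two"     = "+2"
)

# Apply each replacement in turn, treating names as literal strings
to_expression <- function(phrase, dict) {
  for (i in seq_along(dict)) {
    phrase <- gsub(names(dict)[i], dict[[i]], phrase, fixed = TRUE)
  }
  phrase
}

expr <- to_expression("one hundred and forty-two", toy_replacements)
expr                      # "1*100+40+2"
eval(parse(text = expr))  # 142
```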

Regular expression to match numbers in English

Description

A regex pattern to identify natural language English number phrases, such as "one hundred and fifty" or "thirty-seven". Used internally by replace_numbers to identify substrings to replace with their decimal representation.

Usage

regex_numbers

Format

An object of class character of length 1.

Details

This is a PCRE (Perl type) regular expression, so it must be used with perl = TRUE in base R regex functions. The packages stringr and stringi are based on the alternative ICU regular expression engine, so they cannot use this pattern.
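
A short sketch of using the pattern from base R with the PCRE engine enabled. The input strings are invented for illustration; what exactly is matched depends on the pattern, so no output is shown.

```r
library(doseminer)

# Hypothetical inputs containing English number phrases
x <- c("take one hundred and fifty ml", "thirty-seven tablets")

# perl = TRUE selects the PCRE engine, which this pattern requires
hits <- regmatches(x, gregexpr(regex_numbers, x, perl = TRUE))
hits
```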

Note

There is limited support for fractional expressions like "one half". The original pattern did not support expressions like "a thousand", but it has been adapted to offer (experimental) support for this. Phrases like "million" or "thousand" with no prefix will not match.

Source

https://www.rexegg.com/regex-trick-numbers-in-english.html


Replace English number phrases with their decimal representations

Description

Uses regex_numbers to match parts of a string corresponding to numbers, then invokes words2number to convert these substrings to numeric. The rest of the string (the non-number words) is left intact.

Usage

replace_numbers(string)

Arguments

string

A character vector. Can contain numbers and other text

Details

Works on non-negative integer numbers under one billion (one thousand million). Does not support fractions or decimals (yet).

Value

A character vector the same length as string, with words replaced by their decimal representations.

See Also

words2number, for use on cleaned text that does not contain any non-number words

Examples

replace_numbers('Two plus two equals four')
replace_numbers('one hundred thousand dollars!')
replace_numbers(c('A vector', 'containing numbers', 'like thirty seven'))

Convert weekly interval to daily interval

Description

Convert weekly interval to daily interval

Usage

weekly_to_daily(Dperweek)

Arguments

Dperweek

String of the form 'n / week'

Value

An equivalent string of the form 'x / day'


Convert English names of numbers to their numerical values

Description

Convert English names of numbers to their numerical values

Usage

words2number(txt)

Arguments

txt

A character vector containing names of numbers (only).

Value

A named numeric vector of the same length as txt.

Source

Originally adapted from the words2number package by Ben Marwick.

Examples

words2number('seven')
words2number('forty-two')
words2number(c('three', 'one', 'twenty two thousand'))