Negation Detection

Usage:
    medtext-neg negbio [--regex-patterns FILE --ngrex-patterns FILE --overwrite --sort-anns] -i FILE -o FILE
    medtext-neg prompt [--model-dir DIR --overwrite] -i FILE -o FILE
    medtext-neg download negbio [--regex-patterns FILE --ngrex-patterns FILE]
    medtext-neg download prompt [--model FILE --model-dir DIR]

Options:
    -i FILE                 Inpput file
    -o FILE                 Output file
    --overwrite             Overwrite the existing file
    --regex-patterns FILE   Regular expression patterns [default: ~/.medtext/resources/patterns/regex_patterns.yml]
    --ngrex-patterns FILE   Nregex-based expression patterns [default: ~/.medtext/resources/patterns/ngrex_patterns.yml]
    --sort-anns             Sort annotations by its location
    --model FILE            Pretrained model file [default: ~/.medtext/resources/medtext_neg_prompt/models/negation_detection_model_checkpoint.zip]
    --model-dir DIR         [default: ~/.medtext/resources/medtext_neg_prompt/models/negation_detection_model_checkpoint]

Prompt-based model

This model uses a prompt-based learning approach to identify the assertion status of an entity in the unstructured clinical notes. The outcomes are Present, Absent, Possible, Conditional, Hypothetical, and Not Associated.

from medtext_neg.models.prompt.neg_prompt import BioCNegPrompt

model_dir = Path(argv['--model-dir']).expanduser()
neg_actor = BioCNegPrompt(pretrained_model_dir=model_dir)

NegBio

For negation detection, medtext employs NegBio, which utilizes universal dependencies for pattern definition and subgraph matching for graph traversal search so that the scope for negation/uncertainty is not limited to the fixed word distance.

from medtext_neg.models.match_ngrex import NegGrexPatterns
from medtext_neg.models.neg import NegRegexPatterns
from medtext_neg.models.neg import NegCleanUp
from medtext_neg.models.neg import BioCNeg

regex_actor = NegRegexPatterns()
regex_actor.load_yml2(argv['--regex_patterns'])
ngrex_actor = NegGrexPatterns()
ngrex_actor.load_yml2(argv['--ngrex_patterns'])
neg_actor = BioCNeg(regex_actor=regex_actor, ngrex_actor=ngrex_actor)
cleanup_actor = NegCleanUp(argv['--sort_anns'])

Nregex

A Nregex pattern is a regular expression-like pattern that is designed to match node and edge configurations within a graph. The Nregex pattern allows matching on the attributes of nodes (e.g., lemma) and edges (e.g., dependency type). The Nregex follows Semgrex but only supports “immediate domination” operations (> and<).

Warning

Like Tregex, there is no pre-indexing of the data to be searched. Rather there is a linear scan through the all nodes in the graph. As a result, matching is slower.

Nodes and relations

A node or relation is represented by a set of attributes and their values contained by curly braces: {attr1:value1;attr2:value2;...}. {} represents any node in the graph. Attributes must be plain strings; values can ONLY be regular expressions blocked off by “/”. Regular expressions must match the whole attribute value. For example, {lemma:/structure/} matches any nodes with “structure” as their lemma, while {lemma:/structure.*/} matches “structure” and “structures”.

Warning

Currently, supported node attribute is lemma. Supported relation attribute is dependency.

Nregex pattern language

Symbol

Meaning

A <reln B

A is the dependent of a relation reln with B

A >reln B

A is the governor of a relation reln with B

Boolean relational operators

Relations can be combined using the ‘&’ and ‘|’ operators

Naming nodes

Nodes can be given names (a.k.a. handles) using ‘=’. A named node will be stored in a map that maps names to nodes so that if a match is found, the node corresponding to the named node can be extracted from the map. For example, {lemma:/no/}=k2 will match a node with lemma “no” and assign the name “k2”. After a match is found, the map can be queried with the name to retrieved the matched node using match.node('k2')