Negation Detection
Usage:
medtext-neg negbio [--regex-patterns FILE --ngrex-patterns FILE --overwrite --sort-anns] -i FILE -o FILE
medtext-neg prompt [--model-dir DIR --overwrite] -i FILE -o FILE
medtext-neg download negbio [--regex-patterns FILE --ngrex-patterns FILE]
medtext-neg download prompt [--model FILE --model-dir DIR]
Options:
-i FILE Input file
-o FILE Output file
--overwrite Overwrite the existing file
--regex-patterns FILE Regular expression patterns [default: ~/.medtext/resources/patterns/regex_patterns.yml]
--ngrex-patterns FILE Nregex-based expression patterns [default: ~/.medtext/resources/patterns/ngrex_patterns.yml]
--sort-anns Sort annotations by their location
--model FILE Pretrained model file [default: ~/.medtext/resources/medtext_neg_prompt/models/negation_detection_model_checkpoint.zip]
--model-dir DIR Pretrained model directory [default: ~/.medtext/resources/medtext_neg_prompt/models/negation_detection_model_checkpoint]
Prompt-based model
This model uses a prompt-based learning approach to identify the assertion status of an entity in unstructured clinical notes. The possible outcomes are Present, Absent, Possible, Conditional, Hypothetical, and Not Associated.
from pathlib import Path
from medtext_neg.models.prompt.neg_prompt import BioCNegPrompt
# argv holds the parsed command-line options, keyed by option name
model_dir = Path(argv['--model-dir']).expanduser()
neg_actor = BioCNegPrompt(pretrained_model_dir=model_dir)
NegBio
For negation detection, medtext employs NegBio. NegBio defines its patterns over universal dependencies and uses subgraph matching to search the dependency graph, so the scope of negation or uncertainty is not limited to a fixed word distance.
from medtext_neg.models.match_ngrex import NegGrexPatterns
from medtext_neg.models.neg import NegRegexPatterns
from medtext_neg.models.neg import NegCleanUp
from medtext_neg.models.neg import BioCNeg
# argv holds the parsed command-line options, keyed by option name as listed under Options
regex_actor = NegRegexPatterns()
regex_actor.load_yml2(argv['--regex-patterns'])
ngrex_actor = NegGrexPatterns()
ngrex_actor.load_yml2(argv['--ngrex-patterns'])
neg_actor = BioCNeg(regex_actor=regex_actor, ngrex_actor=ngrex_actor)
cleanup_actor = NegCleanUp(argv['--sort-anns'])
Nregex
An Nregex pattern is a regular expression-like pattern designed to match node and edge configurations within a
graph. It allows matching on the attributes of nodes (e.g., lemma) and edges (e.g., dependency type).
Nregex follows Semgrex but supports only the “immediate domination”
operations (> and <).
Warning
As with Tregex, the data to be searched is not pre-indexed. Instead, matching performs a linear scan through all the nodes in the graph, so it is slower than an indexed search would be.
Nodes and relations
A node or relation is represented by a set of attributes and their values enclosed in curly braces:
{attr1:value1;attr2:value2;...}. {} represents any node in the graph. Attributes must be plain strings;
values can ONLY be regular expressions delimited by “/”. A regular expression must match the whole attribute
value. For example, {lemma:/structure/} matches only nodes whose lemma is exactly “structure”, while
{lemma:/structure.*/} matches any lemma that begins with “structure”, such as “structure” and “structures”.
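The whole-value matching rule can be illustrated with a minimal Python sketch. This is not the medtext implementation: `parse_node_pattern` and `node_matches` are hypothetical helpers written for this example, nodes are modeled as plain dicts, and the parser assumes the regular expressions contain no “;”.

```python
import re


def parse_node_pattern(pattern: str) -> dict:
    """Parse '{attr:/regex/;attr2:/regex2/}' into {attr: compiled regex}.

    Hypothetical helper for illustration; assumes no ';' inside a regex.
    """
    body = pattern.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError(f"not a node pattern: {pattern!r}")
    body = body[1:-1].strip()
    constraints = {}
    if not body:
        return constraints  # '{}' places no constraint, so it matches any node
    for part in body.split(";"):
        attr, _, value = part.partition(":")
        value = value.strip()
        if len(value) < 2 or not (value.startswith("/") and value.endswith("/")):
            raise ValueError(f"value must be a /regex/: {part!r}")
        constraints[attr.strip()] = re.compile(value[1:-1])
    return constraints


def node_matches(pattern: str, node: dict) -> bool:
    """True if every constrained attribute is fully matched by its regex."""
    return all(
        attr in node and regex.fullmatch(node[attr]) is not None
        for attr, regex in parse_node_pattern(pattern).items()
    )


exact = node_matches("{lemma:/structure/}", {"lemma": "structures"})
prefix = node_matches("{lemma:/structure.*/}", {"lemma": "structures"})
```

Here `exact` is false because `re.fullmatch` requires the expression to cover the entire lemma, while `prefix` is true because `.*` absorbs the trailing “s”.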
Warning
Currently, the only supported node attribute is lemma, and the only supported relation attribute is dependency.
Nregex pattern language
| Symbol | Meaning |
|---|---|
| A <reln B | A is the dependent of a relation reln with B |
| A >reln B | A is the governor of a relation reln with B |
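To make the direction of the two operators concrete, here is a minimal Python sketch over a toy graph. It is an illustration under stated assumptions, not the library's matcher: the `Edge` type, the two helper functions, and the edge list are all hypothetical, nodes are identified by their lemma, and the edges are written by hand rather than produced by a parser.

```python
import re
from typing import NamedTuple


class Edge(NamedTuple):
    governor: str
    dependent: str
    dependency: str


# Hand-written toy edges for the phrase "no evidence of pneumothorax".
EDGES = [
    Edge("evidence", "no", "det"),
    Edge("evidence", "pneumothorax", "nmod"),
    Edge("pneumothorax", "of", "case"),
]


def is_dependent_of(a: str, reln: str, b: str, edges=EDGES) -> bool:
    """A <reln B: some edge has A as dependent, B as governor, and a dependency matching reln."""
    return any(
        e.dependent == a and e.governor == b and re.fullmatch(reln, e.dependency)
        for e in edges  # linear scan over every edge, no index
    )


def is_governor_of(a: str, reln: str, b: str, edges=EDGES) -> bool:
    """A >reln B: some edge has A as governor, B as dependent, and a dependency matching reln."""
    return any(
        e.governor == a and e.dependent == b and re.fullmatch(reln, e.dependency)
        for e in edges
    )


no_under_evidence = is_dependent_of("no", "det", "evidence")
evidence_over_no = is_governor_of("evidence", "det", "no")
```

Both results are true because they describe the same edge from opposite ends; swapping the arguments of either call without swapping the operator makes it false.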
Boolean relational operators
Relations can be combined using the ‘&’ (and) and ‘|’ (or) operators.
Naming nodes
Nodes can be given names (a.k.a. handles) using ‘=’. A named node will be stored in a map that maps names to nodes so
that if a match is found, the node corresponding to the named node can be extracted from the map. For example,
{lemma:/no/}=k2 will match a node with lemma “no” and assign the name “k2”. After a match is found, the map can be
queried with the name to retrieve the matched node using match.node('k2').
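The name-to-node map can be pictured with a small Python sketch. `ToyMatch` and `find_named` are hypothetical stand-ins written for this example; only the `node(name)` lookup mirrors the call shown above, and the real match object may differ in every other respect.

```python
import re


class ToyMatch:
    """Stand-in for a match result: maps handle names to matched nodes."""

    def __init__(self):
        self._named = {}

    def bind(self, name, node):
        self._named[name] = node

    def node(self, name):
        return self._named[name]


def find_named(lemma_regex, name, nodes):
    """Bind `name` to the first node whose lemma is fully matched, or return None."""
    for node in nodes:
        if re.fullmatch(lemma_regex, node["lemma"]):
            match = ToyMatch()
            match.bind(name, node)
            return match
    return None


# Toy nodes, modeled as plain dicts for illustration.
nodes = [{"id": 0, "lemma": "no"}, {"id": 1, "lemma": "evidence"}]

# Roughly what {lemma:/no/}=k2 expresses: match lemma "no" and name the node "k2".
match = find_named("no", "k2", nodes)
matched_node = match.node("k2")
```

Keeping the handles in a dictionary means a pattern with several named nodes can expose each of them after a single match.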