Section Split

This module splits the report into sections. We provide two options for section split.

Usage:
    medtext-secsplit regex [--section-titles FILE --overwrite] -i FILE -o FILE
    medtext-secsplit medspacy [--overwrite] -i FILE -o FILE
    medtext-secsplit download [--section-titles FILE]

Options:
    -o FILE                 Input file
    -i FILE                 Output file
    --overwrite             Overwrite the existing file
    --section-titles FILE   List of section titles [default: ~/.medtext/resources/medspacy_section_titles.txt]

regex

This rule-based module uses a list of section titles to split the notes.

from medtext_secsplit.models.section_split_regex import BioCSectionSplitterRegex, combine_patterns

with open(argv['--section-titles']) as fp:
    section_titles = [line.strip() for line in fp]
pattern = combine_patterns(section_titles)
processor = BioCSectionSplitterRegex(regex_pattern=pattern)

We provide two sets of section titles

  • medspacy_section_titles: rules from medspacy, which are adapted from SecTag and expanded through practice

  • cxr_section_titles: rules from the analysis of radiology reports in MIMIC-CXR

medspacy

MedSpaCy is a spaCy tool for performing clinical NLP and text processing tasks. It includes an implementation of clinical section detection based on rule-based matching of the section titles with default rules adapted from SecTag and expanded through practice.

import medspacy
from medtext_secsplit.models.section_split_medspacy import BioCSectionSplitterMedSpacy
nlp = medspacy.load()
nlp.add_pipe("medspacy_sectionizer")
processor = BioCSectionSplitterMedSpacy(nlp)