# Section Split

This module splits a report into sections. Two section-splitting options are provided.

```shell
Usage:
    medtext-secsplit regex [--section-titles FILE --overwrite] -i FILE -o FILE
    medtext-secsplit medspacy [--overwrite] -i FILE -o FILE
    medtext-secsplit download [--section-titles FILE]

Options:
    -i FILE                 Input file
    -o FILE                 Output file
    --overwrite             Overwrite the existing file
    --section-titles FILE   List of section titles [default: ~/.medtext/resources/medspacy_section_titles.txt]
```

## regex

This **rule-based** module uses a list of section titles to split the notes.

```python
from medtext_secsplit.models.section_split_regex import BioCSectionSplitterRegex, combine_patterns

# `argv` holds the parsed command-line options shown in the usage above.
# The section-titles file contains one title per line.
with open(argv['--section-titles']) as fp:
    section_titles = [line.strip() for line in fp]

# Merge all titles into a single regular expression and build the splitter.
pattern = combine_patterns(section_titles)
processor = BioCSectionSplitterRegex(regex_pattern=pattern)
```

We provide two sets of section titles:

* `medspacy_section_titles`: rules from [medspacy](https://github.com/medspacy/medspacy/blob/master/resources/section_patterns.json), which are adapted from [SecTag](https://www.vumc.org/cpm/cpm-blog/sectag-tagging-clinical-note-section-headers) and expanded through practice
* `cxr_section_titles`: rules derived from an analysis of radiology reports in MIMIC-CXR

## medspacy

[**MedSpaCy**](https://github.com/medspacy/medspacy) is a spaCy-based toolkit for clinical NLP and text processing tasks. It includes a clinical section detector that matches section titles with rules; the default rules are adapted from [SecTag](https://pubmed.ncbi.nlm.nih.gov/18999303/) and expanded through practice.

```python
import medspacy
from medtext_secsplit.models.section_split_medspacy import BioCSectionSplitterMedSpacy

# Load the default medspaCy pipeline and append the section detector.
nlp = medspacy.load()
nlp.add_pipe("medspacy_sectionizer")
processor = BioCSectionSplitterMedSpacy(nlp)
```
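
## Illustration: title-based splitting

To give a feel for the rule-based idea behind the `regex` option, here is a minimal sketch that uses only Python's standard `re` module. It is not the implementation in `medtext_secsplit`: the helpers `combine_titles` and `split_sections` are hypothetical names introduced for this example, and the assumption that a title sits at the start of a line and is followed by a colon is ours, not something the module guarantees.

```python
import re


def combine_titles(titles):
    """Join section titles into one alternation pattern (hypothetical helper)."""
    # Try longer titles first so a long title is not shadowed by a shorter prefix.
    escaped = [re.escape(t) for t in sorted(titles, key=len, reverse=True) if t]
    # Assumption: a title starts a line and is followed by a colon.
    return re.compile(
        r"^\s*(" + "|".join(escaped) + r")\s*:",
        re.IGNORECASE | re.MULTILINE,
    )


def split_sections(text, pattern):
    """Return (title, body) pairs; text before the first title gets title None."""
    sections = []
    last_title, last_end = None, 0
    for match in pattern.finditer(text):
        body = text[last_end:match.start()].strip()
        # Skip an empty preamble, but keep empty bodies of real sections.
        if body or last_title is not None:
            sections.append((last_title, body))
        last_title, last_end = match.group(1), match.end()
    tail = text[last_end:].strip()
    if tail or last_title is not None:
        sections.append((last_title, tail))
    return sections


report = "FINDINGS: Lungs are clear.\nIMPRESSION: No acute disease."
pattern = combine_titles(["FINDINGS", "IMPRESSION"])
sections = split_sections(report, pattern)
for title, body in sections:
    print(title, "->", body)
```

A report with no recognized title comes back as a single untitled section, which keeps the output shape the same for every input.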