Utilities

Reference notes for the utility files stored under public/util. Each entry links to the source file for quick inspection.


ACSNSQIPUtil.py

Python module

Reusable utilities for the ACS NSQIP benign lung resection study, extracted to avoid duplicating constants and helpers across analyses.

  • Configuration constants for study years (2010-2024) and output directories.
  • CPT metadata for lung resection procedures plus ICD prefix sets for benign, cancer, and structural diagnoses.
  • Column detection helpers, NSQIP field candidate lists, and normalization utilities (ASA, Yes/No, BMI).
  • ICD classification, cohort prep, 30-day outcomes, and stratified outcome tables.
  • Regression prep, imputation, and optional risk-adjusted rate analysis with statsmodels/scipy support.
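
As a rough illustration of the column detection and normalization helpers listed above, the sketch below shows one way such utilities could look. The names find_column, normalize_yes_no, normalize_asa, and ASA_CANDIDATES are hypothetical and are not taken from ACSNSQIPUtil.py; the accepted spellings are assumptions as well.

```python
import re

# Hypothetical candidate names; the real lists live in ACSNSQIPUtil.py.
ASA_CANDIDATES = ["ASACLAS", "ASA_CLASS", "ASA"]


def find_column(columns, candidates):
    """Return the first column whose name matches a candidate, ignoring case."""
    lookup = {c.strip().upper(): c for c in columns}
    for name in candidates:
        if name.upper() in lookup:
            return lookup[name.upper()]
    return None


def normalize_yes_no(value):
    """Map common Yes/No spellings to 1/0; anything else becomes None."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in {"yes", "y", "1", "true"}:
        return 1
    if text in {"no", "n", "0", "false"}:
        return 0
    return None


def normalize_asa(value):
    """Return the first digit in 1-5 found in the value, or None if absent."""
    match = re.search(r"[1-5]", str(value))
    return int(match.group()) if match else None
```

Returning None for unrecognized values, rather than guessing, keeps missing data visible to a later imputation step.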

View file


CharlsonDeyo.py

Python module

Python implementation of the Charlson-Deyo comorbidity index using ICD-9 and ICD-10 rules.

  • Rule table for ICD-9/ICD-10 prefixes and ranges for each comorbidity group.
  • Normalization and matching helpers for claims diagnoses and ICD version inference.
  • Long-format conversion for claims data and time-window filtering around diagnosis date.
  • Outputs per-condition counts, validity flags, Charlson score, binned score, and NCI index.
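
The prefix matching and scoring described above can be sketched as follows. This is a minimal illustration, not the module's rule table: RULES holds only three comorbidity groups with ICD-10 prefixes, and the function names are hypothetical. Range rules, ICD-9 codes, and time-window filtering are left out.

```python
# Illustrative subset only: (group, Charlson weight, ICD-10 prefixes).
# The full rule table in CharlsonDeyo.py covers more groups and ICD-9 codes.
RULES = [
    ("myocardial_infarction", 1, ("I21", "I22")),
    ("congestive_heart_failure", 1, ("I50",)),
    ("metastatic_solid_tumor", 6, ("C77", "C78", "C79", "C80")),
]


def normalize_code(code):
    """Upper-case a diagnosis code and drop the decimal point."""
    return str(code).replace(".", "").strip().upper()


def charlson_flags(codes):
    """Return a 0/1 flag per group, set when any code starts with a group prefix."""
    normalized = [normalize_code(c) for c in codes]
    return {
        group: int(any(c.startswith(prefixes) for c in normalized))
        for group, _, prefixes in RULES
    }


def charlson_score(flags):
    """Sum the weights of the flagged groups."""
    return sum(weight * flags[group] for group, weight, _ in RULES)
```

For example, charlson_score(charlson_flags(["I21.4", "C78.00"])) flags one group of weight 1 and one of weight 6.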

View file


CharlsonDeyo.R

R script

R implementation of the Charlson-Deyo comorbidity index using tidyverse workflows.

  • Same ICD-9/ICD-10 rule set as the Python version for parity.
  • Tidyverse helpers for normalization, prefix/range matching, and long-format DX data.
  • Calculates comorbidity flags, Charlson score, and NCI index for downstream analysis.

View file


ingest.py

Python script

Document ingestion script that builds a FAISS vector store for retrieval workflows.

  • Recursively loads files with type-specific loaders (text, Markdown, HTML, PDF, Office).
  • Chooses a faster PDF loader for larger files based on PDF_FAST_SIZE_MB.
  • Splits documents into overlapping chunks for fine-grained retrieval.
  • Creates OpenAI embeddings and saves the FAISS index to ./vectorstore.
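
To make the size-based loader choice and the overlapping split concrete, here is a library-free sketch. Only the name PDF_FAST_SIZE_MB comes from the notes above; its value, the helper names, the chunk sizes, and the choice of a strict greater-than comparison are assumptions, and the real script may rely on a text-splitter library instead.

```python
PDF_FAST_SIZE_MB = 10  # assumed value; the script defines its own threshold


def pick_pdf_loader(size_bytes, threshold_mb=PDF_FAST_SIZE_MB):
    """Return 'fast' for PDFs larger than the threshold, else 'standard'."""
    return "fast" if size_bytes > threshold_mb * 1024 * 1024 else "standard"


def split_with_overlap(text, chunk_size=1000, overlap=200):
    """Split text into chunks of chunk_size characters sharing `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.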

View file


resection_cpt_by_extent.json

JSON reference

CPT code groupings for lung procedures, organized by resection extent.

  • Maps extent labels (pneumonectomy, lobectomy, segmentectomy, wedge, biopsy, tumor resection) to CPT code lists.
  • Includes open and thoracoscopic (VATS) codes for consistent grouping.
  • Useful for collapsing procedure codes into analysis-ready categories.
  • Contains overlapping codes across years where applicable.
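
Collapsing procedure codes into extent categories usually means inverting the extent-to-codes mapping. The sketch below assumes the file is a flat JSON object of extent label to list of CPT strings, which may not match the actual layout; SAMPLE is a small illustrative subset, and the function names are hypothetical.

```python
import json

# Illustrative subset, assuming an {extent: [cpt, ...]} layout for the file.
SAMPLE = json.loads("""
{
  "pneumonectomy": ["32440"],
  "lobectomy": ["32480", "32663"],
  "wedge": ["32505", "32666"]
}
""")


def invert_extent_map(by_extent):
    """Build {cpt: [extent, ...]} so each code can be looked up directly."""
    code_to_extents = {}
    for extent, codes in by_extent.items():
        for code in codes:
            code_to_extents.setdefault(str(code), []).append(extent)
    return code_to_extents


def classify(cpt, code_to_extents):
    """Return the extent labels for a CPT code, or ['unknown'] if unmapped."""
    return code_to_extents.get(str(cpt).strip(), ["unknown"])
```

The lookup returns a list rather than a single label so that a code listed under more than one extent is not silently overwritten.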

View file


resection_cpt_by_year.json

JSON reference

Year-range lookup for CPT codes with descriptions and procedure types.

  • Organizes codes by time windows (2010-2012, 2013-2023).
  • Each CPT entry includes a human-readable description and extent type.
  • Captures coding changes over time (biopsy, wedge, and resection updates).
  • Supports time-aware classification in longitudinal analyses.
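
A time-aware lookup could be sketched as below. The layout (window label to CPT to entry), the field names description and type, and the sample entries are all assumptions made for illustration; only the two window labels come from the notes above.

```python
# Hypothetical layout and placeholder entries; the real file may differ.
SAMPLE = {
    "2010-2012": {"32500": {"description": "placeholder description", "type": "wedge"}},
    "2013-2023": {"32505": {"description": "placeholder description", "type": "wedge"}},
}


def parse_window(label):
    """Turn a label such as '2010-2012' into an inclusive (start, end) pair."""
    start, end = label.split("-")
    return int(start), int(end)


def lookup(cpt, year, table):
    """Return the entry for a CPT code in the window containing `year`, else None."""
    for label, codes in table.items():
        start, end = parse_window(label)
        if start <= year <= end and str(cpt) in codes:
            return codes[str(cpt)]
    return None
```

A None result distinguishes a code that is valid in another window from one that is valid in the requested year, which is the point of keeping the windows separate.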

View file


ICDO3topography.csv

CSV reference

Lookup table mapping ICD-O-3 topography codes to site descriptions.

  • Columns include icdo3_code and description.
  • Covers anatomic sites across head/neck, GI, respiratory, skin, bone, and more.
  • Useful for decoding SEER or registry tumor site fields.
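
Decoding a site field against this table might look like the sketch below. The column names icdo3_code and description come from the notes above; the two sample rows, the helper names, and the dot-stripping normalization are assumptions, since the code format stored in the file is not shown here.

```python
import csv
import io

# Two sample rows standing in for ICDO3topography.csv.
SAMPLE_CSV = (
    "icdo3_code,description\n"
    'C34.1,"Upper lobe, lung"\n'
    'C34.3,"Lower lobe, lung"\n'
)


def normalize_site(code):
    """Upper-case a topography code and drop the dot so C34.1 and c341 match."""
    return str(code).replace(".", "").strip().upper()


def load_topography(handle):
    """Read the CSV into a {normalized code: description} dictionary."""
    return {
        normalize_site(row["icdo3_code"]): row["description"]
        for row in csv.DictReader(handle)
    }


def decode_site(code, table):
    """Return the site description for a code, or None if it is not in the table."""
    return table.get(normalize_site(code))
```

With the real file, load_topography(open("ICDO3topography.csv", newline="")) would replace the in-memory sample.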

View file

