Utilities

Reference notes for the utility files stored under public/util. Each entry links to the source file for quick inspection.


ACSNSQIPUtil.py

Python module

Reusable utilities for the ACS NSQIP benign lung resection study, extracted to avoid duplicating constants and helpers across analyses.

  • Configuration constants for study years (2010-2024) and output directories.
  • CPT metadata for lung resection procedures plus ICD prefix sets for benign, cancer, and structural diagnoses.
  • Column detection helpers, NSQIP field candidate lists, and normalization utilities (ASA, Yes/No, BMI).
  • ICD classification, cohort prep, 30-day outcomes, and stratified outcome tables.
  • Regression prep, imputation, and optional risk-adjusted rate analysis with statsmodels/scipy support.
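
The column detection and normalization helpers listed above might look roughly like the following. This is a minimal sketch, not the module's actual code: the function names (find_column, normalize_yes_no, normalize_asa) and the example labels are assumptions made for illustration.

```python
import re


def find_column(columns, candidates):
    """Return the first column matching a candidate name, ignoring case."""
    by_lower = {col.lower(): col for col in columns}
    for cand in candidates:
        if cand.lower() in by_lower:
            return by_lower[cand.lower()]
    return None


def normalize_yes_no(value):
    """Map Yes/No-style values to True/False; anything else becomes None."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in {"yes", "y", "1", "true"}:
        return True
    if text in {"no", "n", "0", "false"}:
        return False
    return None


def normalize_asa(value):
    """Extract a leading ASA class digit (1-5) from a free-text label."""
    if value is None:
        return None
    match = re.match(r"\s*([1-5])", str(value))
    return int(match.group(1)) if match else None
```

Returning None for unrecognized values keeps missing data explicit, so a later imputation step can decide how to handle it.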

View file


CharlsonDeyo.py

Python module

Python implementation of the Charlson-Deyo comorbidity index using ICD-9 and ICD-10 rules.

  • Rule table for ICD-9/ICD-10 prefixes and ranges for each comorbidity group.
  • Normalization and matching helpers for claims diagnoses and ICD version inference.
  • Long-format conversion for claims data and time-window filtering around diagnosis date.
  • Outputs per-condition counts, validity flags, Charlson score, binned score, and NCI index.
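
Prefix matching against a rule table can be sketched as below. The two-group RULES table is an illustrative subset written for this example, not the module's real table, and the function names are hypothetical. The version check is a crude heuristic: codes starting with a digit are treated as ICD-9, which ignores ICD-9 E and V codes.

```python
# Illustrative subset only; the real rule table covers many more groups.
RULES = {
    "mi": {"icd9": ("410", "412"), "icd10": ("I21", "I22", "I252"), "weight": 1},
    "chf": {"icd9": ("428",), "icd10": ("I50",), "weight": 1},
}


def normalize_code(code):
    """Strip dots and whitespace and uppercase, so 'i50.9' becomes 'I509'."""
    return str(code).replace(".", "").strip().upper()


def charlson_flags(codes):
    """Flag each comorbidity group that has at least one matching code."""
    flags = {name: False for name in RULES}
    for raw in codes:
        code = normalize_code(raw)
        if not code:
            continue
        # Assumption: a leading digit means ICD-9, otherwise ICD-10.
        version = "icd9" if code[0].isdigit() else "icd10"
        for name, rule in RULES.items():
            # str.startswith accepts a tuple of prefixes.
            if code.startswith(rule[version]):
                flags[name] = True
    return flags


def charlson_score(flags):
    """Sum the weights of all flagged groups."""
    return sum(RULES[name]["weight"] for name, hit in flags.items() if hit)
```

Each group counts once toward the score no matter how many of its codes appear, which is why the flags are booleans rather than counts.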

View file


CharlsonDeyo.R

R script

R implementation of the Charlson-Deyo comorbidity index using tidyverse workflows.

  • Same ICD-9/ICD-10 rule set as the Python version for parity.
  • Tidyverse helpers for normalization, prefix/range matching, and long-format DX data.
  • Calculates comorbidity flags, Charlson score, and NCI index for downstream analysis.

View file


ingest.py

Python script

Document ingestion script that builds a FAISS vector store for retrieval workflows.

  • Recursively loads files with type-specific loaders (text, Markdown, HTML, PDF, Office).
  • Chooses a faster PDF loader for files above the size threshold set by PDF_FAST_SIZE_MB.
  • Splits documents into overlapping chunks for fine-grained retrieval.
  • Creates OpenAI embeddings and saves the FAISS index to ./vectorstore.
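
The size-based loader choice and the overlapping split can be illustrated in plain Python. This is a sketch under stated assumptions, not the script's code: the function names are hypothetical, the threshold is passed in rather than read from PDF_FAST_SIZE_MB, whether the real comparison is inclusive is unknown, and chunking here is by character count.

```python
def use_fast_pdf_loader(size_bytes, fast_size_mb):
    """Return True when a file is at or above the size threshold in MB."""
    # Assumption: the threshold is inclusive.
    return size_bytes / (1024 * 1024) >= fast_size_mb


def split_into_chunks(text, chunk_size, overlap):
    """Split text into chunks of chunk_size characters that share overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.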

View file


resection_cpt_by_extent.json

JSON reference

CPT code groupings for lung procedures, organized by resection extent.

  • Maps extent labels (pneumonectomy, lobectomy, segmentectomy, wedge, biopsy, tumor resection) to CPT code lists.
  • Includes open and thoracoscopic (VATS) codes for consistent grouping.
  • Useful for collapsing procedure codes into analysis-ready categories.
  • Contains overlapping codes across years where applicable.
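
Collapsing procedure codes into categories usually means inverting the extent-to-codes mapping. The sketch below assumes the file is a flat JSON object of extent labels to code lists. The actual layout may differ, SAMPLE is a small illustrative subset, and build_code_lookup is a hypothetical helper.

```python
import json

# Assumed layout: {"extent label": ["CPT code", ...]}. Illustrative subset only.
SAMPLE = json.loads("""
{
  "pneumonectomy": ["32440"],
  "lobectomy": ["32480", "32663"],
  "segmentectomy": ["32484"]
}
""")


def build_code_lookup(by_extent):
    """Invert extent -> codes into code -> list of extents."""
    lookup = {}
    for extent, codes in by_extent.items():
        for code in codes:
            # A list per code tolerates codes that appear under several extents.
            lookup.setdefault(str(code), []).append(extent)
    return lookup


lookup = build_code_lookup(SAMPLE)
```

Keeping a list of extents per code makes any overlap visible instead of silently keeping only the last match.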

View file


resection_cpt_by_year.json

JSON reference

Year-range lookup for CPT codes with descriptions and procedure types.

  • Organizes codes by time windows (2010-2012, 2013-2023).
  • Each CPT entry includes a human-readable description and extent type.
  • Captures coding changes over time (biopsy, wedge, and resection updates).
  • Supports time-aware classification in longitudinal analyses.
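
Time-aware classification can be sketched as a lookup that first picks the window containing the operation year. The layout below (window labels as keys, entries with "description" and "type" fields) is an assumption about the file, the sample entries are placeholders, and classify_cpt is a hypothetical helper.

```python
# Assumed layout; entries are placeholders, not the file's contents.
SAMPLE_BY_YEAR = {
    "2010-2012": {
        "32480": {"description": "Lobectomy, single lobe", "type": "lobectomy"},
    },
    "2013-2023": {
        "32480": {"description": "Lobectomy, single lobe", "type": "lobectomy"},
    },
}


def parse_window(label):
    """Turn a label such as '2010-2012' into the tuple (2010, 2012)."""
    start, end = label.split("-")
    return int(start), int(end)


def classify_cpt(cpt, year, by_year):
    """Return the entry for cpt in the window covering year, or None."""
    for label, entries in by_year.items():
        start, end = parse_window(label)
        if start <= year <= end:
            return entries.get(str(cpt))
    return None
```

Returning None for a year outside every window, or for a code missing from its window, lets the caller tell unclassifiable rows apart from classified ones.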

View file


ICDO3topography.csv

CSV reference

Lookup table mapping ICD-O-3 topography codes to site descriptions.

  • Columns include icdo3_code and description.
  • Covers anatomic sites across head/neck, GI, respiratory, skin, bone, and more.
  • Useful for decoding SEER or registry tumor site fields.
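
Decoding a site field amounts to a dictionary lookup keyed on icdo3_code. The sketch below uses two sample rows in place of the real file and strips dots so that C34.1 and C341 match. How the file formats its codes is an assumption, and the function names are hypothetical.

```python
import csv
import io

# Two sample rows standing in for the real CSV.
SAMPLE_CSV = """icdo3_code,description
C34.1,"Upper lobe, lung"
C34.9,"Lung, NOS"
"""


def normalize_site_code(code):
    """Strip dots and whitespace and uppercase, so 'c34.1' becomes 'C341'."""
    return str(code).replace(".", "").strip().upper()


def load_topography(handle):
    """Read rows into a dict of normalized code -> description."""
    reader = csv.DictReader(handle)
    return {normalize_site_code(row["icdo3_code"]): row["description"] for row in reader}


def describe_site(code, table):
    """Return the site description for a code, or None if it is not listed."""
    return table.get(normalize_site_code(code))


table = load_topography(io.StringIO(SAMPLE_CSV))
```

For the real file, the same loader could be given an open file handle in place of the io.StringIO sample.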

View file

