Model Installation
Prerequisites
Ensure you have Python 3.8+ installed along with the necessary dependencies. Install them using:
pip install -r requirements.txt
Running the Project
- Download ICD-11 Data
  python icddownloader.py
  This script fetches ICD-11 classifications from an API and stores the data in JSON and CSV formats.
- Process ICD-11 Data
  python ICD_processing.py
  This script processes and cleans ICD-11 data, preparing it for use in the pipeline.
- Split DSM-5 Cases
  python dsmsplit.py
  Extracts DSM-5 case data from a text file and writes it to a structured CSV.
- Generate Vector Embeddings
  python langchainbuilder.py
  Builds Chroma vector stores for efficient retrieval.
- Process Medical Cases with AI
  python app.py
  Runs the main pipeline to generate diagnoses for clinical cases.
- Expose LLM-based API
  python api.py
  Provides a Flask API endpoint (/askLLM) that allows querying an LLM.
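The batch steps above can also be chained from a single script. The following is a minimal sketch, assuming all scripts sit in the current working directory and are meant to run in the listed order; `PIPELINE_STEPS` and `run_pipeline` are hypothetical names that do not exist in the project. `api.py` is left out because it starts a server rather than finishing like a batch job.

```python
import subprocess
import sys

# Script names come from the steps above; the order mirrors the list.
PIPELINE_STEPS = [
    "icddownloader.py",
    "ICD_processing.py",
    "dsmsplit.py",
    "langchainbuilder.py",
    "app.py",
]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure."""
    executed = []
    for script in steps:
        print(f"Running {script} ...")
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit code.
        runner([sys.executable, script], check=True)
        executed.append(script)
    return executed
```

Calling `run_pipeline()` executes the five scripts one after another. The `runner` parameter exists only so the order can be checked without launching real processes.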
File Descriptions
api.py
- Implements a Flask-based REST API for querying an LLM using LangChain.
- Retrieves data using Chroma and LangChain's RAG capabilities.
- Accepts a JSON request with an input_string field and returns the LLM response.
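The request handling described above might look roughly like the sketch below. It is not the project's actual `api.py`: `handle_ask`, `create_app`, and the 400 error shape are assumptions, and `answer_fn` is a placeholder for whatever LangChain retrieval chain produces the answer. Flask is imported inside `create_app` so the validation logic can be exercised without a running server.

```python
def handle_ask(payload, answer_fn):
    """Validate the request body and return (response_dict, http_status)."""
    if not isinstance(payload, dict) or not isinstance(payload.get("input_string"), str):
        return {"error": "JSON body must contain a string field 'input_string'"}, 400
    return {"output_string": answer_fn(payload["input_string"])}, 200


def create_app(answer_fn):
    """Wire handle_ask into a Flask app exposing POST /askLLM."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/askLLM", methods=["POST"])
    def ask_llm():
        # silent=True returns None instead of raising on a missing or malformed JSON body.
        body, status = handle_ask(request.get_json(silent=True), answer_fn)
        return jsonify(body), status

    return app
```

With a real chain in place of `answer_fn`, `create_app(answer_fn).run(port=5000)` would serve the endpoint at the address used in the API Usage section.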
app.py
- Reads DSM-5 cases from CSV.
- Uses Chroma and LLMs to generate diagnostic outputs.
- Saves results in a structured format.
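The read, diagnose, and save loop described above can be sketched as follows. The column names (`Introduction`, `PredictedDiagnosis`) and the function `diagnose_cases` are assumptions for illustration, and `diagnose_fn` stands in for the Chroma-plus-LLM call, which is not reproduced here.

```python
import csv


def diagnose_cases(csv_in, csv_out, diagnose_fn, text_column="Introduction"):
    """Read cases from csv_in, add a predicted diagnosis to each row, write to csv_out."""
    reader = csv.DictReader(csv_in)
    fieldnames = list(reader.fieldnames or []) + ["PredictedDiagnosis"]
    writer = csv.DictWriter(csv_out, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        # diagnose_fn receives the case text and returns the model's diagnosis as a string.
        row["PredictedDiagnosis"] = diagnose_fn(row[text_column])
        writer.writerow(row)
        count += 1
    return count
```

Both arguments are open text files (opened with `newline=""`, as the `csv` module recommends), so the same function works on real files and on in-memory buffers.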
dsmsplit.py
- Extracts individual DSM-5 cases from a text file.
- Parses cases into structured sections: Introduction, Discussion, Diagnosis.
- Saves the structured data into a CSV file.
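One way to split a case into the three sections is shown below. This sketch assumes each section begins with its name alone on a line (optionally followed by a colon); the real input layout is not documented here, so the `HEADING` pattern and the helper names are hypothetical.

```python
import csv
import re

SECTIONS = ("Introduction", "Discussion", "Diagnosis")
# Matches a line consisting only of a section name, optionally followed by a colon.
HEADING = re.compile(r"^(Introduction|Discussion|Diagnosis):?[ \t]*$", re.MULTILINE)


def parse_case(case_text):
    """Split one case into a dict keyed by section name; missing sections become ''."""
    parts = HEADING.split(case_text)
    # With one capture group, re.split yields [preamble, name1, body1, name2, body2, ...].
    found = {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}
    return {name: found.get(name, "") for name in SECTIONS}


def write_cases(cases, out_file):
    """Write parsed cases to an open text file as CSV, one column per section."""
    writer = csv.DictWriter(out_file, fieldnames=SECTIONS)
    writer.writeheader()
    writer.writerows(cases)
```

Any text before the first heading (a case title, for example) is discarded by `parse_case`. Keeping it would need an extra column.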
ICD_processing.py
- Reads ICD-11 classifications from a CSV.
- Extracts ICD-11 disorder descriptions from a PDF.
- Cleans and structures data for use in LangChain-based queries.
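The cleaning step could be as simple as the sketch below, which normalizes whitespace (PDF extraction tends to leave stray line breaks) and drops rows that are empty or duplicated. The `code` field name and both helper functions are assumptions. PDF reading is omitted because the library used for it is not named here.

```python
import re


def clean_text(value):
    """Collapse runs of whitespace, including line breaks, and trim the ends."""
    return re.sub(r"\s+", " ", value or "").strip()


def clean_records(records):
    """Normalize every field, drop rows without a code, keep the first row per code."""
    seen = set()
    cleaned = []
    for record in records:
        row = {key: clean_text(val) for key, val in record.items()}
        code = row.get("code", "")
        if not code or code in seen:
            continue
        seen.add(code)
        cleaned.append(row)
    return cleaned
```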
icddownloader.py
- Fetches ICD-11 classification data from an API.
- Saves data in JSON and CSV formats for further processing.
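Writing the same records to both formats might look like the following sketch. `save_records`, the `icd11` file stem, and the flat-dictionary record shape are assumptions. The fetch itself is left out because the API endpoint and its authentication are not specified here.

```python
import csv
import json
from pathlib import Path


def save_records(records, out_dir, stem="icd11"):
    """Write a list of flat dicts to <stem>.json and <stem>.csv; return both paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / f"{stem}.json"
    csv_path = out_dir / f"{stem}.csv"

    json_path.write_text(json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8")

    # Use the union of keys, in first-seen order, so rows with missing fields still fit.
    fieldnames = list(dict.fromkeys(key for rec in records for key in rec))
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    return json_path, csv_path
```

JSON preserves whatever structure the records have, while the CSV is convenient for the later processing step that reads classifications from a CSV.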
langchainbuilder.py
- Creates vector embeddings using LangChain’s Chroma module.
- Loads ICD-11 prompts for efficient retrieval-augmented generation (RAG).
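To show what a vector store does at query time, here is a dependency-free sketch of similarity-based retrieval. It is deliberately not Chroma's API: the word-count `embed` function stands in for a learned embedding model, and `ToyVectorStore` is a hypothetical class that only illustrates the idea of ranking stored texts by cosine similarity to a query.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts. A real pipeline would use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Stores (text, vector) pairs and returns the k texts most similar to a query."""

    def __init__(self, texts):
        self.entries = [(text, embed(text)) for text in texts]

    def search(self, query, k=1):
        query_vec = embed(query)
        ranked = sorted(self.entries, key=lambda entry: cosine(query_vec, entry[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

In a RAG setup, the texts returned by `search` are placed into the LLM prompt as context. A production store such as Chroma adds persistence and approximate nearest-neighbor indexing on top of this basic idea.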
API Usage
To query the API, send a POST request to http://127.0.0.1:5000/askLLM with JSON data:
{
  "input_string": "Describe the symptoms of schizophrenia."
}
The response contains the LLM's answer in the output_string field:
{
  "output_string": "Schizophrenia is characterized by..."
}
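The same request can be sent from Python using only the standard library. This is a minimal client sketch: `build_request` and `ask_llm` are hypothetical helper names, and the 60-second timeout is an arbitrary assumption to allow for slow LLM responses.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5000/askLLM"


def build_request(question, url=API_URL):
    """Build the POST request with a JSON body of the form {"input_string": ...}."""
    body = json.dumps({"input_string": question}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def ask_llm(question, url=API_URL, timeout=60):
    """Send the question and return the "output_string" field of the JSON response."""
    with urllib.request.urlopen(build_request(question, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["output_string"]
```

With the server from `python api.py` running, `ask_llm("Describe the symptoms of schizophrenia.")` returns the answer text as a string.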
Acknowledgments
This project leverages LangChain, Chroma, and various open medical datasets to improve automated clinical case processing and diagnostics.