Find Specialist Flow¶
Overview¶
The "Find Specialist" feature matches medical cases to appropriate doctors using AI-powered analysis. This document describes the complete flow, architecture, and how MedGemma model capabilities are leveraged.
The system supports two input methods:
- Select Existing Case: Choose from a dropdown of pre-created medical cases, or from the Search Existing Cases modal (click "Use this case"). Choosing "Use this case" navigates to the match page for that case so Match Doctors runs against the existing case and does not create a duplicate.
- Enter Case Information: Direct text input with optional case search and reuse functionality
Note: All references in this document reflect the current model version as configured in application-local.yml.
Benefits¶
- AI-powered matching – MedGemma for medical reasoning and case analysis, FunctionGemma for tool orchestration; matching is driven by clinical context and structured tools.
- Hybrid scoring – Semantic Graph Retrieval combines vector similarity (40%), graph relationships (30%), and historical performance (30%) for optimal doctor-case fit.
- Graph-aware discovery – Apache AGE graph is used for relationship-based discovery (direct doctor-case links, condition expertise, specialty, similar case experience).
- Flexible input – Select an existing case from the dropdown or enter case information as text; supports handwritten text (after OCR), direct form input, and case search/reuse.
- End-to-end flow – Case analysis, tool execution (match_doctors_to_case), scoring, and result interpretation in one flow; matched doctors with rationale returned to the UI.
Architecture¶
The Find Specialist flow uses a hybrid AI approach combining:
- MedGemma for medical reasoning and case analysis
- FunctionGemma (
functiongemma) for tool calling (MedGemma doesn't support tools) - Agent Skills for modular knowledge management
- Java @Tool methods for data retrieval and matching
- Hybrid GraphRAG Scoring: Combines vector similarity (PgVector), graph relationships (Apache AGE), and historical performance for optimal doctor-case matching
Scoring Architecture¶
The matching uses Semantic Graph Retrieval (Semantic Graph Retrieval) which combines three signals:
- Vector Similarity (40%): PgVector embeddings comparison for semantic matching
- Graph Relationships (30%): Apache AGE graph traversal for relationship-based discovery
- Historical Performance (30%): Past outcomes, ratings, and success rates
Graph Integration: Apache AGE graph is actively used in the Find Specialist flow to calculate relationship scores, including direct doctor-case relationships, condition expertise, specialization matching, and similar case experience. See Apache AGE Graph Usage section for details.
Text Input Support: The system supports direct text input for cases, enabling:
- Handwritten text support (after OCR/transcription)
- Direct form input without pre-creating cases
- Case search and reuse functionality via modal interface
- Automatic case creation and embedding generation
Flow Diagram (Hybrid Approach)¶
Detailed Flow Steps (Hybrid Approach)¶
1. User Initiates Search¶
Entry Points:
- Web UI:
GET /match- Displays the match page with case selection dropdown - Match Request:
POST /match/{caseId}- Initiates matching for selected case - REST API:
POST /api/v1/agent/match/{caseId}- Direct API endpoint
Flow:
- User navigates to
/matchpage (Thymeleaf template) - User selects a medical case from the dropdown
- User clicks "Match Doctors" button
- Form submits
POST /match/{caseId}toMatchController MatchControllerforwards request toMedicalAgentController.matchDoctors()- Session ID is generated for log streaming (SSE)
2. Step 1: Case Analysis with MedGemma¶
Implementation: analyzeCaseWithMedGemma(caseId)
- Model: MedGemma (configured via
CHAT_MODEL) - Prompt Template:
medgemma-case-analysis.st - Configuration: Uses
primaryChatModelfromSpringAIConfig(configured viaCHAT_MODEL) - Purpose: Extract medical information needed for expert matching
- Output: JSON with:
- Required medical specialty
- Urgency level (CRITICAL, HIGH, MEDIUM, LOW)
- Clinical findings
- ICD-10 codes
- Case summary
Why MedGemma: Uses medical domain expertise to accurately analyze the case
3. Step 2: Tool Orchestration with FunctionGemma¶
Implementation: buildToolCallingPrompt(caseId, caseAnalysis)
- Model: FunctionGemma (
functiongemma) - Purpose: Orchestrate tool calls to find matching doctors
- Key: Receives MedGemma case analysis as context (doesn't analyze itself)
- Prompt Engineering: Framed as "expert matching" not "medical diagnosis" to avoid safety guardrails
Prompt Structure:
You are an expert matching assistant (NOT a diagnostic system).
Medical analysis is already completed by MedGemma.
Your task: Use tools to find matching doctors.
Case Analysis (from MedGemma): {caseAnalysis}
Prompt Building Implementation (buildToolCallingPrompt()):
The tool calling prompt is constructed programmatically to frame the task appropriately:
-
Base Prompt: Establishes role and context:
-
Safety Framing: Explicitly avoids medical diagnosis terminology:
-
Task Description: Three-step process outline:
-
Case Analysis Injection: Appends MedGemma case analysis:
-
Available Tools List: Describes available tools and parameters:
-
Task Instruction: Specific instruction for the current case:
Full Prompt Construction (buildPrompt() method):
The complete prompt combines multiple components:
-
Agent Skills Loading: Loads relevant skills from
.claude/skills/:case-analyzer: Provides guidance for case analysisdoctor-matcher: Provides guidance for doctor matching
-
Skill Integration: Combines skills with task description:
-
Case Analysis Append: Adds MedGemma case analysis:
Why FunctionGemma: Supports tool calling (MedGemma doesn't yet)
Prompt Template Details¶
The system uses StringTemplate (.st) files for prompt management. These templates are located in
src/main/resources/prompts/ and are loaded via Spring AI PromptTemplate beans.
MedGemma Case Analysis Prompt (medgemma-case-analysis.st):
This prompt template is used in Step 1 to extract medical information from cases. The template includes:
Medical Disclaimer: HIPAA compliance notice and research/educational purpose disclaimer
Case Information Variables:
<caseId>: Medical case identifier<chiefComplaint>: Patient's primary complaint<symptoms>: Reported symptoms<additionalNotes>: Additional clinical notes
Task Description: Five-part analysis requirement:
- Required medical specialty determination
- Urgency level classification (CRITICAL, HIGH, MEDIUM, LOW)
- Clinical findings extraction
- ICD-10 code identification
- Case summary generation
Response Format: Structured JSON with fields:
requiredSpecialty: String (specialty name)urgencyLevel: String (CRITICAL|HIGH|MEDIUM|LOW)clinicalFindings: Array of stringsicd10Codes: Array of ICD-10 code stringscaseSummary: String (brief summary)
Template Content:
You are MedGemma, a specialized medical AI model designed for medical case analysis and clinical reasoning.
**IMPORTANT MEDICAL DISCLAIMER:**
This AI system is for research and educational purposes only. It is NOT certified for clinical use and should NOT be used for diagnostic decisions without human-in-the-loop verification. Always consult qualified healthcare professionals for medical decisions.
**Medical Case Information:**
Case ID: <caseId>
Chief Complaint: <chiefComplaint>
Symptoms: <symptoms>
Additional Notes: <additionalNotes>
**Task:**
Analyze this medical case comprehensively and extract key information needed for expert matching:
1. **Required Medical Specialty**: Determine the most appropriate medical specialty for this case (e.g., Cardiology, Pulmonology, Neurology, etc.)
2. **Urgency Level**: Classify the urgency as CRITICAL, HIGH, MEDIUM, or LOW based on:
- Life-threatening conditions
- Risk of complications
- Time-sensitive interventions needed
3. **Clinical Findings**: Extract key clinical findings, signs, and symptoms
4. **ICD-10 Codes**: Identify relevant ICD-10 diagnosis codes if applicable
5. **Case Summary**: Provide a concise summary suitable for matching with appropriate specialists
**Response Format:**
Provide a structured analysis in JSON format:
{
"requiredSpecialty": "Specialty name",
"urgencyLevel": "CRITICAL|HIGH|MEDIUM|LOW",
"clinicalFindings": ["finding1", "finding2", ...],
"icd10Codes": ["code1", "code2", ...],
"caseSummary": "Brief summary for specialist matching"
}
**Analysis:**
MedGemma Result Interpretation Prompt (medgemma-result-interpretation.st):
This prompt template is used in Step 4 to generate the final response with medical reasoning. The template includes:
Medical Disclaimer: Same HIPAA compliance notice as case analysis prompt
Input Variables:
<caseAnalysis>: Original case analysis from Step 1 (MedGemma output)<toolResults>: Tool execution results from Step 3 (doctor matches)
Task Description: Four-part response generation:
- Case summary
- Matching rationale explanation (specialty alignment, experience, complexity, availability)
- Matched doctors presentation (name, specialty, score, experience)
- Recommendations for next steps
Response Format Requirements:
- Clear, well-structured, actionable
- Appropriate medical terminology
- Suitable for healthcare administrators or case coordinators
Critical Instructions:
- Provide only one response (no repetition)
- Do not include prompt instructions in response
- Do not generate multiple identical JSON structures
- Do not echo tool results verbatim (interpret and summarize)
- Maximum length: 1500 words
- Stop after providing response
Template Content:
You are MedGemma, a specialized medical AI model designed for medical case analysis and clinical reasoning.
**IMPORTANT MEDICAL DISCLAIMER:**
This AI system is for research and educational purposes only. It is NOT certified for clinical use and should NOT be used for diagnostic decisions without human-in-the-loop verification. Always consult qualified healthcare professionals for medical decisions.
**Original Case Analysis:**
<caseAnalysis>
**Tool Execution Results:**
<toolResults>
**Task:**
Based on the original case analysis and the tool execution results (matched doctors), generate a comprehensive response that:
1. **Summarizes the Case**: Provide a brief medical case summary
2. **Explains Matching Rationale**: Explain why these doctors were matched to this case based on:
- Specialty alignment
- Clinical experience relevance
- Case complexity match
- Availability and capabilities
3. **Presents Matched Doctors**: List the matched doctors with:
- Doctor name and specialty
- Match score/reasoning
- Relevant experience or qualifications
4. **Provides Recommendations**: Offer clear next steps for case routing or consultation
**Response Format:**
Provide a clear, well-structured response suitable for healthcare administrators or case coordinators. Use medical terminology appropriately but ensure the response is actionable.
**CRITICAL INSTRUCTIONS:**
- Provide ONLY ONE response (do NOT repeat the same content)
- Do NOT include the prompt instructions in your response
- Do NOT generate multiple identical JSON structures or repeated sections
- Do NOT echo back the tool results verbatim - interpret and summarize them
- Keep the response focused, concise, and actionable
- Maximum response length: 1500 words
- If tool results are empty or minimal, provide a brief summary based on case analysis only
- Stop after providing the response - do not continue generating
**Response:**
Template Configuration:
Templates are configured in PromptTemplateConfig as Spring beans with @Qualifier annotations:
medgemmaCaseAnalysisPromptTemplate: Loadsmedgemma-case-analysis.stmedgemmaResultInterpretationPromptTemplate: Loadsmedgemma-result-interpretation.st
Both templates use Spring AI's PromptTemplate with StTemplateRenderer for variable substitution.
4. Step 3: Tool Execution¶
Primary Tool: match_doctors_to_case
Purpose: Comprehensive doctor matching with scoring and ranking
Implementation: Calls MatchingService.matchDoctorsToCase()
Process:
- Finds candidate doctors based on case requirements and filters
- Scores each candidate using
SemanticGraphRetrievalService.score()(Semantic Graph Retrieval) - Applies filters (minScore, preferredSpecialties, requireTelehealth, preferredFacilityIds)
- Returns
List<DoctorMatch>sorted by overall score (best match first)
Candidate Doctor Finding Algorithm¶
The findCandidateDoctors() method in MatchingServiceImpl implements a multi-step algorithm to identify potential
doctor matches:
Algorithm Flow:
Preferred Specialties (if options.preferredSpecialties() is provided):
- Iterates through each preferred specialty
- Queries
DoctorRepository.findBySpecialty(specialty, maxResults * 2)for each specialty - The
maxResults * 2multiplier allows for filtering while ensuring sufficient candidates - Aggregates all results into a single candidate list
Case Required Specialty (if no preferred specialties but medicalCase.requiredSpecialty() exists):
- Queries
DoctorRepository.findBySpecialty(medicalCase.requiredSpecialty(), maxResults * 2) - Uses the case's own required specialty as the search criterion
Fallback (if no specialty is specified):
- Queries
DoctorRepository.findAllIds(maxResults * 2)to get all doctor IDs (limited) - Loads full doctor entities via
DoctorRepository.findByIds(doctorIds) - This ensures some candidates are always available even without specialty information
Telehealth Filtering (if options.requireTelehealth() is true):
- Filters candidates using Java Stream API:
candidates.stream().filter(Doctor::telehealthEnabled) - Only doctors with
telehealthEnabled == trueare retained
Deduplication:
- Removes duplicate doctors using
distinct()stream operation - Ensures each doctor appears only once in the candidate list
Database Queries:
-
findBySpecialty: SQL query loaded from/sql/doctor/findBySpecialty.sql- Uses
NamedParameterJdbcTemplatefor parameterized queries - Filters doctors by specialty name
- Limits results to specified maximum
- Uses
-
findAllIds: SQL query loaded from/sql/doctor/findAllIds.sql- Returns list of doctor IDs (not full entities) for performance
- Limits results to specified maximum
- Used for fallback when no specialty criteria available
Implementation Location: MatchingServiceImpl.findCandidateDoctors() (lines 174-207)
Alternative Tools (available but less commonly used):
query_candidate_doctors: Finds doctors matching case requirements (without scoring)score_doctor_match: Scores a single doctor-case match
Scoring Algorithm (Semantic Graph Retrieval - Semantic Graph Retrieval):
The Semantic Graph Retrieval scoring combines three complementary signals to produce the final doctor-case match score:
Vector Similarity (40% weight): PgVector embeddings comparison for semantic matching
- Compares case embeddings with doctor's historical case embeddings
- Uses cosine similarity via PgVector
<=>operator - See Vector Similarity Score Calculation for details
Graph Relationships (30% weight): Apache AGE graph traversal (doctor-case relationships)
- Direct relationships: Doctors who treated this exact case
- Condition expertise: Doctors who treat matching ICD-10 codes
- Specialization match: Doctors with required medical specialties
- Similar cases: Doctors who treated cases with same conditions
- See Apache AGE Graph Usage for detailed implementation
Historical Performance (30% weight): Past outcomes, ratings, success rates
- Average ratings from clinical experiences (1-5 scale)
- Success rate (percentage of positive outcomes)
- See Historical Performance Score Calculation for details
Final Score Formula:
Semantic Graph Retrieval Scoring Flow Diagram:
graph TB
subgraph Input["Input"]
Case[Medical Case<br/>with embedding]
Doctor[Doctor<br/>candidate]
end
subgraph Scoring["Semantic Graph Retrieval Scoring Components"]
Vector[Vector Similarity<br/>40% weight]
Graph[Graph Relationships<br/>30% weight]
Historical[Historical Performance<br/>30% weight]
end
subgraph VectorCalc["Vector Calculation"]
V1[Load case embedding]
V2[Load doctor's historical cases]
V3[Calculate cosine similarity<br/>PgVector <=> operator]
V1 --> V2 --> V3
end
subgraph GraphCalc["Graph Calculation"]
G1[Direct relationships]
G2[Condition expertise]
G3[Specialization match]
G4[Similar cases]
G5[Combine: 40% + 25% + 25% + 10%]
G1 --> G5
G2 --> G5
G3 --> G5
G4 --> G5
end
subgraph HistoricalCalc["Historical Calculation"]
H1[Load clinical experiences]
H2[Calculate avg rating<br/>1-5 scale → 0-1]
H3[Calculate success rate<br/>% positive outcomes]
H4[Combine: 60% rating + 40% success]
H1 --> H2 --> H4
H1 --> H3 --> H4
end
Case --> VectorCalc
Doctor --> VectorCalc
Case --> GraphCalc
Doctor --> GraphCalc
Doctor --> HistoricalCalc
VectorCalc --> Vector
GraphCalc --> Graph
HistoricalCalc --> Historical
Vector --> Final[Final Score<br/>0-100 scale]
Graph --> Final
Historical --> Final
style Vector fill:#e3f2fd
style Graph fill:#fff3e0
style Historical fill:#f3e5f5
style Final fill:#e8f5e9
Note: Tools internally use MedGemma via CaseAnalysisService when needed for case analysis
Vector Similarity Score Calculation¶
The vector similarity score (SemanticGraphRetrievalServiceImpl.calculateVectorSimilarityScore()) measures semantic
similarity between the query case and cases previously treated by the doctor using PgVector embeddings.
Algorithm Steps:
-
Embedding Check: Verify the query case has an embedding:
- If no embedding exists: return neutral score (0.5)
- This allows graceful degradation when embeddings are not available
-
Load Historical Cases: Retrieve doctor's clinical experiences:
- Query
ClinicalExperienceRepository.findByDoctorId(doctor.id()) - Extract case IDs from experiences:
experiences.stream().map(ClinicalExperience::caseId).distinct().toList() - If no experiences exist: return neutral score (0.5)
- Query
-
Cosine Similarity Calculation: Compute average similarity using PgVector:
SELECT AVG(1 - (mc1.embedding <=> mc2.embedding::vector)) as avg_similarity FROM medexpertmatch.medical_cases mc1 JOIN medexpertmatch.medical_cases mc2 ON mc2.id = :queryCaseId WHERE mc1.id IN (:doctorCaseIds) AND mc1.embedding IS NOT NULL AND mc2.embedding IS NOT NULL- Uses PgVector cosine distance operator
<=>(returns distance 0-2) - Converts distance to similarity:
1 - distance(similarity 1-(-1), normalized to 0-1) - Averages similarity across all doctor's historical cases
- Only includes cases with valid embeddings
- Uses PgVector cosine distance operator
-
Score Normalization: Clamp result to valid range [0.0, 1.0]:
-
Error Handling: Catch exceptions and return neutral score (0.5):
- Logs warnings for debugging
- Prevents scoring failures from breaking the matching process
Weight in Final Score: 40% (VECTOR_WEIGHT = 0.4)
Implementation Location: SemanticGraphRetrievalServiceImpl.calculateVectorSimilarityScore() (lines 163-234)
Historical Performance Score Calculation¶
The historical performance score (SemanticGraphRetrievalServiceImpl.calculateHistoricalPerformanceScore()) evaluates a
doctor's past performance based on clinical experiences, ratings, and outcomes.
Algorithm Steps:
-
Load Clinical Experiences: Retrieve doctor's historical cases:
- Query
ClinicalExperienceRepository.findByDoctorId(doctor.id()) - If no experiences exist: return neutral score (0.5)
- Query
-
Calculate Metrics:
- Average Rating:
- Sum all non-null ratings from experiences
- Divide by total experience count
- Rating scale: 1-5 (1 = poor, 5 = excellent)
- Success Rate:
- Count experiences with outcome "SUCCESS" or "IMPROVED"
- Divide by total experience count
- Represents percentage of positive outcomes
- Average Rating:
-
Normalize Rating: Convert 1-5 scale to 0-1 range:
- Example: Rating 5.0 → (5.0 - 1.0) / 4.0 = 1.0
- Example: Rating 3.0 → (3.0 - 1.0) / 4.0 = 0.5
- Example: Rating 1.0 → (1.0 - 1.0) / 4.0 = 0.0
-
Combine Scores: Weighted combination of rating and success rate:
- Rating contributes 60% weight (emphasizes quality)
- Success rate contributes 40% weight (emphasizes outcomes)
-
Score Normalization: Clamp result to valid range [0.0, 1.0]:
-
Error Handling: Catch exceptions and return neutral score (0.5):
- Logs warnings for debugging
- Prevents scoring failures from breaking the matching process
Weight in Final Score: 30% (HISTORICAL_WEIGHT = 0.3)
Implementation Location: SemanticGraphRetrievalServiceImpl.calculateHistoricalPerformanceScore() (lines 424-462)
Apache AGE Graph Usage¶
Apache AGE (Apache Graph Extension) is used to calculate graph relationship scores as part of the Semantic Graph Retrieval scoring algorithm. The graph provides relationship-based signals that complement vector similarity and historical performance.
Graph Structure¶
The medexpertmatch_graph contains the following vertices and relationships:
Graph Structure Diagram:
graph LR
subgraph Vertices["Graph Vertices"]
D[Doctor]
C[MedicalCase]
I[ICD10Code]
S[MedicalSpecialty]
F[Facility]
end
subgraph Relationships["Relationships"]
D -->|TREATED| C
D -->|CONSULTED_ON| C
D -->|SPECIALIZES_IN| S
D -->|TREATS_CONDITION| I
D -->|AFFILIATED_WITH| F
C -->|HAS_CONDITION| I
C -->|REQUIRES_SPECIALTY| S
end
style D fill:#e3f2fd
style C fill:#fff3e0
style I fill:#f3e5f5
style S fill:#e8f5e9
style F fill:#fce4ec
Vertices:
Doctor- Doctor entities fromdoctorstableMedicalCase- Medical case entities frommedical_casestableICD10Code- ICD-10 diagnosis codes fromicd10_codestableMedicalSpecialty- Medical specialties frommedical_specialtiestableFacility- Healthcare facilities fromfacilitiestable
Relationships:
(Doctor)-[:TREATED]->(MedicalCase)- Doctor treated a medical case(Doctor)-[:CONSULTED_ON]->(MedicalCase)- Doctor consulted on a medical case(Doctor)-[:SPECIALIZES_IN]->(MedicalSpecialty)- Doctor specializes in a medical specialty(Doctor)-[:TREATS_CONDITION]->(ICD10Code)- Doctor treats a specific condition (ICD-10 code)(MedicalCase)-[:HAS_CONDITION]->(ICD10Code)- Medical case has a condition(MedicalCase)-[:REQUIRES_SPECIALTY]->(MedicalSpecialty)- Medical case requires a specialty(Doctor)-[:AFFILIATED_WITH]->(Facility)- Doctor is affiliated with a facility
Graph Relationship Scoring¶
The SemanticGraphRetrievalService.calculateGraphRelationshipScore() method uses Apache AGE Cypher queries to calculate
four component scores:
Graph Scoring Flow Diagram:
graph TB
subgraph Input["Input"]
Case[Medical Case<br/>with ICD-10 codes<br/>and required specialty]
Doctor[Doctor<br/>candidate]
end
subgraph Components["Graph Score Components"]
Direct[Direct Relationship Score<br/>40% weight]
Condition[Condition Expertise Score<br/>25% weight]
Specialty[Specialization Match Score<br/>25% weight]
Similar[Similar Cases Score<br/>10% weight]
end
subgraph DirectQuery["Direct Relationship Query"]
DQ1[MATCH Doctor-TREATED-Case<br/>or Doctor-CONSULTED_ON-Case]
DQ2[Score: 1.0 if exists, 0.0 otherwise]
DQ1 --> DQ2
end
subgraph ConditionQuery["Condition Expertise Query"]
CQ1[MATCH Doctor-TREATS_CONDITION-ICD10Code<br/>WHERE code matches case ICD-10]
CQ2[Score: normalized ratio<br/>all match = 1.0, none = 0.0]
CQ1 --> CQ2
end
subgraph SpecialtyQuery["Specialization Query"]
SQ1[MATCH Doctor-SPECIALIZES_IN-Specialty<br/>WHERE specialty matches case requirement]
SQ2[Score: 1.0 if matches, 0.0 otherwise]
SQ1 --> SQ2
end
subgraph SimilarQuery["Similar Cases Query"]
SimQ1[MATCH Doctor-TREATED-Case-HAS_CONDITION-ICD10Code<br/>WHERE code matches case ICD-10]
SimQ2[Score: logarithmic scaling<br/>1 case = 0.5, 2-5 = 0.75, 6+ = 1.0]
SimQ1 --> SimQ2
end
Case --> DirectQuery
Doctor --> DirectQuery
Case --> ConditionQuery
Doctor --> ConditionQuery
Case --> SpecialtyQuery
Doctor --> SpecialtyQuery
Case --> SimilarQuery
Doctor --> SimilarQuery
DirectQuery --> Direct
ConditionQuery --> Condition
SpecialtyQuery --> Specialty
SimilarQuery --> Similar
Direct --> Combined[Combined Graph Score<br/>weighted average]
Condition --> Combined
Specialty --> Combined
Similar --> Combined
Combined --> Final[Graph Score<br/>0.0 - 1.0]
style Direct fill:#e3f2fd
style Condition fill:#fff3e0
style Specialty fill:#f3e5f5
style Similar fill:#e8f5e9
style Final fill:#fce4ec
1. Direct Relationship Score (40% weight)
- Query: Checks if doctor has TREATED or CONSULTED_ON relationships with the specific case
- Cypher Query:
- Score: 1.0 if relationship exists, 0.0 otherwise
- Purpose: Identifies doctors who have direct experience with this exact case
2. Condition Expertise Score (25% weight)
- Query: Finds doctors who treat conditions matching the case's ICD-10 codes
- Cypher Query:
- Score: Normalized ratio of matching ICD-10 codes (all match = 1.0, none = 0.0)
- Purpose: Identifies doctors with expertise in the case's specific conditions
3. Specialization Match Score (25% weight)
- Query: Matches doctors' specialties with case requirements
- Cypher Query:
- Score: 1.0 if specialization matches, 0.0 otherwise
- Purpose: Ensures doctors have the required medical specialty
4. Similar Cases Score (10% weight)
- Query: Finds doctors who treated cases with same ICD-10 codes
- Cypher Query:
- Score: Logarithmic scaling based on number of similar cases (1 case = 0.5, 2-5 = 0.75, 6+ = 1.0)
- Purpose: Identifies doctors with experience treating similar cases
Combined Graph Score¶
The four component scores are combined using weighted average:
graphScore = (directRelationshipScore × 0.4) +
(conditionExpertiseScore × 0.25) +
(specializationMatchScore × 0.25) +
(similarCasesScore × 0.1)
Final Score Range: 0.0 to 1.0 (normalized)
Integration with Overall Scoring¶
The graph score contributes 30% to the final Semantic Graph Retrieval score:
Semantic Graph Retrieval Score Integration Diagram:
graph LR
subgraph Components["Semantic Graph Retrieval Score Components"]
V[Vector Similarity<br/>0.0 - 1.0<br/>40% weight]
G[Graph Relationships<br/>0.0 - 1.0<br/>30% weight]
H[Historical Performance<br/>0.0 - 1.0<br/>30% weight]
end
subgraph Calculation["Score Calculation"]
Calc[Weighted Average:<br/>V × 0.4 + G × 0.3 + H × 0.3]
end
subgraph Output["Final Score"]
Final[Overall Score<br/>0 - 100 scale]
Rationale[Rationale:<br/>Component scores breakdown]
end
V --> Calc
G --> Calc
H --> Calc
Calc --> Final
Calc --> Rationale
style V fill:#e3f2fd
style G fill:#fff3e0
style H fill:#f3e5f5
style Final fill:#e8f5e9
Graceful Degradation¶
If Apache AGE is unavailable or the graph doesn't exist:
- Graph Check:
graphService.graphExists()returns false - Fallback: Returns neutral graph score (0.5)
- Behavior: System continues with vector similarity and historical performance only
- Error Handling: Graph query errors are caught and logged, but don't fail the matching process
Graph Building¶
The graph is automatically populated by MedicalGraphBuilderService:
- Trigger: Automatically built after synthetic data generation
- Process: Creates vertices and relationships from relational database data
- Synchronization: Graph is synchronized with relational data
- Idempotent: Uses MERGE operations, can be safely rebuilt
Graph Building Flow Diagram:
graph TB
subgraph Trigger["Trigger"]
DataGen[Synthetic Data Generation<br/>Completes]
end
subgraph Build["Graph Building Process"]
Step1[1. Create Graph Structure<br/>medexpertmatch_graph]
Step2[2. Create Vertices<br/>Doctors, Cases, ICD-10,<br/>Specialties, Facilities]
Step3[3. Create Indexes<br/>GIN indexes on properties<br/>JSONB columns]
Step4[4. Create Relationships<br/>Batched: 1000 per batch]
end
subgraph Relationships["Relationship Types Created"]
R1[TREATED]
R2[CONSULTED_ON]
R3[SPECIALIZES_IN]
R4[TREATS_CONDITION]
R5[HAS_CONDITION]
R6[REQUIRES_SPECIALTY]
R7[AFFILIATED_WITH]
end
subgraph Result["Result"]
Graph[Complete Graph<br/>Ready for queries]
end
DataGen --> Step1
Step1 --> Step2
Step2 --> Step3
Step3 --> Step4
Step4 --> R1
Step4 --> R2
Step4 --> R3
Step4 --> R4
Step4 --> R5
Step4 --> R6
Step4 --> R7
R1 --> Graph
R2 --> Graph
R3 --> Graph
R4 --> Graph
R5 --> Graph
R6 --> Graph
R7 --> Graph
style Step1 fill:#e3f2fd
style Step2 fill:#fff3e0
style Step3 fill:#f3e5f5
style Step4 fill:#e8f5e9
style Graph fill:#fce4ec
Graph Building Steps:
- Creates graph structure (
medexpertmatch_graph) if it doesn't exist - Creates all vertices (doctors, cases, ICD-10 codes, specialties, facilities)
- Creates graph indexes for performance (GIN indexes on properties JSONB columns)
- Creates all relationships in batches (1000 relationships per batch)
Example Graph Queries¶
Graph Query Examples Diagram:
graph TB
subgraph Query1["Query 1: Similar Cases"]
Q1[MATCH Doctor-TREATED-Case-HAS_CONDITION-ICD10Code<br/>WHERE code matches]
Q1R[Returns: Doctor IDs<br/>with case counts]
Q1 --> Q1R
end
subgraph Query2["Query 2: Condition Expertise"]
Q2[MATCH Doctor-TREATS_CONDITION-ICD10Code<br/>WHERE code IN case ICD-10 codes]
Q2R[Returns: Doctor IDs<br/>with matching condition count]
Q2 --> Q2R
end
subgraph Query3["Query 3: Specialization"]
Q3[MATCH Doctor-SPECIALIZES_IN-Specialty<br/>WHERE specialty matches case requirement]
Q3R[Returns: Doctor IDs<br/>with matching specialty]
Q3 --> Q3R
end
subgraph Query4["Query 4: Direct Relationship"]
Q4[MATCH Doctor-TREATED-Case<br/>WHERE case ID matches]
Q4R[Returns: Relationship count<br/>1 if exists, 0 otherwise]
Q4 --> Q4R
end
style Q1 fill:#e3f2fd
style Q2 fill:#fff3e0
style Q3 fill:#f3e5f5
style Q4 fill:#e8f5e9
Find doctors who treated similar cases:
MATCH (d:Doctor)-[:TREATED]->(c:MedicalCase)-[:HAS_CONDITION]->(i:ICD10Code {code: $icd10Code})
WHERE c.id <> $caseId
RETURN d.id, count(DISTINCT c) as similarCaseCount
ORDER BY similarCaseCount DESC
Find doctors with condition expertise:
MATCH (d:Doctor)-[:TREATS_CONDITION]->(i:ICD10Code)
WHERE i.code IN $icd10Codes
RETURN d.id, count(DISTINCT i) as matchingConditions
ORDER BY matchingConditions DESC
Find doctors by specialization:
Benefits of Graph-Based Scoring¶
- Relationship Discovery: Identifies doctors with direct experience with specific cases
- Condition Expertise: Matches doctors based on their experience with specific ICD-10 codes
- Specialization Matching: Ensures doctors have required medical specialties
- Similar Case Experience: Finds doctors who treated similar cases in the past
- Multi-Hop Traversal: Can discover relationships through multiple graph hops
- Performance: Graph queries are optimized with indexes for fast traversal
Limitations¶
- Graph Availability: Requires Apache AGE extension to be installed and enabled
- Graph Population: Graph must be populated from relational data (automatic after data generation)
- Fallback: Returns neutral score if graph is unavailable (doesn't fail matching)
- Query Complexity: Simple queries are used for compatibility and performance
5. Step 4: Result Interpretation with MedGemma¶
Implementation: interpretResultsWithMedGemma(toolResults, caseAnalysis)
- Model: MedGemma (configured via
CHAT_MODEL) - Prompt Template:
medgemma-result-interpretation.st - Configuration: Uses
primaryChatModelfromSpringAIConfig(configured viaCHAT_MODEL) - Purpose: Generate final response with medical reasoning
- Input: Tool execution results + original case analysis
- Output: Comprehensive response explaining matched doctors with medical context
Why MedGemma: Provides medical reasoning for why doctors were matched
Implementation Details:
-
Input Validation:
- Checks if
toolResultsis null or empty - If empty: Returns case analysis directly with prefix "Based on MedGemma case analysis:"
- Prevents errors when FunctionGemma fails to call tools
- Checks if
-
Input Truncation:
- Tool Results: Truncated to 3000 characters if longer
- Appends
"\n\n[Tool results truncated due to length]" - Prevents excessive prompt sizes
- Appends
- Case Analysis: Truncated to 1500 characters if longer
- Appends
"\n\n[Case analysis truncated]" - Keeps prompts manageable
- Appends
- Tool Results: Truncated to 3000 characters if longer
-
Prompt Construction:
- Renders
medgemma-result-interpretation.sttemplate with variables:caseAnalysis: Truncated case analysis (or "No case analysis available")toolResults: Truncated tool results
- Prompt Size Limit: Truncated to 8000 characters if longer
- Appends
"\n\n[Prompt truncated - please provide a concise response]" - Prevents LLM context window issues
- Appends
- Renders
-
Response Validation:
- Repetition Detection: For responses > 10000 characters:
- Checks if first 1000 characters appear multiple times (> 3 occurrences)
- If repetitive: Truncates to first occurrence + truncation message
- Hard Limit: Responses > 15000 characters are truncated to 15000
- Appends
"\n\n[Response truncated due to excessive length]"
- Appends
- Prevents infinite loops and template rendering issues
- Repetition Detection: For responses > 10000 characters:
-
Error Handling:
- Catches exceptions during MedGemma call
- Logs errors with session ID
- Returns case analysis as fallback if interpretation fails
Implementation Location: MedicalAgentServiceImpl.interpretResultsWithMedGemma() (lines 752-820)
6. Agent Skills Loading¶
Skills Used:
case-analyzer: Provides guidance for case analysis (used by FunctionGemma)doctor-matcher: Provides guidance for doctor matching (used by FunctionGemma)
Location: .claude/skills/case-analyzer/SKILL.md and .claude/skills/doctor-matcher/SKILL.md
Note: Skills provide context to FunctionGemma, but actual medical reasoning is done by MedGemma
7. Available Tools¶
The LLM can invoke the following tools for doctor matching. Note: MedicalAgentTools provides additional tools for
other use cases (evidence retrieval, recommendations, network analytics, routing) - all tools are now fully implemented.
The primary tools for the Find Specialist flow are listed below:
match_doctors_to_case (Primary Tool)¶
Purpose: Comprehensive doctor matching with scoring and ranking
Inputs:
caseId: Medical case ID (required)maxResults: Maximum number of results (default: 10, optional)minScore: Minimum match score threshold 0-100 (optional)preferredSpecialties: List of preferred specialties (optional)requireTelehealth: Require telehealth capability (optional)
Note: The tool does not expose preferredFacilityIds as a parameter, but MatchOptions internally supports it.
Facility filtering is applied in MatchingServiceImpl if provided via MatchOptions.
Outputs:
List<DoctorMatch>: List of doctor matches sorted by score (best match first)- Each
DoctorMatchcontains:Doctor: Doctor entity with full detailsscore: Overall match score (0-100)rank: Ranking positionscoreBreakdown: Detailed score components
- Each
Implementation:
- Calls
MatchingService.matchDoctorsToCase() - Uses
SemanticGraphRetrievalServicefor scoring (vector similarity + graph relationships + historical performance) - Applies filters and sorting
query_candidate_doctors¶
Purpose: Query database for candidate doctors matching case requirements (without scoring)
Inputs:
caseId: Medical case ID (optional)specialty: Required medical specialty (optional)requireTelehealth: Require telehealth capability (optional)maxResults: Maximum number of results (default: 10)
Outputs:
List<Doctor>: List of matching doctors
Database Query: PostgreSQL with JDBC via DoctorRepository
score_doctor_match¶
Purpose: Score a single doctor-case match
Inputs:
doctorId: Doctor ID (required)caseId: Medical case ID (required)
Outputs:
ScoreResult: Score with breakdown:overallScore: Overall match score (0-100)vectorSimilarity: Vector embedding similarity scoregraphScore: Graph relationship scorehistoricalScore: Historical performance scorespecialtyMatch: Specialty alignment score
Scoring Algorithm: Uses SemanticGraphRetrievalService.score() (Semantic Graph Retrieval)
- Combines vector similarity (40%), graph relationships (30%), and historical performance (30%)
- Graph relationships are calculated using Apache AGE Cypher queries
- See Scoring Architecture and Apache AGE Graph Usage sections for details
analyze_case_text¶
Purpose: Analyze medical case text to extract clinical information
Inputs:
caseText: Unstructured case description text
Outputs:
CaseAnalysisResult: Extracted clinical information:clinicalFindings: Extracted clinical findingspotentialDiagnoses: Potential diagnosesrecommendedNextSteps: Recommended next stepsurgentConcerns: Urgent concerns requiring immediate attention
Model Used: MedGemma (via CaseAnalysisService)
8. Response Generation¶
The LLM generates a final response that includes:
- Case Analysis Summary: Key findings from the case
- Matched Doctors: List of recommended doctors with scores
- Reasoning: Explanation of why these doctors were matched
- Next Steps: Recommended actions
9. Result Display¶
UI Components (match.html Thymeleaf template):
- Case Selection Form: Dropdown to select medical case
- Match Button: Triggers matching operation
- Results Card: Shows the LLM-generated response with matched doctors
- Execution Trace Panel: Real-time log of the matching process (via SSE)
- Progress Indicator: Shows "Matching..." during processing
Log Streaming:
- Server-Sent Events (SSE): Real-time execution trace via
/api/v1/logs/stream/{sessionId} - LogStreamService: Manages session-based log streaming
- Log Events: Tool calls, LLM invocations, errors, completion status
- Session Management: Each request gets a unique session ID for log correlation
SSE Client Implementation¶
The UI (match.html) implements an SSE client using the browser's native EventSource API to receive real-time log
updates.
Connection Setup:
- Endpoint:
/api/v1/logs/stream - Session ID: Passed as query parameter
- Media Type: Automatically handled by browser (
text/event-stream)
Event Handler:
eventSource.addEventListener('log', function (event) {
const logEntry = event.data;
// Append to logContent div with styling
addLogEntry(logEntry, level);
});
- Event Name:
log(matches server event name) - Data: Log entry text with timestamp and level
- Display: Appends to
logContentdiv with appropriate styling based on level
Progress Tracking:
The UI updates a progress bar based on log message content:
- "case-analyzer" or "case analysis" → 20% progress
- "tool orchestration" or "FunctionGemma" → 40% progress
- "doctor-matcher" or "matching doctors" → 60% progress
- "complete" or "finished" → 80% progress
Progress Bar Update:
if (message.includes('case-analyzer') || message.includes('case analysis')) {
progress = 20;
statusText = 'Analyzing case...';
} else if (message.includes('tool orchestration') || message.includes('FunctionGemma')) {
progress = 40;
statusText = 'Orchestrating tools...';
} else if (message.includes('doctor-matcher') || message.includes('matching doctors')) {
progress = 60;
statusText = 'Matching doctors to case...';
} else if (message.includes('complete') || message.includes('finished')) {
progress = 80;
statusText = 'Finalizing results...';
}
Session Management:
-
Session ID Generation:
- Generated before each match request
- Format:
"session-{timestamp}"
-
Connection Initialization:
-
Connection Lifecycle:
- Before Request: Reinitialize SSE connection with new session ID
- During Request: Logs stream in real-time
- After Request: Connection remains open (browser handles cleanup)
- On New Request: Previous connection closed, new one created
-
Log Clearing: Previous logs cleared on new request:
Error Handling:
- Connection Errors: Logged to browser console, error message displayed to user
- Reconnection: Automatic via
EventSource(browser handles reconnection attempts) - Connection Loss: Browser automatically attempts to reconnect
Log Entry Styling:
- INFO: Default styling (black text)
- DEBUG: Gray text
- WARN: Yellow/orange text
- ERROR: Red text
Implementation Location: match.html (lines 273-289 for initialization, 330-350 for event handling)
Log Streaming (SSE) Implementation¶
The system uses Server-Sent Events (SSE) to stream real-time execution traces to the UI. This allows users to monitor the matching process as it happens.
LogStreamService Architecture¶
Thread-Local Session Storage:
SessionContextClass: Internal helper class usingThreadLocal<String>for session ID storage- Thread Safety: Each thread maintains its own session ID, allowing tools to access the current session
- Methods:
setSessionId(String sessionId): Sets session ID for current threadgetSessionId(): Gets session ID from current thread (returns "default" if not set)clear(): Removes session ID from current thread
SSE Emitter Management:
- Storage:
ConcurrentHashMap<String, SseEmitter>for thread-safe emitter storage - Key: Session ID string
- Value:
SseEmitterinstance for that session - Lifecycle Handlers:
- Completion: Removes emitter from map, logs debug message
- Timeout: Removes emitter from map, logs debug message
- Error: Removes emitter from map, logs error with exception
Log Event Format:
- Timestamp:
HH:mm:ss.SSSformat (e.g., "14:23:45.123") - Level: Log level (INFO, DEBUG, WARN, ERROR)
- Message: Log message text
- Details: Optional additional details (newline-separated)
- Format:
[timestamp] [level] message\n[details]
Event Structure:
SSE Endpoint¶
Endpoint: GET /api/v1/logs/stream?sessionId={sessionId}
Controller: LogStreamController
Implementation:
- Media Type:
text/event-stream(SSE content type) - Session ID Generation:
- If provided: uses request parameter
- If not provided: generates UUID via
UUID.randomUUID().toString()
- Emitter Creation: Calls
LogStreamService.createEmitter(sessionId) - Timeout:
Long.MAX_VALUE(effectively no timeout) - Initial Message: Sends "Execution trace connected" message on connection
Example Request:
Session Management¶
Session ID Generation in UI:
- Format:
'session-' + Date.now() - Example:
"session-1704123456789" - Uniqueness: Timestamp ensures unique IDs per request
Thread-Local Storage:
- Setting:
LogStreamService.setCurrentSessionId(sessionId)called at start ofmatchDoctors() - Access: Tools call
LogStreamService.getCurrentSessionId()to access current session - Clearing:
LogStreamService.clearCurrentSessionId()called at end of operation
Session Lifecycle:
- Creation: UI generates session ID, creates SSE connection
- Storage: Service sets thread-local session ID
- Usage: Tools log events using session ID
- Cleanup: Session ID cleared, emitter removed on completion/timeout/error
Fallback Behavior:
- If emitter not found for session ID: logs warning, attempts broadcast to all sessions
- If session ID not set: uses "default" session ID
- Prevents log loss when session management issues occur
Log Event Types¶
logMatchDoctorsStep(String sessionId, String step, String details):
- Purpose: Log step-by-step operation progress
- Format:
"Step: {step}"with optional details - Level: INFO
- Usage: Major operation milestones (e.g., "Starting match doctors operation")
logToolCall(String sessionId, String toolName, String parameters):
- Purpose: Log tool invocation
- Format:
"Tool called: {toolName}"with parameters - Level: DEBUG
- Usage: When FunctionGemma calls a tool
logToolResult(String sessionId, String toolName, String result):
- Purpose: Log tool execution results
- Format:
"Tool result: {toolName}"with truncated result - Level: DEBUG
- Truncation: Results > 200 characters are truncated to 200 + "..."
- Usage: After tool execution completes
logError(String sessionId, String error, String stackTrace):
- Purpose: Log errors and exceptions
- Format:
"Error: {error}"with stack trace details - Level: ERROR
- Usage: When exceptions occur during matching
logCompletion(String sessionId, String operation, String result):
- Purpose: Log operation completion
- Format:
"Completed: {operation}"with result summary - Level: INFO
- Usage: When matching operation completes successfully
Implementation Location: LogStreamService.java (lines 154-188)
MedGemma Model Capabilities¶
Why MedGemma?¶
MedGemma is specifically designed for medical applications:
-
Medical Domain Expertise
- Trained on medical literature and clinical data
- Understands medical terminology and concepts
- Can analyze clinical cases effectively
-
Clinical Reasoning
- Extracts clinical findings from case descriptions
- Identifies potential diagnoses
- Recommends appropriate next steps
-
Medical Text Understanding
- Processes chief complaints, symptoms, and clinical notes
- Understands ICD-10 codes and medical specialties
- Interprets urgency levels and clinical priorities
How MedGemma is Used (Hybrid Approach)¶
Case Analysis (CaseAnalysisService)
- ✅ Uses dedicated
caseAnalysisChatClient(MedGemma) - ✅ Always uses MedGemma regardless of configuration
- ✅ Analyzes case text directly
- ✅ Extracts clinical findings, diagnoses, and recommendations
- ✅ Identifies urgent concerns
Medical Agent Service - Case Analysis (MedicalAgentServiceImpl)
- ✅ Step 1: Direct MedGemma call via
analyzeCaseWithMedGemma() - ✅ Uses
medgemma-case-analysis.stprompt template - ✅ Extracts specialty, urgency, clinical findings, ICD-10 codes
- ✅ Provides context for tool orchestration
Medical Agent Service - Result Interpretation (MedicalAgentServiceImpl)
- ✅ Step 4: Direct MedGemma call via
interpretResultsWithMedGemma() - ✅ Uses
medgemma-result-interpretation.stprompt template - ✅ Generates final response with medical reasoning
- ✅ Explains why doctors were matched with medical context
All Medical Reasoning Tasks
- ✅
matchDoctors(): MedGemma for analysis and interpretation - ✅
prioritizeConsults(): MedGemma for urgency analysis - ✅
analyzeCase(): MedGemma for analysis and interpretation - ✅
generateRecommendations(): MedGemma for recommendation reasoning - ✅
routeCase(): MedGemma for routing analysis and interpretation
Hybrid Approach Benefits¶
Separation of Concerns:
- MedGemma: All medical reasoning (case analysis, result interpretation)
- FunctionGemma: Tool orchestration only (no medical reasoning)
Maximizes MedGemma Usage:
- ✅ Case analysis uses MedGemma
- ✅ Result interpretation uses MedGemma
- ✅ All medical reasoning uses MedGemma
- ✅ FunctionGemma only orchestrates tool calls
Works Around FunctionGemma Limitations:
- ✅ Prompts frame as "expert matching" not "diagnosis"
- ✅ FunctionGemma receives MedGemma analysis (doesn't analyze itself)
- ✅ Avoids safety guardrails by not asking FunctionGemma to do medical reasoning
Limitations and Monitoring¶
Current Limitation: MedGemma does NOT support tool calling natively
Current Workaround: Hybrid approach
- MedGemma handles all medical reasoning
- FunctionGemma handles tool orchestration
- Clear separation ensures MedGemma's expertise is maximized
Future Enhancement:
MedGemmaToolCallingMonitorperiodically checks for tool calling support- When MedGemma gains tool calling support, can migrate fully to MedGemma
- This will eliminate the need for FunctionGemma entirely
Configuration¶
Model Configuration (application-local.yml)¶
IMPORTANT: All providers must use OpenAI-compatible APIs. The application uses Ollama with OpenAI-compatible endpoint for local development.
# Chat/LLM Configuration - MedGemma (Medical Reasoning)
CHAT_PROVIDER: openai
# Base URL for OpenAI-compatible endpoint (Ollama service)
# IMPORTANT: Do NOT include /v1 in base URL - Spring AI adds it automatically
CHAT_BASE_URL: http://localhost:11434 # Ollama OpenAI-compatible endpoint
CHAT_API_KEY: ollama
# MedGemma 27B GGUF IQ3_XXS quantization
# Model name format: hf.co/unsloth/medgemma-27b-text-it-GGUF:IQ3_XXS
CHAT_MODEL: hf.co/unsloth/medgemma-27b-text-it-GGUF:IQ3_XXS
CHAT_TEMPERATURE: 0.7
CHAT_MAX_TOKENS: 6000
# Tool Calling Configuration - FunctionGemma (Tool Orchestration)
# IMPORTANT: medgemma:27b does NOT support tools/function calling
# FunctionGemma supports tools, so it's used for all tool invocations
TOOL_CALLING_PROVIDER: openai
TOOL_CALLING_BASE_URL: http://localhost:11434 # Ollama OpenAI-compatible endpoint
TOOL_CALLING_API_KEY: ollama
TOOL_CALLING_MODEL: functiongemma
TOOL_CALLING_TEMPERATURE: 0.7
TOOL_CALLING_MAX_TOKENS: 4096
# Reranking Configuration - MedGemma
RERANKING_PROVIDER: openai
RERANKING_BASE_URL: http://localhost:11434 # Ollama OpenAI-compatible endpoint
RERANKING_API_KEY: ollama
# Reranking uses MedGemma chat model for semantic reranking
RERANKING_MODEL: hf.co/unsloth/medgemma-27b-text-it-GGUF:IQ3_XXS
RERANKING_TEMPERATURE: 0.1
Model Information¶
MedGemma Configuration:
- Model: Configured via
CHAT_MODELenvironment variable - Use cases: Medical text comprehension, clinical reasoning, medical document understanding
FunctionGemma Configuration:
- Model:
functiongemma - Purpose: Tool/function calling (MedGemma doesn't support tools)
- Pull command:
ollama pull functiongemma
Base URL Format¶
IMPORTANT: For OpenAI-compatible APIs, do NOT include /v1 in the base URL. Spring AI's OpenAiApi automatically
adds /v1/chat/completions or /v1/embeddings.
Valid Examples:
http://localhost:11434(Ollama OpenAI-compatible endpoint)http://localhost:8000(LiteLLM proxy)https://api.openai.com(OpenAI)https://YOUR_RESOURCE.openai.azure.com(Azure OpenAI)
Invalid Examples:
http://localhost:11434/v1(includes /v1)https://api.openai.com/v1(includes /v1)
Alternative Configurations¶
Using LiteLLM Proxy (Recommended for Full Compatibility):
CHAT_BASE_URL: http://localhost:8000 # LiteLLM proxy (different port)
TOOL_CALLING_BASE_URL: http://localhost:8000
RERANKING_BASE_URL: http://localhost:8000
Using Cloud Providers (Vertex AI Model Garden, Azure OpenAI, etc.):
CHAT_BASE_URL: https://YOUR_REGION-aiplatform.googleapis.com/v1 # Vertex AI
CHAT_API_KEY: YOUR_API_KEY
CHAT_MODEL: medgemma:27b # Or provider-specific model name
Agent Skills Configuration¶
medexpertmatch:
skills:
enabled: ${MEDEXPERTMATCH_SKILLS_ENABLED:true}
directory: ${MEDEXPERTMATCH_SKILLS_DIRECTORY:.claude/skills}
Configuration Notes:
- Skills are enabled by default (
enabled: true) - Skills directory:
.claude/skills(default) - Can be overridden via environment variables:
MEDEXPERTMATCH_SKILLS_ENABLED=trueMEDEXPERTMATCH_SKILLS_DIRECTORY=.claude/skills
Error Handling¶
Tool Call Errors¶
If a tool call fails:
- Error is logged to execution trace via
LogStreamService - Error details are captured in the session log
-
If FunctionGemma returns empty response or null toolName:
- System logs warning
- Falls back to using MedGemma case analysis directly
- Returns response with case analysis and error note
-
If tool execution throws exception:
- Exception is caught and logged
- Error message is included in response
- System attempts to provide partial results if available
Tool Call Detection¶
The system includes logic to detect which tools were called by FunctionGemma for logging and debugging purposes.
Extraction Method (MedicalAgentServiceImpl.extractToolCalls()):
The method uses multiple strategies to identify tool calls from FunctionGemma's response:
Method 1: ChatResponse Metadata:
- Uses Java reflection to access
chatResponse()method on the response object - Checks metadata keys containing "tool" or "function" (case-insensitive)
- Extracts tool names from metadata values
- Handles cases where metadata structure varies
Method 2: Content Pattern Matching:
The method searches the response content for tool call patterns:
-
Known Tools List:
query_candidate_doctorsscore_doctor_matchanalyze_case_textget_medical_case_detailsquery_facilitiesroute_case_to_facility
-
Pattern 1: Direct Tool Name Mentions:
- Case-insensitive matching
- Handles underscores, spaces, and hyphens:
tool.replace("_", "[_\\s-]") - Regex:
(?i)\\b{toolPattern}\\b
-
Pattern 2: Function Call Patterns:
- Matches:
(called|invoked|executed|using) + tool_name - Regex:
(?i)(?:called|invoked|executed|using)[\\s]+([a-z_]+)(?:\\s|\\()
- Matches:
-
Pattern 3: JSON-Like Structures:
- Matches:
"name": "tool_name"or"function": "tool_name" - Regex:
`(?i)["'](?:name|function|tool)["']\s*[:=]\s*["']([a-z_]+)["']`
- Matches:
-
Pattern 4: Tool Result Mentions:
- Checks if content mentions tool results (indicates tool was called)
- Converts tool names to display format:
tool.replace("_", " ") - Case-insensitive matching
Return Value:
- Success: Comma-separated list of detected tools (e.g., "query_candidate_doctors, score_doctor_match")
- No Detection: "none detected"
- Error: "error - no response"
Purpose:
- Logging: Records which tools were invoked for debugging
- Monitoring: Tracks tool usage patterns
- Debugging: Helps identify when FunctionGemma fails to call tools
Implementation Location: MedicalAgentServiceImpl.extractToolCalls() (lines 876-1004)
Model Refusal¶
If FunctionGemma refuses to perform tasks:
- Prompt is adjusted to frame as "expert matching" not "medical diagnosis"
- Agent skills provide context about non-diagnostic use case
- System emphasizes expert matching, not medical advice
- MedGemma analysis is provided as context (not asking FunctionGemma to analyze)
Response Size Limits¶
To prevent template rendering issues:
- Response truncation: Responses > 50KB are truncated
- Tool results truncation: Tool results > 3000 chars are truncated
- Case analysis truncation: Case analysis > 1500 chars is truncated
- Prompt size limit: Prompts > 8000 chars are truncated
- Repetition detection: Detects and truncates repetitive response patterns
Response Size Management¶
The system implements multiple truncation points to prevent memory issues, template rendering problems, and ensure UI responsiveness.
Truncation Limits:
-
Response Text (
MatchController.matchDoctors()):- Limit: 50KB (50,000 characters)
- Location: After receiving
AgentResponsefrom API - Implementation:
- Purpose: Prevents Thymeleaf template parsing issues with very large responses
-
Error Messages (
MatchController.matchDoctors()):- Limit: 500 characters
- Location: In catch blocks when exceptions occur
- Implementation:
- Purpose: Prevents error display issues in UI
-
Tool Results (in
MedicalAgentServiceImpl):- Limit: 3000 characters
- Location: Before passing tool results to MedGemma for interpretation
- Purpose: Prevents prompt size issues when tool results are very large
-
Case Analysis (in
MedicalAgentServiceImpl):- Limit: 1500 characters
- Location: Before including in tool calling prompt
- Purpose: Keeps prompts manageable for FunctionGemma
-
Prompt Size (in
MedicalAgentServiceImpl):- Limit: 8000 characters
- Location: Before sending to LLM
- Purpose: Prevents LLM context window issues
-
Tool Result Logging (
LogStreamService.logToolResult()):- Limit: 200 characters
- Location: When logging tool results to SSE stream
- Implementation:
- Purpose: Keeps log streams manageable, prevents UI log panel overflow
Truncation Strategy:
- Append Indicator: All truncations append a message indicating truncation occurred
- Logging: Warnings logged when truncation happens (for monitoring)
- Graceful Degradation: System continues to function even with truncated data
- User Notification: Truncation messages inform users that content was cut
Purpose:
- Template Rendering: Prevents Thymeleaf template parsing failures
- Memory Management: Avoids excessive memory usage
- UI Responsiveness: Ensures UI remains responsive with large responses
- LLM Context: Keeps prompts within model context windows
- Log Performance: Prevents log streams from becoming too large
Implementation Locations:
MatchController.matchDoctors()(lines 84-89, 108-111)LogStreamService.logToolResult()(lines 171-174)MedicalAgentServiceImpl(various locations for prompt/result truncation)
Fallback Behavior¶
If FunctionGemma tool orchestration fails:
- System uses MedGemma case analysis directly
- Returns response with analysis and error note
- Logs warning for monitoring
- Does not silently fail - always provides some response
Performance Considerations¶
-
Response Time: Typically 5-15 seconds depending on:
- Number of tool calls
- Database query complexity
- LLM processing time
-
Optimization:
- Batch database queries when possible
- Cache case analysis results
- Use connection pooling for database
Testing¶
Manual Testing¶
Via REST API:
# Test match doctors endpoint
curl -X POST "http://localhost:8094/api/v1/agent/match/{caseId}" \
-H "Content-Type: application/json" \
-d '{"sessionId":"test-session"}'
# Example with actual case ID
curl -X POST "http://localhost:8094/api/v1/agent/match/696d4041ee50c1cfdb2e27ae" \
-H "Content-Type: application/json" \
-d '{"sessionId":"test-session-123"}'
Via Web UI:
Method 1: Select Existing Case
- Navigate to
http://localhost:8094/match - Select a medical case from the dropdown
- Click "Match Doctors" button
- View results and execution trace
Method 2: Enter Case Information (Text Input)
- Navigate to
http://localhost:8094/match - Scroll to "Enter Case Information" section
- Fill in the form:
- Chief Complaint (required)
- Symptoms (optional)
- Additional Notes (optional)
- Patient Age (optional)
- Case Type (optional, default: Inpatient)
- Optionally click "Search Existing Cases" to find and reuse an existing case
- Click "Match Doctors" button
- View results and execution trace
Case Search Modal:
- Click "Search Existing Cases" button to open the search modal
- Enter search criteria (text query, specialty, urgency level)
- Click "Search" to find matching cases
- Click on a case result to populate the form fields
- The modal closes automatically after selection
Text Input Endpoint:
# Match doctors from text input
curl -X POST "http://localhost:8094/api/v1/agent/match-from-text" \
-H "Content-Type: application/json" \
-d '{
"caseText": "Patient presents with severe chest pain and shortness of breath",
"patientAge": 65,
"caseType": "INPATIENT",
"symptoms": "Chest pain, Shortness of breath",
"additionalNotes": "History of hypertension",
"sessionId": "test-session-123"
}'
# Response: AgentResponse with matched doctors
Case Search Endpoint:
# Search existing cases
curl "http://localhost:8094/api/cases/search?query=chest&specialty=Cardiology&urgencyLevel=HIGH&maxResults=20"
# Response: JSON array of MedicalCase objects
Log Streaming (SSE):
# Stream execution logs for a session
curl -N "http://localhost:8094/api/v1/logs/stream/{sessionId}"
# Example
curl -N "http://localhost:8094/api/v1/logs/stream/test-session-123"
Integration Testing¶
Test Classes:
MedicalAgentServiceHybridApproachIT: Tests hybrid approach with MedGemma + FunctionGemmaMedicalAgentControllerIT: Tests REST API endpoints includingmatchFromTextMatchControllerIT: Tests case search endpoint (GET /api/cases/search)MedicalCaseRepositoryIT: Tests repository search methodMatchingServiceIT: Tests doctor matching logic and scoringUseCase1IT: End-to-end test for Use Case 1 (Specialist Matching)UseCase2IT: End-to-end test for Use Case 2 (Second Opinion)
Test Coverage:
- ✅ Case analysis with MedGemma
- ✅ Tool orchestration with FunctionGemma
- ✅ Doctor querying and filtering
- ✅ Match scoring with Semantic Graph Retrieval service
- ✅ Error handling and fallbacks
- ✅ End-to-end flow from API to database
- ✅ Log streaming functionality
- ✅ Text input endpoint (
matchFromText) with validation - ✅ Case creation from text input
- ✅ Embedding generation for text-created cases
- ✅ Case search functionality (text search, filters, ordering)
- ✅ Case search REST endpoint
Implementation Details¶
Key Classes¶
Controllers:
MatchController(/match): Web UI controller for match pageMedicalAgentController(/api/v1/agent): REST API controller for agent endpoints
Services:
MedicalAgentServiceImpl: Main service orchestrating the hybrid flowMatchingService: Orchestrates doctor matching logicSemanticGraphRetrievalService: Semantic Graph Retrieval scoring serviceCaseAnalysisService: MedGemma-based case analysisLogStreamService: Session-based log streaming
Tools:
MedicalAgentTools: Spring AI@Toolmethods for LLM invocation- Tools registered with FunctionGemma via Spring AI ChatClient
Repositories:
MedicalCaseRepository: Medical case data accessDoctorRepository: Doctor data access
API Endpoints¶
REST API:
POST /api/v1/agent/match/{caseId}: Match doctors to case- Request body:
{"sessionId": "optional-session-id"} - Response:
AgentResponsewith matched doctors and metadata
- Request body:
POST /api/v1/agent/match-from-text: Match doctors from raw text input- Request body:
{ "caseText": "Patient presents with chest pain...", // Required "patientAge": 65, // Optional "caseType": "INPATIENT", // Optional: INPATIENT, SECOND_OPINION, CONSULT_REQUEST "symptoms": "Chest pain, shortness of breath", // Optional "additionalNotes": "History of hypertension", // Optional "sessionId": "optional-session-id" // Optional } - Response:
AgentResponsewith matched doctors and metadata - Creates a medical case, generates embeddings, and matches doctors in a single call
- Supports handwritten text (after OCR/transcription) and direct text input
- Request body:
GET /api/cases/search: Search existing medical cases- Query parameters:
query(optional): Text to search in chiefComplaint, symptoms, additionalNotesspecialty(optional): Filter by required specialtyurgencyLevel(optional): Filter by urgency level (CRITICAL, HIGH, MEDIUM, LOW)maxResults(optional, default: 20): Maximum number of results
- Response: JSON array of
MedicalCaseobjects
- Query parameters:
Web UI:
GET /match: Display match page with case selection and text input formPOST /match/{caseId}: Submit match request (forwards to REST API)
Log Streaming:
GET /api/v1/logs/stream/{sessionId}: Server-Sent Events stream for execution trace
Data Flow¶
Method 1: Match from Existing Case (POST /api/v1/agent/match/{caseId})
- Case Loading:
MedicalCaseRepository.findById()loads case from PostgreSQL - Case Analysis: MedGemma analyzes case text and extracts requirements
- Tool Invocation: FunctionGemma calls
match_doctors_to_casetool - Doctor Matching:
MatchingServicefinds candidates and scores them -
Scoring:
SemanticGraphRetrievalServicecalculates scores using:- Vector embeddings (PgVector)
- Graph relationships (Apache AGE)
- Historical performance (PostgreSQL)
- Result Interpretation: MedGemma interprets results with medical reasoning
- Response: Final response returned to UI with matched doctors
Method 2: Match from Text Input (POST /api/v1/agent/match-from-text)
- Case Creation: Service creates new
MedicalCaseentity from text input- Generates unique case ID using
IdGenerator.generateId() - Sets
chiefComplaintfrom requiredcaseTextparameter - Sets optional fields (
symptoms,additionalNotes,patientAge,caseType)
- Generates unique case ID using
- Case Persistence:
MedicalCaseRepository.insert()saves case to PostgreSQL - Embedding Generation:
- Builds embedding text from case fields (chiefComplaint + symptoms + notes)
- Calls
EmbeddingService.generateEmbedding()to create vector embedding - Updates case with embedding using
MedicalCaseRepository.updateEmbedding()
- Case Analysis: MedGemma analyzes case text and extracts requirements
- Tool Invocation: FunctionGemma calls
match_doctors_to_casetool - Doctor Matching:
MatchingServicefinds candidates and scores them - Scoring:
SemanticGraphRetrievalServicecalculates scores (same as Method 1) - Result Interpretation: MedGemma interprets results with medical reasoning
- Response: Final response returned to UI with matched doctors
Case Search Flow (GET /api/cases/search)
- Query Processing:
MatchController.searchCases()receives search parameters - Repository Search:
MedicalCaseRepository.search()executes SQL query with:- Case-insensitive text search (
ILIKE) inchiefComplaint,symptoms,additionalNotes - Optional filters for
specialtyandurgencyLevel - Result limiting via
LIMITclause - Ordering by
created_at DESC(most recent first)
- Case-insensitive text search (
- Response: Returns JSON array of matching
MedicalCaseobjects
Text Input and Case Search Features¶
Text Input Endpoint¶
The POST /api/v1/agent/match-from-text endpoint enables direct text input for medical cases, supporting:
- Handwritten Text: After OCR/transcription, text can be submitted directly
- Direct Form Input: Users can enter case information without pre-creating cases
- Automatic Processing: Case creation, embedding generation, and doctor matching in a single call
- Optional Parameters: Supports symptoms, additional notes, patient age, and case type
Case Search Functionality¶
The GET /api/cases/search endpoint provides flexible case search capabilities:
- Text Search: Case-insensitive search in chief complaint, symptoms, and additional notes
- Filtering: Filter by specialty and urgency level
- Result Limiting: Configurable maximum results (default: 20)
- Ordering: Results ordered by creation date (most recent first)
UI Integration¶
The web UI (/match page) includes:
- Text Input Form: Direct case entry with all optional fields
- Case Search Modal: Search and select existing cases to populate the form
- Form Auto-Population: Selected cases automatically fill form fields
- Dual Input Methods: Both dropdown selection and text input are available
Benefits¶
- Workflow Simplification: No need to pre-create cases before matching
- Case Reuse: Search and reuse existing cases via modal interface
- Handwritten Support: Enables OCR/transcription workflows
- Time Savings: Form auto-population reduces manual data entry
Related Documentation¶
- Architecture Overview
- MedGemma Configuration
- AI Provider Configuration
- Use Cases
- Development Guide
- Testing Guide - Text input and case search testing