MedExpertMatch Development Guide¶
Last Updated: 2026-02-04
Status: Implementation Phase
Overview¶
This guide provides setup instructions and development workflow for MedExpertMatch.
Prerequisites¶
- Java 21 (LTS)
- Maven 3.9+
- PostgreSQL 17 with PgVector and Apache AGE extensions
- Docker and Docker Compose (for local development)
- Python 3.8+ (for documentation)
Project Structure¶
MedExpertMatch uses a modular structure organized by domain:
med-expert-match/
├── docs/ # Documentation
├── src/
│ └── main/
│ ├── java/
│ │ └── com/berdachuk/medexpertmatch/
│ │ ├── core/ # Configuration, utilities, monitoring
│ │ ├── doctor/ # Doctor domain
│ │ ├── medicalcase/ # Medical case domain
│ │ ├── medicalcoding/ # ICD-10 codes
│ │ ├── clinicalexperience/ # Clinical experience
│ │ ├── facility/ # Facility domain
│ │ ├── caseanalysis/ # Case analysis service
│ │ ├── retrieval/ # Matching and Semantic Graph Retrieval services
│ │ ├── llm/ # LLM orchestration, agent skills
│ │ ├── graph/ # Graph service (Apache AGE)
│ │ ├── ingestion/ # Data ingestion, FHIR adapters
│ │ └── web/ # Web UI controllers
│ └── resources/
│ ├── db/migration/ # Flyway migrations
│ ├── prompts/ # Prompt templates (.st files)
│ ├── sql/ # SQL query files
│ ├── templates/ # Thymeleaf templates
│ └── static/ # Static resources (CSS, JS)
└── pom.xml
See Architecture for detailed module descriptions.
Current Implementation Status¶
Completed¶
- ✅ Core domain models (Doctor, MedicalCase, ICD10Code, ClinicalExperience, Facility)
- ✅ Database schema with Flyway migrations
- ✅ Repository layer with JDBC implementations
- ✅ Spring AI configuration (SpringAIConfig.java) with custom property mapping
- ✅ MedGemma integration via OpenAI-compatible providers
- ✅ Tool calling support with FunctionGemma (MedicalAgentConfiguration)
- ✅ Case analysis service (CaseAnalysisService) using MedGemma
- ✅ Matching services (MatchingService, SemanticGraphRetrievalService)
- ✅ Embedding service (EmbeddingService) for vector embedding generation
- ✅ Vector similarity calculation using pgvector cosine distance
- ✅ Graph service (GraphService) for Apache AGE queries
- ✅ Graph builder service (MedicalGraphBuilderService) for populating graph with vertices and edges
- ✅ Automatic graph building after synthetic data generation
- ✅ Medical agent service (MedicalAgentService) with Agent Skills integration
- ✅ 7 Agent Skills (case-analyzer, doctor-matcher, evidence-retriever, recommendation-engine, clinical-advisor, network-analyzer, routing-planner)
- ✅ Java @Tool methods (MedicalAgentTools)
- ✅ FHIR adapters for data ingestion
- ✅ Automatic embedding generation in test data flow
- ✅ Web UI controllers with Thymeleaf templates
- ✅ REST API endpoints for agent operations
- ✅ Text input endpoint (POST /api/v1/agent/match-from-text) for direct text input
- ✅ Case search endpoint (GET /api/cases/search) for searching existing cases
- ✅ UI text input form and case search modal
In Progress¶
- 🔄 Integration testing
- 🔄 Performance optimization
- 🔄 UI implementation completion
Configuration¶
The application uses custom Spring AI configuration that reads from spring.ai.custom.* properties:
- Environment variables → application.yml (property mapping) → SpringAIConfig.java → Spring AI Beans
- Separate configuration for chat, embedding, reranking, and tool calling
- See AI Provider Configuration for details
Building Documentation¶
# Install dependencies
pip install -r requirements-docs.txt
# Serve documentation locally
mkdocs serve
# Build documentation
mkdocs build
Local Development Setup¶
Prerequisites¶
- Java 21 (LTS)
- Maven 3.9+
- PostgreSQL 17 with PgVector and Apache AGE 1.6.0
- Docker and Docker Compose (for local database)
- Ollama (for local MedGemma models) - Optional
- FunctionGemma (for tool calling) - Required if using agent skills
Database Setup¶
# Start PostgreSQL with docker-compose
docker compose -f docker-compose.dev.yml up -d
# Database will be available at localhost:5433
# Database: medexpertmatch
# User: medexpertmatch
# Password: medexpertmatch
MedGemma Setup (Local Development)¶
See MedGemma Setup Guide for detailed instructions.
Quick Start:
# Pull MedGemma model (if using Ollama)
ollama pull hf.co/unsloth/medgemma-27b-text-it-GGUF:IQ3_XXS
# Pull FunctionGemma (required for tool calling)
ollama pull functiongemma
# Pull embedding model
ollama pull nomic-embed-text
Running the Application¶
# With local profile (uses application-local.yml)
mvn spring-boot:run -Dspring-boot.run.arguments=--spring.profiles.active=local
# Or set environment variable
export SPRING_PROFILES_ACTIVE=local
mvn spring-boot:run
The application will start on port 8094 (local profile) or 8080 (default).
Testing¶
# Run all tests
mvn test
# Build test container first (required for integration tests)
./scripts/build-test-container.sh
# Run integration tests
mvn verify
# Run embedding-specific tests
mvn test -Dtest=EmbeddingServiceIT,MedicalCaseRepositoryEmbeddingIT,TestDataGeneratorEmbeddingIT
Description and Embedding Generation¶
Medical case descriptions and vector embeddings are automatically generated during test data creation:
# Generate test data (includes automatic description generation, embedding generation, and graph building)
curl -X POST "http://localhost:8094/api/v1/test-data/generate?size=small&clear=true"
# Descriptions are generated automatically after medical cases are created (55% progress)
# Embeddings are generated automatically after descriptions are created (70-90% progress)
# Graph is built automatically after clinical experiences are created (95% progress)
# Progress is logged throughout the generation process
The description generation process:
- Finds all medical cases without descriptions
- Generates comprehensive descriptions using MedicalCaseDescriptionService (LLM-enhanced)
- Falls back to simple text concatenation if the LLM fails
- Stores descriptions in the abstract field of medical cases
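The fallback step can be illustrated with a minimal sketch. It is not the MedicalCaseDescriptionService implementation: CaseSummary, describe, and the field names are hypothetical stand-ins, and treating a blank LLM answer as a failure is an assumption of this example.

```java
import java.util.function.Function;

class DescriptionFallbackSketch {

    /** Hypothetical, trimmed-down stand-in for the real MedicalCase model. */
    record CaseSummary(String chiefComplaint, String symptoms, String urgencyLevel) {}

    /**
     * Tries the LLM-backed generator first; on an exception or a blank result
     * (the blank check is an assumption), falls back to concatenating fields.
     */
    static String describe(CaseSummary c, Function<CaseSummary, String> llmGenerator) {
        try {
            String generated = llmGenerator.apply(c);
            if (generated != null && !generated.isBlank()) {
                return generated;
            }
        } catch (RuntimeException e) {
            // Swallowed on purpose in this sketch; real code would log the failure.
        }
        return String.join(". ",
                "Chief complaint: " + c.chiefComplaint(),
                "Symptoms: " + c.symptoms(),
                "Urgency: " + c.urgencyLevel());
    }

    public static void main(String[] args) {
        CaseSummary c = new CaseSummary("chest pain", "dyspnea", "HIGH");
        // Simulate an LLM outage to exercise the fallback path.
        System.out.println(describe(c, x -> { throw new IllegalStateException("LLM unavailable"); }));
    }
}
```

Passing the generator as a Function keeps the fallback logic testable without a running model.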
The embedding generation process:
- Finds all medical cases without embeddings
- Uses stored descriptions (from description generation step) for embedding creation
- Generates 1536-dimensional embeddings using Spring AI EmbeddingModel
- Normalizes and stores embeddings in PostgreSQL using pgvector format
- Updates embedding dimension metadata
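As a rough illustration of the normalize-and-store step, the sketch below scales a vector to unit length and renders it in pgvector's text input format. The guide does not say which normalization EmbeddingService applies, so L2 normalization is an assumption here, and the class and method names are hypothetical.

```java
import java.util.StringJoiner;

class EmbeddingFormatSketch {

    /** Scales the vector to unit L2 length; a zero vector is returned unchanged. */
    static float[] l2Normalize(float[] v) {
        double sumSquares = 0.0;
        for (float x : v) {
            sumSquares += (double) x * x;
        }
        double norm = Math.sqrt(sumSquares);
        if (norm == 0.0) {
            return v.clone();
        }
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    /** Renders the vector in pgvector's text input format: comma-separated values in brackets. */
    static String toPgVectorLiteral(float[] v) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float x : v) {
            joiner.add(Float.toString(x));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // A 2-dimensional toy vector; real embeddings in this project have 1536 dimensions.
        float[] normalized = l2Normalize(new float[] {3f, 4f});
        System.out.println(toPgVectorLiteral(normalized));
    }
}
```

With unit-length vectors, cosine distance depends only on the dot product, which is one common reason to normalize before storing.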
Graph Building¶
Apache AGE graph is automatically built after synthetic data generation:
# Generate test data (includes automatic graph building)
curl -X POST "http://localhost:8094/api/v1/test-data/generate?size=small&clear=true"
# Graph building happens automatically at 95% progress
# Graph is populated with vertices and relationships from database data
The graph building process:
- Creates graph structure if it doesn't exist (medexpertmatch_graph)
- Creates all vertices (doctors, medical cases, ICD-10 codes, specialties, facilities)
- Creates graph indexes for performance (GIN indexes on properties JSONB columns)
- Creates all relationships in batches (1000 per batch):
  - TREATED relationships from ClinicalExperience
  - SPECIALIZES_IN relationships from Doctor.specialties
  - HAS_CONDITION relationships from MedicalCase.icd10Codes
  - TREATS_CONDITION relationships from ClinicalExperience + MedicalCase
  - REQUIRES_SPECIALTY relationships from MedicalCase.requiredSpecialty
  - AFFILIATED_WITH relationships from Doctor.facilityIds
- Graph building errors are logged but don't fail data generation (optional step)
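The batching mentioned above (1000 relationships per batch) can be sketched as a plain list partition. This is an illustration only; partition and BatchingSketch are hypothetical names, not part of MedicalGraphBuilderService.

```java
import java.util.ArrayList;
import java.util.List;

class BatchingSketch {

    /** Splits items into consecutive sublists of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> edgeIds = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            edgeIds.add(i);
        }
        // 2500 items with a batch size of 1000 yield batches of 1000, 1000 and 500.
        for (List<Integer> batch : partition(edgeIds, 1000)) {
            System.out.println("Would create " + batch.size() + " relationships in one batch");
        }
    }
}
```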
Manual Graph Building:
# Build graph manually (if needed)
# Graph can be rebuilt safely - uses MERGE operations (idempotent)
# Called automatically by SyntheticDataGenerator.buildGraph()
Apache AGE Cypher Query Patterns¶
When working with Apache AGE graph queries, follow these patterns to ensure compatibility:
Using GraphService¶
All graph operations must go through the GraphService interface:
@Autowired
private GraphService graphService;
// Execute a Cypher query
Map<String, Object> params = new HashMap<>();
params.put("doctorId", "123");
params.put("name", "Dr. Smith");
params.put("email", "dr.smith@example.com");
List<Map<String, Object>> results = graphService.executeCypher(
    "MERGE (d:Doctor {id: $doctorId, name: $name, email: $email})",
    params
);
MERGE Clause - Critical Pattern¶
CRITICAL: When using MERGE with embedded parameters, include ALL properties in the MERGE clause itself.
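✅ Valid Pattern:
// All properties are set inside the MERGE clause itself
String cypher = "MERGE (d:Doctor {id: $doctorId, name: $name, email: $email})";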
❌ Invalid Pattern (will fail with BadSqlGrammarException):
// This pattern fails when parameters are embedded as strings
String cypher = "MERGE (d:Doctor {id: $doctorId}) SET d.name = $name, d.email = $email";
Reason: Apache AGE 1.6.0's parser does not properly handle the MERGE ... SET pattern when parameters are embedded as
strings. While MERGE ... SET works with literal values in test files, it fails when using the parameter embedding
mechanism.
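One way to keep every property inside the MERGE clause is to generate the clause from the parameter map. The helper below is a hypothetical sketch, not part of GraphService; for brevity it assumes each property name doubles as its parameter name (id: $id), whereas the examples in this guide use names such as $doctorId.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class MergeClauseSketch {

    /**
     * Builds "MERGE (alias:Label {prop: $prop, ...})" so every property sits
     * inside the MERGE clause and no trailing SET is needed.
     */
    static String mergeVertex(String alias, String label, Map<String, Object> params) {
        String properties = params.keySet().stream()
                .map(name -> name + ": $" + name)
                .collect(Collectors.joining(", "));
        return "MERGE (" + alias + ":" + label + " {" + properties + "})";
    }

    public static void main(String[] args) {
        Map<String, Object> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("id", "123");
        params.put("name", "Dr. Smith");
        params.put("email", "dr.smith@example.com");
        System.out.println(mergeVertex("d", "Doctor", params));
    }
}
```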
Vertex Creation Examples¶
Single Property (plus id):
String cypher = "MERGE (s:MedicalSpecialty {id: $specialtyId, name: $name})";
Map<String, Object> params = new HashMap<>();
params.put("specialtyId", specialtyId);
params.put("name", name);
graphService.executeCypher(cypher, params);
Multiple Properties:
String cypher = "MERGE (d:Doctor {id: $doctorId, name: $name, email: $email})";
Map<String, Object> params = new HashMap<>();
// Nullable values are replaced with empty strings
params.put("doctorId", doctorId);
params.put("name", name != null ? name : "");
params.put("email", email != null ? email : "");
graphService.executeCypher(cypher, params);
Complex Vertex:
String cypher = """
MERGE (c:MedicalCase {
id: $caseId,
chiefComplaint: $chiefComplaint,
urgencyLevel: $urgencyLevel
})
""";
Relationship Creation Examples¶
Simple Relationship:
String cypher = """
MATCH (d:Doctor {id: $doctorId})
MATCH (c:MedicalCase {id: $caseId})
MERGE (d)-[:TREATED]->(c)
""";
Map<String, Object> params = new HashMap<>();
params.put("doctorId", doctorId);
params.put("caseId", caseId);
graphService.executeCypher(cypher, params);
Relationship with Properties:
String cypher = """
MATCH (d:Doctor {id: $doctorId})
MATCH (c:MedicalCase {id: $caseId})
MERGE (d)-[r:TREATED {created: $created, outcome: $outcome}]->(c)
""";
Parameter Handling¶
- Null Values: Always provide default values for nullable parameters
- String Escaping: Parameters are automatically escaped by GraphService
  - Single quotes are escaped: ' → \'
  - Backslashes are escaped: \ → \\
  - Newlines/tabs are escaped: \n, \t
- Parameter Map: Always use Map<String, Object> for parameters
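The escaping rules listed above can be illustrated with a small stand-alone function. This is a sketch of the rules as described, not the code GraphService actually runs; the class name, the method name, and the choice to map null to an empty string are assumptions.

```java
class CypherEscapeSketch {

    /** Escapes a value for embedding in a single-quoted Cypher string literal. */
    static String escape(String value) {
        if (value == null) {
            return ""; // assumed default, mirroring the "Null Values" rule
        }
        StringBuilder sb = new StringBuilder(value.length() + 8);
        // Walking the input once avoids re-escaping backslashes added by earlier rules.
        for (char ch : value.toCharArray()) {
            switch (ch) {
                case '\\' -> sb.append("\\\\");
                case '\'' -> sb.append("\\'");
                case '\n' -> sb.append("\\n");
                case '\t' -> sb.append("\\t");
                default -> sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("O'Brien\tcardiology"));
    }
}
```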
Query Format Guidelines¶
Single-Line Queries (preferred for simple operations):
String cypher = "MERGE (s:MedicalSpecialty {id: $specialtyId, name: $name})";
Multi-Line Queries (for complex queries with MATCH clauses):
String cypher = """
MATCH (d:Doctor {id: $doctorId})
MATCH (c:MedicalCase {id: $caseId})
MERGE (d)-[:TREATED]->(c)
""";
Error Handling¶
The GraphService handles errors gracefully:
- Graph Not Exists: Automatically creates graph if it doesn't exist (handles 3F000 SQL state)
- Transaction Aborted: Returns empty results when transaction is aborted (25P02 SQL state)
- Apache AGE Compatibility: Catches BadSqlGrammarException and returns empty results
- Logging: All failures are logged with WARN level, including the query string
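The SQL-state handling above could be sketched roughly as follows. The Action enum and classify method are hypothetical and only cover the two SQL states named in the list; Spring's BadSqlGrammarException is left out so the example needs nothing beyond the JDK.

```java
import java.sql.SQLException;

class GraphErrorSketch {

    /** Hypothetical recovery steps; UNCLASSIFIED means this sketch does not cover the state. */
    enum Action { CREATE_GRAPH_AND_RETRY, RETURN_EMPTY, UNCLASSIFIED }

    /** Maps a PostgreSQL SQLSTATE to the recovery step described in the list above. */
    static Action classify(SQLException e) {
        String state = e.getSQLState();
        if ("3F000".equals(state)) {   // invalid_schema_name: the graph does not exist yet
            return Action.CREATE_GRAPH_AND_RETRY;
        }
        if ("25P02".equals(state)) {   // in_failed_sql_transaction: transaction was aborted
            return Action.RETURN_EMPTY;
        }
        return Action.UNCLASSIFIED;
    }

    public static void main(String[] args) {
        SQLException missingGraph = new SQLException("schema does not exist", "3F000");
        System.out.println(classify(missingGraph));
    }
}
```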
Best Practices¶
- Always Use GraphService: Never execute Cypher queries directly via JDBC
- Idempotent Operations: Use MERGE for all vertex/edge creation
- Null Handling: Always provide default values for nullable parameters
- Error Recovery: Graph operations gracefully degrade - failures return empty results
- Testing: Test graph operations with real Apache AGE in Testcontainers
Implementation Details¶
- Service Location: com.berdachuk.medexpertmatch.graph.service.impl.GraphServiceImpl
- Graph Name: Uses constant GRAPH_NAME = "medexpertmatch" (configurable via medexpertmatch.graph.name property)
- Connection Handling: Automatically executes LOAD 'age' on each connection
- Dollar-Quoted Strings: Uses PostgreSQL dollar-quoted strings ($$) to safely embed Cypher queries
- Function Call: Executes ag_catalog.cypher(graph_name, query_string) function
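To make the dollar-quoting and function call concrete, here is a minimal sketch of how a Cypher string might be wrapped into SQL. It is not taken from GraphServiceImpl: toAgeSql is a hypothetical helper, and the single agtype result column named result is an assumption that only fits queries returning one column.

```java
class AgeSqlSketch {

    /** Wraps a Cypher query in the SQL call that Apache AGE expects. */
    static String toAgeSql(String graphName, String cypher) {
        if (!graphName.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unexpected graph name: " + graphName);
        }
        if (cypher.contains("$$")) {
            // A "$$" inside the query would terminate the dollar-quoted string early.
            throw new IllegalArgumentException("Cypher query must not contain $$");
        }
        return "SELECT * FROM ag_catalog.cypher('" + graphName + "', $$ "
                + cypher.strip() + " $$) AS (result ag_catalog.agtype)";
    }

    public static void main(String[] args) {
        System.out.println(toAgeSql("medexpertmatch", "MATCH (d:Doctor) RETURN d"));
    }
}
```

Rejecting queries that already contain $$ is a conservative choice for the sketch; a tagged dollar quote would be another option.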
For more details, see the .cursorrules file section "Apache AGE Graph Database Usage".
Error Handling & Null Safety¶
Fail-Fast Null Checks¶
Always validate parameters at method start to fail fast with clear error messages:
@Override
public String generateDescription(MedicalCase medicalCase) {
    if (medicalCase == null) {
        throw new IllegalArgumentException("MedicalCase cannot be null");
    }
    // ... rest of method
}
Benefits: Prevents NPE in catch blocks, makes debugging easier, provides clear error messages.
Safe Error Logging¶
Use safely computed variables in catch blocks; never call methods on potentially null objects:
// medicalCase has already passed the fail-fast null check
String caseId = medicalCase.id() != null ? medicalCase.id() : "unknown";
long startTime = System.currentTimeMillis();
try {
    // ... operation
} catch (Exception e) {
    long duration = System.currentTimeMillis() - startTime;
    // Use caseId, not medicalCase.id(), so logging cannot raise a second exception
    log.error("Error processing case: {} | Duration: {} ms", caseId, duration, e);
}
Duration Calculation¶
Always calculate elapsed time correctly by declaring startTime outside the try block:
long startTime = System.currentTimeMillis(); // Declare outside try block
try {
    // ... operation
    long endTime = System.currentTimeMillis();
    long duration = endTime - startTime;
} catch (Exception e) {
    long duration = System.currentTimeMillis() - startTime; // Correct elapsed time
    log.error("Error after {} ms", duration, e);
}
Anti-pattern (avoid):
try {
    // ...
} catch (Exception e) {
    long duration = System.currentTimeMillis(); // Wrong - this is current time, not elapsed!
    log.error("Error after {} ms", duration, e);
}
LLM Call Rate Limiting¶
Rate limiting should be handled by callers, not service implementations. This prevents double wrapping:
// Service implementation - NO rate limiting wrapper
public String generateDescription(MedicalCase medicalCase) {
    // Direct LLM call - no llmCallLimiter wrapper
    return chatClient.prompt(prompt).call().content();
}
// Caller - handles rate limiting
public void processBatch(List<MedicalCase> cases) {
    for (MedicalCase medicalCase : cases) {
        String description = llmCallLimiter.execute(LlmClientType.CHAT,
                () -> descriptionService.generateDescription(medicalCase));
    }
}
Rationale: Rate limiting is an infrastructure concern, not business logic. The service should focus on description generation while callers manage concurrency.
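For readers who want a picture of what a caller-side limiter might look like, here is a minimal semaphore-based sketch. It is not the project's llmCallLimiter: the class name is hypothetical, it caps concurrent calls rather than calls per time window, and it omits the LlmClientType argument used in the example above.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class LlmCallLimiterSketch {

    private final Semaphore permits;

    LlmCallLimiterSketch(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the call once a permit is free and always releases the permit afterwards. */
    <T> T execute(Supplier<T> call) {
        permits.acquireUninterruptibly();
        try {
            return call.get();
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        LlmCallLimiterSketch limiter = new LlmCallLimiterSketch(2);
        // The supplier stands in for descriptionService.generateDescription(medicalCase).
        String description = limiter.execute(() -> "generated description");
        System.out.println(description);
    }
}
```

Because the service method stays unwrapped, a caller that already holds a permit never tries to acquire a second one for the same call.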
For more details, see Coding Rules - Error Handling.
Configuration¶
Environment Variables¶
Key environment variables for AI configuration:
- CHAT_PROVIDER, CHAT_BASE_URL, CHAT_API_KEY, CHAT_MODEL, CHAT_TEMPERATURE, CHAT_MAX_TOKENS
- EMBEDDING_PROVIDER, EMBEDDING_BASE_URL, EMBEDDING_API_KEY, EMBEDDING_MODEL, EMBEDDING_DIMENSIONS
- RERANKING_PROVIDER, RERANKING_BASE_URL, RERANKING_API_KEY, RERANKING_MODEL, RERANKING_TEMPERATURE
- TOOL_CALLING_PROVIDER, TOOL_CALLING_BASE_URL, TOOL_CALLING_API_KEY, TOOL_CALLING_MODEL, TOOL_CALLING_TEMPERATURE, TOOL_CALLING_MAX_TOKENS
These are mapped to spring.ai.custom.* properties in application.yml.
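The shared naming scheme (a client prefix followed by _PROVIDER, _BASE_URL, _API_KEY, _MODEL) can be illustrated with a plain-Java lookup. This is only a sketch of the naming convention; the real mapping happens in application.yml and SpringAIConfig.java, and ClientSettings, fromEnv, and the empty-string defaults are hypothetical.

```java
import java.util.Map;

class AiEnvSketch {

    /** Hypothetical holder for the four values every client type shares. */
    record ClientSettings(String provider, String baseUrl, String apiKey, String model) {}

    /** Reads PREFIX_PROVIDER, PREFIX_BASE_URL, PREFIX_API_KEY and PREFIX_MODEL from env. */
    static ClientSettings fromEnv(String prefix, Map<String, String> env) {
        return new ClientSettings(
                env.getOrDefault(prefix + "_PROVIDER", ""),
                env.getOrDefault(prefix + "_BASE_URL", ""),
                env.getOrDefault(prefix + "_API_KEY", ""),
                env.getOrDefault(prefix + "_MODEL", ""));
    }

    public static void main(String[] args) {
        ClientSettings chat = fromEnv("CHAT", System.getenv());
        // The API key is deliberately not printed.
        System.out.println("chat provider=" + chat.provider() + ", model=" + chat.model());
    }
}
```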
Application Profiles¶
- local - Local development with Ollama/MedGemma
- dev - Development with remote AI providers
- test - Testing environment (uses mock AI providers)
- prod - Production environment
Related Documentation¶
- Architecture - System architecture
- AI Provider Configuration - AI provider setup
- MedGemma Configuration - MedGemma model configuration
- MedGemma Setup - Local MedGemma setup guide
- Testing - Testing guidelines
Last updated: 2026-01-23