Enhanced Patient Deletion Management: A Comprehensive Guide
Hey guys! Today, we're diving deep into the enhancements made to patient deletion management, mirroring the improvements we implemented in PR #4. This is a critical step in ensuring our system is not only efficient but also compliant with regulations like GDPR. So, let's get started!
Understanding the Context
Previously, PR #4 introduced advanced deletion management for professionals, which included features like soft deletes with a grace period, automated deferred anonymization, detection of returning users after anonymization, deletion blocking if under investigation, and admin endpoints for fine-grained control. This issue aims to bring those exact same high-quality improvements to patient data. It's all about ensuring consistency and compliance with GDPR and other regulatory standards. Think of it as giving our patient data the same VIP treatment we give our professional data. Ensuring we handle sensitive information with the utmost care is paramount, right?
The Importance of Consistent Data Handling
When it comes to data management, consistency is key. Applying the same rigorous standards to patient data as we do to professional data minimizes the risk of errors and oversights. It also simplifies our processes, making it easier for our team to manage data deletion across the board. Plus, a unified approach strengthens our commitment to regulatory compliance, keeping us on the right side of the law. And let's be honest, a little peace of mind goes a long way. We want to create a system where all data, regardless of its category, is handled with the same level of care and precision. The soft delete feature, for instance, provides a safety net, allowing us to recover data within a specified period if needed. This is especially crucial in healthcare, where information can sometimes be needed unexpectedly. The automated deferred anonymization ensures that personal data is rendered unidentifiable over time, a critical requirement under GDPR. By automatically anonymizing data after the grace period, we reduce the risk of unauthorized access to sensitive information.
Addressing the Gaps in Patient Data Management
Currently, there are disparities between how we handle professional and patient data deletions. While our Professional model has comprehensive features, the Patient model lags behind. This means we need to bridge that gap to ensure our data handling is both consistent and compliant. This project is about making sure that we are not leaving any stone unturned in our efforts to protect patient privacy. The under_investigation flag, for example, will prevent data from being deleted if there is an active investigation related to the patient. This feature alone can save us from potential legal and ethical quagmires.
Analyzing the Current Differences
To better understand what needs to be done, let's break down the differences between the Professional and Patient models.
Professional Model (✅ Complete)
The Professional model boasts a robust set of fields to manage deletions effectively:
# Fields present in Professional
under_investigation: bool  # Block deletion if under investigation
investigation_notes: str | None  # Notes related to the investigation
correlation_hash: str | None  # SHA-256 hash for detecting returning users
soft_deleted_at: datetime | None  # Start of the grace period
anonymized_at: datetime | None  # Timestamp when anonymization completed
deleted_at: datetime | None  # Deprecated field
deleted_by: str | None  # User who initiated deletion
deletion_reason: Literal[...]  # Reason for deletion
Patient Model (❌ Incomplete)
The Patient model, however, is missing several key fields:
# Missing fields in Patient
# ❌ under_investigation
# ❌ investigation_notes
# ❌ correlation_hash
# ❌ soft_deleted_at
# ❌ anonymized_at
# ✅ deleted_at (existing, but to be deprecated)
# ✅ deleted_by (existing)
# ✅ deletion_reason (existing)
The goal here is clear: we need to bring the Patient model up to par with the Professional model. This involves adding the missing fields and ensuring they function as intended. These seemingly small changes can have a significant impact on our ability to manage patient data effectively. For instance, the correlation_hash allows us to detect if a patient who has been anonymized tries to re-register. This can be useful for identifying potential fraud or misuse of the system. The soft_deleted_at and anonymized_at fields are crucial for implementing the soft delete and deferred anonymization features, respectively.
Implementation Plan (TDD - 3 Phases)
We'll be following a Test-Driven Development (TDD) approach in three phases to ensure everything is implemented correctly. TDD helps us catch issues early and ensures our code does exactly what we intend it to do. It's like building a house with a solid foundation – we start with tests that define what we want our code to do, then write the code to pass those tests.
Phase 1: Model and Schemas (Commits 1-3)
This phase focuses on updating the Patient model and creating the necessary schemas. This is the groundwork, laying the foundation for the rest of the implementation. The first few commits will be crucial in setting the stage for the subsequent phases.
Commit 1: Adding Fields to the Patient Model
- [ ] Add under_investigation: bool (default False, indexed)
- [ ] Add investigation_notes: str | None (varchar 1000)
- [ ] Add soft_deleted_at: datetime | None (indexed, timezone-aware)
- [ ] Add anonymized_at: datetime | None (indexed, timezone-aware)
- [ ] Add correlation_hash: str | None (varchar 64, indexed)
- [ ] Create Alembic migration
- [ ] Apply migration in development

Modified Files:
- app/models/patient.py
- alembic/versions/XXXX_add_deletion_fields_to_patients.py
Commit 2: Pydantic Schemas for Deletion Management
- [ ] Create PatientDeletionContext (deletion reason with metadata)
- [ ] Create PatientRestoreRequest (restoration after soft delete)
- [ ] Create PatientAnonymizationStatus (anonymization status)
- [ ] Create PatientDeletionBlockedResponse (error if under investigation)
- [ ] Adapt deletion_reason to include patient-specific cases

Modified Files:
- app/schemas/patient.py
deletion_reason values for patients:
Literal[
    "user_request",           # Patient request
    "gdpr_compliance",        # GDPR compliance
    "admin_action",           # Admin action
    "prolonged_inactivity",   # Prolonged inactivity
    "duplicate_account",      # Duplicate account
    "deceased",               # Patient deceased
    "other",
]
Commit 3: Correlation Functions for Return Detection
- [ ] Implement _generate_patient_correlation_hash(email: str, patient_id: int) -> str
  - Use SHA-256(email + patient_id)
  - Generate only before anonymization
- [ ] Implement _check_returning_patient(email: str) -> dict | None
  - Search for correlation_hash in anonymized patients
  - Return metadata if match found
- [ ] Create unit tests (6 tests minimum)

Modified Files:
- app/services/patient_service.py
- tests/unit/test_patient_correlation.py
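To make the hashing step concrete, here is a minimal sketch of the generation side using only the standard library. The email normalization and the ":" separator are assumptions added for illustration; the plan only specifies SHA-256 over the email and patient_id.

```python
import hashlib


def generate_patient_correlation_hash(email: str, patient_id: int) -> str:
    """Return a 64-character SHA-256 hex digest derived from email and patient id."""
    # Assumption: normalize the email so "A@x.org" and "a@x.org " hash identically.
    normalized_email = email.strip().lower()
    # Assumption: a ":" separator keeps the email/id boundary unambiguous.
    payload = f"{normalized_email}:{patient_id}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

The hex digest is always 64 characters, which lines up with the varchar 64 column. Note that because the digest also depends on patient_id, a lookup that only receives an email cannot recompute it on its own; how _check_returning_patient resolves that is not spelled out in the plan.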
Phase 2: Soft Delete and Anonymization (Commits 4-7)
This phase is where we bring the core deletion functionalities to life. We'll implement soft delete, deferred anonymization, and detection of returning patients. It's the heart of our project, where we transform the theoretical into the practical.
Commits 4-5: Soft Delete with Grace Period (TDD)
RED Phase:
- [ ] Create 3 tests:
  - test_soft_delete_patient_sets_soft_deleted_at
  - test_soft_delete_blocked_if_under_investigation
  - test_soft_delete_generates_correlation_hash

GREEN Phase:
- [ ] Modify _soft_delete() in patient_service.py
  - Check under_investigation (raise RFC 9457 exception if True)
  - Set soft_deleted_at = now()
  - Generate correlation_hash before anonymization
  - Publish identity.patient.soft_deleted event
- [ ] Adapt DELETE endpoint /api/v1/patients/{id}

REFACTOR Phase:
- [ ] Linting with ruff
- [ ] Ensure 0 regression on existing tests

Modified Files:
- app/services/patient_service.py
- app/api/v1/endpoints/patients.py
- tests/unit/test_patient_soft_delete.py
Commit 6: Deferred Anonymization (Scheduler)
RED Phase:
- [ ] Create 2 tests:
  - test_anonymize_expired_patients (soft-deleted more than 7 days ago)
  - test_skip_patients_in_grace_period (soft-deleted less than 7 days ago)

GREEN Phase:
- [ ] Create app/services/patient_anonymization_scheduler.py
- [ ] Implement anonymize_expired_patient_deletions(test_mode: bool = False)
  - Query: soft_deleted_at < now() - 7 days AND anonymized_at IS NULL
  - Anonymization: irreversible bcrypt hashing
  - Set anonymized_at = now()
  - Publish identity.patient.anonymized event
- [ ] Implement _anonymize_patient_entity(patient: Patient)
  - Bcrypt hash: first_name, last_name, email
  - Replace: phone → "+ANONYMIZED"
  - Keep: correlation_hash, soft_deleted_at, anonymized_at
- [ ] APScheduler integration (commented out, ready for production)

REFACTOR Phase:
- [ ] Linting with ruff
- [ ] Ensure 0 regression

Modified Files:
- app/services/patient_anonymization_scheduler.py
- app/main.py (commented-out scheduler integration)
- tests/unit/test_patient_anonymization_scheduler.py
Commit 7: Detecting Returning Patients
RED Phase:
- [ ] Create 3 tests:
  - test_detect_returning_patient_after_anonymization
  - test_no_detection_for_new_patient
  - test_returning_patient_event_published

GREEN Phase:
- [ ] Modify sync_patient_registration() in patient_service.py
  - Call _check_returning_patient(email)
  - If match: publish identity.patient.returning_user event
  - Create new independent account (no auto-restoration)

REFACTOR Phase:
- [ ] Linting with ruff

Modified Files:
- app/services/patient_service.py
- tests/unit/test_patient_returning_detection.py
Phase 3: Admin Endpoints and Documentation (Commits 8-10)
The final phase focuses on providing administrative control and documenting our work. This includes creating admin endpoints for managing patient deletions and ensuring our documentation is clear and comprehensive. It's about making our system not just functional but also manageable and understandable.
Commit 8: Admin Endpoints
- [ ] POST /api/v1/admin/patients/{id}/investigation
  - Body: {"investigation_notes": "Investigation in progress..."}
  - Set under_investigation = True
  - Publish identity.patient.investigation_started event
  - Return: 200 OK with updated status
- [ ] DELETE /api/v1/admin/patients/{id}/investigation
  - Set under_investigation = False
  - Clear investigation_notes
  - Publish identity.patient.investigation_ended event
  - Return: 200 OK
- [ ] POST /api/v1/admin/patients/{id}/restore
  - Check: soft_deleted_at IS NOT NULL AND anonymized_at IS NULL
  - Set soft_deleted_at = None, correlation_hash = None
  - Publish identity.patient.restored event
  - Return: 200 OK with restored patient
  - Error 400 if already anonymized (irreversible)
- [ ] GET /api/v1/admin/patients/deleted
  - Filter: soft_deleted_at IS NOT NULL
  - Pagination: query params page, limit
  - Sort: by soft_deleted_at DESC
  - Include: time remaining before anonymization (7 days - elapsed)
  - Return: paginated list with metadata

Modified Files:
- app/api/v1/endpoints/admin/patients.py (new file)
- app/api/v1/admin/__init__.py
- tests/unit/test_admin_patient_endpoints.py
Commit 9: E2E Integration Tests
- [ ] Create tests/integration/test_patient_deletion_workflow.py
- [ ] Tests with real PostgreSQL and Redis
- [ ] Complete workflow:
  - Patient registration
  - Soft delete (grace period)
  - Wait for expiration (simulation via fixture)
  - Automatic anonymization
  - New registration with same email → return detection
- [ ] Validate published events (Redis Pub/Sub)

Modified Files:
- tests/integration/test_patient_deletion_workflow.py
- docker-compose.test.yaml (if adjustments needed)
Commit 10: Final Documentation
- [ ] Update CLAUDE.md with patient examples
- [ ] Document admin endpoints in OpenAPI
- [ ] Create sequence diagrams:
  - Soft delete → anonymization workflow
  - Patient return detection
- [ ] Scheduler integration guide (production)
- [ ] Usage examples in docs/patient_deletion.md

Modified Files:
- CLAUDE.md
- docs/patient_deletion.md (new)
- app/api/v1/endpoints/admin/patients.py (OpenAPI docstrings)
Published Events (Redis Pub/Sub)
Here’s a rundown of the events we’ll be publishing, which help keep other parts of the system informed about changes.
- identity.patient.soft_deleted (Commits 4-5)
  {
    "patient_id": 123,
    "keycloak_user_id": "uuid",
    "soft_deleted_at": "2025-11-02T10:00:00Z",
    "deletion_reason": "user_request",
    "grace_period_expires_at": "2025-11-09T10:00:00Z"
  }
- identity.patient.anonymized (Commit 6)
  {
    "patient_id": 123,
    "anonymized_at": "2025-11-09T10:00:00Z",
    "correlation_hash": "sha256_hash"
  }
- identity.patient.returning_user (Commit 7)
  {
    "new_patient_id": 456,
    "previous_patient_id": 123,
    "email": "patient@example.com",
    "detected_at": "2025-12-01T14:00:00Z",
    "previous_anonymized_at": "2025-11-09T10:00:00Z"
  }
- identity.patient.investigation_started (Commit 8)
- identity.patient.investigation_ended (Commit 8)
- identity.patient.restored (Commit 8)
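To show how the soft_deleted payload fits together, here is a small sketch that derives grace_period_expires_at from soft_deleted_at and serializes the result as JSON. The function name is hypothetical, and the actual publish call to Redis is intentionally omitted.

```python
import json
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=7)


def _iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC string ending in Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_soft_deleted_event(
    patient_id: int,
    keycloak_user_id: str,
    soft_deleted_at: datetime,
    deletion_reason: str,
) -> str:
    """Serialize an identity.patient.soft_deleted payload as a JSON string."""
    payload = {
        "patient_id": patient_id,
        "keycloak_user_id": keycloak_user_id,
        "soft_deleted_at": _iso_utc(soft_deleted_at),
        "deletion_reason": deletion_reason,
        # The grace period ends exactly 7 days after the soft delete.
        "grace_period_expires_at": _iso_utc(soft_deleted_at + GRACE_PERIOD),
    }
    return json.dumps(payload)
```

Computing the expiry from the same timestamp that is stored on the patient keeps the event consistent with what the scheduler will later evaluate.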
Success Criteria
To ensure we've nailed this, here are our success criteria:
- [ ] Full parity with PR #4: Same features for patients as professionals
- [ ] Exhaustive testing: 100% coverage of new features
- [ ] Strict TDD: RED → GREEN → REFACTOR cycle respected
- [ ] 0 regression: All existing tests pass
- [ ] Linting: 0 ruff errors
- [ ] Complete documentation: CLAUDE.md, OpenAPI, guides
- [ ] Reversible migration: migrate-up and migrate-down functional
- [ ] Backward compatibility: NO breaking changes
References
Here are some useful references to guide us:
- PR #4: feat/professional-deletion-improvements
- Professional Model: app/models/professional.py (lines 129-180)
- Professional Service: app/services/professional_service.py
- Professional Scheduler: app/services/anonymization_scheduler.py
- RFC 9457: Problem Details for HTTP APIs
- GDPR: Article 17 (right to be forgotten) and Article 5(1)(e) (storage limitation)
Labels
We'll be using these labels to keep things organized:
- enhancement (feature improvement)
- good first issue (well-structured, guided TDD)
- priority: high (critical model parity)
- area: patients (concerns Patient entity)
- area: rgpd (GDPR compliance)
Estimation
Here's our estimation for the task:
- Complexity: Medium (existing structure in PR #4 to reuse)
- Estimated time: 12-16 hours (10 commits)
- Dependencies: None (independent of PR #4)
- Priority: High (critical functional parity)
Let's make this happen, team! By bringing the Patient model up to par with the Professional model, we're not just improving our system—we're reinforcing our commitment to high-quality data management and regulatory compliance.