Enhanced Patient Deletion Management: A Comprehensive Guide

Nov 2, 2025 by Admin

Hey guys! Today we're looking at the enhancements to patient deletion management, which mirror the improvements implemented in PR #4. This work keeps the system efficient and helps it stay compliant with regulations like GDPR. So, let's get started!

Understanding the Context

Previously, PR #4 introduced advanced deletion management for professionals: soft deletes with a grace period, automated deferred anonymization, detection of returning users after anonymization, deletion blocking while under investigation, and admin endpoints for fine-grained control. This issue brings the same improvements to patient data, so that both kinds of records are handled consistently and in line with GDPR and other regulatory standards. Think of it as giving our patient data the same VIP treatment we give our professional data.

The Importance of Consistent Data Handling

When it comes to data management, consistency is key. Applying the same rigorous standards to patient data as to professional data reduces the risk of errors and oversights, and it gives the team one deletion process to maintain instead of two. A unified approach also supports regulatory compliance. The soft delete feature provides a safety net: data can be recovered during the 7-day grace period if needed, which matters in healthcare, where information can be needed unexpectedly. Automated deferred anonymization then renders personal data unidentifiable once the grace period ends, in line with GDPR's storage limitation principle, and reduces the risk of unauthorized access to sensitive information.

Addressing the Gaps in Patient Data Management

Currently, professional and patient data deletions are handled differently. The Professional model has comprehensive deletion features, while the Patient model lags behind, so we need to close that gap to keep our data handling consistent and compliant. The under_investigation flag, for example, will prevent a patient's data from being deleted while an investigation related to that patient is active. This feature alone can save us from potential legal and ethical quagmires.

Analyzing the Current Differences

To better understand what needs to be done, let's break down the differences between the Professional and Patient models.

Professional Model (✅ Complete)

The Professional model boasts a robust set of fields to manage deletions effectively:

# Fields present in Professional
under_investigation: bool  # Block deletion if under investigation
investigation_notes: str | None  # Notes related to the investigation
correlation_hash: str | None  # SHA-256 hash for detecting returning users
soft_deleted_at: datetime | None  # Start of the grace period
anonymized_at: datetime | None  # Timestamp when anonymization was applied
deleted_at: datetime | None  # Deprecated field
deleted_by: str | None  # User who initiated deletion
deletion_reason: Literal[...]  # Reason for deletion

Patient Model (❌ Incomplete)

The Patient model, however, is missing several key fields:

# Missing fields in Patient
# ❌ under_investigation
# ❌ investigation_notes
# ❌ correlation_hash
# ❌ soft_deleted_at
# ❌ anonymized_at
# ✅ deleted_at (existing, but to be deprecated)
# ✅ deleted_by (existing)
# ✅ deletion_reason (existing)

The goal here is clear: we need to bring the Patient model up to par with the Professional model. This involves adding the missing fields and ensuring they function as intended. These seemingly small changes can have a significant impact on our ability to manage patient data effectively. For instance, the correlation_hash allows us to detect if a patient who has been anonymized tries to re-register. This can be useful for identifying potential fraud or misuse of the system. The soft_deleted_at and anonymized_at fields are crucial for implementing the soft delete and deferred anonymization features, respectively.

Implementation Plan (TDD - 3 Phases)

We'll be following a Test-Driven Development (TDD) approach in three phases to ensure everything is implemented correctly. TDD helps us catch issues early and ensures our code does exactly what we intend it to do. It's like building a house with a solid foundation – we start with tests that define what we want our code to do, then write the code to pass those tests.

Phase 1: Model and Schemas (Commits 1-3)

This phase focuses on updating the Patient model and creating the necessary schemas. This is the groundwork, laying the foundation for the rest of the implementation. The first few commits will be crucial in setting the stage for the subsequent phases.

Commit 1: Adding Fields to the Patient Model

[ ] Add under_investigation: bool (default False, indexed)
[ ] Add investigation_notes: str | None (varchar 1000)
[ ] Add soft_deleted_at: datetime | None (indexed, timezone-aware)
[ ] Add anonymized_at: datetime | None (indexed, timezone-aware)
[ ] Add correlation_hash: str | None (varchar 64, indexed)
[ ] Create Alembic migration
[ ] Apply migration in development

Modified Files:

app/models/patient.py
alembic/versions/XXXX_add_deletion_fields_to_patients.py

Commit 2: Pydantic Schemas for Deletion Management

[ ] Create PatientDeletionContext (deletion reason with metadata)
[ ] Create PatientRestoreRequest (restoration after soft delete)
[ ] Create PatientAnonymizationStatus (anonymization status)
[ ] Create PatientDeletionBlockedResponse (error if under investigation)
[ ] Adapt deletion_reason to include patient-specific cases

Modified Files:

app/schemas/patient.py

deletion_reason values for patients:

Literal[
    "user_request",           # Patient request
    "gdpr_compliance",        # GDPR compliance
    "admin_action",           # Admin action
    "prolonged_inactivity",   # Prolonged inactivity
    "duplicate_account",      # Duplicate account
    "deceased",               # Patient deceased
    "other",
]
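As a rough illustration of how these values could be enforced, here is a minimal standard-library sketch. The plan calls for Pydantic schemas; the names PatientDeletionReason and validate_deletion_reason are hypothetical and only show the idea of rejecting values outside the list above.

```python
from typing import Literal, get_args

# Allowed values copied from the list above.
PatientDeletionReason = Literal[
    "user_request",
    "gdpr_compliance",
    "admin_action",
    "prolonged_inactivity",
    "duplicate_account",
    "deceased",
    "other",
]


def validate_deletion_reason(value: str) -> str:
    """Return the value unchanged if it is an allowed reason, else raise ValueError."""
    allowed = get_args(PatientDeletionReason)
    if value not in allowed:
        raise ValueError(f"unsupported deletion_reason: {value!r}")
    return value
```

In a Pydantic schema, annotating a field with the same Literal type would give equivalent validation without the helper function.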

Commit 3: Correlation Functions for Return Detection

[ ] Implement _generate_patient_correlation_hash(email: str, patient_id: int) -> str
- Use SHA-256(email + patient_id)
- Generate only before anonymization
[ ] Implement _check_returning_patient(email: str) -> dict | None
- Search for correlation_hash in anonymized patients
- Return metadata if match found
[ ] Create unit tests (6 tests minimum)

Modified Files:

app/services/patient_service.py
tests/unit/test_patient_correlation.py
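The two functions above could look roughly like the following sketch. It assumes emails are normalized (trimmed and lowercased) before hashing and that anonymized rows are available as an in-memory list; both are assumptions, as is the AnonymizedRecord type. Because the hash covers email plus patient_id, a lookup by email alone has to recompute the hash against each anonymized record's id.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass
class AnonymizedRecord:
    """Hypothetical view of the columns that survive anonymization."""
    patient_id: int
    correlation_hash: str
    anonymized_at: datetime


def generate_patient_correlation_hash(email: str, patient_id: int) -> str:
    """SHA-256 over email + patient_id, returned as a 64-character hex string."""
    normalized = email.strip().lower()  # assumption: normalize before hashing
    return hashlib.sha256(f"{normalized}{patient_id}".encode("utf-8")).hexdigest()


def check_returning_patient(
    email: str, anonymized: Iterable[AnonymizedRecord]
) -> Optional[dict]:
    """Return metadata about a matching anonymized patient, or None."""
    for record in anonymized:
        candidate = generate_patient_correlation_hash(email, record.patient_id)
        if candidate == record.correlation_hash:
            return {
                "previous_patient_id": record.patient_id,
                "previous_anonymized_at": record.anonymized_at.isoformat(),
            }
    return None
```

The 64-character hex digest matches the varchar 64 column planned in Commit 1. A real implementation would query the database instead of scanning a list.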

Phase 2: Soft Delete and Anonymization (Commits 4-7)

This phase is where we bring the core deletion functionalities to life. We'll implement soft delete, deferred anonymization, and detection of returning patients. It's the heart of our project, where we transform the theoretical into the practical.

Commits 4-5: Soft Delete with Grace Period (TDD)

RED Phase:

[ ] Create 3 tests:
- test_soft_delete_patient_sets_soft_deleted_at
- test_soft_delete_blocked_if_under_investigation
- test_soft_delete_generates_correlation_hash

GREEN Phase:

[ ] Modify _soft_delete() in patient_service.py
- Check under_investigation (raise RFC 9457 exception if True)
- Set soft_deleted_at = now()
- Generate correlation_hash before anonymization
- Publish identity.patient.soft_deleted event
[ ] Adapt DELETE endpoint /api/v1/patients/{id}

REFACTOR Phase:

[ ] Linting with ruff
[ ] Ensure 0 regression on existing tests

Modified Files:

app/services/patient_service.py
app/api/v1/endpoints/patients.py
tests/unit/test_patient_soft_delete.py
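To make the GREEN phase concrete, here is a minimal in-memory sketch of the soft delete steps. The Patient dataclass, DeletionBlockedError, and the 409 status code are assumptions for illustration (the plan only says an RFC 9457 exception is raised), and the returned dict stands in for the published identity.patient.soft_deleted event, without keycloak_user_id.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD = timedelta(days=7)  # grace period used throughout this plan


class DeletionBlockedError(Exception):
    """Hypothetical exception carrying an RFC 9457 problem-details body."""

    def __init__(self, patient_id: int) -> None:
        super().__init__(f"patient {patient_id} is under investigation")
        self.problem = {
            "type": "about:blank",
            "title": "Deletion blocked",
            "status": 409,  # assumption: the plan does not name a status code
            "detail": f"Patient {patient_id} is under investigation.",
        }


@dataclass
class Patient:
    """Hypothetical stand-in for the real Patient model."""
    id: int
    email: str
    under_investigation: bool = False
    soft_deleted_at: Optional[datetime] = None
    correlation_hash: Optional[str] = None


def _utc_iso(value: datetime) -> str:
    """Format an aware datetime as ISO 8601 in UTC with a trailing Z."""
    return value.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def soft_delete(patient: Patient, deletion_reason: str,
                now: Optional[datetime] = None) -> dict:
    """Apply the soft delete steps and return the event payload to publish."""
    if patient.under_investigation:
        raise DeletionBlockedError(patient.id)
    now = now or datetime.now(timezone.utc)
    patient.soft_deleted_at = now
    # The hash is generated now, while the email is still available in clear text.
    patient.correlation_hash = hashlib.sha256(
        f"{patient.email}{patient.id}".encode("utf-8")
    ).hexdigest()
    return {
        "patient_id": patient.id,
        "soft_deleted_at": _utc_iso(now),
        "deletion_reason": deletion_reason,
        "grace_period_expires_at": _utc_iso(now + GRACE_PERIOD),
    }
```

Passing now explicitly keeps the function deterministic, which is convenient for the three RED-phase tests listed above.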

Commit 6: Deferred Anonymization (Scheduler)

RED Phase:

[ ] Create 2 tests:
- test_anonymize_expired_patients (soft-deleted more than 7 days ago)
- test_skip_patients_in_grace_period (soft-deleted less than 7 days ago)

GREEN Phase:

[ ] Create app/services/patient_anonymization_scheduler.py
[ ] Implement anonymize_expired_patient_deletions(test_mode: bool = False)
- Query: soft_deleted_at < now() - 7 days AND anonymized_at IS NULL
- Anonymization: irreversible bcrypt hashing
- Set anonymized_at = now()
- Publish identity.patient.anonymized event
[ ] Implement _anonymize_patient_entity(patient: Patient)
- Bcrypt hash: first_name, last_name, email
- Replace: phone → "+ANONYMIZED"
- Keep: correlation_hash, soft_deleted_at, anonymized_at
[ ] APScheduler integration (commented out, ready for production)

REFACTOR Phase:

[ ] Linting with ruff
[ ] Ensure 0 regression

Modified Files:

app/services/patient_anonymization_scheduler.py
app/main.py (commented-out scheduler integration)
tests/unit/test_patient_anonymization_scheduler.py
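The selection and anonymization logic can be sketched in memory as follows. To stay within the standard library, this sketch swaps bcrypt for SHA-256 with a random salt that is thrown away, which is a stand-in and not the bcrypt hashing the plan specifies. The Patient dataclass and the list-based "query" are also assumptions.

```python
import hashlib
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

GRACE_PERIOD = timedelta(days=7)


@dataclass
class Patient:
    """Hypothetical stand-in for the real Patient model."""
    id: int
    first_name: str
    last_name: str
    email: str
    phone: str
    soft_deleted_at: Optional[datetime] = None
    anonymized_at: Optional[datetime] = None
    correlation_hash: Optional[str] = None


def _irreversible(value: str) -> str:
    """Stand-in for bcrypt: hash with a random salt that is never stored."""
    salt = secrets.token_bytes(16)
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def anonymize_patient_entity(patient: Patient, now: datetime) -> None:
    """Overwrite identifying fields; keep correlation_hash and timestamps."""
    patient.first_name = _irreversible(patient.first_name)
    patient.last_name = _irreversible(patient.last_name)
    patient.email = _irreversible(patient.email)
    patient.phone = "+ANONYMIZED"
    patient.anonymized_at = now


def anonymize_expired_patient_deletions(
    patients: List[Patient], now: Optional[datetime] = None
) -> List[int]:
    """Anonymize patients whose grace period has ended; return their ids."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - GRACE_PERIOD
    processed = []
    for patient in patients:
        # Mirrors: soft_deleted_at < now() - 7 days AND anonymized_at IS NULL
        if (
            patient.soft_deleted_at is not None
            and patient.soft_deleted_at < cutoff
            and patient.anonymized_at is None
        ):
            anonymize_patient_entity(patient, now)
            processed.append(patient.id)
    return processed
```

Because anonymized_at is set on every processed row, running the job twice does not re-hash patients that were already anonymized.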

Commit 7: Detecting Returning Patients

RED Phase:

[ ] Create 3 tests:
- test_detect_returning_patient_after_anonymization
- test_no_detection_for_new_patient
- test_returning_patient_event_published

GREEN Phase:

[ ] Modify sync_patient_registration() in patient_service.py
- Call _check_returning_patient(email)
- If match: publish identity.patient.returning_user event
- Create new independent account (no auto-restoration)

REFACTOR Phase:

[ ] Linting with ruff

Modified Files:

app/services/patient_service.py
tests/unit/test_patient_returning_detection.py
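The registration hook can be pictured with the small sketch below. The function name register_patient and the injected check_returning and publish callables are hypothetical; the real sync_patient_registration() signature is not shown in this plan. The point is the control flow: a match triggers an event, but the new account is created either way and nothing is restored.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


def register_patient(
    email: str,
    new_patient_id: int,
    check_returning: Callable[[str], Optional[dict]],
    publish: Callable[[str, dict], None],
    now: Optional[datetime] = None,
) -> int:
    """Publish a returning_user event on a match, then keep the new account."""
    match = check_returning(email)
    if match is not None:
        detected_at = now or datetime.now(timezone.utc)
        publish(
            "identity.patient.returning_user",
            {
                "new_patient_id": new_patient_id,
                "previous_patient_id": match["previous_patient_id"],
                "email": email,
                "detected_at": detected_at.isoformat(),
                "previous_anonymized_at": match["previous_anonymized_at"],
            },
        )
    # No auto-restoration: the new account stays independent of the old one.
    return new_patient_id
```

Injecting the two callables keeps the sketch testable without Redis, which fits the unit tests listed in the RED phase.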

Phase 3: Admin Endpoints and Documentation (Commits 8-10)

The final phase focuses on providing administrative control and documenting our work. This includes creating admin endpoints for managing patient deletions and ensuring our documentation is clear and comprehensive. It's about making our system not just functional but also manageable and understandable.

Commit 8: Admin Endpoints

[ ] POST /api/v1/admin/patients/{id}/investigation
- Body: {"investigation_notes": "Investigation in progress..."}
- Set under_investigation = True
- Publish identity.patient.investigation_started event
- Return: 200 OK with updated status
[ ] DELETE /api/v1/admin/patients/{id}/investigation
- Set under_investigation = False
- Clear investigation_notes
- Publish identity.patient.investigation_ended event
- Return: 200 OK
[ ] POST /api/v1/admin/patients/{id}/restore
- Check: soft_deleted_at IS NOT NULL AND anonymized_at IS NULL
- Set soft_deleted_at = None, correlation_hash = None
- Publish identity.patient.restored event
- Return: 200 OK with restored patient
- Error 400 if already anonymized (irreversible)
[ ] GET /api/v1/admin/patients/deleted
- Filter: soft_deleted_at IS NOT NULL
- Pagination: query params page, limit
- Sort: by soft_deleted_at DESC
- Include: time remaining before anonymization (7 days - elapsed)
- Return: paginated list with metadata

Modified Files:

app/api/v1/endpoints/admin/patients.py (new file)
app/api/v1/admin/__init__.py
tests/unit/test_admin_patient_endpoints.py
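Two pieces of endpoint logic above are easy to get subtly wrong: the restore precondition and the "time remaining" value in the deleted list. Here is a minimal sketch of both, assuming an in-memory list in place of the database; the DeletedPatient type, the response field names, and the default limit of 20 are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

GRACE_PERIOD = timedelta(days=7)


@dataclass
class DeletedPatient:
    """Hypothetical projection of the columns the admin endpoints need."""
    id: int
    soft_deleted_at: Optional[datetime] = None
    anonymized_at: Optional[datetime] = None


def can_restore(patient: DeletedPatient) -> bool:
    """Restore is only possible after soft delete and before anonymization."""
    return patient.soft_deleted_at is not None and patient.anonymized_at is None


def time_remaining(soft_deleted_at: datetime, now: datetime) -> timedelta:
    """Grace period minus elapsed time, never negative."""
    return max(soft_deleted_at + GRACE_PERIOD - now, timedelta(0))


def list_deleted(patients: List[DeletedPatient], now: datetime,
                 page: int = 1, limit: int = 20) -> dict:
    """Filter soft-deleted patients, sort newest first, and paginate."""
    deleted = sorted(
        (p for p in patients if p.soft_deleted_at is not None),
        key=lambda p: p.soft_deleted_at,
        reverse=True,
    )
    start = (page - 1) * limit
    items = [
        {
            "id": p.id,
            "seconds_until_anonymization": int(
                time_remaining(p.soft_deleted_at, now).total_seconds()
            ),
        }
        for p in deleted[start:start + limit]
    ]
    return {"items": items, "page": page, "limit": limit, "total": len(deleted)}
```

Clamping the remaining time at zero avoids negative values for patients whose grace period has ended but who have not yet been picked up by the scheduler.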

Commit 9: E2E Integration Tests

[ ] Create tests/integration/test_patient_deletion_workflow.py
[ ] Tests with real PostgreSQL and Redis
[ ] Complete workflow:
1. Patient registration
2. Soft delete (grace period)
3. Wait for expiration (simulation via fixture)
4. Automatic anonymization
5. New registration with same email → return detection
[ ] Validate published events (Redis Pub/Sub)

Modified Files:

tests/integration/test_patient_deletion_workflow.py
docker-compose.test.yaml (if adjustments needed)

Commit 10: Final Documentation

[ ] Update CLAUDE.md with patient examples
[ ] Document admin endpoints in OpenAPI
[ ] Create sequence diagrams:
- Soft delete → anonymization workflow
- Patient return detection
[ ] Scheduler integration guide (production)
[ ] Usage examples in docs/patient_deletion.md

Modified Files:

CLAUDE.md
docs/patient_deletion.md (new)
app/api/v1/endpoints/admin/patients.py (OpenAPI docstrings)

Published Events (Redis Pub/Sub)

Here’s a rundown of the events we’ll be publishing, which help keep other parts of the system informed about changes.

identity.patient.soft_deleted (Commits 4-5)

{
  "patient_id": 123,
  "keycloak_user_id": "uuid",
  "soft_deleted_at": "2025-11-02T10:00:00Z",
  "deletion_reason": "user_request",
  "grace_period_expires_at": "2025-11-09T10:00:00Z"
}

identity.patient.anonymized (Commit 6)

{
  "patient_id": 123,
  "anonymized_at": "2025-11-09T10:00:00Z",
  "correlation_hash": "sha256_hash"
}

identity.patient.returning_user (Commit 7)

{
  "new_patient_id": 456,
  "previous_patient_id": 123,
  "email": "patient@example.com",
  "detected_at": "2025-12-01T14:00:00Z",
  "previous_anonymized_at": "2025-11-09T10:00:00Z"
}

identity.patient.investigation_started (Commit 8)
identity.patient.investigation_ended (Commit 8)
identity.patient.restored (Commit 8)

Success Criteria

To ensure we've nailed this, here are our success criteria:

[ ] Full parity with PR #4: Same features for patients as professionals
[ ] Exhaustive testing: 100% coverage of new features
[ ] Strict TDD: RED → GREEN → REFACTOR cycle respected
[ ] 0 regression: All existing tests pass
[ ] Linting: 0 ruff errors
[ ] Complete documentation: CLAUDE.md, OpenAPI, guides
[ ] Reversible migration: migrate-up and migrate-down functional
[ ] Backward compatibility: NO breaking changes

References

Here are some useful references to guide us:

PR #4: feat/professional-deletion-improvements
Professional Model: app/models/professional.py (lines 129-180)
Professional Service: app/services/professional_service.py
Professional Scheduler: app/services/anonymization_scheduler.py
RFC 9457: Problem Details for HTTP APIs
GDPR: Article 17 (right to erasure, the "right to be forgotten") and Article 5(1)(e) (storage limitation)

Labels

We'll be using these labels to keep things organized:

enhancement (feature improvement)
good first issue (well-structured, guided TDD)
priority: high (critical model parity)
area: patients (concerns Patient entity)
area: rgpd (GDPR compliance)

Estimation

Here's our estimation for the task:

Complexity: Medium (existing structure in PR #4 to reuse)
Estimated time: 12-16 hours (10 commits)
Dependencies: None (independent of PR #4)
Priority: High (critical functional parity)

Let's make this happen, team! Bringing the Patient model up to par with the Professional model improves our system and reinforces our commitment to high-quality data management and regulatory compliance.