Skip to content

Session Capacity & Suspended-Session Handover

Context handover for continuing work on PR #2467 in a fresh session. The PR ships the SignalR active-reviewer-tracking feature plus the rejoin-duplicate fix for user ticket #2446. During post-review discussion a second over-capacity vector was identified (suspended sessions excluded from the capacity tally). This document captures the open design decisions, glossary, and the plan for the next session.

Current state of PR #2467

  • All CI checks green on commit 36b42a243
  • Full E2E suite green (run #24440394945), including the new annotation-capacity.spec.ts regression test for #2446
  • All review-thread comments resolved on the current head
  • All unit + integration tests passing locally

What is already merged into this branch

  • BuildAssignmentCapacityFilter — extended with (InvestigatorId, StageId) natural-key exclusion so a reviewer's refresh can't duplicate their session (the original Sentry finding from the first round of review). In src/libs/project-management/SyRF.ProjectManagement.Mongo.Data/Repositories/StudyRepository.cs.
  • Study.CollapseDuplicateActiveReviewSessions(investigatorId, stageId) — self-healing repair for documents written before the uniqueness guard; keeps the session with annotation progress, drops the rest, relies on the SessionTally computed getter to recompute on save. In src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/Study.cs.
  • NotificationHub.JoinStudyReview fallback — wired to the collapse method so legacy duplicates heal the first time a real user's rejoin hits the study.
  • E2E test annotation-capacity.spec.ts — three real browsers on the same study with SessionCountTarget=2, three refreshes, Mongo assertions on ActiveReviewSessions + TotalAllocatedSessionCount.
  • Integration tests in AtomicAssignmentIntegrationTests — cross-stage, suspended rejoin, join/leave/rejoin cycle, same-user concurrency, preexisting-duplicate collapse.
  • Unit tests in StudyActiveReviewSessionTests — domain-level behaviour of CollapseDuplicateActiveReviewSessions including keeper-selection priority.

What is NOT yet done (the open work this document tracks)

Server-side fix: DONE. The !IsSuspended filter has been removed from ExtractionInfo.SessionTallies. Suspended sessions now keep holding their slot in the stored tally for the grace period, so a different reviewer is rejected by the atomic capacity filter (or by the AddOrRefreshActiveReviewSession fallback) rather than silently stealing the slot.

Regression coverage added:

  • SessionTallies_Should_Include_Suspended_Sessions_In_Active_Counts — domain
  • AddOrRefreshActiveReviewSession_Should_Throw_When_Capacity_Held_By_Suspended_Session — domain
  • SessionTallies_Should_Still_Report_Allocation_When_All_Sessions_Suspended — domain
  • AtomicAssignment_ReturnsNull_WhenOtherReviewersSlotIsHeldBySuspendedSession — integration (real Mongo)

The three prior tests that encoded the buggy exclusion were flipped to assert the new (correct) behaviour.

What remains are UX polish items (intentionally deferred to a later session, described below): client-side grace-period countdown (Scenario 1) and form-lock after expiry (Scenario 2). Neither is required for merge safety — the server no longer permits over-annotation regardless of client state.

Glossary of session-accounting terms

Two distinct entities exist in the Study aggregate:

Entity Type Meaning
ActiveReviewSession Study.ActiveReviewSessions: List<ActiveReviewSession> A reviewer is currently on the study page with the review hub open. Created by JoinStudyReview. Removed on clean disconnect, after the suspended grace period, or by idle-timeout.
AnnotationSession Study.ExtractionInfo.Sessions: List<AnnotationSession> A saved annotation submission. Created when the reviewer saves the form. One per reviewer per stage (plus optional reconciliation sessions).

A reviewer transitions from ActiveReviewSession only → ActiveReviewSession plus AnnotationSession when they save. The AnnotationSession persists after the review session is torn down.

ActiveReviewSession state flags

Flag Set when Cleared when
HasStartedAnnotating Reviewer types in any annotation form field (form becomes dirty) Form marked clean (all fields reset)
SuspendedSince (⇒ IsSuspended) OnDisconnectedAsync fires with a non-null exception and no other live connections remain for the user on this study+stage (involuntary disconnect) Reviewer reconnects (JoinStudyReview calls Resume()) OR the 2-hour RemoveSuspendedSessionConsumer fires and removes the session entirely
IdleSince (⇒ IsIdle) MarkSessionIdle scheduled message fires 5 minutes after last form interaction, with HasStartedAnnotating == false Reviewer interacts with form again

SessionTally metrics (computed on ExtractionInfo.SessionTallies getter)

Metric Counts Used for
NumberOfCandidateSessions AnnotationSessions that are NOT reconciliation Base count for capacity, "how many reviewers have saved"
NumberOfCompletedCandidateSessions Candidates with Status == Completed Stage completion progress
ReconciliationStarted / ReconciliationCompleted Presence + completion of reconciliation AnnotationSession Reconciliation flow gating
NumberOfActiveReviewSessions ActiveReviewSessions for the stage ⚠️ currently filters !IsSuspended Presence UI ("who's reviewing right now")
NumberOfDirtyActiveSessions Active review sessions with HasStartedAnnotating == true Surplus-warning dialog (someone is typing)
NumberOfReservedSessions Active review sessions for reviewers who do NOT have a saved AnnotationSession for this stage — "claimed a slot, haven't saved yet" ⚠️ currently filters !IsSuspended Capacity enforcement
NumberOfAnnotatingReservedSessions Reserved sessions where HasStartedAnnotating == true Engagement threshold (warning/lock)
TotalAllocatedSessionCount (derived) NumberOfCandidateSessions + NumberOfReservedSessions Capacity filter compares this against Stage.SessionCountTarget
TotalEngagedSessionCount (derived) NumberOfCandidateSessions + NumberOfAnnotatingReservedSessions Engagement threshold (UI warnings)

Answers to the specific glossary questions

"What does an annotating reserved session mean? Is that somebody who is interacting with the form? Is it someone who has already created and saved a session, or is it someone who's just opened the form?"

An annotating reserved session is a reviewer who:

  1. Has the review page open (i.e. has an ActiveReviewSession),
  2. Has NOT yet saved an AnnotationSession (so they're "reserved", not a "candidate"),
  3. Has typed into at least one annotation form field (HasStartedAnnotating == true).

They're the people who've started filling in the form but haven't hit Save.

By contrast:

  • Just opened the form, hasn't typed anything → reserved (counts for capacity) but not annotating-reserved (doesn't count for engagement).
  • Saved their annotations → they've "graduated" from reserved to candidate (the saved AnnotationSession means they no longer count as reserved at all).

The open bug — suspended sessions and capacity

The SessionTallies computed getter filters active sessions with .Where(ars => !ars.IsSuspended) before grouping. This means when a reviewer's session is suspended (on involuntary disconnect), the tally stops counting them and the stored TotalAllocatedSessionCount drops. A subsequent join from a different reviewer passes the capacity filter and is granted a session.

The two scenarios (reviewer returns)

Scenario 1 — returns DURING the grace window (<2h)

  • Suspended session still exists in ActiveReviewSessions (only the tally exclusion applies).
  • JoinStudyReview → atomic filter excludes via the (InvestigatorId, StageId) guard → returns null → fallback finds the suspended session → Resume() → saves.
  • Reviewer is back in with full submission capability. They see no warning.
  • Study now has [A (resumed), B (original), C (took A's apparent vacancy)] = 3 sessions on a capacity-2 stage. Silent over-annotation.

Scenario 2 — returns AFTER the grace window (≥2h)

  • RemoveSuspendedSessionConsumer has already removed the session, creating an ExpiredReviewSession audit record.
  • JoinStudyReview → atomic filter sees capacity full (stored tally now reflects B+C as 2) and reviewer has no existing session → returns null.
  • Fallback finds no session → throws InvalidOperationException("Study X has reached the target annotation capacity for stage Y").
  • Client surfaces this as the surplus notification / surplus-lock-overlay (already tested by session-lifecycle.spec.ts:99).

Remove .Where(ars => !ars.IsSuspended) from the SessionTallies computed getter in src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/ExtractionInfo.cs (there are two call sites inside the getter — both activeByStage at the top of the method and activeByStage[stageId] in talliesFromActiveOnly — confirm both are updated).

Rationale: suspended sessions hold their slot for the duration of the grace period, which is what the grace period is for. The cleanup consumer frees the slot after 2h if the reviewer never returns. With this fix:

  • Scenario 1 becomes like Scenario 2: reviewer C is rejected up front rather than silently stealing A's slot.
  • No server-side behaviour change for reviewers who were never suspended.

Side-effects to verify:

  • The filter must be removed from all four downstream counters that currently derive from activeByStage: NumberOfActiveReviewSessions, NumberOfDirtyActiveSessions, NumberOfReservedSessions, NumberOfAnnotatingReservedSessions. Semantically this is correct: a suspended reviewer is still reserving a slot, and if they had typed before the disconnect they're still annotating-reserved.
  • Existing tests that assumed suspended sessions were excluded will need updating — these are in ActiveReviewSessionTests, the ExtractionInfo tests, and possibly the distribution algorithm tests. Each deserves an audit to confirm the test was encoding a design intent vs the current bug.

For Scenario 1 — grace-period countdown

The reviewer has lost connection but still holds their slot. They don't know their session is pending expiry. The user has requested:

  • On disconnect, start a client-side countdown timer.
  • During the last 10 minutes of the grace window, show the reviewer a prominent warning with a live countdown: "Your session will expire in MM:SS if you don't reconnect".
  • Cancel the timer immediately on reconnect (SignalR up, JoinStudyReview successful).
  • If the 2-hour window elapses without reconnect, transition to Scenario 2 UX.

Implementation notes (for the next session):

  • The countdown lives in the Angular client, not the server — it's driven by SuspendedSince and ReviewSessionConstants.SuspendedSessionGracePeriod (2h). Expose the grace period to the client via appConfig or a SignalR payload on suspension.
  • The presence-store already tracks sessionRemoved, shouldLockForm, etc. Add a sessionSuspendedUntil: Date | null signal and derive sessionExpiringInMinutes from it.
  • Consider what happens if multiple tabs exist — the countdown should only show on tabs that were active when the suspension occurred.

For Scenario 2 — session expired, prevent new submissions

The reviewer returns after the grace period. The session is gone. The user has requested:

  • UI shows "Your session has expired" with explanation.
  • Form is read-only — the reviewer cannot fill in any field.
  • Save button is disabled.
  • Exception: if the reviewer already has a saved AnnotationSession (they'd previously submitted), they can view but not edit — or perhaps re-open as an edit of their existing candidate session. This is an open product question to resolve next session.
  • Provide a clear action for the reviewer to leave the page or navigate to another study they still have a slot on.

Implementation notes:

  • The server already throws the capacity error. The client needs to turn that into a first-class state, not just a transient toast.
  • StudyPresenceStore should add a sessionExpired state and a formLocked derived signal that the review page consumes to set form controls to disabled + read-only.
  • The existing surplus-lock-overlay is the closest precedent — adapt or reuse rather than build a parallel component.

Testing strategy for the above

Extensive tests are critical — false-positive expiry warnings would be very visible to users. At minimum:

Unit (Angular)

  • Countdown component: starts at the correct time, updates every second, shows/hides at the correct thresholds, cancels on reconnect event.
  • Presence store: sessionSuspendedUntil flips on suspension StudyPresenceUpdated snapshot, flips off on sessionRemoved or joined.
  • Form lock component: read-only when sessionExpired, writable otherwise; interaction with save-button disabled state.

Unit (C#)

  • Tally computation with suspended sessions included — verify the four downstream counters all include suspended in their count. Migrate existing tests that encoded the exclusion to the new behaviour with a migration comment explaining why.
  • ExpiredReviewSession audit record is still created exactly once per actual expiry (no double-audit when a reviewer reconnects then disconnects again after grace).

Integration (C# + Mongo)

  • Reviewer A joins, disconnects involuntarily → suspended → stored tally reflects their slot still held (regression test for the bug).
  • Reviewer C attempts to join while A is suspended → atomic returns null, fallback throws capacity error. (Mirror of the #2446 E2E at the integration level.)
  • 2-hour consumer fires after A suspends, A has not returned → session removed, expired audit created, stored tally now allows C.
  • A reconnects at T+1h59m → suspended→resumed, tally reflects A holds slot, C's subsequent attempt is rejected.

E2E (Playwright)

  • Mirror of annotation-capacity.spec.ts but with a mid-test disconnect: A closes tab uncleanly, B remains, C tries to join → C sees surplus.
  • UI countdown test: simulate suspension (requires a test hook to set SuspendedSince or trigger the server path), assert the last-10-min countdown renders with the expected time and decrements.
  • False-positive guard: reviewer with a healthy connection never sees the countdown or the expired-session UI, across long sessions, navigations, form interactions.

Acceptance criteria for "no false positives"

  • A healthy reviewer who annotates for 30+ minutes never sees any suspension/expiry UI.
  • A reviewer who briefly reconnects (wifi blip <10s) sees no UI change.
  • A reviewer who reconnects just after the countdown starts sees the countdown disappear promptly (≤1s).
  • A reviewer who is still disconnected when the countdown ends sees the Scenario 2 UX and the form is locked.

Open product decisions for the next session

  1. When a reviewer returns after grace with an existing saved AnnotationSession, can they edit it? Currently StudyPresenceStore and review page would route them through the normal edit flow if they click back into the study. Is that correct, or should it be locked?
  2. How should the countdown be delivered to the client? Options:
  3. Push via SignalR on suspension (suspension event carries suspendedUntil)
  4. Query endpoint (client polls /api/study/{id}/presence on reconnect)
  5. Derived from appConfig grace period + a SuspendedSince value pushed by the presence snapshot The first is most responsive but adds a SignalR event; the third requires no new server surface but couples the client to the constant.
  6. Should suspension affect the current-reviewers-sidebar UI (should suspended reviewers appear as "away" avatars)? This is adjacent but touches the same plumbing.

Files to look at first in the next session

  • src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/ExtractionInfo.cs — the actual bug, two lines of .Where(ars => !ars.IsSuspended) to remove.
  • src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/Study.cs — review AddOrRefreshActiveReviewSession once the tally is fixed; its capacity check becomes correct automatically.
  • src/services/api/SyRF.API.Endpoint/SignalR/NotificationHub.cs — the join/disconnect flows; likely the place to add the suspension push.
  • src/services/web/src/app/study-presence/study-presence.store.ts and signal-r.service.ts — where the countdown state will live.
  • src/services/web/src/app/stage/stage-review/stage-review.component.ts — the form that needs the read-only gate for Scenario 2.
  • src/libs/project-management/SyRF.ProjectManagement.Core.Tests/Repositories/AtomicAssignmentIntegrationTests.cs — model for the new integration tests.
  • e2e/tests/session-lifecycle.spec.ts and e2e/tests/annotation-capacity.spec.ts — models for the new E2E tests.

Sentry comment disposition

The original Sentry comment on this PR — flagging AddOrRefreshActiveReviewSession capacity check as incorrect because the tally excludes suspended sessions — was initially dismissed as out of scope. On re-review that was wrong: the !IsSuspended filter and the entire suspended-session concept were introduced by this PR, so the bug belonged here. The server-side fix has now been applied on this branch — the filter is gone, and the regression tests above prove a suspended session continues to hold its slot for the 2-hour grace window.