Skip to content

Active Reviewer Tracking — Implementation Overhaul Plan

This document captures everything needed to continue the PR #2467 work in a new session. It covers the current state, known issues, design decisions made, open questions, and the user's feedback on what needs to change.


1. What This Feature Does

When multiple reviewers annotate studies concurrently, SyRF tracks who is actively reviewing which study in real time using SignalR. This prevents wasted effort when reviewers work on studies that have already reached their annotation target. The feature provides:

  • Slot reservation — each reviewer occupies a slot on a study; the slot counts toward the annotation target
  • Idle detection — if a reviewer doesn't interact with the form, their session is eventually marked idle and ultimately removed, freeing the slot for others
  • Surplus notification — when a reviewer's slot is freed and another reviewer fills it, the original reviewer is told the study now has enough reviewers
  • Form locking — administrators can enable enforcement so the annotation form is locked for surplus reviewers

2. Current Architecture

Where state lives

Location What's stored Why it's there
Study document (pmStudy collection) — ActiveReviewSessions array InvestigatorId, StageId, HasStartedAnnotating, IdleSince (DateTime?), SuspendedSince (DateTime?) Domain truth. Business logic (capacity, surplus, slot counting) operates on this.
ReviewSessionConnection collection (separate) ConnectionId (SignalR), StudyId, StageId, InvestigatorId, LastHeartbeatUtc, ConnectedAtUtc Avoids write amplification — heartbeats (every 30s) only touch this lightweight collection, not the Study document.
Frontend StudyPresenceStore (ngrx signal store, providedIn: 'root') Transient in-memory state from SignalR pushes. Holds studies map (studyId → sessions array), activeStudyId, activeStageId. Computed signals derive isIdle, isReviewerSurplus, shouldLockForm, currentReviewerIdleSince, idleSessionTimeoutMinutes. Not persisted. Receives StudyReviewPresenceSnapshot pushes over SignalR.

Data flow

SignalR hub (API service)
  ↓ StudyPresenceUpdated push (triggered by MongoDB change stream on Study save)
SignalRService.studyPresenceUpdated$ (Subject)
  ↓ APP_INITIALIZER subscription (study-presence.init.ts)
StudyPresenceStore.presenceUpdated() → patchState()
  ↓ Computed signals react
Components read signals (idle banner, surplus banner, lock overlay, admin table)

The scheduled message chain (updated — cancel/reschedule pattern)

JoinStudyReview (API hub)
  ├── Creates ActiveReviewSession on Study (HasStartedAnnotating=false, IdleSince=null)
  ├── Creates ReviewSessionConnection
  ├── Schedules MarkSessionIdle at T+5min → stores IdleScheduleTokenId on session
  └── Schedules CheckConnectionLiveness at T+2min

MarkSessionIdleConsumer (PM service, fires at T+5min IF not cancelled)
  ├── IF already idle → no-op (duplicate message)
  └── ELSE → session.MarkIdle() sets IdleSince=now, saves study
      └── Schedules RemoveIdleSession at T+5min+IdleSessionTimeoutMinutes

RemoveIdleSessionConsumer (PM service, fires at T+5min+N)
  ├── IF IsIdle == false → no-op (reviewer resumed)
  └── ELSE → removes session, creates ExpiredReviewSession audit record
      └── Client detects sessionRemoved → shows expired banner or surplus banner

StartedAnnotating (API hub, called when form gets dirty)
  ├── session.MarkDirty() → HasStartedAnnotating=true, IdleSince=null
  └── CancelScheduledPublish(IdleScheduleTokenId) → cancels pending MarkSessionIdle

StoppedAnnotating (API hub, called when form returns to clean)
  ├── session.MarkClean() → HasStartedAnnotating=false
  └── Schedules NEW MarkSessionIdle at T+5min → stores new IdleScheduleTokenId

Scheduling infrastructure

  • MassTransit IMessageScheduler.SchedulePublish → publishes to MassTransit.Scheduling:ScheduleMessage exchange
  • Quartz service consumes scheduled messages, stores in SQL Server, publishes the actual command at the scheduled time
  • Requires the rabbitmq_delayed_message_exchange plugin (v3.13.0) — installed on the cluster RabbitMQ in this session (cluster-gitops commit 0835889)
  • Note: The Quartz scheduler transport is the actual mechanism, not the delayed message plugin directly. The plugin was installed but may not be strictly necessary — the scheduling was working via Quartz before, the issue was that the Quartz pod had lost its RabbitMQ connection after a pod restart.

Key constants

Constant Value Location Configurable?
MarkIdleDelay 5 minutes ReviewSessionConstants.cs No (hardcoded)
HeartbeatLivenessWindow 2 minutes ReviewSessionConstants.cs No (hardcoded)
SuspendedSessionGracePeriod 2 hours ReviewSessionConstants.cs No (hardcoded)
IdleSessionTimeoutMinutes null (disabled) or positive int Stage.IdleSessionTimeoutMinutes Yes (per-stage, via review-settings admin UI)

3. Known Issues (Must Fix)

3.1 When the idle countdown reaches zero, nothing happens

The IdleNotificationBannerComponent shows a countdown computed from idleSince + idleSessionTimeoutMinutes. When it reaches zero, the banner just shows 0:00 and sits there. The server-side RemoveIdleSessionConsumer fires and removes the session, which triggers a presence snapshot push that should make the reviewer surplus — but the surplus detection (selectIsReviewerSurplus) may not fire because the reviewer's session disappearing from the snapshot means there's no presence entry to compare against.

Root cause to investigate: When the session is removed server-side, does the presence snapshot push arrive at the client? Does the surplus detection logic handle the case where the reviewer's session disappears (they're no longer in the sessions list AND the study is at capacity with other reviewers)?

3.2 Terminology: "paused" is misleading

The user flags that "paused" doesn't accurately describe what's happening. The session isn't paused — the reviewer's slot is still reserved, they can still submit work, but if they don't interact, the slot will eventually be freed. "Paused" implies the system did something to stop them, which isn't the case.

Need to decide: What's the right domain term? Options discussed: - "Idle" — technically accurate but cold/system-y - "Inactive" — similar - "Waiting" — implies the system is waiting for them - Something domain-specific that explains the consequence rather than the state

The user wants the terminology to be meaningful and correct — it should describe what's actually happening from the reviewer's perspective.

3.3 Imperative code in stage-review component

The stage-review.component.ts ngOnInit has two imperative subscribe() blocks with manual state tracking (_lastPresence, _destroy$). The user asked to avoid effect() where possible and instead rely on SignalR events and declarative data routing.

Current imperative code: 1. _lastPresence tracking with manual join/leave in subscribe + ngOnDestroy cleanup 2. annotationFormDirtyChange$ subscription calling startedAnnotating/stoppedAnnotating 3. setInterval tick timer in the idle banner component

3.4 The MarkIdleDelay timer — is it necessary?

The 5-minute MarkIdleDelay exists as a grace period for reading the study before annotating. But: - It only fires when HasStartedAnnotating == false (form never dirtied) - It adds a scheduled message + consumer + invisible wait - Once the form is dirtied, it never fires again (unless form returns to clean via StoppedAnnotating)

User's suggestion: Instead of checking HasStartedAnnotating when the timer fires, why not cancel/reschedule the timer at the appropriate events? If the form is dirtied → cancel the timer. If the form returns to clean → reschedule. If the timer fires → we know for certain the form was never touched. The current approach of scheduling-then-checking is more complex than cancel-and-reschedule.

Simplification opportunity: Remove MarkIdleDelay entirely. Schedule RemoveIdleSession directly from join at T + IdleSessionTimeoutMinutes. The reviewer sees the countdown immediately. Form interaction cancels the removal. One scheduled message, one consumer.

Counter-argument: A reviewer legitimately reading a study for 10 minutes before annotating would see the countdown the entire time. Is that acceptable? The user hasn't explicitly rejected this — they asked "is there any reason to have a countdown to the countdown?" and agreed to remove the pre-idle phase. But removing MarkIdleDelay goes further: it means the reviewer is immediately in the "removal countdown" from the moment they join.

3.5 The scheduler observability is poor

All scheduling logs were at LogDebug — fixed in commit 18e5b29bf (raised to LogInformation). But there's still no way to: - See what scheduled messages are pending in Quartz - See the current state of ActiveReviewSessions on a study without querying MongoDB - See ReviewSessionConnection records without querying MongoDB - Know whether the presence snapshot push actually arrived at the client


4. Design Decisions Already Made

Decision Rationale
Signal store, not global ngrx store Presence is transient real-time state, not normalised entity data. Signal store aligns with project direction.
Not in entities slice Not fetched via API request/response. Arrives via SignalR push. Not normalised byId/allIds.
providedIn: 'root' SignalR pushes arrive regardless of route. The store must be available app-wide.
Server-authoritative idle detection No client-side prediction. The client reacts to isIdle: true from the server. No pre-idle phase.
Countdown from idleSince + idleSessionTimeoutMinutes Both values from the server. Client ticks locally with setInterval between pushes.
ActiveReviewSession is Entity<Guid>, not ValueObject Has identity semantics and mutable lifecycle state. Fixed early after code review comment.
Separate ReviewSessionConnection collection Avoids write amplification from heartbeats on the Study document.
Atomic slot assignment via FindOneAndUpdate Eliminates lost-update race condition from concurrent joins.
$ifNull wrappers in MongoDB pipeline Handles legacy documents missing ActiveReviewSessions or SessionTallies fields.
Delayed message plugin installed rabbitmq_delayed_message_exchange v3.13.0 installed on cluster RabbitMQ (commit 0835889 in cluster-gitops). May not be strictly needed since Quartz handles scheduling.

5. User Feedback to Address

From this session (verbatim themes):

  1. "Why is study presence a root slice of the global store rather than an entity?" — Answered: it's now a signal store. Not entity data because it's transient real-time state from SignalR push.

  2. "Should it use the resource primitive?" — No. resource() is for request/response. Presence arrives via server push.

  3. "The implementation has lots of imperative code" — User wants declarative approaches, avoid effect() where possible, rely on SignalR events.

  4. "Can the implementation be simplified?" — User suspects the 2-timer chain (MarkIdleDelay + IdleSessionTimeoutMinutes) is overengineered. Wants to understand if a simpler approach loses anything.

  5. "Why not cancel/reschedule the timer instead of checking HasStartedAnnotating?" — Valid point. The current approach fires the timer regardless and checks the flag. Cancel-on-dirty, reschedule-on-clean would be cleaner.

  6. "Is 'paused' the right terminology?" — No. The user wants meaningful, domain-correct language. "Paused" implies the system stopped something.

  7. "When the countdown reaches zero, nothing happens" — Bug. Need to handle the transition from idle → removed → surplus on the client.

  8. "I need observability" — Fixed log levels. But still need: diagnostic endpoint for session state, visibility into Quartz scheduled jobs, admin UI showing presence state.

  9. "How can I debug this? It's all opaque." — Addressed partially with log level changes. The user tested in the preview environment and confirmed the idle banner shows but the zero-countdown transition is broken.


6. Files Changed on This Branch

Key files (in the worktree at .worktrees/pr2467.signalr-active-reviewer-tracking):

Frontend (signal store + components): - src/services/web/src/app/study-presence/study-presence.store.ts — the signal store - src/services/web/src/app/study-presence/study-presence.store.spec.ts — 27 tests - src/services/web/src/app/study-presence/study-presence.init.ts — APP_INITIALIZER wiring SignalR → store - src/services/web/src/app/shared/idle-notification-banner/idle-notification-banner.component.ts — idle banner with countdown - src/services/web/src/app/shared/idle-notification-banner/idle-notification-banner.component.scss — styles (needs progress bar styles) - src/services/web/src/app/shared/idle-notification-banner/_idle-notification-banner.component-theme.scss — theme mixin - src/services/web/src/app/shared/surplus-warning-banner/surplus-warning-banner.component.ts — surplus banner - src/services/web/src/app/shared/dialogs/surplus-notification-dialog/surplus-notification-dialog.component.ts — surplus dialog - src/services/web/src/app/stage/stage-review/stage-review.component.ts — presence wiring (join/leave/dirty/clean) - src/services/web/src/app/stage/stage-review/stage-review.component.html — banner placement - src/services/web/src/app/stage/stage-overview/active-sessions-table/active-sessions-table.component.ts — admin sessions table - src/services/web/src/app/core/services/signal-r/signal-r.service.ts — SignalR hub methods + reconnection

Backend (domain model): - src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/ActiveReviewSession.cs — entity - src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/Study.cs — domain methods (AddOrRefresh, Remove, MarkDirty/Clean, Suspend, GetSession) - src/libs/project-management/SyRF.ProjectManagement.Core/Model/StudyAggregate/SessionTally.cs — computed properties - src/libs/project-management/SyRF.ProjectManagement.Core/Model/ReviewSessionConstants.cs — timeout constants - src/libs/project-management/SyRF.ProjectManagement.Core/Model/ReviewSessionConnection.cs — connection entity - src/libs/project-management/SyRF.ProjectManagement.Core/Model/ProjectAggregate/StageEntity/Stage.cs — IdleSessionTimeoutMinutes, EnforceAnnotationTarget

Backend (hub + consumers): - src/services/api/SyRF.API.Endpoint/SignalR/NotificationHub.cs — SignalR hub methods - src/services/api/SyRF.API.Endpoint/SignalR/Dtos/StudyReviewPresenceSnapshot.cs — DTO with IdleSince - src/services/project-management/SyRF.ProjectManagement.Endpoint/Consumers/MarkSessionIdleConsumer.cs - src/services/project-management/SyRF.ProjectManagement.Endpoint/Consumers/RemoveIdleSessionConsumer.cs - src/services/project-management/SyRF.ProjectManagement.Endpoint/Consumers/RemoveSuspendedSessionConsumer.cs - src/services/project-management/SyRF.ProjectManagement.Endpoint/Consumers/CheckConnectionLivenessConsumer.cs

Backend (repository + pipeline): - src/libs/project-management/SyRF.ProjectManagement.Mongo.Data/Repositories/StudyRepository.cs — TryAtomicAssignStudyAsync, BuildAssignmentPipeline with $ifNull - src/libs/project-management/SyRF.ProjectManagement.Mongo.Data/Repositories/ReviewSessionConnectionRepository.cs

Prototypes + docs: - docs/prototypes/idle-presence-ux/index.html — interactive prototype (Active → Paused → Surplus) - docs/prototypes/idle-presence-ux/workflow.html — workflow diagram showing the full lifecycle - docs/features/signalr-active-reviewer-tracking.md — feature brief


7. What Needs to Happen Next

✅ Priority 1: Fix the zero-countdown bug (DONE — commit 4d6f26ec5)

Root cause: When RemoveIdleSessionConsumer removed the session, the client got a snapshot without the reviewer. isCurrentReviewerIdle → false (banner hides), but isReviewerSurplus only triggers at capacity. If nobody else took the slot, the reviewer saw nothing.

Fix: Added sessionRemoved state to StudyPresenceStore. When the reviewer disappears from a snapshot, sessionRemoved = true. New isSessionExpired computed = sessionRemoved && !isReviewerSurplus. Idle banner shows expired state: "Your reserved place on this study has expired. You can still start reviewing, or skip to your next study."

Also made Study.MarkReviewSessionDirty/Clean return bool instead of throwing when no session exists, and added re-join logic in stage-review when annotating after session expiry.

✅ Priority 2: Rethink the terminology (DONE — commit 4d6f26ec5)

Replaced "paused" with "inactive" throughout. Updated surplus banner/dialog to use prototype wording: "This study now has enough reviewers" / "While your session was inactive, other reviewers picked up this study..."

✅ Priority 3: Simplify the timer chain (DONE — this commit)

Implemented cancel/reschedule pattern: - Added IdleScheduleTokenId to ActiveReviewSession (stores MassTransit scheduled message token) - JoinStudyReview: captures token from SchedulePublish, stores on session - StartedAnnotating: cancels pending MarkSessionIdle via CancelScheduledPublish<T>(tokenId) - StoppedAnnotating: schedules new message, stores new token - MarkSessionIdleConsumer: removed HasStartedAnnotating guard (if the message fires, it wasn't cancelled)

✅ Priority 4: Remove imperative code (DONE — this commit)

  • Replaced _lastPresence + subscribe() manual tracking with ComponentStore.effect() using pairwise() for study transitions
  • Replaced annotationFormDirtyChange$ subscription with ComponentStore.effect()
  • setInterval in idle banner kept (clock tick is inherently imperative, wrapped cleanly with effect() start/stop)

Priority 5: Improve observability

  • Add diagnostic endpoint(s) for session state
  • Consider admin panel showing current sessions, scheduled jobs, connection state
  • Frontend: ensure Redux DevTools shows signal store state clearly

8. PR Status

  • Branch: feat/signalr-active-reviewer-tracking (worktree at .worktrees/pr2467.signalr-active-reviewer-tracking)
  • PR checks: All passing (50+ checks including SonarCloud, all test suites, E2E, Docker builds)
  • Review comments: All 39 threads resolved
  • Preview: Deployed at https://pr-2467.syrf.org.uk
  • Tests: 1340 Angular tests, 73+ .NET domain tests, 24 pipeline integration tests — all passing
  • RabbitMQ: Delayed message plugin v3.13.0 installed on cluster (cluster-gitops commit 0835889)

9. Debugging Commands

# API logs — join, scheduling, presence
kubectl logs -f -n pr-2467 $(kubectl get pods -n pr-2467 -o name | grep syrf-api) | grep -i "join\|schedule\|idle\|presence\|annotating"

# PM logs — consumer execution
kubectl logs -f -n pr-2467 $(kubectl get pods -n pr-2467 -o name | grep projectmanagement) | grep -i "MarkSession\|RemoveIdle\|RemoveSuspended\|CheckConnection\|idle"

# Quartz logs — scheduled job firing
kubectl logs -f -n pr-2467 $(kubectl get pods -n pr-2467 -o name | grep quartz) | grep -i "trigger\|fire\|schedul"

# RabbitMQ — check exchanges in pr-2467 vhost
kubectl exec -n rabbitmq rabbitmq-0 -- rabbitmqctl list_exchanges name type -p pr-2467

# RabbitMQ — check queues
kubectl exec -n rabbitmq rabbitmq-0 -- rabbitmqctl list_queues name messages consumers -p pr-2467

# RabbitMQ Management UI
# https://rabbitmq.camarades.net (user: rabbit, password from rabbit-mq secret)

# MongoDB — check ActiveReviewSessions on a study (use CSUUID format)
# Via MCP: mcp__mongodb-syrf-preview__find with collection pmStudy, filter by _id