teamsiso/NEXT_STEPS.md
2026-05-16 13:36:58 -04:00

# Where we left off — self-healing NDI discovery shipped (2026-05-16 13:35)
## What actually was broken (after a lot of misdiagnosis)
On admin-user boxes with UAC effectively off, some launches of TeamsISO would
show zero participants forever while a parallel launch of the SAME exe would
discover participants normally. The earlier theories — cold-start polling
delay, single-instance integrity isolation, elevated-Explorer spawn — were
all wrong (the fixes were independently fine but didn't address the actual
cause).

The actual cause: **the NDI Find handle returned by `interop.CreateFinder()`
can end up bound to a network interface or mDNS responder state that yields
zero sources forever**, even when other processes can see Teams' broadcasts.
Suspected drivers:
1. Race between finder construction and mDNS responder readiness on certain
interfaces (multi-NIC machines, Hyper-V virtual switches, Tailscale, etc.).
2. SAFER-token processes (`runas /trustlevel:0x20000`) may have restricted
access to NDI's IPC layer in a way that raises no error but silently
fails discovery.

Proven empirically: PID 65344 launched at 12:50:33 and ran for 9+ minutes
showing `vm.Participants.Count=0` the whole time. PID 65332 launched from the
same install path at 12:59:01, with the same medium-integrity SAFER token via
the same runas shortcut, and immediately discovered 2 participants. The only
difference was timing.
## The fix (`c30a616`)
`NdiDiscoveryService.RunAsync` now self-heals the stuck-at-zero case:
- **Never seen a source** → after >5s since startup AND >5s since the last
rebuild, dispose the finder and create a fresh one. Repeats on the same
cadence until sources appear.
- **Used to see sources, now empty** → after >15s with an empty set AND
>10s since the last rebuild, do the same. Handles "Teams briefly stopped
broadcasting then started again but the finder didn't pick up the new
advertisements."

Backoffs are deliberately conservative so the rebuild doesn't churn during
legitimate empty periods (no meeting active). The rebuild itself is cheap:
it uses the same code path as the operator-initiated `Ctrl+R` (Refresh discovery).
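The two rebuild rules above can be modeled as a small decision function. This is a minimal sketch in Python for illustration only, not the C# in `NdiDiscoveryService.RunAsync`. The `DiscoveryState` type, its field names, and `should_rebuild` are hypothetical, and it assumes "time since the last rebuild" counts from startup until the first rebuild happens. The caller is expected to set `last_rebuild_at` after it actually recreates the finder.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the notes above, in seconds.
NEVER_SEEN_MIN_UPTIME = 5.0
NEVER_SEEN_MIN_SINCE_REBUILD = 5.0
WENT_EMPTY_MIN_EMPTY = 15.0
WENT_EMPTY_MIN_SINCE_REBUILD = 10.0


@dataclass
class DiscoveryState:
    """Hypothetical bookkeeping; names are illustrative, not the real fields."""
    started_at: float
    last_rebuild_at: float               # assumed equal to started_at until the first rebuild
    ever_seen_source: bool = False
    empty_since: Optional[float] = None  # when the source set last became empty


def should_rebuild(state: DiscoveryState, now: float, source_count: int) -> bool:
    """Return True when the finder should be disposed and recreated."""
    if source_count > 0:
        # Healthy poll: remember that discovery has worked and reset the empty timer.
        state.ever_seen_source = True
        state.empty_since = None
        return False
    if state.empty_since is None:
        state.empty_since = now
    since_rebuild = now - state.last_rebuild_at
    if not state.ever_seen_source:
        # Rule 1: never seen a source.
        return (now - state.started_at > NEVER_SEEN_MIN_UPTIME
                and since_rebuild > NEVER_SEEN_MIN_SINCE_REBUILD)
    # Rule 2: used to see sources, now empty.
    return (now - state.empty_since > WENT_EMPTY_MIN_EMPTY
            and since_rebuild > WENT_EMPTY_MIN_SINCE_REBUILD)
```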
Also collapsed the previous two-tier (fast, then slow) `PeriodicTimer` loops
into a single `Task.Delay` loop with a dynamic interval (200 ms for the first
3 seconds, then the operator-configured interval). Simpler, same observable behavior.
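The shape of that unified loop, sketched in Python with `asyncio` rather than the project's C# `Task.Delay` loop. The names `next_delay` and `poll_loop`, and the use of an `asyncio.Event` for shutdown, are assumptions made for illustration. Only the 200 ms and 3 s figures come from the notes above.

```python
import asyncio
import time

FAST_INTERVAL = 0.2  # 200 ms fast poll, from the notes
FAST_WINDOW = 3.0    # fast polling applies to the first 3 seconds after startup


def next_delay(elapsed: float, configured_interval: float) -> float:
    """Pick the sleep before the next poll based on time since startup."""
    return FAST_INTERVAL if elapsed < FAST_WINDOW else configured_interval


async def poll_loop(poll_once, configured_interval: float, stop: asyncio.Event) -> None:
    """Single loop with a dynamic interval; exits promptly once stop is set."""
    started = time.monotonic()
    while not stop.is_set():
        poll_once()
        delay = next_delay(time.monotonic() - started, configured_interval)
        try:
            # Sleep for `delay`, but wake early if shutdown is requested.
            await asyncio.wait_for(stop.wait(), timeout=delay)
        except asyncio.TimeoutError:
            pass
```

One loop with a computed delay avoids handing off between two timers, which is presumably why the observable behavior stays the same while the code gets simpler.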
## All commits on origin (newest first)
```
c30a616 fix(engine): self-healing NDI discovery + unified poll loop
54ee578 fix(wpf): de-elevate via runas env-var marker (CLI arg breaks runas /trustlevel)
2552d46 fix(installer): wrap shortcut Target in 'runas /trustlevel:0x20000'
0e73746 docs(next-steps): root cause was explorer-spawn elevation, fix shipped in 191b2c5
191b2c5 fix(wpf): de-elevate when spawned by elevated explorer (NDI mDNS isolation)
e01fa36 docs(next-steps): cold-start launch fix verified — 3 launch paths green
09e5b59 fix: cold-start discovery + installer shortcuts + single-instance hardening
f47edfb ISO toggle: widen column 110->124, tighten padding so 'Enable' fits
47914fc ISO toggle: square corners to match the rest of the button family
dba7dcc gear icon: swap Path glyph for U+2699 + bump column to 56px
6c9bee7 fix(wpf): catch participant-left race in ToggleIsoAsync, toast instead of crash
84861da test: integration — App+MainWindow STA smoke, control-surface live VM, theme XAML load
[…11 polish-pass commits from issue #1 below this point]
5a43c9c feat: per-ISO framerate/resolution/aspect/audio overrides + thumbnail BMP
```
## What's installed right now
`C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe` is **0.9.0-rc12** (build
13:34, nominally from commit `54ee578` because `c30a616` hadn't been committed
yet when I published; the rc12 binary nonetheless contains the self-heal source
because the published .exe is built from working-copy sources, not the index).

Shortcuts at:
- `C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Wild Dragon\TeamsISO.lnk`
- `C:\Users\Public\Desktop\TeamsISO.lnk`

Both target `C:\Windows\SysWOW64\runas.exe /trustlevel:0x20000 "C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe"`
and use Show=Minimized so the brief runas console doesn't flicker into view.
## Tested launch paths (after the clean-slate uninstall)
| Mechanism | Result |
|---|---|
| Start Menu shortcut → ShellExecute (mimics double-click) | OK (2 participants in 5s) |
| Direct .exe from non-elevated PowerShell | OK (2 participants in 5s) |
| Direct .exe from elevated PowerShell (de-elevation kicks in) | OK (child runs at medium integrity, 2 participants) |

The self-heal logic doesn't fire on healthy launches (the initial poll already
sees sources). It only kicks in when discovery is stuck at zero.
## Important: 16 TeamsISO.exe duplicates were on disk
The user has the repo synced to both `Documents\Claude\Projects\Teams ISO\`
AND `Nextcloud\Claude\Projects\Teams ISO\`, plus an older
`source\repos\teamsiso-polish\` workspace. Windows Search indexed all of them
and listed ~6 entries when typing "TeamsISO" in Start search, so operators
could click any of them and get either a stale build or the right one.

Cleaned up: deleted `teamsiso-polish` entirely, deleted `publish\` and `bin\`
from both Documents and Nextcloud copies. Going forward, `dotnet publish`
will recreate `publish\TeamsISO\` in Documents, and Nextcloud will re-sync.

To keep Windows Search from ever offering build artifacts again, the user
should exclude these folders from indexing via Settings → Searching Windows
→ Customize search locations:
- `C:\Users\zacga\Documents\Claude\Projects\Teams ISO\publish`
- `C:\Users\zacga\Documents\Claude\Projects\Teams ISO\src\*\bin`
- `C:\Users\zacga\Nextcloud\Claude\Projects\Teams ISO` (entire path —
Nextcloud-sync directories are bad indexing targets in general)
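To check whether stray copies have reappeared (for example after a Nextcloud re-sync), a small script can list every `TeamsISO.exe` under a folder. This is a minimal Python sketch, not a tool that exists in the repo, and the `find_copies` helper is hypothetical.

```python
import os
from pathlib import Path
from typing import Iterator


def find_copies(root: Path, filename: str = "TeamsISO.exe") -> Iterator[Path]:
    """Yield every file under root whose name matches filename, ignoring case."""
    target = filename.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower() == target:
                yield Path(dirpath) / name
```

Calling `sorted(find_copies(Path.home()))` and printing the result gives a list of paths to compare against the single expected install under `C:\Program Files\Wild Dragon\TeamsISO\`.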
## How to launch
```
Start Menu → "Wild Dragon" folder → TeamsISO
```
Or pin that entry to the taskbar.

Do NOT type "TeamsISO" in Start search: even now that the duplicates are
deleted, Nextcloud may re-sync them. The Wild Dragon Start Menu entry is
the only guaranteed-correct path.
## Pre-1.0 cut still gated on
1. Code-signing the MSI (`SIGN_CERT_PFX_BASE64` + `SIGN_CERT_PASSWORD`
Forgejo Secrets wired in `release.yml`).
2. Real-meeting smoke pass on a non-dev host with a live NDI runtime.
## Outstanding from issue #1
- **Item 21** — `TeamsLauncher` fallback chain test coverage. Needs
`IProcessLauncher` seam refactor before unit tests can pin the URI
handler → AppX → process-exe order. Half-day.
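The idea behind the seam, sketched in Python rather than the project's C#: once launching goes through an injectable interface, a fake can record the attempts and a test can pin their order. Everything here is hypothetical (the `ProcessLauncher` protocol, `launch_with_fallback`, `FakeLauncher`, and the target labels). The real `IProcessLauncher` shape is still to be designed.

```python
from typing import List, Optional, Protocol, Sequence, Set


class ProcessLauncher(Protocol):
    """Hypothetical seam: returns True if the launch attempt succeeded."""

    def launch(self, target: str) -> bool: ...


def launch_with_fallback(launcher: ProcessLauncher, targets: Sequence[str]) -> Optional[str]:
    """Try each target in order; return the first that launches, else None."""
    for target in targets:
        if launcher.launch(target):
            return target
    return None


class FakeLauncher:
    """Test double that records every attempt in order."""

    def __init__(self, succeeds_on: Set[str]) -> None:
        self.succeeds_on = succeeds_on
        self.attempts: List[str] = []

    def launch(self, target: str) -> bool:
        self.attempts.append(target)
        return target in self.succeeds_on
```

A unit test would then assert on `FakeLauncher.attempts`, for example that a failing URI handler is followed by the AppX attempt and that the chain stops at the first success.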
## Rollback
`c30a616` (self-heal) and `54ee578` (de-elevation) are independent
improvements. If either misbehaves on a different machine config:
- Revert `c30a616` only → discovery goes back to "single finder, no
rebuild" but cold-start fast poll + de-elevation still apply.
- Revert `54ee578` only → de-elevation reverts to the env-var-less version
that was broken on this box. The runas-wrapped shortcut still works.

`5a43c9c` is the rollback base if all polish/cleanup needs to go.