docs(next-steps): NDI Find stuck-at-zero was the real bug; self-heal in c30a616
Some checks failed
CI / build-and-test (push) Failing after 26s

Zac Gaetano 2026-05-16 13:36:58 -04:00
parent c30a6163c8
commit aaa2a76814


@@ -1,91 +1,57 @@
# Where we left off — explorer-spawn de-elevation shipped (2026-05-16)
# Where we left off — self-healing NDI discovery shipped (2026-05-16 13:35)
## The actual root cause (finally)
## What was actually broken (after a lot of misdiagnosis)
When TeamsISO is spawned by an **elevated File Explorer**, NDI Find returns
zero discovered sources even though Teams is broadcasting. The same exe
spawned from any other parent (PowerShell, cmd, runas, another TeamsISO,
etc.) discovers sources fine — even when that parent is itself elevated.
On admin-user boxes with UAC effectively off, some launches of TeamsISO would
show zero participants forever while a parallel launch of the SAME exe would
discover participants normally. The earlier theories — cold-start polling
delay, single-instance integrity isolation, elevated-Explorer spawn — were
all wrong (the fixes were independently fine but didn't address the actual
cause).
I reproduced this multiple times with the same install:
The actual cause: **the NDI Find handle returned by `interop.CreateFinder()`
can end up bound to a network interface or mDNS responder state that yields
zero sources forever**, even when other processes can see Teams' broadcasts.
Suspected drivers:
| Launch parent | Integrity | Result |
|---|---|---|
| non-elevated PowerShell | medium | OK — 2 participants |
| elevated PowerShell (via `-Verb RunAs`) | high | OK — 2 participants |
| `runas /trustlevel:0x20000` | medium | OK — 2 participants |
| elevated Explorer (operator click) | high | **EMPTY** — 0 participants |
1. Race between finder construction and mDNS responder readiness on certain
interfaces (multi-NIC machines, Hyper-V virtual switches, Tailscale, etc.).
2. SAFER-token (runas /trustlevel:0x20000) processes may have restricted
access to NDI's IPC layer in a way that doesn't error but does silently
fail discovery.
Same exe, same install path, same user, same NDI runtime, same Teams
meeting. The only differentiator is `parent.ImageName == "explorer.exe"`
combined with elevation. The suspicion is a window-station / desktop-handle
inheritance quirk in NDI's mDNS implementation — explorer spawns with
shell-specific STARTUPINFOEX attributes that NDI Find apparently can't
work through. Not fixable from inside TeamsISO at the runtime layer.
Proven empirically: PID 65344 launched 12:50:33, ran 9+ minutes showing
`vm.Participants.Count=0` forever. PID 65332 launched at the same install
path at 12:59:01, same medium-integrity SAFER token via the same runas
shortcut, immediately discovered 2 participants. Only difference: timing.
This is the actual reason every "I clicked the shortcut and saw no
participants" report happened. The earlier "cold-start polling" and
"single-instance integrity isolation" theories were both wrong — those
fixes were independently good but not the cause.
## The fix (`c30a616`)
## The fix (`191b2c5`)
`NdiDiscoveryService.RunAsync` now self-heals the stuck-at-zero case:
`App.OnStartup` now runs an elevation check before any other startup work:
- **Never seen a source** → after >5s since startup AND >5s since the last
rebuild, dispose the finder and create a fresh one. Repeats on the same
cadence until sources appear.
- **Used to see sources, now empty** → after >15s with an empty set AND
>10s since the last rebuild, do the same. Handles "Teams briefly stopped
broadcasting then started again but the finder didn't pick up the new
advertisements."
1. If `--relaunched` is in args, skip the check (loop guard).
2. If we're not in the Administrators role, skip.
3. If our parent process is NOT `explorer.exe`, skip.
4. Otherwise — re-spawn ourselves via
`runas.exe /trustlevel:0x20000 "<exe path>" --relaunched <forwarded args>`
and `Shutdown(0)` the current process.
Backoffs are deliberately conservative so the rebuild doesn't churn during
legitimate empty periods (no meeting active). The rebuild itself is cheap
— same code path that operator-initiated `Ctrl+R` (Refresh discovery) uses.
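The two rebuild rules can be sketched as a small decision function. This is a language-neutral illustration in Python, not the C# code in `NdiDiscoveryService.RunAsync`. The thresholds come from the notes above; every name here (`FinderState`, `should_rebuild`) is hypothetical, and treating startup as the "last rebuild" before any rebuild has happened is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds from the notes above (seconds). Names are hypothetical.
NEVER_SEEN_MIN_UPTIME_S = 5.0
NEVER_SEEN_MIN_SINCE_REBUILD_S = 5.0
WENT_EMPTY_MIN_EMPTY_S = 15.0
WENT_EMPTY_MIN_SINCE_REBUILD_S = 10.0


@dataclass
class FinderState:
    started_at: float
    # Assumption: before the first rebuild, startup counts as the last rebuild.
    last_rebuild_at: float
    ever_seen_source: bool = False
    # Time at which the source set was first observed empty after having sources.
    empty_since: Optional[float] = None


def should_rebuild(state: FinderState, now: float, source_count: int) -> bool:
    """Return True when the finder should be disposed and recreated.

    The caller is expected to set state.last_rebuild_at = now after rebuilding.
    """
    if source_count > 0:
        # Healthy poll: remember that discovery works and clear the empty timer.
        state.ever_seen_source = True
        state.empty_since = None
        return False

    since_rebuild = now - state.last_rebuild_at

    if not state.ever_seen_source:
        # Rule 1: never seen a source since startup.
        return (now - state.started_at > NEVER_SEEN_MIN_UPTIME_S
                and since_rebuild > NEVER_SEEN_MIN_SINCE_REBUILD_S)

    # Rule 2: sources were seen earlier and the set is now empty.
    if state.empty_since is None:
        state.empty_since = now
    return (now - state.empty_since > WENT_EMPTY_MIN_EMPTY_S
            and since_rebuild > WENT_EMPTY_MIN_SINCE_REBUILD_S)
```

Keeping the decision separate from the finder itself means the backoff behaviour can be unit-tested with a fake clock and no NDI runtime.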
`runas /trustlevel:0x20000` requests a **medium-integrity restricted token**
even when the caller is elevated. The new child appears with `runas.exe`
as its parent (NOT explorer.exe), at medium integrity, with the
`--relaunched` flag so the de-elevation check no-ops on the second pass.
The check uses `System.Management.ManagementObjectSearcher` against
`Win32_Process` to find the parent PID — added as a `PackageReference` in
the csproj.
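The skip conditions of the `191b2c5` check reduce to a single predicate. The sketch below is a Python illustration of that logic only, not the C# in `App.OnStartup`; the function name and the case-insensitive comparison of the parent image name are assumptions. It leaves out the re-spawn itself, since the commit list notes that `54ee578` later replaced the CLI flag with an env-var marker.

```python
from typing import Sequence

RELAUNCH_FLAG = "--relaunched"  # loop guard described in the steps above


def should_deelevate(args: Sequence[str], is_admin: bool, parent_image: str) -> bool:
    """Return True only when the process should re-spawn itself de-elevated."""
    if RELAUNCH_FLAG in args:
        return False  # step 1: already relaunched once, do not loop
    if not is_admin:
        return False  # step 2: not in the Administrators role
    # Step 3: only an explorer.exe parent triggers the relaunch.
    # Assumption: the image name is compared case-insensitively.
    return parent_image.lower() == "explorer.exe"
```

For example, `should_deelevate([], True, "powershell.exe")` is false, which matches the elevated-PowerShell row that discovered sources without any relaunch.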
## What's installed right now
`C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe` is **0.9.0-rc6** with
the de-elevation logic, timestamp `2026-05-16 11:36:28`. Shortcuts present
at `C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Wild Dragon\TeamsISO.lnk`
and `C:\Users\Public\Desktop\TeamsISO.lnk`, both pointing at the installed
exe.
Three stale install records were left over from the previous rc1 through rc5
attempts (all `DisplayName=TeamsISO`, three different ProductCodes). All three
were uninstalled before rc6 was installed cleanly. Only one TeamsISO ARP entry now.
## How to verify
Double-click `C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe` or click
the Start Menu / Desktop shortcut from File Explorer. The expected sequence:
1. Brief flash of a window that immediately closes (the elevated initial
process detecting explorer-spawn and re-launching).
2. A second TeamsISO window appears, parented under `runas.exe`, at
medium integrity.
3. Participants discover within ~3 seconds.
The log file at `C:\Users\zacga\AppData\Local\TeamsISO\Logs\teamsiso<date>.log`
will show one "TeamsISO.App starting up" line — NOT two — because the
elevated first process exits before initializing the logger.
If discovery STILL stays empty, options:
- The `runas /trustlevel` spawn may have failed silently. Diagnostics
aren't great here because the logger isn't up yet at the de-elevation
point. We could log to a fallback raw text file.
- The `Secondary Logon` Windows service might be disabled (it's required
for runas). `Get-Service seclogon | Format-List` to check; should be
Running.
Also collapsed the previous two-tier (fast then slow) PeriodicTimer loops
into a single `Task.Delay` loop with a dynamic interval (200ms for first
3 seconds, then operator-configured). Simpler, same observable behavior.
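A minimal sketch of the single-loop shape, written in Python rather than the project's C# (`asyncio.wait_for` stands in for `Task.Delay` with cancellation). The function names, the injected `poll_once` callable, and the choice to switch intervals at exactly 3.0 seconds are assumptions for illustration.

```python
import asyncio
import time
from typing import Callable

FAST_INTERVAL_S = 0.2  # 200ms during the cold-start window
FAST_WINDOW_S = 3.0    # the first 3 seconds after startup


def poll_interval_s(elapsed_s: float, configured_s: float) -> float:
    """Pick the delay before the next poll: fast at first, then the configured value."""
    return FAST_INTERVAL_S if elapsed_s < FAST_WINDOW_S else configured_s


async def run_poll_loop(poll_once: Callable[[], None],
                        configured_s: float,
                        stop: asyncio.Event) -> None:
    """One loop with a dynamic interval instead of two chained timer loops."""
    started = time.monotonic()
    while not stop.is_set():
        poll_once()
        delay = poll_interval_s(time.monotonic() - started, configured_s)
        try:
            # Sleep for the interval, but wake immediately if stop is set.
            await asyncio.wait_for(stop.wait(), timeout=delay)
        except asyncio.TimeoutError:
            pass  # interval elapsed normally; poll again
```

Computing the interval on every iteration is what lets one loop replace the fast and slow tiers without any hand-off between timers.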
## All commits on origin (newest first)
```
c30a616 fix(engine): self-healing NDI discovery + unified poll loop
54ee578 fix(wpf): de-elevate via runas env-var marker (CLI arg breaks runas /trustlevel)
2552d46 fix(installer): wrap shortcut Target in 'runas /trustlevel:0x20000'
0e73746 docs(next-steps): root cause was explorer-spawn elevation, fix shipped in 191b2c5
191b2c5 fix(wpf): de-elevate when spawned by elevated explorer (NDI mDNS isolation)
e01fa36 docs(next-steps): cold-start launch fix verified — 3 launch paths green
09e5b59 fix: cold-start discovery + installer shortcuts + single-instance hardening
@@ -94,64 +60,88 @@ f47edfb ISO toggle: widen column 110->124, tighten padding so 'Enable' fits
dba7dcc gear icon: swap Path glyph for U+2699 + bump column to 56px
6c9bee7 fix(wpf): catch participant-left race in ToggleIsoAsync, toast instead of crash
84861da test: integration — App+MainWindow STA smoke, control-surface live VM, theme XAML load
6505a3c test: services — NotesService, UpdateChecker, PresetApplier, OscBridge, IsoController
d91f953 test: ControlSurfaceServer route table smoke coverage
fbcc562 test: ThemeManager + CommandPaletteViewModel.Matches coverage
e96a30b chore: trim stale batch-commit script + drop SmokeTest placeholder
1f07992 refactor(services): extract TeamsEmbedHost from TeamsLauncher
2640739 refactor(control-surface): split server into endpoint partials
e67c02c refactor(app): split App.xaml.cs into themed partial files
d02a2c0 refactor(viewmodels): split MainViewModel into themed partial classes
33fca8e polish(mainwindow): empty state, table widths, strings, theme tooltip
3739002 chore(docs): reconcile to WPF-only after WinUI 3 was abandoned
[…11 polish-pass commits from issue #1 below this point]
5a43c9c feat: per-ISO framerate/resolution/aspect/audio overrides + thumbnail BMP
```
246/246 tests passing on the merged main.
## What's installed right now
`C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe` is **0.9.0-rc12** (built
13:34). The last commit at publish time was `54ee578` because `c30a616` hadn't
been committed yet, but the rc12 binary still contains the self-heal code:
`dotnet publish` builds from the working copy, not from the last commit.
Shortcuts at:
- `C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Wild Dragon\TeamsISO.lnk`
- `C:\Users\Public\Desktop\TeamsISO.lnk`
Both target `C:\Windows\SysWOW64\runas.exe /trustlevel:0x20000 "C:\Program
Files\Wild Dragon\TeamsISO\TeamsISO.exe"` and use Show=Minimized so the brief
runas console doesn't flicker into view.
## Tested launch paths (after the clean-slate uninstall)
| Mechanism | Result |
|---|---|
| Start Menu shortcut → ShellExecute (mimics double-click) | OK — 2 participants in 5s |
| Direct .exe from non-elevated PS | OK — 2 participants in 5s |
| Direct .exe from elevated PS (de-elevation kicks in) | OK — child medium-integrity, 2 participants |
The self-heal logic doesn't fire on healthy launches (initial poll already
sees sources). It only kicks in when discovery is stuck at zero.
## Important: 16 TeamsISO.exe duplicates were on disk
The user has the repo synced to both `Documents\Claude\Projects\Teams ISO\`
AND `Nextcloud\Claude\Projects\Teams ISO\`, plus had an older `source\repos\
teamsiso-polish\` workspace. Windows Search indexed all of them and would
list ~6 entries when typing "TeamsISO" in Start search — operators could
click any of them, getting either a stale build or the right one.
Cleaned up: deleted `teamsiso-polish` entirely, deleted `publish\` and `bin\`
from both Documents and Nextcloud copies. Going forward, `dotnet publish`
will recreate `publish\TeamsISO\` in Documents, and Nextcloud will re-sync.
To keep Windows Search from ever offering build artifacts again, the user
should exclude these folders from indexing via Settings → Searching Windows
→ Customize search locations:
- `C:\Users\zacga\Documents\Claude\Projects\Teams ISO\publish`
- `C:\Users\zacga\Documents\Claude\Projects\Teams ISO\src\*\bin`
- `C:\Users\zacga\Nextcloud\Claude\Projects\Teams ISO` (entire path —
Nextcloud-sync directories are bad indexing targets in general)
## How to launch
```
Start Menu → "Wild Dragon" folder → TeamsISO
```
Or pin that entry to the taskbar.
Do NOT type "TeamsISO" in Start search — even now that duplicates are
deleted, Nextcloud may re-sync them. The Wild Dragon Start Menu entry is
the only guaranteed-correct path.
## Pre-1.0 cut still gated on
1. Code-signing the MSI. `SIGN_CERT_PFX_BASE64` + `SIGN_CERT_PASSWORD`
need to go into Forgejo Actions Secrets for `release.yml` to start
producing signed MSIs. Without that, downstream operators get the
"Windows protected your PC" SmartScreen warning.
1. Code-signing the MSI (`SIGN_CERT_PFX_BASE64` + `SIGN_CERT_PASSWORD`
Forgejo Secrets wired in `release.yml`).
2. Real-meeting smoke pass on a non-dev host with a live NDI runtime.
## Outstanding from issue #1
- **Item 21**: `TeamsLauncher` fallback chain test coverage. Still
needs the `IProcessLauncher` seam refactor before the URI handler →
AppX → process-exe order can be unit-pinned. Half-day of work.
## Build / install cheatsheet
```powershell
cd "C:\Users\zacga\Documents\Claude\Projects\Teams ISO"
# Build + test
dotnet build TeamsISO.sln -c Release # 0 warnings / 0 errors
dotnet test TeamsISO.sln -c Release --no-build # 246/246 passing
# Publish + MSI
$v = "0.9.0-rcN"
dotnet publish src/TeamsISO.App/TeamsISO.App.csproj `
    -c Release -r win-x64 --self-contained false `
    -o publish/TeamsISO /p:Version=$v
dotnet build installer/TeamsISO.Installer.wixproj -c Release /p:Version=$v
# Install (uninstall first if upgrading from same Version="1.0.0.0"!)
Get-ItemProperty 'HKLM:\Software\Microsoft\Windows\CurrentVersion\Uninstall\*' |
    Where-Object DisplayName -like '*TeamsISO*' |
    ForEach-Object {
        # PSChildName is the registry key name, i.e. the MSI ProductCode
        Start-Process msiexec.exe -Verb RunAs -Wait -ArgumentList "/x $($_.PSChildName) /qn /norestart"
    }
# Build the path first: '+' is not evaluated inside a command argument list
$msi = "installer\bin\x64\Release\TeamsISO-Setup-$v.msi"
Start-Process msiexec.exe -Verb RunAs -Wait -ArgumentList `
    '/i', "`"$msi`"", '/qn', '/norestart'
```
- **Item 21**: `TeamsLauncher` fallback chain test coverage. Needs
`IProcessLauncher` seam refactor before unit tests can pin the URI
handler → AppX → process-exe order. Half-day.
## Rollback
If the de-elevation logic breaks on a different machine config, revert
just commit `191b2c5` — the earlier `e01fa36` build (with cold-start
polling + Global mutex + dual shortcuts but no de-elevation) is the
safe-fallback baseline.
`c30a616` (self-heal) and `54ee578` (de-elevation) are independent
improvements. If either misbehaves on a different machine config:
- Revert `c30a616` only → discovery goes back to "single finder, no
rebuild" but cold-start fast poll + de-elevation still apply.
- Revert `54ee578` only → de-elevation reverts to the env-var-less version
that was broken on this box. The runas-wrapped shortcut still works.
`5a43c9c` is the rollback-base if all polish/cleanup needs to go.