docs(next-steps): NDI Find stuck-at-zero was the real bug; self-heal in c30a616
Some checks failed
CI / build-and-test (push) Failing after 26s

Zac Gaetano 2026-05-16 13:36:58 -04:00
parent c30a6163c8
commit aaa2a76814


@@ -1,91 +1,57 @@
# Where we left off — explorer-spawn de-elevation shipped (2026-05-16) # Where we left off — self-healing NDI discovery shipped (2026-05-16 13:35)
## The actual root cause (finally) ## What actually was broken (after a lot of misdiagnosis)
When TeamsISO is spawned by an **elevated File Explorer**, NDI Find returns On admin-user boxes with UAC effectively off, some launches of TeamsISO would
zero discovered sources even though Teams is broadcasting. The same exe show zero participants forever while a parallel launch of the SAME exe would
spawned from any other parent (PowerShell, cmd, runas, another TeamsISO, discover participants normally. The earlier theories — cold-start polling
etc.) discovers sources fine — even when that parent is itself elevated. delay, single-instance integrity isolation, elevated-Explorer spawn — were
all wrong (the fixes were independently fine but didn't address the actual
cause).
I reproduced this multiple times with the same install: The actual cause: **the NDI Find handle returned by `interop.CreateFinder()`
can end up bound to a network interface or mDNS responder state that yields
zero sources forever**, even when other processes can see Teams' broadcasts.
Suspected drivers:
| Launch parent | Integrity | Result | 1. Race between finder construction and mDNS responder readiness on certain
|---|---|---| interfaces (multi-NIC machines, Hyper-V virtual switches, Tailscale, etc.).
| non-elevated PowerShell | medium | OK — 2 participants | 2. SAFER-token (runas /trustlevel:0x20000) processes may have restricted
| elevated PowerShell (via `-Verb RunAs`) | high | OK — 2 participants | access to NDI's IPC layer in a way that doesn't error but does silently
| `runas /trustlevel:0x20000` | medium | OK — 2 participants | fail discovery.
| elevated Explorer (operator click) | high | **EMPTY** — 0 participants |
Same exe, same install path, same user, same NDI runtime, same Teams Proven empirically: PID 65344 launched 12:50:33, ran 9+ minutes showing
meeting. The only differentiator is `parent.ImageName == "explorer.exe"` `vm.Participants.Count=0` forever. PID 65332 launched at the same install
combined with elevation. The suspicion is a window-station / desktop-handle path at 12:59:01, same medium-integrity SAFER token via the same runas
inheritance quirk in NDI's mDNS implementation — explorer spawns with shortcut, immediately discovered 2 participants. Only difference: timing.
shell-specific STARTUPINFOEX attributes that NDI Find apparently can't
work through. Not fixable from inside TeamsISO at the runtime layer.
This is the actual reason every "I clicked the shortcut and saw no ## The fix (`c30a616`)
participants" report happened. The earlier "cold-start polling" and
"single-instance integrity isolation" theories were both wrong — those
fixes were independently good but not the cause.
## The fix (`191b2c5`) `NdiDiscoveryService.RunAsync` now self-heals the stuck-at-zero case:
`App.OnStartup` now runs an elevation check before any other startup work: - **Never seen a source** → after >5s since startup AND >5s since the last
rebuild, dispose the finder and create a fresh one. Repeats on the same
cadence until sources appear.
- **Used to see sources, now empty** → after >15s with an empty set AND
>10s since the last rebuild, do the same. Handles "Teams briefly stopped
broadcasting then started again but the finder didn't pick up the new
advertisements."
1. If `--relaunched` is in args, skip the check (loop guard). Backoffs are deliberately conservative so the rebuild doesn't churn during
2. If we're not in the Administrators role, skip. legitimate empty periods (no meeting active). The rebuild itself is cheap
3. If our parent process is NOT `explorer.exe`, skip. — same code path that operator-initiated `Ctrl+R` (Refresh discovery) uses.
4. Otherwise — re-spawn ourselves via
`runas.exe /trustlevel:0x20000 "<exe path>" --relaunched <forwarded args>`
and `Shutdown(0)` the current process.
`runas /trustlevel:0x20000` requests a **medium-integrity restricted token** Also collapsed the previous two-tier (fast then slow) PeriodicTimer loops
even when the caller is elevated. The new child appears with `runas.exe` into a single `Task.Delay` loop with a dynamic interval (200ms for first
as its parent (NOT explorer.exe), at medium integrity, with the 3 seconds, then operator-configured). Simpler, same observable behavior.
`--relaunched` flag so the de-elevation check no-ops on the second pass.
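
The self-heal thresholds and the dynamic poll interval described for `c30a616` can be sketched as two small pure functions. This is a PowerShell illustration only, not the C# in `NdiDiscoveryService.RunAsync`: the function and parameter names are hypothetical, and it assumes the caller invokes the check only while the source set is currently empty. Only the numeric thresholds come from the notes above.

```powershell
# Hypothetical helper: should the NDI finder be disposed and recreated?
# Call only while the discovered-source set is empty.
function Test-FinderRebuildDue {
    param(
        [bool]     $EverSawSource,     # has any source been discovered since startup?
        [timespan] $SinceStartup,      # time since the discovery loop started
        [timespan] $SinceLastRebuild,  # time since the finder was last recreated
        [timespan] $EmptyFor           # how long the source set has been empty
    )
    if (-not $EverSawSource) {
        # Never seen a source: >5s since startup AND >5s since the last rebuild.
        return (($SinceStartup.TotalSeconds -gt 5) -and ($SinceLastRebuild.TotalSeconds -gt 5))
    }
    # Used to see sources, now empty: >15s empty AND >10s since the last rebuild.
    return (($EmptyFor.TotalSeconds -gt 15) -and ($SinceLastRebuild.TotalSeconds -gt 10))
}

# Hypothetical helper: 200 ms polling for the first 3 seconds, then the
# operator-configured interval. Treating exactly 3 s as "slow" is an assumption.
function Get-PollInterval {
    param(
        [timespan] $SinceStartup,
        [timespan] $Configured
    )
    if ($SinceStartup.TotalSeconds -lt 3) {
        return [timespan]::FromMilliseconds(200)
    }
    return $Configured
}
```

Keeping the decision separate from the loop means the backoffs can be checked without a live NDI runtime; the loop itself only has to feed in elapsed times.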
The check uses `System.Management.ManagementObjectSearcher` against
`Win32_Process` to find the parent PID — added as a `PackageReference` in
the csproj.
## What's installed right now
`C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe`: **0.9.0-rc6** with
the de-elevation logic, timestamp `2026-05-16 11:36:28`. Shortcuts present
at `C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Wild Dragon\TeamsISO.lnk`
and `C:\Users\Public\Desktop\TeamsISO.lnk`, both pointing at the installed
exe.
Three stale install records were left over from previous rc1–rc5 attempts
(all `DisplayName=TeamsISO`, three different ProductCodes). All three were
uninstalled before -rc6 went on cleanly. Only one TeamsISO ARP entry now.
## How to verify
Double-click `C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe` or click
the Start Menu / Desktop shortcut from File Explorer. The expected sequence:
1. Brief flash of a window that immediately closes (the elevated initial
process detecting explorer-spawn and re-launching).
2. A second TeamsISO window appears, parented under `runas.exe`, at
medium integrity.
3. Participants discover within ~3 seconds.
The log file at `C:\Users\zacga\AppData\Local\TeamsISO\Logs\teamsiso<date>.log`
will show one "TeamsISO.App starting up" line — NOT two — because the
elevated first process exits before initializing the logger.
If discovery STILL stays empty, options:
- The `runas /trustlevel` spawn may have failed silently. Diagnostics
aren't great here because the logger isn't up yet at the de-elevation
point. We could log to a fallback raw text file.
- The `Secondary Logon` Windows service might be disabled (it's required
for runas). `Get-Service seclogon | Format-List` to check; should be
Running.
## All commits on origin (newest first)
```
c30a616 fix(engine): self-healing NDI discovery + unified poll loop
54ee578 fix(wpf): de-elevate via runas env-var marker (CLI arg breaks runas /trustlevel)
2552d46 fix(installer): wrap shortcut Target in 'runas /trustlevel:0x20000'
0e73746 docs(next-steps): root cause was explorer-spawn elevation, fix shipped in 191b2c5
191b2c5 fix(wpf): de-elevate when spawned by elevated explorer (NDI mDNS isolation)
e01fa36 docs(next-steps): cold-start launch fix verified — 3 launch paths green
09e5b59 fix: cold-start discovery + installer shortcuts + single-instance hardening
@@ -94,64 +60,88 @@ f47edfb ISO toggle: widen column 110->124, tighten padding so 'Enable' fits
dba7dcc gear icon: swap Path glyph for U+2699 + bump column to 56px
6c9bee7 fix(wpf): catch participant-left race in ToggleIsoAsync, toast instead of crash
84861da test: integration — App+MainWindow STA smoke, control-surface live VM, theme XAML load
6505a3c test: services — NotesService, UpdateChecker, PresetApplier, OscBridge, IsoController […11 polish-pass commits from issue #1 below this point]
d91f953 test: ControlSurfaceServer route table smoke coverage
fbcc562 test: ThemeManager + CommandPaletteViewModel.Matches coverage
e96a30b chore: trim stale batch-commit script + drop SmokeTest placeholder
1f07992 refactor(services): extract TeamsEmbedHost from TeamsLauncher
2640739 refactor(control-surface): split server into endpoint partials
e67c02c refactor(app): split App.xaml.cs into themed partial files
d02a2c0 refactor(viewmodels): split MainViewModel into themed partial classes
33fca8e polish(mainwindow): empty state, table widths, strings, theme tooltip
3739002 chore(docs): reconcile to WPF-only after WinUI 3 was abandoned
5a43c9c feat: per-ISO framerate/resolution/aspect/audio overrides + thumbnail BMP
```
246/246 tests passing on the merged main. ## What's installed right now
`C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe`: **0.9.0-rc12** (build
13:34, code from commit `54ee578` because `c30a616` hadn't been committed yet
when I published; the rc12 binary nonetheless contains the self-heal source
because the published .exe is built from working-copy sources, not the index).
Shortcuts at:
- `C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Wild Dragon\TeamsISO.lnk`
- `C:\Users\Public\Desktop\TeamsISO.lnk`
Both target `C:\Windows\SysWOW64\runas.exe /trustlevel:0x20000 "C:\Program
Files\Wild Dragon\TeamsISO\TeamsISO.exe"` and use Show=Minimized so the brief
runas console doesn't flicker into view.
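
To confirm that both shortcuts really carry the runas wrapper and the minimized window setting, the `.lnk` files can be read back through the `WScript.Shell` COM object. This is a Windows-only sketch; `Get-ShortcutInfo` is a hypothetical helper name, and the two paths are the ones listed above.

```powershell
# Hypothetical helper: report target, arguments and window style of .lnk files.
function Get-ShortcutInfo {
    param([string[]] $Path)
    $shell = New-Object -ComObject WScript.Shell
    foreach ($p in $Path) {
        if (-not (Test-Path -LiteralPath $p)) {
            Write-Warning "Missing shortcut: $p"
            continue
        }
        $lnk = $shell.CreateShortcut($p)   # opens the existing .lnk for reading
        [pscustomobject]@{
            Shortcut    = $p
            TargetPath  = $lnk.TargetPath
            Arguments   = $lnk.Arguments
            WindowStyle = $lnk.WindowStyle  # 1 = normal, 3 = maximized, 7 = minimized
        }
    }
}

Get-ShortcutInfo -Path @(
    'C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Wild Dragon\TeamsISO.lnk',
    'C:\Users\Public\Desktop\TeamsISO.lnk'
) | Format-List
```

For the shortcuts described above, `TargetPath` should be the `runas.exe` path, `Arguments` should contain `/trustlevel:0x20000` plus the quoted install path, and `WindowStyle` should be 7.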
## Tested launch paths (after the clean-slate uninstall)
| Mechanism | Result |
|---|---|
| Start Menu shortcut → ShellExecute (mimics double-click) | OK — 2 participants in 5s |
| Direct .exe from non-elevated PS | OK — 2 participants in 5s |
| Direct .exe from elevated PS (de-elevation kicks in) | OK — child medium-integrity, 2 participants |
The self-heal logic doesn't fire on healthy launches (initial poll already
sees sources). It only kicks in when discovery is stuck at zero.
## Important: 16 TeamsISO.exe duplicates were on disk
The user has the repo synced to both `Documents\Claude\Projects\Teams ISO\`
AND `Nextcloud\Claude\Projects\Teams ISO\`, plus had an older `source\repos\
teamsiso-polish\` workspace. Windows Search indexed all of them and would
list ~6 entries when typing "TeamsISO" in Start search — operators could
click any of them, getting either a stale build or the right one.
Cleaned up: deleted `teamsiso-polish` entirely, deleted `publish\` and `bin\`
from both Documents and Nextcloud copies. Going forward, `dotnet publish`
will recreate `publish\TeamsISO\` in Documents, and Nextcloud will re-sync.
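
Because Nextcloud can re-sync build output, it may be worth re-checking for stray copies from time to time. The sketch below lists every `TeamsISO.exe` under a set of roots; `Find-TeamsIsoCopies` is a hypothetical helper name, and the root list is taken from the paths mentioned in these notes, so adjust it if the repo lives elsewhere.

```powershell
# Hypothetical helper: list every TeamsISO.exe under the given root folders.
function Find-TeamsIsoCopies {
    param([string[]] $Root)
    foreach ($r in $Root) {
        if (-not (Test-Path -LiteralPath $r)) { continue }   # skip roots that don't exist
        Get-ChildItem -LiteralPath $r -Filter 'TeamsISO.exe' -Recurse -File -ErrorAction SilentlyContinue |
            Select-Object FullName, LastWriteTime, Length
    }
}

Find-TeamsIsoCopies -Root @(
    'C:\Program Files\Wild Dragon\TeamsISO',
    'C:\Users\zacga\Documents\Claude\Projects\Teams ISO',
    'C:\Users\zacga\Nextcloud\Claude\Projects\Teams ISO'
) | Sort-Object LastWriteTime -Descending | Format-Table -AutoSize
```

Anything listed outside `C:\Program Files\Wild Dragon\TeamsISO` is a build artifact that Start search could surface.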
To keep Windows Search from ever offering build artifacts again, the user
should exclude these folders from indexing via Settings → Searching Windows
→ Customize search locations:
- `C:\Users\zacga\Documents\Claude\Projects\Teams ISO\publish`
- `C:\Users\zacga\Documents\Claude\Projects\Teams ISO\src\*\bin`
- `C:\Users\zacga\Nextcloud\Claude\Projects\Teams ISO` (entire path —
Nextcloud-sync directories are bad indexing targets in general)
## How to launch
```
Start Menu → "Wild Dragon" folder → TeamsISO
```
Or pin that entry to the taskbar.
Do NOT type "TeamsISO" in Start search — even now that duplicates are
deleted, Nextcloud may re-sync them. The Wild Dragon Start Menu entry is
the only guaranteed-correct path.
## Pre-1.0 cut still gated on
1. Code-signing the MSI. `SIGN_CERT_PFX_BASE64` + `SIGN_CERT_PASSWORD` 1. Code-signing the MSI (`SIGN_CERT_PFX_BASE64` + `SIGN_CERT_PASSWORD`
need to go into Forgejo Actions Secrets for `release.yml` to start Forgejo Secrets wired in `release.yml`).
producing signed MSIs. Without that, downstream operators get the
"Windows protected your PC" SmartScreen warning.
2. Real-meeting smoke pass on a non-dev host with a live NDI runtime.
## Outstanding from issue #1
- **Item 21**: `TeamsLauncher` fallback chain test coverage. Still - **Item 21**: `TeamsLauncher` fallback chain test coverage. Needs
needs the `IProcessLauncher` seam refactor before the URI handler → `IProcessLauncher` seam refactor before unit tests can pin the URI
AppX → process-exe order can be unit-pinned. Half-day of work. handler → AppX → process-exe order. Half-day.
## Build / install cheatsheet
```powershell
cd "C:\Users\zacga\Documents\Claude\Projects\Teams ISO"
# Build + test
dotnet build TeamsISO.sln -c Release # 0 warnings / 0 errors
dotnet test TeamsISO.sln -c Release --no-build # 246/246 passing
# Publish + MSI
$v = "0.9.0-rcN"
dotnet publish src/TeamsISO.App/TeamsISO.App.csproj `
-c Release -r win-x64 --self-contained false `
-o publish/TeamsISO /p:Version=$v
dotnet build installer/TeamsISO.Installer.wixproj -c Release /p:Version=$v
# Install (uninstall first if upgrading from same Version="1.0.0.0"!)
Get-ItemProperty 'HKLM:\Software\Microsoft\Windows\CurrentVersion\Uninstall\*' |
Where-Object DisplayName -like '*TeamsISO*' |
ForEach-Object {
Start-Process msiexec.exe -Verb RunAs -Wait -ArgumentList "/x $($_.PSChildName) /qn /norestart"
}
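# Resolve to an absolute path so the elevated msiexec does not depend on the working directory
$msi = (Resolve-Path "installer\bin\x64\Release\TeamsISO-Setup-$v.msi").Path
Start-Process msiexec.exe -Verb RunAs -Wait -ArgumentList `
'/i', "`"$msi`"", '/qn', '/norestart'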
```
## Rollback
If the de-elevation logic breaks on a different machine config, revert `c30a616` (self-heal) and `54ee578` (de-elevation) are independent
just commit `191b2c5` — the earlier `e01fa36` build (with cold-start improvements. If either misbehaves on a different machine config:
polling + Global mutex + dual shortcuts but no de-elevation) is the
safe-fallback baseline. - Revert `c30a616` only → discovery goes back to "single finder, no
rebuild" but cold-start fast poll + de-elevation still apply.
- Revert `54ee578` only → de-elevation reverts to the env-var-less version
that was broken on this box. The runas-wrapped shortcut still works.
`5a43c9c` is the rollback-base if all polish/cleanup needs to go.