docs(next-steps): NDI Find stuck-at-zero was the real bug; self-heal in c30a616
Some checks failed
CI / build-and-test (push) Failing after 26s
parent c30a6163c8
commit aaa2a76814
1 changed files with 112 additions and 122 deletions
234
NEXT_STEPS.md
@@ -1,91 +1,57 @@
# Where we left off — explorer-spawn de-elevation shipped (2026-05-16)
# Where we left off — self-healing NDI discovery shipped (2026-05-16 13:35)

## The actual root cause (finally)
## What actually was broken (after a lot of misdiagnosis)

When TeamsISO is spawned by an **elevated File Explorer**, NDI Find returns
zero discovered sources even though Teams is broadcasting. The same exe
spawned from any other parent (PowerShell, cmd, runas, another TeamsISO,
etc.) discovers sources fine — even when that parent is itself elevated.
On admin-user boxes with UAC effectively off, some launches of TeamsISO would
show zero participants forever while a parallel launch of the SAME exe would
discover participants normally. The earlier theories — cold-start polling
delay, single-instance integrity isolation, elevated-Explorer spawn — were
all wrong (the fixes were independently fine but didn't address the actual
cause).

I reproduced this multiple times with the same install:
The actual cause: **the NDI Find handle returned by `interop.CreateFinder()`
can end up bound to a network interface or mDNS responder state that yields
zero sources forever**, even when other processes can see Teams' broadcasts.
Suspected drivers:

| Launch parent | Integrity | Result |
|---|---|---|
| non-elevated PowerShell | medium | OK — 2 participants |
| elevated PowerShell (via `-Verb RunAs`) | high | OK — 2 participants |
| `runas /trustlevel:0x20000` | medium | OK — 2 participants |
| elevated Explorer (operator click) | high | **EMPTY** — 0 participants |
1. Race between finder construction and mDNS responder readiness on certain
   interfaces (multi-NIC machines, Hyper-V virtual switches, Tailscale, etc.).
2. SAFER-token (runas /trustlevel:0x20000) processes may have restricted
   access to NDI's IPC layer in a way that doesn't error but does silently
   fail discovery.

Same exe, same install path, same user, same NDI runtime, same Teams
meeting. The only differentiator is `parent.ImageName == "explorer.exe"`
combined with elevation. The suspicion is a window-station / desktop-handle
inheritance quirk in NDI's mDNS implementation — explorer spawns with
shell-specific STARTUPINFOEX attributes that NDI Find apparently can't
work through. Not fixable from inside TeamsISO at the runtime layer.
Proven empirically: PID 65344 launched 12:50:33, ran 9+ minutes showing
`vm.Participants.Count=0` forever. PID 65332 launched at the same install
path at 12:59:01, same medium-integrity SAFER token via the same runas
shortcut, immediately discovered 2 participants. Only difference: timing.

This is the actual reason every "I clicked the shortcut and saw no
participants" report happened. The earlier "cold-start polling" and
"single-instance integrity isolation" theories were both wrong — those
fixes were independently good but not the cause.
## The fix (`c30a616`)

## The fix (`191b2c5`)
`NdiDiscoveryService.RunAsync` now self-heals the stuck-at-zero case:

`App.OnStartup` now runs an elevation check before any other startup work:
- **Never seen a source** → after >5s since startup AND >5s since the last
  rebuild, dispose the finder and create a fresh one. Repeats on the same
  cadence until sources appear.
- **Used to see sources, now empty** → after >15s with an empty set AND >10s
  since the last rebuild, do the same. Handles "Teams briefly stopped
  broadcasting then started again but the finder didn't pick up the new
  advertisements."

1. If `--relaunched` is in args, skip the check (loop guard).
2. If we're not in the Administrators role, skip.
3. If our parent process is NOT `explorer.exe`, skip.
4. Otherwise — re-spawn ourselves via
   `runas.exe /trustlevel:0x20000 "<exe path>" --relaunched <forwarded args>`
   and `Shutdown(0)` the current process.
Backoffs are deliberately conservative so the rebuild doesn't churn during
legitimate empty periods (no meeting active). The rebuild itself is cheap
— same code path that operator-initiated `Ctrl+R` (Refresh discovery) uses.
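
The two self-heal rebuild conditions described above can be sketched as a single predicate. This is a language-neutral illustration in Python, not the C# inside `NdiDiscoveryService`: the `FinderState` fields, the constant names, and `should_rebuild` are all hypothetical, and only the 5s/5s and 15s/10s thresholds come from these notes.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds (seconds) taken from the notes; the names are hypothetical.
NEVER_SEEN_MIN_UPTIME = 5.0
NEVER_SEEN_MIN_SINCE_REBUILD = 5.0
WENT_EMPTY_MIN_EMPTY = 15.0
WENT_EMPTY_MIN_SINCE_REBUILD = 10.0


@dataclass
class FinderState:
    started_at: float             # when discovery started
    last_rebuild_at: float        # when the finder was last created or rebuilt
    ever_seen_source: bool        # True once any source has been discovered
    empty_since: Optional[float]  # when the source set became empty; None if non-empty


def should_rebuild(state: FinderState, now: float) -> bool:
    """Return True when the finder should be disposed and recreated."""
    if state.empty_since is None:
        return False  # sources are visible, nothing to heal
    since_rebuild = now - state.last_rebuild_at
    if not state.ever_seen_source:
        # Case 1: never seen a source since startup.
        return (now - state.started_at > NEVER_SEEN_MIN_UPTIME
                and since_rebuild > NEVER_SEEN_MIN_SINCE_REBUILD)
    # Case 2: used to see sources, now empty.
    return (now - state.empty_since > WENT_EMPTY_MIN_EMPTY
            and since_rebuild > WENT_EMPTY_MIN_SINCE_REBUILD)
```

In this sketch the caller is assumed to set `last_rebuild_at` to the current time after each rebuild, which is what produces the repeating cadence while discovery stays empty.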

`runas /trustlevel:0x20000` requests a **medium-integrity restricted token**
even when the caller is elevated. The new child appears with `runas.exe`
as its parent (NOT explorer.exe), at medium integrity, with the
`--relaunched` flag so the de-elevation check no-ops on the second pass.

The check uses `System.Management.ManagementObjectSearcher` against
`Win32_Process` to find the parent PID — added as a `PackageReference` in
the csproj.

## What's installed right now

`C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe` — **0.9.0-rc6** with
the de-elevation logic, timestamp `2026-05-16 11:36:28`. Shortcuts present
at `C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Wild Dragon\TeamsISO.lnk`
and `C:\Users\Public\Desktop\TeamsISO.lnk`, both pointing at the installed
exe.

Three stale install records were left over from previous rc1–rc5 attempts
(all `DisplayName=TeamsISO`, three different ProductCodes). All three were
uninstalled before -rc6 went on cleanly. Only one TeamsISO ARP entry now.

## How to verify

Double-click `C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe` or click
the Start Menu / Desktop shortcut from File Explorer. The expected sequence:

1. Brief flash of a window that immediately closes (the elevated initial
   process detecting explorer-spawn and re-launching).
2. A second TeamsISO window appears, parented under `runas.exe`, at
   medium integrity.
3. Participants discover within ~3 seconds.

The log file at `C:\Users\zacga\AppData\Local\TeamsISO\Logs\teamsiso<date>.log`
will show one "TeamsISO.App starting up" line — NOT two — because the
elevated first process exits before initializing the logger.

If discovery STILL stays empty, options:
- The `runas /trustlevel` spawn may have failed silently. Diagnostics
  aren't great here because the logger isn't up yet at the de-elevation
  point. We could log to a fallback raw text file.
- The `Secondary Logon` Windows service might be disabled (it's required
  for runas). `Get-Service seclogon | Format-List` to check; should be
  Running.
Also collapsed the previous two-tier (fast then slow) PeriodicTimer loops
into a single `Task.Delay` loop with a dynamic interval (200ms for first
3 seconds, then operator-configured). Simpler, same observable behavior.
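
The dynamic interval described above amounts to one choice per loop iteration. A minimal Python sketch, assuming the fast window is measured from loop start and its boundary is exclusive; `poll_interval`, `run_poll_loop`, and both constants are hypothetical names, and the real loop is C# built on `Task.Delay`, for which `asyncio.sleep` stands in here.

```python
import asyncio
import time

FAST_INTERVAL = 0.2  # 200 ms, per the notes
FAST_WINDOW = 3.0    # fast polling applies to the first 3 seconds


def poll_interval(elapsed: float, configured: float) -> float:
    """Delay before the next poll, given seconds elapsed since loop start."""
    return FAST_INTERVAL if elapsed < FAST_WINDOW else configured


async def run_poll_loop(poll_once, configured: float, stop: asyncio.Event) -> None:
    """Single loop: poll, then sleep for whichever interval currently applies."""
    start = time.monotonic()
    while not stop.is_set():
        poll_once()
        await asyncio.sleep(poll_interval(time.monotonic() - start, configured))
```

Because the interval is recomputed on every pass, no second timer or hand-off between loops is needed when the fast window ends.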

## All commits on origin (newest first)

```
c30a616 fix(engine): self-healing NDI discovery + unified poll loop
54ee578 fix(wpf): de-elevate via runas env-var marker (CLI arg breaks runas /trustlevel)
2552d46 fix(installer): wrap shortcut Target in 'runas /trustlevel:0x20000'
0e73746 docs(next-steps): root cause was explorer-spawn elevation, fix shipped in 191b2c5
191b2c5 fix(wpf): de-elevate when spawned by elevated explorer (NDI mDNS isolation)
e01fa36 docs(next-steps): cold-start launch fix verified — 3 launch paths green
09e5b59 fix: cold-start discovery + installer shortcuts + single-instance hardening
@@ -94,64 +60,88 @@ f47edfb ISO toggle: widen column 110->124, tighten padding so 'Enable' fits
dba7dcc gear icon: swap Path glyph for U+2699 + bump column to 56px
6c9bee7 fix(wpf): catch participant-left race in ToggleIsoAsync, toast instead of crash
84861da test: integration — App+MainWindow STA smoke, control-surface live VM, theme XAML load
6505a3c test: services — NotesService, UpdateChecker, PresetApplier, OscBridge, IsoController
d91f953 test: ControlSurfaceServer route table smoke coverage
fbcc562 test: ThemeManager + CommandPaletteViewModel.Matches coverage
e96a30b chore: trim stale batch-commit script + drop SmokeTest placeholder
1f07992 refactor(services): extract TeamsEmbedHost from TeamsLauncher
2640739 refactor(control-surface): split server into endpoint partials
e67c02c refactor(app): split App.xaml.cs into themed partial files
d02a2c0 refactor(viewmodels): split MainViewModel into themed partial classes
33fca8e polish(mainwindow): empty state, table widths, strings, theme tooltip
3739002 chore(docs): reconcile to WPF-only after WinUI 3 was abandoned
[…11 polish-pass commits from issue #1 below this point]
5a43c9c feat: per-ISO framerate/resolution/aspect/audio overrides + thumbnail BMP
```

246/246 tests passing on the merged main.
## What's installed right now

`C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe` — **0.9.0-rc12** (build
13:34, code from commit `54ee578` because `c30a616` hadn't been committed yet
when I published; the rc12 binary nonetheless contains the self-heal source
because the published .exe is built from working-copy sources, not the index).

Shortcuts at:
- `C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Wild Dragon\TeamsISO.lnk`
- `C:\Users\Public\Desktop\TeamsISO.lnk`

Both target `C:\Windows\SysWOW64\runas.exe /trustlevel:0x20000 "C:\Program
Files\Wild Dragon\TeamsISO\TeamsISO.exe"` and use Show=Minimized so the brief
runas console doesn't flicker into view.

## Tested launch paths (after the clean-slate uninstall)

| Mechanism | Result |
|---|---|
| Start Menu shortcut → ShellExecute (mimics double-click) | OK — 2 participants in 5s |
| Direct .exe from non-elevated PS | OK — 2 participants in 5s |
| Direct .exe from elevated PS (de-elevation kicks in) | OK — child medium-integrity, 2 participants |

The self-heal logic doesn't fire on healthy launches (initial poll already
sees sources). It only kicks in when discovery is stuck at zero.

## Important: 16 TeamsISO.exe duplicates were on disk

The user has the repo synced to both `Documents\Claude\Projects\Teams ISO\`
AND `Nextcloud\Claude\Projects\Teams ISO\`, plus had an older
`source\repos\teamsiso-polish\` workspace. Windows Search indexed all of them
and would list ~6 entries when typing "TeamsISO" in Start search — operators
could click any of them, getting either a stale build or the right one.

Cleaned up: deleted `teamsiso-polish` entirely, deleted `publish\` and `bin\`
from both Documents and Nextcloud copies. Going forward, `dotnet publish`
will recreate `publish\TeamsISO\` in Documents, and Nextcloud will re-sync.

To keep Windows Search from ever offering build artifacts again, the user
should exclude these folders from indexing via Settings → Searching Windows
→ Customize search locations:
- `C:\Users\zacga\Documents\Claude\Projects\Teams ISO\publish`
- `C:\Users\zacga\Documents\Claude\Projects\Teams ISO\src\*\bin`
- `C:\Users\zacga\Nextcloud\Claude\Projects\Teams ISO` (entire path —
  Nextcloud-sync directories are bad indexing targets in general)

## How to launch

```
Start Menu → "Wild Dragon" folder → TeamsISO
```

Or pin that entry to the taskbar.

Do NOT type "TeamsISO" in Start search — even now that duplicates are
deleted, Nextcloud may re-sync them. The Wild Dragon Start Menu entry is
the only guaranteed-correct path.

## Pre-1.0 cut still gated on

1. Code-signing the MSI. `SIGN_CERT_PFX_BASE64` + `SIGN_CERT_PASSWORD`
   need to go into Forgejo Actions Secrets for `release.yml` to start
   producing signed MSIs. Without that, downstream operators get the
   "Windows protected your PC" SmartScreen warning.
1. Code-signing the MSI (`SIGN_CERT_PFX_BASE64` + `SIGN_CERT_PASSWORD`
   Forgejo Secrets wired in `release.yml`).
2. Real-meeting smoke pass on a non-dev host with a live NDI runtime.

## Outstanding from issue #1

- **Item 21** — `TeamsLauncher` fallback chain test coverage. Still
  needs the `IProcessLauncher` seam refactor before the URI handler →
  AppX → process-exe order can be unit-pinned. Half-day of work.

## Build / install cheatsheet

```powershell
cd "C:\Users\zacga\Documents\Claude\Projects\Teams ISO"

# Build + test
dotnet build TeamsISO.sln -c Release # 0 warnings / 0 errors
dotnet test TeamsISO.sln -c Release --no-build # 246/246 passing

# Publish + MSI
$v = "0.9.0-rcN"
dotnet publish src/TeamsISO.App/TeamsISO.App.csproj `
    -c Release -r win-x64 --self-contained false `
    -o publish/TeamsISO /p:Version=$v
dotnet build installer/TeamsISO.Installer.wixproj -c Release /p:Version=$v

# Install (uninstall first if upgrading from same Version="1.0.0.0"!)
Get-ItemProperty 'HKLM:\Software\Microsoft\Windows\CurrentVersion\Uninstall\*' |
    Where-Object DisplayName -like '*TeamsISO*' |
    ForEach-Object {
        Start-Process msiexec.exe -Verb RunAs -Wait -ArgumentList "/x $($_.PSChildName) /qn /norestart"
    }
# The parentheses force expression mode so '+' concatenates the MSI path;
# without them '+' and $v are passed to Start-Process as stray arguments.
Start-Process msiexec.exe -Verb RunAs -Wait -ArgumentList `
    '/i', ('"installer\bin\x64\Release\TeamsISO-Setup-' + $v + '.msi"'), '/qn', '/norestart'
```
- **Item 21** — `TeamsLauncher` fallback chain test coverage. Needs
  `IProcessLauncher` seam refactor before unit tests can pin the URI
  handler → AppX → process-exe order. Half-day.

## Rollback

If the de-elevation logic breaks on a different machine config, revert
just commit `191b2c5` — the earlier `e01fa36` build (with cold-start
polling + Global mutex + dual shortcuts but no de-elevation) is the
safe-fallback baseline.
`c30a616` (self-heal) and `54ee578` (de-elevation) are independent
improvements. If either misbehaves on a different machine config:

- Revert `c30a616` only → discovery goes back to "single finder, no
  rebuild" but cold-start fast poll + de-elevation still apply.
- Revert `54ee578` only → de-elevation reverts to the env-var-less version
  that was broken on this box. The runas-wrapped shortcut still works.

`5a43c9c` is the rollback-base if all polish/cleanup needs to go.