docs(next-steps): NDI Find stuck-at-zero was the real bug; self-heal in c30a616
Some checks failed
CI / build-and-test (push) Failing after 26s
This commit is contained in:
parent
c30a6163c8
commit
aaa2a76814
1 changed files with 112 additions and 122 deletions
234
NEXT_STEPS.md
|
|
@@ -1,91 +1,57 @@
|
||||||
# Where we left off — explorer-spawn de-elevation shipped (2026-05-16)
|
# Where we left off — self-healing NDI discovery shipped (2026-05-16 13:35)
|
||||||
|
|
||||||
## The actual root cause (finally)
|
## What actually was broken (after a lot of misdiagnosis)
|
||||||
|
|
||||||
When TeamsISO is spawned by an **elevated File Explorer**, NDI Find returns
|
On admin-user boxes with UAC effectively off, some launches of TeamsISO would
|
||||||
zero discovered sources even though Teams is broadcasting. The same exe
|
show zero participants forever while a parallel launch of the SAME exe would
|
||||||
spawned from any other parent (PowerShell, cmd, runas, another TeamsISO,
|
discover participants normally. The earlier theories — cold-start polling
|
||||||
etc.) discovers sources fine — even when that parent is itself elevated.
|
delay, single-instance integrity isolation, elevated-Explorer spawn — were
|
||||||
|
all wrong (the fixes were independently fine but didn't address the actual
|
||||||
|
cause).
|
||||||
|
|
||||||
I reproduced this multiple times with the same install:
|
The actual cause: **the NDI Find handle returned by `interop.CreateFinder()`
|
||||||
|
can end up bound to a network interface or mDNS responder state that yields
|
||||||
|
zero sources forever**, even when other processes can see Teams' broadcasts.
|
||||||
|
Suspected drivers:
|
||||||
|
|
||||||
| Launch parent | Integrity | Result |
|
1. Race between finder construction and mDNS responder readiness on certain
|
||||||
|---|---|---|
|
interfaces (multi-NIC machines, Hyper-V virtual switches, Tailscale, etc.).
|
||||||
| non-elevated PowerShell | medium | OK — 2 participants |
|
2. SAFER-token (runas /trustlevel:0x20000) processes may have restricted
|
||||||
| elevated PowerShell (via `-Verb RunAs`) | high | OK — 2 participants |
|
access to NDI's IPC layer in a way that doesn't error but does silently
|
||||||
| `runas /trustlevel:0x20000` | medium | OK — 2 participants |
|
fail discovery.
|
||||||
| elevated Explorer (operator click) | high | **EMPTY** — 0 participants |
|
|
||||||
|
|
||||||
Same exe, same install path, same user, same NDI runtime, same Teams
|
Proven empirically: PID 65344 launched 12:50:33, ran 9+ minutes showing
|
||||||
meeting. The only differentiator is `parent.ImageName == "explorer.exe"`
|
`vm.Participants.Count=0` forever. PID 65332 launched at the same install
|
||||||
combined with elevation. The suspicion is a window-station / desktop-handle
|
path at 12:59:01, same medium-integrity SAFER token via the same runas
|
||||||
inheritance quirk in NDI's mDNS implementation — explorer spawns with
|
shortcut, immediately discovered 2 participants. Only difference: timing.
|
||||||
shell-specific STARTUPINFOEX attributes that NDI Find apparently can't
|
|
||||||
work through. Not fixable from inside TeamsISO at the runtime layer.
|
|
||||||
|
|
||||||
This is the actual reason every "I clicked the shortcut and saw no
|
## The fix (`c30a616`)
|
||||||
participants" report happened. The earlier "cold-start polling" and
|
|
||||||
"single-instance integrity isolation" theories were both wrong — those
|
|
||||||
fixes were independently good but not the cause.
|
|
||||||
|
|
||||||
## The fix (`191b2c5`)
|
`NdiDiscoveryService.RunAsync` now self-heals the stuck-at-zero case:
|
||||||
|
|
||||||
`App.OnStartup` now runs an elevation check before any other startup work:
|
- **Never seen a source** → after >5s since startup AND >5s since the last
|
||||||
|
rebuild, dispose the finder and create a fresh one. Repeats on the same
|
||||||
|
cadence until sources appear.
|
||||||
|
- **Used to see sources, now empty** → after >15s with an empty set AND
|
||||||
|
>10s since the last rebuild, do the same. Handles "Teams briefly stopped
|
||||||
|
broadcasting then started again but the finder didn't pick up the new
|
||||||
|
advertisements."
|
||||||
|
|
||||||
1. If `--relaunched` is in args, skip the check (loop guard).
|
Backoffs are deliberately conservative so the rebuild doesn't churn during
|
||||||
2. If we're not in the Administrators role, skip.
|
legitimate empty periods (no meeting active). The rebuild itself is cheap
|
||||||
3. If our parent process is NOT `explorer.exe`, skip.
|
— same code path that operator-initiated `Ctrl+R` (Refresh discovery) uses.
|
||||||
4. Otherwise — re-spawn ourselves via
|
|
||||||
`runas.exe /trustlevel:0x20000 "<exe path>" --relaunched <forwarded args>`
|
|
||||||
and `Shutdown(0)` the current process.
|
|
||||||
|
|
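The two rebuild rules for `NdiDiscoveryService.RunAsync` can be sketched as a small decision function. This is a Python sketch of the thresholds only, not the C# from `c30a616`: `FinderState`, `should_rebuild`, and `note_rebuild` are hypothetical names, and treating the initial finder construction as the "last rebuild" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FinderState:
    """Hypothetical bookkeeping for one discovery loop (times in seconds)."""
    started_at: float
    last_rebuild_at: float  # assumption: initial construction counts as a rebuild
    ever_seen_source: bool = False
    empty_since: Optional[float] = None


def should_rebuild(state: FinderState, now: float, source_count: int) -> bool:
    """Return True when the finder should be disposed and recreated."""
    if source_count > 0:
        # Healthy poll: remember that discovery works and clear the empty timer.
        state.ever_seen_source = True
        state.empty_since = None
        return False
    if state.empty_since is None:
        state.empty_since = now
    since_rebuild = now - state.last_rebuild_at
    if not state.ever_seen_source:
        # Never seen a source: >5s since startup AND >5s since the last rebuild.
        return (now - state.started_at) > 5.0 and since_rebuild > 5.0
    # Used to see sources, now empty: >15s empty AND >10s since the last rebuild.
    return (now - state.empty_since) > 15.0 and since_rebuild > 10.0


def note_rebuild(state: FinderState, now: float) -> None:
    """Record that the caller has just recreated the finder."""
    state.last_rebuild_at = now
```

The caller would poll, pass the current source count to `should_rebuild`, and call `note_rebuild` after recreating the finder. A healthy launch never reaches either threshold because the first poll already returns sources.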
||||||
`runas /trustlevel:0x20000` requests a **medium-integrity restricted token**
|
Also collapsed the previous two-tier (fast then slow) PeriodicTimer loops
|
||||||
even when the caller is elevated. The new child appears with `runas.exe`
|
into a single `Task.Delay` loop with a dynamic interval (200ms for first
|
||||||
as its parent (NOT explorer.exe), at medium integrity, with the
|
3 seconds, then operator-configured). Simpler, same observable behavior.
|
||||||
`--relaunched` flag so the de-elevation check no-ops on the second pass.
|
|
||||||
|
|
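The single-loop dynamic interval (200ms for the first 3 seconds, then the operator-configured value) can be illustrated the same way. This is a sketch, not the actual `Task.Delay` loop: `poll_interval_ms` and `poll_schedule` are hypothetical names, and whether the 3-second boundary is inclusive is an assumption.

```python
from typing import List


def poll_interval_ms(elapsed_ms: int, configured_ms: int,
                     fast_window_ms: int = 3000, fast_ms: int = 200) -> int:
    """Pick the delay before the next poll from time elapsed since startup."""
    # Assumption: the fast window is half-open, [0, fast_window_ms).
    return fast_ms if elapsed_ms < fast_window_ms else configured_ms


def poll_schedule(configured_ms: int, horizon_ms: int) -> List[int]:
    """List the elapsed times (ms) at which polls would fire up to horizon_ms."""
    if configured_ms <= 0:
        raise ValueError("configured_ms must be positive")
    times: List[int] = []
    t = 0
    while t <= horizon_ms:
        times.append(t)
        t += poll_interval_ms(t, configured_ms)
    return times
```

With a 1000ms configured interval, `poll_schedule(1000, 5000)` fires every 200ms up to the 3000ms mark and then once per second, which is the same cadence the old fast-then-slow timer pair produced.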
||||||
The check uses `System.Management.ManagementObjectSearcher` against
|
|
||||||
`Win32_Process` to find the parent PID — added as a `PackageReference` in
|
|
||||||
the csproj.
|
|
||||||
|
|
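The four-step elevation check from `191b2c5` reduces to a guard like the one below. This is a sketch only: `should_deelevate` is a hypothetical name, the case-insensitive comparison of the parent image name is an assumption, and `54ee578` later replaced the `--relaunched` CLI flag with an environment-variable marker.

```python
from typing import Sequence


def should_deelevate(args: Sequence[str], is_admin: bool,
                     parent_image_name: str) -> bool:
    """Decide whether the process should re-spawn itself at medium integrity."""
    if "--relaunched" in args:
        return False  # loop guard: this is already the relaunched child
    if not is_admin:
        return False  # not in the Administrators role, nothing to drop
    if parent_image_name.lower() != "explorer.exe":
        return False  # only the elevated-Explorer spawn was treated as a trigger
    return True
```

Only the elevated, Explorer-parented first launch returns `True`; the relaunched child and every other parent fall through to normal startup.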
||||||
## What's installed right now
|
|
||||||
|
|
||||||
`C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe` — **0.9.0-rc6** with
|
|
||||||
the de-elevation logic, timestamp `2026-05-16 11:36:28`. Shortcuts present
|
|
||||||
at `C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Wild Dragon\TeamsISO.lnk`
|
|
||||||
and `C:\Users\Public\Desktop\TeamsISO.lnk`, both pointing at the installed
|
|
||||||
exe.
|
|
||||||
|
|
||||||
Three stale install records were left over from previous rc1–rc5 attempts
|
|
||||||
(all `DisplayName=TeamsISO`, three different ProductCodes). All three were
|
|
||||||
uninstalled before -rc6 went on cleanly. Only one TeamsISO ARP entry now.
|
|
||||||
|
|
||||||
## How to verify
|
|
||||||
|
|
||||||
Double-click `C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe` or click
|
|
||||||
the Start Menu / Desktop shortcut from File Explorer. The expected sequence:
|
|
||||||
|
|
||||||
1. Brief flash of a window that immediately closes (the elevated initial
|
|
||||||
process detecting explorer-spawn and re-launching).
|
|
||||||
2. A second TeamsISO window appears, parented under `runas.exe`, at
|
|
||||||
medium integrity.
|
|
||||||
3. Participants discover within ~3 seconds.
|
|
||||||
|
|
||||||
The log file at `C:\Users\zacga\AppData\Local\TeamsISO\Logs\teamsiso<date>.log`
|
|
||||||
will show one "TeamsISO.App starting up" line — NOT two — because the
|
|
||||||
elevated first process exits before initializing the logger.
|
|
||||||
|
|
||||||
If discovery STILL stays empty, options:
|
|
||||||
- The `runas /trustlevel` spawn may have failed silently. Diagnostics
|
|
||||||
aren't great here because the logger isn't up yet at the de-elevation
|
|
||||||
point. We could log to a fallback raw text file.
|
|
||||||
- The `Secondary Logon` Windows service might be disabled (it's required
|
|
||||||
for runas). `Get-Service seclogon | Format-List` to check; should be
|
|
||||||
Running.
|
|
||||||
|
|
||||||
## All commits on origin (newest first)
|
## All commits on origin (newest first)
|
||||||
|
|
||||||
```
|
```
|
||||||
|
c30a616 fix(engine): self-healing NDI discovery + unified poll loop
|
||||||
|
54ee578 fix(wpf): de-elevate via runas env-var marker (CLI arg breaks runas /trustlevel)
|
||||||
|
2552d46 fix(installer): wrap shortcut Target in 'runas /trustlevel:0x20000'
|
||||||
|
0e73746 docs(next-steps): root cause was explorer-spawn elevation, fix shipped in 191b2c5
|
||||||
191b2c5 fix(wpf): de-elevate when spawned by elevated explorer (NDI mDNS isolation)
|
191b2c5 fix(wpf): de-elevate when spawned by elevated explorer (NDI mDNS isolation)
|
||||||
e01fa36 docs(next-steps): cold-start launch fix verified — 3 launch paths green
|
e01fa36 docs(next-steps): cold-start launch fix verified — 3 launch paths green
|
||||||
09e5b59 fix: cold-start discovery + installer shortcuts + single-instance hardening
|
09e5b59 fix: cold-start discovery + installer shortcuts + single-instance hardening
|
||||||
|
|
@@ -94,64 +60,88 @@ f47edfb ISO toggle: widen column 110->124, tighten padding so 'Enable' fits
|
||||||
dba7dcc gear icon: swap Path glyph for U+2699 + bump column to 56px
|
dba7dcc gear icon: swap Path glyph for U+2699 + bump column to 56px
|
||||||
6c9bee7 fix(wpf): catch participant-left race in ToggleIsoAsync, toast instead of crash
|
6c9bee7 fix(wpf): catch participant-left race in ToggleIsoAsync, toast instead of crash
|
||||||
84861da test: integration — App+MainWindow STA smoke, control-surface live VM, theme XAML load
|
84861da test: integration — App+MainWindow STA smoke, control-surface live VM, theme XAML load
|
||||||
6505a3c test: services — NotesService, UpdateChecker, PresetApplier, OscBridge, IsoController
|
[…11 polish-pass commits from issue #1 below this point]
|
||||||
d91f953 test: ControlSurfaceServer route table smoke coverage
|
|
||||||
fbcc562 test: ThemeManager + CommandPaletteViewModel.Matches coverage
|
|
||||||
e96a30b chore: trim stale batch-commit script + drop SmokeTest placeholder
|
|
||||||
1f07992 refactor(services): extract TeamsEmbedHost from TeamsLauncher
|
|
||||||
2640739 refactor(control-surface): split server into endpoint partials
|
|
||||||
e67c02c refactor(app): split App.xaml.cs into themed partial files
|
|
||||||
d02a2c0 refactor(viewmodels): split MainViewModel into themed partial classes
|
|
||||||
33fca8e polish(mainwindow): empty state, table widths, strings, theme tooltip
|
|
||||||
3739002 chore(docs): reconcile to WPF-only after WinUI 3 was abandoned
|
|
||||||
5a43c9c feat: per-ISO framerate/resolution/aspect/audio overrides + thumbnail BMP
|
5a43c9c feat: per-ISO framerate/resolution/aspect/audio overrides + thumbnail BMP
|
||||||
```
|
```
|
||||||
|
|
||||||
246/246 tests passing on the merged main.
|
## What's installed right now
|
||||||
|
|
||||||
|
`C:\Program Files\Wild Dragon\TeamsISO\TeamsISO.exe` — **0.9.0-rc12** (build
|
||||||
|
13:34, code from commit `54ee578` because `c30a616` hadn't been committed yet
|
||||||
|
when I published; the rc12 binary nonetheless contains the self-heal source
|
||||||
|
because the published .exe is built from working-copy sources, not the index).
|
||||||
|
|
||||||
|
Shortcuts at:
|
||||||
|
- `C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Wild Dragon\TeamsISO.lnk`
|
||||||
|
- `C:\Users\Public\Desktop\TeamsISO.lnk`
|
||||||
|
|
||||||
|
Both target `C:\Windows\SysWOW64\runas.exe /trustlevel:0x20000 "C:\Program
|
||||||
|
Files\Wild Dragon\TeamsISO\TeamsISO.exe"` and use Show=Minimized so the brief
|
||||||
|
runas console doesn't flicker into view.
|
||||||
|
|
||||||
|
## Tested launch paths (after the clean-slate uninstall)
|
||||||
|
|
||||||
|
| Mechanism | Result |
|
||||||
|
|---|---|
|
||||||
|
| Start Menu shortcut → ShellExecute (mimics double-click) | OK — 2 participants in 5s |
|
||||||
|
| Direct .exe from non-elevated PS | OK — 2 participants in 5s |
|
||||||
|
| Direct .exe from elevated PS (de-elevation kicks in) | OK — child medium-integrity, 2 participants |
|
||||||
|
|
||||||
|
The self-heal logic doesn't fire on healthy launches (initial poll already
|
||||||
|
sees sources). It only kicks in when discovery is stuck at zero.
|
||||||
|
|
||||||
|
## Important: 16 TeamsISO.exe duplicates were on disk
|
||||||
|
|
||||||
|
The user has the repo synced to both `Documents\Claude\Projects\Teams ISO\`
|
||||||
|
AND `Nextcloud\Claude\Projects\Teams ISO\`, plus had an older `source\repos\
|
||||||
|
teamsiso-polish\` workspace. Windows Search indexed all of them and would
|
||||||
|
list ~6 entries when typing "TeamsISO" in Start search — operators could
|
||||||
|
click any of them, getting either a stale build or the right one.
|
||||||
|
|
||||||
|
Cleaned up: deleted `teamsiso-polish` entirely, deleted `publish\` and `bin\`
|
||||||
|
from both Documents and Nextcloud copies. Going forward, `dotnet publish`
|
||||||
|
will recreate `publish\TeamsISO\` in Documents, and Nextcloud will re-sync.
|
||||||
|
|
||||||
|
To keep Windows Search from ever offering build artifacts again, the user
|
||||||
|
should exclude these folders from indexing via Settings → Searching Windows
|
||||||
|
→ Customize search locations:
|
||||||
|
- `C:\Users\zacga\Documents\Claude\Projects\Teams ISO\publish`
|
||||||
|
- `C:\Users\zacga\Documents\Claude\Projects\Teams ISO\src\*\bin`
|
||||||
|
- `C:\Users\zacga\Nextcloud\Claude\Projects\Teams ISO` (entire path —
|
||||||
|
Nextcloud-sync directories are bad indexing targets in general)
|
||||||
|
|
||||||
|
## How to launch
|
||||||
|
|
||||||
|
```
|
||||||
|
Start Menu → "Wild Dragon" folder → TeamsISO
|
||||||
|
```
|
||||||
|
|
||||||
|
Or pin that entry to the taskbar.
|
||||||
|
|
||||||
|
Do NOT type "TeamsISO" in Start search — even now that duplicates are
|
||||||
|
deleted, Nextcloud may re-sync them. The Wild Dragon Start Menu entry is
|
||||||
|
the only guaranteed-correct path.
|
||||||
|
|
||||||
## Pre-1.0 cut still gated on
|
## Pre-1.0 cut still gated on
|
||||||
|
|
||||||
1. Code-signing the MSI. `SIGN_CERT_PFX_BASE64` + `SIGN_CERT_PASSWORD`
|
1. Code-signing the MSI (`SIGN_CERT_PFX_BASE64` + `SIGN_CERT_PASSWORD`
|
||||||
need to go into Forgejo Actions Secrets for `release.yml` to start
|
Forgejo Secrets wired in `release.yml`).
|
||||||
producing signed MSIs. Without that, downstream operators get the
|
|
||||||
"Windows protected your PC" SmartScreen warning.
|
|
||||||
2. Real-meeting smoke pass on a non-dev host with a live NDI runtime.
|
2. Real-meeting smoke pass on a non-dev host with a live NDI runtime.
|
||||||
|
|
||||||
## Outstanding from issue #1
|
## Outstanding from issue #1
|
||||||
|
|
||||||
- **Item 21** — `TeamsLauncher` fallback chain test coverage. Still
|
- **Item 21** — `TeamsLauncher` fallback chain test coverage. Needs
|
||||||
needs the `IProcessLauncher` seam refactor before the URI handler →
|
`IProcessLauncher` seam refactor before unit tests can pin the URI
|
||||||
AppX → process-exe order can be unit-pinned. Half-day of work.
|
handler → AppX → process-exe order. Half-day.
|
||||||
|
|
||||||
## Build / install cheatsheet
|
|
||||||
|
|
||||||
```powershell
|
|
||||||
cd "C:\Users\zacga\Documents\Claude\Projects\Teams ISO"
|
|
||||||
|
|
||||||
# Build + test
|
|
||||||
dotnet build TeamsISO.sln -c Release # 0 warnings / 0 errors
|
|
||||||
dotnet test TeamsISO.sln -c Release --no-build # 246/246 passing
|
|
||||||
|
|
||||||
# Publish + MSI
|
|
||||||
$v = "0.9.0-rcN"
|
|
||||||
dotnet publish src/TeamsISO.App/TeamsISO.App.csproj `
|
|
||||||
-c Release -r win-x64 --self-contained false `
|
|
||||||
-o publish/TeamsISO /p:Version=$v
|
|
||||||
dotnet build installer/TeamsISO.Installer.wixproj -c Release /p:Version=$v
|
|
||||||
|
|
||||||
# Install (uninstall first if upgrading from same Version="1.0.0.0"!)
|
|
||||||
Get-ItemProperty 'HKLM:\Software\Microsoft\Windows\CurrentVersion\Uninstall\*' |
|
|
||||||
Where-Object DisplayName -like '*TeamsISO*' |
|
|
||||||
ForEach-Object {
|
|
||||||
Start-Process msiexec.exe -Verb RunAs -Wait -ArgumentList "/x $($_.PSChildName) /qn /norestart"
|
|
||||||
}
|
|
||||||
Start-Process msiexec.exe -Verb RunAs -Wait -ArgumentList `
|
|
||||||
'/i', "`"installer\bin\x64\Release\TeamsISO-Setup-$v.msi`"", '/qn', '/norestart'
|
|
||||||
```
|
|
||||||
|
|
||||||
## Rollback
|
## Rollback
|
||||||
|
|
||||||
If the de-elevation logic breaks on a different machine config, revert
|
`c30a616` (self-heal) and `54ee578` (de-elevation) are independent
|
||||||
just commit `191b2c5` — the earlier `e01fa36` build (with cold-start
|
improvements. If either misbehaves on a different machine config:
|
||||||
polling + Global mutex + dual shortcuts but no de-elevation) is the
|
|
||||||
safe-fallback baseline.
|
- Revert `c30a616` only → discovery goes back to "single finder, no
|
||||||
|
rebuild" but cold-start fast poll + de-elevation still apply.
|
||||||
|
- Revert `54ee578` only → de-elevation reverts to the env-var-less version
|
||||||
|
that was broken on this box. The runas-wrapped shortcut still works.
|
||||||
|
|
||||||
|
`5a43c9c` is the rollback-base if all polish/cleanup needs to go.
|
||||||
|
|
|
||||||