A personal macOS desktop avatar project — a floating, transparent, always-on-top window with sprite-based animations of a cartoon avatar. Built as a fun gift; not for distribution.
This document captures both the graphics workflow journey (how to actually get usable AI-generated character animations) and the engineering pipeline journey (how to get those frames cleanly into a real macOS app).
Final Outcome
- 16 sprite frames across 4 sequences: cheers, wave, blow kiss, wink
- All frames at 701×561 resolution with clean alpha
- Pixel-perfect 262×245 borderless transparent NSWindow
- Click → random sprite sequence with small hop
- Random idle every 9–18s (transform animations + sprite sequences)
- Native AppKit drag-to-reposition; right-click → Quit
- Crisp edges, consistent character framing, stable across animations
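The idle timing in the list above (a random action every 9–18 s, drawn from transform animations and sprite sequences) could be sketched roughly like this. The type and function names are hypothetical, and the even split between the two kinds of action is an assumption, not something the project specifies.

```swift
import Foundation

/// The two kinds of idle action named in the outcome list.
/// Case names are illustrative, not taken from the project source.
enum IdleAction: Equatable {
    case transform        // e.g. a small hop applied to the current frame
    case spriteSequence   // play one of the sprite sequences
}

/// Delay before the next idle action, drawn uniformly from 9...18 seconds.
func nextIdleDelay<G: RandomNumberGenerator>(using rng: inout G) -> TimeInterval {
    TimeInterval.random(in: 9...18, using: &rng)
}

/// Picks the kind of idle action. The 50/50 split is an assumption.
func nextIdleAction<G: RandomNumberGenerator>(using rng: inout G) -> IdleAction {
    Bool.random(using: &rng) ? .transform : .spriteSequence
}

var rng = SystemRandomNumberGenerator()
let delay = nextIdleDelay(using: &rng)
let action = nextIdleAction(using: &rng)
print("next idle action \(action) in \(delay) seconds")
```

Passing the generator in explicitly keeps the functions easy to test with a seeded generator; in an app the delay would typically feed a one-shot timer that reschedules itself after each action.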
Part 1 — Building a Consistent Animated Avatar with AI
I set out to create a simple animated avatar (wink, cheers, wave, blow kiss) from a single image of my fiancee Wendy. On paper this sounded straightforward. In practice it turned into a deep dive into the limits of current AI tools.
❌ What Didn’t Work
1. Generating Everything in One Image (4×4 Grid)
First approach: a full animation sheet in a single prompt.
Problems:
- Hands switched randomly between frames
- Wine glass disappeared mid-sequence
- Extra fingers and duplicate hands appeared
- Facial consistency broke across panels
Conclusion: AI image models don’t handle multi-frame continuity well. Each panel is treated semi-independently.
2. Repeated Prompt Tweaking
I tried increasingly strict prompts — locking hand positions, forcing object consistency (wine glass, ring), defining animation steps precisely.
Result: slight improvements, still inconsistent, same core issues kept resurfacing. Prompting alone cannot enforce strict animation logic.
3. Switching Tools (Midjourney, Leonardo, etc.)
Tried other tools hoping they’d handle consistency better.
Problems: confusing or unstable interfaces, identity drift (the results no longer looked like Wendy), and setup overhead that outweighed the benefits. Tool-switching didn’t solve the underlying problem.
I also ran into a staggering number of “pop-up” AI imaging sites that use what are, in my opinion, shady marketing tactics. For example, ChatGPT suggested I try Midjourney. Instead of getting the URL from ChatGPT, I did a quick Google search. At first glance the top result said Midjourney, but on closer inspection it was a site called use.ai. Many of these new sites change names, so I thought that might explain it. I tried it, and it was terrible. After another Google search, lo and behold, that was NOT the Midjourney site. After that I found the same misleading results across many tools.
4. Regenerating with Transparent / White Backgrounds
Attempted to generate images with clean backgrounds from the start.
Problems: increased randomness, lost character consistency, broke previously working elements. Regenerating “cleaner” images made results worse, not better.
5. Using AI Prompts for Background Removal
Tools like PhotoRoom allow prompt-based edits, so I tried prompts like “remove background”, “remove table and keep subject”.
Results: cropped the image incorrectly, removed parts of the subject (like the head), even hallucinated new body parts. Generative editing is unreliable for precise cutouts.
✅ What Actually Worked
1. Use a Strong Base Image
Once I had a version of Wendy that looked exactly right, I stopped regenerating it. Treat that image as the source of truth — never try to recreate it.
2. Generate Animations in Small Sets (4 Frames at a Time)
Instead of a full 4×4 grid, I generated each gesture separately:
- Wink (4 frames) → Cheers (4 frames) → Wave (4 frames) → Blow Kiss (4 frames)
Why it worked: simpler problem for the model, reduced drift, easier to control hand behavior.
3. Lock Critical Constraints
Explicitly enforced rules:
- Wine glass always in the left hand
- Left hand never performs gestures
- Right hand performs all gestures
- Engagement ring always visible
Result: no disappearing props, no hand confusion, consistent animation logic.
4. Keep the Original Background During Generation
Avoided removing the background during generation — it helped maintain consistency, reduced visual drift, kept lighting and composition stable.
5. Remove Background AFTER Generation (PhotoRoom)
Instead of prompting AI to remove backgrounds, used PhotoRoom for post-processing. Automatic background removal got me most of the way; manual erase/restore tools handled edge cases.
Not perfect (removing the table was tricky and sometimes cut into the subject), but far more reliable than generative background removal.
Part 2 — Getting It Into the App
Once I had clean lifted PNGs from PhotoRoom, I figured the rest would be plumbing. It wasn’t.
❌ What Tripped Me Up
The Avatar Wouldn’t Drag Smoothly
When I tried to drag Wendy around the screen, she stuttered and lagged instead of following the cursor. It turns out SwiftUI’s drag gesture reports positions in the view’s local coordinate space by default. As the window moves, the view moves with it, so every new drag offset is measured against a reference frame that has itself just shifted, and the math goes sideways.
Fix: drop SwiftUI for the drag interaction and use AppKit’s native performDrag(with:). The OS handles dragging, smooth as silk.
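A minimal sketch of that handoff, assuming a transparent AppKit view layered over the sprite. The class name DragHandleView is hypothetical; performDrag(with:) and acceptsFirstMouse(for:) are the real AppKit calls.

```swift
import AppKit

/// Hypothetical hit-area view that hands mouse-downs to the window's
/// built-in drag loop instead of computing offsets by hand.
final class DragHandleView: NSView {
    // Allow a click on an inactive window to start a drag right away.
    override func acceptsFirstMouse(for event: NSEvent?) -> Bool { true }

    override func mouseDown(with event: NSEvent) {
        // The system tracks the mouse and moves the window until mouse-up,
        // so no per-event coordinate math runs in app code.
        window?.performDrag(with: event)
    }
}
```

This sketch does not distinguish a click (which should trigger an animation) from a drag; that logic would need to sit around the performDrag call.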
Apple’s Foreground Detection Was Useless
I tried Apple’s Vision framework first to lift Wendy out of the original photo. It did a fine job on her body but kept dropping things — her hand, the wine glass, a heart in one of the kiss frames. Vision’s model picks what it thinks is “the person” and silently excludes everything else.
Conclusion: gave up on Vision and moved to chroma-key approaches, and eventually to a hand-operated tool (PhotoRoom).
Background Removal Stole Her Teeth
My first chroma-key approach was a flat threshold — anything over a certain brightness becomes transparent. Worked great for the white background. Also turned her teeth, eye whites, and white dress transparent.
Fix: flood-fill chroma-key. Mark only pixels that are connected to the image edge. Teeth aren’t connected to the edge, so they survive. Same for eye whites and dress folds.
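The difference between the two approaches can be sketched on a tiny grayscale grid. This is an illustration under simplifying assumptions (one brightness value per pixel, 4-way connectivity, a made-up threshold), not the contents of chroma_key_flood.swift.

```swift
/// Returns a mask where `true` marks pixels to make transparent: bright
/// pixels reachable from the image border through other bright pixels.
/// Assumes every row has the same length.
func backgroundMask(brightness: [[UInt8]], threshold: UInt8) -> [[Bool]] {
    let height = brightness.count
    let width = brightness.first?.count ?? 0
    var mask = [[Bool]](repeating: [Bool](repeating: false, count: width), count: height)
    var pending: [(x: Int, y: Int)] = []

    // Marks a pixel and queues it if it is in bounds, bright, and unvisited.
    func visit(_ x: Int, _ y: Int) {
        guard x >= 0, x < width, y >= 0, y < height else { return }
        guard !mask[y][x], brightness[y][x] >= threshold else { return }
        mask[y][x] = true
        pending.append((x: x, y: y))
    }

    guard width > 0, height > 0 else { return mask }
    // Seed the fill from every border pixel.
    for x in 0..<width { visit(x, 0); visit(x, height - 1) }
    for y in 0..<height { visit(0, y); visit(width - 1, y) }
    // Spread to 4-connected neighbours until nothing new is reachable.
    while let p = pending.popLast() {
        visit(p.x + 1, p.y); visit(p.x - 1, p.y)
        visit(p.x, p.y + 1); visit(p.x, p.y - 1)
    }
    return mask
}

// Toy image: bright background, darker face, one bright "tooth" in the middle.
let bg: UInt8 = 250, skin: UInt8 = 120, tooth: UInt8 = 252
let image: [[UInt8]] = [
    [bg, bg,   bg,    bg,   bg],
    [bg, skin, skin,  skin, bg],
    [bg, skin, tooth, skin, bg],
    [bg, skin, skin,  skin, bg],
    [bg, bg,   bg,    bg,   bg],
]
let mask = backgroundMask(brightness: image, threshold: 240)
let flatWouldRemoveTooth = image[2][2] >= 240
print("flood fill removes tooth:", mask[2][2], "flat threshold removes tooth:", flatWouldRemoveTooth)
```

The tooth is brighter than the threshold, so a flat threshold removes it, but the ring of skin pixels separates it from the border, so the flood fill never reaches it.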
The Mystery Halo
This one took the longest. The PNG files looked perfectly clean against white. But when I dropped Wendy onto my desktop, she had a faint white outline around her edges. I kept blaming my chroma-key for leaving residue.
It wasn’t the chroma-key. Lifted PNGs from white-background sources retain semi-transparent pixels at the boundary that have white-ish RGB values. The compositing math visible = alpha * fg + (1-alpha) * bg makes those pixels invisible against white viewers but visible as a faint tint against any other color.
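Plugging numbers into that formula shows the effect. The pixel values here are hypothetical (a half-transparent fringe pixel with a near-white channel value), and the formula is applied per channel with straight, non-premultiplied alpha.

```swift
/// visible = alpha * fg + (1 - alpha) * bg, for one channel in 0...1.
func composite(alpha: Double, fg: Double, bg: Double) -> Double {
    alpha * fg + (1 - alpha) * bg
}

// Hypothetical fringe pixel: 50% alpha, channel value 0.95 (white-ish).
let fringeAlpha = 0.5
let fringeValue = 0.95

// Over a white viewer background (1.0): about 0.975, nearly identical to white.
let onWhite = composite(alpha: fringeAlpha, fg: fringeValue, bg: 1.0)

// Over a dark desktop (0.2): about 0.575, far brighter than the surroundings.
let onDark = composite(alpha: fringeAlpha, fg: fringeValue, bg: 0.2)

print("shift against white:", onWhite - 1.0)
print("shift against dark: ", onDark - 0.2)
```

The same pixel differs from a white background by about 0.025 and from the dark one by about 0.375, which is why the fringe only shows up once the sprite leaves the white viewer.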
Fix: a small Swift script that walks the alpha channel and kills any pixel that’s both partly-transparent and white-ish. Surgical, fast, didn’t touch the character interior. The halo disappeared instantly.
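The core rule of that cleanup can be sketched on an in-memory pixel array. The RGBA type, the function name, and the whiteFloor threshold of 200 are assumptions for illustration; the real clean_edges.swift also has to decode and re-encode PNG files, which is omitted here.

```swift
/// One straight-alpha pixel, 8 bits per channel.
struct RGBA: Equatable {
    var r: UInt8, g: UInt8, b: UInt8, a: UInt8
}

/// Clears every pixel that is both partly transparent and white-ish.
/// Fully opaque pixels (teeth, dress) and fully transparent pixels are untouched.
func cleanEdges(_ pixels: [RGBA], whiteFloor: UInt8 = 200) -> [RGBA] {
    pixels.map { (p: RGBA) -> RGBA in
        let partlyTransparent = p.a > 0 && p.a < 255
        let whitish = p.r >= whiteFloor && p.g >= whiteFloor && p.b >= whiteFloor
        return (partlyTransparent && whitish) ? RGBA(r: 0, g: 0, b: 0, a: 0) : p
    }
}

let samples = [
    RGBA(r: 255, g: 255, b: 255, a: 255), // opaque white: interior, kept
    RGBA(r: 245, g: 240, b: 238, a: 128), // semi-transparent white-ish: halo, cleared
    RGBA(r: 40,  g: 30,  b: 25,  a: 128), // semi-transparent dark: soft hair edge, kept
]
let cleaned = cleanEdges(samples)
print(cleaned.map { $0.a })
```

Because the rule requires both conditions, opaque whites inside the character survive, and dark anti-aliased edges keep their softness.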
Soft Edges Even After Cleanup
After fixing the halo, edges still looked slightly soft on my Retina display. Source images were around 250 pixels wide. macOS upscales them ~2× for the physical display, and the upscale adds blur.
Fix: regenerated all 16 frames at 701×561 — roughly 3× the resolution. Crispness problem solved immediately.
Auto-Wandering Was Annoying
I built in a feature for Wendy to wander to a random spot on screen every few minutes. Cute for thirty seconds, then aggressively distracting because she’d walk away from wherever I’d just put her.
Fix: ripped it out. She stays where I put her now.
The Wink Looked Like a Blink
First attempt at a wink had her closing both eyes in the second frame. Read as “blinking,” not “winking.”
Fix: regenerated that frame set with one eye properly closed. At least that was the plan. After spending way more time than I wanted on the image manipulation, she ended up doing a double wink and that’s how she’ll stay :)
✅ The Workflow That Stuck
- Lift the subject in PhotoRoom — PNG with alpha
- Run clean_edges.swift (kills the white-fringe halo)
- Drop into the app’s Resources/sprites/ folder
- Build, run, done
For a new gesture set, I also update BuddyView.swift to register the sequence in the random-animation pool.
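Registering a sequence might look something like the sketch below. Everything here is hypothetical: the SpriteSequence type, the name_index frame naming, and the "shrug" gesture are stand-ins, since the actual structure of BuddyView.swift isn't shown in this document.

```swift
/// A named run of frames, referenced by resource name.
struct SpriteSequence {
    let name: String
    let frames: [String]
}

/// Builds a sequence whose frames are named "<name>_1", "<name>_2", ...
/// (an assumed naming scheme). frameCount must be at least 1.
func makeSequence(_ name: String, frameCount: Int = 4) -> SpriteSequence {
    SpriteSequence(name: name, frames: (1...frameCount).map { "\(name)_\($0)" })
}

// The four existing gestures, 4 frames each (16 frames total).
var animationPool = ["cheers", "wave", "blow_kiss", "wink"].map { makeSequence($0) }

// Registering a new (hypothetical) gesture set adds it to the random pool.
animationPool.append(makeSequence("shrug"))

// A click or idle tick then picks from the pool at random.
if let next = animationPool.randomElement() {
    print("playing \(next.name): \(next.frames)")
}
```

Keeping the pool as plain data means a new gesture only needs its PNGs in the sprites folder plus one line of registration.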
Key Lessons
On AI image generation:
- AI is great at creating images, not at maintaining consistency across frames
- Breaking the problem into smaller pieces dramatically improves results
- Prompts have limits — some problems require workflow changes, not better wording
- Background removal belongs in post-processing, not in the generation step
On the desktop app pipeline:
- Always color-decontaminate lifted PNGs before bundling into a UI — otherwise you’ll get a halo on any non-white background
- Floating desktop UI should never reposition itself; manual drag only
- Match window size to sprite canvas exactly so the image renders at native pixel size — no scaling, no softness
- AppKit’s native drag handling beats SwiftUI’s DragGesture for borderless windows
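For reference, a borderless, transparent, always-on-top window of the kind these lessons assume can be configured with standard NSWindow properties. This is a sketch, not the project's BuddyWindow.swift: the function name is hypothetical, and the 262×245 size is taken from the outcome list without knowing how the app maps points to sprite pixels.

```swift
import AppKit

/// Builds a borderless, see-through, floating window of the given size.
func makeBuddyWindow(size: NSSize) -> NSWindow {
    let window = NSWindow(
        contentRect: NSRect(origin: .zero, size: size),
        styleMask: .borderless,      // no title bar or frame
        backing: .buffered,
        defer: false
    )
    window.isOpaque = false          // let alpha in the content show through
    window.backgroundColor = .clear  // nothing drawn behind the sprite
    window.hasShadow = false         // avoid a shadow outlining the window
    window.level = .floating         // stay above normal windows
    return window
}

let buddyWindow = makeBuddyWindow(size: NSSize(width: 262, height: 245))
```

Sizing the content rect from the sprite canvas in one place keeps the window and the image from drifting apart when frames are regenerated at a new resolution.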
If You’re Trying This Yourself
Do this:
- Get one perfect base image of your subject
- Generate animations in small sets (one gesture at a time)
- Lock constraints explicitly in the prompt (which hand, which prop)
- Remove backgrounds afterward (not during generation)
- Apply color decontamination to lifted PNGs before bundling them into a UI
Avoid:
- Large multi-frame prompts (4×4 sprite sheets in one shot)
- Over-relying on prompt engineering when the workflow itself is the problem
- Regenerating once you have a good base image
- Using prompt-based generative AI for background removal (it’ll cut into the subject)
Project Structure
always-wendy/
├── AlwaysWendy/
│   ├── Package.swift
│   └── Sources/AlwaysWendy/
│       ├── main.swift              # app entry
│       ├── AppDelegate.swift       # creates the window
│       ├── BuddyWindow.swift       # borderless transparent NSWindow
│       ├── BuddyView.swift         # SwiftUI view + animations + sprite system
│       ├── HitArea.swift           # AppKit click/drag detection
│       └── Resources/sprites/      # 16 cleaned PNG sprites bundled into the app
├── scripts/                        # Swift CLI image-processing tools
│   ├── clean_edges.swift           # ★ color decontamination (final pipeline)
│   ├── chroma_key_flood.swift      # flood-fill chroma-key (earlier approach)
│   ├── chroma_key.swift            # flat-threshold chroma-key (earliest approach)
│   ├── slice_sheet.swift           # sprite-sheet auto-slicer (early approach)
│   ├── normalize.swift             # bbox + centroid + uniform canvas (mid-iteration)
│   └── remove_background.swift     # Vision framework approach (abandoned)
├── wendy_original.png              # original portrait (starting point)
├── wendy_sprite_sheet.png          # original ChatGPT 4×4 sheet
└── SUMMARY.md                      # this document
This was way, way more work than expected, but once the workflow clicked, everything fell into place, and it was a positive learning experience overall.