Tool plugin: complete face-restoration pipeline (detect + align + restore + blend)

Replicates the Q-engineering GFPGAN ncnn pipeline, using native OpticScript operations for everything except the two ML steps. It detects faces with 3-point landmarks (gfpgan-detect), computes a Procrustes similarity transform in JS, warps each face to 512×512, restores it with GFPGAN, and pastes it back using an eroded, Gaussian-blurred mask blend. This replaces three C++ binaries (align, merge, upscale) with native engine calls and adds a live-adjustable BLEND_FEATHER slider.
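
The alignment step can be illustrated with a standalone sketch. Assuming the fit is the usual least-squares similarity (rotation, uniform scale, translation, no reflection), the closed form below recovers it from matched point pairs. `fitSimilarity`, `invertSimilarity` and `applyToPoint` are hypothetical helpers written for this illustration, and the sample landmarks are made up. The plugin itself calls the built-in `Matrix.Mat3.estimateSimilarity`, whose internals are not shown here and may differ.

```javascript
// Least-squares 2D similarity fit: dst ≈ s·R·src + t (no reflection).
// Returns a row-major 3×3 matrix as a flat array of 9 numbers.
// Hypothetical helper, NOT the engine's Matrix.Mat3.estimateSimilarity.
function fitSimilarity(src, dst) {
    if (src.length !== dst.length || src.length < 2) {
        throw new Error('need at least two matching point pairs');
    }
    const n = src.length;

    // Centroids of both point sets.
    let sx = 0, sy = 0, dx = 0, dy = 0;
    for (let i = 0; i < n; i++) {
        sx += src[i][0]; sy += src[i][1];
        dx += dst[i][0]; dy += dst[i][1];
    }
    sx /= n; sy /= n; dx /= n; dy /= n;

    // Accumulate dot and cross products of the centred points.
    let dot = 0, cross = 0, norm = 0;
    for (let i = 0; i < n; i++) {
        const x = src[i][0] - sx, y = src[i][1] - sy;
        const u = dst[i][0] - dx, v = dst[i][1] - dy;
        dot += x * u + y * v;
        cross += x * v - y * u;
        norm += x * x + y * y;
    }
    if (norm === 0) throw new Error('source points coincide');

    const a = dot / norm;    // s·cos(θ)
    const b = cross / norm;  // s·sin(θ)
    const tx = dx - (a * sx - b * sy);
    const ty = dy - (b * sx + a * sy);
    return [a, -b, tx,
            b,  a, ty,
            0,  0, 1];
}

// Closed-form inverse of a similarity matrix produced by fitSimilarity.
function invertSimilarity(m) {
    const a = m[0], b = m[3], tx = m[2], ty = m[5];
    const d = a * a + b * b;
    if (d === 0) throw new Error('degenerate transform');
    const ia = a / d, ib = -b / d;
    return [ia, -ib, -(ia * tx - ib * ty),
            ib,  ia, -(ib * tx + ia * ty),
            0,   0,  1];
}

// Apply a row-major 3×3 affine matrix to a point [x, y].
function applyToPoint(m, p) {
    return [m[0] * p[0] + m[1] * p[1] + m[2],
            m[3] * p[0] + m[4] * p[1] + m[5]];
}

// Usage: fit made-up landmarks to the 512×512 template, then round-trip a point.
const FACE_TEMPLATE = [[192, 240], [319, 240], [257, 371]];
const landmarks = [[410, 300], [470, 296], [443, 362]]; // illustrative values only
const forward = fitSimilarity(landmarks, FACE_TEMPLATE);
const inverse = invertSimilarity(forward);
console.log(applyToPoint(inverse, applyToPoint(forward, [100, 200])));
```

The round trip returns the starting point up to floating-point rounding. With three landmarks the fit is overdetermined (six equations, four unknowns), so the template points are generally matched approximately rather than exactly.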

JavaScript
// Tool plugin demo — full face restoration pipeline orchestrated in JS
// tool_face_restore_full.js
//!INPUT:  INPUT
//!OUTPUT: OUTPUT
//!PARAM:  THRESHOLD:number=0.35,min=0.05,max=0.9,step=0.05
//!PARAM:  BLEND_FEATHER:number=2.0,min=0.2,max=3.0,step=0.1
//!PARAM:  UPSCALE_BG:boolean=true
//!PARAM:  COLOR_MATCH:boolean=true
//!PARAM:  BLEND:enum(soft|tight|cpp-style)=cpp-style
//!SAMPLE  fast(UPSCALE_BG=false, COLOR_MATCH=false, BLEND=soft)
//!SAMPLE  quality(UPSCALE_BG=true, COLOR_MATCH=true, BLEND=cpp-style, BLEND_FEATHER=2.0)
//!SAMPLE  newsprint(UPSCALE_BG=true, COLOR_MATCH=true, BLEND=cpp-style, BLEND_FEATHER=2.5, THRESHOLD=0.3)

// Replicates the Q-engineering GFPGAN pipeline using OpticScript's
// native ops for everything except the two ML steps:
//
//   1. Engine.tool('gfpgan-detect')   → faces + 3-point landmarks
//   2. JS similarity-transform fit    → forward affine (orig→512)
//   3. img.warpPerspective(512, 512)  → aligned face
//   4. Engine.tool('gfpgan-restore')  → restored 512×512
//   5. inverse affine + warpPerspective → paste back to orig coords
//   6. erode + gaussianBlur on a warped white square → soft mask
//   7. applyMask + blendAt            → composite restored face onto orig
//
// Result: we replace the C++ `gfpgan_align` + `gfpgan_merge` binaries
// (28 MB total) with ~80 lines of JS, gaining live-tunable feather
// width and a pipeline the user can inspect / fork.

// ── Linear algebra: the built-in Matrix toolbox ────────────────────────────
//
// The orig→512 face-alignment fit and its inverse used to be ~55 lines
// of hand-rolled Procrustes math here. They are now one-liners on the
// built-in `Matrix.Mat3` class:
//
//   const fwd = Matrix.Mat3.estimateSimilarity(srcPts, FACE_TEMPLATE);
//   const inv = Matrix.Mat3.clone(fwd).invert();
//
// `Mat3` is column-major (gl-matrix); `img.warpPerspective()` accepts a
// `Mat3` directly and transposes to the engine's row-major layout for
// us — so we never touch raw matrix indices.

// ── Reference template (matches gfpgan_align in mlc-ncnn-img2img) ──────────
// Three keypoints in the 512×512 GFPGAN-trained coordinate system:
//   - left eye, right eye, mouth-centre
// These exact values come from the Q-engineering reference impl.

const FACE_TEMPLATE = [
    [192, 240],  // left eye
    [319, 240],  // right eye
    [257, 371],  // mouth centre
];

// ── Pipeline ──────────────────────────────────────────────────────────────

const orig = Engine.loadImage(INPUT);
const W = orig.width, H = orig.height;

// 1. Detect faces with 3-point landmarks (always in original coords)
const faces = Engine.tool('gfpgan-detect').applyJSON(orig, {
    threshold: THRESHOLD,
});
Engine.log(`detected ${faces.length} face(s)`);

if (faces.length === 0) {
    orig.save(OUTPUT);
    orig.free();
} else {
    // 2. Optional background upscale via Real-ESRGAN. Two reasons it
    //    matters:
    //      (a) GFPGAN always produces 512×512 output. Without upscale,
    //          that gets compressed back to a small face area —
    //          downsampling kills most of the added detail.
    //      (b) The C++ reference pipeline (gfpgan_upscale) ALWAYS
    //          runs Real-ESRGAN on the BG, which subtly denoises +
    //          smooths the photo so the GFPGAN face doesn't visually
    //          stand out as "AI-look-on-raw-photo". Without it you
    //          get a noticeable boundary around restored faces.
    //
    //    No size gate any more — trust the user. For a 2048×1536
    //    input, output is 8192×6144 (~50 MP), ~60 MB PNG. Disable
    //    via UPSCALE_BG=false if memory or time is tight.
    let composite, scale = 1, Wout = W, Hout = H;
    if (UPSCALE_BG) {
        composite = Engine.tool('mlcupscale').apply(orig);
        scale = 4;
        Wout = W * scale;
        Hout = H * scale;
        Engine.log(`background upscaled ${W}×${H} → ${Wout}×${Hout} (4×)`);
    } else {
        composite = orig.clone();
        Engine.log(`background kept at ${W}×${H} (no upscale)`);
    }

    for (let i = 0; i < faces.length; i++) {
        const face = faces[i];

        // 3. Compute the orig→512 similarity transform. `forward`
        //    maps a pixel in the ORIGINAL image to its 512×512 aligned
        //    position; `inverse` is its inverse, mapping aligned
        //    coords back to original coords.
        //
        //    `warpPerspective` takes a **src→dst** (forward) transform
        //    — for each src pixel, where it lands in the dst-grid —
        //    and accepts a `Mat3` directly.
        const srcLandmarks = face.pts.map(p => [p.x, p.y]);
        const forward = Matrix.Mat3.estimateSimilarity(srcLandmarks, FACE_TEMPLATE);
        const inverse = Matrix.Mat3.clone(forward).invert();

        // 4. Warp orig → aligned 512×512. src=orig, dst=512×512, so
        //    we pass the orig→512 forward matrix. Keep this around —
        //    it doubles as a colour-reference for the COLOR_MATCH
        //    step in (5b) below.
        const aligned = orig.clone().warpPerspective(512, 512, forward);

        // 5. Restore via GFPGAN — produces a fresh 512×512 ImageHandle.
        const restored = Engine.tool('gfpgan-restore').apply(aligned);

        // 5b. Optional Reinhard colour transfer: align the restored
        //     face's per-channel mean/std to the original aligned
        //     face's stats. GFPGAN tends to brighten skin + reduce
        //     colour variance — without this step the restored face
        //     shows a distinct tonal jump against the surrounding
        //     photo. Alpha-weighted internally so the
        //     warpPerspective transparent border doesn't skew the
        //     stats.
        if (COLOR_MATCH) {
            restored.colorMatch(aligned);
        }
        aligned.free();

        // 6. Warp restored → composite coords. src=512, dst=Wout×Hout,
        //    so we need the 512→Wout-coords forward matrix. That is the
        //    composition (orig→Wout) · (512→orig), where the first
        //    factor is the uniform background upscale and the second
        //    is `inverse`. Mat3.multiply does `this = this · b`, so
        //    `scaleM.multiply(inverse)` gives exactly scaleM · inverse.
        const inverseScaled = scale === 1
            ? inverse
            : new Matrix.Mat3().fromScaling([scale, scale]).multiply(inverse);
        restored.warpPerspective(Wout, Hout, inverseScaled);

        // 7. Build the blend mask. Three strategies, switchable via
        //    BLEND param so we can compare visually:
        //
        //    "soft"      — single erode + small blur. Quickest, but the
        //                  blur tail can leak into the warpPerspective
        //                  "outside-source = transparent black" zone and
        //                  produce a dark fringe.
        //
        //    "tight"     — same as soft but with a bigger erode, so the
        //                  blur stays strictly inside the warped quad.
        //                  Smaller darkening risk, slightly less smooth.
        //
        //    "cpp-style" — replicates Q-engineering's two-mask trick
        //                  from merge.cpp:
        //                    1. hardMask = warped quad ⊖ small erode
        //                       (defines where the restored face is
        //                       allowed to contribute at all)
        //                    2. RGB is hard-clipped to hardMask via
        //                       applyMask + premultiplyAlpha — so any
        //                       blur-tail outside this boundary sees
        //                       RGB=0 and can't darken the composite
        //                    3. softMask = warped quad ⊖ (small + big
        //                       erode), then Gaussian-blurred. Its
        //                       0-fade lies strictly INSIDE hardMask,
        //                       so where alpha > 0 RGB is always valid
        //
        //    Feather width scales with the square root of the face
        //    area in the upscaled space (area × scale²), i.e. linearly
        //    with face size, so the blend looks the same whether you
        //    upscale or not.
        const faceArea = face.rect.w * face.rect.h * scale * scale;
        const wEdge = Math.max(2, Math.round(Math.sqrt(faceArea) / 20 * BLEND_FEATHER));

        if (BLEND === "cpp-style") {
            // ── C++-style two-mask blend ─────────────────────────────
            const warpedQuad = Engine.createColoredImage(512, 512, [1, 1, 1, 1]);
            warpedQuad.warpPerspective(Wout, Hout, inverseScaled);

            // Hard mask: tiny erode (~2 px) — strips warp anti-aliasing
            // but keeps face content full-strength right up to the
            // boundary, matching cv::bitwise_and(…, inv_mask_erosion).
            const hardMask = warpedQuad.clone().erode(2);

            // Soft mask: bigger erode (wEdge*2 further) + blur. The
            // gradient zone of softMask lies between the inner-erode
            // boundary and the hardMask boundary, never beyond.
            const softMask = warpedQuad.clone().erode(2 + wEdge * 2).gaussianBlur(wEdge);
            warpedQuad.free();

            restored.applyMask(hardMask);     // alpha = hardMask (0 outside)
            restored.premultiplyAlpha();      // RGB *= alpha → 0 outside hardMask
            restored.applyMask(softMask);     // re-set alpha to soft gradient
            hardMask.free();
            softMask.free();

            composite.blendAt(restored, px(0, 0), 1.0, BlendMode.Over);
        } else {
            // ── Original single-mask blend (BLEND="soft" | "tight") ──
            const erodeSize = BLEND === "tight" ? wEdge * 4 : wEdge * 2;
            const blurSigma = BLEND === "tight"
                ? Math.max(0.5, wEdge * 0.3)
                : Math.max(0.5, wEdge * 0.5);

            const mask = Engine.createColoredImage(512, 512, [1, 1, 1, 1]);
            mask.warpPerspective(Wout, Hout, inverseScaled);
            mask.erode(erodeSize).gaussianBlur(blurSigma);

            restored.applyMask(mask);
            mask.free();
            composite.blendAt(restored, px(0, 0), 1.0, BlendMode.Over);
        }

        restored.free();

        Engine.log(`face ${i + 1}/${faces.length}: area=${faceArea.toFixed(0)} feather=${wEdge}px`);
    }

    composite.save(OUTPUT);
    composite.free();
    orig.free();
}

// © 2026 Michael Lechner · mlc OpticScript · https://mlcgo.eu · Elastic License 2.0
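
The feather sizing in step 7 of the script is plain arithmetic and can be checked in isolation. The sketch below restates it as two pure functions. `featherWidth` and `maskPlan` are hypothetical names chosen for this illustration; the formulas are copied from the script, and nothing here calls the engine.

```javascript
// Feather width in pixels, using the same formula as the script:
// sqrt(face area in output space) / 20 × BLEND_FEATHER, at least 2 px.
function featherWidth(rectW, rectH, scale, blendFeather) {
    const faceArea = rectW * rectH * scale * scale;
    return Math.max(2, Math.round(Math.sqrt(faceArea) / 20 * blendFeather));
}

// Erode and blur sizes per BLEND mode, mirroring the script's branches.
function maskPlan(blend, wEdge) {
    if (blend === 'cpp-style') {
        // Two masks: a barely eroded hard mask and a further eroded, blurred soft mask.
        return { hardErode: 2, softErode: 2 + wEdge * 2, blurSigma: wEdge };
    }
    const tight = blend === 'tight';
    return {
        erode: tight ? wEdge * 4 : wEdge * 2,
        blurSigma: Math.max(0.5, wEdge * (tight ? 0.3 : 0.5)),
    };
}

// Worked example: a 200×200 face rectangle at BLEND_FEATHER = 2.0.
const wNative = featherWidth(200, 200, 1, 2.0);
const wUpscaled = featherWidth(200, 200, 4, 2.0);
console.log(wNative);    // 20
console.log(wUpscaled);  // 80
console.log(maskPlan('cpp-style', wUpscaled)); // hardErode 2, softErode 162, blurSigma 80
console.log(maskPlan('soft', wNative));        // erode 40, blurSigma 10
```

For this face the width grows from 20 px to 80 px under the 4× upscale, so the feather covers the same fraction of the face in both cases. Very small faces hit the 2 px floor: a 16×16 rectangle at BLEND_FEATHER=0.2 rounds to 0 and is clamped to 2.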