Generate sitemap.xml from PageRepository

Problem

Search engines expect a sitemap, conventionally served at https://yoursite/sitemap.xml, that lists every public URL with its last-modified timestamp. The content already lives in the page tree; you just need an XML view of it.

Recipe

sitemap.xml is not a page (it is served as application/xml, not HTML), so the ?Page slots on PageResolving and RouteNotFound do not fit. The right shape is a plugin that subscribes to RouteNotFound, recognises the path, emits the XML directly via header() + echo + exit, and never touches the resolution slot. Because the listener exits, Scriptor's pipeline stops before the 404 fires.

namespace Acme\Sitemap;

use League\Container\Container;
use Scriptor\Boot\Events\Frontend\RouteNotFound;
use Scriptor\Boot\Frontend\PageRepository;
use Scriptor\Boot\Plugin\Plugin as ScriptorPlugin;
use Scriptor\Boot\Plugin\PluginContext;

final class Plugin implements ScriptorPlugin
{
    public function __construct(private readonly Container $container) {}

    public function register(PluginContext $context): void
    {
        $context->subscribe(RouteNotFound::class, [$this, 'onUnresolved']);
    }

    public function version(): string { return '0.1.0'; }

    public function onUnresolved(RouteNotFound $event): void
    {
        $path = '/' . $event->urlSegments->path(false);
        if ($path !== '/sitemap.xml') return;

        $pages   = $this->container->get(PageRepository::class)->findAll();
        $siteUrl = rtrim(self::detectSiteUrl(), '/');

        $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
        $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
        foreach ($pages as $page) {
            if (! $page->active()) continue;
            $loc = $siteUrl . self::pathFor($page);
            $xml .= sprintf(
                "  <url><loc>%s</loc><lastmod>%s</lastmod></url>\n",
                htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES, 'UTF-8'),
                date('c', $page->updated()),
            );
        }
        $xml .= '</urlset>' . "\n";

        header('Content-Type: application/xml; charset=utf-8');
        echo $xml;
        exit;
    }

    private static function pathFor($page): string
    {
        return $page->slug === '' ? '/' : '/' . $page->slug . '/';
    }

    private static function detectSiteUrl(): string
    {
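        // IIS reports HTTPS=off for plain HTTP, so only non-empty, non-"off" means TLS.
        $https  = strtolower((string) ($_SERVER['HTTPS'] ?? ''));
        $scheme = ($https !== '' && $https !== 'off') ? 'https' : 'http';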
        return $scheme . '://' . ($_SERVER['HTTP_HOST'] ?? 'localhost');
    }
}

Three pieces a reader cannot infer:

No resolution slot is filled. The listener short-circuits with exit; the slot stays null, but throw404() never runs because the request is already over. This is the same pattern as the legacy-redirect variant of the "Replace 404 with a fallback handler" recipe.
pathFor() honours the empty-slug home convention. Since Scriptor's home page is identified by slug = '' (not by id = 1), the sitemap entry for the home page reads /, not //. Themes that use $site->getPageUrl($page) get the same shape for free.
detectSiteUrl() reads the live request rather than the container-bound Site::siteUrl. The Site instance is never constructed when the listener short-circuits, so this small helper re-derives the base URL from $_SERVER. HTTP_HOST is copied from the client's Host header, so for sites with a pinned canonical hostname (or behind a TLS-terminating proxy), hardcode the base URL instead.
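If you want the helper testable in isolation, a slightly hardened version is sketched below. It takes the server array as a parameter, treats any non-empty HTTPS value other than off as TLS (the PHP manual notes that IIS sets off for plain HTTP), rejects Host values that are not a bare hostname with an optional port, and prefers a pinned canonical URL when one is passed. The $canonical parameter is a hypothetical addition, not a Scriptor setting, and the sketch deliberately ignores X-Forwarded-Proto.

```php
<?php
// Sketch only: derive the site's base URL from a $_SERVER-style array.
// $canonical is a hypothetical pinned base URL, not a Scriptor option.
function detectSiteUrl(array $server, ?string $canonical = null): string
{
    // A pinned canonical URL always wins over request data.
    if ($canonical !== null && $canonical !== '') {
        return rtrim($canonical, '/');
    }

    // Apache/nginx set HTTPS to a non-empty value; IIS sets "off" for plain HTTP.
    $https  = strtolower((string) ($server['HTTPS'] ?? ''));
    $scheme = ($https !== '' && $https !== 'off') ? 'https' : 'http';

    // HTTP_HOST comes from the client's Host header: accept only a plain
    // hostname with an optional port (IPv6 literals also fall back here).
    // The D modifier stops $ from matching before a trailing newline.
    $host = (string) ($server['HTTP_HOST'] ?? 'localhost');
    if (!preg_match('/^[A-Za-z0-9.-]+(:\d+)?$/D', $host)) {
        $host = 'localhost';
    }

    return $scheme . '://' . $host;
}
```

Inside the plugin you would call it as detectSiteUrl($_SERVER), or pass the pinned URL as the second argument.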

Variants

Filter pages by template or pagetype

Public sitemaps usually leave out pages that should not be promoted to search engines (legal pages, preview-only templates). Gate the loop on an allowlist of templates:

$publicTemplates = ['basic', 'longform', 'post'];

foreach ($pages as $page) {
    if (! $page->active()) continue;
    if (! in_array($page->template, $publicTemplates, strict: true)) continue;
    // ...
}

Add changefreq + priority

The sitemaps.org schema also accepts optional <changefreq> and <priority> elements. Most sites do not bother; if you can derive sensible values, extend the sprintf call:

$xml .= sprintf(
    "  <url><loc>%s</loc><lastmod>%s</lastmod>"
  . "<changefreq>%s</changefreq><priority>%s</priority></url>\n",
    htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES, 'UTF-8'),
    date('c', $page->updated()),
    $page->template === 'post' ? 'weekly' : 'monthly',
    $page->slug === '' ? '1.0' : '0.7',
);

Search engines treat these values as hints at best (Google documents that it ignores both), so the sitemap stays useful even when the values drift.
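If the hints come from page data rather than literals, validating them before they reach the XML may be worthwhile. The sketch below checks changefreq against the seven values the sitemaps.org protocol allows and clamps priority to the protocol's 0.0 to 1.0 range. sitemapHints() is an illustrative name. It formats with %F rather than %f because PHP's %f is locale-aware and could print a comma as the decimal separator.

```php
<?php
// Sketch only: the allowed values and the 0.0-1.0 range come from the
// sitemaps.org protocol; the constant and function names are illustrative.
const SITEMAP_CHANGEFREQ = ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'];

function sitemapHints(string $changefreq, float $priority): string
{
    if (!in_array($changefreq, SITEMAP_CHANGEFREQ, true)) {
        throw new InvalidArgumentException("Invalid changefreq: {$changefreq}");
    }

    // Clamp into the allowed range.
    $priority = max(0.0, min(1.0, $priority));

    // %F is the non-locale-aware float format, so the output is always "0.7".
    return sprintf(
        '<changefreq>%s</changefreq><priority>%.1F</priority>',
        $changefreq,
        $priority,
    );
}
```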

Cache the output

The full-tree iteration is cheap for the page counts most Scriptor sites carry (hundreds, not hundreds of thousands), so regenerating the sitemap on every request is fine. For larger sites, write the XML to a file, regenerate it on the first request of each day, and serve that file; the listener shrinks to a freshness check plus readfile() + exit. Past 50,000 URLs (or 50 MB uncompressed), the sitemaps.org protocol requires splitting the output into several sitemaps listed in a sitemap index file.
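A minimal sketch of that file cache, assuming a writable cache directory and a callable that returns the full XML string (for example, the body of onUnresolved() moved into its own method). cachedSitemap() and its parameters are hypothetical names. It rebuilds once the file is older than $maxAge seconds and writes through a temporary file plus rename(), so a concurrent request does not read a half-written file.

```php
<?php
// Sketch only: return cached sitemap XML, rebuilding it when the cache file
// is missing or older than $maxAge seconds (default: one day).
function cachedSitemap(string $cacheFile, callable $build, int $maxAge = 86400): string
{
    // filemtime() results are cached per request; clear them for this file.
    clearstatcache(true, $cacheFile);

    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        $xml = file_get_contents($cacheFile);
        if ($xml !== false) {
            return $xml;
        }
    }

    $xml = (string) $build();

    // Write to a temp file in the same directory, then rename over the cache
    // file; on POSIX filesystems the swap is atomic for readers.
    $tmp = $cacheFile . '.' . getmypid() . '.tmp';
    if (file_put_contents($tmp, $xml) !== false) {
        rename($tmp, $cacheFile);
    }

    return $xml;
}
```

In the listener, send the Content-Type header, echo the returned string, and exit as before.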