Generate Sitemap and RSS for Vercel Next.js App with Dynamic Routes
Recently, I had an interesting problem to solve: generate a `sitemap.xml` and an RSS feed endpoint for this blog, which is built with Next.js and uses Sanity as the content backend. The post pages are dynamic routes that fetch content from Sanity for pre-rendering. The tricky part is making sure the sitemap and RSS stay in sync with the dynamically generated pages. In this post, I will share how I solved it by generating the files with a `postbuild` script and configuring Vercel Routes to handle the requests.
The solution I took is specific to my setup, in which I deploy my serverless Next.js app to the Vercel platform, but the concept should be transferable to your use case.
The "Problem"
There are no APIs in Next.js that generate a sitemap or RSS based on the compiled build results, and the solutions usually suggested only work for sites without dynamic routes.
Requirements
- The generation of the files (`sitemap.xml` and `feed.json`) should be automated in the build process.
- The generation should be based on the pre-rendered HTML files, since the content stored in my CMS needs to be serialized into React components (same if you are using MDX) and I don't want to duplicate the work Next.js has already done for me.
- The files should be built ahead of requests. That means I am not using Next.js API routes to perform the task, even though that would be possible and valid.
Solution
Overview
I created a `postbuild` script that searches for any pre-rendered HTML files inside the output `pages` folder and parses the files using `cheerio` to extract data. The data is then used to create `sitemap.xml` and `feed.json`, which are written to the output `static` folder. Lastly, I configured the `vercel.json` file to route requests for `/sitemap.xml` and `/feed.json` to the corresponding files in the static folder. Below are some implementation details:
Add a `postbuild` script in `package.json`:
It will be run automatically by `npm` or `yarn` after the `build` script is executed. (All scripts support `pre` and `post` hooks; learn more about that here.)
{"scripts": {"dev": "next dev","build": "next build","start": "next start","postbuild": "node ./scripts/postbuild"},"...": "..."}
Create `/scripts/postbuild.js`
I will break it down piece by piece. First, the main function:
```js
function main() {
  const pagesDir = './.next/serverless/pages';
  const pageFiles = getPageFiles(pagesDir);
  buildRss(pageFiles, pagesDir);
  buildSiteMap(pageFiles);
}
```
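One detail worth noting: defining `main` alone does nothing when `node ./scripts/postbuild` runs, so make sure the script actually calls it, for example at the end of the file:

```js
// Run the postbuild steps when Node executes this file;
// without this call the script would be a no-op.
main();
```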
The output pages directory in the Vercel environment is located at `./.next/serverless/pages`. If you want to test in a local environment, the output pages directory is located at `./.next/server/static/${buildId}/pages`, and you can find the `buildId` in `./.next/BUILD_ID`.
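If you do test locally, a small helper can resolve that path for you. This is just a sketch under my setup's assumptions; `getLocalPagesDir` is a hypothetical name and not part of the original script:

```js
const fs = require('fs');
const path = require('path');

// Hypothetical helper: read the build id that Next.js writes to ./.next/BUILD_ID
// and construct the local output pages directory from it.
function getLocalPagesDir() {
  const buildId = fs.readFileSync('./.next/BUILD_ID', 'utf8').trim();
  return path.join('./.next/server/static', buildId, 'pages');
}
```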
The `getPageFiles` function simply collects all the HTML files (excluding `404.html`) in the output `pages` directory.
```js
const fs = require('fs');
const path = require('path');
// cheerio is used below in buildRss and buildSiteMap
const cheerio = require('cheerio');

function getPageFiles(directory, files = []) {
  const entries = fs.readdirSync(directory, { withFileTypes: true });
  entries.forEach(entry => {
    const absolutePath = path.resolve(directory, entry.name);
    if (entry.isDirectory()) {
      // wow recursive 🐍
      getPageFiles(absolutePath, files);
    } else if (isPageFile(absolutePath)) {
      files.push(absolutePath);
    }
  });
  return files;
}

function isPageFile(filename) {
  return (
    path.extname(filename) === '.html' && !filename.endsWith('404.html')
  );
}
```
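For reference, here is roughly what a call to `getPageFiles` returns when pointed at the Vercel output directory; the paths below are made up for illustration:

```js
// Hypothetical usage; the actual absolute paths depend on your project and pages.
const pageFiles = getPageFiles('./.next/serverless/pages');
// e.g. [
//   '/path/to/project/.next/serverless/pages/index.html',
//   '/path/to/project/.next/serverless/pages/blog/my-first-post.html',
// ]
```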
After collecting the absolute paths of all HTML files, I pass them to `buildRss` and `buildSiteMap` and use `cheerio` to parse the content. (Learn more about cheerio.) It is very unlikely that you can use the following code without modification, because how I use `cheerio` to parse the data depends on the HTML structure of my React components.
```js
function buildRss(pageFiles, pagesDir) {
  // use the reduce method to collect all RSS data
  const rssData = pageFiles.reduce(
    (data, file) => {
      // the pathname is the relative path from '/pages' to the HTML file
      const pathname = path.relative(pagesDir, file).slice(0, -'.html'.length);

      // collect all RSS top level info in the index page
      if (pathname === 'index') {
        const htmlString = fs.readFileSync(file, 'utf8');
        const $ = cheerio.load(htmlString);
        data.title = $('title').text();
        data.home_page_url = $(`meta[property='og:url']`).attr('content');
        data.feed_url = $(`link[rel='alternate'][type='application/json']`).attr('href');
        data.description = $(`meta[name='description']`).attr('content');
        data.icon = $(`link[sizes='512x512']`).attr('href');
        data.favicon = $(`link[sizes='64x64']`).attr('href');
      }

      // only add to RSS if the pathname is '/blog/*'
      if (pathname.startsWith('blog')) {
        const htmlString = fs.readFileSync(file, 'utf8');
        const $ = cheerio.load(htmlString);
        // remove the placeholder image for lazy loading images
        $(`#Content img[aria-hidden='true']`).remove();
        data.items.push({
          url: $(`meta[property='og:url']`).attr('content'),
          id: pathname.substring('blog/'.length),
          content_html: $('#Content').html(),
          title: $('article h1').text(),
          summary: $(`meta[name='description']`).attr('content'),
          image: $(`meta[property='og:image']`).attr('content'),
          banner_image: $(`meta[property='og:image']`).attr('content'),
          date_published: $('time').attr('datetime'),
          author: {
            name: $(`a[rel='author']`).text(),
            url: $(`a[rel='author']`).attr('href'),
            avatar: $(`img#Avatar`).attr('src'),
          },
        });
      }

      return data;
    },
    {
      version: 'https://jsonfeed.org/version/1',
      items: [],
    }
  );

  // sort the items by the publishing date
  rssData.items.sort(byDateDesc);

  // write to the output static folder
  fs.writeFileSync(
    path.join('./.next/static', 'feed.json'),
    JSON.stringify(rssData, null, 2)
  );
}
```
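The `byDateDesc` comparator used above is not shown in these snippets. A minimal sketch, assuming each item's `date_published` is an ISO 8601 string that `Date` can parse, could look like this:

```js
// Sketch of a byDateDesc comparator: newest items first.
function byDateDesc(a, b) {
  return new Date(b.date_published) - new Date(a.date_published);
}
```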
```js
function buildSiteMap(pageFiles) {
  // I am using the open graph URL tag as the url
  // but you can simply concat the base Url with the relative path
  const urls = pageFiles.map(file => {
    const htmlString = fs.readFileSync(file, 'utf8');
    const $ = cheerio.load(htmlString);
    return $(`meta[property='og:url']`).attr('content');
  });

  const sitemap = `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0"
  xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
  xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
${urls
    .map(
      url => `<url>
  <loc>${url}</loc>
  <changefreq>daily</changefreq>
  <priority>0.7</priority>
</url>`
    )
    .join('')}
</urlset>`;

  fs.writeFileSync(path.join('./.next/static', 'sitemap.xml'), sitemap);
}
```
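As the comment in `buildSiteMap` mentions, you could also build each URL by concatenating a base URL with the page's relative path instead of reading the `og:url` meta tag. A rough sketch of that alternative, where `SITE_URL` is a placeholder for your site's origin and `pageUrl` is a hypothetical helper:

```js
// Alternative sketch: derive the URL from the file's path relative to the
// pages directory. SITE_URL is a placeholder, not part of the original script.
const SITE_URL = 'https://example.com';

function pageUrl(pagesDir, file) {
  const pathname = path.relative(pagesDir, file).slice(0, -'.html'.length);
  // the index page maps to the site root; every other page keeps its path
  return pathname === 'index' ? SITE_URL : `${SITE_URL}/${pathname}`;
}
```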
Finally, here is the `vercel.json` routing configuration, which you can learn more about here.
{"routes": [{ "src": "/sitemap.xml", "dest": "/_next/static/sitemap.xml" },{ "src": "/feed.json", "dest": "/_next/static/feed.json" },]}
You can find the complete code in this gist.
Wrap up
So there you have it, my hack (or not?) to put together the sitemap and RSS generation after countless trials, errors, and googling. It relies on knowing the location of the output files in the Vercel environment, which may change as Next.js and Vercel evolve, so keep an eye on it. This might not be the most straightforward method, so please let me know about your implementation. Thank you for reading. Until next time! 👋