πŸ—ΊοΈ Sitemap API

Automatically generate sitemaps by crawling any website

API Endpoints

GET /api/sitemap/:domain

Crawls a website and returns all discovered page paths in JSON format.

GET /api/sitemap/:domain/json

Returns sitemap in JSON format (same as above).

GET /api/sitemap/:domain/xml

Returns sitemap in XML format (standard sitemap.xml format for search engines).

Parameters

Path Parameters

Parameter Type Description
domain string The domain to crawl (the http:// or https:// scheme is optional)

Query Parameters

Parameter Type Default Range Description
depth integer 5 0-10 Maximum crawling depth
limit integer 500 1-1000 Maximum pages to crawl
seeds string[] none - Additional page paths to use as crawl starting points. Useful for discovering pages that aren't linked from the homepage (e.g. hidden or unlisted pages). Can be repeated.
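Because seeds can appear multiple times, the query string needs one seeds entry per path. As an illustrative sketch, Python's urlencode accepts a list of key/value pairs for exactly this case:

```python
from urllib.parse import urlencode

# A list of (key, value) pairs lets "seeds" repeat in the query string.
params = [
    ("depth", 5),
    ("limit", 500),
    ("seeds", "/secret/blog"),
    ("seeds", "/hidden/landing-page"),
]
query = urlencode(params)
url = f"https://sitemapper.cybercla.dev/api/sitemap/example.com?{query}"
# query → "depth=5&limit=500&seeds=%2Fsecret%2Fblog&seeds=%2Fhidden%2Flanding-page"
```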

Features

πŸ›‘οΈ Infinite Loop Protection
Tracks visited URLs to prevent crawling the same page twice
⏱️ Timeout Protection
Automatically aborts after 30 seconds
🎯 Same-Domain Only
Only follows links within the target domain
πŸ“„ HTML Pages Only
Skips images, PDFs, and other non-HTML content
πŸ”— Smart URL Handling
Handles relative, absolute, and protocol-relative URLs
🌱 Seed Paths
Add custom starting points to discover unlisted or hidden pages that aren't linked from the homepage
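Taken together, the features above describe a bounded breadth-first crawl: a visited set, a depth limit, a page limit, and a same-domain filter. A minimal sketch in Python, assuming nothing about the service's actual implementation (all names are illustrative, and link fetching is injected as a function so the loop itself stays self-contained; the real service additionally skips non-HTML content):

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def crawl(fetch_links, start_url, max_depth=5, max_pages=500):
    """BFS crawl honoring the documented limits.

    fetch_links(url) -> iterable of hrefs found on that page
    (injected so the sketch works without network access).
    """
    domain = urlparse(start_url).netloc
    visited = set()                                # infinite-loop protection
    queue = deque([(start_url, 0)])
    while queue and len(visited) < max_pages:      # page limit
        url, depth = queue.popleft()
        if url in visited or depth > max_depth:    # depth limit
            continue
        visited.add(url)
        for href in fetch_links(url):
            # urljoin resolves relative, absolute, and protocol-relative hrefs
            absolute = urljoin(url, href)
            if urlparse(absolute).netloc == domain:  # same-domain only
                queue.append((absolute, depth + 1))
    # Return the sorted unique paths, "/" for the root
    return sorted({urlparse(u).path or "/" for u in visited})
```

For example, a page graph where the homepage links to /about, a protocol-relative //example.com/blog, and an off-domain https://other.com/x yields ["/", "/about", "/blog"]: the off-domain link is dropped and each page is visited once.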

Example Requests

JSON Format

GET /api/sitemap/example.com/json?depth=5&limit=500

XML Format

GET /api/sitemap/example.com/xml?depth=5&limit=500

With Seed Paths

Include unlisted pages as additional crawl starting points:

GET /api/sitemap/example.com/json?seeds=/secret/blog&seeds=/hidden/landing-page

Example Response (JSON)

{
  "domain": "example.com",
  "url": "https://example.com",
  "maxDepth": 5,
  "maxPages": 500,
  "pagesCrawled": 23,
  "totalPaths": 23,
  "paths": [
    "/",
    "/about",
    "/blog",
    "/blog/post-1",
    "/blog/post-2",
    "/contact",
    "/products",
    "/services"
  ]
}
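The paths array holds relative paths; combining them with the url field yields absolute URLs. A brief sketch using the example response above:

```python
# Fields taken from the example JSON response
data = {
    "url": "https://example.com",
    "totalPaths": 3,
    "paths": ["/", "/about", "/blog"],
}

base = data["url"].rstrip("/")              # avoid a double slash at the root
full_urls = [base + path for path in data["paths"]]
# full_urls → ["https://example.com/", "https://example.com/about", "https://example.com/blog"]
```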

Example Response (XML)

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
  <url>
    <loc>https://example.com/blog</loc>
  </url>
  <url>
    <loc>https://example.com/contact</loc>
  </url>
</urlset>
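The XML response uses the standard sitemap namespace, so reading the loc entries requires a namespace-aware query. An illustrative sketch with Python's ElementTree, parsing a trimmed version of the example above:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example XML response
xml_body = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

# Map a prefix to the sitemap namespace so findall can match the elements
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(xml_body)
locs = [loc.text for loc in root.findall("sm:url/sm:loc", ns)]
# locs → ["https://example.com/", "https://example.com/about"]
```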

Response Fields

Field Type Description
domain string The domain that was crawled
url string The full URL of the starting page
maxDepth integer Maximum depth used for crawling
maxPages integer Maximum number of pages to crawl
pagesCrawled integer Number of pages actually crawled
totalPaths integer Total unique paths found
paths array Sorted array of all discovered paths

Error Responses

400 Bad Request

{
  "error": "Domain parameter is required"
}

502 Bad Gateway

{
  "error": "Failed to fetch https://example.com: Not Found"
}

500 Internal Server Error

{
  "error": "Failed to generate sitemap",
  "message": "Error details..."
}
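A caller can branch on these status codes to produce useful messages. A minimal sketch (the function name is illustrative) mapping the documented error shapes above:

```python
import json

def describe_error(status, body):
    """Interpret the documented error responses (400, 502, 500)."""
    payload = json.loads(body)
    if status == 400:
        return f"bad request: {payload['error']}"
    if status == 502:
        return f"upstream fetch failed: {payload['error']}"
    if status == 500:
        # 500 responses carry an extra "message" field with details
        detail = payload.get("message", "")
        return f"server error: {payload['error']} ({detail})"
    return f"unexpected status {status}"
```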

Usage Examples

cURL

curl "https://sitemapper.cybercla.dev/api/sitemap/example.com?depth=5&limit=500"

# With seed paths for unlisted pages
curl "https://sitemapper.cybercla.dev/api/sitemap/example.com?seeds=/secret/blog&seeds=/hidden/page"

JavaScript (fetch)

const params = new URLSearchParams({depth: '5', limit: '500'});
// Add seed paths for pages not linked from the homepage
params.append('seeds', '/secret/blog');
params.append('seeds', '/hidden/landing-page');

// The relative URL assumes this runs on a page served from the API's own origin
const response = await fetch(`/api/sitemap/example.com?${params}`);
const data = await response.json();
console.log(data.paths);

Python (requests)

import requests

response = requests.get(
    'https://sitemapper.cybercla.dev/api/sitemap/example.com',
    params={
        'depth': 5,
        'limit': 500,
        'seeds': ['/secret/blog', '/hidden/landing-page']
    }
)
data = response.json()
print(data['paths'])