πŸ—ΊοΈ Sitemap API

Automatically generate sitemaps by crawling any website

API Endpoints

GET /api/sitemap/:domain

Crawls a website and returns all discovered page paths in JSON format.

GET /api/sitemap/:domain/json

Returns sitemap in JSON format (same as above).

GET /api/sitemap/:domain/xml

Returns sitemap in XML format (standard sitemap.xml format for search engines).

Parameters

Path Parameters

Parameter Type Description
domain string The domain to crawl (the http:// or https:// scheme is optional)

Query Parameters

Parameter Type Default Range Description
depth integer 5 0-10 Maximum crawling depth
limit integer 500 1-1000 Maximum pages to crawl
seeds string[] none - Additional page paths to use as crawl starting points. Useful for discovering pages that aren't linked from the homepage (e.g. hidden or unlisted pages). Can be repeated.
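Because seeds can appear multiple times, the query string needs one seeds entry per path. As an illustrative sketch, Python's urlencode accepts a list of key/value pairs for exactly this case:

```python
from urllib.parse import urlencode

# A list of (key, value) pairs lets "seeds" repeat in the query string.
params = [
    ("depth", 5),
    ("limit", 500),
    ("seeds", "/secret/blog"),
    ("seeds", "/hidden/landing-page"),
]
query = urlencode(params)
url = f"https://sitemapper.cybercla.dev/api/sitemap/example.com?{query}"
# query → "depth=5&limit=500&seeds=%2Fsecret%2Fblog&seeds=%2Fhidden%2Flanding-page"
```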

Features

πŸ›‘οΈ Infinite Loop Protection
Tracks visited URLs to prevent crawling the same page twice
⏱️ Timeout Protection
Automatically aborts after 30 seconds
🎯 Same-Domain Only
Only follows links within the target domain
πŸ“„ HTML Pages Only
Skips images, PDFs, and other non-HTML content
πŸ”— Smart URL Handling
Handles relative, absolute, and protocol-relative URLs
🌱 Seed Paths
Add custom starting points to discover unlisted or hidden pages that aren't linked from the homepage
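Taken together, the features above describe a bounded breadth-first crawl: a visited set, a depth limit, a page limit, and a same-domain filter. A minimal sketch in Python, assuming nothing about the service's actual implementation (all names are illustrative, and link fetching is injected as a function so the loop itself stays self-contained; the real service additionally skips non-HTML content):

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def crawl(fetch_links, start_url, max_depth=5, max_pages=500):
    """BFS crawl honoring the documented limits.

    fetch_links(url) -> iterable of hrefs found on that page
    (injected so the sketch works without network access).
    """
    domain = urlparse(start_url).netloc
    visited = set()                                # infinite-loop protection
    queue = deque([(start_url, 0)])
    while queue and len(visited) < max_pages:      # page limit
        url, depth = queue.popleft()
        if url in visited or depth > max_depth:    # depth limit
            continue
        visited.add(url)
        for href in fetch_links(url):
            # urljoin resolves relative, absolute, and protocol-relative hrefs
            absolute = urljoin(url, href)
            if urlparse(absolute).netloc == domain:  # same-domain only
                queue.append((absolute, depth + 1))
    # Return the sorted unique paths, "/" for the root
    return sorted({urlparse(u).path or "/" for u in visited})
```

For example, a page graph where the homepage links to /about, a protocol-relative //example.com/blog, and an off-domain https://other.com/x yields ["/", "/about", "/blog"]: the off-domain link is dropped and each page is visited once.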

Example Requests

JSON Format

GET /api/sitemap/example.com/json?depth=5&limit=500

XML Format

GET /api/sitemap/example.com/xml?depth=5&limit=500

With Seed Paths

Include unlisted pages as additional crawl starting points:

GET /api/sitemap/example.com/json?seeds=/secret/blog&seeds=/hidden/landing-page

Example Response (JSON)

{
  "domain": "example.com",
  "url": "https://example.com",
  "maxDepth": 5,
  "maxPages": 500,
  "pagesCrawled": 23,
  "totalPaths": 23,
  "paths": [
    "/",
    "/about",
    "/blog",
    "/blog/post-1",
    "/blog/post-2",
    "/contact",
    "/products",
    "/services"
  ]
}
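The paths array holds relative paths; combining them with the url field yields absolute URLs. A brief sketch using the example response above:

```python
# Fields taken from the example JSON response
data = {
    "url": "https://example.com",
    "totalPaths": 3,
    "paths": ["/", "/about", "/blog"],
}

base = data["url"].rstrip("/")              # avoid a double slash at the root
full_urls = [base + path for path in data["paths"]]
# full_urls → ["https://example.com/", "https://example.com/about", "https://example.com/blog"]
```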

Example Response (XML)

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
  <url>
    <loc>https://example.com/blog</loc>
  </url>
  <url>
    <loc>https://example.com/contact</loc>
  </url>
</urlset>
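The XML response uses the standard sitemap namespace, so reading the loc entries requires a namespace-aware query. An illustrative sketch with Python's ElementTree, parsing a trimmed version of the example above:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example XML response
xml_body = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

# Map a prefix to the sitemap namespace so findall can match the elements
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(xml_body)
locs = [loc.text for loc in root.findall("sm:url/sm:loc", ns)]
# locs → ["https://example.com/", "https://example.com/about"]
```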

Response Fields

Field Type Description
domain string The domain that was crawled
url string The full URL of the starting page
maxDepth integer Maximum depth used for crawling
maxPages integer Maximum number of pages to crawl
pagesCrawled integer Number of pages actually crawled
totalPaths integer Total unique paths found
paths array Sorted array of all discovered paths

Error Responses

400 Bad Request

{
  "error": "Domain parameter is required"
}

502 Bad Gateway

{
  "error": "Failed to fetch https://example.com: Not Found"
}

500 Internal Server Error

{
  "error": "Failed to generate sitemap",
  "message": "Error details..."
}
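A caller can branch on these status codes to produce useful messages. A minimal sketch (the function name is illustrative) mapping the documented error shapes above:

```python
import json

def describe_error(status, body):
    """Interpret the documented error responses (400, 502, 500)."""
    payload = json.loads(body)
    if status == 400:
        return f"bad request: {payload['error']}"
    if status == 502:
        return f"upstream fetch failed: {payload['error']}"
    if status == 500:
        # 500 responses carry an extra "message" field with details
        detail = payload.get("message", "")
        return f"server error: {payload['error']} ({detail})"
    return f"unexpected status {status}"
```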

Usage Examples

cURL

curl "https://sitemapper.cybercla.dev/api/sitemap/example.com?depth=5&limit=500"

# With seed paths for unlisted pages
curl "https://sitemapper.cybercla.dev/api/sitemap/example.com?seeds=/secret/blog&seeds=/hidden/page"

JavaScript (fetch)

const params = new URLSearchParams({depth: '5', limit: '500'});
// Add seed paths for pages not linked from the homepage
params.append('seeds', '/secret/blog');
params.append('seeds', '/hidden/landing-page');

// The relative URL assumes this runs on a page served from the API's own origin
const response = await fetch(`/api/sitemap/example.com?${params}`);
const data = await response.json();
console.log(data.paths);

Python (requests)

import requests

response = requests.get(
    'https://sitemapper.cybercla.dev/api/sitemap/example.com',
    params={
        'depth': 5,
        'limit': 500,
        'seeds': ['/secret/blog', '/hidden/landing-page']
    }
)
data = response.json()
print(data['paths'])