# Website Archiver
Archive entire websites as static snapshots before shutdown.

- Browse archived sites: https://archive.helsingborg.io/archive/
- Live archive site: https://archive.helsingborg.io/
## How It Works
- Provide the website's domain manually.
- The GitHub Action fetches the sitemap (`/sitemap.xml`).
- Every page listed in the sitemap is downloaded using `wget` (see the sketch after this list):
  - All HTML, images, CSS, JS, and assets are saved.
  - Links are converted to relative URLs.
  - External media (e.g. CDN images) are included if listed in `EXTRA_DOMAINS`.
- The archive is stored as `/domain/YYYY-MM-DD/`.
- The workflow commits the archived files to the repository.
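
The download step boils down to a `wget` mirror driven by the sitemap URLs. The invocation below is a minimal sketch of that step, not the exact command in the workflow; `example.com`, `media.example.com`, and `urls.txt` are placeholder names:

```bash
# urls.txt holds the page URLs extracted from /sitemap.xml, one per line.
# --page-requisites also fetches the CSS, JS, and images each page needs.
# --convert-links rewrites links so the snapshot works with relative URLs.
# --span-hosts/--domains allow assets from the site plus EXTRA_DOMAINS hosts.
wget \
  --input-file=urls.txt \
  --page-requisites \
  --convert-links \
  --adjust-extension \
  --span-hosts \
  --domains="example.com,media.example.com" \
  --directory-prefix="example.com/$(date +%F)" \
  --no-verbose
```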
## Requirements
- GitHub repository for the site archive.
- The site must have a valid `sitemap.xml` (a quick check is shown below).
- (Optional) Enable GitHub Pages to serve the archive.
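
To confirm the sitemap exists and lists pages before running the workflow, a quick check with `curl` and `grep` is enough; `example.com` is a placeholder:

```bash
# Fail loudly if the sitemap is missing or unreachable.
curl -sSf "https://example.com/sitemap.xml" -o sitemap.xml

# Count how many page URLs (<loc> entries) it contains.
grep -o "<loc>" sitemap.xml | wc -l
```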
## Usage
- Go to Actions → Archive Website (or dispatch the workflow from the command line, as sketched below).
- Enter the URL to archive.
- Wait for the workflow to finish.
- Find the snapshot under `/domain/YYYY-MM-DD/`.
- View the result in the archive browser.
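
The workflow can also be dispatched with the GitHub CLI. The workflow file name (`archive.yml`) and input name (`site_url`) below are assumptions; check `.github/workflows/` for the actual names:

```bash
# Trigger a manual run of the archive workflow with the site URL as input.
gh workflow run archive.yml -f site_url="https://example.com"

# Follow the latest run of that workflow until it completes.
gh run watch "$(gh run list --workflow=archive.yml --limit 1 --json databaseId --jq '.[0].databaseId')"
```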
## Technical Details
- Uses `wget` to mirror sites.
- URLs are rewritten as relative links (`--convert-links`).
- External assets from domains defined in `EXTRA_DOMAINS` are included.
- Only URLs listed in `sitemap.xml` are processed (see the extraction sketch below).
- Commits results to `/domain/YYYY-MM-DD/`.
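
Extracting the page list from the sitemap needs only standard shell tools. This is a minimal sketch that assumes a flat `sitemap.xml` rather than a sitemap index:

```bash
# Pull every <loc>...</loc> value out of the sitemap, one URL per line.
curl -s "https://example.com/sitemap.xml" \
  | grep -o "<loc>[^<]*</loc>" \
  | sed -e 's|<loc>||' -e 's|</loc>||' > urls.txt

wc -l urls.txt   # number of pages that would be archived
```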
## Typical Use Case
Ideal for municipal or organizational website decommissioning.
Run once to permanently preserve a static version for archival or legal purposes.
## Limitations
- Only pages in the sitemap are archived.
- Dynamic content (forms, search, JS-rendered pages) is not captured.
- Sites requiring authentication are not supported.
## Run Locally
```bash
SITE_URL="https://example.com" \
EXTRA_DOMAINS=("media.example.com" "cdn.example.com") \
MAX_DEPTH=1 \
bash download.sh
```
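
To preview a snapshot produced by the script, any static file server works; the path below assumes the default `domain/YYYY-MM-DD/` layout with `example.com` as the domain:

```bash
# Serve the freshly created snapshot at http://localhost:8080
cd "example.com/$(date +%F)" && python3 -m http.server 8080
```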
## License
MIT © Helsingborg Stad