
πŸ›οΈ Website Archiver

Archive entire websites as static snapshots before shutdown.

License: MIT Β· Status: Stable Β· Archive Tool

πŸ—‚οΈ Browse archived sites:
https://archive.helsingborg.io/archive/

🌐 Live archive site:
https://archive.helsingborg.io/


βš™οΈ How It Works

  1. Provide the website’s domain manually.
  2. The GitHub Action fetches the sitemap (/sitemap.xml).
  3. Every page listed is downloaded using wget:
    • All HTML, images, CSS, JS, and assets are saved.
    • Links are converted to relative URLs.
    • External media (e.g. CDN images) are included if listed in EXTRA_DOMAINS.
  4. The archive is stored as:
    /domain/YYYY-MM-DD/
  5. The workflow commits the archived files to the repository.
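Steps 2–3 above can be sketched in shell. The `<loc>` extraction below is illustrative only (the real workflow may parse the sitemap differently), and an inline sample stands in for the fetched `/sitemap.xml`:

```shell
# Illustrative sketch of sitemap handling, not the actual workflow code.
# In the action, the sitemap would come from: curl -s "$SITE_URL/sitemap.xml"
SITEMAP='<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>'

# Pull every <loc> entry out of the sitemap; each one is a page to download.
URLS=$(printf '%s\n' "$SITEMAP" \
  | grep -o '<loc>[^<]*</loc>' \
  | sed -e 's|<loc>||' -e 's|</loc>||')
printf '%s\n' "$URLS"
```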

🧰 Requirements


πŸš€ Usage

  1. Go to Actions β†’ Archive Website.
  2. Enter the URL to archive.
  3. Wait for the workflow to finish.
  4. Find the snapshot under /domain/YYYY-MM-DD/.
  5. View the result in the archive browser.
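The same run can also be started from a terminal with the GitHub CLI. Both the workflow file name and the input name below are assumptions; check the files under .github/workflows/ in this repository for the actual names:

```shell
# Hypothetical dispatch via the GitHub CLI: "archive.yml" and the "url"
# input are assumed names, not confirmed by this repository.
# The command is echoed rather than run; copy/paste it once the names
# are verified against .github/workflows/.
DISPATCH_CMD='gh workflow run archive.yml -f url=https://example.com'
echo "$DISPATCH_CMD"
```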

🧩 Technical Details


πŸ•°οΈ Typical Use Case

Ideal for municipal or organizational website decommissioning.
Run it once to preserve a permanent static copy for archival or legal purposes.


⚠️ Limitations


πŸ§‘β€πŸ’» Run Locally

SITE_URL="https://example.com" \
EXTRA_DOMAINS="media.example.com cdn.example.com" \
MAX_DEPTH=1 \
bash download.sh

Note: a bash array assignment (EXTRA_DOMAINS=(...)) cannot be passed through the environment to a child process, so the extra domains are given as a single space-separated string (adjust the separator if download.sh expects a different one).
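The environment variables above map onto a wget mirror invocation roughly as follows. This is a sketch of the behavior described under "How It Works", not the actual contents of download.sh; the flag choices are assumptions:

```shell
#!/usr/bin/env bash
# Sketch only: approximates what download.sh might run, based on the
# documented behavior. Flags and structure are assumptions.
SITE_URL="https://example.com"
EXTRA_DOMAINS="media.example.com cdn.example.com"
MAX_DEPTH=1

DOMAIN="${SITE_URL#*://}"                   # strip the scheme
SNAPSHOT_DIR="${DOMAIN}/$(date +%Y-%m-%d)"  # /domain/YYYY-MM-DD/ layout

# Build the command as an array so it can be inspected before running.
WGET_CMD=(
  wget
  --recursive --level="$MAX_DEPTH"   # follow links MAX_DEPTH levels deep
  --page-requisites                  # also fetch CSS, JS, and images
  --convert-links                    # rewrite links to relative URLs
  --adjust-extension                 # save pages with .html extensions
  --span-hosts                       # allow the extra (CDN) hosts...
  --domains="${DOMAIN},${EXTRA_DOMAINS// /,}"  # ...but only these hosts
  --directory-prefix="$SNAPSHOT_DIR"
  "$SITE_URL"
)
echo "Would run: ${WGET_CMD[*]}"   # swap the echo for "${WGET_CMD[@]}" to download
```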

πŸ“œ License

MIT Β© Helsingborg Stad