ArchiveBox

Active

Country: United States (US)
Founded: 2017
Developer: ArchiveBox (open-source project, lead developer not specified)
Pricing: Free (open-source)
Open source: Yes
Platforms: Linux, macOS, Docker

Indexes

Read Later, Web Clipping

Overview

ArchiveBox is a self-hosted tool that archives web content from URLs, bookmarks, RSS feeds, and browser history. It renders each page in a headless browser and saves it in formats such as HTML, PDF, PNG, and WARC, and it extracts media, git repositories, audio, video, and other assets into accessible folders. It is aimed at individuals preserving their own browsing data, at researchers, and at organizations with compliance requirements. It can archive private content through Chrome profiles, and it stores archives locally rather than on a centralized service.

Key Features

Multi-format Archiving - Saves pages in HTML, PDF, PNG, WARC, and other formats using headless Chrome, wget, and yt-dlp.
Content Extraction - Automatically pulls media, articles, git repos, audio, video, subtitles, images, and PDFs from pages.
Import Sources - Supports bulk imports from bookmarks, RSS feeds, Pocket, Pinboard, and browser history.
CLI and Web UI - Operates via a command-line interface or an optional Django-based web interface for adding and viewing archives.
Scheduled Archiving - Imports from supported sources on a schedule or in real time.
Authenticated Archiving - Uses Chrome user profiles with cookies to access login-required or paywalled content.
Modular Dependencies - Bundles tools such as Chrome, wget, and readability, and supports storage backends such as S3 or Google Drive.
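
As a rough illustration of the CLI and scheduled archiving features above, the following Python sketch builds `archivebox add` and `archivebox schedule` invocations as argument lists. The helper names (`build_add_command`, `build_schedule_command`, `run_in_collection`) are hypothetical and not part of ArchiveBox. The `--depth` and `--every` flags are used as described in the project's own usage documentation, but they may differ between releases, so check `archivebox add --help` for the installed version before relying on them.

```python
import subprocess
from typing import List, Sequence


def build_add_command(urls: Sequence[str], depth: int = 0) -> List[str]:
    """Build an `archivebox add` invocation for one or more URLs.

    depth=0 archives only the given URLs; depth=1 also follows their links.
    """
    if depth not in (0, 1):
        raise ValueError("depth must be 0 or 1")
    if not urls:
        raise ValueError("at least one URL is required")
    return ["archivebox", "add", f"--depth={depth}", *urls]


def build_schedule_command(source_url: str, every: str = "day", depth: int = 1) -> List[str]:
    """Build an `archivebox schedule` invocation that re-imports a feed periodically."""
    if depth not in (0, 1):
        raise ValueError("depth must be 0 or 1")
    return ["archivebox", "schedule", f"--every={every}", f"--depth={depth}", source_url]


def run_in_collection(command: Sequence[str], data_dir: str) -> subprocess.CompletedProcess:
    """Run a command inside an ArchiveBox data directory.

    Assumes ArchiveBox is installed and data_dir was created with `archivebox init`.
    """
    return subprocess.run(list(command), cwd=data_dir, check=True)


if __name__ == "__main__":
    # Only prints the commands; nothing is executed here.
    print(" ".join(build_add_command(["https://example.com"])))
    print(" ".join(build_schedule_command("https://example.com/feed.xml")))
```

Building the argument list separately from running it keeps the sketch testable on a machine without ArchiveBox and avoids shell quoting problems with URLs that contain `&` or `?`.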

Pricing

Plan	Price	Includes
Community	Free	Full open-source features, self-hosted on own hardware.
Self-Hosted	Free	CLI, web UI, API access with custom configuration and storage.

Platforms & Requirements

Runs natively on Linux and macOS, and through Docker on any system that supports Docker. It requires Node.js, Python 3, and Chrome or Chromium. Basic use needs at least 2GB of RAM, and large archives need more. On Windows, support is limited to Docker or WSL.
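
For a non-Docker install, a quick pre-flight check of the requirements listed above might look like the following sketch. The executable names in `REQUIREMENTS` are assumptions about common package naming (Chromium in particular ships under several names), and `find_missing` is a hypothetical helper, not something ArchiveBox provides.

```python
import shutil
from typing import Callable, Dict, List, Optional, Sequence

# Candidate executable names per requirement. These names are assumptions;
# adjust them to match how the tools are packaged on your system.
REQUIREMENTS: Dict[str, Sequence[str]] = {
    "Node.js": ("node", "nodejs"),
    "Python 3": ("python3",),
    "Chrome/Chromium": ("chromium", "chromium-browser", "google-chrome"),
}


def find_missing(
    requirements: Dict[str, Sequence[str]] = REQUIREMENTS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the labels of requirements with no candidate executable on PATH."""
    missing = []
    for label, candidates in requirements.items():
        if not any(which(name) for name in candidates):
            missing.append(label)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed requirements were found on PATH.")
```

The check only confirms that an executable with a matching name is on `PATH`; it does not verify versions or that Chrome can actually start in headless mode.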

Integrations & Ecosystem

Browser history (Chrome, Firefox)
Bookmarks (HTML, JSON)
RSS feeds
Pocket/Pinboard
yt-dlp for media
S3, Google Drive, NFS/SMB storage
REST API (alpha)
Python API (beta)

Alternatives

App	Difference
Webrecorder	Browser-based recorder focused on interactive session capture rather than bulk CLI imports.
SingleFile	Lightweight single-HTML archiver without multi-format extraction or scheduling.
Wallabag	Read-it-later service with article extraction but less emphasis on full-page and media archiving.
Archive.org's Save Page Now	Public web service for single URLs without private content or local storage support.

Reputation

ArchiveBox is regarded as a robust, privacy-focused archiving solution for power users comfortable with self-hosting and CLI tools. Its strengths include comprehensive format support and extraction capabilities beyond those of public services. Criticisms center on setup complexity, the dependency on Chrome, and security caveats for authenticated archiving, including warnings against using personal Chrome profiles until fixes are implemented.
