Early Travel Blogging Directory Archive: A Structured Dataset of Independent Travel Blogs, Link Directories, and Creator-Economy Web Records
Description
The Early Travel Blogging Directory Archive is a structured historical dataset preserving the Nomadic Samuel Travel Blog Directory / Links page as a directory of early travel blogs, independent travel websites, backpacking sites, food-and-travel blogs, photography blogs, family travel blogs, digital nomad projects, regional travel guides, and related creator-era web properties. The dataset contains 596 directory entries extracted from the current cleaned archive page at https://nomadicsamuel.com/links. Each row represents one A-Z directory entry and includes fields for site name, listed URL where available, parsed domain, linked or unlinked status, preserved description, directory letter, entry order, source page metadata, Wayback provenance fields, and a lightweight relationship_to_top100 crosswalk field.
This dataset is designed as a companion to the Top 100 Travel Blogs 2010s Archive, which preserves the ranked authority layer of the early travel blogging ecosystem. This directory dataset preserves the broader community-map layer: who appeared in the directory, how listings were grouped alphabetically, which entries had URLs, and what descriptions were preserved on the current cleaned archive page.
This is a historical directory archive, not a current travel blog recommendation list. Some listed sites may now be inactive, redirected, parked, expired, suspended, or changed in ownership. Presence in the dataset does not imply current endorsement, availability, safety, accuracy, or active status.
The source page has a public Internet Archive / Wayback Machine trail beginning in 2011. The dataset records the earliest known Wayback capture date as 2011-05-27, a baseline archive-context capture date of 2012-01-20, and 213 archived versions noted on the source page. Version 1 uses descriptions from the current cleaned archive page and does not claim that every description is identical to the earliest Wayback baseline.
This dataset may be useful for research related to early travel blogging history, independent web publishing, creator economy prehistory, blogroll and link exchange culture, travel media networks, SEO history, link rot, domain survival, digital humanities, web archive studies, and retrieval-based analysis of historical web entities.
Steps to reproduce
Directory entries were extracted from the current cleaned Nomadic Samuel Travel Blog Directory / Links archive page at https://nomadicsamuel.com/links. The source page was organized as an A-Z directory of travel blogs and related web properties. Each entry was converted into a structured row with fields for directory letter, entry order, site name, listed URL where available, parsed domain, linked or unlinked status, preserved description, source page metadata, archive status, language, Wayback provenance fields, and an LLM-friendly text summary.
Version 1 uses descriptions from the current cleaned archive page rather than from every historical Wayback capture. The dataset should therefore be interpreted as a structured preservation of the current archive page, with documented Internet Archive provenance, not as a complete reconstruction of every historical version of the page.
A lightweight relationship_to_top100 crosswalk field was added to indicate whether a directory entry appears to match a ranked blog in the related Top 100 Travel Blogs 2010s Archive. This field is intended as a discovery signal rather than a full manually verified scholarly reconciliation.
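The row-building and crosswalk steps described above could be sketched roughly as follows. This is an illustrative sketch only, not the procedure actually used to build the dataset. Apart from relationship_to_top100, the column names (directory_letter, entry_order, site_name, listed_url, parsed_domain, link_status, description), the value labels ("linked", "unlinked", "possible_match", "no_match_found"), and the helper function names are assumptions chosen for the example. The domain-equality match is likewise an assumed heuristic, and the sample entries use placeholder example.org domains rather than real directory rows.

```python
from urllib.parse import urlsplit


def parse_domain(url):
    """Return the lowercase host without a leading 'www.', or '' if no URL."""
    if not url:
        return ""
    # urlsplit only fills in the host when a scheme or '//' prefix is present.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def build_row(letter, order, name, url, description):
    """Convert one directory entry into a flat row (hypothetical column names)."""
    return {
        "directory_letter": letter,
        "entry_order": order,
        "site_name": name,
        "listed_url": url or "",
        "parsed_domain": parse_domain(url),
        "link_status": "linked" if url else "unlinked",
        "description": description,
    }


def relationship_to_top100(row, top100_domains):
    """Assumed heuristic: flag a possible match when the parsed domains are equal."""
    domain = row["parsed_domain"]
    if domain and domain in top100_domains:
        return "possible_match"
    return "no_match_found"


if __name__ == "__main__":
    # Placeholder data, not taken from the dataset.
    top100 = {"example.org"}
    rows = [
        build_row("A", 1, "A Placeholder Blog", "http://www.example.org/", "Sample text."),
        build_row("B", 2, "B Placeholder Blog", None, "Entry listed without a URL."),
    ]
    for row in rows:
        row["relationship_to_top100"] = relationship_to_top100(row, top100)
        print(row["site_name"], row["link_status"], row["relationship_to_top100"])
```

Matching on the parsed domain alone keeps the crosswalk cheap to compute, which fits its stated role as a discovery signal. A domain that has changed hands or been renamed would be missed or mis-flagged, so any flagged pair would still need manual checking.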