LAT Database Producer Archives Newspaper Website Home Pages

Interesting sideline project from Ben Welsh, a database producer at the LA Times. It’s called PastPages and has been set up to record hourly snapshots of the home pages of various newspaper websites.

There are currently about 80 publications being tracked, everything from the LAT and TMZ to Le Monde and The Guardian. Welsh quickly exceeded a Kickstarter fundraising goal of $5,000 and says he’s looking forward to building out his non-profit venture:

Currently, the site just takes an image snapshot of the front pages but in the future PastPages will scrape and host all the HTML, images and code running on the website. This will create an archive which is searchable by keyword and there are also plans to create an API which would allow other programmers to create new projects and mash-ups with the site’s data…

“I would view the site as a success if someone was studying the media coverage of the US election and came to me and said, ‘could you give me the database of everything you have?’ That would be really satisfying.”

Welsh, who launched PastPages earlier this month, says he got the idea last year during the Arab Spring. He’s sketching out his expansion plans here.