02.03.08
Complete Site Archives, Posts as PDF Downloads

One reader has suggested that we make this Web site easier to download or explore in PDF format. An archives page was born just minutes ago:
This is a large page, so be patient.
Each post can be downloaded as a PDF, but it remains rather complicated to produce PDFs for entire months, weeks, categories, or even one PDF containing the entire site. Would there be any interest in this? It would require a great deal of effort and time. █

anonymous said,
February 3, 2008 at 12:34 pm
“but it remains rather complicated to produce PDFs for entire months, weeks, categories, or even one PDF containing the entire site. Would there be any interest in this? It requires a great amount of effort and time.”
Yes, it would be worth it, even if you were to archive one week at a time and put it up slowly. Consider how many worthwhile pro-Linux sites go down and how many articles and links are lost. Not everything can be picked up by archive.org, and even then, articles there are sometimes lost or (gasp) removed at the request of the former site owner, or even of the enemy posing as someone legit! Take, for example, many of the old Corel Linux and Microsoft articles, which are vanishing from the web at a fast pace.
Your site is a goldmine; please preserve it so people may pore over past articles, just as we search the web for old news and research about old events that tie into new and current ones!
Thank YOU! It will be worth the effort, I assure you. While it will probably require some manual effort, couldn’t the process be automated somehow with a script, perhaps in Perl? Ask on Freenode IRC; someone there should know.
Cheers!
Repre Hendor said,
February 3, 2008 at 4:16 pm
PDFs of the whole site would only be worth having if you could make sure they had working hyperlinks embedded (and also a readable hyperlink target shown alongside each link, so that it isn’t lost when printed).
Otherwise, consider all advice to go for PDFs as just a weapon your adversaries use to make you do useless work and waste time instead of doing useful research and publishing…
Roy Schestowitz said,
February 3, 2008 at 7:51 pm
My bigger concern (Groklaw has the same problem) is the graveyard of 404s that external links leave behind over time. The PDFs preserve none of the peripheral resources, so it may be worth automating the archival of linked pages with wget. Some friends and I thought about scripting this about a year ago, but it never materialised. I’ll have a think about it and see what takes the least effort and preserves the most.