So I am supposed to be studying for my Wilderness First Aid course, but most of the content is online… and to make things worse, it turns out it is a super long weekend, so the libraries are all closed. I didn’t want to impose on the Harts, but I was reliant on an internet connection, so I set about creating a mirror. I tried HTTrack, Offline Explorer, and Teleport Pro, but none of these worked, and it looked like I would have to resort to a fairly lengthy process using wget, as described on this blog: http://lotomas.net/2012/09/how-to-mirror-a-moodle-site-with-wget-and-dont-die-trying-it/comment-page-1/#comment-199485

And so I started…

Wow! After many, many hours I finally have this kind of working. Thanks for the information, Tomas. I would never have got this working without you!

So… I installed VirtualBox and then a Linux OS. Made a shared folder. Downloaded the wget source. The latest version is wget-1.19.1… but the last one in the list is 1.19, which turns out to have some bugs and wouldn’t compile, so lots of time wasted there… In 1.19.1, line 422 is the one you’ll need to find (link to my modified file).

Change:

if (descend)
to:
if (descend && acceptable (file)) //TRM

Spent a long time downloading dependencies… here is a possibly incomplete list that I needed along with the error/warning:

CONFIGURE:

configure: error: C compiler cannot create executables
sudo apt install libc6-dev

configure: error: Package requirements (gnutls) were not met:
No package ‘gnutls’ found
sudo apt-get install libgnutls-dev

MAKE: (maybe these weren’t that important… maybe I was using 1.19.1?)

WARNING: ‘aclocal-1.15’ is missing on your system.
sudo apt-get install automake

WARNING: ‘makeinfo’ is missing on your system. You should only need it if
sudo apt-get install texinfo

I might have needed to run autoreconf as well… but I did so many things I can’t be sure this was necessary…
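For anyone repeating this, here is a rough sketch that gathers the packages from the list above into one install step. It assumes a Debian/Ubuntu guest with apt-get; the function names and the DRY_RUN switch are made up for illustration, and libgnutls-dev is the name from my notes (newer releases may package it under a different name).

```shell
# Packages named in the list above, in the order the errors appeared.
# Assumption: a Debian/Ubuntu system where these names still exist.
WGET_BUILD_DEPS="libc6-dev libgnutls-dev automake texinfo"

install_deps() {
    # DRY_RUN=1 (the default) prints the command instead of running it.
    if [ "${DRY_RUN:-1}" = 1 ]; then runner=echo; else runner=; fi
    # WGET_BUILD_DEPS is left unquoted on purpose so it splits into words.
    $runner sudo apt-get install -y $WGET_BUILD_DEPS
}

build_wget() {
    # Run this from inside the unpacked (and patched) wget source directory.
    ./configure && make
}
```

Calling install_deps on its own only prints the apt-get line; DRY_RUN=0 install_deps actually runs it.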

So after that I still had some issues…

Length: unspecified [text/html] Protocol error
Cannot write to (Protocol error).

This ended up being because the VirtualBox share was on Windows, so there were some invalid characters in the filenames. Adding --restrict-file-names=windows to the wget call resolved this.
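To see why a Moodle URL trips this up, here is a small sketch that checks a name for characters a Windows filesystem will not accept. As far as I understand the wget manual, --restrict-file-names=windows escapes this same set (plus control characters) and uses @ instead of ? before the query string. The function name is hypothetical.

```shell
# Succeeds (exit status 0) if the name contains one of the characters
# that Windows filesystems reject in file names: \ : * ? " < > |
has_windows_unsafe_chars() {
    case "$1" in
        *\\*|*:*|*\**|*\?*|*\"*|*\<*|*\>*|*\|*) return 0 ;;
        *) return 1 ;;
    esac
}

# A typical Moodle page name contains "?", so it cannot be written
# to a Windows-backed share as-is.
if has_windows_unsafe_chars 'view.php?id=12'; then
    echo "view.php?id=12 needs escaping on a Windows share"
fi
```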

After that things started working. I don’t know if I retrieved the cookies file with wget or the Chrome extension cookies.txt. I think both ended up being ok, but I was having problems with the site policy page for a while and the mirror seemed not to be getting past that…

Once I got the mirror going, I noticed that it still seemed to be downloading the calendar pages, so I added *?time=* to the --reject list. I probably should have added some other things (maybe Moodle has changed how they structure pages or something), but I let it run…
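Putting the pieces together, the call would look roughly like the sketch below. Only --restrict-file-names=windows and the *?time=* reject pattern come from my notes; the other flags are standard wget options I am assuming would be used, and the function name, the DRY_RUN switch, and the cookies.txt default are hypothetical.

```shell
# Sketch only: apart from --restrict-file-names and --reject, every flag
# and the cookies.txt default are assumptions, not the exact command used.
mirror_site() {
    url="$1"                       # start page of the course (you supply this)
    cookies="${2:-cookies.txt}"    # cookies exported from a logged-in browser

    # DRY_RUN=1 (the default) prints the command instead of running it.
    if [ "${DRY_RUN:-1}" = 1 ]; then runner=echo; else runner=; fi

    $runner wget \
        --mirror \
        --convert-links \
        --page-requisites \
        --adjust-extension \
        --load-cookies "$cookies" \
        --restrict-file-names=windows \
        --reject '*?time=*' \
        "$url"
}
```

Running mirror_site with a URL prints the full command so it can be checked first; DRY_RUN=0 mirror_site with the same URL starts the download. Note that the reject pattern only skips calendar pages if wget was built with the patch above.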
I was a little concerned/disappointed when I opened the just-finished mirror… Most of the content was missing… but the images and sound files were in the root directory!?
After some digging, I found that the pages I wanted were in ./mod/page. For some reason wget hadn’t mirrored the site properly and many of the links weren’t working. BUT the html files in the ./mod/page directory are all ordered, and the links in each “lesson” in the tree on the left of the Moodle page worked, so I have a mostly useful mirror. Enough for me to study without internet… it just took longer than I would have liked and the long weekend is nearly finished! So much procrastinateneering!
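Since the top-level links were broken, a quick way to find the lesson pages is to list everything under ./mod/page. A minimal sketch, assuming the pages were saved with an .html extension; the function name is made up.

```shell
# List the mirrored lesson pages under <mirror root>/mod/page, sorted by name.
# The argument is the directory wget created; it defaults to the current one.
list_pages() {
    find "${1:-.}/mod/page" -type f -name '*.html' | sort
}
```

Run list_pages from the mirror directory and open any of the listed files in a browser.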

Thanks!