GNU’s wget command-line download tool is very popular, and not without reason. You can use it simply to retrieve a single file from a server, but it can do a great deal more than that.
One of wget’s more advanced features is mirroring. This allows you to create a complete local copy of a website, including stylesheets, images and other supporting files. wget follows all internal links and downloads the pages they point to, along with their resources, until you have a complete copy of the site on your local machine.
In its most basic form, you use the mirror functionality like so:
$ wget -m http://www.example.com/
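For reference, -m is shorthand for a handful of other options. According to wget’s manual, it is currently equivalent to -r -N -l inf --no-remove-listing. Here is a minimal sketch of the same command spelled out in long form, wrapped in a function whose name (mirror_long_form) is made up for this example:

```shell
# Hypothetical helper: the name mirror_long_form is made up for illustration.
# Spells out what -m stands for, per the wget manual:
#   --recursive           follow links and download what they point to
#   --timestamping        skip files that are not newer than the local copy
#   --level=inf           no limit on recursion depth
#   --no-remove-listing   keep FTP directory listings
# Usage: mirror_long_form <url>
mirror_long_form() {
    wget --recursive --timestamping --level=inf --no-remove-listing "$1"
}
```

Calling mirror_long_form http://www.example.com/ should behave the same as the -m command above.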
There are several issues you might have with this approach, however.
First of all, it’s not very useful for local browsing, because the links in the pages still point to the real URLs and not to your local downloads. If, say, you downloaded http://www.example.com/, the link on that page to http://www.example.com/page2.html would still point to example.com’s server, which is a right pain if you’re trying to browse your local copy of the site while offline.
To fix this, you can use the -k option (--convert-links) in conjunction with the mirror option:
$ wget -mk http://www.example.com/
Now the link I mentioned earlier will point to the relative page2.html. The same happens with images, stylesheets and other resources, so you should now get an authentic offline browsing experience.
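One way to check the result: with -k, wget rewrites links to files it downloaded as relative links, and leaves links to files it did not download pointing at the full address. So any page in the local copy that still mentions the site’s URL links to something that is not in your mirror. A rough sketch of such a check follows; the function name and its arguments are made up for this example, and it assumes a grep that supports -r (GNU and BSD grep both do):

```shell
# Hypothetical helper (name and arguments are made up for this example):
# list files in a local mirror that still mention the site's absolute URL.
# Usage: find_absolute_links <mirror-directory> <site-url>
find_absolute_links() {
    # -r: search recursively, -l: print only the names of matching files,
    # -F: treat the URL as a fixed string rather than a regular expression
    grep -rlF -- "$2" "$1"
}
```

wget saves a mirror in a directory named after the host, so for the example above you would run find_absolute_links www.example.com http://www.example.com/ from the directory where you started the download. No output means no absolute links to the site remain.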
There’s one other major issue I haven’t covered yet: bandwidth. Quite apart from the bandwidth you’ll use on your own connection to pull down a whole site, you’ll be putting some strain on the remote server. Be kind and reduce the load on it (and on yourself), especially if the site is small and bandwidth comes at a premium. Play nice.
One way to do this is to deliberately slow down the download by placing a delay between requests to the server:
$ wget -mk -w 20 http://www.example.com/
This places a delay of 20 seconds between requests. Change that number to suit, and optionally add a suffix of m for minutes, h for hours, or d for … yes, days, if you want to slow the mirror down even further.
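To make the suffixes concrete, here is a small sketch that takes the delay as an argument and hands it straight to -w. The wrapper name polite_mirror is made up for this example:

```shell
# Hypothetical wrapper; the name polite_mirror is made up for this example.
# Usage: polite_mirror <delay> <url>
# <delay> is passed to -w as-is: plain seconds (20), or a number with a
# suffix such as 2m (minutes), 1h (hours) or 1d (days).
polite_mirror() {
    wget -mk -w "$1" "$2"
}
```

For instance, polite_mirror 2m http://www.example.com/ would wait two minutes between requests.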
Now if you want to make a backup of something, or download your favourite website for viewing when you’re offline, you can do so with wget’s mirror feature. To delve further, check out wget’s man page (man wget), which covers more options such as random delays, setting a custom user agent, sending cookies to the site and lots more.
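As a starting point, here is a sketch that combines those three options with the earlier command. The function name, the user agent string and the cookies.txt path are all placeholders rather than recommended values; substitute your own:

```shell
# Hypothetical wrapper: the function name, the user agent string and the
# cookies.txt path are placeholders, not values from the article.
# Usage: custom_mirror <url>
custom_mirror() {
    # --random-wait varies the pause around the -w value (0.5x to 1.5x)
    # --user-agent sets the User-Agent header sent with each request
    # --load-cookies reads cookies from a Netscape-format cookies.txt file
    wget -mk -w 20 --random-wait \
        --user-agent="ExampleMirror/1.0" \
        --load-cookies=cookies.txt \
        "$1"
}
```

You would call it as custom_mirror http://www.example.com/.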



rsync wrote:
rsync is a far more reasonable and well-suited tool for this purpose. Using the right tool for the right job is a key to being a better admin.
# Posted on 22-Apr-08 at 2:38 pm
fsdaily.com wrote:
Story added…
This story has been submitted to fsdaily.com! If you think this story should be read by the free software community, come vote it up and discuss it here:
http://www.fsdaily.com/EndUser/Create_a_mirror_of_a_website_with_Wget…
# Posted on 22-Apr-08 at 2:48 pm
Kyle wrote:
rsync is used for backing up a file system when you have ssh access to it. wget, on the other hand, can be used on any public website even if you don’t have ssh/ftp access.
# Posted on 22-Apr-08 at 3:07 pm
Shiv wrote:
Awesome tip!
It is very good for web developers who want to develop a similar kind of website.
Thanks for the post.
# Posted on 22-Apr-08 at 3:19 pm
Stuart wrote:
@Shiv: remember, the author(s) of the website you’re downloading has copyright over the *design*, including whatever code or markup powers it. Copyright does NOT just cover content!
So changing all the content but keeping the exact same layout and code may, in some cases, be an infringement that leads to somebody getting angry and asking you to remove your all-too-similar website.
Of course, this probably doesn’t apply if the design is very common, or if the site design is open source, e.g. WordPress.
# Posted on 22-Apr-08 at 3:48 pm
IKTeroak :: Make a backup copy of your website with Wget :: April :: 2008 wrote:
[…] On the FOSSwire site they explain how to use the wget command to bring a copy of a website you have on a server to your own machine. With this short tutorial you will be able to save your web page’s CSS files, images and other files locally, with all internal links preserved. Several details to keep in mind during the process are also laid out very clearly. […]
# Posted on 22-Apr-08 at 5:11 pm
E wrote:
This is only good if you want to make the end result static. If your site is a true dynamic site running something like PHP and you run these commands, you only end up with a static representation of the site as it was at that time. It’s not a true mirror. This is only good if you want to mirror static content like downloads or pictures to another site.
# Posted on 22-Apr-08 at 5:24 pm
Simon Hibbs wrote:
httrack is more feature complete for web site mirroring, but also more complex.
# Posted on 22-Apr-08 at 6:18 pm
Todd wrote:
Another valuable option is -np, for “no parent”. Say you just want to mirror http://example.com/subfolder/. By default, wget will mirror http://example.com/subfolder/ but will also follow any links that lead up to the parent folder (example.com in this case) and grab what it finds there. So the final command that I usually use on sites:
wget -mk -w 20 -np http://example.com/subfolder/
Also look into the screen command so you can “background” this and check on the status every so often.
# Posted on 22-Apr-08 at 8:44 pm
Create a Local Website Mirror with Wget [Linux Tip] · TechBlogger wrote:
[…] is both considerate and wise. Hit the link for details on using wget for offline website access. Create a mirror of a website with Wget […]
# Posted on 22-Apr-08 at 10:14 pm
Paul William Tenny wrote:
You probably want -N if you intend to do subsequent mirrors; it’ll only refresh local copies of files if the remote version is newer.
# Posted on 22-Apr-08 at 10:58 pm
Create a mirror of a website with wget - 冰古blog wrote:
[…] For more details, visit FOSSwire. […]
# Posted on 23-Apr-08 at 5:28 am
Zhenyi wrote:
… Mirror the whole internet
# Posted on 23-Apr-08 at 11:59 am
FOSSwire » More advanced wget usage wrote:
[…] recently covered how to make a mirror of a website with GNU’s wget command line program and in the comments of that post there were several […]
# Posted on 23-Apr-08 at 5:25 pm
Sharjeel Sayed wrote:
Any idea how we can use this to mirror del.icio.us ?
# Posted on 24-Apr-08 at 5:26 am
Mirror sites with wget « 0ddn1x: tricks with *nix wrote:
[…] Mirror sites with wget Filed under: Linux — 0ddn1x @ 2008-04-25 17:35:03 +0000 http://fosswire.com/2008/04/21/create-a-mirror-of-a-website-with-wget/ […]
# Posted on 25-Apr-08 at 5:35 pm
xajckop wrote:
Serbian version of that tip added to my blog.
# Posted on 30-Apr-08 at 1:25 pm
links from dupola’s bookmarks.» Blog Archive » links for 2008-05-08 wrote:
[…] FOSSwire » Create a mirror of a website with Wget (tags: wget putty ssh) […]
# Posted on 08-May-08 at 4:36 pm