    Create a mirror of a website with Wget

    GNU's wget command line program for downloading is very popular, and not without reason. While you can use it simply to retrieve a single file from a server, it is much more powerful than that and offers many more features.

    One of the more advanced features in wget is the mirror feature. This allows you to create a complete local copy of a website, including any stylesheets, supporting images and other support files. All the (internal) links will be followed and downloaded as well (and their resources), until you have a complete copy of the site on your local machine.

    In its most basic form, you use the mirror functionality like so:

    $ wget -m http://www.example.com/

    There are several issues you might have with this approach, however.

    First of all, it's not very useful for local browsing, as the links in the pages themselves still point to the real URLs and not your local downloads. What that means is that, if, say, you downloaded http://www.example.com/, the link on that page to http://www.example.com/page2.html would still point to example.com's server and so would be a right pain if you're trying to browse your local copy of the site while being offline for some reason.

    To fix this, you can use the -k option in conjunction with the mirror option:

    $ wget -mk http://www.example.com/

    Now, the link I talked about earlier will point to the relative page2.html. The same happens with all images, stylesheets and resources, so you should now be able to get an authentic offline browsing experience.

    There's one other major issue I haven't covered here yet - bandwidth. Quite apart from the bandwidth you'll be using on your own connection to pull down a whole site, you're going to be putting some strain on the remote server. You should think about being kind and reducing the load on them (and on you), especially if the site is small and its bandwidth comes at a premium. Play nice.

    One of the ways in which you can do this is to deliberately slow down the download by placing a delay between requests to the server.

    $ wget -mk -w 20 http://www.example.com/

    This places a delay of 20 seconds between requests. Change the number to suit, and optionally add a suffix of m for minutes, h for hours, or d for ... yes, days, if you want to slow down the mirror even further.
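    To make the suffixes concrete, here is a small sketch of the arithmetic they imply. The to_seconds helper is hypothetical and not part of wget; wget parses the suffix itself, so this only illustrates how long each request will wait.

```shell
# to_seconds: hypothetical helper (not part of wget) that converts a
# delay such as 20, 2m, 1h or 1d into plain seconds.
to_seconds() {
    value=${1%[mhd]}       # numeric part, with any unit letter removed
    suffix=${1#"$value"}   # the unit letter, or empty when none was given
    case $suffix in
        m) echo $((value * 60)) ;;      # minutes
        h) echo $((value * 3600)) ;;    # hours
        d) echo $((value * 86400)) ;;   # days
        *) echo "$value" ;;             # no suffix: already seconds
    esac
}

to_seconds 20    # prints 20
to_seconds 2m    # prints 120
```

    So passing -w 2m in place of -w 20 would leave two minutes, rather than twenty seconds, between requests.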

    Now if you want to make a backup of something, or download your favourite website for viewing when you're offline, you can do so with wget's mirror feature. To delve even further into this, check out wget's man page (man wget) where there are further options, such as random delays, setting a custom user agent, sending cookies to the site and lots more.
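    As a rough sketch of how some of those extra options might fit together, the function below wraps the mirror command from this article with a randomised delay and a custom user agent. The function name, the user agent string and the 50k rate cap are assumptions chosen for illustration; --limit-rate is not mentioned above, but it is a standard wget option that caps download speed.

```shell
# polite_mirror: hypothetical wrapper around the mirror command shown above.
# The user agent string and the rate cap are example values, not recommendations.
polite_mirror() {
    # --mirror and --convert-links are the long forms of -m and -k.
    # --wait=20 is the long form of -w 20.
    # --random-wait varies each pause around the --wait value.
    # --limit-rate caps the download speed (k means kilobytes per second).
    wget --mirror --convert-links \
         --wait=20 --random-wait \
         --limit-rate=50k \
         --user-agent="fosswire-mirror-example/0.1" \
         "$@"
}

# Usage (this really starts a download, so it is left commented out):
# polite_mirror http://www.example.com/
```

    Because any extra arguments are passed straight through to wget, something like polite_mirror --load-cookies cookies.txt http://www.example.com/ would also send cookies from a (hypothetical) cookies.txt file.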


    MPAA hit with DMCA takedown after GPL violation

    Ars Technica is reporting that the Motion Picture Association of America have been hit with a DMCA takedown notice after offering a download of an Ubuntu-based network monitoring tool without source code, a direct violation of the GPL.

    The MPAA software was supposedly designed to assist universities in detecting people using software to download and share copyrighted material and was apparently based on the Ubuntu Linux distribution.

    The obvious irony here is that in trying to distribute a piece of software designed to prevent copyright infringement, the MPAA have infringed copyright by breaking the GPL licence. Leaving that irony aside for a moment, though, it's clear that Matthew Garrett, who filed the DMCA takedown on behalf of the Ubuntu team, is sending out a clear message - that GPL violation is exactly the same as violating any other copyright.

    Those who aren't fans of the free software/open source movement often cite their doubt as to whether the GPL is enforceable, especially when it comes to the rights of individuals, who don't necessarily have the legal resources to chase up all possible violators. In this case, the MPAA obviously realised that not complying was definitely not an option, as the commercial arms involved in Ubuntu (and possibly others) would have piped up and put pressure on them.

    Would the outcome have been the same, however, if a similar violation had affected a small GPL'd project with limited resources? It's not really an easy question to answer, although there are efforts such as GPL Violations to raise awareness of smaller violations of this nature.

    Enforcing the licences is definitely important. It's important to send the message out to potential exploiters that there are consequences for not following those terms.


    Getting acquainted with GNU screen

    GNU screen is a very neat tool that's included in most Unix-like operating systems. It's a utility that acts as a basic command line window manager, so you can maintain several open terminal sessions within one physical terminal.

    You may ask at this point why you need that when most graphical terminal programs have tabs for multiple sessions, but there are many occasions where GNU screen can be a better choice (for example, when working over SSH, so you don't have to open multiple SSH connections). On top of that, screen also boasts features such as being able to keep console sessions alive even when your connection is lost.

    It can be a tad tricky to get started with, but thankfully there is a great tutorial over at Kuro5hin that can quickly get you up and running with screen. An excerpt:

    Screen is best described as a terminal multiplexer. Using it, you can run any number of console-based applications--interactive command shells, curses-based applications, text editors, etc.--within a single terminal. The desire to do this is what gets most people hooked on screen. I used to start a half-dozen terminal emulators when I logged into my machine: I wanted one to read my email, one to edit my code, one to compile my code, one for my newsreader, one for a shell into my web host, and so on. Now I start one terminal emulator, and run screen in it. Problem solved.

    The other main cool feature of screen is its ability to decouple the terminal emulator from the running programs. This means that you can use screen to keep programs running after you accidentally close the terminal emulator, or even after you log out, and later resume right where you were. It means that the idea of a "session" in which you are running a number of console programs is a free-floating entity that you can bind to any terminal anywhere, or no terminal at all if you want.

    The tutorial is well worth a look, especially if you're a fairly new command line user who is starting to become proficient and wants to learn about the benefits of using screen.

    Read it here.
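    As a taste of the detach-and-resume workflow described in the excerpt, here is a minimal sketch of the commands involved. The session name "work" is an arbitrary example, and the script only prints the commands so you can try them one at a time in a real terminal.

```shell
# Sketch of a detach/resume workflow with GNU screen.
# "work" is a hypothetical session name; pick any name you like.
session="work"

start_cmd="screen -S $session"    # start a new session with that name
list_cmd="screen -ls"             # list sessions, attached or detached
resume_cmd="screen -r $session"   # reattach to the named session later

# Inside a session, press Ctrl-a then d to detach; the programs keep running.
# Ctrl-a then c opens another window, and Ctrl-a then n moves to the next one.
printf '%s\n' "$start_cmd" "$list_cmd" "$resume_cmd"
```

    Running the printed commands in order (with a detach in between) should show the session surviving after you leave it, which is the "free-floating" behaviour the tutorial describes.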
