Saving Bandwidth With Apt-Cacher : Revisited

By | 2007/10/03

Those of you who are long-time readers may remember my previous article on Saving Bandwidth With Multiple Machines Using Apt-Cacher. With the next Ubuntu release coming down the pike in just a few weeks, I wanted to revisit this article for those of you who will be upgrading via aptitude. If you have multiple machines you’ll really want to look into setting this up!

I’ve been testing and retesting different hardware running the Ubuntu 7.10 alpha and now beta releases. Nothing is worse than installing two machines and downloading the same updates twice, once for each machine. Apt-Cacher steps in at this point and lets us download each update just once and then share it with every machine. Let’s revisit the setup:

Installing Apt-Cacher

The first thing you’ll need to do is select a central machine that you’ll want to act as your apt-caching service. I use one of my old servers sitting in the closet, but this could be one of your desktops if you like. You should note that this machine will need to be on any time another networked machine wants to request an update. On this single machine we’ll install the apt-caching service:

sudo aptitude install apt-cacher

Configure Apt-Cacher

You’ll also want to set this service to auto start at boot time. To do this make the following change:

sudo vim /etc/default/apt-cacher

change AUTOSTART=0 to AUTOSTART=1
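
If you’d rather script this edit than open vim, something like the following should do it. This is only a minimal sketch: it assumes the file contains a line starting with AUTOSTART=0 exactly as described above, it assumes GNU sed (for the -i option), and the function name enable_autostart is made up for illustration. It runs against a scratch file so nothing on the system is touched.

```shell
#!/bin/sh
# Hypothetical helper (not part of apt-cacher): flip AUTOSTART=0 to
# AUTOSTART=1 in the file given as $1. Assumes GNU sed for -i.
enable_autostart() {
    sed -i 's/^AUTOSTART=0/AUTOSTART=1/' "$1"
}

# Demonstrate on a scratch file; the sample content is made up.
demo=$(mktemp)
printf '# sample defaults file\nAUTOSTART=0\n' > "$demo"
enable_autostart "$demo"
cat "$demo"
```

Against the real file the same sed expression would need root, for example: sudo sed -i 's/^AUTOSTART=0/AUTOSTART=1/' /etc/default/apt-cacher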

You’ll also want to update the access restrictions to allow your local machines access to this service. By default only localhost (127.0.0.1) will be allowed.

sudo vim /etc/apt-cacher/apt-cacher.conf

change allowed_hosts=127.0.0.1 to allowed_hosts=192.168.0.0/24 (update to your range as appropriate)
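
This change can be scripted in the same way. A minimal sketch, assuming the setting is written as allowed_hosts=... with no spaces around the equals sign (as shown above) and that GNU sed is available; set_allowed_hosts is a made-up name. The sed expression uses | as its delimiter because the address range itself contains a slash.

```shell
#!/bin/sh
# Hypothetical helper: set allowed_hosts in the config file $1 to the
# address range $2. Assumes GNU sed for -i.
set_allowed_hosts() {
    sed -i "s|^allowed_hosts=.*|allowed_hosts=$2|" "$1"
}

# Demonstrate on a scratch file rather than the live config.
demo=$(mktemp)
printf 'allowed_hosts=127.0.0.1\n' > "$demo"
set_allowed_hosts "$demo" "192.168.0.0/24"
cat "$demo"
```

To apply it for real you would call the same sed expression with sudo against /etc/apt-cacher/apt-cacher.conf, substituting your own range.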

Once you’ve made these changes you’ll need to restart the apt-cacher service on the central machine:

sudo /etc/init.d/apt-cacher restart

Configure the Clients

You can configure your client machines (the other machines on your network) to use this apt-caching system with a simple edit to the apt sources.list file. Make sure you know the IP address of the apt-caching server you configured above.

If your sources.list currently looks something like this:

deb http://archive.ubuntu.com/ubuntu/ gutsy main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ gutsy-updates main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu/ gutsy-security main restricted universe multiverse

prefix the address to include the caching-server IP address: (replace 192.168.0.3 with your local server IP)

deb http://192.168.0.3:3142/archive.ubuntu.com/ubuntu/ gutsy main restricted universe multiverse
deb http://192.168.0.3:3142/archive.ubuntu.com/ubuntu/ gutsy-updates main restricted universe multiverse
deb http://192.168.0.3:3142/security.ubuntu.com/ubuntu/ gutsy-security main restricted universe multiverse
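
If you have several clients to convert, the rewrite above can be sketched as a small filter. This assumes your entries are plain deb or deb-src lines using http:// URLs, as in the example; prefix_sources is a made-up name and 192.168.0.3:3142 is just the example address from this article. Lines that already point at the cache are left alone, so running it twice does no harm.

```shell
#!/bin/sh
# Hypothetical helper: read sources.list lines on stdin and print them
# with the cache server ($1, as host:port) inserted after http://.
# Lines that already contain the cache address are passed through unchanged.
prefix_sources() {
    sed -E "\|http://$1/|!s|^(deb(-src)?[[:space:]]+)http://|\1http://$1/|"
}

# Demonstrate on two sample lines instead of the real sources.list.
prefix_sources 192.168.0.3:3142 <<'EOF'
deb http://archive.ubuntu.com/ubuntu/ gutsy main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu/ gutsy-security main restricted universe multiverse
EOF
```

One cautious way to use it is to write the result to a temporary file (prefix_sources 192.168.0.3:3142 < /etc/apt/sources.list > /tmp/sources.list.new), review it, and only then copy it into place with sudo.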

Once you’ve done this on the clients you should be able to fetch and install updates via the central machine. Once a package has been fetched for one machine it stays available centrally for each of the others, which saves bandwidth and shortens update times because downloads run at LAN speeds rather than WAN speeds.

If you’re interested in more information see my previous post on the topic or see the man page for apt-cacher (man apt-cacher).

15 thoughts on “Saving Bandwidth With Apt-Cacher : Revisited”

  1. Matt Mossholder

    The alternate approach on your clients is to leave the URLs in sources.list alone and configure a proxy server in apt.conf that points to apt-cacher. That makes it a lot easier to turn on and off, as well as requiring less effort.

  2. Soren Stoutner

    Mr Troll,

    When it comes to something as important as package management, I prefer to not have my computers automatically doing anything, especially selecting new repositories. Perhaps having that functionality automated would be nice, but it should never be enabled by default. The setup here is truly not that difficult and basically involves 1) installing the apt-cacher package, 2) turning it on, and 3) pointing the other boxes to it. That’s exactly how it should be.

    Now, if you want to make the argument that there should be a nice little gui for the process that shows you all the options and automatically scans for available cache servers on the network and provides them in a list that you can choose from, that is all well and good (as long as you are offering to write the gui yourself). But please don’t try to make the computer smarter than I am and do a bunch of dumb things because it is trying to “help” me. I work on networks for a living and get paid a whole bunch of money to fix those exact types of problems in other operating systems.

  3. Karl Bowden

    Hey Christer,
    I would also recommend approx.
    It seems that approx and apt-cacher do almost exactly the same thing, and the setup of approx was very much the same, except that the mirror is defined in the approx config file instead of in each client’s sources.list.
    The only advantage I had with this is that when one of the mirrors I was using started timing out, I simply changed the mirror name in approx.conf and restarted approx, and all of the machines using it as their source kept functioning as normal.

    Karl

  4. Richard

    The key line to put in a file under /etc/apt/apt.conf.d is:

    Acquire::http::Proxy "http://cache-host:3142";
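
Creating that file could be sketched as below. This is only an illustration under a few assumptions: write_proxy_conf is a made-up name, the file name 01proxy is an arbitrary choice (any file in the directory is read), and cache-host stands in for your own server as in the line above. The demo writes into a scratch directory instead of /etc/apt/apt.conf.d.

```shell
#!/bin/sh
# Hypothetical helper: write an apt proxy snippet into directory $1,
# pointing at cache host $2 on port 3142. The file name is arbitrary.
write_proxy_conf() {
    printf 'Acquire::http::Proxy "http://%s:3142";\n' "$2" > "$1/01proxy"
}

# Demonstrate in a scratch directory.
demo_dir=$(mktemp -d)
write_proxy_conf "$demo_dir" cache-host
cat "$demo_dir/01proxy"
```

Note the straight double quotes and the trailing semicolon; apt’s configuration parser expects both. Removing the file turns the proxy off again, which is what makes this approach easy to switch on and off.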

  5. Thorne

    Well, I need some info about this apt-cacher.
    I have dapper (LTS), feisty and gutsy. How can I set them up to use this “apt-cacher”? Do I use a different subdirectory for each, or have them all going into the same directory?
    I would very much like to use this program for my repository cache but just don’t know how to set it up. Can anyone please help…

  6. Ivan

    Also, if you change sources.list, sudo apt-get update is required.

  7. Mark Waters

    I’d like to thank Christer for the article and Brian for his comments.

    I got apt-cacher-ng installed last night for our small network and it’s working perfectly, saving us bandwidth and time.

    Good work guys!
    Mark Waters

  8. Alec

    Hi – just wanted to say thanks to Christer and everyone else for their comments.

    I’ve just installed apt-cacher again, and I came back to these same instructions!

    Very helpful instructions, so I was pleased to find that they are still around.

    Many thanks,
    Alec Simpson

  9. Alec

    @Alec

    PS – one thing I did differently from the instructions was to use the hostname in place of the host IP address.

    Seems to be working well for me, and it removes the dependency on static IP addresses.

    Any comments on whether this was a good idea are welcome!

    Alec

  10. Alex

    What follows is a workaround for two identical machines, one of them being a VirtualBox guest on the other, though I suppose it also works if the two machines are physically separate.
    I’m new to Linux, so this guide is not as perfect as I would like it to be, but I’m sharing the idea because I did not find anything equivalent around the web.
    My case: I’m running a Debian 64 machine that I use to learn Linux while playing around, and a VirtualBox guest (Debian 64 as well) to make sure my experiments do not wipe my disks (as used to happen almost twice a week before I started using VB as a scapegoat!).
    It is a pity to download every .deb package twice, once for the host and once for the guest, while copying the files periodically by hand is too error prone to be a proper solution. I do not want to deal with rsync (I’m not there yet, but one day…) and I found that apt-cacher was MISSing more than 90% of the requests (the log shows one HIT for every 30 MISSES, whatever I tried to make it work).
    Actually apt-cacher worked better with the guest bridged on the eth0 connection, so that it had its own IP address from the DHCP server of the home router, but the improvement was still far from optimal.
    While trying to get apt-cacher to deliver the expected service (I found somebody advising a switch to apt-cacher-ng, but I am still determined to get the former working as it should), I tried to link the two folders in order to practically merge the archives directories.
    My initial thought was that the only requirement was one shared folder between the host and the guest, with the two /var/cache/apt/archives folders pointing at one another, but this cannot work because the two file systems are entirely separate, which results in a name clash.
    Suppose the host machine is the primary package downloader, so the guest should point at the host’s /archives folder, and a folder (call it /home/barackobama/VB_swap) is available to be shared with the guest (e.g. mounted as /media/sf_VB_swap).
    The solution would be to delete the folder /home/barackobama/VB_swap and replace it with a symlink called VB_swap -> (pointing to) /var/cache/apt/archives, and at the same time delete the /archives folder on the guest and replace it with a symlink called /archives -> (pointing to) /media/sf_VB_swap/.
    The problem arises when the guest tries to access its /archives folder: it is directed to /media/sf_VB_swap/ and, in turn, redirected again to /var/cache/apt/archives. That last /archives folder is still the guest’s own, not the host’s /archives folder we were aiming for, which results in an infinite loop among links. Moreover, the guest cannot see the host’s archives without some ../../../whatsoever/../root/../var/cache/apt/archives link climbing back to the root of the host’s file system and so on and on and on… (supposing the guest can reach, read and open it at all).
    In fact the guest’s root user is in general different from the host’s root user, so the standard permissions may not allow access or even read rights (to solve this apply a chmod "o+xrw" to /var/cache/apt/archives/ on the host machine).

    After a few paper sketches on my wall I found a simple way to do it without another daemon running to serve the purpose.
    Just add a file to the /etc/apt/apt.conf.d/ folder to change the standard archives directory, and do it for both the guest and the host. This is what worked for me:
    FOR THE HOST:
    – su
    – touch /etc/apt/apt.conf.d/00ANewRepDir
    – nano this file to add the line: Dir::Cache::archives "/data/virtualBoxSwap/current_apt_cache/archives/"; and save it
    – create the folder /data/virtualBoxSwap/current_apt_cache/archives and give full permissions to everybody, even osama…. otherwise it won’t work
    – move all the existing packages in /var/cache/apt/archives into this new folder
    – log out and log in again (it may not really be necessary, but I had no more time to lose on trial and error!) to be sure apt picks up the change.

    FOR THE GUEST:
    – su
    – touch /etc/apt/apt.conf.d/00ANewRepDir
    – nano this file to add the line: Dir::Cache::archives "/media/sf_virtualBoxSwap/current_apt_cache/archives/"; and save it (assuming the shared directory is mounted under /media with a name starting with sf_, as it is in most recent versions of VirtualBox)
    – log out and in again for the same old silly reason (remember you are now root, so log out and exit twice!)
    – run "apt-cache gencaches"
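
The touch-and-nano steps in both lists can be sketched as a single command. This is a rough illustration only: write_archives_conf is a made-up name, the paths are the ones used in this comment, and the demo writes into a scratch directory rather than /etc/apt/apt.conf.d. It only creates the config file; creating the shared folder and setting its permissions are left as described above.

```shell
#!/bin/sh
# Hypothetical helper: write an apt.conf.d snippet into directory $1 that
# points apt's package archive directory at $2.
write_archives_conf() {
    printf 'Dir::Cache::archives "%s";\n' "$2" > "$1/00ANewRepDir"
}

# Demonstrate with the host path from this comment, in a scratch directory.
demo_dir=$(mktemp -d)
write_archives_conf "$demo_dir" "/data/virtualBoxSwap/current_apt_cache/archives/"
cat "$demo_dir/00ANewRepDir"
```

On the guest the second argument would be the /media/sf_virtualBoxSwap/... path instead. Once the real file is in place, running apt-config dump | grep Cache::archives should show whether apt has picked up the new location.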

    That’s done! Try to install a new package: if it has already been installed on the host, the guest will simply unpack and install it; otherwise it will first be downloaded from the genuine apt repository on debian.org, which makes it available for the host to install whenever required without any further HTTP request.
    As easy as stealing sweets from Mr Ballmer.

    There are a lot of pros to this solution:
    – no services or daemons running, consuming RAM and watts,
    – no more useless duplication of package archives (note that even apt-cacher doubles everything: it saves you the download, but it does not save you from storing the second copy on the file system of the guest).

    Anyhow, I admit it is not the least questionable solution, mainly for two reasons:
    – two different systems are reading (and sometimes writing) to the same location. The kernel is ready to handle that, but the two programs (both called apt-get) will each try to lock the folder by adding an empty file called lock to it, to be sure nobody else is working in the folder at the same time. When that happens, one of the processes cannot get the lock and is blocked. It has already happened to me twice in a couple of weeks, so I delete the file manually, and possibly one day I will write a Python script to alias the call to apt-get with an "rm lock" followed by the real apt-get request.
    This is just to be sure this misbehaviour won’t show up again on my Konsole screen.
    – there is a big (evidently too big) compromise on the principle of data segregation. Allowing everybody to write to such a sensitive folder is not nice.
    In most cases our home machines won’t be subjected to cracking, but this is a clear way to let anyone replace your .deb packages with something you did not expect your PC to download automatically.

    I really hope this (first) post of mine will be useful for somebody else.

    EDIT:
    After having written this article, I found this (http://techsleek.wordpress.com/2013/04/21/how-to-set-up-a-repository-cache-with-apt-cacher/) but have not tried it yet. I will give it a try to see if it is better than sharing folders.

    Cheers
    Alessandro Parma
