Showing posts with label backup tools. Show all posts
Showing posts with label backup tools. Show all posts

Wednesday, August 4, 2010

System Administration - Overview

Taming the Beast

From Issue #191
March 2010

The right plan can determine the difference between a large-scale system administration nightmare and a good night's sleep for you and your sysadmin team.

As the appetite for raw computing power continues to grow, so do the challenges associated with managing large numbers of systems, both physical and virtual. Private industry, government and scientific research organizations are leveraging larger and larger Linux environments for everything from high-energy physics data analysis to cloud computing. Clusters containing hundreds or even thousands of systems are becoming commonplace. System administrators are finding that the old way of doing things no longer works when confronted with massive Linux deployments. We are forced to rethink common tasks because the tools and strategies that served us well in the past are now crushed by an army of penguins. As someone who has worked in scientific computing for the past nine years, I know that large-scale system administration can at times be a nightmarish endeavor, but for those brave enough to tame the monster, it can be a hugely rewarding and satisfying experience.

People often ask me, “How is your department able to manage so many machines with such a small number of sysadmins?” The answer is that my basic philosophy of large-scale system administration is “keep things simple”. Complexity is the enemy. It almost always means more system management overhead and more failures. It's fairly straightforward for a single experienced Linux sysadmin to single-handedly manage a cluster of a thousand machines, as long as all of the systems are identical (or nearly identical). Start throwing in one-off servers with custom partitioning or additional NICs, and things start to become more difficult, and the number of sysadmins required to keep things running starts to increase.

An arsenal of weapons in the form of a complete box of system administration tools and techniques is vital if you plan to manage a large Linux environment effectively. In the past, you probably would be forced to roll your own large-scale system administration utilities. The good news is that compared to five or six years ago, many open-source applications now make managing even large clusters relatively straightforward.

Monitoring

System administrators know that monitoring is essential. I think Linux sysadmins especially have a natural tendency to be concerned with every possible aspect of their systems. We love to watch the number of running processes, memory consumption and network throughput on all our machines, but in the world of large-scale system administration, this mindset can be a liability. This is especially true when it comes to alerting. The problem with alerting on every potential hiccup is that you'll either go insane from the constant flood of e-mail and pages, or even worse, you'll start ignoring the alerts. Neither of those situations is desirable. The solution? Configure your monitoring system to alert only on actionable conditions—things that cause an interruption in service. For every monitoring check you enable, ask yourself “What action must be taken if this check triggers an alert?” If the answer is “nothing”, it's probably better not to enable the check.

Monitoring Tools

If you were asked to name the first monitoring application that comes to mind, it probably would be Nagios. Used by just about everyone, Nagios is currently the king of open-source monitoring tools.

Zabbix sports a slick Web interface that is sure to make any manager happy. Zabbix scales well and might be posed to give Nagios a run for its money.

Ganglia is one of those must-have tools for Linux environments of any size. Its strengths include trending and performance monitoring.

I think it's smart to differentiate monitoring further into critical and noncritical alerts. E-mail and pager alerts should be reserved for things that require immediate action—for example, important systems that aren't pingable, full filesystems, degraded RAIDs and so on. Noncritical things, like NIS timeouts, instead should be displayed on a Web page that can be viewed when you get back from lunch. Also consider writing checks that automatically correct whatever condition they are monitoring. Instead of your script sending you an e-mail when Apache dies, why not have it try restarting httpd automatically? If you go the auto-correcting “self-healing” route, I'd recommend logging whatever action your script takes so you can troubleshoot the failure later.

When selecting a monitoring tool in a large environment, you have to think about scalability. I have seen both Zabbix and Nagios used to monitor in excess of 1,500 machines and implement tens of thousands of checks. Even with these tools, you might want to scale horizontally by dividing your machines into logical groups and then running a single monitoring server per group. This will increase complexity, but if done correctly, it will also prevent your monitoring infrastructure from going up in flames.

Configuration Management

In small environments, you can maintain Linux systems successfully without a configuration management tool. This is not the case in large environments. If you plan on running a large number of Linux systems efficiently, I strongly encourage you to consider a configuration management system. There are currently two heavyweights in this area, Cfengine and Puppet. Cfengine is a mature product that has been around for years, and it works well. The new kid on the block is Puppet, a Ruby-based tool that is quickly gaining popularity. Your configuration management tools should, obviously, allow you to add or modify system or application configuration files to a single system or groups of machines. Some examples of files you might want to manage are /etc/fstab, ntpd.conf, httpd.conf or /etc/password. Your tool also should be able to manage symlinks and software packages or any other node attributes that change frequently.

Configuration Management Tools

Cfengine is the grandfather of configuration management systems. The project started in 1993 and continues to be actively developed. Although I personally find some aspects of Cfengine a little clunky, I've been using it successfully for many years.

Puppet is a highly regarded Ruby-based tool that should be considered by anyone considering a configuration management solution.

Regardless of which configuration management tool you use, it's important to implement it early. Managing Linux configurations is something that should be set up as the node is being installed. Retrofitting configuration management on a node that is already in production can be a dangerous endeavor. Imagine pushing out an incorrect fstab or password file, and you get an idea of what can go wrong. Despite the obvious hazards of fat-fingering a configuration management tool, the benefits far outweigh the dangers. Configuration management tools provide a highly effective way of managing Linux systems and can reduce system administration overhead dramatically.

As an added bonus, configuration management systems also can be used as a system backup mechanism of sorts. Granted, you don't want to store large amounts of data in a tool like Cfengine, but in the event of system failure, using a configuration managment tool in conjunction with your node installation tools should allow you to get the system into a known good state in a minimal amount of time.

Provisioning

Provisioning is the process of installing the operating system on a machine and performing basic system configuration. At home, you probably boot your computer from a DVD to install the latest version of your favorite Linux distro. Can you imagine popping a DVD in and out of a data center full of systems? Not appealing. A more efficient approach is to install the OS over the network, and you typically do this with with a combination of PXE and Kickstart. There are numerous tools to assist with large-scale provisioning—Cobbler and Spacewalk are two—but you may prefer to roll your own. Your provisioning tools should be tightly coupled to your configuration management system. The ultimate goal is to be able to sit at your desk, run a couple commands, and see a hundred systems appear on the network a few minutes later, fully configured and ready for production.

Provisioning Tools

Rocks is a Linux distribution with built-in network installation infrastructure. Rocks is great for quickly deploying large clusters of Linux servers though it can be difficult to use in mixed Linux distro environments.

Spacewalk is Red Hat's open-source systems management solution. In addition to provisioning, Spacewalk also offers system monitoring and configuration file management.

Cobbler, part of the Fedora Project, is a lightweight system installation server that works well for installing physical and virtual systems.

Hardware

When it's time to purchase hardware for your new Linux super cluster, there are many things to consider, especially when it comes to choosing a good vendor. When selecting vendors, be sure to understand their support offerings fully. Will they come on-site to troubleshoot issues, or do they expect you to sit for hours on the phone pulling your hair out while they plod through an endless series of troubleshooting scripts? In my experience, the best, most responsive shops have been local whitebox vendors. It doesn't matter which route you go, large corporate or whitebox vendor, but it's important to form a solid business relationship, because you're going to be interacting with each other on a regular basis.

The odds are that old hardware is more likely to fail than newer hardware. In my shop, we typically purchase systems with three-year support contracts and then retire the machines in year four. Sometimes we keep machines around longer and simply discard a system if it experiences any type of failure. This is particularly true in tight budget years.

Purchasing the latest, greatest hardware is always tempting, but I suggest buying widely adopted, field-tested systems. Common hardware usually means better Linux community support. When your network card starts flaking out, you're more likely to find a solution to the problem if 100,000 other Linux users also have the same NIC. In recent years, I've been very happy with the Linux compatibility and affordability of Supermicro systems. If your budget allows, consider purchasing a system with hardware RAID and redundant power supplies to minimize the number of after-hours pages. Spare systems or excess hardware capacity are a must for large shops, because the fact of the matter is regardless of the quality of hardware, systems will fail.

Backups

Rethink backups. More than likely, when confronted with a large Linux deployment, you're going to be dealing with massive amounts of data. Deciding what data to back up requires careful coordination with stakeholders. Communicate with users so they understand backup limitations. Obviously, written policies are a must, but the occasional e-mail reminder is a good idea as well. As a general rule, you want to back up only absolutely essential data, such as home directories, unless requirements dictate otherwise.

Serial Console Access

Although it may seem antiquated, do not underestimate the value of serial console access to your Linux systems. When you find yourself in a situation where you can't access a system via SSH or other remote-access protocol, a good-old serial console potentially could be a lifesaver, particularly if you manage systems in a remote data center. Equally important is the ability to power-cycle a machine remotely. Absolutely nothing is more frustrating than having to drive to the data center at 3am to push the power button on an unresponsive system.

Many hardware devices exist for power-cycling systems remotely. I've had good luck with Avocent and APC products, but your mileage may vary. Going back to our “keep it simple” mantra, no matter what solution you select, try to standardize one particular brand if possible. More than likely, you're going to write a wrapper script around your power-cycling utilities, so you can do things like powercycle node.example.com, and having just a single hardware type keeps implementation more straightforward.

System Administrators

No matter how good your tools are, a solid system administration team is essential to managing any large Linux environment effectively. The number of systems managed by my group has grown from about a dozen Linux nodes eight years ago to roughly 4,000 today. We currently operate with an approximate ratio of 500 Linux servers to every one system administrator, and we do this while maintaining a high level of user satisfaction. This simply wouldn't be possible without a skilled group of individuals.

When hiring new team members, I look for Linux professionals, not enthusiasts. What do I mean by that? Many people might view Linux as a hobby or as a source of entertainment, and that's great! But the people on my team see things a little differently. To them, Linux is an awesomely powerful tool—a giant hammer that can be used to solve massive problems. The professionals on my team are curious and always thinking about more efficient ways of doing things. In my opinion, the best large-scale sysadmin is someone who wants to automate any task that needs to be repeated more than once, and someone who constantly thinks about the big picture, not just the single piece of the puzzle that they happen to be working on. Of course, an intimate knowledge of Linux is mandatory, as is a wide range of other computing skills.

In any large Linux shop, there is going to be a certain amount of mundane, low-level work that needs to be performed on a daily basis: rebooting hung systems, replacing failed hard drives and creating new user accounts. The majority of the time, these routine tasks are better suited to your junior admins, but it's beneficial for more senior people to be involved from time to time as they serve as a fresh set of eyes, potentially identifying areas that can streamlined or automated entirely. Senior admins should focus on improving system management efficiency, solving difficult issues and mentoring other team members.

Conclusion

We've touched a few of the areas that make large-scale Linux system administration challenging. Node installing, configuration management and monitoring are all particularly important, but you still need reliable hardware and great people. Managing a large environment can be nerve-racking at times, but never lose sight of the fact that ultimately, it's just a bunch of Linux boxes.

Jason Allen is CD/SCF/FEF Department Head at Fermi National Accelerator Laboratory, which is managed by Fermi Research Alliance, LLC, under Management and Operating Contract (DE-AC02-07CH11359) with the Department of Energy. He has been working with Linux professionally for the past 12 years and maintains a system administration blog at savvysysadmin.com.

 

Taken From: http://www.linuxjournal.com/article/10665

Friday, June 12, 2009

Backup Files With Rsync and Grsync

There are, of course, numerous backup solutions you can use, from the simple and free to the complex and expensive, as well as everything in between. The technology behind most backup systems, however, tends to be much more limited. Using classic tools, such as tar and gzip, to back up and compress is still very common under the surface of much more complex tools. This is true even when using network resources. In the end, you are backing up from one machine to another. Many people I know, including those with small businesses, do this for their regular backups. Machine A backs to machine B, which backs to C, which backs to A. The machines, and their drives, are all part of a network. Hey, instant cloud, and you probably didn't know you had one.

This is where rsync, another popular backup tool, shows its worth. As the name implies, rsyncs keep a backup copy of your data, in sync with the original. It can do it locally, from one physical drive to another, or across your network. Because only those files that have been modified are transferred, the process can be very quick. You can do this with single files, whole directories and subdirectories, while maintaining file ownership and permissions, links, symbolic links and so on. rsync has its own transport, or you can use OpenSSH to secure the transfer, and (of course) there are some great front-end, graphical tools to make the process a little slicker.

You can find rsync at rsync.samba.org, but you probably don't even have to look that far. Many distributions load it when you install your system. If not, check your installation disks or simply pick it up from your distribution's repositories. Before I explain how to rsync your data to your own personal cloud, let me show you how easy it is to create a synchronized backup of your data from one directory to another (or one drive to another):

rsync -av important_stuff/ is_backup

In the above example, rsync copies everything in the directory important_stuff into another directory (or folder) called is_backup. Most of you will have figured out that the -v means verbose copy. The -a option hides some amount of complexity in that it is the same as using the -rlptgoD flags. In order, this means that rsync should do a recursive copy; copy symbolic links; preserve permissions, modification times and group and owner information; and, with the final D, copy special files (device and block). When you press Enter, files go scrolling by, after which you see something like this:

sending incremental file list
./
CookingJul08.tgz
CookingJul2008_albums.odt
CookingJul2008_albums.txt
igal_page.png
montage.png
shalbum.png
zenphoto_comment.png
zenphoto_go.png
zenphoto_login.png
zenphoto_makepass.png
zenphoto_setup.png
zenphoto_theming_comment.png
zenphoto_upload_photos.png
zenphoto_view_album.png
. . . .

sent 46059880 bytes received 2753 bytes 6141684.40 bytes/sec
total size is 46044132 speedup is 1.00

One other thing that rsync should be able to do in order to be completely useful is delete files. If you are mirroring files and directories, it stands to reason that you want the mirror to represent exactly what is on the original. If files have been deleted, you want them deleted on the backup server as well. This is where the --delete parameter comes into play. Using the earlier example, let's delete that tgz file from the original, then relaunch the command:

$ rsync -av --delete important_stuff/ is_backup
sending incremental file list
./
deleting CookingJul08.tgz

sent 4164 bytes received 25 bytes 8378.00 bytes/sec
total size is 41911050 speedup is 10005.03

From here on, both directories will always be in sync. When doing network backups, this magic synchronization of files and directories is done using a client and server setup. At least one machine must play the role of server (although nothing is stopping you from running an rsync dæmon on every one of your machines). The server gets its information about who can access what from a configuration file called rsyncd.conf. You'll find that it probably lives in the /etc directory. The following partial listing is from one of my rsync servers:

hosts allow = 192.168.1.0/24
use chroot = no
max connections = 10
log file = /var/log/rsyncd.log
gid = nogroup
uid = nobody

[marcel]
path = /media/bigdrive/backups/marcel
read only = no
comment = Marcel's files
[francois]
path = /media/bigdrive/backups/francois
read only = no
comment = Files for the waiter

This configuration file is quite simple once you get the hang of it. Backup areas are identified by a name in square brackets (marcel, website, francois and so on). The chief bits of information there include the path to the disk area and some kind of comment. Notice that I specified read only = no, but I could just as easily have added that to the top section (the one without a name in square brackets). That's the global section. Anything put up there applies to all other sections, but it can be overridden. Pay particular attention to the gid and uid values; these are the group ID and user ID to which the file transfer takes place. The default is nobody, but you need to make sure that is correct for your system. One of my servers does not have a nobody group, but has a nogroup group instead.

The hosts allow section identifies my local subnet as being the only set of addresses from which transfers can take place. The log file line identifies a file to log information from the dæmon. You also can specify a maximum number of connections, specific users who are allowed to transfer files (auth users) and a whole lot more. Run man rsyncd.conf for the full details. When your configuration is set, you can launch the rsync dæmon, which, interestingly enough, is exactly the same program as the rsync command itself. Just do the following:

rsync --daemon

That's it. Now, it's time to put this setup to use. You might want to test your rsync connection by issuing the command:

rsync remote_host::

Note the double colon at the end of the server's name. The result should be something like this, assuming a server called thevault:

$ rsync thevault::
website All our websites
francois Files for the waiter
marcel Backup area for Marcel

Now, pretend I am on the server where my Web site files live. Using the following command, I can launch rsync to back up this entire area:

rsync -av /var/www thevault::website/

building file list ...

The format of the rsync command is rsync options source destination, which means I also could start the command from thevault, assuming my Web site machine also was running an rsync dæmon. The result would look more like this:

rsync -av localbackupdir websitemachine.dom::websites

All this work at the command line is great, but there are some tools for making the process easier, particularly if you will be creating a number of rsync backups or if you want to get into more complex requirements, such as scheduled backups. A friendly graphical front end on your desktop also may be a greater incentive to perform regular backups or take a quick backup when you've added important data and a “right now” backup is desirable. The first tool I want to show you is Piero Orsoni's grsync (Figure 1).

Figure 1. grsync provides an easy-to-use interface with every rsync option you could want.

While providing a great front end to rsync, grsync also works as a teaching tool for the command-line version of the program, or at least it helps as a memory aid. Almost any command-line option available to rsync is covered in one of these three tabs: Basic options, Advanced options and Extra options. What makes it a learning tool is that if you pause over any of those check boxes with your mouse, a tooltip appears showing the command-line option with a brief description of its function.

To start, click the Add button next to the session drop-down dialog and enter a name for your backup. You can define many different rsync backups here, and then launch them again at a later time. Clicking the Browse button brings up the standard Gtk2 file browser window from which you can select your local and destination folders. Unfortunately, you can't browse remote systems, but if you've already set up an rsync server, have no fear. You can enter it manually in the format I showed you earlier (for example, thevault::marcel/). When you are happy with the various options, click Execute. If you only think you are happy, click the Simulation button. (Chef Marcel loves a program with a sense of humor.) When you do click Execute, the program switches to a progress window (Figure 2), so you can see where you are in the process.

Figure 2. Once your grsync backup begins, it switches to a progress report view.

The next item on our rsync menu is Magnus Loef's GAdmin-Rsync. GAdmin-Rsync makes every aspect of creating an rsync backup a matter of filling in the blanks. What's more, the program creates backups using SSH by default, which means you can set up rsync backups to any machine to which you have secure shell access. This also means you don't actually need to have an rsync dæmon running on the remote machine if you have SSH access. Let me show you how it works.

When you start the program for the first time, you'll be asked for a name to give your new backup (Figure 3). You could back up the entire system or select specific folders of filesystems. Choose a name that makes sense to you based on what you want to back up. Enter a name, then click Apply to continue.

Figure 3. GAdmin-Rsync lets you define numerous backup configurations, each with its own identifier.

As you saw when we did this at the command line, rsync backups can be local, to a remote system or from a remote system. The next window looks for that very information (Figure 4). By default, local backup is checked. To back up to a remote server, select Local to remote backup. Because you can swap source and destination easily when using rsync, there's that third option. I routinely use a remote to local backup for my Web sites and remote systems. Click Forward to continue.

Figure 4. Your next step is to define the location of the backup.

Assuming you chose to back up to your cloud, your next step is to enter the server information (Figure 5). This includes the backup path on your networked server as well as your SSH key type and length. When you have entered this information, click Forward.

Figure 5. For remote backups, GAdmin-Rsync uses SSH/SCP for secure transfers.

Now you're ready to start the rsync backup. Click the Backup Progress tab to watch all the action.

What is nice about this program is that you can (as with grsync) store a number of backup definitions, so you can choose to back up your documents, music or digital photographs when it suits you. GAdmin-Rsync goes further though. If you take a look down at the bottom of the window on the Backup settings tab, you'll notice the words “Schedule this backup to run at specific days via cron” and a check box (Figure 6). Check the box, then scroll down to choose the days you want the backup to run. A little further down, you can specify the time as well.

Figure 6. GAdmin-Rsync also provides an easy way to schedule your backups with cron.

Well, mes amis, closing time has caught up to us, and at least for now, time is one thing we can't back up. Despite the hour, I am quite sure we can convince François to refill our glasses one final time before we go our separate ways. Please, mes amis, raise your glasses and let us all drink to one another's health. A votre santé! Bon appétit!

Marcel Gagné is an award-winning writer living in Waterloo, Ontario. He is the author of the Moving to Linux series of books from Addison-Wesley. Marcel is also a pilot, a past Top-40 disc jockey, writes science fiction and fantasy, and folds a mean Origami T-Rex. He can be reached via e-mail at marcel@marcelgagne.com. You can discover lots of other things (including great Wine links) from his Web sites at www.marcelgagne.com and www.cookingwithlinux.com.


Taken From: Linux Journal Contents #180, April 2009

http://www.linuxjournal.com/article/10409