by Steven J. Owens (unless otherwise attributed)
This is an attempt to give a thumbnail, big-picture overview of CVS for people who really don't know anything about it. Mainly it's an attempt to fill in the gaps and assumptions, the stuff that most CVS-competent people may unconsciously assume people already know.
If this is too lightweight for you, or once you finish reading this, I highly recommend Karl Fogel & Moshe Bar's Open Source Development with CVS.
Note: Since writing this several years ago, a new tool, Subversion (often abbreviated svn) has really picked up steam and started to displace CVS in the hearts and minds and servers of the open source community. Subversion aims to be a "CVS work-alike", replacing CVS, looking and feeling a lot like CVS, while being a ground-up redesign that avoids some key problems in the architecture (two big ones being handling directories better and being able to support atomic commits (i.e. commit a set of changes, undo a set of changes)). Thus, most of the following general comments will apply equally well to Subversion.
Karl Fogel didn't write a Subversion book, but somebody else did - Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato - and Karl wrote a nice foreword for the book, and has a copy of it on his website:
Also, while I'm at it, there are a couple of really useful pages about SVN. I've had to dig them up a few times, so I'll put the URLs here:
I really like this one, because it talks about what's going on under the hood with SVN, and that really helped me "get" the idea that SVN is a virtual file system and what that implies about how tagging and branching work:
Also, setting up the repository for external use via ssh is a bit tricky, and so far this is the best info I've found on it. He's preoccupied with his authorized_users trick, which I'm not worried about, so I just left that out and set up the repo with the perms he recommended:
To start with, CVS stands for "Concurrent Versions System" and is a "revision control system", sometimes also called a "source code control system" or "version control system." Any way you slice it, it's a software tool for keeping track of changes to a set of documents.
Since CVS was developed by programmers, for programming, it's geared towards keeping track of plain text files (like source code). It does have some limited ability to include the occasional binary file, but it's really better at tracking changes to text documents.
There's a CVS client for Windows, but the standard CVS server doesn't run on Windows (CVSNT is a separate Windows port). For a regular CVS server you need a unix box.
CVS is free, both free as in liberty (you can get the source) and free as in beer (you don't have to pay anybody if you want to use CVS).
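Basically there are four parts to the picture: the repository, the CVS server software, the CVS client software, and your checkout (working copy).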
Looked at another way, you can think of the client and server software as mediating between your checkout and the repository:
| checkout/working copy | CVS client | CVS server | repository |
You use the client software to "import" a set of project data and source files into CVS, essentially telling the server about them. Importing sends them to the server for the first time. The server sets up a "project", which is a directory hierarchy starting just under the top level of the CVS repository.
Once you've imported the project code, to do any work with the CVS stuff, you have to check out a fresh copy of the project, and work on that. This copy of the project will have a "CVS" directory in each directory of your project.
That CVS directory will contain a handful of small bookkeeping files (typically Root, Repository and Entries, sometimes more), which are what the CVS client uses to keep track of what you're doing in your directory. This makes later use of CVS a bit easier, since you don't have to specify the repository machine, etc. You can just run the cvs command and it'll look in the "CVS" subdirectory for those details.
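To make that bookkeeping concrete, here is a minimal sketch of a shell function that prints the administrative files from a checkout's CVS directory. The name show_cvs_admin is made up for illustration, and it only looks at Root, Repository and Entries; your client may keep other files there too.

```shell
# Hypothetical helper: print the bookkeeping files the CVS client keeps
# in a checkout directory (defaults to the current directory).
show_cvs_admin() {
    dir="${1:-.}"
    if [ ! -d "$dir/CVS" ]; then
        echo "$dir is not a CVS checkout (no CVS directory)" >&2
        return 1
    fi
    # Root: which repository; Repository: which module path;
    # Entries: the files in this directory and the revisions you have.
    for f in Root Repository Entries; do
        if [ -f "$dir/CVS/$f" ]; then
            echo "== $f"
            cat "$dir/CVS/$f"
        fi
    done
}
```

Running "show_cvs_admin ." at the top of a checkout should show where the client thinks the repository lives, which is handy when a checkout was made through a tunnel or under a different username.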
You make your edits on the "local checkout" of your project, then you use the CVS client commands, like "cvs update" to make sure you have any code changes that other people have put in, then you build and test your code again to make sure that their changes work with your changes, then "cvs diff" to see what you changed, then finally "cvs commit" to send your changes back to the server to be merged into the repository.
When you commit, the client contacts the server and makes sure that nobody else committed changes to that file between your last update of the file and now. If they have, the client tells you about the problem and aborts the commit.
Seems simple enough, but the part that most people don't emphasize is that you must update, build and test before diffing and committing. The classic goof-up is to miss that something was modified by the update, or to just forget to build and test. The result is that you commit your changes without testing them in combination with the latest changes, and then somebody else updates and gets the broken code and has to deal with your mess.
So really, the traditional sequence (after the initial import and checkout) is:
1) Sit down to work, do "cvs update" to get any new changes.
2) If somebody has changed a file you're also working on, CVS will warn you about the conflict. CVS will insert the changed bits, with special marker lines to show where they conflicted. You go look at the files to examine conflicts and edit them to select the bits you want. If the conflicts are in widely separated bits of the file, CVS may just painlessly merge them.
This process can actually be a little painful if they're complex/ugly changes, so see further notes on diff/merge, below.
3) You update, check any changes that were applied to files you're working on. Then you try to build/run your code and make sure the update didn't break anything :-).
4) Then you work a bit more, until you decide you're done with that bit of work.
5) First you update again to make sure any other changes that happened while you were working are applied to your stuff. Then build/run again to make sure your stuff still works. Then and only then you commit.
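The sequence above can be sketched as a small shell function. This is only an illustration under assumptions: careful_commit and BUILD_CMD are names I made up, and "make test" stands in for whatever builds and tests your project.

```shell
# Hypothetical wrapper around the update / build / commit cycle.
# BUILD_CMD is an assumption: set it to whatever builds and tests your project.
BUILD_CMD="${BUILD_CMD:-make test}"

careful_commit() {
    msg="$1"
    # 1. Pull in everyone else's changes; -d also picks up new directories.
    update_log=$(cvs -q update -d) || return 1
    printf '%s\n' "$update_log"
    # 2. A leading "C" in the update output marks a file with conflict markers.
    if printf '%s\n' "$update_log" | grep -q '^C '; then
        echo "Conflicts found: resolve them, rebuild, then try again." >&2
        return 1
    fi
    # 3. Build and test the merged result before anything leaves this machine.
    $BUILD_CMD || { echo "Build or tests failed; not committing." >&2; return 1; }
    # 4. Only now send the changes to the server.
    cvs commit -m "$msg"
}
```

You would call it as careful_commit "Fixed the parser bug". It deliberately refuses to commit when the update reports a conflict or the build fails. Reading over "cvs diff" before you run it is still up to you.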
As I said above, diff/merge is trivial for trivial cases, but for complex cases, it can get hairy. So I like to do "cvs -n update" first to see what's going to happen, then sometimes "cvs diff -D now filename" to compare the file in my checkout against the most recent version in the repository.
You can select a specific version of the file to check out - but you have to be careful, because the CVS client keeps track of that (it makes that checkout "sticky", so later updates never get applied to that file until you remove the "sticky" flag).
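As a rough sketch of how stickiness comes and goes, the following helper functions wrap the relevant commands. The function names are invented for this example; "cvs update -r", "cvs status" and "cvs update -A" are the stock commands underneath.

```shell
# Hypothetical helpers illustrating sticky revisions.

# Pin a file to a specific revision; the checkout remembers it as "sticky".
pin_revision() {
    cvs update -r "$1" "$2"
}

# Show what, if anything, a file is stuck to.
show_sticky() {
    cvs status "$1" | grep 'Sticky'
}

# Clear sticky tags, dates and options so later updates apply again.
unpin() {
    cvs update -A "$@"
}
```

For example, pin_revision 1.3 foo.c followed by show_sticky foo.c should report a sticky tag of 1.3, and unpin foo.c should bring the file back to the head of the line of development.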
If you want to check and see what you've changed, you can use "cvs diff filename".
BIG GOTCHA: Most people, in my limited experience, assume that:
cvs diff filename
...will compare the file against the most recent version in the repository. WRONG WRONG WRONG. It will compare the file against the last version of the file that you checked out (or updated) from the repository. To diff against the most recent version in the repository without actually updating, do:
cvs diff -D now filename
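To see both comparisons side by side without touching your checkout, something like the following sketch may help. The function name preview_update is my own invention; the cvs options used (-n, -q, diff -D now) are the ones discussed above.

```shell
# Hypothetical helper: see what an update would do, and how your file differs
# from the newest repository revision, without changing the checkout.
preview_update() {
    file="$1"
    # -n: report what update would do, but touch nothing on disk.
    cvs -n -q update "$file"
    echo "--- my edits since my last checkout/update:"
    cvs diff "$file"
    echo "--- my file versus the newest revision in the repository:"
    cvs diff -D now "$file"
    # cvs diff, like diff, exits non-zero when it finds differences,
    # so report success explicitly rather than pass that status along.
    return 0
}
```

If the two diffs are identical, nobody has committed to that file since your last update. If they differ, the difference is other people's work that your next update will try to merge.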
In a sane and healthy software development team, checking in code that does not compile and run and pass tests properly will get you smacked with a rolled up newspaper.
If everybody remembers to update and build/run/test before committing, then mostly everything works out fine.
Also, remember, CVS is NOT a replacement for teamwork and communication.
Also, DO NOT USE PSERVER for remote CVS use. It's highly insecure. The normal manner of using a remote CVS client is to use SSH tunneling, and the CVS client normally has support for that built in. You just tell it to use SSH... and of course the user has to have a login account on the CVS server. See the "CVS via SSH" article here at notablog.
If I'm stuck on a winbox, I generally use the cvs command line client that comes with cygwin. There're a couple pointy/clicky CVS clients out there. TortoiseCVS is one that I can think of off the top of my head. I dislike them because they obscure too much of what's going on. Though they're nice for getting a big picture, they also seem to lead people into committing too many files at once, in my experience.
Don't forget to have fun.
Here are a batch of additional notes I wrote up at various times. I'm including them here more as a cheat sheet than anything else.
Normally you add new files and directories by using "cvs add" followed by "cvs commit", one at a time. If you have to add a whole set of files, this could be laborious. The "cvs import" command is usually used to set up the project in CVS by importing your whole set of project files at once. But, with some care, you can use it to add a new set of directories and files to an existing CVS project.
Let's say you have a project with a file layout like this:
projectname
projectname/source
projectname/source/com
projectname/source/com/darksleep
Now you want to add a directory hierarchy full of data:
projectname/data/a_to_f
projectname/data/a_to_f/a
projectname/data/a_to_f/b
projectname/data/a_to_f/c
...etc...
The big gotcha here is that the import command will import the enclosing directory, i.e. the current working directory when you run the command. It seems counterintuitive to me, but that's the way it is. Therefore, to get the proper outcome, you must cd into the top directory of the hierarchy you want to import. The import command line still has to specify the path in the project where the enclosing directory will be added:
NOTE: But, don't use pserver, as it's not securable. Use ssh tunneling instead (see below).
$ cd projectname/data
$ cvs -d :pserver:username@cvshost:/usr/local/cvsrepos import -m "Adding data to projectname" projectname/data someirrelevanttagname anotherirrelevanttagname
The comment and tags are necessary, but not really that important.
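Here is a hedged sketch of the same import done over ssh instead of pserver, wrapped in a function so the "cd first" step can't be forgotten. The function name import_subtree and the tag names vendortag and releasetag are placeholders of mine, and it assumes CVSROOT is already set to an :ext: style root.

```shell
# Hypothetical helper: import the directory tree at $1 into the repository
# at module path $2, over ssh rather than pserver.
# Assumes CVSROOT is an :ext: root, e.g. :ext:username@cvshost:/usr/local/cvsrepos
import_subtree() {
    src_dir="$1"
    module_path="$2"
    (
        # import treats the current directory as the top of the tree, so cd first.
        # The subshell keeps the caller's working directory unchanged.
        cd "$src_dir" || exit 1
        CVS_RSH=ssh
        export CVS_RSH
        cvs -d "$CVSROOT" import -m "Adding $module_path" \
            "$module_path" vendortag releasetag
    )
}
```

With the layout above, the call would be import_subtree projectname/data projectname/data, run from the directory that contains projectname.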
The CVS pserver transmits usernames and passwords effectively in the clear (the password is only trivially scrambled). The alternative, the :ext: access method, runs over rsh by default, which is no better. To make CVS use ssh instead of rsh, if the box running cvs is not behind a firewall, from the shell:
$ export CVSROOT=:ext:email@example.com:/path/to/repository
$ export CVS_RSH=/usr/bin/ssh
Then use cvs normally, and cvs will use ssh to protect the session.
cvs -d :ext:firstname.lastname@example.org:/var/lib/cvs checkout projectname
NOTE: See the "Using ssh agent" article for a more thorough, detailed description of how to do this.
To use cvs/ssh without being constantly prompted for a password, you need to set up private/public key access on the server and store the private key on your client machine. This is risky because anybody who gets their hands on your private key can get into your server account. Be careful.
The keys can be generated on the client and scp'd to the server, or generated on the server and scp'd down to the client:
1) generate a public/private key pair:
$ ssh-keygen -t dsa
2) When prompted for a passphrase, enter a fairly long phrase that you can remember.
The passphrase will be used to encrypt your saved keys. The passphrase should be significantly longer and harder to guess than a normal password.
The classic example is the first line of Coleridge's poem, Kubla Khan:
"In Xanadu did Kubla Khan a stately pleasure-dome decree"
Of course, the first time somebody used that as an example, the next day 20,000 users entered that as their passphrase. So don't use that.
Using a blank passphrase is, of course, generally regarded as stupid, since anybody can then copy your private key. Any purpose for which you think you need a blank passphrase is better done using ssh-agent (see below).
3) By default, ssh-keygen should put the private key in ~/.ssh/id_dsa and the public key in ~/.ssh/id_dsa.pub. Make sure the keys are in the following locations:
client:~username/.ssh/id_dsa
client:~username/.ssh/id_dsa.pub
server:~username/.ssh/authorized_keys2
Copy the contents of id_dsa.pub into ~/.ssh/authorized_keys2 on the server. I've seen conflicting reports as to whether it should be authorized_keys or authorized_keys2. I believe it depends on which key generation type (rsa or dsa) you use; authorized_keys2 worked for me and I used -t dsa.
4) Make sure ~/.ssh/id_dsa is readable only by you, i.e. chmodded go-rwx (mode 600).
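The server-side half of this can be sketched as a small function that appends the public key and fixes the permissions in one go. install_pubkey is a name I made up. The default file name follows the authorized_keys2 choice described above, so pass a different second argument if your server expects authorized_keys.

```shell
# Hypothetical helper, run on the server: append a public key to an
# authorized-keys file (skipping it if already present) and lock down modes.
install_pubkey() {
    pubkey_file="$1"
    auth_file="${2:-$HOME/.ssh/authorized_keys2}"
    ssh_dir=$(dirname "$auth_file")
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1
    touch "$auth_file" || return 1
    # -F: fixed string, -x: whole line, so the same key is never added twice.
    if ! grep -qxF -e "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    # Only the owner may read or write the key list.
    chmod 600 "$auth_file"
}
```

After scp'ing id_dsa.pub up to the server, you would run install_pubkey id_dsa.pub there. Running it twice is harmless, since the key is only appended when it isn't already in the file.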
To set up ssh portforwarding:
Specifically, for CVS from outside the firewall:
ssh -L2401:internalcvsboxaddress:2401 username@firewallbox
Note that the address is the internal IP address; essentially, what you're doing is telling ssh to "connect me to the firewall, and while you're at it, forward any packets from localhost:2401 to the address foo.bar.baz:2401", where that address is from the perspective of the destination box.
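If you open this tunnel often, a tiny wrapper like the following sketch saves some typing. The function name open_cvs_tunnel is hypothetical; -f and -N are standard ssh options that put the session in the background without running a remote command.

```shell
# Hypothetical helper: open a background tunnel from local port 2401 to the
# CVS box's pserver port, by way of the firewall host.
open_cvs_tunnel() {
    firewall_login="$1"      # e.g. username@firewallbox
    internal_cvs_host="$2"   # address as seen from the firewall box
    # -N: run no remote command; -f: drop to the background after authenticating.
    ssh -f -N -L "2401:${internal_cvs_host}:2401" "$firewall_login"
}
```

For example, open_cvs_tunnel username@firewallbox internalcvsboxaddress, after which cvs commands pointed at localhost go through the tunnel.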
There are two gotchas with this approach.
The first is that you have to set up your current archive using a tunnel, so that it shows as having checked out from localhost.
The second is that you'll also have to use tunneling when inside the firewall.
To tunnel CVS from inside the firewall, just do the same thing, only instead of tunneling to the firewall, tunnel either to the CVS box (best, for security) or to another box that can get to the CVS box (not a good idea, since if your local network has been subverted, a criminal could eavesdrop on your username and password).
If you're doing this, you may find it useful to set up a little shell script to kick off the ssh session, and you may also find it useful to set up a "cvsssh.sh" shell script to direct your cvs commands to the local port (at least for the login and checkout, after that it should just work fine).
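#!/bin/sh
# Reminder only: this script does not check that the tunnel is actually up.
echo "Make sure you have ssh portforwarding to the appropriate box running..."
# Pass all arguments through to cvs, pointed at the local end of the tunnel.
cvs -d :pserver:username@localhost:/usr/local/cvsrepos "$@"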
NOTE: Again, using pserver is considered insecure and risky, these days. Fortunately, these days ssh is more widely accepted and you should have an easier time using ssh for remote CVS.