The Real Basics of Programming in Java (for non-programmers)


category: Java
date: Oct 14, 2011 

The Real Basics

These details get skipped a lot. If you've played a little with some programming, like BASIC, you may want to skip this, but unless you're confident, I suggest you at least skim it. Even if you are confident, further on I get into some programming in-jokes and stuff that might help ease the shock of getting into the programming world.

If you're confident, go back to The Elements of Programming In Java.

Note: The following is focused on really basic stuff, but it still assumes the context is the Java programming language. For example, when I give a bit of code as an example, I'm not going to stop and point out the details that are particular to Java and maybe don't work the same way in other programming languages. Sorry, but there's too much to explain here already :-).

A Program

A program is a huge, complex list of step-by-step instructions that the computer carries out. These instructions are written in a particular combination of words and punctuation.

A Short Example

Okay, so let's give you a short example of what a bit of a program might look like:

System.out.println("Hello.") ;
int somenumber = 2 ;
int anothernumber = 3 ; 
int theothernumber ;
theothernumber = somenumber + anothernumber;
System.out.println("the other number is " + theothernumber) ;
theothernumber = theothernumber + 1 ;
System.out.println("the other number is now " + theothernumber) ;

This sort of human-readable stuff that makes up the program is usually called the source code. That's not an important detail, but if you're curious why, read the section "Source Code: Compiled and Interpreted Programs" further down.
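If you want to actually run the snippet above, Java won't accept those lines on their own. Statements have to live inside a method, and methods have to live inside a class. Here's a minimal sketch of the same lines wrapped up that way. It assumes you save it as a file named ShortExample.java; that class name is just one I made up for the example.

```java
// ShortExample.java: the snippet above, wrapped in the class and main
// method that Java requires before it will run anything.
public class ShortExample {
    public static void main(String[] args) {
        System.out.println("Hello.");
        int somenumber = 2;
        int anothernumber = 3;
        int theothernumber;
        theothernumber = somenumber + anothernumber;
        System.out.println("the other number is " + theothernumber);
        theothernumber = theothernumber + 1;
        System.out.println("the other number is now " + theothernumber);
    }
}
```

Compile it with javac ShortExample.java and run it with java ShortExample. It should print three lines: Hello., then the other number is 5, then the other number is now 6.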

If you open up the source code for a program in an editor*, you see a bunch of words, punctuation and whitespace (whitespace is spaces, tabs, or carriage returns/newlines).

*By the way, "editor" is the programmer term for a word processor. Generally, editors (classics like emacs or vi, or programs like (shudder) Notepad on Windows) are much more focused on function and less on form. Editors are about moving words around; word processors are about making them print out pretty. And if you're wondering why programmers have a special term for word processor programs, it's the other way around: what do you think was used to write the source code for the first word processor programs?

While we're at it, in the modern era a lot of programmers don't use plain old editors anymore; they use IDEs. IDE stands for Integrated Development Environment. An IDE is a GUI programming editor with a bunch of extra tools and features. Some programmers swear by them, some swear at them. A lot of Java programmers think that, whether you like IDEs or not, a beginner should start with a very simple editor and the Java command-line tools.

The first step the computer takes in converting the human-readable words to computer instructions is breaking them down into tokens, chunking them up according to certain rules. For starters, whitespace separates tokens, and a change from letters to punctuation usually marks the start of a new token.

Almost nobody in the programming world talks about tokens, unless they're messing around with learning how programming languages are built. It's really just a fancier word for "chunks", and the main reason I'm bringing it up is because it lets me say that when you get right down to it, the "instructions" are a series of tokens; and now I get to explain the different kinds of tokens without having to waffle about keywords versus punctuation and various silly crap like that. They're all just tokens.
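To make the chunking idea concrete, here's a toy tokenizer. It's a sketch that only knows two rules: a run of letters and digits forms one token, and every other non-whitespace character is a token on its own. The real Java compiler has many more rules. For example, it treats == as one token and "a quoted string" as one token, and this toy does neither. So treat this as an illustration, not as how javac works. The class name TinyTokenizer is made up.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyTokenizer {
    // Toy rules only: letters/digits group into words; whitespace separates;
    // any other character (punctuation) becomes a single-character token.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : source.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                word.append(c);                    // keep building the current word
            } else {
                if (word.length() > 0) {           // anything else ends the current word
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (!Character.isWhitespace(c)) {
                    tokens.add(String.valueOf(c)); // punctuation is its own token
                }
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());           // don't lose a word at the very end
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("theothernumber = somenumber + anothernumber;"));
    }
}
```

Running it prints [theothernumber, =, somenumber, +, anothernumber, ;]. That's six tokens, even though there are only three spaces in the line.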

Identifiers: Keywords, Variables, Literals

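Strings of letters like "somenumber" are identifiers.

An identifier is either a keyword (one of a fixed set of words with a special meaning built into the language, like int) or a variable name (a name you make up to label a value in your program).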

In the example, int is a keyword, while somenumber, anothernumber, and theothernumber are variable names.

Most identifiers in any given large program are variable names, since there are only a limited number of keywords. On the other hand, those keywords get repeated a lot in any program.

Keywords are sometimes called "reserved words" (meaning that you can't use them for a variable name, because they're reserved to mean something special). Every now and then you even run into a reserved word that doesn't actually do anything; the people who designed that programming language just thought it would be a good idea to keep that keyword reserved for some reason. In Java, goto and const are like this.

There are also rules for what you can put in a variable name, but for now just use normal words (nouns mostly), avoid using a keyword, and avoid doing anything fancy with punctuation or numbers, and you'll be fine.
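If you're ever unsure whether a name is allowed, Java's own standard library can tell you. The class javax.lang.model.SourceVersion, which is normally used by compiler tools, has static isIdentifier and isKeyword methods. Here's a small sketch; the names NameCheck and isUsableName are made up for this example.

```java
import javax.lang.model.SourceVersion;

public class NameCheck {
    // Hypothetical helper: true if 'name' follows Java's identifier rules
    // and is not a keyword (isKeyword also rejects true, false and null).
    static boolean isUsableName(String name) {
        return SourceVersion.isIdentifier(name) && !SourceVersion.isKeyword(name);
    }

    public static void main(String[] args) {
        String[] candidates = {"somenumber", "theOtherNumber", "int", "2cool", "some number"};
        for (String name : candidates) {
            System.out.println(name + " -> " + isUsableName(name));
        }
    }
}
```

The first two names come out true. The last three come out false: int is a keyword, a name can't start with a digit, and a name can't contain a space.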

For the full list of official Java keywords, see Java Keywords.

Later on, read up on Java naming rules and conventions.

A literal means that instead of having a variable that contains a value, you just literally have the value typed right there in the code, like the number 343243. To put a literal String in a program, put quotes around it, "like this". (String is programmer-speak for a piece of text: a string of letters, or really a string of characters, since text can include digits and punctuation as well as letters.)
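Here's a small sketch of the difference, with comments marking which values are literals. The variable names are invented for the example. Notice how the quotes change what a value is: "343243" in quotes is text, so adding 1 to it glues a 1 onto the end instead of doing arithmetic.

```java
public class Literals {
    public static void main(String[] args) {
        int population = 343243;          // 343243 is a numeric literal
        String greeting = "like this";    // "like this" is a String literal
        int samePopulation = population;  // no literal here: the value comes from a variable
        String digits = "343243";         // in quotes, so it's text, not a number

        System.out.println(greeting);             // prints: like this
        System.out.println(samePopulation + 1);   // arithmetic: prints 343244
        System.out.println(digits + 1);           // text + 1 glues on a "1": prints 3432431
    }
}
```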

Punctuation

Most punctuation in the code is one of two things: either 1) operators or 2) start/end markers for an expression, statement or block. See Source Code and Fun With Punctuation, further down, for a little more detail.

Operators

An operator is basically a bit of punctuation that does something, like the plus character + adds two values together, or the minus character - subtracts them.

Most of the usual math symbols do what you'd expect them to, except for the equals sign =, which is an important exception I'll get to in the very next section (which is only a few paragraphs away).

Another exception is that the plus sign + can be used to combine (in programmer-speak, "concatenate") two strings of characters. There's more on this in the section below on Special Operators.
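Here's a quick sketch of the everyday math operators. The variable names are just made up for the example. The main surprise for non-programmers is that multiplication is written with *, since there's no × key on the keyboard.

```java
public class Operators {
    public static void main(String[] args) {
        int sum = 7 + 3;         // addition: 10
        int difference = 7 - 3;  // subtraction: 4
        int product = 7 * 3;     // multiplication is *, not x: 21
        int quotient = 8 / 2;    // division is /: 4
        System.out.println(sum);
        System.out.println(difference);
        System.out.println(product);
        System.out.println(quotient);
    }
}
```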

Besides reading the section below on Special Operators, you should read the section on operators in The Java Tutorial.

Also read the sections below on "The Equals Sign =, Assignment and Comparison" and "Equals Signs, Assignment and Comparison, or Stupid Programmer Mistakes".

Syntax

There are rules about how tokens can be used and how they can go together. These rules are called syntax. As you start to program, you'll be hearing about syntax a lot, mainly because a lot of the more common mistakes beginners make are syntax errors, usually finicky typos that are just damned hard to remember, until they become ingrained by habit. Don't get frustrated, it's not you; even experienced programmers often (usually) make stupid typos in their first draft of a piece of code. Fortunately, these days programming editors usually point the error out as soon as you type it.

Expressions

An expression is any piece of code that works out to a value. So a variable is an expression, and so is a literal. An operation like 1 + 1 is an expression, and so is somenumber + anothernumber. I guess you could say, as a rule of thumb, that anything you could take out and replace with a literal value is an expression.

Another way to say that is, "anything that returns a value". "Returns a value" is programmer-speak for "results in a value being sent back". There's a return keyword that does this from a method. A method is basically a named grouping of code; such groupings are also sometimes called procedures, subroutines, or functions, but in object-oriented languages like Java the tradition is to call them methods. Therefore a call to a method that returns a value is also an expression.
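Here's a sketch that puts each kind of expression from the last two paragraphs on the right-hand side of an assignment: a literal, a variable, an operation, and a method call. The method name addTogether, and the other variable names, are made up for the example.

```java
public class Expressions {
    // A made-up method that returns a value. A call to it is an
    // expression, because the call works out to whatever is returned.
    static int addTogether(int first, int second) {
        return first + second;
    }

    public static void main(String[] args) {
        int somenumber = 2;
        int anothernumber = 3;

        // Every right-hand side below is an expression:
        int fromLiteral = 5;                                         // a literal
        int fromVariable = somenumber;                               // a variable
        int fromOperation = somenumber + anothernumber;              // an operation
        int fromMethodCall = addTogether(somenumber, anothernumber); // a method call

        System.out.println(fromLiteral + " " + fromVariable + " " + fromOperation + " " + fromMethodCall);
    }
}
```

It prints 5 2 5 5. The operation and the method call both work out to the same value, which is exactly why either one could be swapped for the literal 5.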

Statements

If an expression is a term, a statement is a phrase. Since it's a phrase, not a sentence, you don't put a period at the end of it; you put a semicolon ; at the end of it.

For example:

int somenumber = 2 + 2 ;

Source Code and Fun With Punctuation

Most punctuation is a single character, so there's nothing complicated about what separates different punctuation tokens. There are some two-character combinations (like ==), and there are some matched sets. The matched sets are usually used for organizing things, to start and end sections. Usually a left parenthesis "(" has to have a right parenthesis ")" somewhere, and the same goes for { and }, and [ and ]. Quotes like " and ' and ` usually have to be in a matched set, like "foo", or 'bar', or `baz`. You'll see < and > used as a matched set less often, because they're usually used for less-than and greater-than. Java does pair them up for generics, like List<String>, which you'll run into later.

Note: The ` is usually on the key to the left of the number 1 key, upper-left corner of your keyboard, while ' is usually next to the enter key, middle-right side of your keyboard. Some people like to use ` and ' as opposite sets of a matched pair, like ( and ), but very few programming languages do this.

Multitasking

When you get right down to it, a computer's processor (or each core of it) can only do one thing at a time. It can just do things so fast that it looks like it's doing several things at once. This is called "multitasking." Juggling is a popular metaphor for this, but I prefer the chess master metaphor.

The computer acts sort of like a brilliant chess player who can play against twenty ordinary people at once. But it's important to realize that he isn't really holding all twenty chess games in his head simultaneously. He walks up to each chess board, looks at the layout of the pieces, and decides, based on the context of the pieces on that chess board, on the correct next move. Then he forgets about that chess board, and walks on to the next chess board. He doesn't think about that chess board until he comes back around to it.

A computer does something quite similar - each running program (called a process) has its own context of memory associated with the process. The CPU goes from process to process, looks at the memory associated with that process, does the next step, then moves to the next process.

Source Code: Compiled and Interpreted Programs

The human-readable version of the program is called the source code. Why this is, you don't really have to know, nor do you really have to know what compiled or interpreted mean. But if you're interested, read on.

The computer itself only understands numbers. Everything that you see a computer do that looks like it deals with words, is in the end boiled down to numbers. The program is too - the human-readable version is purely for your convenience. And believe me, it is one heck of a convenience.

Something has to convert that human-readable stuff to machine code (pure 1s and 0s). That something is called a compiler, or sometimes an interpreter. A compiler converts just once, and the result is a set of 1s and 0s that the machine runs. An interpreter converts on the fly; it's a program that runs your program, interpreting your words and issuing the right 1s-and-0s version of the commands as necessary.

But in real life, nothing is that simple.

First of all, there's a lot of give and take in terms of whether a particular program is a compiler, per se, or an interpreter. Many "interpreters" actually compile the program right when you run it. Some compilers (like the Java compiler) compile the program halfway, to a generic format called bytecode, which is then run on a virtual machine (like the Java Virtual Machine, JVM for short). The virtual machine is basically another take on the interpreter: a program that runs a program.

Second of all, there is seldom a one-for-one, word-to-code translation. Most often, a short set of words converts into dozens or more individual machine codes. On top of that, many compilers and interpreters then rearrange commands to make it all run faster, while still doing exactly what you said to do. This all takes place under the covers.

Like I said, you don't really need to know about it now, but it's a good idea to understand it. Sooner or later you'll trip up on something, and understanding this stuff is what will help you figure it out.

RTFM, "Use The Source, Luke", and "ask smart questions"

As you learn and ask people for help, you'll hear two phrases very often, RTFM (Read The Fucking Manual) and "Use the Source, Luke" (sometimes abbreviated UTSL). A growing trend is the third phrase, "learn how to ask smart questions", after an essay written by Eric S. Raymond about how to do just that.

All of these boil down to "do your own homework before you ask me for help," like reading this page. All of us have been there before, and it sucks. We'll spend tons of energy helping you figure out how to do it yourself, but we're not going to do it for you. Doing it for you is just a waste of our time. Helping you figure out how to do it for yourself is an investment in helping you become a new member of the programming community, who someday might help other members of the community, maybe even us!

Special Operators

These are some important gotchas to know about, but they were taking up too much space in the Operators section above, so I moved them down here.

"foo" + "bar"

Works out to have the same value as "foobar". There's not much use for "foo" + "bar" itself in programming, but there's a lot of use for gluing strings together. The + also automatically converts any non-string (like a number value) into a string. So "foo" + 3 doesn't try to do arithmetic: the 3 is turned into "3", and you get "foo3".
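Here's a small sketch of string gluing in action; the variable names are invented for the example. It also shows one gotcha that's worth knowing early. Java works through a chain of + signs from left to right, so once a String is involved, any numbers after it get glued on rather than added, unless you use parentheses.

```java
public class Concatenation {
    public static void main(String[] args) {
        String glued = "foo" + "bar";           // "foobar"
        String mixed = "foo" + 3;               // the 3 becomes "3": "foo3"
        int count = 42;
        String message = "count is " + count;   // the usual real-world use
        // Gotcha: + is evaluated left to right, so after a String,
        // later numbers are glued on rather than added together.
        String surprise = "total: " + 1 + 2;    // "total: 12", not "total: 3"
        String fixed = "total: " + (1 + 2);     // parentheses go first: "total: 3"
        System.out.println(glued);
        System.out.println(mixed);
        System.out.println(message);
        System.out.println(surprise);
        System.out.println(fixed);
    }
}
```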

Another pair worth knowing about is ++ and --, which are called the increment and decrement operators. These get used so often in programs that I wanted to point them out. Java doesn't use them as much as some programming languages, but they still get used a lot. They're just a shortcut for "add one" or "subtract one", which seems kind of stupid at first. After you write a few programs, you'll realize they get used a lot to count down or count up from one number to another, or to step through a bunch of things, and it starts to seem really handy to be able to rewrite this:

count = count + 1 ;

Into this:

count++ ;
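Here's a sketch showing the long way, the shortcut, and the most common place you'll see ++ in practice: a for loop that counts up. Loops are a topic of their own; here the loop just means "do this with i equal to 1, then 2, then 3". The names count and i are only for the example.

```java
public class Counting {
    public static void main(String[] args) {
        int count = 0;
        count = count + 1;   // the long way: count is now 1
        count++;             // the shortcut: count is now 2
        count--;             // and back down: count is now 1
        System.out.println("count is " + count);

        // The most common home for ++: stepping a counter through a loop.
        for (int i = 1; i <= 3; i++) {
            System.out.println("step " + i);
        }
    }
}
```

It prints count is 1, followed by step 1, step 2, and step 3.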

The Equals Sign =, Assignment and Comparison

The equals sign is used for assignment, meaning, storing a value into a variable. If you see:

a = 5

That means "store the value 5 in variable a." When programmers read this aloud, they usually say something like "let a equal five" (and in BASIC, there actually is a LET keyword, "LET A=5") or "a is assigned the value five".

If you try to put a literal on the left side, like this:

5 = b

...the compiler yells at you. This turns out to be a very good thing; we'll see why in a moment.

You can also assign one variable to another, which means that the value in the variable on the right gets copied into the variable on the left:

a = 5 // the variable 'a' now contains the numeric value 5

b = a // the variable 'b' also now contains the numeric value 5.
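The important word above is "copied". Once b has its own copy of the value, changing a afterwards doesn't touch b. Here's a minimal sketch; the class name Assignment is made up.

```java
public class Assignment {
    public static void main(String[] args) {
        int a = 5;      // store 5 in a
        int b = a;      // copy a's value (5) into b
        a = 10;         // change a afterwards...
        System.out.println("a is " + a + ", b is " + b);  // ...b still holds its own 5
    }
}
```

It prints a is 10, b is 5.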

A Very Short Digression on References

Note: there's a gotcha with variables in some programming languages, Java included. If variable a contains a simple value like a number, then assigning variable b the value of variable a just copies that simple value. But sometimes the value in a variable is a _pointer_, or in Java a _reference_. In that case the same thing happens (the value gets copied), but the way the programming language _handles_ that value makes that simple copy have more complicated implications. I don't want to get too side-tracked with this, because it starts to get into more complicated topics like objects, classes, instances and instance references. Those don't belong in a "real basics" tutorial, but the general idea is that besides simple values like numbers, you can have a value that is a _reference_. In that case, it's as if instead of the actual value being stored in the variable, what's stored in the variable is a note like "the actual value is over _there_". When some code uses that value, it just goes over _there_ and checks. But this can have tricky implications, because you can have two different bits of code that both fiddle with the value stored over _there_. This lets you do really useful things, but it also lets you pull the rug out from under yourself.
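Java arrays are one of the simplest places to see this. Here's a sketch that contrasts with the number example earlier; the names first, second and independent are only for illustration. Copying an array variable copies the reference (the note), not the array, so both variables end up pointing at the same array over _there_.

```java
import java.util.Arrays;

public class References {
    public static void main(String[] args) {
        int[] first = {1, 2, 3};   // an array: the variable holds a reference to it
        int[] second = first;      // copies the reference (the note), not the array
        second[0] = 99;            // change the array "over there" through second...

        // ...and first sees the change, because both notes point to the same array.
        System.out.println("first  = " + Arrays.toString(first));
        System.out.println("second = " + Arrays.toString(second));

        // Making a real, separate copy takes an explicit step.
        int[] independent = Arrays.copyOf(first, first.length);
        independent[0] = 1;        // changes only the copy
        System.out.println("first  = " + Arrays.toString(first));
    }
}
```

All three lines print [99, 2, 3]. Changing the copy made with Arrays.copyOf leaves the original alone.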

Equals Signs, Assignment and Comparison, or Stupid Programmer Mistakes

Using a single equals sign for assignment is a very firmly established tradition in programming. This is sort of unfortunate, because it leads to one of the most common, and most frustrating typos in the programming world, which is "confusing assignment (=) with comparison (==)".

Most people first learn about the equals sign for comparison, like "is a equal to b?", and for finding out the value, like "two plus two equals four." The motion sort of goes left to right, the same way you read English. In programming, when you see a single equals sign, the motion goes the other way, from right to left: the value on the right gets stored in the variable on the left.

In other words, given:

a = 2

A programmer doesn't read this as "does a equal 2?"; they read it as "from now on, a does equal 2".

You use a double-equals sign == to do comparison. You write "is a equal to b?" like this:

a == b

It is very, very easy, even for an experienced programmer, to slip and write this instead:

a = b

When this does happen, it can lead to very weird behavior, which is frustratingly hard to track down, and when you finally find it, you feel really stupid. Bear in mind, it's usually buried among pages and pages of other code, so it's really easy to look at it and see what you're expecting to see, not the error.

One really good rule of thumb that I picked up somewhere is "literal on the left". If you're doing a comparison with a literal value, put the literal value on the left side. For example:

5 == a

This is good because it forces you to think about the comparison in a slightly different way. It also means the compiler will yell if you slip and write 5 = a, because you're not allowed to store a new value into the literal 5.
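Here's what this looks like in Java specifically, as a sketch with made-up method names. One bit of good news: Java only accepts true/false (boolean) values inside an if. With number variables, if (a = b) won't even compile. The slip does get through with boolean variables, though, and writing the literal on the left catches it:

```java
public class AssignmentBug {
    // BUG, on purpose: a single = ASSIGNS true to finished, and the if
    // then tests the value just assigned, so this always returns true.
    static boolean buggyCheck(boolean finished) {
        if (finished = true) {
            return true;
        }
        return false;
    }

    // "Literal on the left": == compares. If you slipped and wrote
    // if (true = finished), the compiler would reject it, because you
    // can't assign a new value to the literal true.
    static boolean literalOnLeftCheck(boolean finished) {
        if (true == finished) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(buggyCheck(false));          // prints true, even though we passed false
        System.out.println(literalOnLeftCheck(false));  // prints false, as intended
    }
}
```

In everyday Java, people usually write a boolean test as just if (finished). The == version is here only to show the literal-on-the-left habit.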

Foo, Bar, Baz: Metasyntactic Variables and Stupid Programmer Jokes

A lot of programmers use the words foo, bar, and baz in examples. This is so common that there's actually a jargon phrase for it: we call these words "metasyntactic variables".

The reason these show up so often is that you often need to explain something with an example, and the example has to refer to some things purely for the purpose of the example. So programmers use "foo" and "bar" (usually traced to the military slang FUBAR, for "Fucked Up Beyond All Recognition"). The third variable, "baz", is just a corruption of "bar", because you often need a third example variable.

Why not use real words? Three reasons:


Last modified: Fri Feb 27 03:22:19 EST 2004