Java Web Applications, Part 0

by Steven J. Owens (unless otherwise attributed)

Warning: Early Draft Stage

Preface

My first stab at this topic started with: "This is an attempt at a concise, high-level overview of web application development in Java."

Unfortunately, the topic is really big and invites sprawl. This is an attempt to be really, really concise and fast-moving. If you read this and you don't understand it, don't feel bad.

... Wow. Okay, so I came back here to the preface to add another note. Below you'll find three general sections before we dive into the meat of the technical stuff. If you're eager to get geekin', you may want to skip right past Introduction, Plan of Attack, and J2EE (and Tomcat), and start with The Servlet Framework.

Java Web Applications, Part 0
Java Web Applications, Part 1
Java Web Applications, Part 2
Java Web Applications, Part 3
Java Web Applications, Part 4

Introduction

First, let's make sure we're talking about the same thing. When I say a "web application", I don't mean a web site, or a set of pages, or a simple HTML form that submits to a server-side CGI script that does something with the submitted form data. What I do mean is an application that uses a set of related web pages as the user interface. There's usually some persistent data, typically stored in a relational database. There's often some concept of a user session with temporary data that sticks around on the server side, from browser request to browser request, usually until the user logs out.

Plan of Attack

Most of this is going to be about servlets, and about JSP -- which is, when you get right down to it, just another way of coding servlets. There're tons of beginner's books and tutorials on JSP that assume you don't know how to code, so I'm going to assume you know how to code, and the JSP-oriented stuff will be from a servlets perspective.

First I'm going to talk about the basic concept of a web application - this is slightly important, especially because I discuss briefly the evolution of web programming in the large, and I think that having a sense of that really helps to nail down what the heck servlets are.

Then I'm going to talk about J2EE, and about what I'm not going to talk about. This kinda sucks, because it means I have to burn a lot of pixels defining what I'm not going to talk about. But there's a swamp of terminology out there, and if I don't at least skim over it, you're going to wonder where what I'm telling you fits into the picture. C'est la vie.

Then I'm going to dive into the Servlet API and give you a rough idea of what the important classes are and how you get at them. Then, if I get that far, I'm going to talk a little bit about JSP, enough to give you the gist of the underlying technology, because most of the JSP books are really intro-oriented and surface-level. This kinda makes sense, because JSP is a surface-level technology, but for somebody like me -- and like you, if you're the person I'm writing this for -- that can be a pain because it obscures what's really going on.

Then I'm going to talk about basic "normal" java web application design strategies. Well, okay, they're not going to be normal, but they'll be what everybody says should be normal (without getting too deep into the endless layers of abstraction and frameworks and similar wanking that seems to crop up when people start talking about "ought to").

Then, if I get that far, I'm going to try to find time to talk about what makes a webapp really tick, from a user-functional level. This is a tricky thing to talk about, for all sorts of reasons that I can't go into here, or this section will end up being twice as large.

Someday, somewhere, I'm going to try to compare and contrast the design of a web application to a "normal" GUI application.

There are tons of books and tons of subsidiary and related topics. I'm going to do my best to avoid most of them, or to only talk about them in terms of "this is what you should do". Speaking of that, I'm going to take a very prescriptive approach - not "can do", but "should do" - because otherwise I'll end up either being uselessly general or uselessly specific (and so voluminous you'll never read it all). I figure either you're intelligent enough to figure out how and when to diverge from what I say, or you're not and you should be off reading something more detailed and step-by-step anyway.

I'm also going to try to avoid wasting time with lots of caveats and exceptions; for example, when I say below that the HttpSession is basically a Hashtable, I'm going to assume you're smart enough to figure out that I'm not promising that it is a Hashtable, or that it's necessarily backed by a Hashtable, or even that it necessarily has the same method signatures as a Hashtable. Just that it basically works like a thing you can use to store keys and values and then look up the values by key later.

Web Applications

There are three technologies that most people "get" (though you may still need to learn a lot about them): the browser, HTTP, and databases. A web app lives in the gap between HTTP and the database. That is, in a concrete sense, the web app code lives and runs in that gap. In an application design sense, you need to think about how your webapp extrudes through HTTP into the browser. I think that not doing this is a fundamental usability mistake that a lot of good programmers make when they start doing web apps. A lot of what makes a good web app design is the proper partitioning of the amount of data and interaction on each given page (UI and information designers call it "chunking"), and what's in each request back to the server.

Note: there's also a very special and specific meaning of "web application" in the Java world. In that specific sense, a webapp is a specific directory layout and set of data, class and configuration files (most importantly the web.xml file) that defines a web application in a servlet engine. Add some specific extra configuration, bundle it up with the Java jar tool in a very specific way (jar produces a ZIP-format archive, not a tar file), give the filename a ".war" extension, and you have a WAR file, where WAR stands for "Webapp ARchive", by analogy with JAR for "Java ARchive" and tar for "Tape ARchive".

It's also a specific thing in a servlet-engine sense, meaning the set of running objects that the servlet engine sets up for that web application when in use. In this article, I'm throwing the term webapp around pretty loosely, but the fact of the matter is that I mostly do java webapps, so I'm also generally talking about the specific java concept of a web application. For the most part, when reading this you can ignore that very specific concept of a webapp, unless I tell you otherwise.

Here's an example of a run-of-the-mill workflow-oriented webapp: Alice uses the webapp to create a purchase order, then it shows up in Bob's account. Bob is Alice's supervisor, he has to click the "approve purchase" button, then it shows up in Carol's account. Carol is the purchasing department. Carol clicks on the "no way jose" button and it bounces back to Alice's account with a big fat "DENIED" label on it.

The Web Application Model

A web application divides up into a bunch of different components. Typically people see the breakdown fairly simplistically, as follows:

This is, in fact, the way the vast majority of early web applications were organized. The application code actually existed as a bunch of individual, stateless CGI scripts (most often written in perl), that pushed incoming data into the database and pulled it back out for display (and occasionally pulled it out, munged it with incoming data, and pushed it back in).

As the rest of the world started to clue in that maybe there was something to this Internet stuff, they started wanting (not necessarily needing, but that's another story) more complex applications. This led to more complexity in the CGI scripts, to sequences and arrangements of HTML forms that only made sense in context with each other, and to more scaffolding code to make the CGI scripts less individual (not as discrete from one another, more stateful).

The most commonly cited - and as far as I can tell, completely imaginary - concern in this time span (about 1997-1999) was performance. The common performance fixation was on CGI script startup overhead and database churn. Startup overhead comes from the fact that each CGI request is handled as a separate process running a separate program from scratch: forking the separate process, loading the CGI script source file, loading any necessary interpreter or libraries, doing any setup necessary. Database churn is part of that setup. Since scripts are stateless and exit after sending back the response, any necessary state has to be saved to a database at the end of each script invocation, and loaded back up on the next request.

Lots of folks came up with tools and frameworks to address these real or imagined needs. The Java servlet specification is one of them, and it's proven to be fairly useful. It is, like many things Java, a little less than you might expect. This means a little more work, but also a little more freedom. Typically in Java, you get just enough that all the hard work has been done, but you're not locked into a specific approach for a problem. I can see the sense in this, but there are times when it bugs me, because you end up "rolling your own" over and over, for so many commonly repeated tasks.

Lots of people have come up with add-on tools and frameworks to fill in that remaining gap. Most of them are useful to some degree or another, some are really, really useful, and many of them are less useful. But all of them seem to add a ton of complexity to the job, so I'm mostly unhappy with them. Mostly I just stick with plain old servlets, and JSP for the shallow stuff.

J2EE

I seem to hear a lot more about J2EE than about J2SE. Maybe you do too, and maybe that confuses you. It's not your fault.

J2SE just basically means "the Java stuff", and it's the general term for all the stuff you get when you install the Java developer's kit, whatever we're calling it today (JDK, Java SDK, J2SE, Java 2, etc). This is the JVM, the Java core APIs, the compiler and various other bits and pieces of the developer's kit (like the jar tool). Strictly speaking, J2SE is "the platform", so the various developer kit tools aren't part of it. J2SE is everything that every J2SE program can count on having available when it runs.

J2EE is everything that every J2EE program can count on having available when it runs. Although people sometimes talk about J2EE and J2SE as if they're opposites, or contrasts, or something, J2EE is really built on top of J2SE. It's J2SE with a bunch of extra pieces added in to make it easier to develop Enterprise-oriented applications - the extra E in J2EE.

When I say "pieces", I really mean "specifications", which java programmers also call APIs. Strictly speaking, the API is the objects and method calls for an implementation of the spec, but programmers use "API" as shorthand, because it's shorter than specification - two whole syllables! And it's a TLA (three letter acronym), too! Sometimes we call 'em specs.

There are open-source reference implementations of these specs, but there are also a lot of closed-source, commercial implementations. There're a good couple dozen specifications that make up J2EE. Okay, I'm exaggerating; I just went back and counted and there are only 23 at this moment. I could spend pages just trying to summarize them. Here's the link to the Sun web page that lists them:

http://java.sun.com/j2ee/1.4/docs/#specs

And here's the J2EE Tutorial, if you have the time:

http://java.sun.com/j2ee/1.4/docs/tutorial/doc/

The J2EE picture is further complicated because all of the various vendors (and many other open source projects) are trying to come up with value-added add-ons to J2EE. But you can carve this list of specs up into some that are fairly specific and narrow, some that are well-defined and well-documented problems, some that are both specific and well-documented, and some that seldom get talked about but get used a lot, and some that seem to get more talk than actual use. I'm going to take a moment and try to carve up that list a little, off the top of my head.

These are the specs that most people think of when you say "J2EE".

They may also bother to think of these, especially if their particular bailiwick makes heavy use of them:

But, more likely they'll lump the above in with these, which they sorta take for granted:

Then there are all of these, some of which are subsidiary to other specs (for example, the EJB-to-CORBA spec) and others of which are oriented at just making life easier in Enterprise Application Land (like the J2EE deployment and management stuff) and still others that are part of the plumbing and special cases that you hopefully never have to see unless you're developing J2EE tools and components (like the J2EE Connector spec):

JNDI, for example, Java Naming and Directory Interface, is basically an API for a hierarchical, name-based lookup service that you can publish objects into and then look them up by name (kinda like DNS, only you're looking up Java objects instead of IP addresses). In some cases JNDI is the API you use to get at a full-blown LDAP server that is also being hit by a bunch of other software in the rest of the enterprise. One common organizational use for LDAP is to keep all the usernames and passwords in one spot. If you're going to plug your application into the same LDAP server, then you need to know at least a little JNDI and how to use JNDI from other parts of J2EE.

But, unless you're going to build stuff to let other applications interact with your enterprise app via JNDI, or use your JNDI server in new and creative ways, or implement your own JNDI server or JNDI components, you mostly don't need to know about it (except when you need to know just a tiny, tiny bit about JNDI when you configure your JDBC database pooling).
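If the DNS analogy helps, here's that core idea as a toy in plain Java: objects get bound under slash-separated names and looked up by name later. To be clear, ToyNamingService and its methods are made up for illustration; the real API lives in javax.naming (in a servlet engine you'd typically reach it through an InitialContext the container sets up for you), and it does a great deal more than this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the JNDI idea: bind objects under hierarchical,
// slash-separated names, then look them up by name later.
// ToyNamingService is hypothetical; the real API is javax.naming.
class ToyNamingService {
    // Flat map of full name -> object; the slashes provide the hierarchy.
    private final Map<String, Object> bindings = new ConcurrentHashMap<>();

    // Publish an object under a name; refuse to silently overwrite.
    void bind(String name, Object obj) {
        if (bindings.putIfAbsent(name, obj) != null) {
            throw new IllegalStateException("Already bound: " + name);
        }
    }

    // Look an object up by its full name.
    Object lookup(String name) {
        Object obj = bindings.get(name);
        if (obj == null) {
            throw new IllegalArgumentException("Name not found: " + name);
        }
        return obj;
    }

    // Names bound anywhere under a "directory" such as "jdbc", sorted.
    List<String> list(String context) {
        String prefix = context.endsWith("/") ? context : context + "/";
        List<String> names = new ArrayList<>();
        for (String name : bindings.keySet()) {
            if (name.startsWith(prefix)) {
                names.add(name.substring(prefix.length()));
            }
        }
        Collections.sort(names);
        return names;
    }
}

class NamingDemo {
    public static void main(String[] args) {
        ToyNamingService naming = new ToyNamingService();
        // Strings stand in for real objects, such as a pooled DataSource.
        naming.bind("jdbc/OrdersDB", "pretend DataSource for the orders database");
        naming.bind("jdbc/UsersDB", "pretend DataSource for the users database");
        naming.bind("mail/Session", "pretend mail session");

        System.out.println(naming.lookup("jdbc/OrdersDB"));
        System.out.println(naming.list("jdbc")); // [OrdersDB, UsersDB]
    }
}
```

That "jdbc/..." style of name is roughly the tiny bit of JNDI you bump into when configuring database pooling: the engine binds the pool under a name, and your code looks it up.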

JDBC, Java Data Base Connectivity, is all about how your code interacts with whatever database software you're using. Not the SQL bits, so much as all the scaffolding surrounding the SQL bits: how you get a connection to the database, how you send a query to the database, how you get the results of that query. There's a lot to say about JDBC and related topics, but it's covered very well in various books and is generally a really, really well-explored area, so go read that stuff to learn about JDBC.

The two highest-profile specs seem to be Servlets and EJB. But EJB seems to be talked about a lot more than it's actually used. There seems to be a large body of web applications that are written using a smallish subset of J2EE, mainly JSP/Servlets/JDBC. That's what I'm going to focus on here, and since there's a ton of documentation out there on JDBC, I'm not going to spend much time on that part.

Tomcat

Tomcat, the reference implementation for servlet engines from jakarta.apache.org, is available for download bundled with a bunch of limited implementations of other J2EE components, to make it feasible to do limited J2EE development with Tomcat and then migrate to a full-blown J2EE environment.

For example, Tomcat's download includes a limited JNDI implementation. The Tomcat JNDI server is single-JVM only, whereas a full-blown JNDI server would include a stand-alone server that could serve requests to multiple other applications over the network.

Additionally, there are other limitations that mostly don't come up, but may cause problems. For example, if you use the popular Hibernate package for object-relational mapping, you'll find that Tomcat's JNDI implementation does NOT allow you to bind Hibernate's SessionFactory to JNDI.

If I talk about servlet-engine specific stuff at all, assume I'm talking about Tomcat, because it's the reference servlet engine, and because it's easy for you to get your own copy of Tomcat and follow along, and because I know a little bit about Tomcat.

The Servlet Framework

Since JSP is essentially a set of tools to make it easier to write servlets, I'm going to focus on servlets first, then on JSP. I highly recommend reading at least the Servlet Specification. It's very readable. You can get it here:

http://java.sun.com/products/servlet/reference/api/index.html

First, I'll state the blindingly obvious: the Servlet specification is essentially a standard set of tools for developing web applications.

A servlet engine, when you come right down to it, is a program that starts up, reads a configuration from somewhere (usually an XML file), listens for incoming HTTP requests, and then loads and instantiates a specific servlet class according to the configuration, and invokes a method on the servlet instance, passing the incoming HTTP request as a parameter. The servlet engine will keep the instance in memory and route future requests to the same instance, even simultaneously (if requests come in simultaneously).
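To make that paragraph concrete, here's a toy servlet engine in plain Java. It's a sketch under heavy simplifying assumptions, not how Tomcat is actually built, and ToyServlet, ToyEngine and OrdersServlet are hypothetical names. What it does show is the core trick: a URL-pattern-to-class configuration, one instance per configured servlet created on first use, and every later matching request (even simultaneous ones) routed to that same instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for a servlet: the one method the engine calls per request.
interface ToyServlet {
    String service(String path);
}

// Toy engine: a URL-pattern -> servlet "configuration", ONE instance per
// configured servlet (created on first use), shared by every matching request.
class ToyEngine {
    private final Map<String, Supplier<ToyServlet>> config = new LinkedHashMap<>();
    private final Map<String, ToyServlet> instances = new ConcurrentHashMap<>();

    // Pattern is an exact path like "/login" or a prefix like "/orders/*".
    // Call during setup only, before requests start arriving.
    void map(String pattern, Supplier<ToyServlet> factory) {
        config.put(pattern, factory);
    }

    String handle(String path) {
        for (Map.Entry<String, Supplier<ToyServlet>> entry : config.entrySet()) {
            String pattern = entry.getKey();
            boolean matches = pattern.endsWith("/*")
                    ? path.startsWith(pattern.substring(0, pattern.length() - 1))
                    : path.equals(pattern);
            if (matches) {
                // computeIfAbsent creates the instance at most once, even when
                // the first few requests arrive simultaneously.
                ToyServlet servlet = instances.computeIfAbsent(pattern, p -> entry.getValue().get());
                return servlet.service(path);
            }
        }
        return "404 Not Found";
    }

    int instanceCount() {
        return instances.size();
    }
}

class OrdersServlet implements ToyServlet {
    static final AtomicInteger created = new AtomicInteger();

    OrdersServlet() {
        created.incrementAndGet();
    }

    public String service(String path) {
        return "orders page for " + path;
    }
}

class EngineDemo {
    public static void main(String[] args) throws InterruptedException {
        ToyEngine engine = new ToyEngine();
        engine.map("/orders/*", OrdersServlet::new);

        // Twenty requests, handled concurrently by four "request threads".
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 20; i++) {
            String path = "/orders/" + i;
            pool.execute(() -> engine.handle(path));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        System.out.println(OrdersServlet.created.get()); // 1
    }
}
```

Real engines use more elaborate matching rules (for example, an exact match beats a prefix match, and the longest matching prefix wins), but the one-instance, many-requests shape is the part to remember.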

There are really two or three "big" topics under the umbrella of the servlet spec.

I think the servlet engine concept is where you should start, but almost everything else I've seen on this topic gets bogged down in hyping the concept. I'm going to assume you've gotten about a page into a half-dozen of these articles and are sick of the servlet engine concept. So I'm going to leave it at the single-paragraph description above, for now, and dive right into the classes.

"Getting" the idea of the servlet engine is important, however, so if you're not sure about the general servlet engine concept, skip down to that section, read it, then come back here.

The Servlet Classes

The Servlet Spec defines a number of standard classes and how they're normally expected to interact. This provides a handy framework or set of tools for building a web application. I'm going to list what I think are the most important classes in the servlet API, grouped according to a combination of how I think you'll tend to use them and a hand-wavy "underlying nature" I think the different pieces share. In a sense, each group is a sort of dimension of the servlet spec. When these four dimensions come together, you have a complex and (somewhat) elegant tool for solving a large problem space.

Servlet Stuff

Overuse of Bold Text Alert The first time I wrote this section, I had several key phrases in bold, like "the servlet class does not get instantiated freshly for each HTTP request" and "any servlet must be coded to be either stateless or thread-safe". Then I took another look at it and decided it looked sort of like a used car commercial: "Down here at Crazy Eddie's used car lot, our prices are insane!"

So I took the bolding out and I'll just tell you, there are some really important stumbling blocks that I talk about here, so read this section carefully. I mean, carefully.

The Servlet Stuff is the group of classes most directly involved with speaking HTTP and dealing with URLs and redirects and the like.

People tend to get sloppy with the noun "servlet". Explaining this stuff would be easier if we had specific words for phrases like "servlet class" and "servlet instance". There's also no correct, commonly-used phrase for the most blatant aspect of all: the mapping, defined by a configuration entry, from a URL pattern to a specific instance of a specific class. I think this is a Bad Thing(tm), because it's easier to keep in mind what's going on if you keep that arbitrary URL-to-instance mapping in the forefront. That mapping is the essential nature of a servlet: an entry point into the JVM.

HttpServlet is the superclass for the class that the servlet engine will instantiate, keep in memory, and route HTTP requests to (depending on which URL patterns the instances of your HttpServlet subclasses are configured with).

Note: Strictly speaking, a servlet doesn't have to be HTTP-oriented (just have your servlet descend from javax.servlet.GenericServlet instead of HttpServlet) but almost nobody does that and the general concept of non-HTTP servlets has been greatly de-emphasized since the servlet spec first came out.

An important stumbling block for beginning servlet developers is that the servlet class does not get instantiated freshly for each HTTP request. At first glance, an instance per request seems like the "proper" OO thing to do, if you think of the servlet as modeling the request handler. But the servlet isn't modeling the request handler; it's modeling the entry point into the application. It's up to you, in your servlet code, to figure out what handler classes need to be instantiated and/or invoked.

Instead, the servlet engine will instantiate a single servlet instance per configured servlet, and keep it in memory to answer multiple subsequent requests, sometimes simultaneously. (You can, of course, declare the same servlet class more than once in web.xml under different servlet names; each declaration gets its own single instance.) What this means is that every servlet must be coded to be either stateless or thread-safe. Any user state should generally go on the HttpSession (see the next section) or elsewhere in the application.
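Here's that stumbling block in miniature, as a plain Java sketch. The class names are hypothetical, and an ExecutorService stands in for the engine's request threads. Both "servlets" count hits in an instance field; one uses a plain int, the other a java.util.concurrent.atomic.AtomicInteger.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical servlet-like classes. The engine shares ONE instance of each
// across all request threads, so their instance fields are shared state too.
class UnsafeHitCounter {
    private int hits = 0;

    void handleRequest() {
        hits++; // read-modify-write: two threads can interleave and lose an update
    }

    int count() { return hits; }
}

class SafeHitCounter {
    private final AtomicInteger hits = new AtomicInteger();

    void handleRequest() {
        hits.incrementAndGet(); // atomic, so no updates are lost
    }

    int count() { return hits.get(); }
}

class ThreadSafetyDemo {
    // Stand-in for the engine: many requests hit the same instance concurrently.
    static void simulate(Runnable request, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < requests; i++) {
            pool.execute(request);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        UnsafeHitCounter unsafe = new UnsafeHitCounter();
        SafeHitCounter safe = new SafeHitCounter();
        simulate(unsafe::handleRequest, 100_000);
        simulate(safe::handleRequest, 100_000);
        System.out.println("unsafe: " + unsafe.count()); // often less than 100000
        System.out.println("safe:   " + safe.count());   // 100000
    }
}
```

Run it a few times and the unsafe count will often come up short. Better still, keep per-request data in local variables inside the request-handling method; each thread gets its own copy of those, so there's nothing to synchronize.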

The HttpServletRequest and HttpServletResponse model the HTTP request and response. These objects take care of things like parsing any request parameters and cookies, and formatting any response headers. The servlet engine parses each HTTP request, sets up a pair of HttpServletRequest and HttpServletResponse objects, and passes them to the servlet's service() method; HttpServlet's implementation of service() then hands them on to doGet(), doPost() and so on, according to the HTTP method.
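Here's a toy model, in plain Java, of how a request ends up in doGet() or doPost(): the engine calls service(), and service() picks a method based on the HTTP verb. ToyRequest, ToyResponse, ToyHttpServlet and GreetingServlet are made-up stand-ins, not the real javax.servlet.http classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stripped-down stand-ins for HttpServletRequest/HttpServletResponse.
class ToyRequest {
    final String method;
    private final Map<String, String> params;

    ToyRequest(String method, Map<String, String> params) {
        this.method = method;
        this.params = params;
    }

    String getParameter(String name) {
        return params.get(name);
    }
}

class ToyResponse {
    int status = 200;
    final StringBuilder body = new StringBuilder();
}

// Toy version of HttpServlet's dispatch: service() looks at the HTTP verb
// and calls the matching doXxx() method. The inherited doGet()/doPost()
// refuse with 405, so a subclass only handles the verbs it overrides.
abstract class ToyHttpServlet {
    void service(ToyRequest req, ToyResponse resp) {
        switch (req.method) {
            case "GET":
                doGet(req, resp);
                break;
            case "POST":
                doPost(req, resp);
                break;
            default:
                resp.status = 501; // Not Implemented: verbs this toy doesn't model
        }
    }

    void doGet(ToyRequest req, ToyResponse resp) { resp.status = 405; }

    void doPost(ToyRequest req, ToyResponse resp) { resp.status = 405; }
}

// Overrides only doGet(), so POST requests get the inherited 405.
class GreetingServlet extends ToyHttpServlet {
    @Override
    void doGet(ToyRequest req, ToyResponse resp) {
        String name = req.getParameter("name");
        resp.body.append("Hello, ").append(name == null ? "stranger" : name);
    }
}

class DispatchDemo {
    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("name", "Alice");
        GreetingServlet servlet = new GreetingServlet();

        ToyResponse get = new ToyResponse();
        servlet.service(new ToyRequest("GET", params), get);
        System.out.println(get.status + " " + get.body); // 200 Hello, Alice

        ToyResponse post = new ToyResponse();
        servlet.service(new ToyRequest("POST", params), post);
        System.out.println(post.status); // 405
    }
}
```

The real HttpServlet also has doHead(), doPut(), doDelete(), doOptions() and doTrace(), and if you don't override doPost(), for example, the inherited version sends back an error status, which is what the 405 here imitates.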

Unlike the servlet instance, you get a fresh request and response instance for each browser request. However, you should avoid hanging onto references to the requests and responses past the servlet invocation. The servlet engine produces the request and response instances and may be pooling and reusing them, or otherwise doing odd stuff with them. Also, each request or response holds references to a lot of other stuff that could otherwise be garbage-collected. In general they're meant to be ephemeral objects. For persistence, see the Session Stuff, below.

RequestDispatcher is a sort of oddball class; it's a traffic cop that you use for server-side includes and forwards. An include transfers control of the request and response temporarily to another servlet. When the dispatched-to servlet's doFoo() method returns back to the request dispatcher, the request dispatcher in turn returns back to the servlet (or other class) that called the request dispatcher. A forward hands the response over for good: any output the calling servlet had buffered is cleared, the dispatched-to servlet produces the whole response, and the response is committed and closed before forward() returns. Control does come back to the calling servlet, but it can't send anything more to the browser.
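The include-versus-forward difference is easier to see in a toy model. In this plain Java sketch, a "servlet" is just a Consumer that writes to a buffered response. ToyOut and ToyDispatcher are hypothetical; they only imitate what the servlet spec says happens: an include returns and lets the caller keep writing, while a forward clears the buffered output first and the response is committed and closed before forward() returns.

```java
import java.util.function.Consumer;

// Toy buffered response: output collects in a buffer until it's closed.
// Hypothetical; not the real HttpServletResponse.
class ToyOut {
    private final StringBuilder buffer = new StringBuilder();
    private boolean closed = false;

    void write(String s) {
        if (!closed) {
            buffer.append(s); // writes after close are silently dropped
        }
    }

    void resetBuffer() {
        if (closed) {
            throw new IllegalStateException("Response already committed");
        }
        buffer.setLength(0);
    }

    void close() { closed = true; }

    String sent() { return buffer.toString(); }
}

// Toy dispatcher; a "servlet" here is just a Consumer<ToyOut>.
class ToyDispatcher {
    // include: the target writes into the same response, then control comes
    // back and the calling servlet keeps writing.
    static void include(Consumer<ToyOut> target, ToyOut out) {
        target.accept(out);
    }

    // forward: throw away what the caller buffered, let the target produce the
    // whole response, then commit and close it. Control still returns to the
    // caller, but nothing it writes afterward reaches the browser.
    static void forward(Consumer<ToyOut> target, ToyOut out) {
        out.resetBuffer();
        target.accept(out);
        out.close();
    }
}

class DispatchModesDemo {
    public static void main(String[] args) {
        Consumer<ToyOut> footer = out -> out.write("[footer]");

        ToyOut included = new ToyOut();
        included.write("<page>");
        ToyDispatcher.include(footer, included);
        included.write("</page>");
        System.out.println(included.sent()); // <page>[footer]</page>

        ToyOut forwarded = new ToyOut();
        forwarded.write("<page>");
        ToyDispatcher.forward(footer, forwarded);
        forwarded.write("</page>"); // too late, the response is closed
        System.out.println(forwarded.sent()); // [footer]
    }
}
```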

Session Stuff

The Session Stuff is all about keeping track of semi-persistent data at the server side. Any really persistent stuff should go into a database, but the session stuff makes it convenient to keep objects in memory in between browser requests. They're also the tools you use to share data between different pieces of your web application.

The session is basically a Hashtable of references to objects that you created in earlier servlet invocations and stashed in the session. The servlet engine manages the session for you; making sure a given user has a JSESSIONID cookie, keeping track of the server-side objects associated with that JSESSIONID, and making it easy for you to get a reference to the session via the HttpServletRequest's getSession() method (in fact, that's almost the only way to get a user's session).
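If it helps to see the moving parts, here's a toy model of the bookkeeping the engine does for you: a random JSESSIONID per browser, and a server-side map of attributes per id. ToySessionManager is a hypothetical class for illustration only. In a real servlet you'd just call request.getSession() and then setAttribute()/getAttribute() on the HttpSession you get back; the engine also handles session timeouts, which this sketch ignores.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the engine's session bookkeeping: a random JSESSIONID per
// browser, plus a server-side attribute map per id. Hypothetical class;
// session timeouts, which real engines handle, are ignored here.
class ToySessionManager {
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Given the JSESSIONID cookie value the browser sent (or null), return the
    // id of an existing session, or create a new session with a fresh id.
    String getOrCreateSessionId(String cookieValue) {
        if (cookieValue != null && sessions.containsKey(cookieValue)) {
            return cookieValue;
        }
        String id = Long.toHexString(random.nextLong()) + Long.toHexString(random.nextLong());
        sessions.put(id, new ConcurrentHashMap<>()); // note: no null values allowed
        return id;
    }

    // The "basically a Hashtable" part: the attribute map for one session.
    Map<String, Object> attributes(String sessionId) {
        return sessions.get(sessionId);
    }

    // Roughly what logging out (session.invalidate()) boils down to.
    void invalidate(String sessionId) {
        sessions.remove(sessionId);
    }
}

class SessionDemo {
    public static void main(String[] args) {
        ToySessionManager manager = new ToySessionManager();

        // First request: no cookie yet, so a session is created and the engine
        // would send back a Set-Cookie: JSESSIONID=<id> header.
        String id = manager.getOrCreateSessionId(null);
        manager.attributes(id).put("cart", "3 widgets");

        // A later request: the browser sends the same cookie back.
        String sameId = manager.getOrCreateSessionId(id);
        System.out.println(manager.attributes(sameId).get("cart")); // 3 widgets

        // After logout, the old id no longer finds anything.
        manager.invalidate(id);
        System.out.println(manager.attributes(id)); // null
    }
}
```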

Note: The servlet spec also provides some tools for coping if you need to support users with browsers that don't do cookies (for whatever reason). In a nutshell, when generating each web page, you run every webapp link or URL through the response's encodeURL() method (or encodeRedirectURL() for redirects), which inserts the JSESSIONID into the URL when it's needed.
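Here's a sketch of what that URL rewriting amounts to. The servlet spec says the session id goes into the URL as a path parameter named jsessionid (as in /catalog/index.html;jsessionid=1234). UrlRewriter is a hypothetical helper that always rewrites; the real HttpServletResponse.encodeURL() decides for itself whether rewriting is needed and returns the URL unchanged when it isn't.

```java
// Toy URL rewriting for cookie-less browsers. Hypothetical helper: unlike the
// real encodeURL(), it always rewrites.
class UrlRewriter {
    // Insert ";jsessionid=<id>" as a path parameter, before any query string
    // or fragment.
    static String encodeURL(String url, String sessionId) {
        int cut = url.length();
        int query = url.indexOf('?');
        int fragment = url.indexOf('#');
        if (query >= 0) {
            cut = query;
        }
        if (fragment >= 0 && fragment < cut) {
            cut = fragment;
        }
        return url.substring(0, cut) + ";jsessionid=" + sessionId + url.substring(cut);
    }

    public static void main(String[] args) {
        System.out.println(encodeURL("/shop/cart?item=42", "ABC123"));
        // prints /shop/cart;jsessionid=ABC123?item=42
        System.out.println(encodeURL("/shop/checkout", "ABC123"));
        // prints /shop/checkout;jsessionid=ABC123
    }
}
```

The catch is the "every link" part: any link you forget to encode drops a cookie-less user's session.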

The ServletContext is the webapp-wide version of the session. It's where you stick values that you want to make available to all users of the same web application. I don't find I use it very often, because by the time data's ready to be shared webapp-wide, it's also usually in a database, where there are more powerful tools for finding it and displaying it.

Besides HttpSession and ServletContext, you can also stash stuff (very) temporarily on the HttpServletRequest. This is what you use to pass data around between servlets when using server-side includes or forwards via the RequestDispatcher. The request only lives a short while, for the life of the HTTP request/response cycle, basically.

I also included the request.getRemoteUser() and request.getUserPrincipal() methods in this section, because in a sense, the user's authenticated identity is an aspect of the user session (and an important aspect!).

Filter Stuff

The Filter Stuff provides a useful and general tool for a fairly narrowly scoped problem. A filter is sort of a mutant servlet; like a servlet, it gets configured with a URL pattern that it's responsible for. However, the assumption with a filter is that it's going to intercept a browser request, either on the way in, or on the way out, or both, and do something to it. This is where the HttpServletRequestWrapper and HttpServletResponseWrapper come in. They're convenient classes to extend so you can wrap a given request and/or response in your own customized class that does something useful. (This doesn't mean that you have to be inside a filter to use the wrappers; you can use them like any other class.)

Note: One important difference between a Filter and an HttpServlet is that a Filter is an interface. There's no partial implementation provided the way there is for HttpServlet.

Note: like a servlet, a filter is not something your code should ever directly worry about instantiating or configuring. Instead, you handle that with a configuration entry in the webapp's web.xml.

Filters are sort of odd ducks. They're for a specific need, but when you need 'em, you need 'em (or something like them). The funny thing is, I've never really heard all that many good examples of other uses for filters, besides the two I'm about to describe. Other examples I've seen seem really contrived, like editing out blink tags, or compressing output before sending it back to the browser.

My old martial arts instructor used to use the analogy of a screwdriver versus a pipe-cutter. You can use a screwdriver for a lot of things besides driving screws: prying paint cans open, punching holes in cardboard, etc. A pipe-cutter doesn't do as many things - but when you need to cut pipe, you need a pipe-cutter.

The classic examples for filters are authentication and header-setting. Header setting is fairly easy to understand: make sure that all responses have a no-cache header set on them, for example. Authentication filters usually check to make sure the user's logged in, and if not, redirect them to a login page. There's a J2EE standardized piece of the servlet API to do this (google on J2EE Form Authentication), but it has some annoying design quirks, and a lot of folks just skip right past it and develop a filter, or use the fairly popular package called securityfilter.
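Here are those two classic examples as a toy filter chain in plain Java. The shape mirrors javax.servlet.Filter's doFilter(request, response, chain), but FReq, FResp, ToyFilter, ToyChain, NoCacheFilter and AuthFilter are all made up for this sketch. The key move is that each filter decides whether to call chain.doFilter() at all: the header filter always passes the request along, while the authentication filter stops the chain and redirects when nobody's logged in.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical stand-ins for the request and response a filter sees.
class FReq {
    final String path;
    final String user; // null when nobody is logged in

    FReq(String path, String user) {
        this.path = path;
        this.user = user;
    }
}

class FResp {
    final Map<String, String> headers = new HashMap<>();
    final StringBuilder body = new StringBuilder();
    String redirect; // set when a filter sends the browser elsewhere
}

// Same shape as javax.servlet.Filter.doFilter(request, response, chain).
interface ToyFilter {
    void doFilter(FReq req, FResp resp, ToyChain chain);
}

// Calls each filter in order, then the servlet at the end.
// One chain object per request, since it tracks its own position.
class ToyChain {
    private final List<ToyFilter> filters;
    private final BiConsumer<FReq, FResp> servlet;
    private int next = 0;

    ToyChain(List<ToyFilter> filters, BiConsumer<FReq, FResp> servlet) {
        this.filters = filters;
        this.servlet = servlet;
    }

    void doFilter(FReq req, FResp resp) {
        if (next < filters.size()) {
            filters.get(next++).doFilter(req, resp, this);
        } else {
            servlet.accept(req, resp);
        }
    }
}

// Header setting: do the work, then always pass the request along.
class NoCacheFilter implements ToyFilter {
    public void doFilter(FReq req, FResp resp, ToyChain chain) {
        resp.headers.put("Cache-Control", "no-cache");
        chain.doFilter(req, resp);
    }
}

// Authentication: if nobody's logged in, redirect and DON'T continue the
// chain, so the servlet never runs.
class AuthFilter implements ToyFilter {
    public void doFilter(FReq req, FResp resp, ToyChain chain) {
        if (req.user == null && !req.path.equals("/login")) {
            resp.redirect = "/login";
            return;
        }
        chain.doFilter(req, resp);
    }
}

class FilterDemo {
    static FResp run(FReq req) {
        List<ToyFilter> filters = Arrays.asList(new NoCacheFilter(), new AuthFilter());
        FResp resp = new FResp();
        new ToyChain(filters, (r, s) -> s.body.append("secret page for ").append(r.user))
                .doFilter(req, resp);
        return resp;
    }

    public static void main(String[] args) {
        FResp anonymous = run(new FReq("/orders", null));
        System.out.println(anonymous.redirect);   // /login
        FResp bob = run(new FReq("/orders", "bob"));
        System.out.println(bob.body);             // secret page for bob
        System.out.println(bob.headers);          // {Cache-Control=no-cache}
    }
}
```

In a real webapp the engine builds the chain from the filter mappings in web.xml; your code never constructs it.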

Listener Stuff

The Listeners are similar to filters and servlets:

Listeners get invoked when, for example, a servlet context is created or destroyed, a session is created or destroyed, or an attribute is set on a session or a context.

I'd like to give you some good examples of uses for listeners here, but frankly I've never found a really good use for them. One use that came to mind was tracking when a user session is created and destroyed, but that didn't seem important enough to use a listener for. Then again, maybe it doesn't really have to be that important to justify it. I may be unconsciously equating Listeners with the Servlet level of granularity, when it really wouldn't be a big deal at all to use a listener for that sort of thing.
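For what it's worth, here's that session-tracking use as a toy model. The interface is shaped like javax.servlet.http.HttpSessionListener (sessionCreated() and sessionDestroyed()), but it takes a plain session id string instead of an HttpSessionEvent, and ToySessionEvents stands in for the engine firing the callbacks. All the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Shaped like javax.servlet.http.HttpSessionListener, but takes a plain
// session id instead of an HttpSessionEvent. Hypothetical.
interface ToySessionListener {
    void sessionCreated(String sessionId);
    void sessionDestroyed(String sessionId);
}

// Stands in for the engine: it fires the callbacks; your code never does.
class ToySessionEvents {
    private final List<ToySessionListener> listeners = new ArrayList<>();

    // In a real webapp, listeners are declared in web.xml, not added by your code.
    void addListener(ToySessionListener listener) {
        listeners.add(listener);
    }

    void sessionStarted(String id) {
        for (ToySessionListener l : listeners) l.sessionCreated(id);
    }

    void sessionEnded(String id) {
        for (ToySessionListener l : listeners) l.sessionDestroyed(id);
    }
}

// The session-tracking use: keep a count of currently active sessions.
class ActiveSessionCounter implements ToySessionListener {
    private final AtomicInteger active = new AtomicInteger();

    public void sessionCreated(String sessionId) { active.incrementAndGet(); }

    public void sessionDestroyed(String sessionId) { active.decrementAndGet(); }

    int activeCount() { return active.get(); }
}

class ListenerDemo {
    public static void main(String[] args) {
        ToySessionEvents engine = new ToySessionEvents();
        ActiveSessionCounter counter = new ActiveSessionCounter();
        engine.addListener(counter);

        engine.sessionStarted("s1");
        engine.sessionStarted("s2");
        engine.sessionEnded("s1"); // logout or timeout
        System.out.println(counter.activeCount()); // 1
    }
}
```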

The Servlet Engine Concept

It's easiest to define a servlet by starting with a servlet engine, which is essentially a web server designed to run applications on the server, instead of designed to serve static data. I'm also going to start by describing the engine in full-on production state, and only later go back to how it got into that state.

The servlet engine listens for incoming HTTP requests on a port: maybe directly on port 80 if it's acting as the sole webserver on that machine, or customarily 8080 if there's a more normal web server already on port 80. It's typical for people to run the Apache web server on port 80, and use an Apache module called mod_jk to connect it to the Tomcat server for requests to URLs defined in the mod_jk config.

Note: Since I wrote the above, I'm told the next, shiny new version of Apache (version 2.1, I believe), which will be out of beta any day now, deprecates mod_jk and instead has the special servlet proxying code built into mod_proxy (as mod_proxy_ajp, which speaks Tomcat's AJP protocol). Before Apache 2.1, mod_proxy was the boring, slow, kludgy way of doing it, since it proxied requests just like any HTTP proxy. That meant you paid the CPU, memory and latency for parsing the text HTTP request into a data structure, formatting it as a text HTTP proxy request to Tomcat, and then having Tomcat parse it into a data structure all over again. As of 2.1, however, mod_proxy has support for doing the smart thing when proxying to a servlet engine.

The servlet engine has, somewhere (we'll get to that in a bit), a set of configurations for which requests get handled by which servlet instances. When an HTTP request comes in, the engine checks the URL against the configuration and hands it off to the right servlet instance. Which particular servlet method it passes the request to depends on the HTTP request's verb (most of the time it's GET or POST; unlike a vanilla CGI script, a servlet won't handle both unless the developer codes it to, for example by overriding both doGet() and doPost()).

Before passing the HTTP request into the servlet method, the engine parses it into a javax.servlet.http.HttpServletRequest object, and also sets up a javax.servlet.http.HttpServletResponse object for the response.

The servlet class can be stateful but must be thread-safe, because multiple HTTP requests may be passed to the same servlet object at the same time. Any request-specific or user-specific state has to be kept elsewhere, usually in either the request object or in the user's HttpSession.

The actual object instances that handle the requests are instantiated only when the first request comes in for a URL that maps to that class (unless the configuration tells the engine to load the servlet at startup). The engine keeps the instance in memory afterward. Theoretically the engine has the option of unloading the instance at any point for resource management, though that doesn't really seem to happen often. The engine calls the servlet's init() method after instantiating it and its destroy() method before unloading it, which gives the servlet a chance to do housekeeping tasks like setting up resources, saving state, closing any open resources, etc.
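Here's that lifecycle as a toy model in plain Java: nothing is created until the first request, init() runs exactly once before any request is serviced, and destroy() runs when the engine unloads the servlet. The hook names match the real javax.servlet.Servlet methods, but LifecycleServlet, LazyServletHolder and AuditServlet are hypothetical classes for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical servlet shape using the real lifecycle hook names:
// init() once, service() per request, destroy() at the end.
interface LifecycleServlet {
    void init();
    String service(String path);
    void destroy();
}

// Toy holder for ONE configured servlet: created lazily on the first
// request, initialized exactly once, destroyed when the engine unloads it.
class LazyServletHolder {
    private final Supplier<LifecycleServlet> factory;
    private LifecycleServlet instance; // null until the first request

    LazyServletHolder(Supplier<LifecycleServlet> factory) {
        this.factory = factory;
    }

    // Synchronized so two simultaneous "first" requests can't create two
    // instances; init() completes before any request is serviced.
    private synchronized LifecycleServlet getInstance() {
        if (instance == null) {
            LifecycleServlet servlet = factory.get();
            servlet.init();
            instance = servlet;
        }
        return instance;
    }

    // service() runs outside the lock, so requests still run concurrently.
    String handle(String path) {
        return getInstance().service(path);
    }

    // Engine shutdown or unload: the servlet's chance to clean up.
    synchronized void unload() {
        if (instance != null) {
            instance.destroy();
            instance = null;
        }
    }
}

// Records its lifecycle events (single-threaded demo, so a plain list is fine).
class AuditServlet implements LifecycleServlet {
    private final List<String> log;

    AuditServlet(List<String> log) {
        this.log = log;
    }

    public void init() { log.add("init"); }

    public String service(String path) {
        log.add("service " + path);
        return "ok";
    }

    public void destroy() { log.add("destroy"); }
}

class LifecycleDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        LazyServletHolder holder = new LazyServletHolder(() -> new AuditServlet(log));
        System.out.println(log); // []: nothing exists before the first request
        holder.handle("/report/1");
        holder.handle("/report/2");
        holder.unload();
        System.out.println(log); // [init, service /report/1, service /report/2, destroy]
    }
}
```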

Web Applications and Classloaders and Classpaths, Oh My

You're getting near the end of the stuff I've written so far. So much for ambition. Pride goeth before a fall. Look upon my works, ye mighty, and despair. Into every life, a little mud must slide.

Here, I'm going to try to talk about:

Above I mentioned the special and specific meaning of the phrase web application in the Java world. This was first set out in version 2.2 of the Servlet Spec, essentially as a way of drawing a line around all the stuff that makes up a given application (all the related elements and pieces) and calling it a web application. The phrase "web application" is usually used in two senses, either in reference to the running application in the servlet engine, or in reference to the collection of directories-and-files that make up the application. The running application sense is useful when you're talking about variable scope and classloaders, and is often called the Servlet Context. The directories-and-files sense is useful when you're talking about putting together the pieces that the servlet engine needs to have in order to create the running application sense.

The directories-and-files sense is about packaging up the various pieces of a set of related web pages: binary assets, JSPs, servlets, class files, resources and configuration. These are usually copied somewhere under the servlet engine's directory structure. Strictly speaking, they don't have to be; the engine could do whatever it wanted with them, even stick them all in a database as BLOBs. But in practice, pretty much every servlet engine I know about just stores them as directories and files.

The spec also introduced the WAR file (Web Application Archive), which is more or less a web application packed into a JAR file (itself just a ZIP file), along with a manifest and other meta files under META-INF. This makes a web application even easier to ship around, deploy and undeploy.
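Since a WAR is just a JAR with a particular internal layout, the standard java.util.jar classes can build and read one. This is a minimal sketch; the class name WarIsAJar and the file contents are made up for illustration, and a real WAR would normally come out of your build tool or the jar command rather than hand-written code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class WarIsAJar {

    /** Writes a tiny WAR: an ordinary JAR whose entries follow the webapp layout. */
    static void writeWar(Path war) throws IOException {
        Manifest manifest = new Manifest();
        // Manifest-Version is the one attribute every manifest is expected to carry.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // This constructor writes META-INF/MANIFEST.MF as the first entry.
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(war), manifest)) {
            addEntry(out, "index.html", "<html><body>Hello</body></html>");
            addEntry(out, "WEB-INF/web.xml", "<web-app></web-app>");
        }
    }

    private static void addEntry(JarOutputStream out, String name, String content) throws IOException {
        out.putNextEntry(new JarEntry(name));
        out.write(content.getBytes(StandardCharsets.UTF_8));
        out.closeEntry();
    }

    /** Reads the entry names back with the standard JAR API. */
    static List<String> listEntries(Path war) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(war.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path war = Files.createTempFile("myapp", ".war");
        writeWar(war);
        System.out.println(listEntries(war));
        Files.delete(war);
    }
}
```

Any ZIP tool can open the resulting file too, which is a handy way to check what actually ended up inside a WAR before deploying it.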

I haven't personally worked with WAR files much, so I can't say a lot about them, except to observe that, strictly speaking, the spec doesn't say how the engine has to handle the WAR file. In practice, just about every servlet engine expands the WAR file into a set of files and directories under the servlet engine's hierarchy. You still shouldn't edit files in that expanded directory, though, because the next deployment will overwrite them.

One implication of this is that the servlet spec doesn't really define anything about configuration persistence. There are some places to put configuration information (inside web.xml, for example), but if your app needs to save config data somewhere it will persist across engine restarts and the occasional WAR redeployment, you're going to have to go outside the spec to figure out where to store it. If you store it in a data file inside the webapp, any changes will be lost when you redeploy. So far, the only answers I've come up with are to set up some sort of JNDI server to store your config data statically, or to put it in a database. That still leaves the database pool's name to be declared in web.xml. I may well be wrong, but so far I consider this one of the gotchas of the servlet spec.
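For the database route, the webapp-side code is fairly small. Here's a sketch, assuming a hypothetical JNDI name jdbc/AppDB (declared in web.xml as a resource reference and bound to a real connection pool in the engine's own configuration) and a hypothetical app_config table with setting_name and setting_value columns. Inside a servlet engine, java:comp/env is where the engine exposes the webapp's resources; outside one there's no JNDI provider, so the lookup just throws a NamingException.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class DbConfig {

    // Hypothetical name: must match the resource reference declared in web.xml,
    // which the engine binds to a real connection pool in its own configuration.
    static final String DATASOURCE_NAME = "java:comp/env/jdbc/AppDB";

    /** Asks the engine's JNDI service for the pooled DataSource. */
    static DataSource lookupDataSource() throws NamingException {
        Context ctx = new InitialContext();
        return (DataSource) ctx.lookup(DATASOURCE_NAME);
    }

    /**
     * Reads one setting from a hypothetical app_config(setting_name, setting_value) table.
     * The value lives in the database, so it survives engine restarts and WAR redeploys.
     */
    static String readSetting(DataSource ds, String name) throws SQLException {
        String sql = "SELECT setting_value FROM app_config WHERE setting_name = ?";
        try (Connection conn = ds.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null; // null if the setting isn't there
            }
        }
    }
}
```

Note that the only thing this needs from web.xml is the resource reference name; the pool itself (driver, URL, credentials) is configured on the engine side.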

A servlet engine usually runs all web applications inside the same JVM, and therefore inside the same OS-level process. However, the servlet spec requires the servlet engine to keep the webapps fairly isolated from each other. The biggest part of this is the classloader. If you don't know much about classloaders in java, I highly recommend you go and read this really useful white paper:

http://www.neward.net/ted/Papers/ClassForName/index.html

Each webapp has its own classloader. This means, for example, that if you have a singleton pattern in a class under one webapp, the code in a second webapp can't get at it; even if the second webapp contains the same class, it gets its own separate copy with its own instance. Note that although singletons are incredibly handy for various specialized purposes, strictly speaking you shouldn't use them in webapps. Why not? Because the Servlet Spec leaves the servlet engine free to pull all sorts of shenanigans with classloading, session distribution and clustering. You can't count on a singleton really being a singleton, though in many cases it will be.
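To see why one webapp's singleton is invisible to another, here's a sketch that imitates two webapp classloaders. ChildFirstLoader is a hypothetical, stripped-down loader: it defines one class (Registry) itself from that class's bytes instead of asking its parent, roughly the way the spec recommends a webapp classloader prefer classes packaged in the WAR, and delegates everything else (java.lang.Object and friends) to the parent. Each loader ends up with a distinct Registry class that has the same name but its own static state. The sketch assumes the compiled class files are readable as classpath resources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Method;

public class WebappIsolationDemo {

    /** A "singleton"-style class whose state lives in a static field. */
    public static class Registry {
        private static int count = 0;

        public static int increment() {
            return ++count;
        }
    }

    /** Hypothetical child-first loader: defines ONE class itself, delegates the rest. */
    static class ChildFirstLoader extends ClassLoader {
        private final String isolatedName;
        private final byte[] classBytes;

        ChildFirstLoader(ClassLoader parent, String isolatedName, byte[] classBytes) {
            super(parent);
            this.isolatedName = isolatedName;
            this.classBytes = classBytes;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve); // e.g. java.lang.Object: ask the parent
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    c = defineClass(name, classBytes, 0, classBytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Reads a class's own .class file bytes from the classpath. */
    static byte[] classFileBytes(Class<?> cls) throws IOException {
        String resource = cls.getName().replace('.', '/') + ".class";
        try (InputStream in = cls.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            return in.readAllBytes();
        }
    }

    /** Loads a fresh copy of Registry, as if from a separate webapp's WEB-INF/classes. */
    static Class<?> loadCopyForNewWebapp() throws Exception {
        String name = Registry.class.getName();
        ClassLoader webappLoader = new ChildFirstLoader(
                WebappIsolationDemo.class.getClassLoader(), name, classFileBytes(Registry.class));
        return webappLoader.loadClass(name);
    }

    /** Calls the static increment() on whichever copy of Registry we're handed. */
    static int increment(Class<?> registryCopy) throws Exception {
        Method m = registryCopy.getMethod("increment");
        return (Integer) m.invoke(null);
    }

    public static void main(String[] args) throws Exception {
        Class<?> webappA = loadCopyForNewWebapp();
        Class<?> webappB = loadCopyForNewWebapp();
        System.out.println(webappA == webappB); // false: same name, different classes
        System.out.println(increment(webappA)); // 1
        System.out.println(increment(webappA)); // 2
        System.out.println(increment(webappB)); // 1: B's static count is separate
    }
}
```

The same delegation rule explains the shared engine-wide JARs: anything the webapp loader doesn't define itself comes from a parent loader, and that copy is shared by every webapp.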

The webapp file hierarchy pretty much consists of the following:

appname/
appname/WEB-INF/
appname/WEB-INF/web.xml
appname/WEB-INF/lib/
appname/WEB-INF/classes/

Any files under appname will be served by the servlet engine as if they were under a normal web server's htdocs directory. Files under the WEB-INF subdirectory are handled specially. For a start, WEB-INF and everything under it will not be served directly to clients by the servlet engine. The web.xml file contains the webapp-specific configuration details. The WEB-INF/lib subdirectory is where you store any JAR files your application needs. The WEB-INF/classes subdirectory is where you store any non-JARred class files for your application.
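The engine enforces the WEB-INF rule when it maps a request URL onto a file. Here's a simplified sketch of that check; isDirectlyServable is a hypothetical helper, not part of any engine's API. It normalizes the path first, because a naive startsWith("/WEB-INF") test can be sidestepped with something like /./WEB-INF/web.xml or //WEB-INF/web.xml. Treating the name case-insensitively and rejecting every ".." segment are conservative assumptions on my part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WebInfGuard {

    /**
     * Simplified sketch: may this request path (relative to the webapp root,
     * e.g. "/images/logo.png") be served straight from the webapp directory?
     */
    static boolean isDirectlyServable(String requestPath) {
        List<String> segments = new ArrayList<>();
        for (String segment : requestPath.replace('\\', '/').split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue; // "//" and "/./" add nothing
            }
            if (segment.equals("..")) {
                return false; // conservative: refuse any attempt to climb around the tree
            }
            segments.add(segment);
        }
        if (segments.isEmpty()) {
            return true; // the webapp root itself, e.g. "/"
        }
        // Case-insensitive on purpose: on some file systems "/web-inf/" hits the same directory.
        return !segments.get(0).toUpperCase(Locale.ROOT).equals("WEB-INF");
    }

    public static void main(String[] args) {
        String[] paths = {"/index.html", "/images/logo.png", "/WEB-INF/web.xml",
                          "//WEB-INF/lib/app.jar", "/./web-inf/web.xml", "/images/../WEB-INF/web.xml"};
        for (String p : paths) {
            System.out.println(p + " -> " + (isDirectlyServable(p) ? "serve" : "404"));
        }
    }
}
```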

Some JAR files don't get stored under WEB-INF/lib, most notably any JAR file that needs to be shared across the whole servlet engine. A JDBC driver JAR is one example. Because JDBC connection pooling is usually implemented at the servlet engine level, with connections then provided to each webapp via JNDI, the JDBC driver JAR has to be shared across the entire servlet engine. Engine-wide JAR files go wherever that particular servlet engine dictates (in Tomcat it's tomcat/common/lib; Tomcat 6 and later use tomcat/lib).

While web.xml is where the webapp-specific configuration details go, some configuration details, just as with JDBC drivers, really are specific to the servlet engine. Where these go and how they're configured varies from engine to engine. With Tomcat, for example, there's an element that defines the details for each webapp context, and it's called, appropriately enough, <Context>.

Best Practices

If I ever get this far, I'm going to use this section to discuss and document some (ugh) "Best Practices" for webapps. I really don't like that phrase, but I can't think of a more appropriate phrase for "the basically sane/good way to do it that has emerged through common use and consensus."

For now, I'm just listing these concepts as I remember them, in the (perhaps vain) hope that this simple list will be enough to remind me of what I was thinking of. Maybe I'll also try to keep track of Failed Memes, like sandboxing and dynamic class reloading, things that were heralded as soon-to-be Best Practices but never really worked out that way in practice.

