Architecture of the World Wide Web
The success of the Web owes something to luck and timing, but some of the credit
belongs to its architecture. That architecture rests on a few fundamental principles
that have taken the Web from its small beginnings to the large mass of information and
functionality that exists today. These principles include:
• Addressable resources
• Standard resource formats
• A uniform interface for interacting with those resources
• Statelessness in the interaction between clients and services
• Hyperlinking to enable navigation between resources
Everything on the Web is addressable. Uniform Resource Identifiers (URIs) are used
to define the locations of particular resources. Resources can be things like HTML
documents, images, or other media types. Addressability is one of the important parts
of the Web’s success.
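As a small illustration of addressability, the sketch below uses Python's standard
urllib.parse module to split a URI into the parts a user agent works with. The URI
itself is a made-up example, not one taken from this chapter.

```python
from urllib.parse import urlsplit

# Hypothetical URI, used only for illustration.
uri = "http://example.com/photos/2009/sunset.jpg?size=large"

parts = urlsplit(uri)

print(parts.scheme)  # protocol used to reach the resource
print(parts.netloc)  # host that serves the resource
print(parts.path)    # path identifying the resource on that host
print(parts.query)   # optional parameters sent with the request
```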
The uniform interface of the Web, which is based on HTTP (Hypertext Transfer Protocol),
also contributes to the Web’s openness and interoperability. HTTP is an open and
well-known protocol that defines a standard way for user agents to interact with both
resources and the servers that produce those resources. These interactions are based on
the verbs (or methods) that accompany each HTTP request.
GET is probably the most commonly used and well-known verb, and its name is descriptive
of its effect. A GET for a particular URI returns a copy of the resource that URI
represents. One of the most important features of GET requests is that the result of a
GET can be cached. Caching GET requests also contributes to the scalability of the Web.
Another feature of GET requests is that they are considered safe, because according to
the HTTP specification, GET should not cause any side effects; that is, GET should never
cause a change to a resource. Certainly, a resource might change between two GET
requests, but that should be an independent action on the part of the service.
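To make the link between safety and caching concrete, here is a minimal sketch in which
a client keeps GET results keyed by URI. The OriginServer and CachingClient classes and
the /movies/42 resource are hypothetical, and the sketch ignores expiry and validation,
which real HTTP caches must handle.

```python
class OriginServer:
    """Stands in for a real HTTP server and counts how often it is asked."""

    def __init__(self):
        self.resources = {"/movies/42": "Casablanca"}
        self.hits = 0

    def handle(self, method, uri):
        self.hits += 1
        if method == "GET":
            # Safe: reads the resource and changes nothing on the server.
            return self.resources.get(uri)
        raise ValueError(f"method not supported in this sketch: {method}")


class CachingClient:
    """Caches GET responses by URI so repeat requests skip the server."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def get(self, uri):
        if uri not in self.cache:
            self.cache[uri] = self.server.handle("GET", uri)
        return self.cache[uri]


server = OriginServer()
client = CachingClient(server)

first = client.get("/movies/42")
second = client.get("/movies/42")  # answered from the cache
print(first, second, server.hits)
```

Because a GET changes nothing, answering the second request from the cache is
indistinguishable, from the client's point of view, from asking the server again.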
POST, which indicates a request to create a new resource, is probably the next most
commonly used verb, and there are a whole host of others that we will examine later
in this chapter and throughout this book.
HTTP and the Web were designed to be stateless. A stateless service is one that can
process an incoming request based solely on the request itself. The concept of per-client
state on the server isn’t part of the design of HTTP or the Web.
If a request from a particular user agent contains all of the state necessary to retrieve
(or create) a resource, that request can be handled by any server in a farm of servers,
thus creating a scalable, robust environment.
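The following sketch shows one way to picture this. The client sends its paging position
with every request, so either member of a two-server farm can answer any request. The
MOVIES data, the request fields, and the Server class are assumptions made for
illustration.

```python
from itertools import cycle

# Shared data that every server can read (hypothetical contents).
MOVIES = {"98052": ["Up", "Star Trek"], "10001": ["Moon"]}


def handle(request):
    """Answer a request using only what the request itself carries."""
    titles = MOVIES.get(request["zip"], [])
    start = request["page"] * request["page_size"]
    return titles[start:start + request["page_size"]]


class Server:
    """A farm member that keeps no per-client state between requests."""

    def __init__(self, name):
        self.name = name

    def process(self, request):
        return handle(request)


# Round-robin over the farm: consecutive requests land on different servers.
farm = cycle([Server("a"), Server("b")])

page0 = next(farm).process({"zip": "98052", "page_size": 1, "page": 0})
page1 = next(farm).process({"zip": "98052", "page_size": 1, "page": 1})
print(page0, page1)
```

Had the server remembered "this client is on page 0", the second request would have had
to reach the same server, which is exactly the coupling statelessness avoids.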
Statelessness also improves visibility into web applications. If a request contains
everything needed for the server to make a proper reply, the request also contains all
the data needed to track and report on that request. There is no need to go to some
data source with some key and try to recreate the data that was used as part of a request
in order to determine what went right, or what went wrong (this wouldn’t be ideal
anyway, since that data may have changed in the meantime). Statelessness increases a
web application’s manageability because the entire state of each request is contained
in the request itself.
Hyperlinking between resources is also an important part of the Web’s success. The
fact that one resource can link to another, enabling the user agent (often through its
human driver, but sometimes not) to navigate between related resources, makes the
Web interconnected in a very significant way.
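A rough sketch of link-driven navigation follows. The RESOURCES table stands in for a
set of linked documents, and get stands in for an HTTP GET; both are hypothetical. The
user agent needs only a starting URI and discovers everything else by following links.

```python
# Hypothetical resources; each one lists the URIs it links to.
RESOURCES = {
    "/": {"title": "Home", "links": ["/movies", "/photos"]},
    "/movies": {"title": "Movies", "links": ["/movies/42", "/"]},
    "/movies/42": {"title": "Casablanca", "links": ["/movies"]},
    "/photos": {"title": "Photos", "links": []},
}


def get(uri):
    """Stand-in for an HTTP GET against the URI."""
    return RESOURCES[uri]


def crawl(start):
    """Visit every resource reachable from start by following links."""
    seen = []
    queue = [start]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue  # already visited; links may form cycles
        seen.append(uri)
        queue.extend(get(uri)["links"])
    return seen


visited = crawl("/")
print(visited)
```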
The Web is the world’s largest, most scalable, and most interoperable distributed application.
The success of the Web and the scalability of its architecture have led many
people to want to build applications or services on top of it.
SOAP
Many individuals and organizations have tried to build on the success and scalability
of the Web by describing architectures and creating toolkits for building services.
Services are endpoints that can be consumed programmatically rather than by a person
sitting at a computer driving an application like a web browser. The two main approaches
used in these attempts have been either the SOAP protocol or the architectural
style of REST.
While a chapter on the subtle differences between approaches such as
REST and POX (Plain Old XML over HTTP) might make for an interesting
read, this chapter is more specifically focused on the architectural
differences between REST and its main competitor, SOAP.
SOAP, which at one point in its history stood for Simple Object Access Protocol (before
its acronym status was revoked in the 1.2 version of its specification), is what many
developers think of when they hear the term web service. SOAP was born out of a
coordinated attempt by many large vendors to create a standard around a programmatic
Web.
In many ways, SOAP doesn’t follow the architecture of the Web at all. Although there
are bindings for using SOAP over HTTP, many aspects of SOAP are at odds with the
architecture of the Web.
Rather than focusing on URIs (which is the way of the Web), SOAP focuses on actions,
which are essentially a thin veneer over a method call (although of course a
SOAP client can’t assume a one-to-one relation between an action and a method call).
In this and many other ways, SOAP is an interoperable cross-platform remote procedure
call (RPC) system. SOAP-based services almost always have only one URI and
many different actions. In some ways, actions are like the HTTP uniform interface,
except that every single SOAP service creates new actions; this is about as un-uniform
and variable as you can get.
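The sketch below illustrates the "one URI, many actions" shape. The HttpRequest class,
the MovieService endpoint, and the GetMovie and DeleteMovie actions are all invented for
illustration, and the SOAPAction value is simplified (real services typically use a
quoted URI).

```python
from dataclasses import dataclass, field


@dataclass
class HttpRequest:
    method: str
    uri: str
    headers: dict = field(default_factory=dict)
    body: str = ""


def soap_request(action, body_xml):
    """Build a SOAP-over-HTTP request for a hypothetical movie service."""
    envelope = (
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        f"<soap:Body>{body_xml}</soap:Body>"
        "</soap:Envelope>"
    )
    return HttpRequest(
        method="POST",                          # same method for every action
        uri="http://example.com/MovieService",  # same URI for every action
        headers={"SOAPAction": action},         # the action is what varies
        body=envelope,
    )


read = soap_request("GetMovie", "<GetMovie><id>42</id></GetMovie>")
delete = soap_request("DeleteMovie", "<DeleteMovie><id>42</id></DeleteMovie>")

# Judged by method and URI alone, the two requests look identical.
same_on_the_wire = (read.method, read.uri) == (delete.method, delete.uri)
print(same_on_the_wire)
```

Anything that inspects only the HTTP method and URI cannot tell the read-only call from
the destructive one; the difference lives in a service-specific action name.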
When used over HTTP, SOAP limits itself to one part of the Web’s uniform interface:
POST. This creates a limitation because results, even those that are read-only, can’t be
safely cached: the caching infrastructure of the Web is built around responses to GET
requests, and responses to POST requests are rarely, if ever, cached by intermediaries.
In many SOAP services, most actions should really use GET as the verb because they
simply return read-only data. To be honest, you can’t really call a SOAP-based service a
web service since SOAP intentionally ignores much of the architecture of the Web. The term
“SOAP service” is probably a more accurate description.
When confronted with the fact that SOAP doesn’t follow the architecture of the Web,
SOAP proponents will often point out that SOAP was designed to be used over many
different protocols, not just HTTP. Because it is meant to be generic and used over
many different protocols, SOAP can’t take advantage of many of the Web’s features
since many of those features are particular to HTTP.
REST
REST is an architectural style for building services. This style is based on the architecture
of the Web, a fact that creates a fairly sharp contrast between REST and SOAP.
While SOAP goes out of its way to make itself protocol-independent, REST embraces
the Web and HTTP. Although it’s certainly possible to use some or all of the principles
of REST over other protocols, many of its benefits are greatest when used over HTTP.
Another significant contrast is that SOAP isn’t an architectural style at all. SOAP is a
specification that sets out the technical details on how two endpoints can interact in
terms of the message representation, and it doesn’t offer any architectural guidance or
constraints. In contrast, REST services are built to follow the specific constraints of the
REST architectural style.
Services that follow this style are known as RESTful. Note that these
architectural constraints are more what you’d call “guidelines” than actual
rules. Some services will use all of these constraints, and some will
use only some of the constraints.
In their book RESTful Web Services (O’Reilly), Leonard Richardson and
Sam Ruby lay out something they call the Resource Oriented Architecture
(ROA), which is a stricter set of rules for determining whether a
service is really RESTful.
While SOAP services are based on a service-specific set of actions and a single URI,
RESTful services model the interaction with user agents based on resources. Each resource
is represented by a unique URI, and the user agent uses the uniform interface
of HTTP to interact with a resource via that URI. Put another way, REST services are
more concerned with nouns (resources) than verbs (HTTP methods or SOAP actions), since
the design of a service is about the URIs rather than a custom interface.
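As a rough sketch of what "nouns over verbs" can look like, the hypothetical
ResourceStore below applies the same few HTTP methods to whatever URI it is given. PUT
and DELETE are standard HTTP methods that this chapter has not yet introduced, and the
status codes are the usual HTTP ones; the class itself is an illustration, not a
prescribed design.

```python
class ResourceStore:
    """Applies the same small set of HTTP methods to any resource URI."""

    def __init__(self):
        self.resources = {}

    def handle(self, method, uri, body=None):
        if method == "GET":
            if uri in self.resources:
                return 200, self.resources[uri]
            return 404, None
        if method == "PUT":
            created = uri not in self.resources
            self.resources[uri] = body
            return (201 if created else 200), body
        if method == "DELETE":
            if uri not in self.resources:
                return 404, None
            del self.resources[uri]
            return 204, None
        return 405, None  # method not allowed in this sketch


store = ResourceStore()
put_result = store.handle("PUT", "/movies/42", "Casablanca")
get_result = store.handle("GET", "/movies/42")
delete_result = store.handle("DELETE", "/movies/42")
gone_result = store.handle("GET", "/movies/42")
print(put_result, get_result, delete_result, gone_result)
```

Adding a new kind of resource means adding new URIs, not new verbs; the interface stays
the same for every resource.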
Resources and URIs
The first thing to do when designing a RESTful service is to determine which resources
you are going to expose. A resource is any information that you want to make available
to others. Some resources are static, like pictures taken on a particular day in the
past, and some resources are dynamic, like the movies playing in or near a particular
zip code. Because many resources are dynamic in nature, having an addressable set of
resources for your service doesn’t mean that you know all the particular resource instances
when you sit down to design your service. A resource is a conceptual mapping
to a particular entity or entity set that you want your service to be able to work with.
When designing a RESTful service, you will identify the resources that your service will
expose and use. Once you’ve identified the resources, you’ll map them to URIs.
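One possible mapping for the two example resources mentioned above is sketched here. The
host name and path layout are assumptions chosen for illustration; the chapter does not
prescribe them.

```python
from urllib.parse import quote

# Hypothetical base address and URI layout.
BASE = "http://example.com"


def photos_taken_on(year, month, day):
    """URI for a static resource: photos taken on one past date."""
    return f"{BASE}/photos/{year:04d}/{month:02d}/{day:02d}"


def movies_near(zip_code):
    """URI for a dynamic resource: movies playing near a zip code."""
    return f"{BASE}/movies/playing/{quote(zip_code)}"


photo_uri = photos_taken_on(2009, 7, 4)
movie_uri = movies_near("98052")
print(photo_uri)
print(movie_uri)
```

The URI for the movie listing stays fixed even though the list it returns changes from
day to day, which is why the individual resource instances need not be known at design
time.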