Chapter 17

Plan for Hosting

Summary

A website needs to be deployed in a public environment to function correctly. You need to determine the technical, organizational, and financial parameters of this hosting.

Does this Apply?

To be honest, this probably doesn’t apply to you because only a minority of people reading this book will have to deal with any of this. If you have an IT staff that manages all your infrastructure, then they’ll manage most of this. An external development team that’s building your website will often also manage it, so they might handle all this as well.

However, even if this chapter doesn’t apply, you might find it interesting as it covers some of the underlying technologies and architecture of how the internet works. You will be affected by hosting at some level, so you may as well know how it works.

Also, this might apply to your budget. Even if you don’t have to deal with hosting at a tactical level, your organization might want to take it out of your budget, so it might be worth a read.

Narrative

Your website has to be somewhere in order to be available to the world.

At the risk of being pedantic, the files that comprise your website have to be copied to a computer somewhere, in a format the computer can understand, and that computer needs to be hooked to the internet and configured in such a way so when a visitor types your domain name into their browser, the mighty power of internet routing directs them to your website on that computer in such a way that it can match up their request to some resource and return the correct response.

That’s the long version.

Here’s the short version: your website needs to be hosted.

“Hosting” is a word that represents all the stuff it takes to make your website less theoretical and more actual. It’s the processes and infrastructure to make your website available to visitors.

Depending on your staffing, organizational, and contractual scenario, you might be involved in this a great deal, not at all, or any gradient in between.

Let’s consider some examples:

  • John works for a multi-national bank. He’s built a small campaign microsite as static HTML documents. The bank has a data center and has agreed to give him a folder on the main web server to deploy his site by copying files across his local network.

  • Mary works for a non-profit. They’re using a subscription website builder she logs into to make her content changes and apply themes. She makes changes and they appear on the website under the domain name she typed into the admin console.

  • Terri works for a B2B manufacturer. They hired an external marketing company to rebuild their website. As a part of that contract, the marketing company is managing the hosting of the website on Microsoft Azure. Terri is aware that it’s hosted somewhere, and she gets the invoices for it, but she can’t affect anything in the hosting environment.

  • Laura is a developer at a university. She purchased a CMS from a vendor and built a new website using it. The vendor includes hosting in their environment, and Laura contributes code to a source code repository system, then deploys her changes via an admin console.

All those people have hosting in some sense, but they’re affected by it in varying degrees. Some of them are up close and personal with it. Others couldn’t tell a hosting account from a jelly doughnut (and are better off for it).

In this chapter, we’re going to talk about hosting in general, and a lot of hosting-related topics. Just know that you might never come into contact with any of this stuff after you finish reading this. Or you might.

The key is: you need to find out.

To go back to the original point: your website has to be somewhere. So you need to get some questions answered before you have any chance of actually unleashing your magnificent creation onto the world.


The Hosting Account

We’re going to roll a lot of functionality and discussion up into the idea of a hosting account. This is a service you purchase from some provider that allows you to run a website and expose it to the internet.

Hosting accounts have varying capabilities. At the bare minimum, a hosting account is going to provide:

  • The ability to store a set of files

  • The ability to make those files available at specific URLs

You can upload a set of files that comprise a website, and the hosting account will make them available to the internet at a URL (we’ll talk about domain names a little later, so hang on).

Let’s talk about some common capabilities:

Capability: Programming Language

Briefly, let’s cover two terms:

  1. Request: data your web browser sends to a server

  2. Response: data the server sends back to your browser in response to your request

When your browser requests an image file, for example, the server responds with a set of bytes that represent that image, along with some other hidden data such as what type of image it is.

The simplest request-response is:

Browser: “Give me this file.”

Server: “Here are the contents of that file.”

Sometime right after the web was born, someone got the idea that we might do things to files before we send them back. Instead of sending requests to actual files, some decided to send them to programming scripts that would execute on the server and return the results.

Now we have:

Browser: “Execute a script based on this URL.”

Server: “I have executed that script, and here are the results.”

(In reality, when your browser makes the request, it just has a URL which is a series of characters – /products/, for example. It has no idea if that represents a file or a script. It’s the server’s job to figure this out.)

Flash forward 20 years, and this is largely how the web works. URLs map to scripts just as often as they map to files. Sometimes those scripts are an actual file (like /news.php), and sometimes they’re just some designation that a larger process reads in and uses as input (like /news/topics/africa).
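
To make the file-versus-script distinction concrete, here is a minimal Python sketch of the decision a server has to make for each URL. Everything in it (STATIC_FILES, news_script, handle_request, and the sample URLs) is invented for illustration; a real web server is far more involved.

```python
# Hypothetical sketch of how a server might decide between a file and a script.
# None of these names come from a real web server.

# URLs that map to stored files; their contents are returned unchanged.
STATIC_FILES = {
    "/logo.txt": "contents of the logo file",
}


def news_script(topic):
    """Pretend script: builds its response at request time."""
    return f"Latest news about {topic}"


def handle_request(url):
    """Return a (status, body) pair for the requested URL."""
    # Case 1: the URL names a file, so send back its contents as-is.
    if url in STATIC_FILES:
        return 200, STATIC_FILES[url]

    # Case 2: the URL is input to a script, such as /news/topics/africa.
    prefix = "/news/topics/"
    if url.startswith(prefix) and len(url) > len(prefix):
        return 200, news_script(url[len(prefix):])

    # Neither a file nor a script matched this URL.
    return 404, "Not found"


print(handle_request("/logo.txt"))
print(handle_request("/news/topics/africa"))
print(handle_request("/missing"))
```

Notice that the browser's side of the conversation is identical in every case: it only ever supplies the URL.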

Most every hosting account is going to be able to execute some language. Different accounts will service different languages. Common languages are:

  1. PHP

  2. .NET (pronounced “dot net”; also commonly referred to as “C#” or “C-Sharp” for reasons that aren’t important)

  3. Java (or “J2EE” or “JEE,” depending on how pedantic one wants to be)

  4. Ruby (or “Rails,” because the most common framework in use is called “Ruby on Rails”)

  5. Python

Whatever CMS you select will be based on some language, and it will need a hosting account capable of understanding and executing that language.

Capability: Database

Databases are places to store and retrieve information. If you have 10,000 news articles, for example, you could put those in a database for safekeeping, and then query the database to return certain articles based on criteria. We might ask questions like this:

  • “Give me all the articles.”

  • “Give me the last 20 articles that were written, ordered by date, newest first.”

  • “Give me all the articles containing the word ’economics’.”

Refer back to the programming language discussion above – any one of those scripts, when executing, might contact a database, get some data back, and then incorporate that data into its response.
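
As a rough illustration, here is how a script might ask those three questions. This sketch uses SQLite, a small database that ships with Python, purely because it needs no setup; it is not one of the databases listed in this chapter, and the table layout and sample articles are made up.

```python
import sqlite3

# An in-memory database with a hypothetical "articles" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, body TEXT, written_on TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("Budget news", "A story about economics and trade.", "2024-01-10"),
        ("Sports recap", "A story about the weekend games.", "2024-02-05"),
        ("Market watch", "More economics coverage.", "2024-03-01"),
    ],
)

# "Give me all the articles."
all_articles = conn.execute("SELECT title FROM articles").fetchall()

# "Give me the last 20 articles that were written, newest first."
latest = conn.execute(
    "SELECT title FROM articles ORDER BY written_on DESC LIMIT 20"
).fetchall()

# "Give me all the articles containing the word 'economics'."
about_economics = conn.execute(
    "SELECT title FROM articles WHERE body LIKE ? ORDER BY written_on",
    ("%economics%",),
).fetchall()

print(all_articles)
print(latest)
print(about_economics)
```

Each query comes back as a list of rows, which the script would then weave into the page it returns.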

Common databases are:

  • MySQL (pronounced “my sequel” or “my ess que elle” – no one really agrees)

  • SQL Server (see a trend there? SQL is a common database query language)

  • PostgreSQL (most people just say “post-gress”)

  • MariaDB

  • Oracle

  • MongoDB (in a class of so-called “NoSQL” databases that use a fundamentally different storage and retrieval mechanism ... but it’s still a database in the purest sense of the term)

Every CMS is based on a database of some type. In fact, a CMS might be considered a “super database” that wraps a raw database in another level of functionality for people who want to manage content.

The “Technology Stack”

The programming language of the hosting account, combined with the database it supports, makes up what’s known as the “technology stack.”

It’s called this because every CMS runs on an underlying “stack” of technologies. At the lowest level, there’s the operating system of the computer itself. Then the web server that runs on it. Then the programming language that executes. Then the database that’s included. And then the CMS sits on top of all of these.

Here’s an example for Episerver:

  • Programming Framework: ASP.NET MVC

  • Programming Language: C#

  • Web Server: Internet Information Services (IIS)

  • Database: SQL Server

  • Operating System: Microsoft Windows

Each piece builds on the one “below” it. Episerver requires all the pieces in this stack, from top to bottom.

Almost every CMS requires a specific technology stack, and this tends to be a rigid requirement. Sometimes a CMS can swap out parts of its stack (some CMSs, for example, can talk to more than one kind of database), but that’s becoming less common. As specific stacks are easier to come by, CMSs tend to just support one of them – the CMS dictates the environment which it requires to run.

This means that you need to acquire a hosting account that supports the technology stack that your CMS needs.
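
If it helps to think of this as a checklist, here is a small Python sketch that compares what a CMS needs against what a hosting plan offers. The plan data and the names CMS_REQUIREMENTS, WINDOWS_PLAN, LINUX_PLAN, and missing_pieces are all hypothetical; real vendors describe their plans in their own ways.

```python
# Hypothetical data: what a CMS needs at each layer of the stack.
CMS_REQUIREMENTS = {
    "operating_system": "Microsoft Windows",
    "web_server": "IIS",
    "language": "C#",
    "database": "SQL Server",
}

# Hypothetical hosting plans and what each one supports per layer.
WINDOWS_PLAN = {
    "operating_system": ["Microsoft Windows"],
    "web_server": ["IIS"],
    "language": ["C#"],
    "database": ["SQL Server", "MySQL"],
}

LINUX_PLAN = {
    "operating_system": ["Linux"],
    "web_server": ["Apache"],
    "language": ["PHP", "Python"],
    "database": ["MySQL", "PostgreSQL"],
}


def missing_pieces(requirements, plan):
    """List the stack layers that the hosting plan cannot provide."""
    missing = []
    for layer, needed in requirements.items():
        if needed not in plan.get(layer, []):
            missing.append(layer)
    return missing


print(missing_pieces(CMS_REQUIREMENTS, WINDOWS_PLAN))
print(missing_pieces(CMS_REQUIREMENTS, LINUX_PLAN))
```

An empty list means the plan covers the whole stack. Anything else is a layer you would need to resolve before buying.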

The hosting accounts we’ve been talking about are quite simple. They’re what’s called “shared hosting,” where you buy an account on a server that hosts a bunch of other websites, not just yours. You’re buying an apartment inside a bigger apartment building with other tenants.

However, some systems require more than that. Some systems need to “own” the entire computing environment. They have hooks and tendrils that need to dig deeper into the hosting system than just a small, shared hosting account will provide. The technology stack for these systems requires you to have administrative control of an entire server.


Acquiring a Hosting Account

Before you worry about how to get a hosting account, you should find out if you even need to do it. Two reasons why you might not need to care at all:

  1. Many CMS platforms come with hosting built-in (they’re “in the cloud”)

  2. If someone is building your website for you, they can often host it when it’s launched

Those two scenarios probably represent the majority of situations. It’s uncommon for someone to build a website with absolutely no guidance on where it’s going to live in the long-term.

If you’re working for a large enough organization, there’s also a chance that your IT department wants to host the website on their own servers in their own data center. This is becoming less common, as IT staff like the idea of the website being someone else’s problem, but you still see this occasionally in health care and finance where there might be privacy and security concerns.

If none of those apply, you will need to find a place to host your website. Thankfully, compatible hosting accounts are easily purchasable from many vendors, unattended, with nothing but a credit card.

At the higher end of the spectrum, if you have a massive site serving lots and lots of traffic, you’ll probably tend towards the two big enterprise cloud companies:

  • Amazon Web Services (AWS)

  • Microsoft Azure (“az-sure”, like “assure” but with a soft “z” sound in the middle; also, some people emphasize the two syllables, while some run it all together)

These two companies provide immense computing platforms that can scale easily and quickly.

If you run a TV commercial during the Super Bowl, for instance, you could expand your website capacity by a factor of 10 in just minutes, then reduce it later when the traffic has died down. This ability to quickly increase capacity is known as burst scaling.

Unfortunately, they can also be quite expensive and complicated. If you have a website that needs this level of hosting, then you likely need to find a qualified infrastructure architect who can put together an environment for you. There are lots of things to consider, many of which we’ll discuss below.

If your website isn’t quite so large, then thousands of hosting vendors can provide you with hosting accounts on various technology stacks.

We’re not endorsing anyone in particular, but here are some of the larger names in this industry:

Also worth considering are companies that have platforms specifically designed for the CMS you want to use. You’ll generally only find this in the open-source world (commercial vendors want to sell you their own hosting), but here are some players for the most common open-source CMSs:

Again, we’re not endorsing anyone, but just be aware that many hosting vendors claim to have special expertise in particular platforms.

Hosting vendors will price their plans on several different axes, all supposedly based on how much load you’re going to put on their servers.

  • Number of inbound domain names

  • Whether you need Secure Sockets Layer (SSL)

  • Number of databases

  • Amount of traffic in and out, or necessary bandwidth

  • Number of user accounts to transfer files

  • How much storage space you need for files

They usually sell these things in packages. Something like “Basic” gives you certain amounts of each; “Business” gives you a little more; “Enterprise” even more, etc. (They all love three package levels, for some reason.)

When you create a hosting account, you’ll be offered multiple ways to actually get your files out there.

  • Manually push using File Transfer Protocol (FTP) or its secure counterpart, SFTP

  • Deploy directly from public source control systems like GitHub

  • Synchronize against file storage systems like Dropbox or Amazon S3

  • Publish directly from code environments via plugins to those tools

How your files get out to your hosting account is a technical detail that your developers will handle. We’ll talk a little more about deployment in a later chapter.

Once your files are in your hosting account, and the domain name is attached to the account (we’ll discuss this a bit below), then you technically have a functioning website available on the internet. Congratulations.


Uptime, Capacity, and Reliability

Your website needs to stay available. If there’s an error on just a single page or a certain part of the website, then those problems tend to be related to how the website was built. Hosting failures, on the other hand, are usually absolute – the website just goes away completely.

Reliability in hosting is known as uptime – literally, how often is the server “up” and available for connections. The opposite is downtime. Hosting providers don’t really split hairs about uptime or downtime. The server is either up or down, and they don’t really do any shades of gray.

Sometimes, they might have a “slowdown” based on some external factor. For instance, maybe their upstream connection to the broader internet is congested with traffic. They might announce this as a service degradation, but more often they just do their reporting and service quality in terms of uptime and downtime.

How often should your server be up? Well, 100% of the time, of course.

But you usually can’t get to 100% on a single server. Servers have to be rebooted occasionally. They need maintenance. Hard drives might fail. There are lots of things that could go wrong.

Uptime is reported in percentages, representing how much time a server was available during the year. Here are some uptime percentages and the amount of downtime they would represent in a single year.

  • 99% means the server would be down for almost four full days per year

  • 99.9% is about nine hours of downtime

  • 99.99% is just under one hour

  • 99.999% is about five minutes

  • 99.9999% is just 30 seconds downtime per year

These are called “nines,” as in “how many nines of uptime do you offer?” Clearly, the more nines, the less downtime.
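
The arithmetic behind those figures is simple enough to sketch in a few lines of Python. The function name and the 365-day year are assumptions made for this example.

```python
# A year of 365 days, ignoring leap years, for a rough estimate.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(uptime_percent):
    """Seconds per year a server is unavailable at the given uptime."""
    return SECONDS_PER_YEAR * (100 - uptime_percent) / 100


for uptime in (99, 99.9, 99.99, 99.999, 99.9999):
    seconds = downtime_seconds_per_year(uptime)
    print(
        f"{uptime}% uptime: {seconds / 3600:.2f} hours "
        f"({seconds:.0f} seconds) of downtime per year"
    )
```

Worked by hand, 99% comes to about 3.65 days, 99.9% to about 8.8 hours, 99.99% to about 53 minutes, 99.999% to a little over 5 minutes, and 99.9999% to roughly 32 seconds.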

What’s acceptable? It depends on how important your site is, and when the downtime happens.

If you have a small campaign microsite, targeted towards North American business users who are usually active only during business hours, and the downtime can be scheduled overnight, then perhaps a total of nine hours of downtime spread throughout the year is okay for you, and three nines (99.9%) is acceptable.

But if you’re Amazon, this isn’t gonna work. Amazon pulls in tens of millions of dollars in revenue from all over the world every hour (every minute, even). Additionally, their website isn’t an optional part of their business – no website, no money. Amazon can never be down.

You can achieve perfect, constant uptime, but it will cost real money.

Servers will always need to go down for occasional maintenance, so if you need what’s called “high availability,” your website has to be spread across multiple servers and synchronized, so that individual servers can go up or down without the website being affected. Something called a load balancer sits in front of all these servers (called a server farm), and it knows which servers are up or down, and it routes requests only to the functional servers. A server could go down, and the load balancer would just stop sending it traffic until it came back up.
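
Here is a minimal sketch of that idea in Python. It hands out servers in rotation and skips any that are marked down. The LoadBalancer class and the server names are invented for this example; commercial load balancers use more sophisticated health checks and routing rules.

```python
import itertools


class LoadBalancer:
    """Toy load balancer: rotates through servers, skipping ones that are down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = itertools.cycle(self.servers)

    def mark_down(self, server):
        """Stop sending traffic to this server."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Resume sending traffic to a known server."""
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Return the next healthy server, or None if every server is down."""
        # Look at each server at most once per request.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        return None


farm = LoadBalancer(["server-1", "server-2", "server-3"])
print([farm.route() for _ in range(3)])

farm.mark_down("server-2")  # for example, taken offline for maintenance
print([farm.route() for _ in range(4)])
```

While server-2 is down, visitors never see it. Once it is marked up again, it rejoins the rotation.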

You’ll even need to guard against entire data centers going down, which means you’ll need to synchronize your site across geographic regions, so that if the East Coast of the United States somehow goes offline, you can route traffic to the West Coast. You’ll need your database replicated to multiple locations, and you’ll need special programming and CMS functionality to make sure you don’t have data conflicts.

The difference between four nines (99.99%) and five nines (99.999%) is usually where costs begin to creep upwards markedly. Six nines (99.9999%) is where things start to get very expensive. From that point forward, the annual hosting costs might start to overshadow all the other costs associated with the project.


Bells and Whistles

What we’ve explained above is the bare minimum you need to have a website connected to the internet. But there are some extra things to consider with hosting.

Caching

Content often exhibits a characteristic called WORM – Write Once, Read Many. We might publish an article one time, but it’s read tens of thousands of times. In most every case, content will be read much more often than it’s written.

This is true of any internet resource. The entire internet is WORM.

It takes work to generate a response to a request. If you need to query a database or do some other script execution, this consumes CPU cycles and can sometimes take enough time to make a website feel sluggish. It’s often helpful to generate a response to a request once, and then hold onto that response so you can replay it easily.

These saved responses are called a cache (sounds like “cash”) and the action of doing this is called caching.

Caching is performed at all levels in computing – the CPU inside your computer even caches certain data to make it run faster. For a website, caching could take place at multiple points.

  1. The CMS might hold onto content that it has retrieved from its own repository

  2. The web server might hold onto responses it has generated

  3. Some infrastructure in front of the web server might also hold onto responses; requests are sent through these dedicated caching servers first and might be handled at that level, and never even touch the actual server

  4. The visitor’s browser might hold onto data it has received and not request a new version for a period of time

In all cases, every process is going to check its cache first to see if it can quickly fulfill a request. If not, it will need to retrieve the content from upstream.

The downside to caching is that a cache can “get stale,” which means the underlying content has changed, but since an older version was cached, visitors are still seeing that instead. To prevent this, systems have methods to invalidate their cache, which is a fancy way of saying they’ll delete the cached data so that it has to be requested from scratch (and then they’ll re-cache the fresh data).

Caching can make a website very fast, but sometimes stale cache can cause strange issues where two people see different things.
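
To see both the benefit and the staleness problem in one place, here is a small Python sketch. The PageCache class, the sample URL, and the content dictionary are invented for illustration and do not represent any particular CMS or web server.

```python
class PageCache:
    """Toy cache: saves each response the first time it is generated."""

    def __init__(self, render):
        self._render = render   # the slow function that builds a response
        self._saved = {}        # url -> saved response
        self.render_count = 0   # how many times the slow work was done

    def get(self, url):
        """Return the saved response, generating it only when needed."""
        if url not in self._saved:
            self.render_count += 1
            self._saved[url] = self._render(url)
        return self._saved[url]

    def invalidate(self, url):
        """Delete the saved response so the next request is generated fresh."""
        self._saved.pop(url, None)


# Stand-in for content stored in a CMS.
content = {"/about": "version 1"}
cache = PageCache(lambda url: content[url])

first = cache.get("/about")      # generated, then saved
content["/about"] = "version 2"  # an editor changes the page
stale = cache.get("/about")      # still the old copy from the cache
cache.invalidate("/about")       # the cache is told to forget it
fresh = cache.get("/about")      # generated again from current content

print(first, stale, fresh, cache.render_count, sep=" | ")
```

Three requests were served, but the slow work happened only twice. The middle request is the stale one: the content had changed, but the cache had not yet been invalidated.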

Monitoring

Let’s say your website mysteriously goes offline every night at 11 p.m. and comes back online at 4 a.m. If you’re not awake and looking at the website at those times, how would you ever know?

This could even happen during the day. If you’re not looking at your website all day, then it could easily go offline for hours without you knowing. All you might see is a drop in analytics during that time (no site visits would register), or a drop in form submissions later on. Unless you get the inevitable call that goes, “People are saying our website is down!” then it could disappear and come back and you’d be none the wiser.

Let’s face it: most of the time, you’re just trusting that your website is running. You don’t know this for sure unless you go look.

This is where a monitoring service comes in. There are automated services you can subscribe to that will check your website multiple times a day – every hour, every minute, whatever – to make sure it’s functioning.

Monitoring systems can check lots of different things. At the highest level, they can simply make sure the website responds, which would verify the server and network connection are up. But this is just binary – it’s up or down – and errors are often more granular than that. Your website might technically be up and responding, but some error means it’s returning garbage.

Going deeper, some systems can do things like:

  • Request specific pages, and look for certain text and elements

  • Virtually click on various navigation options and ensure the page responds as it’s supposed to

  • Submit search forms and verify that certain, known results appear

  • Submit contact forms and verify the correct email contact is received
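
The first of those checks, requesting a page and looking for known text, might be sketched like this in Python. The check_page function and the fake_fetch stand-in are hypothetical; a real monitoring service would make actual network requests on a schedule.

```python
def check_page(fetch, url, expected_text):
    """Return (ok, reason) for one monitoring check.

    `fetch` is any function that takes a URL and returns (status, body).
    """
    try:
        status, body = fetch(url)
    except Exception as error:
        # The server or the network did not respond at all.
        return False, f"no response: {error}"
    if status != 200:
        return False, f"unexpected status {status}"
    if expected_text not in body:
        # The site is technically up, but it is returning garbage.
        return False, "page responded but expected text was missing"
    return True, "ok"


def fake_fetch(url):
    """Stand-in for a real HTTP request, so the sketch runs offline."""
    pages = {
        "/": (200, "Welcome to our site"),
        "/broken": (200, "Database error"),
        "/gone": (500, ""),
    }
    if url not in pages:
        raise ConnectionError("server unreachable")
    return pages[url]


for url in ("/", "/broken", "/gone", "/nowhere"):
    print(url, check_page(fake_fetch, url, "Welcome"))
```

The "/broken" case is the interesting one: a simple up-or-down check would pass it, while a content check catches the problem.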

In QA and testing parlance, the extent of these checks is called test coverage. Ideally, you would create tests to give you maximum coverage – to automatically explore every nook and cranny of your website on a timed schedule. The corresponding downside is the expense and the effort, both to create the tests, and to change them when the underlying functionality or content changes in such a way that it makes the test invalid.

If the service detects a problem, it can send alerts. It might send an email, send a text, post to a Slack channel, whatever.

These notifications might be blind, meaning the service just sends them, or they might require acknowledgement and escalate if none arrives. So the text goes to Person A, who has to acknowledge it in some way; if they don’t, a text goes to Person B five minutes later, and so on. The system will continue down an escalation path until someone acknowledges it.
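
An escalation path boils down to a loop. The Python sketch below leaves out the waiting (a real service would pause, perhaps five minutes, between steps) and simply shows who ends up being notified. The escalate function and the contact names are made up for this example.

```python
def escalate(contacts, acknowledged_by):
    """Notify contacts in order until one of them acknowledges.

    Returns the list of people who were notified.
    """
    notified = []
    for person in contacts:
        notified.append(person)
        if person in acknowledged_by:
            # Someone has taken responsibility, so stop escalating.
            break
    return notified


path = ["Person A", "Person B", "Person C"]

# Person A misses the text, Person B acknowledges it.
print(escalate(path, acknowledged_by={"Person B"}))

# Nobody acknowledges, so everyone on the path is notified.
print(escalate(path, acknowledged_by=set()))
```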

Some systems will even store a screencap or video of the test interaction, so in addition to the notification, you can watch the test in-progress and see exactly what happens when it fails. This is absolutely critical to finding problems that are frustratingly random and can’t be replicated on-demand.

Content Delivery Networks (CDNs)

Earlier we differentiated between a file and a script. In some cases, a web server just returns the raw contents of a file, and sometimes it executes a script. Raw files are much easier to serve and much more common.

The average web page usually involves the execution of a single script (the page itself), but might request dozens of files – images, stylesheets, etc. We’ll call these static assets.

On a high volume site, sometimes it’s helpful to serve those static assets from another server. By doing this, you relieve your main server from the workload of something it doesn’t really need to do. While a logical script needs to execute from a central location where it has access to databases and other resources, a static asset can be delivered from anywhere – your logo image is going to be the same, no matter where it comes from.

In fact, it’s often helpful to physically place those files in locations closest to the requestor, geographically. Image files tend to be larger, and minimizing the logical network distance they need to travel makes your site more responsive.

Content Delivery Networks (CDNs) are networks of servers designed specifically for delivering static assets at high speed. Since they could be requested from anywhere, your static assets are replicated to multiple servers, so they can be delivered from the node closest to where they were requested. Your assets might exist on hundreds of servers all around the world, simultaneously. Large CDNs have servers strategically located close to metropolitan centers to efficiently deliver assets to large concentrations of people.

All the while, your main server is relieved of all the processing overhead for delivering this content, and your website is more responsive as a result.
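
As a simplified picture of "the node closest to where they were requested," this Python sketch picks the nearest of three hypothetical CDN nodes using straight-line distance over the globe. The node names and approximate coordinates are assumptions for the example, and real CDNs generally weigh network conditions as well as geography.

```python
import math

# Hypothetical CDN nodes with approximate (latitude, longitude) positions.
CDN_NODES = {
    "new-york": (40.7, -74.0),
    "london": (51.5, -0.1),
    "tokyo": (35.7, 139.7),
}

EARTH_RADIUS_KM = 6371


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine formula)."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_node(visitor_location):
    """Name of the CDN node geographically closest to the visitor."""
    return min(
        CDN_NODES,
        key=lambda name: distance_km(CDN_NODES[name], visitor_location),
    )


print(nearest_node((48.9, 2.4)))     # a visitor near Paris
print(nearest_node((42.4, -71.1)))   # a visitor near Boston
```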

Data Residency and Sovereignty

We tend to think of the cloud as just being “out there” somewhere. We store data as if the cloud is its own special geographic location.

However, occasionally, you need to be concerned with where the actual servers storing your data are physically located. Different laws might apply to data stored in different places.

This concept is known as data residency or data sovereignty. It’s beyond the scope of this book to discuss the particulars, but know that if you’re collecting any personal information from people, there may be laws of the country your organization is bound by that will force you to store that data within your country’s borders.

If that data were to be transmitted to a data center in another country, it might be legally available to another government, which might violate laws of your country. It gets complicated quickly, and you should consult your legal team to find out if you’re bound by any of it.

If you are, it’s a surmountable challenge. You’ll just need to find a hosting provider that will certify data is stored within the borders of the specific country you require. These providers exist for exactly these reasons.

Disaster Recovery (DR)

If a disaster strikes and your entire hosting infrastructure goes offline, how quickly can you recover? To get your website back online, you would need to secure a new hosting account, redeploy the website from backup, re-point your domain name to the new location, and wait for all DNS changes to propagate (discussed below).

For many organizations, this process would take too long to perform from scratch, so they require disaster recovery (DR) or business continuity (BC) plans in place and ready to be executed when necessary.

In many cases, this means your entire website will be constantly replicated to another environment (called, appropriately, a DR environment or DR server). So, every time new code is deployed to the main environment, it’s also deployed to the DR environment. And whenever content is changed, the content change is also replicated to the DR environment.

The goal is that you always have a duplicate version of your website on hot standby, ready for traffic to be routed to it at any time. These environments are often meant to be temporary, which means content or code would never be changed there – the website would be read-only until the problem is resolved and traffic can be routed back to the main environment.

Hot standby informally means you can switch very quickly, perhaps in a matter of minutes. Some organizations might use the term warm standby to mean a longer ramp-up – maybe a matter of hours – but still faster than having no recovery plan at all.


Domain Names and DNS

Earlier, I congratulated you for getting your files out to your hosting account and “getting on the internet.” But I skipped an important part: your domain name. There’s a step where you have to tell the world that your domain name – www.myawesomewebsite.com – should point to the website you built.

We’re going to cover this at a pretty high level.

Computers on the internet are actually identified by numeric labels called IP addresses – something like 12.34.56.78. Given that address, the magic of the internet can send your request to the computer to which it’s assigned.

But humans can’t remember those, so someone came up with the idea of using easier-to-remember text labels as stand-ins for IP addresses. These text labels are called domain names.

So, your domain name of www.myawesomewebsite.com is actually mapped to an IP address. It’s resolved to this address by way of the Domain Name System (DNS). When you type your domain name into your browser, it contacts the global DNS system to find the IP address to which that domain name is assigned, then sends your request to that computer.

A server might have a single IP address, but be serving 10,000 websites at the same time. This still works because requests for all of those 10,000 websites will come into that server bearing the domain name they want. That server can evaluate that domain name, then send the request to the files of the hosting account configured for that specific domain name.
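
In practice, the domain name travels with each request in what HTTP calls the Host header. A bare-bones Python sketch of the lookup a shared server performs might look like this. The folder paths, the second domain, and the names SITES_BY_DOMAIN and folder_for_request are invented for illustration.

```python
# Hypothetical mapping from domain name to a hosting account's files.
# Both domains arrive at the same server and the same IP address.
SITES_BY_DOMAIN = {
    "www.myawesomewebsite.com": "/accounts/awesome/files",
    "www.example.org": "/accounts/example/files",
}


def folder_for_request(host_header):
    """Pick the hosting account's folder based on the requested domain."""
    # The header may carry a port (like ":8080") and any mix of letter case.
    domain = host_header.split(":")[0].strip().lower()
    return SITES_BY_DOMAIN.get(domain)


print(folder_for_request("www.myawesomewebsite.com"))
print(folder_for_request("WWW.EXAMPLE.ORG:8080"))
print(folder_for_request("unknown.example.net"))
```

A domain that has not been configured on the account gets no match, which is why step 4 in the list that follows matters.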

So the process for making this all happen takes these steps:

  1. You need to acquire a hosting account and determine what its IP address is

  2. You need to acquire a domain name by purchasing it from any of the many vendors who sell them

  3. You need to configure a DNS record that tells the global DNS system what IP address that domain name should map to

  4. You need to configure your hosting account – which exists at the IP address from step #1 – so that it handles inbound requests for that domain name

As I said before, this is wildly simplified. Hosting vendors often combine the domain purchasing and hosting process. You can get a domain name and hosting account in one transaction, and the vendor will handle all the mapping for you automatically.

What this also means is that if you’re re-building your website, launching it often just means changing where your domain name points. You might build your new website on a new hosting account, serviced by an alternate domain name, like new.myawesomewebsite.com. When it’s time to launch, you just change where the DNS record for www.myawesomewebsite.com points, then shut down the old hosting account (which shouldn’t be receiving traffic any longer).

DNS takes some time to change. Since the DNS system is contacted a lot, DNS servers will cache the mapping between domain names and IP addresses. They might check for changes once an hour, or once every 24 hours. So when you change where a domain name points, it sometimes takes a while for everyone to see the new website, and you can have situations where someone in one part of the country is seeing the new site, and someone somewhere else is still seeing the old site.
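
The delay is easier to see in a small simulation. In this Python sketch, a resolver remembers each answer for a fixed period (often called a time to live, or TTL) before checking again. The CachingResolver class, the one-hour TTL, and the new IP address are assumptions for the example, and time is passed in as plain seconds so nothing actually has to wait.

```python
class CachingResolver:
    """Toy DNS resolver that remembers answers for a fixed number of seconds."""

    def __init__(self, lookup, ttl_seconds):
        self._lookup = lookup   # function: domain -> current IP address
        self._ttl = ttl_seconds
        self._cache = {}        # domain -> (ip, expires_at)

    def resolve(self, domain, now):
        """Return the cached IP if still valid, else look it up again."""
        cached = self._cache.get(domain)
        if cached and now < cached[1]:
            return cached[0]
        ip = self._lookup(domain)
        self._cache[domain] = (ip, now + self._ttl)
        return ip


# Stand-in for the authoritative DNS records.
records = {"www.myawesomewebsite.com": "12.34.56.78"}
resolver = CachingResolver(records.get, ttl_seconds=3600)

before = resolver.resolve("www.myawesomewebsite.com", now=0)

# Launch day: the record is changed to point at the new hosting account.
records["www.myawesomewebsite.com"] = "203.0.113.10"

during = resolver.resolve("www.myawesomewebsite.com", now=1800)  # 30 min later
after = resolver.resolve("www.myawesomewebsite.com", now=3600)   # 1 hour later

print(before, during, after)
```

Half an hour after the change, this resolver is still handing out the old address. A different resolver that had nothing cached would return the new one immediately, which is how two visitors can see two different sites at the same moment.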

Even if none of this interests you, there’s something you need to be aware of: for you to launch your new website, someone in your organization will likely need to change a DNS record. If this is the case, find out who this person is.

There’s nothing more frustrating than getting ready for a big launch ... only to find out that no one knows who has access to re-configure the domain name. The person who can do this – and it’s likely a system administrator of some kind – needs to be acutely aware of your launch schedule and be on-standby to make the required change.


Inputs and Outputs

Before you discuss hosting, you need to know what technology stack your website will run on. This means that you need to select a CMS. Additionally, you need to determine if hosting is even your problem. It might not be, for reasons discussed in this chapter. If hosting is something you need to manage, then the output of this process is an acquired hosting account, ready to receive the finished website.

The Big Picture

You’ll need to have a development environment figured out before you can start building your website. And clearly, you’ll need to have a production environment set before you can launch.

Staffing

If you have them, you probably need to involve your IT staff in these discussions. There’s a lot of stuff here that might be affected by your organization’s IT policies. Ideally, you can hand this problem off completely to someone on that side.

Resources

Articles