Web Served, the finale: Congrats, you have a Web server! What’s next?

Analytics, DIY Twitter, e-mail hosting—we’ve got the hookup on neat stuff to try next.

Lee Hutchinson – Oct 9, 2013 8:00 pm

Credit: Aurich Lawson / Thinkstock

Welcome, dear readers, to the final piece in our long-running “Web Served” series. Starting last November, Ars has been helping to shed light on the fun world of DIY Web hosting—we started with setting up Nginx on Ubuntu, and we’ve progressed to advanced application hosting with PHP and even Node.js.

Along the way we’ve struggled with the command line and probably cursed at typos in config files. We’ve felt the incredible triumph of a simple “success” log file message and the crushing defeat of an error that appears to be happening for absolutely no reason. If you’ve stuck with us for the entire spread of articles, you’ve got a full-featured Web server capable of safely and quickly serving pages and running a wide range of awesome applications. Congratulations are in order—good job!

At this point you’ve got a functional Nginx Web server that’s configured with an eye toward speed and security. You’ve got it configured with SSL/TLS, (maybe) have some official certificates, and can serve data encrypted. You’ve got PHP set up along with the MySQL-compatible MariaDB, so you can handle serving most popular Web applications. Speaking of applications, you also probably have a WordPress blog, a Vanilla forum, and maybe even your own MediaWiki wiki.

But there is so much more out there beyond simple PHP applications! We cracked that door open a bit in Web Served 8, where we set up Node.js and Redis in order to get Etherpad up and running. That’s just one of a huge multitude of non-PHP Web applications. If you’re like me, setting all this stuff up just gets you excited about the next big thing you can do with the server—setting up a new Web application and seeing it work correctly is addictive. What else is out there that you can play with beyond forums and wikis? What new cool stuff can we do?

Strap in. We’re going to hit a whole bunch of stuff. This time, rather than walking you through every step, we’ll leave the detailed setup instructions to you. Don’t worry: if you’ve come this far, you can go a little farther. You’re ready.

Charts, graphs, and stats

We’ve set up a whole lot of stuff over the past eight articles, but we haven’t focused much at all on the monitoring and reporting side of things. It’s one thing to have your Web server happily spitting out pages to anyone who visits, but how do you get a handle on who’s actually doing the visiting?

Modern Web analytics is a highly refined science, and there are tons of vendors that will help you get a handle on who your visitors are and what they’re looking at on your site. The most prominent analytics tool is the aptly named Google Analytics. It’s both highly functional and free. You sign up for an account, tell Google some basic information about your site, and you’re given a tracking code—a snippet of JavaScript that you embed in each of your website’s pages. When visitors browse your site, the code is downloaded by their browsers, and their own browsers report back to Google what actions they’re taking. Google then aggregates the data in nice charts and graphs.

Google’s offering is free and it works very well, but it comes with the obvious downside of you not being in control of your tracking results. You’re leaning on Google to host the analytics service, and you’re also turning all of your data over to Google for it to use (remember the old adage that if an Internet service is free to you, you’re probably not the service’s real customer).

Piwik for self-hosted analytics

There are alternatives to Google Analytics, and in the DIY spirit of “Web Served,” I recommend downloading and setting up one of those alternatives—specifically, Piwik. Piwik is an open source analytics application that uses your server’s existing PHP and MySQL/MariaDB capabilities to deliver a very Google Analytics-style experience, with the huge bonus of leaving you in control of your data. All of the analytics collected remain on your own server in your own database.

The Piwik dashboard page for my blog. Credit: Lee Hutchinson

Piwik has good documentation for self-hosting installation. It requires PHP and MySQL (or MariaDB, like we’re using). It needs to have its own database set up, just like most Web apps require. It’s best to give it its own unique database credentials which only have privileges on its own database—same as we’ve done for the other Web apps we’ve set up in this series.

Once installed, Piwik will generate its own JavaScript tracking code for you to insert into your webpages, just like Google Analytics. It’s also a good idea to take a look at Piwik’s privacy options—Piwik includes the ability to honor a browser’s “Do not track” option as well as the ability to anonymize user IP addresses.
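To make the IP anonymization idea a little more concrete, here is a minimal Python sketch of the general technique: clearing the low-order bytes of an address before it gets stored. This is not Piwik's actual code. The function name anonymize_ip and the default of masking a single byte are assumptions made purely for illustration.

```python
import ipaddress


def anonymize_ip(address: str, masked_bytes: int = 1) -> str:
    """Zero the lowest `masked_bytes` bytes of an IPv4 or IPv6 address.

    Hypothetical helper for illustration; not taken from Piwik.
    """
    ip = ipaddress.ip_address(address)
    total_bits = ip.max_prefixlen            # 32 for IPv4, 128 for IPv6
    drop_bits = min(masked_bytes * 8, total_bits)
    # Mask with the high bits set and the dropped low bits cleared.
    mask = ((1 << total_bits) - 1) ^ ((1 << drop_bits) - 1)
    return str(type(ip)(int(ip) & mask))


if __name__ == "__main__":
    print(anonymize_ip("203.0.113.77"))      # last octet cleared
    print(anonymize_ip("203.0.113.77", 2))   # last two octets cleared
```

The more bytes you mask, the less precise any geolocation becomes, so the setting is a trade-off between visitor privacy and report detail.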

Piwik’s geolocation feature shows you where your visitors are coming from. Credit: Lee Hutchinson

Once you’re set up and running and you’ve stuffed your tracking code into your pages, you can sit back and watch the numbers start to roll in. Piwik will tell you what websites you’re getting visitors from and what search engine queries are leading visitors to your site (well, sort of: Google doesn’t report search referral terms for users who are logged in to a Google account, so this functionality is rapidly losing usefulness). It can break down your traffic reporting by periods of time, by pages, by sources, or by any number of other factors.

It also offers some fancy traffic flow analysis functions, though they’re not as complex and interactive as the ones available via Google Analytics. Credit: Lee Hutchinson

Munin, for your server’s health

Getting a handle on who’s visiting your server is one thing, but it’s also handy to be able to watch the health of your server—is it running out of hard disk space? Is it bumping up against its RAM limit? Is the load too high for its poor little CPU?

As with Web analytics, there are a lot of different ways to go here, but I’ve been particularly happy with Munin, an open source server monitoring tool. It supports as many servers as you’ve got, and it has a plug-in system that lets it monitor just about anything, from hardware to applications.

Unlike Piwik, Munin doesn’t require any active Web technology or an external database. There are two components to install: the Munin server piece and the “munin-node” reporting piece. If you only have a single server to monitor, then you install the server and monitoring node components on it. If you’ve got multiple servers to keep tabs on, one of them gets the server component and the others all get the reporting node component.

Looking through my local servers’ overall stats in the Munin console. Credit: Lee Hutchinson

Munin is installed via the command line. There are PPAs for Ubuntu with current versions, so if you like, you can grab it with a quick aptitude install munin munin-node, or you can grab the latest stable code and compile it yourself. Once installed, the master Munin server relies on cron to regularly poll the munin-node daemon running on each monitored server, collect the results, and organize the data into a simple Web interface. Munin keeps historical data, and the visual graphs are extremely handy for getting an overall feel for your server’s (or servers’) health.

Munin also has various plug-in modules that let you monitor applications—here, for example, are some of the stats available for my Web server’s Varnish cache. Credit: Lee Hutchinson
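If you want a feel for how small a Munin plug-in can be, here is a rough Python sketch. Plug-ins are ordinary executables: called with the argument "config" they describe the graph, and called with no argument they print the current reading. The metric (root filesystem usage), the field name "used", and the graph labels are my own choices for illustration, not something shipped with Munin.

```python
#!/usr/bin/env python3
"""Sketch of a Munin plug-in reporting root filesystem usage (hypothetical)."""
import shutil
import sys


def config_lines() -> list:
    # Printed when Munin calls the plug-in with the "config" argument.
    return [
        "graph_title Root filesystem usage",
        "graph_vlabel percent",
        "graph_category disk",
        "used.label used",
    ]


def value_lines(path: str = "/") -> list:
    # Printed on a normal run: one "<field>.value <number>" line per field.
    usage = shutil.disk_usage(path)
    percent = 100.0 * usage.used / usage.total
    return ["used.value {:.2f}".format(percent)]


def main(argv: list) -> None:
    lines = config_lines() if argv[1:2] == ["config"] else value_lines()
    print("\n".join(lines))


if __name__ == "__main__":
    main(sys.argv)
```

A script like this would need to be made executable and linked into the node's plug-in directory; the exact location depends on how Munin was installed, so check your own setup rather than trusting a path from me.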

More apps to play with

Server monitoring and Web analytics are fun, but there are plenty more Web applications out there to be playing with. Our last piece on Etherpad and Node.js got us out of the PHP sandbox and into the new world of “Web scale” apps and technology. There’s much more to be done here.

Ruby, Rails, and other insanity

Ruby—and all of the various ways of serving Ruby-based applications to the Web—is rapidly growing in popularity. The language has gained beautiful darling status among developers, and there are some extremely high-profile projects (like Discourse) that are being developed with it. It’s not at all a bad idea to get some version of Ruby set up on your Web server and to practice deploying an application with it.

The problem I’ve found with Ruby and Ruby-based Web applications is that while the entire development ecosystem is apparently quite developer-friendly, it also feels very sysadmin-hostile. Much like with thrice-damned Java, Ruby applications tend to be written to require one specific version of Ruby, and there are many stable versions to choose from. This leads to tools like rbenv and RVM, which are designed to let multiple versions of Ruby coexist on your system. Ruby by default does everything it can to politely stay isolated inside your home directory. This is great if you’re developing a Ruby app, but it makes things annoyingly complex when deploying a Ruby Web application designed to be run under a non-privileged account.

There are workarounds, but the simplest thing the aspiring Web server admin can do is abandon the complexity of maintaining multiple Ruby versions attached to users’ home directories and simply install one system-wide Ruby instance. This sacrifices development flexibility, but it will keep you from developing a head full of gray hairs named after the Ruby project maintainers. On the other hand, if you’re more concerned with development than deployment and you actually want to get your hands dirty and code some Ruby yourself, set it up however makes you happy!

Discourse, a Ruby-based forum

I’ve mentioned Discourse a few times in “Web Served.” The application is the brainchild of developer Jeff “Codinghorror” Atwood, who was most recently mentioned on Ars as the developer of the CODE keyboard (on which I am typing this very story). Atwood bills Discourse as a bottom-up rethink of how Web-based discussion forums should work. The application has a ton of neat features and is rapidly gaining in popularity; BoingBoing switched to it a couple of months back for its commenting system.

I’ve been running my own Discourse forum for a few months now, and I blogged about getting it set up in June. My instructions differ from the official installation guide in several places—most notably, I’m using Phusion Passenger and its Nginx module to serve Discourse rather than relying on Discourse’s bundled instances of Thin. This simplifies some things but complicates others, as Nginx must be compiled from source to include Passenger support (though there are ways around that).

A screenshot of meta.discourse.org, the official Discourse development forum. Credit: Lee Hutchinson

Discourse not only abandons PHP in favor of Ruby—it also drops MySQL, instead requiring a combination of PostgreSQL and Redis to operate. It relies very heavily on JavaScript for page generation (as the developers are fond of pointing out, load up a topic on a Discourse forum and peek at the page source—there’s a lot of JavaScript in there).

Discourse is still in development, so a certain amount of bumpiness is to be expected. Right now, setting it up is many orders of magnitude more complex than setting up a PHP Web application (though there’s a Juju charm for it if you’re a Juju user). Still, it’s a joy to use once it’s been installed and configured properly. Plus, “Ruby-based Web forum running PostgreSQL and Redis” just sounds so Web three-point-oh!

Discourse’s admin control panel. Credit: Lee Hutchinson

Coming soon—projects in development

There is a trio of additional Web applications in early development that I want to quickly highlight. Each of these holds tremendous promise, though two are just rough frameworks and one isn’t yet publicly available.

Roll your own Twitter with pump.io

Pump.io calls itself a “stream server.” Pump.io creator Evan Prodromou spoke at length to Opensource.com on what exactly pump.io is for, but the summary is that it’s an open source social networking app intended to let you send messages, pictures, status updates, and other short snippets of things to others.

In other words, it does a lot of the same stuff Twitter does, but without the enormous business overhead (and the responsibility to make money) of Twitter. Rather than relying on a central service, though, pump.io runs on distributed pump servers, which can either push updates to the distributed pump network or directly out to followers. One of the use cases given is a developer wanting to set up a status feed for a project. Rather than creating a Twitter feed, they can set up a pump server and folks can follow them directly there.
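The "push updates directly out to followers" idea is easy to model. The Python sketch below is a toy, in-memory stand-in for a single stream server, not pump.io's real code or API. The class name ToyStreamServer, the example addresses, and the exact shape of the activity dictionary (loosely modeled on Activity Streams style JSON) are all assumptions for illustration.

```python
import json
from collections import defaultdict


def build_note_activity(author: str, text: str) -> dict:
    # Field names are an assumption for illustration, loosely modeled on
    # Activity Streams style JSON; not a verified pump.io payload.
    return {
        "actor": {"objectType": "person", "id": author},
        "verb": "post",
        "object": {"objectType": "note", "content": text},
    }


class ToyStreamServer:
    """In-memory stand-in for a single stream server (hypothetical)."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> set of follower ids
        self.inboxes = defaultdict(list)    # follower id -> received activities

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)

    def post(self, author: str, text: str) -> dict:
        activity = build_note_activity(author, text)
        # Push the activity straight into each follower's inbox.
        for follower in sorted(self.followers[author]):
            self.inboxes[follower].append(activity)
        return activity


if __name__ == "__main__":
    server = ToyStreamServer()
    server.follow("alice@example.org", "project@example.org")
    server.post("project@example.org", "Version 1.0 released")
    print(json.dumps(server.inboxes["alice@example.org"], indent=2))
```

In the project status feed scenario, the project account is the author and anyone interested simply follows it; nobody has to poll a central service.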

Pump.io is sort of a DIY Twitter. Credit: Lee Hutchinson

Pump.io is still in early release, but it’s pretty easy to get up and running—plus, it can use Redis for a back-end. And if you’ve been following the other “Web Served” tutorials, you already have a Redis setup with open databases up and running.

Host your own e-mail with Mailpile

E-mail isn’t secure. This is a given. The protocols that underlie e-mail are insecure by design, the paths over which e-mail moves are insecure by design, and the places where e-mail is stored when it’s not in transit are typically insecure. There are workarounds for each of those insecurities, but communicating securely over e-mail still requires a tremendous amount of diligence and inconvenience—so much so that few people bother.

One of the things you can do to help with security is to ditch Google (or whoever hosts your e-mail) and keep your mailbox local. This grants you effectively unlimited storage and it makes it much more difficult for your inbox to be mined for data. However, the traditional methods of sending and storing e-mail just aren’t that friendly. If you’re feeling terribly industrious and you don’t mind pain, you can set up Postfix and Dovecot to do your e-mailing and something like SpamAssassin to keep your inbox clean, but that’s about as difficult a server administration task as there is. Seriously, if you thought setting up a Web server was a little complex at times, wait until you dive into Postfix.

However, Iceland-based Mailpile wants to remove a lot of the complexity behind self-hosted e-mail while at the same time adding in a bit of security. Mailpile recently graced the Ars front page when it was the victim of some frozen crowdfunding dollars; the company has since gotten all of its funding released, and the developers are hard at work.

How you’ll mostly interact with Mailpile today. The project is in very active and very rapid development, though. Credit: Lee Hutchinson

The Python-centric project as it exists today is only a shadow of what it will eventually become. Cloning the current contents of the GitHub repo will get you a lightning-quick mail indexing and searching system that you can set to work on an MBOX-formatted mailbox file. It also has a nascent Web interface with basic functionality included. However, the company only recently closed out its crowdfunding round; in development terms, it’s just getting started. Mailpile tech lead Bjarni Einarsson told Ars via Skype that his goal with the project is an application “as easy as Thunderbird to get up and running, or easier, but includes all the functionality of a personal mail server.”
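To show what "indexing and searching an MBOX file" means at its very simplest, here is a small Python sketch built on the standard library's mailbox module. It has nothing to do with Mailpile's own code, which is far more capable; the function names build_subject_index and search, and the decision to index only Subject headers, are assumptions made to keep the example short.

```python
import mailbox
import os
import re
import tempfile
from collections import defaultdict


def build_subject_index(mbox_path: str) -> dict:
    """Map each lowercase word in a Subject header to the message keys using it."""
    index = defaultdict(set)
    box = mailbox.mbox(mbox_path)
    try:
        for key, message in box.iteritems():
            subject = str(message["Subject"] or "")
            for word in re.findall(r"[a-z0-9]+", subject.lower()):
                index[word].add(key)
    finally:
        box.close()
    return index


def search(index: dict, query: str) -> set:
    """Return keys of messages whose subject contains every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # Build a throwaway two-message mailbox so the demo is self-contained.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.mbox")
        box = mailbox.mbox(path)
        box.add("From: a@example.org\nSubject: Server monitoring report\n\nhi\n")
        box.add("From: b@example.org\nSubject: Weekly cache report\n\nhi\n")
        box.flush()
        box.close()
        print(search(build_subject_index(path), "cache report"))
```

An inverted index like this is why searches feel instant: the expensive pass over the mailbox happens once, and each query is just a few set intersections.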

Mailpile’s evolving GUI. Credit: Lee Hutchinson

Ghost—blogging in Node.js

A little while back, we covered the launch of Ghost, a new minimalist blogging platform created by WordPress expat John O’Nolan and developer Hannah Wolfe. Ghost is currently only available to people who backed its Kickstarter, but according to this blog post, the project’s GitHub repo will be opened to the public on October 14.

Ghost is a nice alternative to WordPress, and it also uses much more buzzword-compliant technology: rather than PHP, Ghost uses Node.js. It has a beautiful (and beautifully responsive) live preview window for your posts, which you compose in Markdown.

It’s also designed to be easily customizable, relying heavily on Handlebars for theming. Ghost makes a strong differentiation between actual programming logic and design, meaning that you can change how your Ghost blog looks and functions without having to dig much into the core code.

Ghost’s post editing window, showing a live preview of what you’re typing. Credit: Lee Hutchinson

Once the repo is open to the public, setting up Ghost will be extremely easy: provided you have a compatible version of Node.js installed, you’ll simply clone the repo, run an npm install to pull down the required packages, and start it up. In my self-hosted Ghost blog, I opted to create an Upstart job to auto-start the blog’s Node.js instance, and I’ve fiddled a teeny bit with the theme. It has also proven itself to be cache-friendly, which is a nice bonus.

I’m very pleased with how Ghost looks and functions so far. It looks like it will be an excellent middle ground between something like WordPress and a full static site generator like Jekyll—just as soon as it goes live and more people can get their hands dirty with it.

Advanced stuff

There are a couple of fancy add-ons we’re going to hit—capabilities you can add to your Web server in order to make it work a little better—and perhaps a little faster—under load. The first is the ability to send e-mail without relying on Google Apps and its often capricious limits; the second is caching.

Postfix and Mandrill

I said some unkind things about Postfix a little earlier in this piece, but when it comes to mail transfer agents, Postfix is what the big boys use. Postfix is one of the ugly bedrock applications that keeps the Internet humming, and getting it set up on your Web server gives your websites and applications the near-universal ability to send e-mail.

However, in an effort to guard against spam, a significant chunk of e-mail servers around the world won’t accept messages that originate from netblocks owned by residential ISPs. Too many zombie computers crap out too many spam e-mails from residential IP addresses. The solution is to relay e-mail through a trusted source rather than sending it directly. If you have a Google Apps account, you can use Google’s servers as a relay (which I’ve blogged about here), but Google is extremely quick to drop the time-out hammer on what it perceives as spammy behavior. One or two e-mails a day is fine, but more than a few dozen—like the amount that might be generated by your Web forum software sending notifications of replies and new posts—and the e-mail stops flowing.

An excellent free alternative to Google—and one that I use myself—is Mandrill. Mandrill offers a free usage tier that will enable you to send out up to 12,000 e-mails a month—plenty for a personal Web server with a few hosted sites.

Mandrill’s control panel, showing how much mail you’ve sent and if it’s successfully getting where it needs to be going. Credit: Lee Hutchinson

There are a couple of ways to send e-mail with Mandrill. The first is to set up Postfix on your Web server and configure Postfix to relay through Mandrill. The Mandrill folks have an excellent guide on getting this done (it’s far more succinct than my own guide on relaying via Google Apps). Following this guide will get you up and running with a minimal amount of fuss.

The second method is even easier: Mandrill has an API that can be programmatically addressed from within some Web applications. The Node.js blogging platform Ghost, for example, can be configured to directly send mail via Mandrill simply by using an API key, without having a locally installed version of Postfix to relay through.
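For a sense of what handing mail to a relay looks like from application code, here is a hedged Python sketch using only the standard library. To be clear, this is neither the Postfix configuration nor Mandrill's HTTP API; it is plain authenticated SMTP submission, which is the same conversation Postfix would have with the relay on your behalf. The host name and port below are assumptions, as are the function names, so confirm the real values in your provider's documentation.

```python
import smtplib
from email.message import EmailMessage

# Assumed relay settings; confirm them in your provider's documentation.
SMTP_HOST = "smtp.mandrillapp.com"
SMTP_PORT = 587


def build_notification(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text message ready to hand to a relay."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_via_relay(message: EmailMessage, username: str, api_key: str) -> None:
    """Connect to the relay, upgrade to TLS, authenticate, and send."""
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(username, api_key)
        smtp.send_message(message)
```

Keeping message construction separate from delivery means you can swap relays (or test without sending anything) by touching only send_via_relay.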

There are other e-mail service providers besides Mandrill—Mailgun is another excellent one. But the ability to send e-mail is vital to having a fully functioning Web and application server, so getting set up with one of these guys is extremely useful.

GIMME THE CACHE!

We’ve saved the most complex (and fun!) part for last: caching. The Nginx Web server that we’ve built as part of this long series of articles ought to be plenty quick, because Nginx is a fast static file serving machine. However, sometimes simply being fast isn’t enough. Bolting on some method of caching to your Web server can improve its responsiveness under load (sometimes massively so, depending on the type of workload) by keeping frequently accessed files or pieces of files stored in RAM.

Web server caching solutions are a complex subject, and cache performance tuning past a certain point is less science and more voodoo black magic. Nonetheless, it’s a fascinating world to dip your toes into. One way to employ caching in front of your Web server is to install Varnish Cache.

Varnish is another application about which I have a ludicrously long blog entry; take a peek at it if you want step-by-step details on installing and configuring it under Ubuntu. The quick way, though, is to add the right PPA and install via apt-get or aptitude.

Varnish differs from other Web caching applications by virtue of being purpose-built to cache Web assets—most others, like the venerable memcached, are general key-value stores that happen to function as Web caches. Varnish is a purely RAM-based cache, so anything it grabs hold of will always come out of RAM rather than having to incur the extra latency of being run through the file system.

Varnish doesn’t really have a Web interface, but here’s a picture of htop with the two Varnish processes highlighted. Credit: Lee Hutchinson

In order to work its magic, Varnish actually sits in front of your Web server application and reverse-proxies to it. This means that implementing Varnish will almost certainly require you to move Nginx to listen for HTTP traffic on a TCP port other than 80, because Varnish will be listening on port 80 and then forwarding requests for non-cached objects back to Nginx on its own port. You can also have Nginx listen on a Unix socket if you want, though this will take a bit more work to set up.
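The flow described above (answer from memory when possible, otherwise ask the backend and remember the answer) can be modeled in a few lines of Python. This is a toy illustration of the reverse-proxy caching pattern, not how Varnish is implemented; the class name CachingFrontEnd, the fixed time-to-live, and the backend callable standing in for Nginx are all assumptions.

```python
import time


class CachingFrontEnd:
    """Toy model of a cache sitting in front of a backend Web server."""

    def __init__(self, backend, ttl_seconds: float = 120.0, clock=time.monotonic):
        self.backend = backend            # callable: path -> response body
        self.ttl_seconds = ttl_seconds
        self.clock = clock                # injectable so tests can fake time
        self.store = {}                   # path -> (expiry time, body)
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: forward to the backend and remember the answer.
        self.misses += 1
        body = self.backend(path)
        self.store[path] = (now + self.ttl_seconds, body)
        return body


if __name__ == "__main__":
    front = CachingFrontEnd(lambda path: "rendered " + path)
    front.get("/blog/")
    front.get("/blog/")
    print(front.hits, front.misses)
```

The second request never reaches the backend, which is the whole point: under load, the expensive page generation happens once per time-to-live window instead of once per visitor.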

Getting Varnish set up is typically a quick affair; it will work out of the box with almost no tweaking. However, actually getting Varnish set up effectively is a good way to blow an entire weekend. Varnish is selective in what it will and will not cache, because most websites these days feature some amount of dynamically generated content, and serving stale dynamic content can be a very bad thing (especially when that dynamic content is potentially private, like a logged-in user’s Web app control panel). There are tons and tons of guides around the Internet on setting up Varnish to work with various Web applications (including my own blog post) that are well worth perusing.
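The kind of decision a cache has to make for every request can be sketched as a small predicate. The Python below is only an illustration of conservative cacheability rules, written in the spirit of what a cautious configuration does; it is not VCL and not Varnish's actual logic. The function name is_cacheable and the WordPress-style private path prefixes are assumptions.

```python
def is_cacheable(method, headers, path,
                 private_prefixes=("/wp-admin", "/wp-login.php")):
    """Return True only for requests that look safe to serve from a shared cache."""
    names = {name.lower() for name in headers}
    # Only read-only requests are candidates for caching.
    if method.upper() not in ("GET", "HEAD"):
        return False
    # Cookies or credentials suggest a per-user response; do not share it.
    if "cookie" in names or "authorization" in names:
        return False
    # Illustrative list of areas that should always reach the backend.
    return not any(path.startswith(prefix) for prefix in private_prefixes)


if __name__ == "__main__":
    print(is_cacheable("GET", {}, "/blog/hello-world/"))
    print(is_cacheable("GET", {"Cookie": "session=abc"}, "/blog/hello-world/"))
    print(is_cacheable("POST", {}, "/comments"))
```

Most of that lost weekend goes into loosening rules like these safely, for example by stripping analytics cookies that do not affect the page so that more requests qualify.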

Achievement unlocked: WEB SERVED

If you’ve come this far and followed all the steps, congratulations—you had to wade through a lot of jargon and a lot of technical stuff, but you now have a solid grounding in the nuts and bolts of serving pages and applications in several different contexts.

If you’re like me, though, the only thing that reading these pieces has really done is stir up the desire to do more. As Jedi Master Kenobi said, you’ve taken your first steps into a larger world—and believe me, this series is only the barest of introductions into the wild world of Web servers. Even with all the next steps outlined above, there are still so many more things to try to do—and that’s just with a personal Web server!

You should know enough at this point to avoid the Dunning-Kruger effect when you think about what to do next. Security is always a prime consideration when adding a new application or feature; just because you have an un-hacked WordPress installation on your Web server doesn’t mean that you’re the master of all you survey. Similarly, just because you can parse a regex or two doesn’t mean that the hacked-together set of rewrite rules you’re using for your blog is the One True Way to do it. There are always new things to learn and discover. You’ll often learn something new while setting up the latest fun thing that will have implications for earlier things you’ve set up. In fact, nothing is better than learning you’ve done something wrong, because that means you can fix it.

In the meantime, you’ll be able to scoff at people who struggle with managing their VPS instances through cPanel. As they flail through their GUI to try to adjust a parameter, you can softly chuckle, reach forth, and deftly enter the correct command at the bash prompt. When they gape at you like you’re some kind of Unix wizard, you can nod and smile knowingly.

Served.

Lee Hutchinson Senior Technology Editor

Lee is the Senior Technology Editor, and oversees story development for the gadget, culture, IT, and video sections of Ars Technica. A long-time member of the Ars OpenForum with an extensive background in enterprise storage and security, he lives in Houston.
