Hosting Mailing Lists
For the last 15 years, I've hosted community mailing lists for a few groups. At this time I am asking those groups to find new volunteers for these efforts, as my own time is now too constrained to properly handle the task. This page is meant to document some of the things we do here today, and the lessons learned.
The software used is fairly straightforward:
- "mailman" mailing list manager/processor from http://list.org
- "postfix" mail server (SMTP) from http://postfix.org
- "detach" for detaching attachments, and replacing with links (local version available from me)
- "demime" for converting messages to plain text (local version available from me)
- Apache, PHP, and Python are the base layers for much of the software used
- "mhonarc" for mailing list archives, converting from mailbox format to an online viewable form, from http://mhonarc.org/
- "solr" for indexing the mailing lists
The following policies are in place for the lists I host.
Only subscribers can post. The only reason for this is to cut out the spam problems.
This is implemented at the SMTP level, using postfix and a custom policy filter. The filter stops non-subscribers from posting: their messages are rejected before they ever reach the mailing list processor, so no administrator has to review and reject the mail. This policy has kept the mailing lists *mostly* spam free (except for spam/viruses that forge the address of an active subscriber), with minimal overhead.
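The following is a minimal sketch of what such a filter could look like, assuming it is hooked in through Postfix's policy delegation protocol (the check_policy_service restriction); the filter actually used here is custom and may work differently. In that protocol, Postfix sends name=value attribute lines ended by an empty line, and expects an "action=" reply followed by an empty line. The list address, the subscriber set, and the function names are hypothetical.

```python
def parse_policy_request(text):
    """Parse one Postfix policy request (name=value lines) into a dict."""
    attrs = {}
    for line in text.splitlines():
        if not line:
            break  # an empty line ends the request
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs


def policy_action(attrs, subscribers_by_list):
    """Decide what to tell Postfix about this sender/recipient pair.

    subscribers_by_list is a hypothetical mapping of list posting address
    to the set of subscriber addresses (all lower case).
    """
    recipient = attrs.get("recipient", "").lower()
    sender = attrs.get("sender", "").lower()
    subscribers = subscribers_by_list.get(recipient)
    if subscribers is None:
        return "DUNNO"  # not a list address: let other rules decide
    if sender in subscribers:
        return "DUNNO"  # subscriber: allow the post through
    return "REJECT Only subscribers may post to this list"


def format_reply(action):
    """Format the reply the way the policy protocol expects it."""
    return "action=" + action + "\n\n"
```

Because the rejection happens during the SMTP conversation, the sending server is responsible for telling the sender, and nothing is queued locally.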
Attachments are permitted, but automatically detached. This way, we aren't bulk sending those attachments (big bandwidth spikes!), and we aren't sending all that data to people who don't actually want it. Some people are still on dialup. Some are on mobile devices. Both are sensitive to big attachments being sent to them.
The detach software looks through the message, pulls out the attachments, and replaces each one with a link. Anyone who wants the attachment can click the link and fetch it in real time; everyone else can ignore it.
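To illustrate the idea, here is a rough sketch using Python's standard email package. It is not the "detach" program itself; the base URL, the in-memory store, and the function name are all assumptions made for the example, and a real version would write the files somewhere the web server can serve them.

```python
import hashlib
from email import message_from_bytes, policy
from email.message import EmailMessage
from urllib.parse import quote

BASE_URL = "http://lists.example.org/attachments"  # hypothetical location


def detach_attachments(raw_bytes, store):
    """Return a plain-text copy of the message with attachments replaced by links.

    store is a hypothetical dict standing in for real storage; it maps a
    short content hash to (filename, data).
    """
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""

    links = []
    for part in msg.iter_attachments():
        # decode=True undoes base64/quoted-printable; it yields None for
        # multipart attachments, which this sketch stores as empty.
        data = part.get_payload(decode=True) or b""
        name = part.get_filename() or "attachment"
        key = hashlib.sha256(data).hexdigest()[:16]
        store[key] = (name, data)
        links.append(f"{BASE_URL}/{key}/{quote(name)}")

    out = EmailMessage()
    for header in ("From", "To", "Subject", "Date", "Message-ID"):
        if msg[header] is not None:
            out[header] = msg[header]
    if links:
        text = text.rstrip("\n") + "\n\nAttachments:\n" + "\n".join(links) + "\n"
    out.set_content(text)
    return out
```

The message that goes out to subscribers is then only a few lines longer than the text the poster wrote, no matter how large the attachment was.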
Full text is archived at http://lists.gigo.com/ . Access to the archives requires a human to look at a captcha image and type in the number it shows. This is to keep out the bots that want to harvest email addresses. We *do* show the email addresses in the archives; this allows people to send followup questions to the original posters ("Did you ever find an answer to..."). We just want to avoid the bots.
We also keep the mailing lists searchable. This is actually quite a PITA. "htdig" sort of works, but is slow and cumbersome to manage. Today we're using "solr", with a custom indexer and search page. The indexer removes some of the well known mailing list footers and such, and tries to reduce the page down to just the actual subject and content.
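As a small sketch of the footer removal step, the function below cuts a trailing footer off a message body before it is indexed. It assumes the footer starts with a long line of underscores, as Mailman's default footer does; the custom indexer used here handles more cases, and the names and the line limit are hypothetical.

```python
import re

# Assumption: the footer begins with a line made only of underscores.
FOOTER_START = re.compile(r"^_{20,}\s*$")
MAX_FOOTER_LINES = 6  # hypothetical limit, so long text after a rule is kept


def strip_list_footer(body):
    """Remove a trailing mailing list footer from a message body."""
    lines = body.splitlines()
    # Search from the end so a separator inside the message is left alone.
    for i in range(len(lines) - 1, -1, -1):
        if FOOTER_START.match(lines[i]):
            if len(lines) - i <= MAX_FOOTER_LINES:
                lines = lines[:i]
            break
    return "\n".join(lines).rstrip()
```

Without this step, every message matches a search for the list name or the listinfo URL, which makes those searches useless.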
If necessary, as operational ownership of the mailing lists I've been hosting is transitioned, I can continue to run the archives at http://lists.gigo.com . This is less of a time commitment for me, since the work is already done. For those of you willing to handle the archives, I will pass over the "mbox" formatted messages for you to import.
One of the biggest headaches in managing a mailing list is dealing with bounces.
When you send a message to someone, it might be forwarded to another ISP, and that ISP may no longer have that subscriber. The bounce you get back then does not match the address you actually sent to.
A local customization was done to have mailman use VERP (Variable Envelope Return Path) with Postfix. VERP is a method of sending mail with a unique return address for each recipient, so that the return address itself identifies the recipient (even if the message was forwarded). This allows the mailing list software to correctly identify the bounces (regardless of what the bounce text might have said), and to disable subscribers who are no longer reachable. This is a LOT friendlier to the list administrators, particularly when forwards are involved.
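The encoding itself is simple. The sketch below shows one common VERP layout, where the recipient is folded into the bounce address with "+" and "="; the exact format used by a given mailman/postfix setup may differ, and the addresses and function names are made up for the example.

```python
def verp_encode(list_bounce_address, recipient):
    """Build a per-recipient return address from the list's bounce address."""
    local, domain = list_bounce_address.split("@", 1)
    rlocal, rdomain = recipient.rsplit("@", 1)
    return f"{local}+{rlocal}={rdomain}@{domain}"


def verp_decode(bounce_to):
    """Recover the subscriber address from the address a bounce was sent to.

    Returns None if the address carries no VERP information.
    """
    local, _domain = bounce_to.rsplit("@", 1)
    _base, sep, encoded = local.partition("+")
    if not sep or "=" not in encoded:
        return None
    # Domains cannot contain "=", so the last "=" separates the domain.
    rlocal, rdomain = encoded.rsplit("=", 1)
    return f"{rlocal}@{rdomain}"
```

For example, a message to alice@example.com would be sent with the return address mylist-bounces+alice=example.com@lists.example.org. If Alice's mail is forwarded elsewhere and bounces there, the bounce still comes back to that address, and the subscriber can be identified without reading the bounce text at all.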
A list might have 100-500 posts a month and 250-1000 subscribers, depending on the list.
Because each post can generate 250-1000 new messages to go out, this can put a reasonable load on the mailing list server. Each message has to be individually sent to other service providers. An average list today generates between 100 and 200 megabytes of traffic a month, not counting people looking at the detached attachments.
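To put numbers on that fan-out, the short calculation below multiplies the figures above. The last helper pairs a traffic figure with a delivery count purely as an illustration; which traffic total goes with which list size is an assumption, not something measured here.

```python
def monthly_deliveries(posts_per_month, subscribers):
    """Each post is delivered once to every subscriber."""
    return posts_per_month * subscribers


def average_bytes_per_delivery(traffic_megabytes, deliveries):
    """Rough average size of one outgoing message."""
    return traffic_megabytes * 1024 * 1024 / deliveries


low = monthly_deliveries(100, 250)     # quiet, small list
high = monthly_deliveries(500, 1000)   # busy, large list
print(low, high)

# Illustration only: a small list that produces 100 MB of traffic a month.
print(round(average_bytes_per_delivery(100, low)))
```

So a single list accounts for somewhere between 25,000 and 500,000 individual deliveries a month, each of which passes through the mail queue.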
As the lists grew, and as measures were put into place to handle bounces, it became clear that the main bottleneck in mailing list hosting is disk I/O, not bandwidth. To make the system more responsive while the mailing lists were sending messages, and to improve how quickly we could actually deliver the mail, the SMTP server's mail queue was moved to a ram disk. Because a ram disk has no moving parts, reading and writing to it is extremely fast. It also does not bog down the regular disks on the system. Most mail is now delivered within a few seconds (before, it could take up to 5 minutes).
Pain Points Today
The single biggest pain point: people who are subscribed to a mailing list, receive its messages, and then report those messages as spam to their ISP. The ISPs take these complaints, figure out what IP address sent the messages, and blacklist that IP. When this happens, *all* mail from that IP address to that ISP is blocked. It doesn't matter what mailing list it is, or even if it is personal mail.
If the list is instead hosted at a "big" list provider, then you're at the mercy of not just the actions of your subscribers, but of the other lists as well. Any shared community is a shared risk.
- Run the mail server for your mailing lists *separate* from any regular users.
- Be clear on your signup page about which providers you regularly have problems delivering to
- Be clear on where your mailing list archives are, so that subscribers can view them when they aren't receiving email
- Consider a tie-in with a web based forum to provide another avenue for people to participate.
- Consider Yahoo! Groups or Google Groups
The next pain point *was* performance; using a ram disk significantly eased that pain. This was especially true when the regular disk was a cheap IDE or SATA drive, and not enterprise class. SSDs might be a viable alternative if a ram disk scares you, but SSDs are still expensive.