Archiving At Scale

08 May 2024

Large organizations with several thousand employees may need to archive hundreds of TBs of data, or even more. In this post we’ll set up a distributed environment where the load is spread among several nodes.


Your company’s domain name is, and your SMTP servers or your provider’s mail servers send a copy of each received email to using SMTP journaling.

You have 5 worker nodes to store the emails.


You have an smtp gateway for the archive ( that forwards the received emails to be archived to

Create the following MX records for You may tweak the TTL values as well as the MX preference numbers.

archive IN MX 10 3600
        IN MX 10 3600
        IN MX 10 3600
        IN MX 10 3600
        IN MX 10 3600
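
The reason for giving all five records the same preference is that a sending mail server is expected to randomize among equally preferred MX hosts (RFC 5321, section 5.1), which spreads the load across them. Below is a minimal simulation of that behavior, not the code of any real MTA. The worker hostnames are hypothetical placeholders, since the actual MX targets are not shown here.

```python
import random
from collections import Counter

# Hypothetical worker hostnames (assumption); substitute your real MX targets.
WORKERS = [f"worker{i}.archive.example" for i in range(5)]

# (preference, host) pairs mirroring five MX records with equal preference 10.
MX_RECORDS = [(10, host) for host in WORKERS]


def delivery_order(mx_records, rng):
    """Return hosts in the order a sending MTA would try them.

    Lower preference values are tried first; hosts that share a
    preference are shuffled, which is what spreads the load.
    """
    by_pref = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host)
    order = []
    for pref in sorted(by_pref):
        group = by_pref[pref][:]
        rng.shuffle(group)
        order.extend(group)
    return order


def simulate(n_messages, seed=0):
    """Count how many messages each worker receives as the first choice."""
    rng = random.Random(seed)
    return Counter(delivery_order(MX_RECORDS, rng)[0] for _ in range(n_messages))
```

Calling simulate(10000) should give each of the five workers roughly a fifth of the messages. Giving one record a higher preference number would turn that host into a fallback that is only tried after the others.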

Set up the archive gateway

In this example we’ll use Postfix with the configuration files below.


smtpd_banner = $myhostname ESMTP
biff = no
compatibility_level = 3.6

smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache

myhostname =
alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
mynetworks =
inet_protocols = ipv4

smtpd_recipient_restrictions = check_recipient_access hash:/etc/postfix/domains, reject
virtual_mailbox_domains =
virtual_alias_maps = hash:/etc/postfix/virtual
virtual_mailbox_base = /var/mail
message_size_limit = 50000000


/etc/postfix/domains: OK

Run postmap to create the db files:

postmap /etc/postfix/virtual /etc/postfix/domains
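
Once the maps are built and Postfix is reloaded, you may want to verify that the gateway accepts mail for the archive address. Here is a minimal sketch using Python's standard smtplib. The gateway hostname, sender and recipient are parameters you must supply yourself; none of them are taken from this post.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a small plain-text message for testing the gateway."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "archive gateway test"
    msg.set_content("Test message for the archive gateway.")
    return msg


def send_test_message(gateway_host, sender, recipient, port=25, timeout=10):
    """Send the test message through the gateway over plain SMTP.

    gateway_host, sender and recipient are assumptions supplied by the
    caller; use the archive address listed in /etc/postfix/domains.
    """
    msg = build_test_message(sender, recipient)
    with smtplib.SMTP(gateway_host, port, timeout=timeout) as smtp:
        smtp.send_message(msg)
```

Because smtpd_recipient_restrictions ends with reject, a recipient that is not listed in /etc/postfix/domains should be refused, in which case smtplib raises SMTPRecipientsRefused.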

Set up the worker nodes

The worker nodes share the same configuration; only the license file differs slightly. The licensed hostname is for all worker nodes, however each node has a dedicated server_id parameter, e.g. server_id=0 for, server_id=1 for, etc.




You now have a high-performance, fault-tolerant email archiving solution. If a worker node is unavailable, the archiving gateway can send the emails to the remaining nodes.
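
The failover behavior can be illustrated with a small simulation: the gateway tries the equally preferred workers in random order and delivers to the first one that responds. This is a sketch of the idea only, with hypothetical hostnames, and it does not model Postfix's actual queueing and retry logic.

```python
import random
from collections import Counter

# Hypothetical worker hostnames (assumption); substitute your real nodes.
WORKERS = [f"worker{i}.archive.example" for i in range(5)]


def pick_worker(workers, down, rng):
    """Return the first reachable worker in a random try order.

    `down` is the set of hosts considered unavailable. Returns None when
    every worker is down; a real gateway would keep the message queued
    and retry later.
    """
    candidates = list(workers)
    rng.shuffle(candidates)
    for host in candidates:
        if host not in down:
            return host
    return None


def simulate_outage(n_messages, down, seed=0):
    """Count deliveries per worker while the hosts in `down` are offline."""
    rng = random.Random(seed)
    return Counter(pick_worker(WORKERS, down, rng) for _ in range(n_messages))
```

With one worker in the down set, simulate_outage(8000, {WORKERS[2]}) delivers every message, and the four remaining nodes each take about a quarter of the traffic instead of a fifth.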

Next steps

You may want to add a second archiving gateway so that the gateway is no longer a single point of failure.

Also, even though the setup as a whole is fault tolerant (i.e. it can keep archiving new emails when a worker node becomes unavailable), the individual worker nodes are not. You need to back up the worker nodes so that you can restore them if something goes wrong.

You may even consider setting up a DR site, i.e. an independent datacenter where you replicate the whole setup. In a nutshell, your SMTP servers or your provider’s SMTP servers send the journaled emails to as well which distributes the emails among dr-worker{0-4} nodes in the other datacenter.


