Database entries in
zdkimfilter.conf start with
"db_". There are
three groups of them: (1) Connection, (2) options proper, and (3) queries.
The first group controls the connection itself:
- db_backend
-
This is one of the DB engines supported by OpenDBX, which include "firebird",
"mssql", "mysql", "oracle", "pgsql", "sqlite", "sqlite3", and "sybase".
The name "test" is recognized internally and not passed to OpenDBX. In that
case, instead of running a query, the database functions print out a dump of
the allowed variables and their values. Such dump is appended to file
database_dump in the current directory if the function is linked within
zdkimfilter (used in the testsuite), otherwise it goes to stdout as shown in
EXAMPLES.
- db_host
-
Depending on database type and configuration, this can be a host name, an IP
address, a path to a Unix socket, and more. It is a string passed to
odbx_init as is.
- db_port
-
Ditto.
- db_user, db_password, db_database
-
These parameters are passed to odbx_bind verbatim.
- db_timeout
-
This is a number of seconds. Not all backends use it. Defaults to 2.
- db_trace_sql
-
Can be "Y" or "N".
If enabled, every SQL statement is logged at debug level.
When not enabled, only failed queries are logged.
The second group are the options proper. These have names starting with
"db_opt_" and are passed to odbx_set_option, if given, before binding.
- db_opt_tls
-
Can be "A" for always, "N" for never, or "T" for try.
- db_opt_multi_statements
-
Can be "Y" or "N". As with all boolean variables, "Y" is assumed if
the option is given. However, if not given, no call is made to ODBX
about it.
Note that the reference example has a query with multiple statements
(db_sql_insert_message), so this option is required if staying with
it.
- db_opt_compress
-
Can be "Y" or "N". See above for defaults.
- db_opt_mode
-
Mode of operation. This is a string that ODBX passes to the backend.
- db_opt_paged_results
-
Number of result records to transfer; 0 means all.
The third groups, queries, are made of items whose names start with "db_sql_".
Some queries may or should return values; this doc does not distinguish
between queries and statements.
Each query can use a number of variables whose values are collected while
processing a mail message or while preparing a report. A variable is
identified by a string of the form $(variable-name). Each variable
identifier is replaced by the current value, irrespective of any quotation.
Missing variables result in the empty string.
Queries in odbx_example.conf exemplify most of the descriptions given
below, using stored procedures defined in odbx_example.sql.
Queries, in turn, can be subdivided into four subgroups: (1) per-domain
settings, storage of (2) incoming and (3) outgoing messages, and (4) reports.
These subgroups characterize when the relevant queries are run. In the rest of
this section there is a subsection for each subgroup, but traits and variables
common to (2) and (3) are collected in their own subsections.
Per-domain settings
These settings are queried before signature verification of incoming messages.
The first query, new in version 1.5, extends the second one.
- db_sql_domain_flags
-
Three variables are allowed:
-
- domain
-
A domain name related to an incoming message, for example a DKIM signer.
- ip
-
The IP number of the client. For IPv4 this is given in the classic
dotted quad format.
The INET6 data type used in the example requires a "::ffff:" prefix.
- iprev
-
The reverse DNS name of the client. This variable is available if a reverse
DNS PTR RRset is defined and if looking it up by Courier is enabled (by
setting the "BOFHCHECKDNS" option in the esmtp configuration file).
-
The query runs for the From: domain, for the original From: domain if
From: was apparently munged, their organizational domains, for any
domain mentioned in SPF or DKIM results, for the sender domain and for
the list domain.
It can return one to four integer values, in this order:
- whitelisted
-
This is the internal whitelisting, not to be conflated with dnswl.
A negative value, -1, triggers shoot_on_sight mode. The message is rejected
``for policy reasons'', whether the domain is authenticated or not, unless
overridden by another whitelisting (see whitelisted_pass below)
or by dnswl (greater or equal to dnswl_worthiness_pass).
A value of 0 is ignored.
A value of 1 or higher affects the order in which DKIM signatures verification
is attempted and reported. This value can be increased automatically to 1
when a DKIM signature of the domain verifies successfully, to 2 when
sending to a domain from a personal address (not an rfc2142 address.)
When this value is greater or equal to the whitelisted_pass configured
value, it prevents a message from being rejected or dropped due to DMARC
or ADSP policy, shoot_on_sight or action_header. The domain needs to be
authenticated and not flagged shoot_on_sight itself.
- add_dmarc
-
Add to the global honor_dmarc setting. If DMARC is globally honored, a
value of -1 disables it for this domain. A value of 1 enables it for this
message, even if DMARC is not globally enabled.
Unless domain is aligned with org_domain, this result is discarded.
- add_adsp
-
Add to the global honor_author_domain setting. Like the previous item,
this is added (with sign) to the global value.
Unless domain matches the From: address, this result is discarded.
- count
-
This can be set to COUNT(*) in order to learn how many records there were.
Currently not used.
-
- db_sql_whitelisted
-
This query is kept for compatibility, and is only used if the previous one is
not defined. It can use the same three variables:
-
- domain
-
A domain name.
- ip
-
The IP number of the client.
- iprev
-
The reverse DNS name of the client. This variable is available if a reverse
DNS PTR RRset is defined and if looking it up by Courier is enabled (by
setting the "BOFHCHECKDNS" option in the esmtp configuration file).
-
This query should return a single integer, the whitelist value.
Traits and variables common to incoming and outgoing messages
After filtering is complete and response to Courier given, zdkimfilter stores
relevant message data to the database. Each message is either incoming or
outgoing, but the following thirteen variables are available in either case:
- ino, mtime, and pid
-
These are the values used by Courier to identify a message, the inode, the date
and the process-id. These variables are given as hex string, but without
the leading "0x". The following MySQL expression reconstructs the id value
found on the last line of Courier's Received. The expression uses field
names, rather than variables, since ino, mtime, and pid are also key
field names in the message tables given as example.
CONCAT_WS('.', LPAD(HEX(ino), 16, '0'), LPAD(HEX(mtime), 16, '0'), LPAD(HEX(pid), 8, '0'))
- from, date, message_id, subject, content_type, and content_encoding
-
These are some of the header fields in the message. Values are trimmed,
Content-Type is truncated at the semicolon if any. Case is preserved, since
database systems are case insensitive.
- mailing_list
-
This flag is 1 if the message has some header fields that are
typical of mailing lists. Some "List-*" fields, "Precedence:list", and
"Mailing-List:".
- envelope_sender
-
The bounce address, also known as envelop from or mailfrom.
- ip
-
The IP number of the client. For IPv4 it is given in the classic
dotted quad format; that is, without "::ffff:".
- iprev
-
The reverse DNS name of the client. This variable is available if a reverse
DNS PTR RRset is defined and if looking it up by Courier is enabled (by
setting the "BOFHCHECKDNS" option in the esmtp configuration file).
Storing incoming messages synopsis
Relevant incoming message data can be stored in the database by defining some
or all of the queries below. Besides common variables, the queries in this
group can use two additional sets of variables, one for the message proper,
and one for each domain related to that message.
incoming message variables
- message_status
-
This is one of the three values: "accept", "reject", and "drop", which
represent the message disposition as far as zdkimfilter knows.
- dmarc_dkim
-
DMARC view of DKIM validation of the whole message, it is one of:
-
- "pass"
-
if a signature by an aligned domain passed.
- "fail"
-
if no aligned signature passed.
- "none"
-
if no aligned signature or no DMARC record were found.
-
- dmarc_spf
-
DMARC view of SPF authentication, it is one of:
-
- "pass"
-
if a domain is authenticated with either BOFHSPFHELO or BOFHSPFMAILFROM
and it is aligned in the way (relaxed or strict) specified by the DMARC
record of the author's domain.
- "fail"
-
if SPF authentication did not pass, or the authenticated domain was not aligned
as required.
- "none"
-
if no DMARC record was found.
-
- dmarc_dispo
-
The message disposition, as far as DMARC is concerned. It is not necessarily
consistent with message_status. The value is one of:
-
- none
-
No policy impact on message disposition.
- quarantine
-
The message is treated ``as suspicious'', to use RFC 7489 words. (All what
zdkimfilter does is to write ``(QUARANTINE)'' in the Authentication-Results.
Any enforcement is passed on to mail delivery.)
- reject
-
The message is rejected because of remote DMARC policy.
-
- dmarc_reason
-
Currently, zdkimfilter sets this value as follows:
-
- none
-
dmarc_dispo is in agreement with the remote DMARC policy.
- forwarded
-
Not used.
- sampled_out
-
Remote DMARC policy applies to a percentage only, and the luck of the draw
exempts this message.
- trusted_forwarder
-
An authenticated domain is whitelisted >= whitelisted_pass or is
in DNSWL >= dnswl_worthiness_pass.
- mailing_list
-
MLM transformation applied and at least one signature was recovered that way.
- local_policy
-
None if DMARC, ADSP, or reject_on_nxdomain applies. Remote DMARC policy is
not being honored for this message.
- other
-
DMARC policy failed, but BOFHSPFFROM passed. Note that SPF protocol does not
provide a scope for "From:"; however, it may happen that a message has no
signature, an empty envelope sender, a mismatched BOFHSPFHELO, but a good
BOFHSPFFROM.
-
- adsp_flags
-
A comma separated list of tokens. One of the three possible values of the
policy, "unknown", "all", and "discardable"; along with "found", in case
the (unknown) status was not inferred, and possibly "fail".
This variable is only set if an ADSP record was found. Since v1.5, ADSP
records are not always looked up.
- received_count
-
The number of "Received:" header fields.
- signatures_count
-
The number of DKIM signatures.
incoming domain variables
An incoming message can be related to a number of domains. The following set
of variables are specific to each of them:
- domain
-
The domain name.
- auth
-
Sometimes called auth_type, it is the relationship(s) between message and domain,
as claimed by the use of a domain name in a message. In case publicsuffix
and save_from_anyway are both disabled --not a recommended setting--- only
authenticated relationships are flagged.
Relationship flags are expressed as a comma separated list of the following
tokens:
-
- "author"
-
The author domain is the one appearing in the From: field.
- "spf_helo", "spf", and "dkim"
-
The corresponding authentication methods, whose results are given in separate
variables. SPF results originate from Courier's BOFHSPFHELO and
BOFHSPFMAILFROM.
- "org"
-
The domain is the organizational domain of the author domain, retrieved using
the Public Suffix List.
- "dmarc"
-
The domain where the DMARC record was retrieved from.
- "aligned"
-
The domain is aligned with with the author domain, in the sense that they
share the same organizational domain.
- "dnswl"
-
The domain is whitelisted in the configured zone. This result originates
from the -allow option of couriertcpd. See couriertcpd(8).
- "nx"
-
The domain, either the author domain or the organizational domain thereof,
does not actually exist because NXDOMAIN was returned on querying its Name
Servers.
-
- dmarc_rec
-
A rewritten digest of the DMARC record, used for aggregate reports. Targets
and interval are moved to their own variables, so this digest shouldn't take
more than 60 bytes, including the appended sentinel (see "dmarc_rua" below).
- dmarc_ri
-
86400 or more frequent that matches the honored_report_interval.
- prefix_len
-
The number of bytes that must be removed from the domain name in order to
obtain the organizational domain name, as resulting from the Public Suffix
List.
- dmarc_rua
-
A rewritten subset of aggregate report targets. Only "mailto:" targets are
considered, and stored as a list of mail addresses.
A sentinel is appended to the list, to know if excessive length caused
truncation. In that case, the DMARC record is looked up anew. Even if DB
queries are usually faster than DNS ones, reserving too much room might degrade
performance, so a compromise length that fits most domains could be preferred.
- spf
-
Sometimes called spf_result, it is the SPF result, if the domain appears in any
of helo, mailfrom, or From:. SPF results originate from Courier's BOFHSPFHELO
and BOFHSPFMAILFROM.
- dkim_result
-
This is set to 1 if any DKIM signature from the given domain verified.
0 otherwise. Note that a variable with the same name but different
content is available for db_sql_insert_msg_ref.
incoming message queries
These queries run right after a zdkimfilter process yields control of the
message back to Courier. They run in the order given below:
- db_sql_insert_message
-
This query can use incoming message variables only, not the domain ones. It
can return a message_ref, a single value that will be available to
subsequent queries. Since Courier's triplet ino, mtime, and pid can
be used as a unique key to the message table, it is not imperative that the
key of the last inserted message be returned. However, the example
uses a second statement to select the last inserted ID as a return value.
- db_sql_select_domain
-
This query can be used to check whether a domain name exists already.
It can return the key --one row, one column--- which will be made available as
variable domain_ref.
The reference example maps this query to a stored procedure which takes
care to insert the domain if it isn't found.
- db_sql_insert_domain
-
This query runs only if the previous one did not return the key. It can
return the key too. If it does not, db_sql_select_domain runs for a second
time right after this.
Be wary of double insertions, as multiple filters can be active at the same
time. The example does not use this query.
- db_sql_update_domain
-
This query runs only if the first invocation of db_sql_select_domain did
return a domain_ref. That is meant to update the domain record which
existed already.
The example does not use this query, since the stored procedure used to
select or insert does the necessary updates as well.
- db_sql_insert_msg_ref
-
This is the last query in the domain loop. It is used to insert msg_ref
records. If a message was signed multiple times by a given domain, this
query is executed multiple times, one for each signature. If a message
has no signatures, the query is still executed once, to link the domain
to the message. It has the following additional variables:
-
- dkim
-
Sometimes called dkim_result, it is the verification result proper, one
of none, pass, fail, policy, neutral, temperror,
permerror. Note that a variable with the same name but different
content is available in the domain calls.
- dkim_trans
-
A tiny integer that is non-zero if a signature was recovered after
undoing a mailing list manager transformation.
Aggregate reports can output a human_result like
"through MLM transformation" when this datum is present for a signature.
- dkim_order
-
The 1-based position of this domain's signature(s), preferred domains first.
Ordering is done before verifying signatures, and it is reported to domains
which accept DMARC aggregate reports.
- dkim_selector
-
The signature selector, "s=" tag.
- message_ref
-
The value returned by db_sql_insert_message
- domain_ref
-
The value returned by either db_sql_select_domain or db_sql_insert_domain.
-
Storing outgoing messages synopsis
db_sql_check_user in action, as displayed by Thunderbird (above) and Courier webmail (below) clients.
Limiting can be achieved by imposing a fixed limit, for example like so:
db_sql_check_user SELECT IF ((SELECT SUM(rcpt_count)\
FROM message_out WHERE user = $(user_ref) AND\
mtime > UNIX_TIMESTAMP(NOW() - INTERVAL 1 DAY)) > 500,\
'Block this guy', 0) AS reason
More complex settings, possibly involving user interactions with web forms are possible.
These queries consist of an initial query, a loop for each target domain, and
a final query. They can use the common variables defined above, and three
additional ones.
outgoing message variables
- rcpt_count
-
The number of envelope recipients that the message is to be sent to.
- domain
-
This is either the user domain, in db_sql_select_user and
db_sql_check_user, or one of the domains where the message is destined
(target domains).
- complaint_flag
-
A flag that likely identifies the message as a complaint:
-
- bit 0 (complaint_flag & 1)
-
The envelope sender is empty.
- bit 1 (complaint_flag & 2)
-
The From: field is a common address; that is, one of
``postmaster'', ``abuse'', ``info'', ``sales'',
``list'', ``support'', ``noc'', ``marketing'',
``security'', ``hostmaster'', ``webmaster''.
-
outgoing message queries
- db_sql_select_user
-
This can additionally use local_part and domain, which identify the
authenticated user who sent the message. It is the first query called for each
message. It can return two values, that is one row with two columns, that will
be available as variables user_ref and message_ref to the remaining
queries.
- db_sql_select_target
-
This query selects a target domain and returns a domain_ref. This and
the following queries occur once per target domain; that is, for each of the
domains that appear in one or more of the message's recipients.
- db_sql_insert_target
-
This query is called only if no domain_ref was selected by
db_sql_select_target. It is meant to insert a domain and possibly returns
domain_ref directly. If it returns nothing, then db_sql_select_target is
called a second time with the same domain.
- db_sql_update_target
-
This query is called only if domain_ref was obtained by the first call to
db_sql_select_target; in that case, db_sql_insert_target is not called.
- db_sql_insert_target_ref
-
This completes the target loop. It is called with both domain_ref and
message_ref and provides the many-to-many link between the targeted domain
and the user.
- db_sql_check_user
-
This query is called last for outgoing messages. It can use local_part and
domain as well as user_ref variables to identify the user. It is meant
to check whether a user's activity is normal or exceedingly high, such as
submitting too many messages for too many recipients, from too many IP
addresses.
If the query returns anything that is not 0 or NULL, and blocked_user_list
is defined, then the filter appends the user-id to that list, quoting the
returned string as a reason.
As the result depends on a query, it can as well consider a field that users
can adjust from a web form in order to temporarily increase their limits.
Retrieving values for DMARC aggregate reports
This last group of queries is only used by
zaggregate. Two queries are
expected to return multiple rows and columns with the data that was stored
from incoming messages; columns here are named as the corresponding variables
there. A third query updates domain timestamp. The three of them --one per
subsection--- are described below, although
zfilter_db, as a command,
simulates filtering only, not reporting. For the rest of the story see
zaggregate(1).
DMARC aggregate domain query
The query db_sql_dmarc_agg_domain selects the domains that deserve an aggregate
report. It can use two variables:
- period
-
The duration, in seconds, of the reporting interval. It is the value
configured as honored_report_interval, possibly forced to be a submultiple
of 86400 (24 hours).
- period_end
-
The end of the period, in seconds since the Unix epoch, UTC. It is the time
of the current zaggregate invocation, rounded downward to a multiple of
period.
This query is expected to return six columns, the first two of which are
essential, in the order given:
- domain_ref
-
Domain key, passed as-is to the next query.
- domain
-
Domain name, as written in the aggregate report. This name is also used to
look up the following three "dmarc_r*" values, in case they are missing or
corrupted, and to validate external report targets.
- last_report
-
The end period of the previous report, 0 if it is the first report to this
domain. This value is used to skip a domain if its report is due at a future
interval, to avoid sending already sent data, and the like.
- dmarc_ri
-
The (adjusted) report interval for this domain.
- dmarc_rua
-
The report targets.
- dmarc_rec
-
The DMARC record, used to report the policy published by the domain.
DMARC aggregate record query
The query db_sql_dmarc_agg_record selects the records, or rows, to be
reported for a given domain. It can use four variables:
- domain_ref
-
Domain key, as obtained from the previous query.
- domain
-
Domain name.
- period_start
-
The beginning of the period being reported, in seconds since the Unix epoch,
UTC.
- period_end
-
The end of the period, as above.
This query is expected to return 11 + N*4 columns, where N is the number
of DKIM signatures reported. The order of columns is as given below
(parenthesized names refer to XML tags in the report):
- ip
-
Where messages in this row originated from ("source_ip").
- count
-
Number of messages that this row covers ("count").
- dmarc_dispo
-
("disposition")
- dmarc_dkim
-
Message's overall DKIM evaluation ("dkim").
- dmarc_spf
-
Message's overall SPF evaluation ("spf").
- dmarc_reason
-
The policy-override reason detailed above ("reason/type"). The XML schema provides for a
comment to supplement the reason ("reason/comment"). To add it, put it after the reason but in
the same column, separated by a space. That can be done, for example, using
MySQL's CONCAT function.
- domain
-
The author's domain ("header_from"). Of course, it turns out to match the
DMARC domain, but doesn't have to be exactly equal.
- spf domain 1
-
The domain of envelope sender in scope mfrom ("envelope_from" and
possibly "spf/domain"). Only one SPF domain/result pair is reported.
Preferably this one, since it is the most widely used SPF authentication
scope.
- spf result 1
-
The corresponding result ("spf/result").
- spf domain 2
-
The domain of HELO/EHLO verb, in scope helo ("spf/domain").
This is the only official SPF alternative to envelope sender.
zaggregate reports this second pair of SPF values only if it is a pass
while the first one is not, or if the result is similar but this domain
is better aligned with From: than the first domain.
- spf result 2
-
The corresponding result ("spf/result").
- dkim domain N
-
The Nth signing domain ("dkim/domain").
- dkim selector N
-
The selector used to sign ("dkim/selector").
- dkim result N
-
The corresponding result ("dkim/result").
- dkim transformation N
-
An integer, if positive signals that the MLM transformation succeeded
("dkim/human_result").
The number N of DKIM pairs may vary from a row to another. If either
member is NULL the pair is skipped.
Check recipient address
The query db_sql_check_dmarc_rcpt checks an email address,
which is available as a variable. Returns zero on good addresses.
The only variable is the address to be checked.
- dmarc_rcpt
-
The email address given in "rua=" (possibly overridden).
Set DMARC aggregate query
The query db_sql_set_dmarc_agg updates the last_report using the value of
period_end. For practical reasons, it can use exactly the same four
variables as db_sql_dmarc_agg_record, but is not expected to return
anything.
The distributed file
odbx_example.conf contains an example configuration
which works with MariaDB.
Example database tables and procedures are defined in odbx_example.sql.
For DMARC, file odbx_example.sh contains a few calls to zfilter_db
meant to provide some data to test zaggregate.
The following examples work in such environment.
Check what happens receiving a message:
$ zfilter_db -f test.conf --set-stats i example.com
Same as above, with a DKIM signature:
$ zfilter_db -f test.conf --set-stats i example.com/dkim
The two commands above only insert a record in message_in. To simulate a
real message, one can use something like the following:
$ zfilter_db -f test.conf \
--set-stats I @ bounces@s.example.org ugo@s.example.org \
--set-stats-domain \
example.org/org/aligned/dmarc:'v=DMARC1; p=reject; ri=300;' \
s.example.org/author/aligned/spf:softfail/dkim:pass
Besides inserting a message_in record, the latter command inserts or
updates the domain and generates as many msg_ref records as needed.
It could result in the following:
MariaDB [zdkimfilter]> SELECT m.ip, FROM_UNIXTIME(m.mtime) AS time,
-> d.domain, r.dkim, r.dkim_order AS o
-> FROM msg_ref AS r, message_in AS m, domain AS d
-> WHERE r.domain=d.id AND r.message_in=m.id
-> ORDER BY m.id DESC limit 3;
+------------------+---------------------+---------------+------+---+
| ip | time | domain | dkim | o |
+------------------+---------------------+---------------+------+---+
| ::ffff:192.0.2.3 | 2022-05-13 19:31:18 | s.example.org | pass | 1 |
| ::ffff:192.0.2.3 | 2022-05-13 19:31:18 | s.example.org | pass | 2 |
| ::ffff:192.0.2.3 | 2022-05-13 19:31:18 | example.org | none | 0 |
+------------------+---------------------+---------------+------+---+
3 rows in set (0.001 sec)
Does the user check function work? To test it, we simulate an outgoing message
with 8000 recipients. Since rcpt_count is accepted at position 9, we use
"@"s placeholders and let the test program assign those values for us:
$ zfilter_db -f test.conf --set-stats-domain example.org \
> --set-stats o user@example.com @ @ @ @ @ @ @ 8000
The bottom line of the output should report ``user check: 0'' until the
configured limit is reached.
We can populate the test database as needed, in order to design and test our
procedures and scripts. But what variables can we use for a given query?
Just ask:
$ zfilter_db -f test.conf --test --set-stats i example.com,dkim
Variables allowed for statement db_sql_insert_message:
ino: 9f5ae9e
mtime: 627f7c93
pid: 186e
ip: ::ffff:192.0.2.9
date: Sat, 14 May 2022 11:55:31 +0200
envelope_sender: bounce.address@example.com
from: sender@example.com
subject: Subject of the msg
message_id: <2307a5eb@example.com>
content_type: multipart/mixed
content_encoding: -- not given --
received_count: 2
signatures_count: 1
mailing_list: 1
adsp_flags: -- not given --
dmarc_dkim: none
dmarc_spf: none
dmarc_reason: sampled_out
dmarc_dispo: reject
message_status: accept
(Never mind whether dummy values are given and, in case, their value.)