Courier-MTA has a generic filtering interface that differs from sendmail's milter. Among the available filters is courier-pythonfilter, a Python-based framework which you can find on PyPI. It includes an attachments.py filter, to which the filter presented here is an alternative. You must install courier-pythonfilter before trying this filter.
The original attachments.py uses libarchive-c if present, but doesn't require it. The present filter does. (Beware of a different Python package with a similar name: it installs into the same libarchive namespace, so once installed it is hard to tell which one you have.) In addition, this filter requires oletools, a Python library to analyze OLE and MS Office files. Both are listed in requirements.txt.
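Since both packages import as libarchive, a quick attribute check can help tell them apart. The sketch below only tests for the names this filter uses (memory_reader, ffi, exception.ArchiveError); that is a heuristic, not an official way to identify the package, and the helper name looks_like_libarchive_c is made up for illustration.

```python
import importlib


def looks_like_libarchive_c(module):
    """Return True if the module exposes the names attachments3.py relies on."""
    # Heuristic only: these are the attributes the filter uses.
    exception = getattr(module, 'exception', None)
    return (hasattr(module, 'memory_reader')
            and hasattr(module, 'ffi')
            and hasattr(exception, 'ArchiveError'))


if __name__ == '__main__':
    try:
        mod = importlib.import_module('libarchive')
    except (ImportError, OSError):
        # OSError covers a Python package whose shared library is missing
        print('no usable libarchive module is installed')
    else:
        print('libarchive-c API found:', looks_like_libarchive_c(mod))
```

If the check prints False, the module installed under the libarchive name is probably not the one listed in requirements.txt.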
This filter won't block messages destined exclusively to abuse@ mailboxes. That's meant to let abuse teams receive complaints. You may want to change that behavior (search for can_pass); editing the source is not quite a trendy way to maintain software, see below.
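The exemption rule can be sketched on its own as follows. The sketch assumes, as the filter's source does, that recipient records in a Courier control file start with 'r'. The exempt parameter is a hypothetical extension point for illustration; the filter itself hard-codes abuse@.

```python
def only_exempt_recipients(control_lines, exempt=('abuse@',)):
    """Return True if there is at least one recipient and all are exempt."""
    rcpt_count = 0
    rcpt_exempt = 0
    for line in control_lines:
        if line.startswith('r'):          # recipient record
            rcpt_count += 1
            if line[1:].lower().startswith(tuple(exempt)):
                rcpt_exempt += 1
    return rcpt_count > 0 and rcpt_exempt == rcpt_count


if __name__ == '__main__':
    lines = ['rabuse@example.com\n', 'uauthsmtp\n']
    print(only_exempt_recipients(lines))
```

A message addressed to both abuse@ and an ordinary mailbox is not exempt, because every recipient has to match.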
Be careful with the pip3 install command below: it will also try to install courier-pythonfilter. If that is already installed and you're in a virtualenv, you don't need sudo.
sudo pip3 install -r "http://www.tana.it/svn/pyfilters/trunk/requirements.txt"
curl -O "http://www.tana.it/svn/pyfilters/trunk/attachments3.py"
python -m compileall -l "attachments3.py"
sudo mv -i __pycache__/* /usr/local/lib/python3.7/dist-packages/pythonfilter/__pycache__
sudo courierfilter stop
sudo courierfilter start
Check that the /python3.7/ destination directory is correct for your system! If you'd like to consider a pythonic install, please see below.
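One way to check both points (whether a virtualenv is active and where the pythonfilter package actually lives) is to ask the interpreter itself. This is a minimal sketch using only the standard library; the function names are made up for illustration, and it has to be run with the same Python interpreter that Courier's pythonfilter uses.

```python
import importlib.util
import os
import sys


def in_virtualenv():
    """Return True if this interpreter runs inside a venv or virtualenv."""
    # venv sets sys.base_prefix; older virtualenv releases set sys.real_prefix
    base = getattr(sys, 'real_prefix', None) or getattr(sys, 'base_prefix', sys.prefix)
    return base != sys.prefix


def pycache_dir(package='pythonfilter'):
    """Return the __pycache__ directory of an installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    return os.path.join(list(spec.submodule_search_locations)[0], '__pycache__')


if __name__ == '__main__':
    print('virtualenv active:', in_virtualenv())
    print(pycache_dir() or 'pythonfilter is not installed for this interpreter')
```

The printed directory is the one to use as the destination of the mv command in place of the python3.7 path.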
#!/usr/bin/python3
" attachments -- Courier filter which blocks specified attachment types"
# Copyright (C) 2005-2008 Robert Penz <robert@penz.name>
# hacked (H) 2017-2021 ale
#

import sys
from email.message import EmailMessage, _unquotevalue
import email.utils
from email.policy import EmailPolicy
from email.headerregistry import HeaderRegistry as HeaderRegistry
import binascii

# this is libarchive-c
import libarchive

import oletools.olevba
from oletools.mraptor import MacroRaptor
from oletools import rtfobj

# added 20 Apr 2020
from oletools.ooxml import XmlParser
from oletools.oleobj import find_external_relationships
from zipfile import is_zipfile

from io import BytesIO


# for debugging:
import traceback

# Extensions. Assume any extension appears in at most one list.
# Each list has a different treatment.
# Maintain:
# $ i=0; for e in $(sort < temp |uniq ); do printf " '%s'," $e; if [ $((++i % 8)) -eq 0 ]; then printf "\n"; fi; done; printf "\n"

# https://support.google.com/mail/answer/6590?hl=en
# http://www.theverge.com/2017/1/25/14391462/gmail-javascript-block-file-attachments-malware-security
# https://kb.intermedia.net/Article/23567

blocked_extensions = (
    '.acc', '.ade', '.adp', '.asp', '.bat', '.ccs', '.chm', '.class',
    '.cmd', '.com', '.cpl', '.dll', '.dmg', '.drv', '.exe', '.grp',
    '.hlp', '.hta', '.htx', '.ins', '.isp', '.jar', '.je', '.js',
    '.jse', '.lib', '.lnk', '.mde', '.msc', '.msh', '.msh1', '.msh1xml',
    '.msh2', '.msh2xml', '.mshxml', '.msi', '.msp', '.mst', '.ocx', '.ovl',
    '.pcd', '.php', '.php3', '.pif', '.ps1', '.ps1xml', '.ps2', '.ps2xml',
    '.psc1', '.psc2', '.reg', '.sbs', '.scr', '.sct', '.shb', '.shd',
    '.shs', '.sys', '.vb', '.vba', '.vbe', '.vbs', '.vdl', '.vxd',
    '.ws', '.wsc', '.wsf', '.wsh', '.wst')


# extensions supported by VBA_parser, see also
# https://en.wikipedia.org/wiki/List_of_Microsoft_Office_filename_extensions
# https://datatypes.net/open-ade-files
# https://docs.microsoft.com/en-us/deployoffice/security/block-specific-file-format-types-in-office
# office_extensions = (
#     '.doc', '.dot',        #- Word 97-2003
#     '.docm', '.dotm',      #- Word 2007+
#     '.xml',                #- Word 2003 XML
#     '.mht',                #- Word MHT - Single File Web Page / MHTML
#     '.xls',                #- Excel 97-2003
#     '.xlsm', '.xlsb',      #- Excel 2007+
#     '.ppt',                #- PowerPoint 97-2003
#     '.pptm', '.ppsm')      #- PowerPoint 2007+

office_extensions = (
    '.accda', '.accdb', '.accde', '.accdr', '.accdt', '.ade', '.adn', '.adp',
    '.cdb', '.doc', '.docb', '.docm', '.docx', '.dot', '.dotm', '.dotx',
    '.htm', '.html', '.laccdb', '.ldb', '.maf', '.mam', '.maq', '.mar',
    '.mat', '.mda', '.mdb', '.mde', '.mdf', '.mdn', '.mdt', '.mdw',
    '.mht', '.mhtml', '.ods', '.pot', '.potm', '.potx', '.ppam', '.ppax',
    '.pps', '.ppsm', '.ppsx', '.ppt', '.pptm', '.pptx', '.rtf', '.sldm',
    '.sldx', '.thmx', '.wbk', '.wiz', '.xla', '.xlam', '.xlb', '.xlc',
    '.xlk', '.xll', '.xlm', '.xlmss', '.xls', '.xlsb', '.xlsm', '.xlsx',
    '.xlt', '.xltm', '.xltx', '.xlw')


# extensions implemented as a zip container have a variety of media
# files, but macros are still implemented as OLE containers.
# See 'Heuristic' below.
# https://www.codeproject.com/Articles/15216/Office-2007-bin-file-format
# https://kb.intermedia.net/Article/23567



# https://en.wikipedia.org/wiki/List_of_archive_formats but must be in
# https://github.com/libarchive/libarchive/wiki/ManPageLibarchiveFormats5
# (all lowercase, because filenames are lowercased before matching)
archive_extensions = ('.zip', '.tar.gz', '.tgz', '.tar.z', '.tar.bz2',
    '.tbz2', '.tar.lzma', '.tlz', '.7z', '.ace', '.rar')


# TODO: detect documents with TargetMode="External" and DDE, see:
# http://staaldraad.github.io/2017/10/23/msword-field-codes/


def de_comment(field):
    """Parse a header field fragment and remove comments.

    copied from AddrlistClass.getdelimited() in email/_parseaddr.py
    """

    slist = ['']
    quote = False
    pos = 0
    depth = 0
    while pos < len(field):
        if quote:
            quote = False
        elif field[pos] == '(':
            depth += 1
        elif field[pos] == ')':
            depth = max(depth - 1, 0)
            pos += 1
            continue
        elif field[pos] == '\\':
            quote = True
        if depth == 0:
            slist.append(field[pos])
        pos += 1

    return ''.join(slist)

def is_quoted(value):
    """ Check whether a value (string or tuple) is quoted
    """
    if isinstance(value, tuple):
        return value[2].startswith('"')
    else:
        return value.startswith('"')

class Recipients(object):
    def __init__(self, controlFileList=None, *args, **kwargs):
        object.__init__(self, *args, **kwargs)
        self.rcpt_count = 0
        self.rcpt_abuse = 0
        self.relay = False

        for cf in controlFileList:
            with open(cf) as fp:
                for line in fp:
                    if line[0] == 'r':
                        self.rcpt_count = self.rcpt_count + 1
                        if line[1:7].lower() == 'abuse@':
                            self.rcpt_abuse = self.rcpt_abuse + 1
                    elif line[0] == 'u':
                        if line[1:9] == 'authsmtp':
                            self.relay = True

    def can_pass(self):
        "Return true if the only recipient(s) are RFC2142 abuse-mailbox(es)"
        return self.rcpt_count > 0 and self.rcpt_abuse == self.rcpt_count

# Python 3.7.3:
# Use unstructured fields by default. Structured ContentTypeHeader
# parses fields too cleverly, so that the following, found in the wild:
#
# Content-Type: application/octet-stream; name=3D"198646.zip"
#
# becomes:
#
# Content-Type: application/octet-stream; name=3D
#
# That way a potential threat can get away without being uncompressed.
#
# Must pay attention to potential API changes in this area.
myemailpolicy = EmailPolicy(header_factory=HeaderRegistry(use_default_map=False))

class MyMessage(EmailMessage):
    """Email message with comments stripped
    """
    def __init__(self, *args, **kwargs):
        EmailMessage.__init__(self, *args, **kwargs)

    def get_filename(self, failobj=None):
        """Return the filename associated with the payload if present.

        The filename is extracted from the Content-Disposition header's
        `filename' parameter. If that header is missing the `filename'
        parameter, this method falls back to looking for the `name' parameter.
        """
        # changed from original: get the unquoted string
        missing = object()
        filename = self.get_param('filename', missing, 'content-disposition',
            unquote=False)
        if filename is missing:
            filename = self.get_param('name', missing, 'content-type', unquote=False)
        if filename is missing:
            return failobj

        # added to original: non quoted comments are removed
        # (bare means the value is NOT quoted; the test was inverted)
        bare = not is_quoted(filename)
        if not bare:
            filename = _unquotevalue(filename)
        filename = email.utils.collapse_rfc2231_value(filename)
        if bare and '(' in filename:
            filename = de_comment(filename)
        # malformed values, e.g. name=3D"blah", we only remove trailing char
        while filename.endswith(('"', "'", '>', ',', ';')):
            filename = filename[0:-1]
        return filename.strip().lower()

def reader_entry(which):
    # print('Entered', which, 'reader')
    pass

def check_message(msg):
    for part in msg.walk():
        try:
            # reader of attached email message
            def mail_reader():
                reader_entry('mail')
                cte = str(part.get('content-transfer-encoding', '')).lower()
                if cte in ('quoted-printable', 'base64', 'x-uuencode', 'uuencode', 'uue', 'x-uue'):
                    payload = part.get_payload(decode=True)
                else: # 7bit, 8bit
                    payload = part.get_payload(decode=False)
                if part.is_multipart():
                    return payload[0].as_string().encode()
                # When is_multipart() returns False, the payload should be a string object.
                # https://docs.python.org/release/3.7.3/library/email.message.html#email.message.EmailMessage.is_multipart
                if isinstance(payload, str):
                    # bytes(str) raises TypeError without an encoding
                    return payload.encode('utf-8', 'surrogateescape')
                return bytes(payload)

            # multipart/* are just containers
            if part.get_content_maintype() == 'multipart':
                continue

            if part.get_content_type() == 'message/rfc822':
                inner_msg = email.message_from_bytes(mail_reader(),
                    policy=myemailpolicy, _class=MyMessage)
                # a clean embedded message must not end the scan of the
                # remaining parts, so only return when it is blocked
                if check_message(inner_msg):
                    return True
                continue

            # get_filename() is in MyMessage
            filename = part.get_filename()
            if filename:
                # print part.get_content_type(), filename
                if block_file(filename, mail_reader):
                    return True

        finally:
            pass

    return False

def block_ole_file(filename, data):
    try:
        # Macros
        parser = oletools.olevba.VBA_Parser(BytesIO(data), data=data, relaxed=True)
        # Heuristic: if an OpenXML contains an OLE container, it is suspicious
        if parser.type == 'OpenXML':
            if len(parser.ole_subfiles) > 0:
                sys.stderr.write('attachments OpenXML contains an OLE container\n')
                return True
        if parser.detect_vba_macros():
            vba_code_all = ''
            for (subfilename, stream_path, vba_filename, vba_code) in parser.extract_macros():
                vba_code_all += vba_code + '\n'
            mraptor = MacroRaptor(vba_code_all)
            mraptor.scan()
            if mraptor.suspicious:
                sys.stderr.write('attachments Found mraptor.suspicious\n')
                return True

        # External stuff
        # DISABLED on 09 Mar 2022
        #filedata = BytesIO(data)
        #if is_zipfile(filedata):
        #    xml_parser = XmlParser(filedata)
        #    # This does not catch http://schemas.openxmlformats.org/officeDocument/2006/relationships/image
        #    # but instead catches simple, non-autoloading links
        #    for relationship, target in find_external_relationships(xml_parser):
        #        if not target.startswith('file:'):
        #            sys.stderr.write('attachments ' + "Found relationship '%s' with external link %s" % (relationship, target) + '\n')
        #            return True # one is enough

    except oletools.olevba.FileOpenError as e:
        sys.stderr.write('attachments FileOpenError: ' + str(e) + '\n')
    except oletools.ooxml.BadOOXML as e:
        sys.stderr.write('attachments: ' + str(e) + '\n')
    return False

def block_file(filename, reader):
    """
    Check if a file should be blocked, either because of its extension
    or its content. If content must be examined, the reader is called.
    For Python3, the reader returns bytes.
    filename must be defined and lower().strip()
    Return True if blocking is deserved.
    """
    # print('block_file', filename)
    if filename.endswith(blocked_extensions):
        sys.stderr.write('attachments Blocked extension: ' + filename +'\n')
        return True

    if filename.endswith(archive_extensions):
        # print filename
        try:
            zmem = reader()
            with libarchive.memory_reader(zmem) as archive:
                for entry in archive:
                    def archive_reader():
                        reader_entry('archive')
                        mem = bytearray()
                        for block in entry.get_blocks():
                            mem += bytearray(block)
                        return bytes(mem)

                    # normalize the member name as block_file() requires
                    member = (entry.pathname or '').strip().lower()
                    if block_file(member, archive_reader):
                        return True
        except libarchive.exception.ArchiveError as e:
            if e.retcode == libarchive.ffi.ARCHIVE_FATAL:
                # Unrecognized archive format, e.g. rar v5
                sys.stderr.write('attachments Unrecognized archive format: ' + filename +'\n')
                return True
        finally:
            pass

    elif filename.endswith(".gz"):
        def gunzip_reader():
            reader_entry('gunzip')
            # plain locals: attributes cannot be set on a bare object()
            mem = bytearray()
            just_1 = 0
            with libarchive.memory_reader(reader(),
                    format_name='raw', filter_name='gzip') as archive:
                for entry in archive:
                    just_1 += 1
                    if just_1 != 1 or entry.size is not None:
                        raise ValueError('Invalid gzip format')
                    for block in entry.get_blocks():
                        # each block is a bytes chunk, so extend, not append
                        mem += block
            return bytes(mem)
        return block_file(filename[0:len(filename)-3], gunzip_reader)

    elif filename.endswith(office_extensions):
        try:
            data = reader()
            if oletools.rtfobj.is_rtf(data, treat_str_as_data=True):
                rtfp = oletools.rtfobj.RtfObjParser(data)
                rtfp.parse()
                for rtfobj in rtfp.objects:
                    if rtfobj.is_ole:
                        if rtfobj.oledata_size is None:
                            # format_id=TYPE_LINKED?
                            return True
                        elif block_ole_file(filename, rtfobj.oledata):
                            return True
                    elif rtfobj.is_package:
                        sys.stderr.write('attachments Found RTF package\n')
                        return True
            else:
                return block_ole_file(filename, data)
        finally:
            pass
    elif filename.endswith('.eml'):
        msg = email.message_from_bytes(reader(),
            policy=myemailpolicy, _class=MyMessage)
        return check_message(msg)
    return False

def doFilter(bodyFile, controlFileList):
    "Function called by Pythonfilter"
    try:
        with open(bodyFile) as fp:
            msg = email.message_from_file(fp,
                policy=myemailpolicy, _class=MyMessage)
        block = check_message(msg)
        if block:
            rcpts = Recipients(controlFileList)
            if rcpts.can_pass():
                return ''

            return "550 Attachment rejected for policy reasons"

    except Exception as e:
        sys.stderr.write('attachments ' + type(e).__name__ + ': ' + str(e) + '\n')
        # print(traceback.format_exc())
    # nothing found --> to the next filter
    return ''


if __name__ == '__main__':
    # For debugging, you can create a file that contains a message
    # body, possibly including attachments.
    # Run this script with the name of that file as an argument,
    # and it'll print either a permanent failure code to indicate
    # that the message would be rejected, or '(empty string)' to
    # indicate that the remaining filters would be run.
    if len(sys.argv) != 2:
        print("Usage: attachments.py <message_body_file>")
        sys.exit(0)
    re = doFilter(sys.argv[1], [])
    if (re == ''):
        re = '(empty string)'
    print(re)

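The debugging entry point at the end of the script expects a file holding a complete message. A minimal sketch for producing such a file with the standard email package follows; the addresses, the helper names and the attachment content are placeholders invented for illustration.

```python
import tempfile
from email.message import EmailMessage


def make_test_message(attachment_name, payload=b'harmless test data'):
    """Build a small message carrying one attachment with the given name."""
    msg = EmailMessage()
    msg['From'] = 'sender@example.com'
    msg['To'] = 'rcpt@example.com'
    msg['Subject'] = 'attachment filter test'
    msg.set_content('See attachment.')
    msg.add_attachment(payload, maintype='application',
                       subtype='octet-stream', filename=attachment_name)
    return msg


def write_message(msg):
    """Write the message to a temporary .eml file and return its path."""
    with tempfile.NamedTemporaryFile('wb', suffix='.eml', delete=False) as fp:
        fp.write(msg.as_bytes())
        return fp.name


if __name__ == '__main__':
    print(write_message(make_test_message('test.exe')))
```

Passing the printed path to attachments3.py as its only argument should produce the 550 line, since .exe is in blocked_extensions; a name such as test.txt should produce "(empty string)".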
This is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
Please find copies of the GNU General Public License
at http://www.gnu.org/licenses/.
While courier-pythonfilter now has a pip installer, this filter is still rustic. If you have better ideas, please write to the list.
Copyright (C) 2017-2021 Alessandro Vesely, all rights reserved except as specified.