High-Level Structure

ArmorLib uses the following high-level structure to maximize modularity and extensibility. This document describes the implementation-independent structure of ArmorLib to bring prospective developers up to speed on its software architecture.

At the most general level, ArmorLib takes in a binary object—a text file, a PDF, a ZIP file, or anything else—and returns to the user a security and privacy report about that file (or the files that it contains, in the case of an archive file). ArmorLib performs its file scanning in scan modules, small self-contained programs that take in a binary object and return a security report. The scan modules are managed by the coordinator, which ensures that binary objects provided by the user are passed through the right scan modules. (For example, the coordinator will ensure that a text file isn’t given to a scan module meant for .docx files.) Binary objects are prepared for processing by preprocessors, which run before the scan modules and do not return reports but instead append metadata to the binary object. For example, a core preprocessor would determine the binary object’s ISO filetype classification, while another may determine whether or not a file is encrypted.

Coordinator

The coordinator manages the lifecycle of an ArmorLib scan. When the coordinator is given a binary object to process (either through an API call or a command-line invocation), it first runs all preprocessors. It then passes the prepared binary object, now carrying the preprocessors' metadata, through each scan module subscribed to that object's file type.
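
The lifecycle above can be sketched in Rust as follows. This is a minimal illustration, not ArmorLib's actual API. The names `BinaryObject`, `Coordinator`, `ScanModule`, and `process`, the `"filetype"` metadata key, and the `"*"` wildcard subscription are all hypothetical. Findings are also reduced to plain strings for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical shape: raw bytes plus the metadata preprocessors attach to them.
pub struct BinaryObject {
    pub data: Vec<u8>,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical preprocessor signature: raw bytes in, metadata entries out.
pub type Preprocessor = fn(&[u8]) -> HashMap<String, String>;

/// Hypothetical scan module: a name, the file type it subscribes to
/// ("*" meaning every file), and a function returning finding titles.
pub struct ScanModule {
    pub name: &'static str,
    pub subscription: &'static str,
    pub scan: fn(&BinaryObject) -> Vec<String>,
}

pub struct Coordinator {
    pub preprocessors: Vec<Preprocessor>,
    pub modules: Vec<ScanModule>,
}

impl Coordinator {
    /// Runs all preprocessors, then every subscribed scan module.
    pub fn process(&self, data: Vec<u8>) -> Vec<(&'static str, Vec<String>)> {
        // Phase 1: run every preprocessor on the raw bytes and merge their metadata.
        let mut metadata: HashMap<String, String> = HashMap::new();
        for pre in &self.preprocessors {
            metadata.extend(pre(data.as_slice()));
        }
        let object = BinaryObject { data, metadata };

        // Phase 2: run each scan module whose subscription matches the detected type.
        let filetype = object
            .metadata
            .get("filetype")
            .map(String::as_str)
            .unwrap_or("unknown");
        self.modules
            .iter()
            .filter(|m| m.subscription == "*" || m.subscription == filetype)
            .map(|m| (m.name, (m.scan)(&object)))
            .collect()
    }
}
```

The preprocessors run as one complete phase before any scan module starts. As a result, every module sees the same, fully populated metadata.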

Preprocessors

Preprocessors take in a binary object and return metadata to the coordinator. A preprocessor may not rely on any other preprocessor. For example, one preprocessor may determine the file's type (needed to decide which scan modules will run), while another might determine whether the file is encrypted. In general, preprocessors produce data that is useful to many scan modules.
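
The sketch below shows a hypothetical `Preprocessor` trait with two example implementations. The trait, type names, and metadata keys are assumptions, not ArmorLib's code. Both checks are simplified stand-ins chosen for illustration:
- a magic-number check for file type;
- a ZIP encryption check, which reads bit 0 of the general-purpose flag in the first ZIP local file header.

The encryption preprocessor repeats the ZIP signature check rather than reading the file-type result, because preprocessors may not rely on one another.

```rust
use std::collections::HashMap;

/// Hypothetical interface: a preprocessor sees only the raw bytes and
/// never the output of another preprocessor.
pub trait Preprocessor {
    fn process(&self, data: &[u8]) -> HashMap<String, String>;
}

/// Guesses the file type from a few well-known magic numbers.
pub struct FiletypePreprocessor;

impl Preprocessor for FiletypePreprocessor {
    fn process(&self, data: &[u8]) -> HashMap<String, String> {
        let filetype = if data.starts_with(b"%PDF-") {
            "pdf"
        } else if data.starts_with(b"PK\x03\x04") {
            "zip"
        } else {
            "unknown"
        };
        let mut meta = HashMap::new();
        meta.insert("filetype".to_string(), filetype.to_string());
        meta
    }
}

/// Reads bit 0 of the general-purpose flag (bytes 6..8, little-endian) in the
/// first ZIP local file header, which marks that entry as encrypted.
pub struct ZipEncryptionPreprocessor;

impl Preprocessor for ZipEncryptionPreprocessor {
    fn process(&self, data: &[u8]) -> HashMap<String, String> {
        let mut meta = HashMap::new();
        // Repeats the signature check itself instead of reading "filetype".
        if data.len() >= 8 && data.starts_with(b"PK\x03\x04") {
            let flags = u16::from_le_bytes([data[6], data[7]]);
            meta.insert("encrypted".to_string(), ((flags & 1) == 1).to_string());
        }
        meta
    }
}

/// Runs every preprocessor on the same bytes and merges their metadata.
pub fn run_all(preprocessors: &[Box<dyn Preprocessor>], data: &[u8]) -> HashMap<String, String> {
    let mut merged = HashMap::new();
    for p in preprocessors {
        merged.extend(p.process(data));
    }
    merged
}
```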

Scan Modules

The core modular unit of ArmorLib is the scan module. A scan module takes a binary object (typically the file the user is scanning) as input and returns a scan report as output. Scan modules perform the actual work in ArmorLib. For example, one scan module searches for zero-width characters in text files, while another may search for hidden virus payloads in all files. Scan modules range from very general (run on every file scanned) to very specific (run only on PDFs, for example). Scan modules may be recursive; that is, they may call on the coordinator to return a report on a separate binary object. For example, the .zip contents scan module may recursively scan each file inside a .zip archive.
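
Here is a sketch of a scan-module interface. It assumes that modules receive raw bytes plus a callback into the coordinator for recursive scans. The following are hypothetical:
- `ScanModule`, `rescan`, and both module types.
- Findings, which are plain strings here.
- `ContainerModule`, which splits its input on zero bytes as a toy substitute for real ZIP extraction.
- The zero-width character set (U+200B, U+200C, U+200D, U+2060, U+FEFF), which is an illustrative choice and not necessarily the list ArmorLib checks.

```rust
/// Hypothetical scan-module interface.
pub trait ScanModule {
    /// File type this module subscribes to.
    fn subscription(&self) -> &'static str;
    /// Scans `data`. `rescan` is a hypothetical handle back to the coordinator
    /// that recursive modules use to scan nested files.
    fn scan(&self, data: &[u8], rescan: &dyn Fn(&[u8]) -> Vec<String>) -> Vec<String>;
}

const ZERO_WIDTH: [char; 5] = ['\u{200B}', '\u{200C}', '\u{200D}', '\u{2060}', '\u{FEFF}'];

/// Counts zero-width characters in (lossily decoded) UTF-8 text.
pub struct ZeroWidthModule;

impl ScanModule for ZeroWidthModule {
    fn subscription(&self) -> &'static str {
        "text"
    }
    fn scan(&self, data: &[u8], _rescan: &dyn Fn(&[u8]) -> Vec<String>) -> Vec<String> {
        let text = String::from_utf8_lossy(data);
        let count = text.chars().filter(|c| ZERO_WIDTH.contains(c)).count();
        if count == 0 {
            Vec::new()
        } else {
            vec![format!("ZERO_WIDTH_CHARACTER_DETECTED: {} occurrence(s)", count)]
        }
    }
}

/// Toy container whose nested files are separated by 0x00 bytes; each nested
/// file is handed back to the coordinator through `rescan`.
pub struct ContainerModule;

impl ScanModule for ContainerModule {
    fn subscription(&self) -> &'static str {
        "container"
    }
    fn scan(&self, data: &[u8], rescan: &dyn Fn(&[u8]) -> Vec<String>) -> Vec<String> {
        data.split(|b| *b == 0).flat_map(|part| rescan(part)).collect()
    }
}
```

The recursive module never calls another scan module directly. It only asks the coordinator for a report, which keeps modules independent of one another.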

Scan modules may be called as part of a full scan managed by the coordinator, or invoked independently by another program through a direct import. Even in the latter case, the scan must still go through the coordinator, because the preprocessors still need to run. Because any scan module may be run on its own, scan modules may not depend on any other scan module to function (though they may depend on the preprocessors specified in their configuration; see below).
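
Direct use must still go through the coordinator. One way to enforce this is to expose a per-module entry point on the coordinator itself. In this hypothetical sketch, `scan_with` looks up a single module by name, runs every preprocessor, and only then invokes that module. The type aliases and method names are assumptions made for illustration.

```rust
use std::collections::HashMap;

pub type Metadata = HashMap<String, String>;

// Hypothetical function shapes for preprocessors and scan modules.
pub type Preprocessor = fn(&[u8]) -> Metadata;
pub type ScanFn = fn(&[u8], &Metadata) -> Vec<String>;

pub struct Coordinator {
    pub preprocessors: Vec<Preprocessor>,
    pub modules: HashMap<&'static str, ScanFn>,
}

impl Coordinator {
    /// Runs every preprocessor and merges the metadata they return.
    fn prepare(&self, data: &[u8]) -> Metadata {
        let mut meta = Metadata::new();
        for p in &self.preprocessors {
            meta.extend(p(data));
        }
        meta
    }

    /// Runs one named module on `data`, after the preprocessors have run.
    /// Returns None if no module with that name is registered.
    pub fn scan_with(&self, module: &str, data: &[u8]) -> Option<Vec<String>> {
        let scan = self.modules.get(module)?;
        let meta = self.prepare(data);
        Some(scan(data, &meta))
    }
}
```

The module receives only coordinator-prepared metadata and never another module's output. It therefore behaves the same whether it runs alone or as part of a full scan.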

Registration

All scan modules must be recognized by the coordinator in order to run when ArmorLib scans a file. To be recognized, a scan module must be referenced in the ArmorLib configuration. Through registration, the scan module tells the coordinator its location (that is, how to run it), its subscription (the type of files it handles, as an ISO code), and the preprocessors it depends on.
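
Registration entries can be modeled as plain data. The sketch below assumes a hypothetical `Registration` struct that holds a name plus the three items the coordinator learns: location, subscription, and preprocessor dependencies. The configuration file format itself is not specified here. The `"*"` wildcard and both helper functions are illustrative assumptions.

```rust
use std::collections::HashSet;

/// Hypothetical registration entry mirroring what a scan module tells the coordinator.
#[derive(Debug, Clone)]
pub struct Registration {
    pub name: String,
    pub location: String,           // how to run the module (e.g. a path)
    pub subscription: String,       // file type code, or "*" for every file
    pub preprocessors: Vec<String>, // preprocessors this module depends on
}

/// Returns (module, preprocessor) pairs for every declared dependency
/// that is not among the known preprocessors.
pub fn missing_dependencies(regs: &[Registration], known: &HashSet<&str>) -> Vec<(String, String)> {
    let mut missing = Vec::new();
    for r in regs {
        for p in &r.preprocessors {
            if !known.contains(p.as_str()) {
                missing.push((r.name.clone(), p.clone()));
            }
        }
    }
    missing
}

/// Selects the registered modules that should run on a file of the given type.
pub fn subscribed<'a>(regs: &'a [Registration], filetype: &str) -> Vec<&'a Registration> {
    regs.iter()
        .filter(|r| r.subscription == "*" || r.subscription == filetype)
        .collect()
}
```

Validating dependencies when modules are registered, rather than during a scan, would let a misconfigured module be reported before any file is processed.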

Scan Reports

Each scan module returns a scan report to the coordinator. A scan report has two fields:
- status: an enumeration of OK, WARN, DANGER, and SEVERE whose value is the most severe status among the report's findings.
- findings: an array of findings.

Each finding has the following fields:
- description: a human-readable description.
- status: a status level, using the same enumeration as the report's status.
- title: a title.
- type: a value that is human-meaningful and constant across all identical findings, for example ZERO_WIDTH_CHARACTER_DETECTED.
- data: an unstructured JSON-like object. For example, the .zip contents scan module might attach the scan reports of the archive's contents in this field.
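
The report structure described above maps naturally onto Rust types. In this sketch, `Status`, `Finding`, `ScanReport`, and the small `Value` enum are hypothetical names; `Value` stands in for a JSON value type such as `serde_json::Value`. The finding's type is stored in a field named `kind` because `type` is a Rust keyword. Deriving `Ord` on `Status` in declaration order makes the report status a simple maximum over its findings.

```rust
use std::collections::BTreeMap;

/// Severity levels, ordered from least to most severe (declaration order).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Ok,
    Warn,
    Danger,
    Severe,
}

/// Minimal JSON-like value for the unstructured `data` field.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Text(String),
    List(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub title: String,
    pub description: String,
    pub status: Status,
    pub kind: String, // the finding's "type", e.g. ZERO_WIDTH_CHARACTER_DETECTED
    pub data: Value,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ScanReport {
    pub status: Status,
    pub findings: Vec<Finding>,
}

impl ScanReport {
    /// Builds a report whose status is the most severe status among its
    /// findings, or OK when there are none.
    pub fn new(findings: Vec<Finding>) -> Self {
        let status = findings.iter().map(|f| f.status).max().unwrap_or(Status::Ok);
        ScanReport { status, findings }
    }
}
```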

Command Line Interface

The command line interface allows an end user to interact directly with the coordinator without writing another Rust program as a hook. It acts as a proxy through which the user can scan files on the local machine or on the network via the coordinator, and it renders the resulting scan reports in a human-readable format.
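
The hypothetical `render` function below illustrates the CLI's formatting step by turning one file's findings into indented text. The output layout, the reduced `Finding` type, and the function name are assumptions. A real front end would also read the path from `std::env::args` and the bytes with `std::fs::read` before handing them to the coordinator.

```rust
use std::fmt::Write;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Ok,
    Warn,
    Danger,
    Severe,
}

impl Status {
    /// Upper-case label matching the level names used in the reports.
    fn label(self) -> &'static str {
        match self {
            Status::Ok => "OK",
            Status::Warn => "WARN",
            Status::Danger => "DANGER",
            Status::Severe => "SEVERE",
        }
    }
}

/// Reduced finding type for display purposes.
pub struct Finding {
    pub title: String,
    pub description: String,
    pub status: Status,
}

/// Renders the overall status for `path`, then one indented line per finding.
pub fn render(path: &str, findings: &[Finding]) -> String {
    let overall = findings.iter().map(|f| f.status).max().unwrap_or(Status::Ok);
    let mut out = String::new();
    // Writing into a String never fails, so unwrap cannot panic here.
    writeln!(out, "{}: {}", path, overall.label()).unwrap();
    for f in findings {
        writeln!(out, "  [{}] {}: {}", f.status.label(), f.title, f.description).unwrap();
    }
    out
}
```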