The embodiments present a new class of defenses against content masking attacks on the Portable Document Format (PDF) standard. The defenses can identify attacks that cause a document to appear different from the underlying content extracted from it. A content masking defense method can include scanning a document file to extract the character code of a character appearing in the file, and then rendering that character using a font that is embedded in the document file. Optical character recognition can be performed on the rendering, and a content masking attack can be identified by comparing the result of the optical character recognition against the character code of the character.
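The extract, render, recognize, and compare steps described above can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the embedded font is modeled as a plain dictionary from character codes to tiny bitmaps, `REFERENCE_GLYPHS` and `toy_ocr` stand in for a real optical character recognition engine, and the names `render` and `find_masked_characters` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Glyph = Tuple[str, ...]  # toy bitmap: rows of '#' (ink) and '.' (blank)

# Hypothetical reference shapes standing in for a trusted OCR model.
REFERENCE_GLYPHS: Dict[str, Glyph] = {
    "a": (".##.", "#..#", "####", "#..#"),
    "b": ("###.", "#.#.", "#..#", "###."),
}


def render(code: int, embedded_font: Dict[int, Glyph]) -> Glyph:
    """Render a character code with the font embedded in the document (assumed model)."""
    return embedded_font[code]


def toy_ocr(glyph: Glyph) -> str:
    """Return the reference character whose bitmap differs in the fewest pixels."""
    def distance(ref: Glyph) -> int:
        return sum(
            p != q
            for row_a, row_b in zip(glyph, ref)
            for p, q in zip(row_a, row_b)
        )
    return min(REFERENCE_GLYPHS, key=lambda ch: distance(REFERENCE_GLYPHS[ch]))


def find_masked_characters(
    codes: Iterable[int],
    embedded_font: Dict[int, Glyph],
    ocr: Callable[[Glyph], str] = toy_ocr,
) -> List[Tuple[int, str, str]]:
    """Compare what each code extracts as against what its rendering looks like.

    Returns (position, extracted_character, recognized_character) for every
    mismatch; a non-empty result suggests a content masking attack.
    """
    mismatches = []
    for position, code in enumerate(codes):
        extracted = chr(code)                       # what a text extractor would read
        seen = ocr(render(code, embedded_font))     # what a human would see
        if seen != extracted:
            mismatches.append((position, extracted, seen))
    return mismatches


# An honest font draws each code as its own letter; the masking font
# draws code 97 ('a') with the shape of 'b'.
honest_font = {97: REFERENCE_GLYPHS["a"], 98: REFERENCE_GLYPHS["b"]}
masking_font = {97: REFERENCE_GLYPHS["b"], 98: REFERENCE_GLYPHS["b"]}
document_codes = [97, 98]
```

With these toy inputs, `find_masked_characters(document_codes, honest_font)` reports no mismatches, while the same call with `masking_font` flags position 0, where the extracted character is `a` but the rendering is recognized as `b`. A practical system would replace the dictionary font and `toy_ocr` with a real PDF font renderer and OCR engine.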
Liu, Yao; Lu, Zhuo; Markwood, Ian Davidson; and Shen, Dakun, "Content masking attacks against information-based services and defenses thereto" (2023). USF Patents. 1361.
UNIVERSITY OF SOUTH FLORIDA