Sean Cuenco
Title:
Out of Context: Enhancing Phishing Email Detection with Extended Context in Next-Generation Large Language Models
Abstract:
Phishing is a type of cyberattack in which hackers use social engineering to entice people into sharing sensitive information. Phishing emails remain a prevalent cybersecurity threat and can cost organizations an average of $4.88 million.
Existing research in phishing email detection relies on outdated Natural Language Processing methods and legacy Large Language Models, such as BERT and Flan-T5, which use limited context windows. Additionally, the email datasets in use are the Enron and SpamAssassin corpora, which date from before 2010. These approaches may not capture the evolving nuances of modern phishing attacks.
Our research seeks to improve on existing phishing email detection research by using ModernBERT, which features an extended context window of 8,192 tokens compared to the conventional 512. The larger window lets the model process longer, more complex email sequences and capture subtle and evolving patterns in phishing emails that previous models may have missed. We combine this model with our novel dataset of over 20,000 emails from universities and corporations, all dated from 2022 onward, to ensure that our analysis reflects the current landscape of phishing strategies.
This integrated approach aims to improve detection performance by combining extended contextual understanding with modern, representative data, and it serves as a building block for using updated models and datasets in future research.
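To illustrate why the context window size discussed above matters, the following is a minimal sketch, not the study's actual pipeline. It assumes whitespace-separated words as a crude stand-in for the subword tokens a real tokenizer would produce, and it uses a synthetic email. The helper names (`tokenize`, `truncate`, `fraction_visible`) are hypothetical and exist only for this illustration; the two window sizes are the ones stated in the abstract.

```python
# Hypothetical illustration: whitespace-separated words stand in for the
# subword tokens a real tokenizer would produce (an assumption of this sketch).
LEGACY_WINDOW = 512     # conventional limit cited in the abstract
EXTENDED_WINDOW = 8192  # ModernBERT limit cited in the abstract


def tokenize(text: str) -> list[str]:
    """Crude stand-in tokenizer: split on whitespace."""
    return text.split()


def truncate(tokens: list[str], window: int) -> list[str]:
    """Keep only the leading tokens that fit in the context window."""
    return tokens[:window]


def fraction_visible(text: str, window: int) -> float:
    """Share of the email's tokens that survive truncation."""
    tokens = tokenize(text)
    if not tokens:
        return 1.0
    return len(truncate(tokens, window)) / len(tokens)


# Synthetic long email: benign-looking filler followed by a late call to action.
email = " ".join(["filler"] * 3000 + ["verify", "your", "password", "here"])

legacy_view = truncate(tokenize(email), LEGACY_WINDOW)
extended_view = truncate(tokenize(email), EXTENDED_WINDOW)

# The late request for a password is cut off by the short window
# but remains visible within the extended one.
print("password" in legacy_view, "password" in extended_view)
print(round(fraction_visible(email, LEGACY_WINDOW), 3))
```

In this toy setting, the short window discards the part of the message that carries the suspicious request, while the extended window retains the whole email. Real token counts depend on the tokenizer used.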
Cuenco, Sean
Category
Poster Presentation
Description
Session 2: 10:30 am-12:00 pm
O'Leary
O'Leary-LSAMP