Nov 13 – 14, 2024
Europe/Berlin timezone

Enhancing Phishing Email Detection with Context-Augmented Open-Source Large Language Models

Nov 14, 2024, 1:20 PM
20m
Talk Main Track Main Conference

Abstract
Large Language Models (LLMs) offer a promising approach to improving phishing detection through advanced natural language processing. This paper evaluates the effectiveness of context-augmented open-source LLMs in identifying phishing emails. An approach was developed that combines Few-Shot Learning and Retrieval-Augmented Generation (RAG) to significantly improve the performance of LLMs in this domain. The results show that the presented approach can significantly improve the detection rate even for smaller models.

Introduction
Phishing attacks are a major threat to cybersecurity, using evolving techniques to trick individuals into revealing sensitive information. With an estimated 90% of successful cyber attacks starting with phishing, robust detection mechanisms are crucial. Large Language Models (LLMs), such as OpenAI's GPT, have revolutionised NLP by using large text corpora to perform tasks beyond text generation, which makes them well suited to detecting phishing emails. This paper presents a promising approach that combines Few-Shot Learning and Retrieval-Augmented Generation (RAG) with open-source LLMs to improve the detection of phishing emails. Focusing on open-source models has the advantage that the LLM can be operated within an isolated network, which significantly increases the protection of confidential email content.

Related Work
Previous studies have investigated simple classification approaches for phishing detection using LLMs. These studies showed that pre-trained models can detect phishing attempts, but they focused primarily on commercial models such as GPT. However, there have also been attempts using open-source models: Koide et al., for example, achieved an accuracy of 88.61% using the open-source LLM Llama2.

Methodology
A balanced dataset was created from legitimate emails in the CSDMC Spam Corpus and phishing emails from the Phishing Pot dataset, resulting in 5,800 emails equally divided between phishing and non-phishing. On this basis, several open-source LLMs were evaluated, including Phi-3 3.8B (Microsoft Research), OpenChat 7B, Mixtral 8x7B, Mistral 7B, Gemma 7B (Google DeepMind), and Llama3 (8B and 70B) from Meta AI. For phishing detection, two specific prompts were designed for evaluation: the first followed a persona pattern and instructed the LLM to identify phishing emails, while the second additionally included a list of indicators suggesting phishing attempts. The proposed approach extends the context using Few-Shot Learning and RAG: relevant examples are retrieved from a knowledge base and provided to the LLM as context prior to generation, thereby improving domain-specific performance without additional fine-tuning.
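The retrieval-then-prompt pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the knowledge-base emails, the bag-of-words cosine retrieval, and the prompt wording are all assumptions made for the example (a real system would use the paper's labelled corpus and a proper embedding-based retriever).

```python
import math
from collections import Counter

def tokenize(text):
    return [t.lower() for t in text.split()]

def cosine(a, b):
    # Cosine similarity between two bag-of-words vectors.
    ca, cb = Counter(a), Counter(b)
    num = sum(ca[t] * cb[t] for t in set(ca) & set(cb))
    den = math.sqrt(sum(v * v for v in ca.values())) * \
          math.sqrt(sum(v * v for v in cb.values()))
    return num / den if den else 0.0

# Hypothetical labelled knowledge base of (email_text, label) pairs.
KNOWLEDGE_BASE = [
    ("Your account has been suspended, verify your password here", "phishing"),
    ("Attached is the agenda for Thursday's project meeting", "legitimate"),
    ("You have won a prize, click this link to claim it now", "phishing"),
    ("Please find the updated travel expense form attached", "legitimate"),
]

def retrieve_examples(email, k=2):
    """RAG step: retrieve the k labelled emails most similar to the input."""
    q = tokenize(email)
    ranked = sorted(KNOWLEDGE_BASE,
                    key=lambda ex: cosine(q, tokenize(ex[0])),
                    reverse=True)
    return ranked[:k]

def build_prompt(email):
    """Combine a persona instruction with retrieved few-shot examples."""
    shots = "\n".join(f'Email: "{t}"\nLabel: {lbl}'
                      for t, lbl in retrieve_examples(email))
    return ("You are a cybersecurity analyst. Classify the email as "
            "'phishing' or 'legitimate'.\n\n"
            f"{shots}\n\n"
            f'Email: "{email}"\nLabel:')

print(build_prompt("Urgent: verify your account password immediately"))
```

The resulting prompt is what would be sent to the open-source LLM; because the retrieved examples resemble the input, the model sees domain-relevant context without any fine-tuning.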

Experiments and Results
The performance metrics considered were precision, recall, F1 score and accuracy, with 121,800 classifications performed across the different models and settings. The results demonstrate considerable variability in performance across models, influenced by architecture and training data. Larger models, such as Llama3 70B, consistently outperformed their smaller counterparts. Prompt 2 improved recognition rates for most models, particularly for those that achieved close to 50% accuracy with Prompt 1.
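The four metrics above follow directly from the confusion-matrix counts. As a small sketch (the label lists are illustrative, not results from the paper):

```python
def classification_metrics(y_true, y_pred, positive="phishing"):
    """Compute precision, recall, F1 and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Illustrative ground truth and model predictions (not data from the study).
truth = ["phishing", "phishing", "legitimate", "legitimate", "phishing"]
preds = ["phishing", "legitimate", "legitimate", "phishing", "phishing"]
print(classification_metrics(truth, preds))
```

Treating "phishing" as the positive class, precision penalises legitimate emails flagged as phishing while recall penalises phishing emails that slip through, which is why both are reported alongside accuracy on the balanced dataset.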

Conclusion
This study demonstrates that LLMs can effectively distinguish between legitimate and phishing emails. The proposed approach, combining Few-Shot Learning and RAG, significantly improves detection rates, particularly for smaller models.

Primary authors

Fabian Nicklas (Hochschule Kaiserslautern)
Prof. Jan Conrad (Hochschule Kaiserslautern)
Nicolas Ventulett (Hochschule Kaiserslautern)
