Graph-based malware classifiers achieve high accuracy on standard benchmarks, but their performance degrades significantly under distribution shift when new malware variants emerge. We find that existing structural features fail to capture the deeper semantic patterns necessary for robust generalization.
We introduce two new benchmarks, MalNet-Tiny-Common and MalNet-Tiny-Distinct, designed to evaluate performance under realistic covariate and domain shifts. We propose a semantic enrichment framework that augments Function Call Graphs (FCGs) with function-level metadata and code embeddings derived from Large Language Models (LLMs).
Our experiments demonstrate that this approach improves classification performance by up to 14.2% under distribution shift and enhances the robustness of adaptation-based methods.
Our framework integrates semantic signals from function metadata and LLM-based embeddings into the graph structure. Please consult our README.md files for detailed information. Below is an overview of our proposed pipeline.
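To make the enrichment idea concrete, here is a minimal sketch that is separate from the authors' pipeline figure and code. It assumes each function in a call graph may have a fixed-length metadata vector and a fixed-length code embedding, and it concatenates the two into one node feature vector. The function name `enrich_fcg`, the input dictionaries, and the zero-filling of missing entries are all assumptions made for illustration, not details taken from the repository.

```python
from typing import Dict, List, Tuple


def enrich_fcg(
    edges: List[Tuple[str, str]],
    metadata: Dict[str, List[float]],
    embeddings: Dict[str, List[float]],
    meta_dim: int,
    emb_dim: int,
) -> Tuple[List[str], List[List[float]], List[Tuple[int, int]]]:
    """Attach a [metadata | embedding] feature vector to every call-graph node.

    Hypothetical helper: the real pipeline may order, scale, or combine
    features differently.
    """
    # Assign node ids in order of first appearance in the edge list.
    nodes: List[str] = []
    index: Dict[str, int] = {}
    for caller, callee in edges:
        for name in (caller, callee):
            if name not in index:
                index[name] = len(nodes)
                nodes.append(name)

    features: List[List[float]] = []
    for name in nodes:
        # Assumption: functions without metadata or code get zero vectors.
        meta = metadata.get(name, [0.0] * meta_dim)
        emb = embeddings.get(name, [0.0] * emb_dim)
        if len(meta) != meta_dim or len(emb) != emb_dim:
            raise ValueError(f"unexpected feature length for {name!r}")
        features.append(list(meta) + list(emb))

    # Rewrite edges as (caller_id, callee_id) pairs.
    edge_index = [(index[u], index[v]) for u, v in edges]
    return nodes, features, edge_index
```

The returned node list, feature matrix, and integer edge list are in the shape most graph learning libraries expect, so they could be converted to tensors in a later step.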
The dataset curation process involved several stages to ensure high-quality, semantically enriched benchmarks:
The pre-processed attributed graphs (including all semantic embeddings) will be released as downloadable files. These files allow researchers to train and evaluate models without performing the expensive LLM inference step themselves. Due to the large size of the dataset, we provide download links across multiple platforms. Please download the files from all of the links below and merge them in the datasets directory.
You can download the datasets from the following links:
We provide our code to construct our dataset from any APKs, ensuring compliance with AndroZoo's redistribution policies.
Install the necessary dependencies:
pip install -r requirements.txt
Start the inference server to handle code embedding requests (supports various backends):
# Example: Start server on port 8080
python llm_inference_server_cxe.py --port 8080
Run the construction script to process APKs into attributed graphs:
python create_graph.py --apk_dir ./path/to/apks --n_jobs 8 --port 8080
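Both commands above use the same port, which suggests the construction script sends its embedding requests to the inference server. Under that assumption, it can help to confirm the server is accepting connections before starting a long construction run. The helper below is a hypothetical sketch that is not part of the repository. It only checks that a TCP connection can be opened and makes no assumption about the server's request format.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds.

    Returns False if no connection succeeds before `timeout` seconds elapse.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling `wait_for_port("127.0.0.1", 8080)` after launching the server and running `create_graph.py` only when it returns `True` avoids a batch of failed requests at startup.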
The splits directory contains the definitions for MalNet-Tiny-Common and MalNet-Tiny-Distinct.
The training directory contains the Exphormer-based model implementation and evaluation scripts.
Organize your datasets as follows in the datasets/ folder:
datasets/
├── [dataset_name]/
│ ├── raw/
│ │ ├── malnet-graph-tiny/ (Graph structures)
│ │ └── split_info_tiny/ (Train/Val/Test splits)
│ └── processed/ (Generated automatically)
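A quick way to confirm that a dataset folder matches the layout above is to check for the two raw subdirectories before training. The following is a minimal sketch. The function name `missing_raw_dirs` is hypothetical, and the check covers only the directory names shown in the tree, not the contents of the files.

```python
from pathlib import Path
from typing import List

# Subdirectories of raw/ named in the layout above.
EXPECTED_RAW_SUBDIRS = ("malnet-graph-tiny", "split_info_tiny")


def missing_raw_dirs(datasets_root: str, dataset_name: str) -> List[str]:
    """List the expected raw/ subdirectories that do not exist yet."""
    raw = Path(datasets_root) / dataset_name / "raw"
    return [
        str(raw / sub)
        for sub in EXPECTED_RAW_SUBDIRS
        if not (raw / sub).is_dir()
    ]
```

An empty list means both raw subdirectories are in place. The processed/ directory is not checked because it is generated automatically.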
If you download the dataset from our repository to the processed directory, you can skip the graph construction step.
To reproduce the results, use the provided configuration files:
python main.py --cfg config_file.yaml
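When several configuration files need to be run in sequence, the command above can be generated for each file. The sketch below assumes the YAML files sit together in one directory. That directory and the helper name `build_commands` are hypothetical, and only the `--cfg` flag shown above is used.

```python
import sys
from pathlib import Path
from typing import List


def build_commands(config_dir: str, entry: str = "main.py") -> List[List[str]]:
    """Build one `python main.py --cfg <file>` command per YAML file."""
    # Sort so the runs happen in a predictable order.
    configs = sorted(Path(config_dir).glob("*.yaml"))
    return [[sys.executable, entry, "--cfg", str(cfg)] for cfg in configs]
```

Each returned command can be passed to `subprocess.run(cmd, check=True)` so that a failed run stops the sequence instead of being silently skipped.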
@misc{tran2025mitigatingdistributionshiftgraphbased,
title={Mitigating Distribution Shift in Graph-Based Android Malware Classification via Function Metadata and LLM Embeddings},
author={Ngoc N. Tran and Anwar Said and Waseem Abbas and Tyler Derr and Xenofon D. Koutsoukos},
year={2025},
eprint={2508.06734},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2508.06734},
}