Android Malware Performance Counter Data

Download

For a download link, please email the CASTL lab and include your name and affiliation.

Citation

If you use these data in a publication, please cite our ISCA paper:

@inproceedings{isca13_demme,
 author = {Demme, John and Maycock, Matthew and Schmitz, Jared and Tang, Adrian and Waksman, Adam and Sethumadhavan, Simha and Stolfo, Salvatore},
 title = {On the Feasibility of Online Malware Detection with Performance Counters},
 booktitle = {Proceedings of the 40th annual international symposium on Computer architecture},
 series = {ISCA '13},
 year = {2013},
 location = {Tel-Aviv, Israel},
 publisher = {ACM},
 address = {New York, NY, USA},
}

Data Format

The data archive (a .tar.bz2 file) contains a large number of files. The actual data are in the traces/*.bin files, one per thread. The root directory also contains several CSV files. Four of them, {ham,malware}_{test,train}.csv, are indexes into the traces, and each of their lines has the following format:

<pid>,<command line args | java package>,<executable path>,<trace filename>

Every thread has an entry in one of these indexes. In addition, package_map.csv maps each Java package name to the name of the malware it came from.
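A minimal C sketch of splitting one index line into its four fields. It assumes (our assumption; the dataset does not document this) that the executable path and trace filename never contain commas, while the command-line field might, so the line is split at its first comma and at its last two commas. The names IndexEntry and parse_index_line are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One parsed index line. Hypothetical type, not part of the dataset. */
struct IndexEntry {
    long pid;
    char* cmdline;   /* command-line arguments or Java package name */
    char* exe;       /* executable path */
    char* trace;     /* trace filename */
};

/* Splits `line` in place into <pid>,<cmdline>,<exe>,<trace>.
 * Assumes exe and trace contain no commas; cmdline may, so the line is
 * cut at its first comma and its last two commas.
 * Returns 0 on success, -1 on a malformed line (the buffer may then be
 * partially modified). */
static int parse_index_line(char* line, struct IndexEntry* out) {
    assert(line != NULL && out != NULL);
    line[strcspn(line, "\r\n")] = '\0';      /* drop a trailing newline */

    char* first = strchr(line, ',');         /* ends the pid field */
    char* last = strrchr(line, ',');         /* starts the trace field */
    if (first == NULL || last == first) return -1;
    *last = '\0';
    char* middle = strrchr(line, ',');       /* starts the exe field */
    if (middle == first) return -1;
    *middle = '\0';
    *first = '\0';

    char* end;
    out->pid = strtol(line, &end, 10);
    if (end == line || *end != '\0') return -1;  /* pid must be numeric */
    out->cmdline = first + 1;
    out->exe = middle + 1;
    out->trace = last + 1;
    return 0;
}
```

The line is modified in place, so pass a writable buffer, for example one filled by fgets().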

After a short metadata header, each trace file is an array of samples. Each sample has the following format:

struct Sample {
    uint32_t cycles;
    uint32_t counts[6];
};

The cycles field is the actual number of clock cycles over which the events were monitored: the sampling module aims to sample every N cycles and records the actual N in this field. The counts field holds the values of all six ARM event counters. Some samples have every field set to zero; these are markers inserted wherever a context switch was detected. If you want to ignore context switches, simply skip these samples.
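The sketch below shows one way to skip the all-zero context-switch markers while totalling a thread's cycles and event counts. It assumes the samples are already in an in-memory array; is_context_switch and sum_samples are hypothetical helper names. The totals are 64-bit as a precaution, since summing many 32-bit per-sample values over a long trace could overflow 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct Sample {
    uint32_t cycles;
    uint32_t counts[6];
};

/* Seven 32-bit fields, so no padding is expected on common ABIs. */
_Static_assert(sizeof(struct Sample) == 28, "unexpected struct Sample layout");

/* A sample whose fields are all zero marks a detected context switch. */
static int is_context_switch(const struct Sample* s) {
    if (s->cycles != 0) return 0;
    for (int i = 0; i < 6; i++)
        if (s->counts[i] != 0) return 0;
    return 1;
}

/* Sums cycles and per-event counts over the real samples, skipping
 * context-switch markers. Returns the number of markers skipped. */
static size_t sum_samples(const struct Sample* samples, size_t n,
                          uint64_t* total_cycles, uint64_t totals[6]) {
    assert(samples != NULL || n == 0);
    size_t markers = 0;
    *total_cycles = 0;
    for (int i = 0; i < 6; i++) totals[i] = 0;

    for (size_t k = 0; k < n; k++) {
        if (is_context_switch(&samples[k])) {
            markers++;
            continue;
        }
        *total_cycles += samples[k].cycles;
        for (int i = 0; i < 6; i++) totals[i] += samples[k].counts[i];
    }
    return markers;
}
```

Dividing each total by the cycle total then gives a per-cycle event rate for the thread, if that normalization suits your analysis.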

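Each trace file also begins with several NUL-terminated strings containing metadata. The example reader below reads three of them: a first field it does not interpret, the command line or Java package, and the executable path.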

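In these data, the following events were configured, in this order: 0x06, 0x07, 0x0C, 0x0D, 0x0F, and 0x12. They are defined in "ARM DDI 0406B_errata_2011_Q3 (ID120611)", the "ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition, Errata markup", and count the following:

0x06: data read (load) architecturally executed
0x07: data write (store) architecturally executed
0x0C: software change of the PC
0x0D: immediate branch architecturally executed
0x0F: unaligned load or store
0x12: predictable branch speculatively executed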
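A sketch of a small lookup table recording which counts[] slot holds which event, assuming counts[i] corresponds to the i-th event number in the configured order above. The short labels are our own shorthand for these ARMv7 events, not identifiers from the ARM manual; kEventIds, kEventLabels, event_index, event_label, and count_for_event are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event numbers programmed into the counters, in counts[] order. */
static const uint8_t kEventIds[6] = {0x06, 0x07, 0x0C, 0x0D, 0x0F, 0x12};

/* Our own shorthand labels (not ARM manual names), e.g. for CSV headers. */
static const char* const kEventLabels[6] = {
    "loads", "stores", "pc_writes", "imm_branches", "unaligned_ldst",
    "pred_branches",
};

/* Returns the counts[] index holding `event_id`, or -1 if that event
 * was not recorded in this dataset. */
static int event_index(uint8_t event_id) {
    for (int i = 0; i < 6; i++)
        if (kEventIds[i] == event_id) return i;
    return -1;
}

/* Returns the shorthand label for `event_id`, or NULL if not recorded. */
static const char* event_label(uint8_t event_id) {
    int i = event_index(event_id);
    return i >= 0 ? kEventLabels[i] : NULL;
}

/* Looks up one sample's count for a given event number. */
static uint32_t count_for_event(const uint32_t counts[6], uint8_t event_id) {
    int i = event_index(event_id);
    assert(i >= 0 && "event not recorded in this dataset");
    return counts[i];
}
```

Looking events up by number rather than by hard-coded index keeps analysis code readable and makes a mismatch with a different counter configuration fail loudly.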

Example reader

The following short C program demonstrates one way to read a trace file:

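#define _POSIX_C_SOURCE 200809L  /* for strnlen() under strict -std modes */

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>

struct Sample {
    uint32_t cycles;     /* actual cycles covered by this sample */
    uint32_t counts[6];  /* the six configured event counters */
};

/* Length of the NUL-terminated string at p including its NUL, or 0 if
 * no NUL occurs within the `left` remaining bytes. */
static size_t field_len(const char* p, size_t left) {
    size_t len = strnlen(p, left);
    return (len < left) ? len + 1 : 0;
}

int main(int argc, const char** argv) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <trace file>\n", argv[0]);
        return EXIT_FAILURE;
    }
    const char* filename = argv[1];

    int fd = open(filename, O_RDONLY);
    if (fd < 0) {
        perror("open");
        fprintf(stderr, "Could not open data file: %s\n", filename);
        return EXIT_FAILURE;
    }

    struct stat fileInfo;
    if (fstat(fd, &fileInfo) != 0) {
        perror("fstat");
        fprintf(stderr, "Could not stat data file: %s\n", filename);
        close(fd);
        return EXIT_FAILURE;
    }

    size_t fileSize = (size_t)fileInfo.st_size;
    if (fileSize == 0) {  /* mmap() rejects zero-length mappings */
        fprintf(stderr, "Data file is empty: %s\n", filename);
        close(fd);
        return EXIT_FAILURE;
    }

    void* mmapArea = mmap(NULL, fileSize, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mmapArea == MAP_FAILED) {
        perror("mmap");
        fprintf(stderr, "Could not mmap data file: %s\n", filename);
        close(fd);
        return EXIT_FAILURE;
    }
    close(fd);  /* the mapping stays valid after the descriptor is closed */

    printf("%s has %zu bytes\n", filename, fileSize);

    /* Header: three NUL-terminated strings. The first is not used here;
     * the second is the command line or Java package, the third the
     * executable path. */
    const char* p = (const char*)mmapArea;
    size_t sizeLeft = fileSize;
    const char* fields[3];
    for (int f = 0; f < 3; f++) {
        size_t n = field_len(p, sizeLeft);
        if (n == 0) {
            fprintf(stderr, "Truncated metadata header: %s\n", filename);
            munmap(mmapArea, fileSize);
            return EXIT_FAILURE;
        }
        fields[f] = p;
        p += n;
        sizeLeft -= n;
    }
    printf("Package: %s\n", fields[1]);
    printf("Exec: %s\n", fields[2]);

    const size_t numPoints = sizeLeft / sizeof(struct Sample);
    if (sizeLeft % sizeof(struct Sample) != 0) {
        fprintf(stderr, "Warning: body is not a multiple of the Sample "
                "size (%zu)\n%s\n", sizeof(struct Sample), filename);
    }

    /* The header has variable length, so the samples may be misaligned:
     * memcpy each one out instead of casting p to a struct Sample pointer.
     * All-zero samples mark detected context switches. */
    for (size_t i = 0; i < numPoints; i++) {
        struct Sample s;
        memcpy(&s, p + i * sizeof s, sizeof s);
        printf("%08" PRIu32 ": %08" PRIu32 ", %08" PRIu32 ", %08" PRIu32
               ", %08" PRIu32 ", %08" PRIu32 ", %08" PRIu32 "\n",
               s.cycles,
               s.counts[0], s.counts[1], s.counts[2],
               s.counts[3], s.counts[4], s.counts[5]);
    }

    munmap(mmapArea, fileSize);
    return 0;
}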