"Vircing" the InVircible: 6. The Automatic Scan String Extractor (IVX).

6. The Automatic Scan String Extractor (IVX).

As a sorry excuse for its horribly bad known-virus scanner,
InVircible provides an automatic scan string extractor, described by
the usual buzzword term, "hyper-correlator". The idea behind it is
that it is supposed to be given an infected file and a set of
suspected files in a particular subdirectory tree. It examines the
bytes near the entry point of the infected file and attempts to
determine which of the other suspected files resemble it and are
therefore infected by the same virus. In this way it is supposed to
scan for new viruses - or for known viruses that are simply unknown
to the virus-specific detector (IVSCAN).
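
To make the idea concrete, here is a minimal sketch of such an
entry-point correlator. It is not IVX's actual algorithm, which we
have not seen. The function names, the 32-byte window, the
byte-by-byte comparison, and the restriction to COM images that
begin with a single near JMP are all assumptions made for
illustration only.

```python
def com_entry_offset(image: bytes) -> int:
    """Return the file offset where a COM image effectively starts.

    Assumption: only one leading near JMP (opcode 0xE9, 16-bit
    displacement) is followed; anything else starts at offset 0.
    """
    if len(image) >= 3 and image[0] == 0xE9:
        rel = int.from_bytes(image[1:3], "little")
        # The displacement is relative to the end of the 3-byte JMP.
        return (3 + rel) & 0xFFFF
    return 0


def entry_bytes(image: bytes, window: int = 32) -> bytes:
    """Bytes at the effective entry point (hypothetical window size)."""
    start = com_entry_offset(image)
    return image[start:start + window]


def correlation(a: bytes, b: bytes) -> float:
    """Fraction of positions at which the two byte strings agree."""
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def suspects(sample: bytes, candidates: dict, threshold: float = 0.20,
             window: int = 32) -> dict:
    """Return {name: score} for candidates that reach the threshold."""
    ref = entry_bytes(sample, window)
    hits = {}
    for name, image in candidates.items():
        score = correlation(ref, entry_bytes(image, window))
        if score >= threshold:
            hits[name] = score
    return hits


# Synthetic data: two hosts carrying the same appended body, one clean file.
virus_body = bytes(range(0x40, 0x60))
sample = b"\xE9\x05\x00" + b"HOSTA" + virus_body
directory = {
    "b.com": b"\xE9\x07\x00" + b"HOSTBBB" + virus_body,
    "clean.com": b"\x90" * 40,
}
hits = suspects(sample, directory)
print(hits)
```

Any scheme of this kind stands or falls on two things: whether the
bytes at the entry point really belong to the virus, and whether
they are the same from one infected file to the next.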

First of all, this idea is definitely not new. One of the first
anti-virus products to use it was Victor Charlie. Similar
capabilities are available in the registered version of TBAV. The
anti-virus researchers at IBM's High Integrity Lab at the T. J.
Watson research center use a similar idea but on a much more
scientific basis to speed up their work in picking good scan
strings for the new viruses that their scanner has to handle
([Autoextract]). However, they are sensible enough to understand
that their method, although scientifically much more developed than
the approach in InVircible, is too unreliable to be put in the hands
of the general user; it is used only internally, by their
anti-virus experts who know what they are doing. Finally, within
CARO we often use a similar method, computing the ratio of
common substrings to determine whether two viruses belong to the
same virus family.
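
A rough sketch of what a common-substring ratio can look like is
given below. It uses the SequenceMatcher class from Python's
standard difflib module as a stand-in; this is not the tool used
within CARO, and the byte strings are made-up placeholders, not
code from any real virus.

```python
from difflib import SequenceMatcher


def common_ratio(a: bytes, b: bytes) -> float:
    """2 * (bytes in matching blocks) / (len(a) + len(b)), in [0, 1]."""
    # autojunk=False disables difflib's popularity heuristic, which
    # would otherwise ignore very frequent bytes in long inputs.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Placeholder byte strings (hypothetical, for illustration only).
variant_a = bytes.fromhex("b440bb0100b92000ba0001cd21")
variant_b = bytes.fromhex("b440bb0100b93000ba0001cd21")  # one byte differs
unrelated = bytes(range(0x60, 0x6D))

print(common_ratio(variant_a, variant_b))
print(common_ratio(variant_a, unrelated))
```

A ratio close to 1 suggests a family relationship and a ratio close
to 0 suggests none, but where to draw the line in between is a
matter of expert judgment, which is exactly why such a method is
risky in the hands of the general user.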

Secondly, this idea is seriously flawed. It assumes that the virus
always receives control at the entry point of the file it has
infected. For many viruses (e.g., Omud, Lucretia, and so on), this
is not the case.

Finally, this method fails miserably when it encounters a
polymorphic virus. To prove this, we took a file infected by the
One_Half.3544 virus (a polymorphic virus known to be in the wild)
and instructed IVX to look for other infected files in a directory
containing a couple of dozen of them. Using the default
correlation factor of 20%, not even a single infected file was
detected.

While we did expect poor performance (One_Half is polymorphic),
we were surprised that it was so bad. By comparison, when tested on
MtE- or TPE-based polymorphic viruses (which are much more
polymorphic than One_Half), the program succeeded in detecting some
(but not all) of the infected files with a correlation factor of
30-40%. However, it also gave one false positive when run on a
directory containing some perfectly innocent files.

Other viruses whose polymorphic mechanism seemed to defeat IVX
even better than MtE or TPE were Neuroquila and Tremor - again to
our surprise, because, while polymorphic, they are not as
polymorphic as the MtE- or TPE-based viruses.

Anyway, it is obvious that IVX can be easily defeated and should
not be relied upon.

However, besides being useless in detecting polymorphic viruses,
IVX can often cause dangerous false positives. To demonstrate this,
we performed the following tests.

First, we used a prepending parasitic virus written in a
high-level language and spreading in LZEXE-compressed form -
HLLP.3680. We instructed IVX to use one infected file as a sample
and to look for other infections in the directory where
InVircible's programs were installed. The program reported that
almost all programs in that directory (except IVHELP.EXE) seemed to
contain the same virus - with a confidence of 76%! Needless to say,
none of them was actually infected. They were just compressed
with the same executable file compressor (LZEXE 0.91) as the virus.

Next, we used a companion virus written in a high-level language -
HLLC.Globe.7705, a virus that spreads in PKLite-compressed form. We
gave a sample of this virus to IVX to examine and instructed it to
search the DOS directory for other infections. We use DR-DOS 6.0,
and most files in its DOS directory are compressed with PKLite,
just like the virus. As we expected, IVX reported
most of them as "infected" again - but this time the "confidence
factor" often reached 100%!
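
The mechanism behind these false positives is easy to reproduce on
synthetic data. In the sketch below, STUB is a made-up placeholder
standing in for a compressor's decompression routine (it is not the
real LZEXE or PKLite code), and the byte-by-byte correlation
function and the 32-byte window are our own assumptions, not IVX's
algorithm. The score it yields is a property of the toy data and is
not meant to reproduce the figures reported above.

```python
WINDOW = 32  # hypothetical number of entry-point bytes compared

# Placeholder for the decompressor code that a given compressor puts
# at the entry point of every file it processes.
STUB = bytes(range(0x10, 0x28))  # 24 bytes


def packed(payload: bytes) -> bytes:
    """Toy 'compressed executable': shared stub followed by the payload."""
    return STUB + payload


def correlation(a: bytes, b: bytes) -> float:
    """Fraction of positions at which the two byte strings agree."""
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


infected = packed(b"\xAA" * 16)       # compressed virus sample
clean_packed = packed(b"\x55" * 16)   # innocent file, same compressor
clean_plain = b"\x90" * 40            # innocent file, not compressed

score_packed = correlation(infected[:WINDOW], clean_packed[:WINDOW])
score_plain = correlation(infected[:WINDOW], clean_plain[:WINDOW])
print(score_packed, score_plain)
```

The clean compressed file scores high simply because most of the
compared bytes belong to the shared stub, while the content that
actually distinguishes virus from innocent program lies beyond it.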

We definitely do not think that in a real-life situation the
average user will be able to figure out that some of the files are
infected, while others are perfectly clean and just compressed with
the same file compression utility. The levels of both false
positives and false negatives are simply unacceptably high and make
IVX too unreliable and unsuitable for virus detection purposes.