Comparison of PHP scanners
Hi there!
I have recently looked at several compromised websites on GitHub, mostly running outdated WordPress/Joomla/Drupal versions. In these cases, I often have to go through many files to find the malicious ones, whether they were added to the website as new files or injected into legitimate files. Here is a short summary of the different tools available to detect them.
ClamAV
ClamAV is an open-source antivirus available on several platforms, and one of the few antivirus tools commonly used on Linux. It has a complex signature format, with many open signatures provided by the community (more than 3,700,000 according to Wikipedia). And the good news is that it has supported Yara rules since last year!
It is possible to convert some of the ClamAV signatures to Yara thanks to the nice script published in the Malware Analyst's Cookbook by Michael Ligh, Steven Adair, Blake Hartstein and Matthew Richard:
sigtool -u main.cvd
python clamav_to_yara.py -f main.ndb -o main.yara
ClamAV has several signatures related to PHP backdoors, like this nice one in daily.ldb (logical signatures in .ldb files are not converted by the previous script):
Php.Malware.Mailbot-45;Engine:51-255,Target:7;0&1;6563686F207068705F6F732E{-35}275D2830393837363534333231292E;6563686F207068705F6F732E{-35}275D2832323232323232323232292E
In short, this is a ClamAV logical signature, with the format NAME;INFOS;CONDITION;PATTERN1;PATTERN2…
- The infos field describes which ClamAV engine versions can read this signature and the type of target file (7 means ASCII text file).
- The condition here is 0&1, so both patterns must be present for the signature to match.
- Patterns are in hexadecimal format, and {-XX} is a wildcard that matches at most XX arbitrary bytes at that position. Apparently signatures on ASCII files are case-insensitive.
So we can convert this signature to the following Yara rule:
rule php_malware_mailbot_45 {
    strings:
        $a = /echo php_os\..{,35}'\]\(0987654321\)\./ nocase
        $b = /echo php_os\..{,35}'\]\(2222222222\)\./ nocase
    condition:
        all of them
}
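This kind of conversion can be automated, at least for simple cases. Here is a minimal Python sketch of the idea. It is not the script from the book: the function names (`escape_literal`, `hex_to_yara_regex`, `ldb_to_yara`) are made up for this example, and it assumes the signature only uses the `{-N}` wildcard and a condition that ANDs all patterns together (like `0&1`).

```python
import re

# Characters escaped in the generated regex ("/" too, since Yara uses it as delimiter).
SPECIAL = set("\\^$.|?*+()[]{}/")


def escape_literal(data: bytes) -> str:
    """Escape raw bytes so they match literally inside a Yara regex."""
    out = []
    for byte in data:
        char = chr(byte)
        if char in SPECIAL:
            out.append("\\" + char)
        elif 0x20 <= byte < 0x7F:
            out.append(char)
        else:
            # Non-printable bytes are written as \xNN escapes.
            out.append("\\x%02x" % byte)
    return "".join(out)


def hex_to_yara_regex(subsig: str) -> str:
    """Convert a hex pattern with {-N} gaps into a regex body."""
    parts = []
    # The capturing group keeps the {-N} tokens in the split result.
    for token in re.split(r"(\{-\d+\})", subsig):
        gap = re.fullmatch(r"\{-(\d+)\}", token)
        if gap:
            parts.append(".{,%s}" % gap.group(1))
        elif token:
            parts.append(escape_literal(bytes.fromhex(token)))
    return "".join(parts)


def ldb_to_yara(signature: str) -> str:
    """Build a Yara rule from a logical signature whose condition is 0&1&..."""
    name, _info, condition, *subsigs = signature.split(";")
    expected = "&".join(str(i) for i in range(len(subsigs)))
    if condition != expected:
        raise ValueError("unsupported condition: " + condition)
    rule_name = re.sub(r"\W", "_", name).lower()
    lines = ["rule %s {" % rule_name, "    strings:"]
    for index, subsig in enumerate(subsigs):
        regex = hex_to_yara_regex(subsig)
        lines.append("        $s%d = /%s/ nocase" % (index, regex))
    lines += ["    condition:", "        all of them", "}"]
    return "\n".join(lines)
```

Calling `ldb_to_yara()` on the Mailbot signature above produces the same two regular expressions as the hand-written rule. Anything beyond this simple case (other wildcards, OR conditions, offsets) raises an error here instead of being converted.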
linux-malware-detect
Linux Malware Detect uses signatures extracted from ClamAV and other tools to detect malware. It is mainly based on MD5 hashes of malicious files, and thus not very reliable, since any change to a file changes its hash. The tool has apparently not been updated since 2013, and according to the documentation it contains 8,908 MD5 / 1,914 signatures.
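To illustrate why hash-based detection is fragile, here is a small Python sketch. It does not reproduce how Linux Malware Detect works internally: the helper names and the sample webshell are made up for the example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of some file content as a hex string."""
    return hashlib.md5(data).hexdigest()


def is_known_malware(data: bytes, known_hashes: set) -> bool:
    """Hash-based detection: flag content only if its exact hash is known."""
    return md5_hex(data) in known_hashes


# Hypothetical sample and signature database.
shell = b"<?php @eval($_POST['x']);"
known_hashes = {md5_hex(shell)}

# The exact sample is detected...
print(is_known_malware(shell, known_hashes))
# ...but appending a single space is enough to evade the hash.
print(is_known_malware(shell + b" ", known_hashes))
```

This also means a backdoor injected into a legitimate file will practically never match, because the hash covers the whole file.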
php-malware-finder
php-malware-finder is a tool developed by NBS System, based on a simple shell script that relies mainly on Yara signatures to detect malicious files.
This tool has some nice rules, for instance for detecting obfuscated PHP:
rule ObfuscatedPhp
{
    strings:
        $eval = /(<\?php|[;{}])[ \t]*@?(eval|preg_replace|system|assert|passthru|(pcntl_)?exec|win_shell_execute|call_user_func(_array)?)\s*\(/ nocase // ;eval( <- this is dodgy
        $b374k = "'ev'.'al'"
        $align = /(\$\w+=[^;]*)*;\$\w+=@?\$\w+\(/ //b374k
        $weevely3 = /\$\w=\$[a-zA-Z]\('',\$\w\);\$\w\(\);/ // weevely3 launcher
        $c99_launcher = /;\$\w+\(\$\w+(,\s?\$\w+)+\);/ // http://bartblaze.blogspot.fr/2015/03/c99shell-not-dead.html
        $variable_variable = /\${\$[0-9a-zA-z]+}/
        $too_many_chr = /(chr\([\d]+\)\.){5}/ // concatenation of more than two `chr()`
        $concat = /(\$[^\n\r]+\.){5}/ // concatenation of more than 5 words
        $var_as_func = /\$_(GET|POST|COOKIE|REQUEST)\s*\[[^\]]+\]\s*\(/
        $gif = /^GIF89/
    condition:
        any of them and not IsWhitelisted
}
The problem is that there are only generic methods (nothing designed for a specific sample), so it misses a lot of samples (many basic webshells, for instance), and some methods (like Dodgy php) generate a lot of false positives.
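To get a feeling for why generic patterns produce false positives, two of the regular expressions from the rule above can be tried in Python. This is only a rough sketch: it assumes Python's `re` module behaves like Yara's regex engine for these two patterns, and both code samples are made up for illustration.

```python
import re

# Two patterns copied from the ObfuscatedPhp rule above.
CONCAT = re.compile(r"(\$[^\n\r]+\.){5}")
VAR_AS_FUNC = re.compile(r"\$_(GET|POST|COOKIE|REQUEST)\s*\[[^\]]+\]\s*\(")

# Hypothetical samples: harmless string building, and a tiny webshell.
legit = "$url=$scheme.$host.$port.$path.$query.$fragment;"
webshell = "<?php $_GET['f']($_GET['c']);"


def matched_patterns(code: str) -> list:
    """Return the names of the patterns that match the given PHP code."""
    names = []
    if CONCAT.search(code):
        names.append("concat")
    if VAR_AS_FUNC.search(code):
        names.append("var_as_func")
    return names


print(matched_patterns(legit))     # harmless code flagged by "concat"
print(matched_patterns(webshell))  # real backdoor flagged by "var_as_func"
```

The harmless URL concatenation is flagged just like the webshell, which is the kind of noise that ends up in the false positive column.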
php-malware-scanner
php-malware-scanner has been developed by the French company planet-work, likely to clean their own hosted websites, and the result is pretty good.
Their idea was to compute a score from different indicators often found in malicious PHP files. Here is a short list as an example:
- MANY_GLOBALS: contains $GLOBALS many times (+20)
- MD5_VAR: contains an MD5 variable (+2)
- VERY_LONG_LINE_EARLY: the file has a very long line near the beginning (+10)
- HAS_BASE64DECODE: calls base64_decode() or str_rot13()
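The scoring idea itself fits in a few lines. Here is a minimal, table-driven Python sketch of it, which is not the tool's actual code. The rule names and the +20/+10 weights come from the list above, and the 3000-character limit and the UA_GOOGLE weight come from the tool's output; everything else is an assumption of mine: the threshold of 10 occurrences for $GLOBALS, the "first 5 lines" window, and the weight of HAS_BASE64DECODE.

```python
from typing import Callable, List, Tuple

# (rule name, weight, check on the list of lines)
Rule = Tuple[str, int, Callable[[List[str]], bool]]

RULES: List[Rule] = [
    # Assumption: "many times" means at least 10 occurrences.
    ("MANY_GLOBALS", 20,
     lambda lines: sum(l.count("$GLOBALS") for l in lines) >= 10),
    # Assumption: "early" means within the first 5 lines.
    ("VERY_LONG_LINE_EARLY", 10,
     lambda lines: any(len(l) > 3000 for l in lines[:5])),
    ("UA_GOOGLE", 5,
     lambda lines: any("agent" in l.lower() and "google" in l.lower()
                       for l in lines)),
    # Assumption: the weight of 5 is invented for this example.
    ("HAS_BASE64DECODE", 5,
     lambda lines: any("base64_decode(" in l or "str_rot13(" in l
                       for l in lines)),
]


def score_php(source: str) -> Tuple[int, List[Tuple[str, int]]]:
    """Return the total score and the list of (rule, weight) that matched."""
    lines = source.splitlines()
    hits = [(name, weight) for name, weight, check in RULES if check(lines)]
    return sum(weight for _, weight in hits), hits
```

Each indicator is one entry in `RULES`, so adding a check means adding a line to the table rather than another branch.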
This approach is nice, but the problem is that the tool's code is a real mess, with a long list of if/else statements. Have a look:
if re.compile('.*=\s*"http://[a-z0-9].*";').match(l) or re.compile(".*=\s*'http:.*';").match(l) :
    if not 'simpletest.org' in l and not 'facebook.com' in l:
        has_var_http = True
if has_var_http and ('curl_exec' in l or 'xxxxxxxxxx' in l) and line_num < 20:
    score.append(('CURL_HTTP' ,''))
if line_num < line_early and 'call_user_func' in l:
    score.append(('HAS_CALL_FUNC_EARLY','line %i' % line_num))
if 'agent' in l.lower() and 'google' in l.lower():
    score.append(('UA_GOOGLE',''))
And even though they have added a few signatures directly in the code (hurrr), there are very few of them, and the tool often misses samples that are easy to detect. For instance, the following basic shell, added to a legitimate PHP file, would go undetected:
<?php @preg_replace('/(.*)/e', @$_POST['abcdef'], '');
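A specific signature for this kind of shell is not hard to write. As a rough sketch (my own pattern, not something taken from any of the tools above), the following Python regex looks for a `preg_replace()` call whose pattern uses the `e` modifier and whose replacement comes straight from user input. The names `PREG_E_SHELL` and `looks_like_preg_e_shell` are made up for the example.

```python
import re

PREG_E_SHELL = re.compile(
    r"""
    preg_replace\s*\(\s*
    (['"])                 # opening quote of the pattern argument
    (.)(?:(?!\2).)*\2      # delimiter, pattern body, same delimiter again
    [a-z]*e[a-z]*          # modifiers that include "e" (evaluate replacement)
    \1\s*,\s*              # closing quote, then the second argument
    @?\$_(?:GET|POST|COOKIE|REQUEST)\b  # replacement taken from user input
    """,
    re.IGNORECASE | re.VERBOSE,
)


def looks_like_preg_e_shell(code: str) -> bool:
    """Return True if the code contains a preg_replace /e user-input shell."""
    return PREG_E_SHELL.search(code) is not None


print(looks_like_preg_e_shell(
    "<?php @preg_replace('/(.*)/e', @$_POST['abcdef'], '');"))
print(looks_like_preg_e_shell(
    "<?php echo preg_replace('/a+/', 'b', $title);"))
```

It only covers the case where the superglobal is passed directly as the replacement, so a variant that goes through an intermediate variable would still slip through.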
Here is an example of a result from php-malware-scanner (you should add a --minscore option; 10 seems to be a good value):
{
    "score": 15,
    "filename": "/home/etienne/perso/fun/new-caffe/./public/wp-includes/images/crystal/epsg8vpeff.php",
    "cleanup": false,
    "details": [
        {
            "score": 5,
            "details": "",
            "rule": "UA_GOOGLE",
            "description": "V\u00e9rifie le User-Agent contre Google"
        },
        {
            "score": 10,
            "details": "line 1",
            "rule": "VERY_LONG_LINE_EARLY",
            "description": "Contient une ligne de plus de 3000 caract\u00e8res en d\u00e9but de fichier"
        },
        {
            "score": 0,
            "details": "1 lines",
            "rule": "FEW_LINES",
            "description": "Contient peu de lignes"
        }
    ],
    "mtime": 1463402470.307976,
    "ctime": 1463402470.307976
}
Comparison
I have tested these tools on a WordPress website containing 38 different malicious files (mainly simple backdoors, but also mass mailers and other malicious files).
Here are the results:
Tool | True Positive | False Positive |
---|---|---|
Linux Malware Detect | 4 / 39 | 0 |
ClamAV (default) | 4 / 39 | 0 |
php-malware-finder | 28 / 39 | 126 |
php-malware-scanner (minscore 10) | 10 / 39 | 0 |
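To compare the tools on the same scale, the table can be turned into a detection rate (recall) and a precision. A small Python sketch, using the numbers from the table as they are (including its denominator of 39); the `metrics` helper is just a name I picked:

```python
# (tool, true positives, false positives), copied from the table above.
RESULTS = [
    ("Linux Malware Detect", 4, 0),
    ("ClamAV (default)", 4, 0),
    ("php-malware-finder", 28, 126),
    ("php-malware-scanner (minscore 10)", 10, 0),
]
TOTAL_MALICIOUS = 39  # denominator used in the table


def metrics(tp: int, fp: int, total: int) -> tuple:
    """Return (recall, precision) for one tool."""
    recall = tp / total
    # Precision is undefined when nothing is flagged; report 0.0 in that case.
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


for tool, tp, fp in RESULTS:
    recall, precision = metrics(tp, fp, TOTAL_MALICIOUS)
    print("%-35s recall=%3.0f%%  precision=%3.0f%%"
          % (tool, recall * 100, precision * 100))
```

With these numbers, php-malware-finder finds about 72% of the malicious files, but only about 18% of the files it flags are actually malicious.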
So no tool is perfect, and there is likely room to improve these results.
Hasta Luego!