Hey CyberGeeks! Welcome to another episode of Can we scrape it? In this episode I will scrape the most well-known malware repository and perhaps the biggest go-to site for you geeks out there: VirusTotal.
VirusTotal is one of many applications, and one of the first, that gives you details about a binary. It has become a standard source of intelligence: some analysts pull their info from it, and other security tools even base their findings on this web app.
Indeed, VT is arguably the best online virus scanner and threat intel source out there. But what makes it the go-to site?
a. Flexibility - VT can scan not only suspected malware, but also give you information on IP addresses, websites, email attachments and file hashes.
b. Universal - VT runs the input through several antimalware engines, each of which decides whether the sample looks suspicious. From this comes the well-known 'detection ratio', the usual yardstick for whether a finding is a threat or not.
c. Community-driven - Thanks to its popularity, VT has earned a real role in the cybersec community. It puts people under a single roof, where they share what they know about a finding and give feedback on a detection.
d. Detail-oriented - From a sample's basic details (hashes, file type, file size, etc.), to its upload history, possible file names, signature verification, where it can be found on a host, PE imports and loaded DLLs, possible malware behaviors, and file metadata - VT does a tremendous job of giving you the complete picture.
e. Standard of threat intel - With all that said, this web app has become the standard of threat reputation in the community.
No application is perfect, though, and VT has its flaws. Here they are...
a. Task-y - Only one file/IP/URL can be scanned at a time.
b. No unlimited-size uploads - File uploads are capped at 32 MB.
c. Can it be scraped? Let's find out!
Can we?
Starting out with the basics: here's the fresh site, and we look into the form tag.
We have an input with id = searchInput, which is nice!
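If you'd rather confirm that from a script than from the browser's dev tools, here's a minimal sketch (assuming the requests and beautifulsoup4 packages) that fetches the landing page and prints the search input and its enclosing form:

```python
import requests
from bs4 import BeautifulSoup

# Fetch the landing page the same way a browser would.
resp = requests.get("https://www.virustotal.com/",
                    headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(resp.text, "html.parser")

# Look for the input we spotted in the inspector.
search_input = soup.find("input", id="searchInput")
if search_input is not None:
    print(search_input)                      # the <input id="searchInput" ...> element
    print(search_input.find_parent("form"))  # the <form> tag with its action and method
else:
    print("searchInput not found -- the page layout may have changed")
```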
Then we type in our sample specimen's hash.
We then examine the POST request that fired when we hit search. To get the information we can apply in our code, we look at that POST request, particularly the highlighted details below.
With the information gathered from that POST request, we apply it to our session request code below.
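Here's a minimal sketch of that session request. The endpoint path, form field name and specimen hash are placeholders standing in for whatever shows up in your own captured POST details, not VT's documented interface:

```python
import requests

SEARCH_URL = "https://www.virustotal.com/en/search/"   # placeholder: use the URL from the captured POST
HEADERS = {
    "User-Agent": "Mozilla/5.0",                        # mimic the browser that made the original request
    "Referer": "https://www.virustotal.com/",
}

def submit_query(session, query):
    """POST a hash (or IP/URL) the same way the search form does."""
    data = {"query": query}                             # placeholder field name from the captured POST body
    resp = session.post(SEARCH_URL, data=data, headers=HEADERS)
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    sample_hash = "d41d8cd98f00b204e9800998ecf8427e"    # hypothetical specimen hash
    with requests.Session() as session:
        session.get("https://www.virustotal.com/", headers=HEADERS)   # pick up cookies first
        html = submit_query(session, sample_hash)
        with open("output.txt", "w", encoding="utf-8") as f:          # the response we inspect next
            f.write(html)
```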
We then open the resulting web server response, output.txt...
And it seems we're in! Above is the HTML response we get, and with that we can parse everything we need from the result.
To keep it simple, we just need the hash, the file name, the detection ratio and the analysis date. Let's try an input file that holds all the hashes so we can feed them to VT automatically. Say we have the inputs below.
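Reading a list like that is the easy part. A short sketch, assuming the hashes sit one per line in a file called hashes.txt (the file name is just an example):

```python
def load_hashes(path="hashes.txt"):
    """Read one hash per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Each of these gets fed to submit_query() from the earlier sketch.
for file_hash in load_hashes():
    print(file_hash)
```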
Everything we need lies in the div with class = "span8 columns". Supplementing the code above, we parse out the parameters we need, since all of them live inside that div.
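Here's a sketch of that parsing step with BeautifulSoup. Since the inner markup of the result block isn't pinned down here, it just flattens the div to text lines and leaves the exact field extraction to you:

```python
from bs4 import BeautifulSoup

def parse_result(html):
    """Pull the summary block out of the HTML saved from the search POST."""
    soup = BeautifulSoup(html, "html.parser")
    summary = soup.select_one("div.span8.columns")   # the block that holds everything we need
    if summary is None:
        return []
    # The hash, file name, detection ratio and analysis date all live in this div;
    # flattening it to stripped text lines makes them easy to pick out.
    return list(summary.stripped_strings)

with open("output.txt", encoding="utf-8") as f:
    for line in parse_result(f.read()):
        print(line)
```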
We then run it, and we get the result below...
So yeah! Having done all that, we can conclude that VT can be scraped. Now, is that a con or a pro? Our answer is a bit of both, but leaning neutral: VT is a publicly accessible site we pull threat info from, and what we did is just automation. Cool stuff! =)
That will be all for this episode. We did scrape VT, so VT is checked!
Tune in for the next scraping episode. Au revoir!