Sunday, July 4, 2021

File Scanning Project

 

Some things in technology make more sense when you see them working end to end rather than as individual pieces, because that is how you learn how they are stitched together.

 

I am going to talk about a small project I created recently. It scans a file from a storage container using a third-party antivirus API (OPSWAT). After the file is scanned, it is moved to a separate success container if it is clean, or to an infected-files container if it is infected, so that it cannot infect other existing files.


Prerequisites:

 

Create a free/paid account on OPSWAT:

Once you create an account on OPSWAT, you will receive an API key for calling the antivirus API.



 

I will be using two APIs for scanning:

i) Scanning a file by file upload: https://onlinehelp.opswat.com/mdcloud/2.1_Scanning_a_file_by_file_upload.html

ii) Retrieving scan reports using data ID: https://onlinehelp.opswat.com/mdcloud/2.2_Retrieving_scan_reports_using_data_ID.html

Getting a scan report is a two-step process. In the first call, the file is uploaded and OPSWAT returns an asynchronous response containing a data_id (the upload and scan are still in progress). In the second call, we fetch the scan report using the data_id returned by the first call.
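The two calls can be sketched in Python with only the standard library. This is a minimal sketch, not the Logic App itself: the helper names `build_upload_request`, `build_report_request` and `send` are hypothetical, and the endpoint and `apikey` header are taken from the curl examples in this post. Building each request is kept separate from sending it, so the shape of the calls can be checked without contacting OPSWAT.

```python
import urllib.request

# Base URL taken from the curl examples in this post.
BASE_URL = "https://api.metadefender.com/v4/file"


def build_upload_request(api_key: str, file_bytes: bytes) -> urllib.request.Request:
    """Call 1 (hypothetical helper): POST the raw file bytes for scanning."""
    return urllib.request.Request(
        BASE_URL,
        data=file_bytes,
        method="POST",
        headers={"apikey": api_key, "content-type": "application/octet-stream"},
    )


def build_report_request(api_key: str, data_id: str) -> urllib.request.Request:
    """Call 2 (hypothetical helper): GET the scan report for the data_id from call 1."""
    return urllib.request.Request(
        f"{BASE_URL}/{data_id}",
        method="GET",
        headers={"apikey": api_key},
    )


def send(request: urllib.request.Request) -> bytes:
    """Send a request and return the raw response body (requires network access)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

For example, `send(build_upload_request(key, data))` should return a JSON body like the 1st call response below, and its data_id is what goes into `build_report_request`.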

 

1st call example:

Request:


curl -X POST https://api.metadefender.com/v4/file \
-H "apikey: ${APIKEY}" \
-H 'content-type: application/octet-stream' \
--data-binary @/path/to/data.file



 

Response:



{
  "data_id": "bzIwMDExN1NreHhiOG9RSmJVU2tXeFdJc1FKWjg",
  "status": "inqueue",
  "in_queue": 1,
  "queue_priority": "normal",
  "sha1": "068AE4D07A7F4FE2BF955CBA0FD05AB0A5A8A6FE",
  "sha256": "67C6BCEE6FFCEFA887E415CBF0247C2788696169B58EB66319F558DDB6822D9D"
}
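The only field the second call needs from this response is data_id. A minimal sketch of extracting it before the message moves on (the function name `extract_data_id` is hypothetical); it fails loudly rather than queueing a message that can never be resolved:

```python
import json


def extract_data_id(response_body: str) -> str:
    """Return the data_id from the 1st-call response, or raise if it is missing."""
    payload = json.loads(response_body)
    data_id = payload.get("data_id")
    if not data_id:
        raise ValueError("upload response has no data_id; cannot fetch a report")
    return data_id


# Trimmed copy of the sample response above.
sample = """{
  "data_id": "bzIwMDExN1NreHhiOG9RSmJVU2tXeFdJc1FKWjg",
  "status": "inqueue",
  "in_queue": 1
}"""
print(extract_data_id(sample))
```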



 

2nd call example:

Request:

curl -X GET \
https://api.metadefender.com/v4/file/ZTE2MTIyNkhKeGs5WElSNHhIMVFGLVlUYk85LP \
-H "apikey: ${APIKEY}"


Response:

{
  "scan_result_history_length": 8,
  "file_id": "bzE5MDIxMkJ5RXEtWmdIRQ",
  "data_id": "bzE5MDIxMkJ5RXEtWmdIRXJrR2VlNmhNWUg0",
  "sanitized": {
    "result": "Allowed",
    "progress_percentage": 100,
    "data_id": "ZDE5MDIyMEJ5RXEtWmdIRS5zYW5pdGl6ZWRCeWx1a2Fic1NF",
    "reason": ""
  },
  "process_info": {
    "result": "Blocked",
    "profile": "Sanitize",
    "post_processing": {
      "copy_move_destination": "",
      "converted_to": "xls",
      "converted_destination": "Ft._immediata_group_7893_2019_02_sanitized_by_OPSWAT_MetaDefender_779e0a0966f348fcaecdacc4f6c47e16.xls",
      "actions_ran": "Sanitized",
      ...
  },
  "rescan_available": true,
  "scan_all_result_i": 1,
  "start_time": "2019-02-20T17:22:27.058Z",
  "total_time": 1166,
  "total_avs": 37,
  "total_detected_avs": 17,
  "progress_percentage": 100,
  "scan_all_result_a": "Infected"
  ...
  "share_file": 1,
  "rest_version": "4",
  "additional_info": [],
  "votes": {
    "up": 0,
    "down": 0
  }
}
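The getfilescanreport Logic App has to turn a report like this into a decision. Below is a minimal Python sketch of that decision, under two assumptions of mine: in the full OPSWAT response these fields are nested under a scan_results object (the excerpt above elides that), and scan_all_result_i of 0 means no threat was detected. The sketch treats any other value as unsafe, and returns "pending" while progress_percentage is below 100, since the second call can arrive before the scan has finished. The constant and function names are hypothetical.

```python
from typing import Any, Dict

CLEAN = "scanned-success-files"
INFECTED = "infected-files"
PENDING = "pending"


def classify_report(report: Dict[str, Any]) -> str:
    """Map a 2nd-call scan report to a target container name, or PENDING."""
    # Assumption: the full response nests these fields under "scan_results";
    # fall back to the top level to match the trimmed excerpt above.
    results = report.get("scan_results", report)
    if results.get("progress_percentage", 0) < 100:
        return PENDING  # scan not finished yet; fetch the report again later
    # Assumption: 0 means "no threat detected". Anything else (1 = "Infected"
    # in the sample) is quarantined, which errs on the side of caution.
    if results.get("scan_all_result_i") == 0:
        return CLEAN
    return INFECTED
```

Quarantining every non-zero result means a file that failed to scan also lands in infected-files; that is a deliberate choice in this sketch, not necessarily what the original Logic App does.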



 

Create a free/paid account on Azure portal

If you don’t already have an Azure subscription, you can create a free or pay-as-you-go account at https://portal.azure.com/

 

Integration Solution Resources:

To build this solution, we need to create the following resources:

·       A storage account with three blob containers:

o   toscan

o   scanned-success-files

o   infected-files



 

·       A Key Vault to store the OPSWAT API key

 



 

·       A Service Bus namespace with a queue



 

Integration Solution:




Let’s walk through the diagram above step by step to understand how the pieces are connected and how they work together.

 

1)     Upload the file you want to scan to the toscan container.

2)     As soon as the file is uploaded to the toscan container, it triggers the “scanfile” Logic App (which polls for added or modified blobs every 10 seconds) to process the file for scanning.

3)     The scanfile Logic App gets the OPSWAT API key from Key Vault and makes an HTTP call to the OPSWAT API to upload and scan the blob we placed in toscan.

4)     The OPSWAT API receives the request and sends an asynchronous response.

5)     As soon as we receive the response containing the data_id from OPSWAT, the same response message is pushed to the Service Bus queue for further processing.

6)     Once the message arrives in the Service Bus queue, it triggers another Logic App called “getfilescanreport” (which polls for messages every 10 seconds).

7)     The getfilescanreport Logic App makes the second call to OPSWAT to fetch the scan report, using the data_id received from the first OPSWAT API call.

8)     OPSWAT receives the second request and returns the scan report in the HTTP response.

9)     The getfilescanreport Logic App evaluates the response and decides which container the file should be stored in.

10)  If the file is infected, it is stored in the infected-files container; otherwise it is stored in scanned-success-files. In the last step, the file is deleted from the toscan container because processing is complete.
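To see the steps above end to end without Azure or OPSWAT, here is an in-memory simulation in Python. It is only a sketch of the flow, not the Logic App definitions: dictionaries stand in for the three blob containers, a deque stands in for the Service Bus queue, and a fake scanner stands in for OPSWAT. All function names are hypothetical. One assumption worth calling out: the OPSWAT upload response does not contain the blob name, so the queued message has to carry it somehow; here I simply add it next to data_id.

```python
from collections import deque
from typing import Callable, Deque, Dict

# Hypothetical in-memory stand-ins: container name -> {blob name -> bytes}.
Containers = Dict[str, Dict[str, bytes]]


def scanfile(containers: Containers, queue: Deque[dict], blob_name: str,
             upload: Callable[[bytes], dict]) -> None:
    """Steps 2-5: send the new blob to the scanner and queue the async response."""
    response = upload(containers["toscan"][blob_name])
    # Assumption: carry the blob name so the next step knows which file to move.
    queue.append({"blob_name": blob_name, "data_id": response["data_id"]})


def getfilescanreport(containers: Containers, queue: Deque[dict],
                      fetch_report: Callable[[str], dict]) -> str:
    """Steps 6-10: fetch the report, copy the blob to its target, delete the original."""
    message = queue.popleft()
    report = fetch_report(message["data_id"])
    # Assumption: scan_all_result_i == 0 means no threat was detected.
    target = "scanned-success-files" if report["scan_all_result_i"] == 0 else "infected-files"
    name = message["blob_name"]
    containers[target][name] = containers["toscan"][name]
    del containers["toscan"][name]
    return target


# Fake scanner for illustration only: anything containing b"EICAR" counts as infected.
reports: Dict[str, dict] = {}


def fake_upload(data: bytes) -> dict:
    data_id = f"id-{len(reports)}"
    reports[data_id] = {"scan_all_result_i": 1 if b"EICAR" in data else 0}
    return {"data_id": data_id, "status": "inqueue"}


containers: Containers = {"toscan": {"report.xlsx": b"quarterly numbers"},
                          "scanned-success-files": {}, "infected-files": {}}
queue: Deque[dict] = deque()
scanfile(containers, queue, "report.xlsx", fake_upload)
print(getfilescanreport(containers, queue, lambda d: reports[d]))  # scanned-success-files
```

Splitting the work at the queue mirrors the two Logic Apps: scanfile returns as soon as OPSWAT accepts the upload, and getfilescanreport can run later, when the report is more likely to be ready.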

 

 

Most of the steps above are straightforward, so let’s look at the Logic Apps, where I made the connections and wrote the business logic.

 

 

scanfile Logic App:


 


getfilescanreport Logic App:




Running Example:

Upload a File



The file is moved to the scanned-success-files container because it is not infected.


