Having spent a couple of months recovering from an illness (not corona), I’ve had plenty of time to look over the latest release of this unusual product. I have been using Infocyte for several years, both in my lab and at a client site. What it does is not new: there are other tools that gather forensic data from individual devices over the network. None, however, is as comprehensive, accurate or unobtrusive as this one. It focuses on files, so it might be thought of as a very advanced antimalware tool, but it really is a lot more than that.
I have explained the various aspects of artificial intelligence in the security field elsewhere, but for the sake of analyzing Infocyte here we go again, a bit more briefly this time. For our purposes I focus on the key aspect of AI: machine learning (ML). There are three primary types with which we are concerned: supervised, unsupervised and reinforcement learning. Data scientists have described at least eleven more types, among them transductive learning, which describes pretty well what is going on in the Infocyte ML engine.
Supervised learning occurs when the ML algorithms use an existing, labeled dataset as their training set. Unsupervised learning occurs when the algorithms have no such pre-existing dataset and must build their own picture of the data from observation, based upon a model and the application of their learning algorithms. Reinforcement learning does not start with a training set either. Like unsupervised learning it must derive its own data, but here the system must also learn what to do with that data, guided by feedback on the actions it takes. Transductive learning is much like supervised learning but uses statistical algorithms to predict results directly from a large number of examples. You will see how this impacts Infocyte shortly.
Our last piece of background knowledge comes from the core formula that defines machine learning. As you will see it is very simple. However, there is an aspect that makes that apparent simplicity deceptive. The formula is:
Y = f(X)
…where Y is the dependent variable that we are seeking, X is the independent variable that we are given and f is a function that balances the equation. Sometimes we cannot balance the equation exactly, so we add an error term (e) to make up the difference:
Y = f(X) + e
This is a simple equation of the kind we probably learned early in high school algebra. The catch is that we know neither Y nor f. Y will always be the outcome of the equation, while f will certainly change as X changes. In our example, X comes from the data that Infocyte collects and f is the algorithm (or algorithms) developed by Infocyte. These algorithms are statistical in most cases, largely Bayesian. Bayes’ Theorem is predictive in that it predicts future outcomes based upon past observations. Now, on to Infocyte.
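One short aside before the product itself. To make the Bayesian idea concrete, here is a minimal Python sketch of a single Bayes’ Theorem update: how the probability that a file is malicious shifts after one suspicious observation. The function name and every number in it are illustrative assumptions of mine, not anything taken from Infocyte’s engine, whose internals are not public.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(malicious | flagged) using Bayes' Theorem.

    prior               -- P(malicious) before the observation
    likelihood          -- P(flagged | malicious)
    false_positive_rate -- P(flagged | benign)
    All three values are hypothetical inputs chosen for illustration.
    """
    # Total probability of seeing the flag at all (the evidence term).
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("the observation has zero probability under these inputs")
    return likelihood * prior / evidence


if __name__ == "__main__":
    # Assumed numbers: 1% of files are malicious, the flag fires on 90% of
    # malicious files and on 5% of benign ones.
    after_one_flag = posterior(0.01, 0.90, 0.05)
    # Yesterday's posterior becomes today's prior when a second,
    # independent flag is observed.
    after_two_flags = posterior(after_one_flag, 0.90, 0.05)
    print(f"after one flag:  {after_one_flag:.3f}")
    print(f"after two flags: {after_two_flags:.3f}")
```

The point of the sketch is the feedback loop: each past observation changes the starting belief for the next one, which is the sense in which a Bayesian approach predicts from past observations.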
Infocyte, at its core, is an over-the-network device forensics tool. It uses agents on various devices to collect and report forensic data. Those agents can be placed permanently by the administrator or can be generated at the time of analysis. After use, those agents are self-dissolving. In either case they are very lightweight and do not interfere with the efficiency of the network or the device under test. In its latest incarnation it not only reports but can also alert in near real time, depending on the frequency of testing.
Infocyte allows the administrator to build up domains based upon the organization of the enterprise and then test those domains individually. That allows staggered testing, one or more domains at a time, reducing any possible impact on a very large, distributed enterprise. Infocyte’s analysis is broken up into two groups: threats, and vulnerabilities and risks. These are neatly summarized on a dashboard. See Figure 1.
Figure 1 – Infocyte Dashboard
There is a well-organized top pull-down menu that addresses alerts, discover, analyze and report. Each of these takes you to a new section. We begin at the beginning: discover. I have set up a simple four-device network on the domain, CDFS-Honey. This is my honeynet and I have limited the active devices for simplicity. There are four hosts: deb, maeconsole(1), drstephenson-pc and dc01.centerd1.local. Of these, two were accessible and two were not at the last discovery scan. See Figure 2.
Figure 2 – Infocyte Discovery Scan on CDFS-Honey
Our second stop is, logically, analyze. After running analyze against our target group we get results that are extremely comprehensive. See Figure 3.
Figure 3 – Analyze Results for CDFS-Honey
As you can see, there is a lot of information here. Some of the more important items forensically are autostarts, connections, memory and processes. These are things in which any forensic analyst would be interested. Don’t discount the rest, of course. Drilling down, we can get more detail. We’ll drill down to host dc01.centerd1.local. dc01 is a domain controller. See Figure 4.
Figure 4 – Connections for dc01.centerd1.local
There are two types of connections shown here. The first is to 3.229.46.33 : 443. This is the connection to Infocyte. The second is all zeros. This is an internal connection. You can examine these connections to determine their importance and relevance to your forensic exam of the overall device. Let’s take one more look at dc01.centerd1.local, running processes this time. See Figure 5.
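For readers who export connection data and want a first-pass sort of the endpoints, the following is a rough Python sketch using the standard-library ipaddress module. It is my own illustration, not an Infocyte feature or API. The function name and the three labels are hypothetical, and it handles IPv4 "address : port" strings only.

```python
import ipaddress


def classify_endpoint(endpoint: str) -> str:
    """Label an IPv4 'address : port' string as unspecified, internal or external.

    The labels are illustrative assumptions, not Infocyte terminology.
    """
    # Split off the port at the last colon; spaces around it are tolerated.
    host, _, _port = endpoint.rpartition(":")
    address = ipaddress.ip_address(host.strip())
    # Check the all-zeros address first, because the ipaddress module also
    # reports it as private.
    if address.is_unspecified:
        return "unspecified"
    if address.is_loopback or address.is_private:
        return "internal"
    return "external"


if __name__ == "__main__":
    # The first address is the one quoted above; the port on the second
    # entry is an assumed placeholder.
    for endpoint in ("3.229.46.33 : 443", "0.0.0.0 : 0"):
        print(endpoint, "->", classify_endpoint(endpoint))
```

A sort like this only tells you where to look first. Whether a given connection matters is still the analyst’s call.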
Figure 5 – dc01.centerd1.local Running Processes
Note that these were the processes running at the time of Infocyte’s snapshot of the device. We see that armsvc.exe is labeled “suspicious”. We can drill down on that file and see what Infocyte found. See Figure 6.
Figure 6 – Details of Running Process armsvc.exe
There is a lot of useful information here to help answer the question, “is this file malicious?” Starting on the top left, there is a process score of 2. We then see that it does have a signature but (drilling down to score details) that it is deemed suspicious based upon its behavior. Threat intelligence shows how other programs, typically antimalware programs, view the file. Only one, Webroot (view all), thinks that it is malicious.
The synapse score is the degree of maliciousness or risk based upon features and characteristics as analyzed by Infocyte. It ranges from lower than -1 (bad) to higher than +1 (good). The synapse score here is 1.31, probably good. Entropy, an important part of Infocyte’s analysis, is 6.415. Entropy scores range from 0 to 8, and any score higher than 7.2 indicates that the file likely uses a cryptor or a packer. This is a pretty sure indication of malware.
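The 0 to 8 range matches what you get from Shannon entropy measured in bits per byte, so a rough way to reproduce a comparable number yourself is sketched below in Python. This assumes, without confirmation from the vendor, that Infocyte’s figure is a byte-level Shannon entropy; the function names are my own, and only the 7.2 threshold comes from the text above.

```python
import math
from collections import Counter

# Threshold quoted in the review; everything else here is an assumption.
PACKED_THRESHOLD = 7.2


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(data: bytes, threshold: float = PACKED_THRESHOLD) -> bool:
    """Return True when the entropy exceeds the threshold."""
    return shannon_entropy(data) > threshold


if __name__ == "__main__":
    samples = {
        "repeated byte": b"A" * 1024,
        "plain text": b"The quick brown fox jumps over the lazy dog. " * 20,
        "every byte value equally often": bytes(range(256)) * 4,
    }
    for name, blob in samples.items():
        print(f"{name}: {shannon_entropy(blob):.3f} packed={looks_packed(blob)}")
```

Compressed or encrypted content spreads byte values almost evenly, which pushes the score toward 8; ordinary text and most unpacked code sit well below that.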
Figure 7 – Partial Trace of Activity on dc01.centerd1.local
There is a lot of information that a forensic analyst needs, and this information is very tedious to extract. At a client site I had several years’ experience doing manual forensics on the client’s servers. On average it took a day to image a single server and anywhere from several days to weeks to analyze it. I ran Infocyte on the same networks, found a couple of things that needed attention, and finished the entire network (about 700 servers and 400 endpoints) in under eight hours, including about an hour for my analysis of the results. I saw things that I could not have seen without performing memory forensics on the devices, which can be tedious and lengthy. If for no other reason (and there are plenty), the efficiency of Infocyte makes it well worth having. If you add the detail available, the multitude of analytical functions thanks to its machine learning, its ease of use and its minimal impact on devices and networks, you have the leader in over-the-network forensic tools.
OPINION
As you can see, there is a lot to learn about a device using Infocyte. Having used Infocyte as well as other similar tools in production, I can say with confidence that Infocyte is best-of-breed. The application of Bayesian-based machine learning, the convenient layout of the user interface and the comprehensive analysis (replicated in the reports that you can generate, emphasizing the functionality or audience you want, such as by threat, by host or by vulnerability) make this a first-rate tool. If we add vulnerabilities and the unique activity analysis that shows, step by step, the activity of any particular host (one of my favorite features; see Figure 7), we have what I consider the ultimate over-the-network device forensics tool, for now anyway, since Infocyte obviously has improvements and upgrades in the works.
Infocyte is highly recommended.
P. R. Stephenson, PhD, CISSP (ret)
Publisher of Future inTense
Exclusively at Cyber Defense Magazine