Graph Machine Learning for DFIR with CHRYSALIS


On April 24th, we presented the new version of the ds4n6_lib library (AKA CHRYSALIS) at the RSA23 conference in San Francisco. In this release, we added a new graph Machine Learning (ML) module to analyze DFIR data through graphs. In this post, you will learn how to apply all these new features in your forensic investigations, so grab your data, download ds4n6_lib v0.8, and let's get started!

Graph analysis is gaining popularity in the cybersecurity community. With the rise of sophisticated attacks and more professionalized actors, new advanced analysis techniques are essential. As discussed in our post Graphs for DFIR Analysis. The Roadmap, cybersecurity analysts find this technology a valuable ally for detecting the stealthiest actors. Graphs offer a holistic view of the network, which is very useful for detecting Lateral Movement (LM) at scale.

The new version of CHRYSALIS makes these tasks easier for analysts. With the new mlgraph module, we can arrange our forensic artifacts in a graph and apply advanced Machine Learning algorithms to detect suspicious behavior in the network.

Thinking in Graphs

There are many success stories about using AI algorithms to analyze DFIR data. Examples include works such as Me, My Adversary & AI: Investigating and Hunting with Machine Learning or CHRYSALIS: Age of the AI-Enhanced Threat Hunters & Forensicators. But what if we interpret our data as graphs? In that case, we no longer have isolated data points, but a series of interconnected events.

To think in graphs, the first step is to represent our data as a graph. CHRYSALIS simplifies this task with the new function build_lm_dataset(). This function transforms a logon event dataset into a graph by extracting two matrices: the feature matrix and the adjacency matrix. It then ties all user sessions together using the logon events. The function outputs a series of sub-graphs of the main graph. These sub-graphs represent all the lateral movements made by each user in the network. Finally, the function writes a CSV file with the information of each sub-graph, i.e. a dataset of lateral movements. Using this function is quite simple. Its main arguments are:

| Argument | Default value | Description |
|---|---|---|
| dset | None | Path of the CSV file to read. Minimum required columns: ['time', 'event_id', 'source_name', 'host_ip', 'source_hostname', 'logon_type', 'remote_user'] |
| mode | 'hostname' | Build mode. 'hostname' to create the lateral movement dataset using only known hostnames. 'ip_addr' to use the IP address for unknown hostnames. |
| path | '/' | Path to store the lateral movement datasets. |

Below are a couple of examples of the use of this function.

build_lm_dataset(dset='logons.csv', mode='hostname', path='./')


time user path
2023-01-01 user001 ['host010', 'host011', 'host100']
2023-01-01 user005 ['host005', 'host011']
2023-01-02 user002 ['host001', 'host011', 'host100', 'host102']

build_lm_dataset(dset='logons.csv', mode='ip_addr', path='./')


time user path
2023-01-01 user001 ['host010', 'host011', 'host100']
2023-01-01 user005 ['', 'host005', 'host011']
2023-01-02 user002 ['host001', 'host011', 'host100', 'host102']

Note that in 'ip_addr' mode, the algorithm identifies an additional LM to host005 from a host whose hostname is unknown and that is known only by its IP address.
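To make the idea of chaining logons into lateral movement paths more concrete, here is a minimal sketch in plain Python. It is not the ds4n6_lib implementation: the record layout (time, user, source host, destination host), the function names build_paths() and adjacency_matrix(), and the rule of grouping by user and day are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified logon records: (time, user, source host, destination host).
# These are NOT the CSV columns that build_lm_dataset() expects.
LOGONS = [
    ("2023-01-01T09:00", "user001", "host010", "host011"),
    ("2023-01-01T09:05", "user001", "host011", "host100"),
    ("2023-01-01T10:00", "user005", "host005", "host011"),
    ("2023-01-02T08:00", "user002", "host001", "host011"),
    ("2023-01-02T08:10", "user002", "host011", "host100"),
    ("2023-01-02T08:20", "user002", "host100", "host102"),
]


def build_paths(logons):
    """Chain each user's logons within a day into lateral movement paths."""
    hops_by_key = defaultdict(list)
    for time, user, src, dst in sorted(logons):  # ISO timestamps sort chronologically
        hops_by_key[(time[:10], user)].append((src, dst))

    rows = []
    for (day, user), hops in hops_by_key.items():
        path = []
        for src, dst in hops:
            if path and path[-1] == src:
                path.append(dst)  # the chain continues from the last host reached
            else:
                if path:
                    rows.append((day, user, path))
                path = [src, dst]  # a new chain starts here
        rows.append((day, user, path))
    return sorted(rows)


def adjacency_matrix(paths):
    """Count host-to-host hops over all paths (assumed illustration of an adjacency matrix)."""
    hosts = sorted({host for path in paths for host in path})
    index = {host: i for i, host in enumerate(hosts)}
    matrix = [[0] * len(hosts) for _ in hosts]
    for path in paths:
        for src, dst in zip(path, path[1:]):
            matrix[index[src]][index[dst]] += 1
    return hosts, matrix


rows = build_paths(LOGONS)
hosts, matrix = adjacency_matrix([path for _, _, path in rows])
for day, user, path in rows:
    print(day, user, path)
```

Each row mirrors the time, user and path columns of the dataset shown above, while the matrix counts how often one host was used to reach another.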

Graph Machine Learning

As covered in the Graphs for DFIR Analysis. The Roadmap post, the visualization of lateral movement telemetry with graphs helps in detecting anomalies at scale. However, even by applying these techniques, the investigation can be overwhelming due to the large amount of data we may have in enterprise networks. Somehow, we need to filter the information (automatically) to focus on the most suspicious artifacts. Fortunately, Data Science has the solution. AI and ML models can process large datasets and extract high-value information. Specifically, the sub-field Graph-ML can learn hidden features related to different data types and events in a network. CHRYSALIS incorporates these interesting functions in its new version.

CHRYSALIS's find_lm_anomalies() function uses the power of ML to find suspicious lateral movements that could belong to a stealthy actor moving through the network. The function automatically trains an Autoencoder, a deep neural network model. Next, it computes the reconstruction error of each sub-graph (lateral movement) in the network. Finally, it outputs a ranking of the most suspicious lateral movements in the input dataset, as shown in the example output further below. Using this function is quite simple: you only need an LM dataset and a few parameters. Easy, right? You do not need to be an expert in AI or ML modeling. CHRYSALIS does it for you.
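The "score every lateral movement, then rank" workflow can be illustrated without a neural network. The sketch below deliberately swaps the Autoencoder for a much simpler stand-in: a first-order transition model that scores each path by its average negative log-probability per hop, so paths made of rarely seen hops score higher. The function names, the smoothing choice and the sample data are assumptions for illustration, not the CHRYSALIS implementation.

```python
import math
from collections import Counter

# Hypothetical training paths (normal activity) and candidate paths to score.
TRAIN_PATHS = [["host001", "host002", "host003"]] * 20 + [["host001", "host004"]] * 10
CANDIDATES = [
    ("2023-02-01", "user001", ["host001", "host002", "host003"]),
    ("2023-02-02", "user002", ["host004", "host003", "host001"]),
    ("2023-02-03", "user003", ["host001", "host004"]),
]


def fit_transitions(paths):
    """Count how often each host-to-host hop appears in the training paths."""
    pair_counts, src_counts = Counter(), Counter()
    for path in paths:
        for src, dst in zip(path, path[1:]):
            pair_counts[(src, dst)] += 1
            src_counts[src] += 1
    return pair_counts, src_counts


def path_score(path, pair_counts, src_counts, n_hosts):
    """Average negative log-probability per hop, with add-one smoothing."""
    hops = list(zip(path, path[1:]))
    total = 0.0
    for src, dst in hops:
        prob = (pair_counts[(src, dst)] + 1) / (src_counts[src] + n_hosts)
        total -= math.log(prob)
    return total / max(len(hops), 1)


def rank_anomalies(train_paths, candidates, top_n=3):
    """Return the top_n candidates as (score, date, user, path), highest score first."""
    pair_counts, src_counts = fit_transitions(train_paths)
    all_paths = list(train_paths) + [path for _, _, path in candidates]
    n_hosts = len({host for path in all_paths for host in path})
    scored = [
        (path_score(path, pair_counts, src_counts, n_hosts), date, user, path)
        for date, user, path in candidates
    ]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_n]


ranking = rank_anomalies(TRAIN_PATHS, CANDIDATES, top_n=2)
for position, (score, date, user, path) in enumerate(ranking, start=1):
    print(f"{position}) Score={score:.2f} Date: {date} User: {user} Path: {path}")
```

A reconstruction error from an Autoencoder plays the same role as the score here: it is a single number per lateral movement that is high when the model has not seen similar behavior during training.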

In the following table you can find the most important arguments of the find_lm_anomalies() function:

| Argument | Default value | Description |
|---|---|---|
| lm_dataset | None | Lateral movement dataset. See the build_lm_dataset() function to generate the LM dataset. |
| model | None | Machine Learning algorithm to use. Supported models: ['s2s_lstm', 'transformer']. |
| from_date | None | Start date for the training dataset. |
| to_date | None | End date for the training dataset. |
| top_n | 50 | The number of anomalies to detect. |
| neo4j | True | 'True' to export the output to Neo4j format, 'False' otherwise. |
| path | '/' | Path to store the Neo4j output datasets. |

Below you can find an example of the use of this function.

find_lm_anomalies(lm_dset, model='transformer', from_date='2023-01-01', to_date='2023-12-31', top_n=3, neo4j=True, path='./')


TOP-3 Anomalies
1 ) Error=0.99
Date: 2023-02-09
User: user008
Lateral Movement: ['host045', 'host029', 'host021']
2 ) Error=0.97
Date: 2023-02-04
User: user012
Lateral Movement: ['host012', 'host008', 'host001']
3 ) Error=0.86
Date: 2023-02-06
User: user024
Lateral Movement: ['host001', 'host010', 'host011', 'host100']

In addition to showing the results inline, the function saves the model's output in a CSV file. For further analysis, we can load the suspicious user activity data into a graph tool such as Neo4j or NetworkX. Visit our Graphs for DFIR Analysis. The Roadmap post to learn how to use graph tools in your forensic investigations.
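As a rough idea of what graph-ready data can look like, the sketch below flattens the three anomalous paths from the example output into an edge list (one row per hop) and writes it as CSV. The column names and helper functions are hypothetical, and this is not the file layout that find_lm_anomalies() produces; it only illustrates a generic format that graph tools can usually import.

```python
import csv
import io

# The three anomalous lateral movements from the example output above.
ANOMALIES = [
    ("user008", ["host045", "host029", "host021"]),
    ("user012", ["host012", "host008", "host001"]),
    ("user024", ["host001", "host010", "host011", "host100"]),
]

# Hypothetical column names chosen for this illustration.
FIELDS = ["rank", "user", "hop", "source", "target"]


def paths_to_edge_rows(anomalies):
    """Turn each ranked path into one row per hop (source host -> target host)."""
    rows = []
    for rank, (user, path) in enumerate(anomalies, start=1):
        for hop, (src, dst) in enumerate(zip(path, path[1:]), start=1):
            rows.append(
                {"rank": rank, "user": user, "hop": hop, "source": src, "target": dst}
            )
    return rows


def write_edges_csv(rows, handle):
    """Write the edge rows as CSV to any file-like object."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


edge_rows = paths_to_edge_rows(ANOMALIES)
buffer = io.StringIO()  # swap for open("edges.csv", "w", newline="") to write a file
write_edges_csv(edge_rows, buffer)
print(buffer.getvalue())
```

Keeping the rank, user and hop number on every edge makes it possible to filter or color the graph by anomaly once the data is loaded.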

More examples and real use cases of the new CHRYSALIS functions are available in the demos presented at the RSA conference 2023 by Jess Garcia in the talk Hunting Stealth Adversaries with Graphs & AI.

If you are not familiar with what the DS4N6 Library is or how it can help you, please check the blog post What is the DS4N6 Library (ds4n6_lib)? In the blog, you will find several articles and hands-on examples to use the power of DS and AI in your forensic investigations.

May the ds4n6 be with you!