
What is the DS4N6 Library (ds4n6_lib)?

[22/04/21] April 22, 2021
Jess Garcia - One eSecurity
Twitter: j3ssgarcia - LinkedIn: garciajess

[ For more information visit the ds4n6_lib Home Page ]

In a nutshell…

The ds4n6_lib is a Python library designed to bring Data Science and Machine Learning to the average Forensicator.

Wait a minute! A “python Library”? Ouch… Sounds scary! I'm not a python person!!

Don't be afraid of the “python” part. In reality, you need to know (almost) no Python to use the library. Think of it instead as a toolkit designed to be run in a Jupyter environment.

For starters, the Data Science community is made up of people like you: Subject Matter Experts (SMEs) who are not really programmers. And guess what? Jupyter/pandas (the most popular Data Science combination) is estimated to be used by a community of nearly 10 million users, mainly SMEs, but also including primary/secondary school students. Go figure! If that's the case, this “Jupyter/pandas” thingy is probably not that difficult, don't you think?

Now, talking specifically about the ds4n6_lib itself, we made it even simpler! Trust me, it really is very easy to use: you will only need to learn fewer than 10 commands, and many of them are GUI-based! It is designed for Forensicators, not for programmers. Take a look at the screenshots in this blog post to understand what I'm talking about.

Why did you create this library and why should I bother learning/using it?

While DFIR tools (both commercial and free) are obviously very much needed, and many of them are really awesome, oftentimes (especially in complex, large-scale investigations) you need more flexibility. When that happens, we have traditionally resorted to the Linux command line (grep, sed, awk, sort, uniq, etc.), to scripting (e.g. Python) or to programs like Excel. These tools give us additional flexibility, but they are still limited: they do not scale well to the amounts of data we often handle, and they do not work well with some of the entities we typically deal with (e.g. timestamps). Additionally, creating visualizations with them is typically complicated too.

The Data Science world naturally solves these problems via a friendly web-based environment (Jupyter) and an intuitive Data Science Library (pandas). There also exist many packages for data visualization (matplotlib, bokeh, seaborn, …).

But more interestingly, once you have your data in this environment a completely new universe opens up: you can run different types of analysis on your data (e.g. statistical) and, better still, use machine/deep learning for things like anomaly analysis, classification, etc.

Furthermore, some vendors are so convinced that Jupyter/pandas is the way to go, that they have been building in this direction too (as an example, check the huge amount of effort and great work that Elastic has put into Elasticsearch's eland library).

Coming down specifically to the DFIR arena, projects such as Timesketch's picatrix or Velociraptor let you interact with their backends from a Jupyter environment.

Ok, cool. I'm sold on this Jupyter/pandas thing. But what is this ds4n6_lib Library about?

The first challenge you will face if you want to analyze your forensic data in a Jupyter environment is bringing your data (forensic tool output) into it. While it is really easy to load csv files with pandas, it can be quite a lot of work to read all the data and prepare it properly for analysis. This is where the ds4n6_lib comes in handy: it allows you to read and Harmonize your tool output so you can then easily run other types of analysis (visualization, analytics, machine learning, etc.) on your forensic data. Currently we support a few popular tools like plaso, kansa, kape, volatility, etc.

Once your tool output has been imported into the Jupyter environment, you will be able to analyze the data in a flexible and DFIR-oriented way via different analysis routines and flexible analysis functions (for the UNIX/Linux lovers, we have even mimicked tools like grep or sed). And all of that at blazing speed, since everything is loaded into memory in an extremely efficient format!
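To give a feeling for what "grep-like" analysis on an in-memory table looks like, here is a minimal sketch in plain pandas. It does not use the ds4n6_lib functions themselves; the `grep_rows` helper, the column names and the sample process list are all made up for illustration.

```python
import io
import pandas as pd

# Hypothetical process listing standing in for real tool output.
csv_text = """hostname,pid,name,create_time
ws01,4,System,2021-04-20 08:00:01
ws01,812,svchost.exe,2021-04-20 08:00:05
ws02,4312,powershell.exe,2021-04-20 13:37:42
ws02,4400,SVCHOST.EXE,2021-04-20 13:38:10
"""

# Load the csv into a DataFrame, parsing the timestamp column as datetime.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["create_time"])


def grep_rows(frame, pattern, column, ignore_case=True):
    """Illustrative helper: keep rows whose `column` matches a regex (like grep -i)."""
    mask = frame[column].astype(str).str.contains(
        pattern, case=not ignore_case, regex=True, na=False
    )
    return frame[mask]


# Rough equivalent of: grep -i '^svchost'
svchost = grep_rows(df, r"^svchost", "name")

# Rough equivalent of: ... | sort | uniq -c (case-insensitive on the name)
name_counts = df["name"].str.lower().value_counts()

print(svchost)
print(name_counts)
```

The difference from the shell is that the result is still a typed table: you can keep filtering, sort by `create_time` as a real timestamp, or hand it to a plotting library.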

Can you give me more details?

Let me summarize, from a high-level point of view, what the ds4n6_lib library does.

The idea is the following:

You have some output data from a specific forensic tool (let's say a process list from volatility or a file listing from kape), from one or many machines.
You want to analyze that data in a more flexible way (filtering/aggregating the data in different ways, doing visualizations, etc.) and also in a more efficient way (scaling up to dozens or hundreds of GBs with fast response times).
You know that Jupyter / pandas (and friends) are great for that: they are “kind of” a replacement for the traditional command line tools (e.g. bash) or scripts (e.g. python).

In this context, the ds4n6_lib will allow you to easily do the following:

Read your tool output files (typically csv) into Jupyter/pandas data structures.
Assign the right data type to each field (integer, string, datetime, etc.) so later analysis is easier.
Organize them in collections that make sense from a forensic point of view and are comfortable to work with (e.g. a collection of dataframes containing all your evtx files).
Harmonize the data, so the resulting structures will be the same regardless of what tool they came from.
Visualize the data in a comfortable/graphical way (via different widgets and GUIs).
Easily “export” the data from the graphical analysis environment (qgrid/aggrid) to pandas DataFrames.
Apply different types of pre-canned data analysis (including machine learning) by simply selecting them from a menu, and extend these analyses as you wish.
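To make the read / type / organize / harmonize steps a bit more concrete, here is a rough sketch of what doing them by hand in plain pandas could look like. This is not the ds4n6_lib API: the tool names (`tool_a`, `tool_b`), the column names, the `COLUMN_MAPS` table and the `load_harmonized` helper are all hypothetical, chosen only to illustrate the idea.

```python
import io
import pandas as pd

# Hypothetical output of two tools describing the same thing (a process list).
# Column names and values are invented for this example.
tool_a_csv = """PID,ImageFileName,CreateTime
4,System,2021-04-20 08:00:01
812,svchost.exe,2021-04-20 08:00:05
"""
tool_b_csv = """ProcessId,Name,StartTime
4312,powershell.exe,2021-04-20T13:37:42
"""

# Assumed mapping from each tool's column names to one common schema.
COLUMN_MAPS = {
    "tool_a": {"PID": "pid", "ImageFileName": "name", "CreateTime": "start_time"},
    "tool_b": {"ProcessId": "pid", "Name": "name", "StartTime": "start_time"},
}


def load_harmonized(csv_text, tool):
    """Read one tool's csv, rename columns to the common schema and fix dtypes."""
    raw = pd.read_csv(io.StringIO(csv_text))
    df = raw.rename(columns=COLUMN_MAPS[tool])
    df["pid"] = df["pid"].astype("int64")
    df["name"] = df["name"].astype("string")
    df["start_time"] = pd.to_datetime(df["start_time"])
    df["source_tool"] = tool  # remember where each row came from
    return df[["pid", "name", "start_time", "source_tool"]]


# A "collection": one harmonized DataFrame per tool, keyed by tool name.
collection = {
    "tool_a": load_harmonized(tool_a_csv, "tool_a"),
    "tool_b": load_harmonized(tool_b_csv, "tool_b"),
}

# Because the schema is now identical, the tables can be analyzed together.
all_procs = pd.concat(list(collection.values()), ignore_index=True)
print(all_procs.sort_values("start_time"))
```

Even in this toy case you have to know every tool's column names and date formats up front, which is the tedious part the library is meant to take off your hands.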

Ok, how can I start using it?

The ds4n6_lib is designed to run in a Jupyter environment, so before using it you need to get a Jupyter environment up and running.

To make things easy for you, we have prepared everything (environment, evidence, notebooks, ds4n6_lib, addons) so you can easily try Jupyter and the DS4N6 library in Binder (Binder is a free cloud-based Jupyter notebook service; no registration or account needed).

So, if you just want to “play” with Jupyter Notebooks and the DS4N6 Library to learn how it all works, just follow these simple instructions and you will be playing with the ds4n6_lib in minutes!

You can also create your own machine. It is not difficult to install the base environment with Anaconda but, to be honest, if you want all the bells and whistles that the ds4n6_lib can provide (especially the GUI part) you will need to install some additional plugins (qgrid, aggrid, etc.), which may give you some headaches.

To make your life easier we've been working on DAISY, a DFIR-oriented Data Science & AI Virtual Machine that you will be able to download. If everything goes as planned, DAISY will be released in May '21.

If you want to know more, visit the ds4n6_lib Home Page