DS4N6 - The Road So Far... - Part I

The Genesis of a Community Initiative and an Open Source Project

May 04, 2021
Jess Garcia - One eSecurity
Twitter: j3ssgarcia - LinkedIn: garciajess

[This is Part I of a 2-part blog post series. Please Click Here for Part II of this post]

Yes, I know. You haven't heard from us in a long, long time. Actually, to be fair, the activity of the DS4N6 Project, since it started back in July last year, has been negligible. Why is that? What has happened during all these months? Did the project die even before it started?

The answer is no, the project is very much alive and in good health. And proof of that is the release of the first “real” version of the ds4n6_lib library this week (the previous version was a stand-alone file with 1,400 lines of code; this one is made of 25 Python modules and over 10,000 lines).

I was going to write a blog post explaining what has happened during these last 9 months, the sub-projects we are currently working on, etc., but then I thought I would expand the post to explain how it all started. The idea is that maybe it can serve as an example (and maybe inspiration) for others who want to start an innovative Community project like this.

How it all started

About a year ago, after having heard for quite some time [1] that Jupyter (the “de-facto” Data Science platform) could be useful in DFIR, and with the “excuse” of having some time to spare due to the COVID lock-down, I decided to dig deeper into it.

My first question (which I guess may be the same one you are asking yourself right now) was: “Why would I want to do that? What benefit can I get from Jupyter that I don't already have with, let's say, bash or python scripting?”. And, after some research, the answer was mind-blowing: not only can you do everything you used to do with scripting, and a million things more, but you can do it much faster and in a much easier and more powerful way. And not only that: it opened the door to much more powerful types of analysis (statistical, machine learning, visualization, etc.). This Jupyter “thing” definitely seemed promising.

So I started researching the state of the art, and my searches revealed just a few pages and projects [2], which seemed very targeted to a very specific piece of software or type of artifact.

So my next question was: “What if I want to post-process my forensic tool output in Jupyter?”. I am a consultant, I use many different tools (free/commercial/proprietary), and I need to be able to post-process their output data. So… “How can I post-process my volatility output, or my plaso output, or my kansa output, in Jupyter?”. There was nothing out there that facilitated this process.

I soon realized that, even though ingesting the output of forensic tools is super-easy (all these tools provide a csv/json output, and pandas can easily read those formats), reading and adapting the data from different tools, a step that is needed for later analysis, was quite a tedious process (especially if, for instance, a tool produces multiple different files, like kansa or kape). And doing it for multiple different tools, even if for the same artifact, was a real pain (e.g. multiple tools can provide a process list, but the format of the resulting output will be different for each tool, even if the content itself is somewhat similar).
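To make the problem a bit more concrete, here is a minimal sketch of that harmonization step for a process list. Everything in it is an assumption made for illustration: the two "tools", their column names, the sample rows and the read_process_list helper are hypothetical, and none of it is taken from the ds4n6_lib library or from any real tool's output format. Only pandas.read_csv, DataFrame.rename and pandas.concat are real pandas calls.

```python
import io

import pandas as pd

# Hypothetical CSV exports from two imaginary tools. Real tools use their
# own column names and formats; these are made up for illustration.
TOOL_A_CSV = "PID,ImageName,ParentPID\n4,System,0\n612,svchost.exe,500\n"
TOOL_B_CSV = "ProcessId,Name,ParentProcessId\n4,System,0\n880,lsass.exe,500\n"

# Per-tool mapping from native column names to one common (hypothetical) schema.
COLUMN_MAPS = {
    "tool_a": {"PID": "pid", "ImageName": "name", "ParentPID": "ppid"},
    "tool_b": {"ProcessId": "pid", "Name": "name", "ParentProcessId": "ppid"},
}


def read_process_list(csv_source, tool):
    """Read one tool's process list and rename its columns to the common schema."""
    df = pd.read_csv(csv_source)
    df = df.rename(columns=COLUMN_MAPS[tool])
    df["source_tool"] = tool  # keep track of where each row came from
    return df[["pid", "ppid", "name", "source_tool"]]


# Once both outputs share the same columns, they can be analyzed together.
processes = pd.concat(
    [
        read_process_list(io.StringIO(TOOL_A_CSV), "tool_a"),
        read_process_list(io.StringIO(TOOL_B_CSV), "tool_b"),
    ],
    ignore_index=True,
)
print(processes)
```

The reading part is a one-liner; the tedious part is writing and maintaining a mapping like COLUMN_MAPS for every tool and every artifact.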

At that point it was clear to me that I definitely wanted to incorporate the Jupyter ecosystem into my toolset (possibly making it the center of my DFIR Analysis Universe), but it was also clear that an easier data interface was needed to facilitate the ingestion of forensic tool output into the Jupyter environment. Once there, I was pretty sure that a lot of cool things could follow.

The seed of the ds4n6_lib library: a library to ingest your forensic tool output data in Jupyter

Ok, decided, I was going to create a few python functions that made it possible to ingest forensic tool output in Jupyter/pandas. And while I was at it, I thought that I could also create a few additional functions that helped with the review and visualization of the data (oftentimes forensic data contains a myriad of columns and it is really inconvenient to work with them).
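As an illustration of the kind of review helper meant here, the following sketch hides columns that carry no information for the analyst (entirely empty, or the same value in every row). The drop_uninformative_columns function and the sample events are hypothetical and are not part of the ds4n6_lib API; the only real call is pandas' Series.nunique.

```python
import pandas as pd


def drop_uninformative_columns(df):
    """Return a copy without columns that are entirely empty or hold a single value."""
    keep = [col for col in df.columns if df[col].nunique(dropna=True) > 1]
    return df[keep].copy()


# Made-up sample data standing in for a wide forensic table.
events = pd.DataFrame(
    {
        "timestamp": ["2021-05-04 10:00", "2021-05-04 10:05", "2021-05-04 10:07"],
        "hostname": ["ws01", "ws01", "ws01"],  # constant, adds nothing to review
        "event_id": [4624, 4688, 4624],
        "reserved": [None, None, None],  # entirely empty
    }
)

compact = drop_uninformative_columns(events)
print(list(compact.columns))
```

With real data the effect is larger: a table with dozens of columns can shrink to the handful that actually vary.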

After thinking a little bit about it, I realized that this was exactly the type of thing I would love to share with the Community. In the end, Jupyter and the vast majority of the Data Science ecosystem are open source, many of the tools I was interested in parsing were open source, and therefore sharing our work with the Community was natural.

That's how the idea of the ds4n6_lib library was born.

The ds4n6_lib library would grow immensely later on, providing harmonization and advanced yet easy-to-use analysis capabilities, way beyond the data ingestion initially required.

The DS4N6 Initiative

I decided to ask fellow Forensicators if they knew the Jupyter/pandas ecosystem, and I found out that many had heard about it but did not know how it could be of use or how it worked. One of them told me: “Well, I don't know very much about it, but I believe Jupyter is a web interface where you enter python commands, right?”.

And that somehow struck me. Well, it was technically true that Jupyter is an interface where you run “python” commands, and pandas is indeed written in python. But pandas or matplotlib are not really python itself. I mean, even if you don't have a clue about python, you can easily use pandas. I was surprised to learn that the pandas community has more than 10M users, and many of them are researchers with very little python knowledge. Actually, I've seen pandas notebooks created by primary/secondary school students, so it definitely couldn't be very difficult.

Then I realized that what I really wanted to do was to help Forensicators jump into Data Science, for that would open a new world of analysis possibilities for them. And the more Forensicators in the Data Science Community, the cooler the analysis techniques we would be able to develop!

This was not only about creating a library, it was about facilitating the transition from the traditional forensics world to the new Data Science DFIR world.

And that's how the DS4N6 Initiative was born, with the Mission to “Bring Data Science & Artificial Intelligence to the fingertips of the average Forensicator and promote advances in the field”.

The SANS DFIR Summit 2020: “Data Science for DFIR - The Force Awakens”

As is often the case, you need some motivation and focus to make things work at the right pace, since otherwise you are always caught up in your daily routine, which sucks up every minute you have.

Oftentimes, presenting at a relevant conference gives you the required “push” to get things moving at the right speed (at least that's how it works for me). Therefore I submitted a talk to the SANS DFIR US conference, which was accepted. The clock started ticking!

At this point, with a clear idea of what I wanted to do, I involved my team at One eSecurity. We launched the ds4n6.io website, our @ds4n6_io Twitter account, our GitHub repo, YouTube channel, etc.

Then, I made my presentation at the SANS DFIR Summit!

It was well received and I got a lot of positive feedback, but I also realized that Forensicators still saw it as too difficult and complicated; they didn't see it as something they could use right away. And in a sense it was true: the ds4n6 library, while operational, was just a Proof of Concept. And there was still a very big gap in terms of bridging the DFIR world with the DS/AI world. We really needed to work much harder to make things muuuuuch easier.

And after that… well I'm afraid you didn't hear from us in quite a long time. Please read on to understand why.

References

[1] Kristinn Gudjonsson, of plaso/Timesketch/picatrix fame, spoke specifically about it during the SANS DFIR Summit EU 2019. And there are references to it as far back as 2015, such as the paper DFIR Analysis and Reporting Improvements with Scientific Notebook Software, by Ben S. Knowles.
[2] The Threat Hunter Playbook project, by Roberto Rodriguez, makes extensive use of notebooks, but is mostly oriented to event log analysis. Michael Cohen also published a couple of papers on how to use notebooks for Velociraptor post-processing. And more recently there is picatrix, by Kristinn Gudjonsson, to interact with Timesketch.