
DS4N6 - The Road So Far... - Part II

New Projects and The Road Ahead

May 11, 2021
Jess Garcia - One eSecurity
Twitter: j3ssgarcia - LinkedIn: garciajess

[This is Part II of a 2-part blog post series. Please Click Here for Part I of this post]

The Long Silence (July'20 - April'21)

Those who watched my talk at the SANS DFIR Summit may have been frustrated by the long silence between the Summit and the posts you are reading now. Just a few entries in the News, a few tips, and little more. Our @ds4n6_io Twitter account has been similarly silent, as has our Youtube channel.

“Why so? Did the DS4N6 Initiative die as soon as it was born, after all that setup effort?”

Absolutely not! We've been working like crazy during these last 9 months!

The proof is the recent release of the new version of the ds4n6_lib, together with the demo Jupyter templates and the imminent release of DAISY, the DS/AI-for-DFIR Virtual Machine.

“Why the silence then?”

Well, let me explain…

Immediately after the SANS DFIR Summit I started writing a lot of blog posts to explain everything I thought would be needed in order to use the library. At the same time, I continued developing the ds4n6_lib in the way I thought would be most transparent and easy to use for the Forensicator, while also making it flexible enough to support different artifacts and different forensic tools in the future.

And guess what? The blog posts soon started to become obsolete even before they were finished. The library was advancing at very good speed, maturing and growing, and changing almost completely in every aspect from the code I had originally written. I then realized it was not possible to publish anything before the library had reached a certain stability, before I knew that the functions I was explaining would not change dramatically the following month.

I therefore focused on improving the library. As I will explain in future posts, it was an extremely interesting trip, during which I ran into the need to define a common artifact format, which I named the Harmonized Artifact Model (HAM). This was needed so that you can have a homogeneous data format regardless of the forensic tool you used to process your artifacts.
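
To make the idea of a harmonized format a bit more concrete, here is a minimal sketch in Python. It is not the HAM specification or ds4n6_lib code: the tool names (`tool_a`, `tool_b`), the field names and the `harmonize` helper are all hypothetical, chosen only to illustrate how two tools' outputs could be mapped onto one shared set of field names.

```python
# Hypothetical sketch of the harmonization idea; NOT the real HAM schema.
# Tool names, field names and the harmonize() helper are assumptions.

# Per-tool mapping from tool-specific field names to one shared set of names.
FIELD_MAPS = {
    "tool_a": {"TimeCreated": "timestamp", "Computer": "hostname", "EventID": "event_id"},
    "tool_b": {"ts": "timestamp", "host": "hostname", "id": "event_id"},
}


def harmonize(record, tool):
    """Return a copy of `record` with its keys renamed to the shared names.

    Fields without a mapping are kept under their original name.
    """
    mapping = FIELD_MAPS[tool]
    return {mapping.get(key, key): value for key, value in record.items()}


# The same logon event as two different (made-up) tools might report it.
from_tool_a = {"TimeCreated": "2021-05-11T10:00:00", "Computer": "srv01", "EventID": 4624}
from_tool_b = {"ts": "2021-05-11T10:00:00", "host": "srv01", "id": 4624}

rec_a = harmonize(from_tool_a, "tool_a")
rec_b = harmonize(from_tool_b, "tool_b")

# After harmonization both records share the same field names,
# so the analysis code no longer needs to know which tool produced them.
print(rec_a == rec_b)
```

Keeping the mappings in plain data rather than in code means that supporting a new tool is, in this sketch, only a matter of adding one more entry to the table.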

I was also determined to not make forensicators memorize 50 different libraries and commands. I established a target of 10 different global commands, that's all. This implied that the library itself would have to take care of somehow recognizing the data that was being analyzed so it could call the appropriate tool/artifact library.
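
The "few global commands" idea can be sketched as a single entry point that inspects the data and dispatches to an artifact-specific handler. The snippet below is only an illustration under stated assumptions: the artifact names, the column signatures and the function names (`detect_artifact`, `analyze`) are hypothetical and do not reflect the actual ds4n6_lib API.

```python
# Hypothetical sketch: one global command that recognizes the data it is
# given and routes it to an artifact-specific handler. The artifact names,
# column signatures and function names are illustrative assumptions, not
# the ds4n6_lib API.

# Columns that, if all present, identify each (made-up) artifact type.
SIGNATURES = {
    "event_log": {"event_id", "timestamp"},
    "file_listing": {"path", "size"},
}


def detect_artifact(columns):
    """Return the first artifact type whose signature columns are all present."""
    present = set(columns)
    for artifact, required in SIGNATURES.items():
        if required <= present:
            return artifact
    return None


def analyze_event_log(rows):
    return f"event_log analysis of {len(rows)} rows"


def analyze_file_listing(rows):
    return f"file_listing analysis of {len(rows)} rows"


HANDLERS = {
    "event_log": analyze_event_log,
    "file_listing": analyze_file_listing,
}


def analyze(rows):
    """Single entry point: detect the artifact type, then dispatch."""
    columns = rows[0].keys() if rows else []
    artifact = detect_artifact(columns)
    if artifact is None:
        raise ValueError("unrecognized artifact type")
    return HANDLERS[artifact](rows)


events = [{"event_id": 4624, "timestamp": "2021-05-11T10:00:00"}]
files = [{"path": "C:/Windows/notepad.exe", "size": 1024}]
print(analyze(events))
print(analyze(files))
```

With this design the analyst only has to remember `analyze`; the knowledge of which artifact needs which handler lives in the library's tables.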

I will not go over all the gory details here, I will leave that for future posts, but I hope you can see why we couldn't really start publishing things before we had a stable version of the library.

The RSA Conference 2021: “Me, My Adversary and AI”

Come December '20, when the ds4n6_lib library had evolved substantially and was starting to mature, I decided to take on the next challenge: presenting at the RSA Conference. As I mentioned before regarding the SANS DFIR Summit, presenting at an important conference is always an “adrenaline shot” to push your innovation. Fortunately my talk “Me, My Adversary & AI: Investigating & Hunting with Machine Learning” was accepted.

I had some preliminary results on the use of Machine Learning algorithms for DFIR (I actually briefly presented the use of AutoEncoders for detecting malicious logon activity at the SANS DFIR Summit), which with a little expansion would have made an excellent talk, but I wanted to go much deeper! So, with the help of my team, I started developing two innovative approaches for the use of Machine Learning in Threat Hunting processes, and applied them to the detection of the recent SolarWinds attack: “Could you have detected the SolarWinds intrusion without IOCs via Machine Learning Anomaly Analysis in the context of a Threat Hunting process?”. We ran these ML models on real-world production server data from one of our Fortune 50 customers, for which we provide a Continuous Threat Hunting service, and the results were amazing!

I will not elaborate further here, since I have written a specific blog post about the topic. If you want to know more, please check that post and/or attend my talk at the RSA Conference. I guarantee you will enjoy it! ;)

Current DS4N6 Projects

These are the projects we are currently working on. The ds4n6_lib library, which I have spoken about extensively, is ready for release, and our objective is to have a first version of the DAISY VM released within the next month. The other projects are in their early stages and, while we will be sharing some details about them soon, they still need to mature a bit before we can fully share them with the Community.

ds4n6_lib: The DS4N6 Library
DAISY VM: The DS/AI-for-DFIR Virtual Machine. It will initially contain a basic environment with a mix of DFIR and DS/AI tools, ready to use.
HAM: The Harmonized Artifact Model. A Model that harmonizes the output of different tools into a shared common artifact format you can do your DS/AI analysis on.
ADAM: The ADversAry eMulator. A package that allows us to emulate a complex attack by injecting it directly in pandas DataFrames. This is very useful to evaluate the effectiveness of detection techniques of DS/AI models.
D4ML: The Machine Learning Library. An extension to the ds4n6_lib library that provides the capability to integrate Machine Learning models as one more tool in your DFIR analysis toolset.
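
As a rough illustration of the idea behind ADAM described above (injecting emulated attack activity directly into pandas DataFrames in order to evaluate detection techniques), here is a minimal sketch. It is not the ADAM package: the column names, the sample values, the `inject_attack` helper and the toy detector are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sketch of injecting emulated attack activity into a pandas
# DataFrame. Column names, values and inject_attack() are assumptions made
# for illustration; this is NOT the ADAM package.

baseline = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2021-05-11 09:00", "2021-05-11 09:05", "2021-05-11 09:10"]
        ),
        "user": ["alice", "bob", "alice"],
        "src_host": ["ws01", "ws02", "ws01"],
    }
)

attack = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2021-05-11 09:07"]),
        "user": ["svc_backup"],
        "src_host": ["ws99"],
    }
)


def inject_attack(benign, malicious):
    """Merge emulated attack rows into benign data, keeping a ground-truth label."""
    benign = benign.assign(injected=False)
    malicious = malicious.assign(injected=True)
    combined = pd.concat([benign, malicious], ignore_index=True)
    # Sort so the injected rows sit at their chronological position.
    return combined.sort_values("timestamp").reset_index(drop=True)


dataset = inject_attack(baseline, attack)

# Toy "detector": flag users never seen in the baseline data.
dataset["flagged"] = ~dataset["user"].isin(baseline["user"])

# Because the injected rows are labeled, detection can be scored directly.
detected = (dataset["flagged"] & dataset["injected"]).sum()
print(f"{detected} of {dataset['injected'].sum()} injected rows flagged")
```

The `injected` column acts as ground truth, which is what makes it possible to measure how many emulated attack events a given detection technique actually catches.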

What will happen in the next few months?

There are soooo many things I would like to share with you. I strongly believe the library can really help you in many of your tasks, but I do realize that, even though it is now very easy to use, we will need to explain the underlying concepts and how to use it.

Our plan from the outreach point of view is the following:

We will release a set of Jupyter notebooks that you can launch in Binder in order to play with the ds4n6_lib library and see its capabilities (this has already happened, see this blog post for more information).
We will update and release all the posts that we have “in the oven”, adapting them to the latest version of the library.
We will create several videos to show you “live” how to use the library.
We will publish an in-depth discussion of D4ML, the machine learning module of the ds4n6_lib, along with an in-depth discussion of the models that will be presented at the RSA Conference.

From the “tooling” point of view, you should expect:

A release of the DAISY Virtual Machine somewhere in May (hopefully).
A release of an initial (proof of concept) version of the D4ML library with the functionalities that will be presented at the RSA Conference (the objective is to release it right after the RSA Conference).
A release of a very preliminary version of the Harmonized Artifact Model (HAM) with a few sample artifacts.
Sometime later we will release a first version of the ADversAry eMulator (ADAM), but I think this will take some extra time as it is not very mature yet.

And finally, I am also planning to present at different webinars and international conferences during this year (if they want to have me! ;) ):

To maintain the “adrenaline high” for new innovation
To share with the Community how to make use of the things that we have developed in the real world. There is so much we need to explain and share!
To explain to the Community how things work at the low level. A lot of thinking has gone into the tools we are developing and I would like other colleagues to hear about it, comment on it, criticize it and make suggestions. We definitely don't know everything, and there are so many talented people in this Community that we will be honored to receive any feedback or new ideas for improvement.

I'm also having discussions with other colleagues in the Community (like the Google TimeSketch Team, fellow SANS instructors like Lenny Zeltser, Eric Zimmerman, etc.) in order to improve the integration of their tools in our projects (ds4n6_lib, DAISY, etc.), share ideas, learn from their experience, etc.

Learning from The Road So Far, I can almost guarantee that we will hit new challenges and things that we can't even imagine right now. And those things will most likely change our plans. But this time I hope we will be able to share them with you “as we go”.

Project Continuity & Support

On seeing “one more” open source project you may be wondering whether, beyond playing with the ds4n6_lib library for educational purposes, it is really worth spending the time and effort to introduce its capabilities into your daily DFIR “production” workflows, and whether the project will be reasonably maintained for a reasonable amount of time. Let me address this (very good) question.

The truth is that you never really know how the DFIR field will evolve in the future and whether this library or any other project will be useful (or even meaningful) a few years from now. In my opinion DS and especially AI are here to stay (not only in DFIR but in general), so I believe our project will be alive and kicking, facing new and interesting challenges in the months and years to come.

We are also taking the necessary steps so that the project has the resources it needs to bloom. In the end, we are using this library (and the rest of the software under the DS4N6 initiative) in our own (One eSecurity) services and products, so the more it grows and matures, the better for us (and also for the Community!).

It is however natural that we will end up prioritizing the features and capabilities that we normally use over others that are more alien to us (and therefore more complicated to implement). Think, for instance, of the integration of a certain tool that you use and we don't (and which we may not even have access to).

First, we will of course always be happy to receive third party contributions that improve the library for the good of all. And second, we are committed to this project and to the Community and, as such, we will make a best effort to provide reasonable support for the feature requests and bug fixes that will certainly arrive eventually. We will also try to provide an implementation roadmap so you can have a view of what's coming next.

That said, if you eventually need to implement features or capabilities specific to your needs that are not in the development roadmap, if you need help with deployment or training, or if you need Commercial/Professional Support, do not hesitate to contact us and we will be happy to discuss how to help you.

The trip goes on…

Well, that's all for the moment. I hope this blog post has given you some insight into all the things that have happened during this year. Yes, it has been a lot of work, but I truly believe that it can help a lot of people and may be a game changer in the future.

I hope you enjoyed this rather long 2-part blog post and, as I've said, you will be hearing a lot more from us from now on.

If you want to share this exciting trip with us just stay tuned via:

Our @ds4n6_io Twitter account
Our blog posts
Our Youtube channel

Thanks for reading!
