Advancing digital transformation means protecting data in more places, faster and at a larger scale. However, detecting and responding to critical cyber threats at the speed of digital business is particularly challenging if you can’t accelerate your investigations. That’s where data science comes in. Applied properly, it helps us develop high-fidelity security analytics that alert faster and more accurately on what is truly malicious.
Understanding how data science is applied starts with three questions: “How much data do we have?”, “Can we get it labeled?” and “What is the objective?”
Let’s begin with data. Every security team has it: logs, SIEM data and cloud storage buckets. That’s all great data, but are these sources ready for advanced analytics? Yes and no. Big data or data lake solutions work differently. One piece of data, such as an event log or alert, might need to be saved multiple times in multiple ways, so that it is readily available for the next steps.
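As a rough illustration of saving one piece of data in multiple ways, the Python sketch below keeps the same event in two in-memory layouts, one keyed by day and one keyed by host. The field names, the sample event and the `store` helper are hypothetical, chosen only for this example; a real data lake would use its own storage layer and partitioning scheme.

```python
from collections import defaultdict

# Two layouts for the same data: one optimized for time-range queries,
# one optimized for per-host investigation. Both are illustrative.
events_by_day = defaultdict(list)
events_by_host = defaultdict(list)


def store(event):
    """Save a single event under both layouts."""
    day = event["timestamp"][:10]  # "YYYY-MM-DD" prefix of an ISO 8601 timestamp
    events_by_day[day].append(event)
    events_by_host[event["host"]].append(event)


# Hypothetical event record; the field names are assumptions.
sample_event = {
    "timestamp": "2021-03-04T10:15:00Z",
    "host": "web-01",
    "type": "login_failure",
}
store(sample_event)
```

The cost is extra storage, but each later analytic can read the layout that suits it instead of scanning everything.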
The next step, a critical one, is labeling. Algorithms can work with any type of data if they understand what data to expect. Understanding a random piece of data on the fly is computationally expensive and not feasible at scale. By storing data in ways that are appropriate to how it will be used and then labeling it, we enable creative algorithms. The art of labeling comes from security experience and may include things like “good and bad,” “specific data fields,” or “is it binary.”
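To make the labeling idea concrete, here is a minimal Python sketch, assuming a simple record format. It checks that an event carries the fields an algorithm would expect and attaches a “good” or “bad” label. The field names, the `LabeledEvent` class and the `label_event` function are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass

# Fields a downstream algorithm expects, with their types (illustrative assumption).
EXPECTED_FIELDS = {"timestamp": str, "host": str, "bytes_out": int}
ALLOWED_LABELS = ("good", "bad")


@dataclass
class LabeledEvent:
    fields: dict
    label: str


def label_event(raw, label):
    """Validate the expected fields, then attach an analyst-supplied label."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    for name, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(raw.get(name), expected_type):
            raise ValueError(f"field {name!r} is missing or has the wrong type")
    # Keep only the expected fields so every labeled record has the same shape.
    return LabeledEvent(fields={name: raw[name] for name in EXPECTED_FIELDS}, label=label)


labeled = label_event(
    {"timestamp": "2021-03-04T10:15:00Z", "host": "web-01", "bytes_out": 5120, "extra": "x"},
    "bad",
)
```

Because every labeled record has the same shape, an algorithm never has to interpret an unfamiliar piece of data on the fly.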
The third element of problem-solving is an objective. Most people would say “my objective is to stop the bad guys,” but of course it is more specific than that. “Objectives” in this sense must also come from real-world insights from experts who kn ..