I had a few conversations over the past days that all pointed to the same conclusion: many technology companies are still being built like old SaaS companies. That is a mistake. If you are building a technology product now, the priority is not a polished frontend. It is the backend: the data layer, the ontology, the APIs, the analytics layer, the authentication model, and the infrastructure that makes AI agents fast, reliable, and cheap to run on top of the data backend. The frontend still matters, but it should not be the center of gravity anymore.
TL;DR
Start with the backend and data model, not the dashboard.
Build for token efficiency as a product requirement, not just an infrastructure metric.
Expose core capabilities through APIs and agent-friendly interfaces first.
Keep the UI light, flexible, and increasingly self-serve.
If every deployment needs heavy forward deployed engineering, the product is not ready yet.
The Moat Is Moving Down the Stack
In the old SaaS model, a lot of value sat in the application layer. You built workflows, dashboards, role-based views, and configuration screens. In AI-native software, that is no longer enough. The durable part of the company is increasingly lower in the stack: the system that structures data correctly, retrieves the right context quickly, exposes useful actions cleanly, and does all of that in a reliable and token-efficient way.
If that layer is weak, the rest of the product becomes slow, expensive, brittle, and hard to customize. If that layer is strong, you can build a surprising amount on top of it very quickly.
The UI Should Get Thinner
A lot of teams still think about product development as: first build the dashboard, then add AI to it. I think it is increasingly the opposite. First build the backend that can answer questions, retrieve context, execute actions, and expose capabilities cleanly. Then add lightweight interfaces on top.
Initially, those interfaces may be very thin. In some cases they may barely be a product UI at all. A technical user might interact through Claude, another agent interface, or an internal tool layer. Over time, you can add more purpose-built interfaces and dashboards, but those should sit on top of a backend that already works well in a headless way.
Token Efficiency Is a Product Decision
One of the bigger mistakes right now is treating token usage as a backend optimization problem. It is not. It is a product design problem. If your system cannot give agents the right context in the right shape, the product becomes costly to operate and difficult to scale. That affects margins, response times, user experience, and the kinds of workflows that are even viable.
This is why the backend matters so much. You need data structures, query systems, and analytics layers that are built for AI interaction, not just for human dashboards. A beautiful interface on top of an inefficient backend is not an AI product. It is a demo with a future cost problem.
The Goal Is Self-Serve Customization
A lot of tech companies are also running into the same trap: they need too much forward deployed engineering to make each customer successful. That is understandable for now, but it is not where you want to stay. The goal should be to make the platform configurable enough that a solutions engineer, a sales engineer, or eventually even the customer can shape the experience without constantly pulling in core backend engineers.
That only works if the system is designed the right way. If the logic, data model, and capabilities are modular and exposed well, you can let people create their own views, workflows, and operating layers on top. If not, every customer request turns into a product detour.
Build the engine first. Build the data layer properly. Make it fast, cheap, reliable, and cleanly exposed. Then let the frontend become lighter, more dynamic, and more self-serve over time. That is increasingly the difference between an AI first company and a SaaS company with an AI feature.
I have been talking to a few AI SOC and new SIEM market entrants over the past few weeks. I have voiced some opinions in previous posts but have now started to capture a list of features that I believe represent the openings existing SIEM players have created in the market for these new vendors to emerge.
Before I outline what I think those features are, let me be clear: this is my list. I am aware that existing SIEM vendors will claim that they already do many of these things. All I will say is this: market churn and capital flow suggest that these capabilities are either not as mature or not as integrated as claimed.
And to the AI SOC companies and investors: be careful about the short-term problems your investments are solving. Yes, there is real traction with MSSPs that are overloaded with false positives. And yes, many will gladly pay to reduce alert workload by 80%. But in many cases, these problems are being addressed superficially. Make sure you audit the underlying approaches and verify that the foundational infrastructure is sound. Solving this problem on top of an existing detection infrastructure doesn’t solve the problem at the core, which is the detections themselves. We need to fix those with some of the suggestions below to not needing a top-layer, alert reducer.
Without further ado, here are the items I am tracking. I welcome other opinions and additions to the list (no guarantee I will include them). Over the coming weeks, I will also try to rate some of the players across these categories to enable comparison. I could use help with that. Ping me.
A. DATA & CONTROL PLANE ARCHITECTURE
Federation – The ability to query and reason over data where it lives, without forced centralization. (Another post following here at some point about the limitations of federation).
Data PipelineOptimization – Dynamic ingestion pipelines that enrich, route, sample, and filter data based on use case, risk, and downstream value. Not static “send everything to the lake.”
Data Awareness – Understanding what data exists, what is missing, and what has silently degraded. The system must continuously reason about its own observability.
Performance as a First-Class Constraint – Fast joins and low-latency queries across all relevant data. Real-time rule execution at scale. This is not about basic scalability, but about maintaining predictable performance as rule count and complexity increase, without simply throwing more compute at the problem.
Modern AI Integration – The ability to integrate with emerging architectural patterns and frameworks, including MCP servers, vector stores, and related systems.
B. DETECTION & LEARNING SYSTEMS
Hypothesis-Driven Hunting – Hunting should start with explicit hypotheses, not ad-hoc queries. These hypotheses should evolve, fork, and self-update based on outcomes. Agents swarms anyone?
Automated Detection Tuning (Closed Loop) – Detections must evaluate their precision and recall over time. False positives and false negatives are signals. Humans stay in the loop, but are not the tuning engine. This also helps separate the detection engineering from the tuning that should be done by analysts.
Environment-Adaptive Detections – Rules and models must adapt automatically to the specific environment, business processes, and user behavior and analyst feedback. Generic detections are table stakes.
Detection Lineage and Memory – The system must remember why a detection exists, how it has changed, and what outcomes it has historically produced.
C. ENTITY-CENTRIC RISK & CONTEXT
Asset Awareness – Effective protection and detection start with understanding what is being protected. Entity visibility is foundational: who owns this entity, what does it do, and which business processes does it support?
Real-Time Entity Risk Scoring – Each entity has a continuously updated risk score driven by behavior, exposure, and contextual signals.
Entity Risk Context – Risk is not a number. It is a set of properties that help explain the risk and provide context for decision making.
Business Context Integration – Entities must be tied to business processes, ownership, and criticality, and this context must inform alert generation and prioritization. Some people have started calling this the Context Graph.
D. OPERATIONAL REALITY (SOC, MSSP, ENFORCEMENT)
Simple QueryInterface: Support for both natural language and structured query languages (such as KQL). Analysts need both.
Alert Triage Automation – Using ‘advanced’ context to tune detections. Ideally we have business context available to continuously improve our detections.
Blindspot Detection – The system must actively identify where detections cannot exist due to missing or degraded logs or logging configurations. This includes making sure that log sources are actually staying up and keep reporting what they have to.
Real-Time Readiness for Enforcement – We need our systems to become preventative. Therefore, its risk model must operate in near real time. Attackers are acting too fast.
A Few Additional Comments for Context
This is not meant to be a SIEM RFP. I am intentionally not listing table-stakes capabilities such as basic scalability, data source support, or baseline detection depth.
This list is less about features than about where intelligence and control actually live in the system. I am also not being prescriptive on how these features are built. Many of them can benefit from AI / LLM / ML approaches and, in fact, should be using them.
Look at the list, then look at your AI SOC platform of choice. How much of the above does it truly cover?
If you are evaluating an AI SOC platform and most of its value proposition lives above alerts rather than below them, you should be skeptical.
And why most of the arguments do not hold up under scrutiny
Over the past 18 to 24 months, venture capital has flowed into a fresh wave of SIEM challengers including Vega (which raised $65M in seed and Series A at a ~$400M valuation), Perpetual Systems, RunReveal, Iceguard, Sekoia, Cybersift, Ziggiz, and Abstract Security, all pitching themselves as the next generation of security analytics. What unites them is not just funding but a shared narrative that incumbent SIEMs are fundamentally broken: too costly, too siloed, too hard to scale, and too ineffective in the face of modern data volumes and AI-driven threats.
This post does not belabor each startup’s product. Instead it abstracts the shared assertions that justify recent funding and then stresses them to see which hold up under scrutiny. I am not defending incumbents. I am trying to separate real gaps from marketing (and funding) narratives.
The “SIEM is Broken” Narrative
A commonly cited industry report claimed that major SIEM tools cover only about 19% of MITRE ATT&CK techniques despite having access to data that could cover ~87%. That statistic is technically interesting but also deeply misleading: ATT&CK technique coverage is not an operational measure of detection quality or effectiveness, it primarily reflects rule inventory and tuning effort. Nevertheless, it has become a core justification for the “SIEM is obsolete” narrative. I wasn’t able to find the original report to validate what and how they tested, but I have seen SIEMs that cover much more and have big detection teams taking care of these issues.
The Five Core Claims Driving the Market Thesis
Across decks, interviews, and marketing copy, I picked five recurring themes that define what these companies think incumbents get wrong and what investors are underwriting as the path forward.
1. “Centralized SIEM architectures no longer scale”
The claim is that forcing security telemetry into a centralized repository is too expensive and too slow for modern enterprises generating terabytes of logs every day. The proposed fixes include federated queries, analyzing data where it lives, and decoupling detection from ingestion so you never have to move or duplicate all your data.
The challenge is that correlation, state, timelines, and real-time detection require locality. Distributed query engines excel at ad-hoc exploration but are not substitutes for continuous detection pipelines. Federated queries introduce latency, inconsistent performance, and complexity every time you write a detection. Normalization deferred to query time pushes complexity into every rule. You do not eliminate cost, you shift it to unpredictable query execution and compute costs that spike precisely when incidents occur. Centralizing data isn’t a flaw; it is a tradeoff that supports correlation engines, summary indexes, entity timelines, and stateful detections that distributed query models struggle to maintain in real time. In fact, if the SIEM was to store the data in the customer’s S3 bucket, you can keep cost somewhat under control.
2. “SIEM pricing is broken because it charges by data volume”
A frequent refrain is that incumbent SIEMs penalize good security hygiene by tying pricing to ingestion volume, which becomes untenable as data grows. The proposed response is pricing models untethered from volume, open storage, and customer-controlled compute.
The challenge is that cost doesn’t vanish because you hide volume. Compute, memory, enrichment, retention, and query costs all remain. If pricing is detached from ingestion, it typically reappears as unpredictable query charges, usage tiers, or gated features. Volume is not an arbitrary metric; it correlates with the cost a vendor (or customer) incurs. Treating cost as orthogonal to data volume does not make it disappear; it just blinds you to a key cost driver. I have dealt with all the pricing models: by user, by device, by volume, … in the end I needed to make my gross margins work, guess who pays for that?
3. “SIEM detections are weak because they rely on bad rules”
New entrants commonly assert that traditional SIEM rules are noisy, static, and unable to keep up with modern threat techniques. Solutions offered include natural-language detections, detections-as-code, continuous evaluation, and AI-generated rules.
The challenge is that many of these still sit atop the same primitives. For example, SIGMA is widely used as a community detection language, but it is fundamentally limited: it is mostly single-event, cannot express event ordering or causality, has no native temporal abstractions or entity-centric modeling, and cannot natively express thresholds, rates, cardinality, or statistical baselines. Wrapping these limitations in AI or “natural language” does not change the underlying detection physics. You can improve workflow and authoring experience, but you do not fundamentally invent a new class of detection with the same primitives. And guess what, large vendors have pretty significant content teams – I mean detection engineering teams – often tied into their threat research labs. Don’t tell me that a startup has found a more cost effective and higher efficacy way to release detection rules. If that were the case, all these large vendors would be dumb to operate such large teams.
4. “SIEMs lack context, causing false positives”
The argument here is that existing SIEMs flood analysts with alert noise because they lack deep asset context, threat intelligence, or behavioral understanding. New entrants promise tightly integrated TI feeds, cloud context, or built-in behavior analytics.
Context integration has been a focus of incumbent platforms for years. The real hard problem is not accessing context but operationalizing it without drowning analysts. More feeds often mean more noise unless you have mature enrichment pipelines, entity resolution, and risk scoring built into rules that understand multi-stage attack sequences. Adding more sources does not automatically improve signal quality. The noise problem is as much about rule quality and use-case focus as it is about context availability. Apply the same argument here with regards to the quality of threat feeds that I outlined in the last item.
5. “AI-native SIEMs will finally fix detection and response”
Perhaps the most seductive claim is that incumbent SIEMs were built for a pre-AI world and that new platforms built with agentic AI at every layer will finally crack automation, detection, and investigation.
The challenge is that AI does not eliminate the need for structured, high-quality, normalized data, or explainability, or deterministic behavior in high-risk contexts. AI can accelerate workflows, assist with investigation, and suggest hypotheses, but it does not replace the need for precise, reproducible, and auditable detection logic. Most AI-native claims today are improvements in UX and speed, not architectural breakthroughs in detection theory.
The Uncomfortable Conclusion
VC money is flowing because SIEM is operationally hard, expensive, and often unpopular with SOC teams. There is real pain and real gaps, especially around cost transparency, scaling, and usability. But declaring existing SIEMs obsolete because they are imperfect is not a thesis; it is a marketing slogan.
The core assumptions driving this funding wave deserve scrutiny: centralization is treated as a flaw rather than a tradeoff necessary for continuous detection, pricing complaints get conflated with architectural insights, detection quality is blamed on tooling rather than operational realities, and AI is overstated as a panacea.
On the flip side, here are a couple of directions that should be looked at:
Some of the new entrant SIEMs actually make a dent. They are rebuilding their entire pipelines and storage architecture with modern technologies, not old paradigms. They have a clear advantage and don’t have to deal with millions of lines of tech debt. Using an agentic AI architecture could be quite interesting here.
As the AI SOC emerges – and maybe become a reality – we will probably see more and more MCP servers exposing infrastructure information that can be leveraged, from alerts to context to response capabilities. But we’ll need to see how data schemas and all that will evolve.
The one innovation that has already generated some returns for investors is the entire data pipeline world. Companies like Observo (I had the privilege to be an advisor) have truly added something useful to the SIEMs and as I argue in one of my previous blogs, needs to really become a capability baked into each SIEM out there.
Before diving into cyber security and how the industry is using AI at this point, let’s define the term AI first. Artificial Intelligence (AI), as the term is used today, is the overarching concept covering machine learning (supervised, including Deep Learning, and unsupervised), as well as other algorithmic approaches that are more than just simple statistics. These other algorithms include the fields of natural language processing (NLP), natural language understanding (NLU), reinforcement learning, and knowledge representation. These are the most relevant approaches in cyber security.
Given this definition, how evolved are cyber security products when it comes to using AI and ML?
I do see more and more cyber security companies leverage ML and AI in some way. The question is to what degree. I have written before about the dangers of algorithms. It’s gotten too easy for any software engineer to play a data scientist. It’s as easy as downloading a library and calling the .start() function. The challenge lies in the fact that the engineer often has no idea what just happened within the algorithm and how to correctly use it. Does the algorithm work with non normally distributed data? What about normalizing the data before inputting it into the algorithm? How should the results be interpreted? I gave a talk at BlackHat where I showed what happens when we don’t know what an algorithm is doing.
Slide from BlackHat 2018 talk about “Why Algorithms Are Dangerous” showing what can go wrong by blindly using AI.
So, the mere fact that a company is using AI or ML in their product is not a good indicator of the product actually doing something smart. On the contrary, most companies I have looked at that claimed to use AI for some core capability are doing it ‘wrong’ in some way, shape or form. To be fair, there are some companies that stick to the right principles, hire actual data scientists, apply algorithms correctly, and interpret the data correctly.
Generally, I see the correct application of AI in the supervised machine learning camp where there is a lot of labeled data available: malware detection (telling benign binaries from malware), malware classification (attributing malware to some malware family), document and Web site classification, document analysis, and natural language understanding for phishing and BEC detection. There is some early but promising work being done on graph (or social network) analytics for communication analysis. But you need a lot of data and contextual information that is not easy to get your hands on. Then, there are a couple of companies that are using belief networks to model expert knowledge, for example, for event triage or insider threat detection. But unfortunately, these companies are a dime a dozen.
That leads us into the next question: What are the top use-cases for AI in security?
I am personally excited about a couple of areas that I think are showing quite some promise to advance the cyber security efforts:
Using NLP and NLU to understand people’s email habits to then identify malicious activity (BEC, phishing, etc). Initially we have tried to run sentiment analysis on messaging data, but we quickly realized we should leave that to analyzing tweets for brand sentiment and avoid making human (or phishing) behavior judgements. It’s a bit too early for that. But there are some successes in topic modeling, token classification of things like account numbers, and even looking at the use of language.
Leveraging graph analytics to map out data movement and data lineage to learn when exfiltration or malicious data modifications are occurring. This topic is not researched well yet and I am not aware of any company or product that does this well just yet. It’s a hard problem on many layers, from data collection to deduplication and interpretation. But that’s also what makes this research interesting.
Given the above it doesn’t look like we have made a lot of progress in AI for security. Why is that? I’d attribute it to a few things:
Access to training data. Any hypothesis we come up with, we have to test and validate. Without data that’s hard to do. We need complex data sets that are showing user interactions across applications, their data, and cloud apps, along with contextual information about the users and their data. This kind of data is hard to get, especially with privacy concerns and regulations like GDPR putting more scrutiny on processes around research work.
A lack of engineers that understand data science and security. We need security experts with a lot of experience to work on these problems. When I say security experts, these are people that have a deep understand (and hands-on experience) of operating systems and applications, networking and cloud infrastructures. It’s unlikely to find these experts who also have data science chops. Pairing them with data scientists helps, but there is a lot that gets lost in their communications.
Research dollars. There are few companies that are doing real security research. Take a larger security firm. They might do malware research, but how many of them have actual data science teams that are researching novel approaches? Microsoft has a few great researchers working on relevant problems. Bank of America has an effort to fund academia to work on pressing problems for them. But that work generally doesn’t see the light of day within your off the shelf security products. Generally, security vendors don’t invest in research that is not directly related to their products. And if they do, they want to see fairly quick turn arounds. That’s where startups can fill the gaps. Their challenge is to make their approaches scalable. Meaning not just scale to a lot of data, but also being relevant in a variety of customer environments with dozens of diverging processes, applications, usage patterns, etc. This then comes full circle with the data problem. You need data from a variety of different environments to establish hypotheses and test your approaches.
Is there anything that the security buyer should be doing differently to incentivize security vendors to do better in AI?
I don’t think the security buyer is to blame for anything. The buyer shouldn’t have to know anything about how security products work. The products should do what they claim they do and do that well. I think that’s one of the mortal sins of the security industry: building products that are too complex. As Ron Rivest said on a panel the other day: “Complexity is the enemy of security”.
Also have a look at the VentureBeat article feating some quotes from me.
Building an AI Powered Intelligence Community (Click image for video)
Here is the list of topics I injected into the panel conversation:
Algorithms (AI) are Dangerous
Privacy by Design
Expert Knowledge over algorithms
The need for a Security Paradigm Shift
Efficacy in AI is non existent
The need for learning how to work interdisciplinary
Please not that I am following in the vein of the conference and I won’t define specifically what I mean by “AI”. Have a look at my older blog posts for further opinions. Following are some elaborations on the different topics:
Algorithms (AI) are Dangerous – We allow software engineers to use algorithms (libraries) for which they do not know what results are produced. There is no oversight demand – imagine the wrong algorithms being used to control any industrial control systems. Also realize that it’s not about using the next innovation in algorithms. When DeepLearning entered the arena, everyone tried to use it for their problems. Guess what; barely any problem could be solved by it. It’s not about the next algorithms. It’s about how these algorithms are used. The process around them. Interestingly enough, one of the most pressing and oldest problems that every CISO today is still wrestling with is ‘visibility’. Visibility into what devices and users are on a network. That has nothing to do with AI. It’s a simple engineering problem and we still haven’t solved it.
Privacy by Design – The entire conference day didn’t talk enough about this. In a perfect world, our personal data would never leave us. As soon as we give information away it’s exposed and it can / and probably will be abused. How do we build such systems?
Expert Knowledge – is still more important than algorithms. We have this illusion that AI (whatever that is), will solve our problems by analyzing data with the use of software systems that work with a cloud based database. Instead of using “AI” to augment human capabilities. In addition, we need experts who really understand the problems. Domain experts. Security experts. People with experience to help us build better systems.
Security Paradigm Shift – We have been doing security the wrong way. For two decade we have engaged in the security cat and mouse game. We need to break out of that. Only an approach of understanding behaviors can get us there.
Efficacy – There are no approaches to describing how well an AI system works. Is my system better than someone else’s? How do we measure these things?
Interdisciplinary Collaboration – As highlighted in my ‘expert point’ above; we need to focus on people. And especially on domain experts. We need multi-disciplinary teams. Psychologists, counter intelligence people, security analysts, systems engineers, etc. to collaborate in order to help us come up with solutions to combat security issues. There are dozens of challenges with these teams. Even just something as simple as terminology or a common understanding of the goals pursued. And this is not security specific. Every area has this problem.
The following was a fairly interesting thing that was mentioned during one of the other conference panels. This is a “non verbatum” quote:
AI is one of the poster children of bipartisanship. Ever want to drive bipartisanship? Engage on an initiative with a common economical enemy called China.
Oh, and just so I have written proof when it comes to it: China will win the race on AI! Contrary to some of the other panels. Why? Let me list just four thoughts:
No privacy laws or ethical barriers holding back any technology development
Availability of lots of cheap, and many of them, very sophisticated resources
The already existing vast and incredibly rich amount of data and experiences collected; from facial recognition to human interactions with social currencies
A government that controls industry
I am not saying any of the above are good or bad. I am just listing arguments.
Last week I was speaking on a panel about the “Use of AI for Cybersecurity” at the Intelligence and National Security Alliance (INSA) conference on “Building an AI Powered Intelligence Community”. It was fascinating to listen to some of the panels with people from the Hill talking about AI. I was specifically impressed with the really educated views on issues with AI, like data bias, ethical and privacy issues, bringing silicon valley software development processes to the DoD, etc. I feel like at least the panelists had a pretty good handle on some of the issues with AI.
The one point that I am still confused about is what all these people actually meant when they said “AI”; or how the “Government” defines AI.
I have been reading through a number of documents and reports from the US government, but almost all of them do not define what AI actually is. For example the American AI Initiative One Year Annual Report to the president doesn’t bother defining AI.
Artificial intelligence (AI) is one such technological advance. AI refers to the ability of machines to perform tasks that normally require human intelligence – for example, recognizing patterns, learning from experience, drawing conclusions, making predictions, or taking action – whether digitally or as the smart software behind autonomous physical systems.
Seems to me that this definition could use some help. NIST on their AI page doesn’t have a definition front and center. And the documents I browsed through didn’t have one either.
Sec. 9. Definitions. As used in this order:
(a) the term “artificial intelligence” means the full extent of Federal investments in AI, to include: R&D of core AI techniques and technologies; AI prototype systems; application and adaptation of AI techniques; architectural and systems support for AI; and cyberinfrastructure, data sets, and standards for AI;
I would call this a circular definition? Or what do you call this? A non-definition? Maybe I have focused on the wrong documents? What about the definition of AI by the Joint Artificial Intelligence Center (JAIC). a group within the DoD? The JAIC Web site does not seem to have a definition, at least not one I could find.
One document that seems to get it is the Artificial Intelligence and National Security report, which has an entire section discussing the different aspects of AI and what they mean by the acronym.
In closing, if we have policy, legislative, or regulatory conversation, we must define what AI is. Otherwise we have conversations that go into the absolutely wrong directions. Does 5G fall under AI? How about NLP or automating the transcription of a conference presentation? If we don’t get clear, we will write legislation and put out bills that do not cover the technologies and approaches we actually want to govern but will put roadblocks into the path of innovation and the so fiercely sought after dominance in AI.
I was just reading an article from Forrester research about “Artificial Intelligence Is Transforming Fraud Management”. Interesting read until about half way through where the authors start talking about supervised and unsupervised learning. That’s when they lost a lot of credibility:
Supervised learning makes decisions directly. Several years ago, Bayesian models, neural networks, decision trees, random forests, and support vector machines were popular fraud management algorithms. (see endnote 8) But they can only handle moderate amounts of training data; fraud pros need more complex models to handle billions of training data points. Supervised learning algorithms are good for predicting whether a transaction is fraudulent or not."
Aside from the ambiguity of what it means for an algorithm to make ‘direct’ decisions, SML can only take limited amounts of training data? Have you seen our malware deep learners? In turn, if SML is good at predicting fraudulent transaction, what’s the problem with training data?
What do they say about unsupervised approaches?
Unsupervised learning discovers patterns. Fraud management pros employ unsupervised learning to discover anomalies across raw data sets and use self-organizing maps and hierarchical and multimodal clustering algorithms to detect swindlers. (see endnote 10) The downside of unsupervised learning is that it is usually not explainable. To overcome this, fraud pros often use locally interpretable, model-agnostic explanations to process results; to improve accuracy, they can also train supervised learning with labels discovered by unsupervised learning. Unsupervised learning models are good at visualizing patterns for human investigators.
And here it comes: “The downside of UML is that it is usually not explainable”. SML is much more prone to that problem than UML. Please get the fundamentals right. Reading something like this makes me question pretty much the entire article on its accuracy. There are some challenges with explainability and UML, but they are far less involed.
As a further nuance: “UML is not itself good at visualizing patterns. Some of the algorithms lend themselves to visualize the output. But there is more to turning a clustering algo into a good visual. I mention t-sne in one of my older blog posts. That algorithm actually follows an underlying visualization paradigm (projection of multiple dimensions into two or three dimensions).
Reading on in the article, it says:
As this use case requires exceptional performance and accuracy, supervised learning dominates.
I thought SML doesn’t scale? Turns out, it actually does quite well, not least because you can run a learner offline.
This paper highlights the problem of needing domain experts to build machine learning approaches for security. You cannot rely on pure data scientists without a solid security background or at least a very solid understanding of the domain, to build solutions. What a breath of fresh air. I hole heartedly agree with this. But let’s look at how the authors went about their work.
The example that is used in the paper is in the area of malware detection; a problem that is a couple of decades old. The authors looked at binaries as byte streams and initially argued that we might be able to get away without feature engineering by just feeding the byte sequences into a deep learning classifier – which is one of the premises of deep learning, not having to define features for it to operate. The authors then looked at some adversarial scenarios that would circumvent their approach. (Side bar: I wish Cylance had read this paper a couple years ago). The paper goes through some ROC curves and arguments to end up with some lessons learned:
Training sets matter when testing robustness against adversarial examples
Architectural decisions should consider effects of adversarial examples
Semantics is important for improving effectiveness [meaning that instead of just pushing a binary stream into the deep learner, carefully crafting features is going to increase the efficacy of the algorithm]
Please tell me which of these three are non obvious? I don’t know that we can set the bar any lower for security data science.
I want to specifically highlight the last point. You might argue that’s the one statement that’s not obvious. The authors basically found that, instead of feeding simple byte sequences into a classifier, there is a lift in precision if you feed additional, higher-level features. Anyone who has looked at byte code before or knows a little about assembly should know that you can achieve the same program flow in many ways. We must stop comparing security problems to image or speech recognition. Binary files, executables, are not independent sequences of bytes. There is program flow, different ‘segments’, dynamic changes, etc.
We should look to other disciplines (like image recognition) for inspiration, but we need different approaches in security. Get inspiration from other fields, but understand the nuances and differences in cyber security. We need to add security experts to our data science teams!
Over the weekend I was catching up on some reading and came about the “Deep Learning and Security Workshop (DLS 2019)“. With great interest I browsed through the agenda and read some of the papers / talks, just to find myself quite disappointed.
It seems like not much has changed since I launched this blog. In 2005, I found myself constantly disappointed with security articles and decided to outline my frustrations on this blog. That was the very initial focus of this blog. Over time it morphed more into a platform to talk about security visualization and then artificial intelligence. Today I am coming back to some of the early work of providing, hopefully constructive, feedback to some of the work out there.
The researcher paper I am looking at is about building a deep learning based malware classifier. I won’t comment on the fact that every AV company has been doing this for awhile (but learned from their early mistakes of not engineering ‘intelligent’ features). I also won’t discuss the machine learning architecture that is introduced. What I will argue is the approach that was taken and the conclusions that were drawn:
The paper uses a data set that has no ground truth. Which, in network security is very normal. But it needs to be taken into account. Any conclusion that is made is only relative to the traffic that the algorithm was tested, at the time of testing and under the used configuration (IDS signatures). The paper doesn’t discuss adoption or changes over time. It’s a bias that needs to be clearly taken into account.
The paper uses a supervised approach leveraging a deep learner. One of the consequences is that this system will have a hard time detecting zero days. It will have to be retrained regularly. Interestingly enough, we are in the same world as the anti virus industry when they do binary classification.
Next issue. How do we know what the system actually captures and what it does not?
This is where my recent rants on ‘measuring the efficacy‘ of ML algorithms comes into play. How do you measure the false negative rates of your algorithms in a real-world setting? And even worse, how do you guarantee those rates in the future?
If we don’t know what the system can detect (true positives), how can we make any comparative statements between algorithms? We can make a statement about this very setup and this very data set that was used, but again, we’d have to quantify the biases better.
In contrast to the supervised approach, the domain expert approach has a non-zero chance of finding future zero days due to the characterization of bad ‘behavior’. That isn’t discussed in the paper, but is a crucial fact.
The paper claims a 97% detection rate with a false positive rate of less than 1% for the domain expert approach. But that’s with domain expert “Joe”. What about if I wrote the domain knowledge? Wouldn’t that completely skew the system? You have to somehow characterize the domain knowledge. Or quantify its accuracy. How would you do that?
Especially the last two points make the paper almost irrelevant. The fact that this wasn’t validated in a larger, real-world environment is another fallacy I keep seeing in research papers. Who says this environment was representative of every environment? Overall, I think this research is dangerous and is actually portraying wrong information. We cannot make a statement that deep learning is better than domain knowledge. The numbers for detection rates are dangerous and biased, but the bias isn’t discussed in the paper.
Before even diving into the topic of Causality Research, I need to clarify my use of the term #AI. I am getting sloppy in my definitions and am using AI like everyone else is using it, as a synonym for analytics. In the following, I’ll even use it as a synonym for supervised machine learning. Excuse my sloppiness …
Causality Research is a topic that has emerged from the shortcomings of supervised machine learning (SML) approaches. You train an algorithm with training data and it learns certain properties of that data to make decisions. For some problems that works really well and we don’t even care about what exactly the algorithm has learned. But in certain cases, we really would like to know what the system just learned. Your self-driving car, for example. Wouldn’t it be nice if we actually knew how the car makes decisions? Not just for our own peace of mind, but also to enable verifyability and testing.
Here are some thoughts about what is happening in the area of causality for AI:
This topic is drawing attention because people are having their blinders on when defining what AI is. AI is more than supervised machine learning, and a number of the algorithms in the field, like belief networks, are beautifully explainable.
We need to get away from using specific algorithms as the focal point of our approaches. We need to look at the problem itself and determine what the right solution to the problem is. Some of the very old methods like belief networks (I sound like a broken record) are fabulous and have deep explainability. In the grand scheme of things, only few problems require supervised machine learning.
We are finding ourselves in a world where some people believe that data can explain everything. It cannot. History is not a predictor of the future. Even in experimental physics, we are getting to our limits and have to start understanding the fundamentals to get to explainability. We need to build systems that help experts encode their knowledge and augments human cognition by automating tasks that machines are good at.
The recent Cylance faux pas is a great example why supervised machine learning and AI can be really really dangerous. And it brings up a different topic that we need to start exploring more, which is how we measure the efficacy or precision of AI algorithms. How do we assess the things a given AI or machine learning approach misses and what are the things it classifies wrong? How does one compute these metrics for AI algorithms? How do we determine whether one algorithm is better than another. For example, the algorithm that drives your car. How do you know how good it is? Does a software update make it better? How much? That’s a huge problem in AI and ‘causality research’ might be able to help develop methods to quantify efficacy.