Here is the list of topics I injected into the panel conversation:
Algorithms (AI) are Dangerous
Privacy by Design
Expert Knowledge over algorithms
The need for a Security Paradigm Shift
Efficacy in AI is non existent
The need for learning how to work interdisciplinary
Please not that I am following in the vein of the conference and I won’t define specifically what I mean by “AI”. Have a look at my older blog posts for further opinions. Following are some elaborations on the different topics:
Algorithms (AI) are Dangerous – We allow software engineers to use algorithms (libraries) for which they do not know what results are produced. There is no oversight demand – imagine the wrong algorithms being used to control any industrial control systems. Also realize that it’s not about using the next innovation in algorithms. When DeepLearning entered the arena, everyone tried to use it for their problems. Guess what; barely any problem could be solved by it. It’s not about the next algorithms. It’s about how these algorithms are used. The process around them. Interestingly enough, one of the most pressing and oldest problems that every CISO today is still wrestling with is ‘visibility’. Visibility into what devices and users are on a network. That has nothing to do with AI. It’s a simple engineering problem and we still haven’t solved it.
Privacy by Design – The entire conference day didn’t talk enough about this. In a perfect world, our personal data would never leave us. As soon as we give information away it’s exposed and it can / and probably will be abused. How do we build such systems?
Expert Knowledge – is still more important than algorithms. We have this illusion that AI (whatever that is), will solve our problems by analyzing data. Instead of using “AI” to augment human capabilities. In addition, we need experts who really understand the problems. Domain experts. Security experts. People with experience to help us build better systems.
Security Paradigm Shift – We have been doing security the wrong way. For two decade we have engaged in the security cat and mouse game. We need to break out of that. Only an approach of understanding behaviors can get us there.
Efficacy – There are no approaches to describing how well an AI system works. Is my system better than someone else’s? How do we measure these things?
Interdisciplinary Collaboration – As highlighted in my ‘expert point’ above; we need to focus on people. And especially on domain experts. We need multi-disciplinary teams. Psychologists, counter intelligence people, security analysts, systems engineers, etc. to collaborate in order to help us come up with solutions to combat security issues. There are dozens of challenges with these teams. Even just something as simple as terminology or a common understanding of the goals pursued. And this is not security specific. Every area has this problem.
The following was a fairly interesting thing that was mentioned during one of the other conference panels. This is a “non verbatum” quote:
AI is one of the poster children of bipartisanship. Ever want to drive bipartisanship? Engage on an initiative with a common economical enemy called China.
Oh, and just so I have written proof when it comes to it: China will win the race on AI! Contrary to some of the other panels. Why? Let me list just four thoughts:
No privacy laws or ethical barriers holding back any technology development
Availability of lots of cheap, and many of them, very sophisticated resources
The already existing vast and incredibly rich amount of data and experiences collected; from facial recognition to human interactions with social currencies
A government that controls industry
I am not saying any of the above are good or bad. I am just listing arguments.
Last week I was speaking on a panel about the “Use of AI for Cybersecurity” at the Intelligence and National Security Alliance (INSA) conference on “Building an AI Powered Intelligence Community”. It was fascinating to listen to some of the panels with people from the Hill talking about AI. I was specifically impressed with the really educated views on issues with AI, like data bias, ethical and privacy issues, bringing silicon valley software development processes to the DoD, etc. I feel like at least the panelists had a pretty good handle on some of the issues with AI.
The one point that I am still confused about is what all these people actually meant when they said “AI”; or how the “Government” defines AI.
I have been reading through a number of documents and reports from the US government, but almost all of them do not define what AI actually is. For example the American AI Initiative One Year Annual Report to the president doesn’t bother defining AI.
Artificial intelligence (AI) is one such technological advance. AI refers to the ability of machines to perform tasks that normally require human intelligence – for example, recognizing patterns, learning from experience, drawing conclusions, making predictions, or taking action – whether digitally or as the smart software behind autonomous physical systems.
Seems to me that this definition could use some help. NIST on their AI page doesn’t have a definition front and center. And the documents I browsed through didn’t have one either.
Sec. 9. Definitions. As used in this order:
(a) the term “artificial intelligence” means the full extent of Federal investments in AI, to include: R&D of core AI techniques and technologies; AI prototype systems; application and adaptation of AI techniques; architectural and systems support for AI; and cyberinfrastructure, data sets, and standards for AI;
I would call this a circular definition? Or what do you call this? A non-definition? Maybe I have focused on the wrong documents? What about the definition of AI by the Joint Artificial Intelligence Center (JAIC). a group within the DoD? The JAIC Web site does not seem to have a definition, at least not one I could find.
In closing, if we have policy, legislative, or regulatory conversation, we must define what AI is. Otherwise we have conversations that go into the absolutely wrong directions. Does 5G fall under AI? How about NLP or automating the transcription of a conference presentation? If we don’t get clear, we will write legislation and put out bills that do not cover the technologies and approaches we actually want to govern but will put roadblocks into the path of innovation and the so fiercely sought after dominance in AI.
I was just reading an article from Forrester research about “Artificial Intelligence Is Transforming Fraud Management”. Interesting read until about half way through where the authors start talking about supervised and unsupervised learning. That’s when they lost a lot of credibility:
Supervised learning makes decisions directly. Several years ago, Bayesian models, neural networks, decision trees, random forests, and support vector machines were popular fraud management algorithms. (see endnote 8) But they can only handle moderate amounts of training data; fraud pros need more complex models to handle billions of training data points. Supervised learning algorithms are good for predicting whether a transaction is fraudulent or not."
Aside from the ambiguity of what it means for an algorithm to make ‘direct’ decisions, SML can only take limited amounts of training data? Have you seen our malware deep learners? In turn, if SML is good at predicting fraudulent transaction, what’s the problem with training data?
What do they say about unsupervised approaches?
Unsupervised learning discovers patterns. Fraud management pros employ unsupervised learning to discover anomalies across raw data sets and use self-organizing maps and hierarchical and multimodal clustering algorithms to detect swindlers. (see endnote 10) The downside of unsupervised learning is that it is usually not explainable. To overcome this, fraud pros often use locally interpretable, model-agnostic explanations to process results; to improve accuracy, they can also train supervised learning with labels discovered by unsupervised learning. Unsupervised learning models are good at visualizing patterns for human investigators.
And here it comes: “The downside of UML is that it is usually not explainable”. SML has that problem not UML. Please get the fundamentals right. Reading something like this makes me question pretty much the entire article on its accuracy. And as a nuance: “UML is not itself good at visualizing patterns. Some of the algorithms lend themselves to visualize the output. But there is more to turning a clustering algo into a good visual. I mention t-sne in one of my older blog posts. That algorithm actually follows an underlying visualization paradigm (projection of multiple dimensions into two or three dimensions).
Reading on in the article, it says:
As this use case requires exceptional performance and accuracy, supervised learning dominates.
I thought SML doesn’t scale? Turns out, it actually does quite well, not least because you can run a learner offline.
This paper highlights the problem of needing domain experts to build machine learning approaches for security. You cannot rely on pure data scientists without a solid security background or at least a very solid understanding of the domain, to build solutions. What a breath of fresh air. I hole heartedly agree with this. But let’s look at how the authors went about their work.
The example that is used in the paper is in the area of malware detection; a problem that is a couple of decades old. The authors looked at binaries as byte streams and initially argued that we might be able to get away without feature engineering by just feeding the byte sequences into a deep learning classifier – which is one of the premises of deep learning, not having to define features for it to operate. The authors then looked at some adversarial scenarios that would circumvent their approach. (Side bar: I wish Cylance had read this paper a couple years ago). The paper goes through some ROC curves and arguments to end up with some lessons learned:
Training sets matter when testing robustness against adversarial examples
Architectural decisions should consider effects of adversarial examples
Semantics is important for improving effectiveness [meaning that instead of just pushing a binary stream into the deep learner, carefully crafting features is going to increase the efficacy of the algorithm]
Please tell me which of these three are non obvious? I don’t know that we can set the bar any lower for security data science.
I want to specifically highlight the last point. You might argue that’s the one statement that’s not obvious. The authors basically found that, instead of feeding simple byte sequences into a classifier, there is a lift in precision if you feed additional, higher-level features. Anyone who has looked at byte code before or knows a little about assembly should know that you can achieve the same program flow in many ways. We must stop comparing security problems to image or speech recognition. Binary files, executables, are not independent sequences of bytes. There is program flow, different ‘segments’, dynamic changes, etc.
We should look to other disciplines (like image recognition) for inspiration, but we need different approaches in security. Get inspiration from other fields, but understand the nuances and differences in cyber security. We need to add security experts to our data science teams!
Over the weekend I was catching up on some reading and came about the “Deep Learning and Security Workshop (DLS 2019)“. With great interest I browsed through the agenda and read some of the papers / talks, just to find myself quite disappointed.
It seems like not much has changed since I launched this blog. In 2005, I found myself constantly disappointed with security articles and decided to outline my frustrations on this blog. That was the very initial focus of this blog. Over time it morphed more into a platform to talk about security visualization and then artificial intelligence. Today I am coming back to some of the early work of providing, hopefully constructive, feedback to some of the work out there.
The researcher paper I am looking at is about building a deep learning based malware classifier. I won’t comment on the fact that every AV company has been doing this for awhile (but learned from their early mistakes of not engineering ‘intelligent’ features). I also won’t discuss the machine learning architecture that is introduced. What I will argue is the approach that was taken and the conclusions that were drawn:
The paper uses a data set that has no ground truth. Which, in network security is very normal. But it needs to be taken into account. Any conclusion that is made is only relative to the traffic that the algorithm was tested, at the time of testing and under the used configuration (IDS signatures). The paper doesn’t discuss adoption or changes over time. It’s a bias that needs to be clearly taken into account.
The paper uses a supervised approach leveraging a deep learner. One of the consequences is that this system will have a hard time detecting zero days. It will have to be retrained regularly. Interestingly enough, we are in the same world as the anti virus industry when they do binary classification.
Next issue. How do we know what the system actually captures and what it does not?
This is where my recent rants on ‘measuring the efficacy‘ of ML algorithms comes into play. How do you measure the false negative rates of your algorithms in a real-world setting? And even worse, how do you guarantee those rates in the future?
If we don’t know what the system can detect (true positives), how can we make any comparative statements between algorithms? We can make a statement about this very setup and this very data set that was used, but again, we’d have to quantify the biases better.
In contrast to the supervised approach, the domain expert approach has a non-zero chance of finding future zero days due to the characterization of bad ‘behavior’. That isn’t discussed in the paper, but is a crucial fact.
The paper claims a 97% detection rate with a false positive rate of less than 1% for the domain expert approach. But that’s with domain expert “Joe”. What about if I wrote the domain knowledge? Wouldn’t that completely skew the system? You have to somehow characterize the domain knowledge. Or quantify its accuracy. How would you do that?
Especially the last two points make the paper almost irrelevant. The fact that this wasn’t validated in a larger, real-world environment is another fallacy I keep seeing in research papers. Who says this environment was representative of every environment? Overall, I think this research is dangerous and is actually portraying wrong information. We cannot make a statement that deep learning is better than domain knowledge. The numbers for detection rates are dangerous and biased, but the bias isn’t discussed in the paper.
Before even diving into the topic of Causality Research, I need to clarify my use of the term #AI. I am getting sloppy in my definitions and am using AI like everyone else is using it, as a synonym for analytics. In the following, I’ll even use it as a synonym for supervised machine learning. Excuse my sloppiness …
Causality Research is a topic that has emerged from the shortcomings of supervised machine learning (SML) approaches. You train an algorithm with training data and it learns certain properties of that data to make decisions. For some problems that works really well and we don’t even care about what exactly the algorithm has learned. But in certain cases, we really would like to know what the system just learned. Your self-driving car, for example. Wouldn’t it be nice if we actually knew how the car makes decisions? Not just for our own peace of mind, but also to enable verifyability and testing.
Here are some thoughts about what is happening in the area of causality for AI:
This topic is drawing attention because people are having their blinders on when defining what AI is. AI is more than supervised machine learning, and a number of the algorithms in the field, like belief networks, are beautifully explainable.
We need to get away from using specific algorithms as the focal point of our approaches. We need to look at the problem itself and determine what the right solution to the problem is. Some of the very old methods like belief networks (I sound like a broken record) are fabulous and have deep explainability. In the grand scheme of things, only few problems require supervised machine learning.
We are finding ourselves in a world where some people believe that data can explain everything. It cannot. History is not a predictor of the future. Even in experimental physics, we are getting to our limits and have to start understanding the fundamentals to get to explainability. We need to build systems that help experts encode their knowledge and augments human cognition by automating tasks that machines are good at.
The recent Cylance faux pas is a great example why supervised machine learning and AI can be really really dangerous. And it brings up a different topic that we need to start exploring more, which is how we measure the efficacy or precision of AI algorithms. How do we assess the things a given AI or machine learning approach misses and what are the things it classifies wrong? How does one compute these metrics for AI algorithms? How do we determine whether one algorithm is better than another. For example, the algorithm that drives your car. How do you know how good it is? Does a software update make it better? How much? That’s a huge problem in AI and ‘causality research’ might be able to help develop methods to quantify efficacy.
Join me for my talk about AI and ML in cyber security at BlackHat on Thursday the 9th of August in Las Vegas. I’ll be exploring the topics of artificial intelligence (AI) and machine learning (ML) to show some of the ‘dangerous’ mistakes that the industry (vendors and practitioners alike) are making in applying these concepts in security.
We don’t have artificial intelligence (yet). Machine learning is not the answer to your security problems. And downloading the ‘random’ analytic library to identify security anomalies is going to do you more harm than it helps.
We will explore these accusations and walk away with the following learnings from the talk:
I am exploring these items throughout three sections in my talk: 1) A very quick set of definitions for machine learning, artificial intelligence, and data mining with a few examples of where ML has worked really well in cyber security. Check cybersecuritycourses.com here for an overview of the best cyber security courses available. 2) A closer and more technical view on why algorithms are dangerous. Why it is not a solution to download a library from the Internet to find security anomalies in your data. 3) An example scenario where we talk through supervised and unsupervised machine learning for network traffic analysis to show the difficulties with those approaches and finally explore a concept called belief networks that bear a lot of promise to enhance our detection capabilities in security by leveraging export knowledge more closely. And if you plan to test the the vulnerability of your network, make use of Wifi Pineapple testing tool.
I keep mentioning that algorithms are dangerous. Dangerous in the sense that they might give you a false sense of security or in the worst case even decrease your security quite significantly. Here are some questions you can use to self-assess whether you are ready and ‘qualified’ to use data science or ‘advanced’ algorithms like machine learning or clustering to find anomalies in your data:
Do you know what the difference is between supervised and unsupervised machine learning?
Can you describe what a distance function is?
In data science we often look at two types of data: categorical and numerical. What are port numbers? What are user names? And what are IP sequence numbers?
In your data set you see traffic from port 0. Can you explain that?
You see traffic from port 80. What’s a likely explanation of that? Bonus points if you can come up with two answers.
How do you go about selecting a clustering algorithm?
What’s the explainability problem in deep learning?
How do you acquire labeled network data sets (netflows or pcaps)?
Name three data cleanliness problems that you need to account for before running any algorithms?
When running k-means, do you have to normalize your numerical inputs?
Does k-means support categorical features?
What is the difference between a feature, data field, and a log record?
If you can’t answer the above questions, you might want to rethink your data science aspirations and come to my talk on Thursday to hopefully walk away with answers to the above questions.
Late June, my alma mater organized an event in Brooklyn with the title: “ETH Meets New York”. The topic of the evening was “Security Technologies Enabling the Future: From Blockchain to IoT”. I was one of the speakers talking about “AI in Practice – What We Learned in Cyber Security”. The video of the talk is available online. It’s a short 10 minutes where I discuss some of the problems with AI in cyber, and outline how expert knowledge is more important than algorithms when it comes to detecting malicious actors in our systems.
Spark your interest? Don’t miss my talk at BlackHat next month where we will have an hour to explore the topics of analytics, machine learning, and artificial intelligence in cyber. I recorded a brief teaser video to help you understand what I will be covering.
It's too easy to implement "AI" in your security product – algorithms are openly available. But the more important thing is a process of training, cleaning data sets, etc. And dealing with biases is super challenging – @raffaelmarty#TheSAS2018pic.twitter.com/eii5W0vrWr
The post basically says that VR helps the SOC be less of an expensive room you have to operate by letting a company take the SOC virtual. Okay. I am buying that argument to some degree. It’s still different to be in the same room with your team, but okay.
Secondly, the article says that it helps tier-1 analysts look at context (I am paraphrasing). So in essence, they are saying that VR helps expand the number of pixels available. Just give me another screen and I am fine. Just having VR doesn’t mean we have the data to drive all of this. If we had it, it would be tremendously useful to show that contextual information in the existing interfaces. We don’t need VR for that. So overall, a non-argument.
There is an entire paragraph of non-sense in the post. VR (over traditional visualization) won’t help monitoring more sources. It won’t help with the analysis of endpoints. etc. Oh boy and “.. greater context and consumable intelligence for the C-suite.” For real? That’s just baloney!
Before we embark on VR, we need to get better at visualizing security data and probably some more advanced cyber security training for employees. Then, at some point, we can see if we want to map that data into three dimensions and whether that will actually help us being more efficient. VR isn’t the silver bullet, just like artificial intelligence (AI) isn’t either.
This is a gem within the article; a contradiction in itself: “More dashboards and more displays are not the answer. But a VR solution can help effectively identify potential threats and vulnerabilities as they emerge for oversight by the blue (defensive) team.” – What is VR other than visualization? If you can show it in three dimensions within some google, can’t you show it in two dimensions on a flat screen?