683: Contextual A.I. for Adapting to Adversaries, with Dr. Matar Haller

This is episode number 683 with Dr. Matar Haller, VP of Data and AI at ActiveFence. Today's episode is brought to you by Posit, the open-source data science company, by Anaconda, the world's most popular Python distribution, and by WithFeeling.AI, the company bringing humanity into AI. Welcome to the Super Data Science Podcast, the most listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I'm your host, Jon Krohn. Thanks for joining me today. And now, let's make the complex simple. Welcome back to the Super Data Science Podcast. Today I'm joined by the wildly intelligent data scientist and communicator, Matar Haller. Matar is the Vice President of Data and AI at ActiveFence, a firm that has raised over $100 million in venture capital to protect online platforms and their users from malicious behavior and malicious content. She's renowned for her top-rated presentations at leading global data science conferences. She previously worked as Director of Algorithmic AI at SparkBeyond, an analytics platform. She holds a PhD in neuroscience from UC Berkeley, and prior to data science, she taught soldiers how to operate tanks. Today's episode has some technical moments that will resonate particularly well with hands-on data science practitioners, but for the most part the episode will be interesting to anyone who wants to hear from a brilliant person on cutting-edge AI applications. In this episode, Matar details the "database of evil" that ActiveFence has amassed for identifying malicious content, how contextual AI considers adjacent and potentially multimodal information when classifying data, how to continuously adapt AI systems to real-world adversarial actors, the machine learning model deployment stack she uses, the data she collected directly from human brains using recording electrodes and how this research relates to the brain-computer interfaces of the future, and why being a preschool teacher is a more intense job than the military. All right, you ready for this captivating episode? Let's go.

Matar, welcome to the Super Data Science Podcast. It's awesome to have you on the show. Where are you calling in from? I'm calling in from Israel, sunny, sunny Israel. So thanks for having me. Sunny, sunny Israel, is that always true? It's very sunny Israel. Most of the time, it's pretty sunny. We have two seasons. One is really long, and it's really, really hot, and the other one is shorter and beautiful and not as hot, but still, we have a lot of sun. And that's not all... We have beaches. We have tropical areas that are more green and nice, with forests, wildflowers, mountains, not all camels and deserts, although we have that too. Cool. Well, I guess it isn't cool, but... It sounds hot, but I will have to visit there sometime. I actually have a grandmother who recently visited and said that it was her favorite place she's ever been. Oh, wow. Nice. So come visit. I'll introduce you to my chickens. There you go. This episode brought to you by the Israel Tourism Board. But you do travel a lot as well. So you were recently in New York.
You were at MLConf, the machine learning conference in New York, which I was unable to make it to this year, but you were a speaker there, and Deborah Williams, who's a friend of mine and the acquisitions editor at Pearson that I've worked with for the books and all the video content I've created, wrote me a long email summarizing how MLConf had gone. She said that, hands down, not just in her opinion but in the opinion of, quote, everyone that she spoke to, you, Matar, were by far the best speaker at MLConf. So I was like, well, we've got to get you on the show. So that's very, very flattering, and now, like, take your expectations and lower them. Thank you. Very, very flattering. Thank you. That was a fun conference. There were lots of interesting ideas and good talks, so if she said that, there was a high bar. So thank you.

And so let's dig into what you do. You are the VP of Data and Artificial Intelligence at ActiveFence, which is a platform for content moderation, harmful content detection, and threat intelligence. And to be clear, ActiveFence is not a company that is doing the content moderating itself. It's not like there's an army of people at ActiveFence monitoring for harmful content. Rather, you provide tools, data- and AI-enhanced tools, that allow your customers to do that content moderation themselves more efficiently. And this seems to be quite a good niche. I could see on Crunchbase that ActiveFence has over $100 million in funding. So yeah, it seems like a very valuable niche to be filling for your customers. So tell us a bit about what this means. How do you use AI to moderate content? How is that useful for threat intelligence? That kind of thing. Sure. So, ActiveFence, you're right, we are a platform whose clients are any company that has user-generated content. Whether it's, you know, comments or chats or uploading videos or audio, any place where you have a user that's able to upload content, there's a potential for misuse of that and for uploading malicious content. And our goal, or mission, is basically to help platforms ensure that their users are safe, that they have safe online interactions. And so we provide the tools to help them do that. Really, one of the biggest challenges that faces UGC platforms, platforms with user-generated content, is how they can detect this malicious behavior. Especially since, as we know, items can be in any format, right? So we need to be able to detect whether it's video, audio, text, images, all of that. It can also be in any language, and it can be any number of violations, right? So you have these big ones where you say absolutely not: I do not want child pornography, I do not want terror, I do not want white supremacy. But there are many, many more, and different companies, different platforms, have different levels of sensitivity to them, right? Even something that you'd say is blatant, like, I do not want child pornography, no one wants child pornography on their platform, but let's define it, right? What does that mean? Is, you know, baby's first bath, is that something that we need to be aware of? And so the tools that we provide need to be contextually aware of, you know, the policy, the way that things are being used or presented.
And so for me, and for all my teams, it's a super, super interesting space to be in, because not only are the algorithms that we use really exciting and interesting, but also the application, right? We're not selling air; we're actually making a real impact on human interactions in a positive way. Right. So to what extent can you tell us about those exciting algorithms? That's an excellent question, thank you for asking. So there are many different levels of things that we can do. The first thing is that we have our platform, right, and this is a platform that basically enables users, or moderators, to come in, view the content, look at where it is, and then make a decision about whether or not something should be removed. And this is a platform that we provide to our users. In order to protect the well-being of the moderators, and to make sure they're only seeing things they actually need to see, in order to be more efficient, there's absolutely no need to review everything. Most of the things are benign, and even within the things that are harmful, there isn't really a need to view everything. In that case, basically, you want to make sure that you have some sort of automated content moderation on top. And that's where we come in. Yes. I guess that ends up being important for the mental health of the people who are doing the content moderation as well, because I've read how, for people in those roles, it can be quite a harrowing experience when you're watching beheadings and child porn all day. Absolutely. Absolutely. Moderator well-being is a huge, huge issue. It's in the news; periodically it comes up as this huge thing. And ActiveFence is very, very concerned about this, right? We deal with data that is not pleasant, right? And in the same way that I actively work to protect my data scientists and my engineers from exposure to this, exposing them only when it's really needed and with a lot of safeguards, we're all human, and we want to make sure that moderators are protected in the same way. And so if there are things that are blatant, you know, a beheading, why do they need to watch that? There really isn't a need, right? There are things that are clearly, obviously violative, clearly, obviously malicious, and they should just be removed and banned. And so, for the algorithms, we use what we call contextual AI. What this means is that we look at the item in the context in which it is being used, but also within the item, right? Our data model basically enables us to take an item, even if it's just an image, and start breaking it apart into the components that it has, so that then we can build those together into a coherent risk score, where this risk score can take into account, you know, do we see any weapons? Do we see any known logos? Do we see any known people of interest that we know, from their history, are spewing hate speech or misinformation and so forth? And then all those components together can combine to basically say, yes, this item is very probable to be risky.
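To make that component-combination idea concrete, here is a minimal sketch, in Python, of how per-component detector outputs might be aggregated into a single risk score. The signal names, the weights, and the noisy-OR combination rule are illustrative assumptions for this sketch, not ActiveFence's actual model.

```python
# Illustrative sketch only: combining hypothetical component-level detector
# outputs (weapons, known logos, people of interest, etc.) into one risk
# score via a noisy-OR, where each signal independently raises overall risk.
from dataclasses import dataclass

@dataclass
class ComponentSignal:
    name: str           # e.g. "weapon", "known_logo", "person_of_interest"
    probability: float  # detector confidence that the component is present
    weight: float       # how strongly this component implies a violation

def risk_score(signals: list[ComponentSignal]) -> float:
    """Noisy-OR: the item is risky unless every signal fails to fire."""
    p_benign = 1.0
    for s in signals:
        p_benign *= 1.0 - s.weight * s.probability
    return 1.0 - p_benign

score = risk_score([
    ComponentSignal("weapon", probability=0.92, weight=0.6),
    ComponentSignal("known_hate_logo", probability=0.85, weight=0.9),
])
print(f"risk score: {score:.3f}")  # multiple signals compound toward 1.0
```

A noisy-OR has the property the conversation describes: each additional risky component pushes the overall score toward 1, while a single weak signal on its own stays below any review threshold.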
And so that's sort of how we build the full picture. And then of course there are other layers to that, right? Even, for example, for chats. You can say, well, I can just use keywords, right? Like, if I find the N-word, then clearly this is very violative. But what if it's someone saying, please don't call me that? Or what if it's a rap song, or what if it's, you know, a community sort of re-owning a word, like, "I'm proud to be" whatever some slur is? In those cases, clearly I don't want to ban that, and if I'm just doing keywords, which are contextually unaware, then I lose that ability. So in those cases we do need to use language models that are more contextually aware, and these language models need to be trained and tuned on these specific cases, because these are the cases that are always interesting. That does sound really interesting. And it sounds like the kind of thing that, in this brave new world of really powerful large language models, is something they could do really well, where a few years ago it might have been a lot tougher. So it's great that you're presumably able to leverage these kinds of new technologies, especially the multimodal technologies that are emerging. I don't think it's available to the public yet, but GPT-4 has this image component where you can provide an image, a photo of your fridge, and ask GPT-4, what can I cook based on the ingredients that you see in this image? And so that kind of multimodality, it sounds like it's something that you've been working with for a while. Yeah, we've been looking a lot at multimodality, because, going back to the child pornography example, that's something that for people is so obvious, right? You should be able to know whether something is child porn or not; we all viscerally know what is bad. And yet sometimes you'll see a picture of a child and it looks fine, but only the people that are in the know will know, you know, that that's the face of a known victim, or in the comments there are links to off-platform sites, or there's something about the angle, or a logo: the picture itself is benign, but there's a logo that's associated with a studio that's been associated with child porn, or the title, or the description. So sometimes it's enough to look at the image, sometimes it's enough to look at the surroundings, but oftentimes it's the combination. And in terms of generative AI, right now is sort of a perfect storm for trust and safety, right? Because we're going to be having US elections soon, and so political disinformation is something that's very, very current, and having these large language models lowers the bar to entry for bad actors, right? It used to be that things could either be really high quality but low scale, or low quality, easy to catch, and high scale; now that tradeoff is not an issue, right? So it's an enabling technology. Obviously, no fearmongering here, I think it has a lot of good that it can do, but we need to be aware of how it can be used and how we can be prepared for it.
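Going back to Matar's keyword example for a moment, here is a hedged toy illustration of why contextually unaware matching misfires, with the contextual model left as a stand-in. `SLUR_KEYWORDS` and `load_contextual_classifier` are invented placeholders for this sketch, not anything from ActiveFence.

```python
# Toy illustration of the keyword-matching failure mode described above.
SLUR_KEYWORDS = {"slur_a", "slur_b"}  # placeholders for actual terms

def keyword_flag(text: str) -> bool:
    # Contextually unaware: fires on quotation, reclamation, and rap
    # lyrics exactly as it does on an actual attack.
    return any(word in text.lower().split() for word in SLUR_KEYWORDS)

print(keyword_flag("please don't call me slur_a"))  # True -- a false positive

# A contextual classifier instead scores the whole utterance, so usage like
# reclamation or reporting abuse can be separated from an attack:
#
#   clf = load_contextual_classifier()   # hypothetical, fine-tuned model
#   clf("please don't call me slur_a")   # -> low risk (victim reporting)
#   clf("you are a slur_a")              # -> high risk (direct attack)
```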
This episode is brought to you by Posit, the open-source data science company. Posit makes the best tools for data scientists who love open source, period, no matter which language they prefer. Posit's popular RStudio IDE and enterprise products like Posit Workbench, Connect, and Package Manager all help individuals, teams, and organizations scale R and Python development easily and securely. Produce higher-quality analysis faster with great data science tools. Visit Posit.co, that's P-O-S-I-T dot co, to learn more.

Yeah, no question. So, famously in the United States, in the 2016 election cycle, there were foreign actors involved in creating disinformation in Eastern European content farms, and you can imagine, exactly like you're saying, these kinds of tools like GPT-4 make it a lot easier to create a lot more content, because you don't need a human typing out everything. It's probably several orders of magnitude less expensive to generate malicious content, misleading content, disinformation. So yeah, it is interesting heading into... I mean, I guess we're always heading into an election cycle somewhere. It's crazy in the US to me that people in the lower house have a two-year election cycle, so you spend a few months legislating and then it's back to fundraising. Exactly. It's wild. Yeah, but it's not just that; disinformation is only one aspect of it, right? We're also seeing computer-generated, AI-generated child pornography. And at this point the question is, is it still violative? And I think yes, right? We don't want that stuff out there. I don't care whether something is real or fake; it's still child porn and it should be banned. And then there's a second level of, well, unless I'm trying to find who the victim is, and then I do care, and then there's another level of detection that needs to be built on top of that. Is it tricky? I mean, it must be tricky. Something that must add an extra level of complexity to this is that presumably the nefarious actors out there are constantly shifting and trying to evade detection by you. When many of our listeners, and myself, are building machine learning models, we don't have to worry about somebody trying to outwit the model. You know, you can build a machine learning classifier to detect images of cats and dogs, and it's not like the cats are trying to look like dogs and are going to come up with ways of dressing to look more like dogs. Totally. So yeah, I do think about how what I do is different from cat detection, or car detection, or anything like that; there are a million examples. In addition to the fact that it's evasive and adversarial, right, there are examples like QAnon, which is a group that's banned on some platforms, and they will change their text, writing "Q" as "cue," C-U-E, and then you have to catch it by knowing to look for that. And to know to look for that, that's already subject matter expertise, and that's one thing that, at ActiveFence, we have. You know, you mentioned threat intelligence. We have intelligence analysts, and this is what they do, right? They're experts in, you know, misinformation, or hate speech, or terror.
And they research these groups. They know about the latest hashtags, trends, what emojis the groups are using now, and so forth. And then basically they're able to surface data that's relevant. For example, there's, you know, a hate group that was founded just this last June, or this last October, and is already on different social networks with their logo, spewing hate. To catch those, you have to know about them, and then those can feed back into our algorithms, and I can know to look for those logos, to look for those phrases, to look for those actors. And so then I'm able to stay on top of the fact that, yes, it's adversarial, because I have the subject matter expertise. It's also extremely non-stationary, right? I have new actors coming up all the time. If I'm looking for cats and dogs, how much have they really changed? They're not going to change, right? They're going to have four legs, a tail, and ears. Versus here, because it's adversarial, the landscape is extremely non-stationary. And that's why I need my subject matter experts constantly feeding me more information, like, oh, this is a new slang term, oh, this is a new slur, and so forth. Is there a flywheel that gives ActiveFence a defensible moat between this content moderation and the threat intelligence? I just kind of had this brainwave here, and you can correct me if I'm thinking about this incorrectly. So you have the content moderation aspect of the platform, where machine learning models are detecting, hey, we think this is high-risk content over here, automatically. And then maybe that automation can assist the threat intelligence part of the company. And then the threat intelligence people, in turn, are keeping tabs on what's going on through a combination of more manual intelligence work as well as this automatically assisted intelligence work that your content moderation side is helping with. And then they can feed back into the content moderation, like, hey, here's something else you need to be able to look for; we need to train a machine learning model to detect this kind of thing, because there's this new logo that you need to be looking out for. Totally. You nailed it. Yeah. We love flywheels at ActiveFence; we always say no strategy deck is complete without a flywheel. And so, absolutely, it's exactly as you described. We have our intelligence analysts that are finding things, right? Those feed into the algorithms and make the algorithms better. And then we have incoming data collection, whether we go out and proactively collect the data, or clients are sending us data, being like, hey, is this violative or not? And that can then be fed back to the intelligence analysts. For things that come from clients, sometimes it's things that they haven't seen before, that they don't know; often they do know it. But because we're also out there collecting data proactively, that's able to feed back in. And one core component of this flywheel is our proprietary database of violative content.
And what this means is that basically we have data, whether it's images or audio or videos or text, that we've already identified as being violative of a policy, or malicious. We can hash that, and then new content that comes in can be compared against it, right? That helps us, first of all, to be more efficient. But also, we can proactively enlarge this proprietary database; we don't need to wait for data to come in, right? We can go out to sources that we know are problematic, and that's what our intelligence analysts can help us with. And then the next time that content comes in, we've already seen it before. So there's definitely this interaction, this flywheel, between the intelligence analysts, the humans, and the AI, one feeding off the other. Yes. So this is the database of evil that you've talked about publicly before, right? And to give a bit of an analogy, this is kind of like how antivirus solutions have a database of known viruses, and then if a line of code that's known to be malicious comes across your own hardware, it can be compared against this database, and you can say, okay, this is a threat, we need to remove it from your file system. So similarly, you have this database of bad content, of harmful content. And yeah, it's proprietary. So, okay, I think you had more to say about that. The one more thing I have to say about that is just to bring us back to the idea that we're in an adversarial space. We're in a space that's adversarial, that requires subject matter expertise, that's non-stationary, that's multi-dimensional, and that requires context. Once we keep those in mind, it helps us frame how we want to use this database of evil, this proprietary database, but also what it needs to be robust against, right? So if we're in a place that requires subject matter expertise, we make sure that we can keep enlarging our data, all of our databases, with our intelligence analysts. It's non-stationary, so we need to make sure that the database is always updated; having a snapshot of it isn't enough, right? There are going to be new things. But also, if we're in a place that's inherently adversarial, then we need to make sure that this database is robust to adversarial manipulations. What does that mean? For example, if I have a very hateful song glorifying the Holocaust, a love song glorifying the Holocaust, right, these things exist, and I know that it's banned on platforms, then I can speed it up, right? And then in the comments, or in the title, or somewhere, I can say, listen to this at half speed. And now I've basically gotten it past all kinds of defenses. And the same thing with images, right? I can rotate it, I can grayscale it, I can mirror it, I can do all sorts of things. So I need to make sure that any hashing algorithm that I have is robust to these manipulations, up to a point, right? Because it's always this idea of precision versus recall. Do I want to now unfairly capture things that shouldn't be captured, right, and unfairly say that they're violative? Probably not.
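Matar doesn't specify which hashing scheme ActiveFence uses, but perceptual hashes are the standard family for this kind of near-duplicate matching. Below is a minimal difference-hash (dHash) sketch in Python with Pillow, plus the trick of also indexing hashes of cheap transforms (mirror, rotation) of each known-bad item; the synthetic noise image stands in for a suspect upload.

```python
# A minimal perceptual-hash sketch (difference hash): near-duplicates of
# known-violative images hash to nearby values, so small edits don't evade
# the database. An illustration only, not ActiveFence's actual scheme.
from PIL import Image, ImageOps

def dhash(img: Image.Image, size: int = 8) -> int:
    """Hash from brightness gradients: resize to (size+1) x size grayscale,
    then set one bit per adjacent-pixel comparison."""
    small = img.convert("L").resize((size + 1, size))
    px = list(small.getdata())
    bits = 0
    for row in range(size):
        for col in range(size):
            left = px[row * (size + 1) + col]
            right = px[row * (size + 1) + col + 1]
            bits = (bits << 1) | (left > right)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits; small distance = likely near-duplicate."""
    return bin(a ^ b).count("1")

# To stay robust to cheap manipulations, also index hashes of simple
# transforms of each item (mirror, rotation):
img = Image.effect_noise((64, 64), 50)  # stand-in for a suspect upload
candidates = [dhash(img), dhash(ImageOps.mirror(img)), dhash(img.rotate(180))]
# Match if, for some known-bad hash h, min(hamming(c, h) for c in
# candidates) falls below a tuned threshold.
```

Because the hash encodes brightness gradients rather than raw bytes, mild re-encodes or resizes of the same image land within a small Hamming distance of the original, which is exactly the robustness-versus-precision dial described above.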
And so it's a tricky line, but that's the line with any content moderation algorithm: we're always trying to figure out where the boundaries should go. And I guess that's key to why you have a human in the loop in these kinds of decisions, so that if people have content unfairly removed, there's some kind of appeals process in the platform, or a human reviewer that can make the final decision. So, going back to a point from early in the conversation, when something is really flagrant: your risk score, which I assume is kind of similar to having a binary classifier in machine learning, where you have a confidence on how likely this is to be malicious content, harmful content. If that is very high, if you're like, okay, this is 0.99999, there's no point in sending this to a human to review. But if it's 0.8 or 0.7, then there might be something here, and somebody should review it before a decision is made. And same thing on the flip side: if something does get flagged automatically, everything is still probabilistic in machine learning, so there are going to be cases where the algorithm is very confident and yet, due to some circumstance like you're describing, where a group has re-owned something that has been a racial slur historically, there should be the opportunity for that person to say, no, I should be able to post this. So it kind of works both ways. So our risk score is exactly as you describe; everything is probabilistic, and it's also a business use case, like how much manual review they want, right? Maybe for a children's platform they say, you know, I just ban everything; so what if the kids can't chat about certain things, it's fine. But maybe for other platforms, like a music platform, people can chat about songs with racial slurs in them, and fine, I'm okay with that. My kids can just type three words, that's okay, and anyway my kids are getting cell phones when they're 35; until then, deal with it. But other platforms would be like, you know what, even for the things that are at 0.99, we want to review them, because we would rather err on the side of free speech or whatever. And I think that in terms of appeals, that's a super important point; that's something that is definitely critical here, because people are posting things and they don't want to be unfairly punished. And actually, right now, I think that the world of trust and safety is having its GDPR moment. GDPR, for those of you that are not familiar, was privacy regulation passed in the EU that ended up having a huge, sweeping effect, because basically any time a citizen of the EU is on an online platform, GDPR applies to them, in terms of, you know, what cookies and what data can be stored and so forth. You've probably all seen the notifications in your browsers about privacy regulations. And so now trust and safety is having its GDPR moment with the DSA, which is the Digital Services Act, passed by the EU last year.
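Returning to the thresholds in that exchange, here is a hedged sketch of the routing logic, with per-platform policies as the knobs Matar describes. The threshold numbers and names are invented for illustration.

```python
# Hedged sketch of risk-score routing: thresholds are per-platform policy
# knobs, not fixed values, and anything auto-actioned stays appealable.
from dataclasses import dataclass

@dataclass
class ModerationPolicy:
    auto_remove_at: float   # a kids' platform might set this very low
    human_review_at: float  # the gray zone goes to a human moderator

def route(risk: float, policy: ModerationPolicy) -> str:
    if risk >= policy.auto_remove_at:
        return "auto_remove"   # still subject to user appeal
    if risk >= policy.human_review_at:
        return "human_review"  # a moderator sees it only when needed
    return "allow"

strict = ModerationPolicy(auto_remove_at=0.70, human_review_at=0.40)
lenient = ModerationPolicy(auto_remove_at=0.99, human_review_at=0.80)
print(route(0.85, strict), route(0.85, lenient))  # auto_remove human_review
```

The same score routes differently under the two policies, which is the point: the model is shared, but the sensitivity is the platform's business decision.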
And it basically also puts in protections for trust and safety; it codifies them in law, in a similar way, with fines and so forth. And while it's still new for smaller tech companies, for the very big online companies it's already rolling out, and there are very strict regulations that they need to follow, and it'll probably trickle down to everyone. So regulatory fines are on the table, and these businesses need these tools to be compliant. Part of the DSA is auditing and understanding why things were banned, and explaining it, and so forth. So that's another thing that we invest in: explainability, right? If I'm giving a score, then I want to be able to explain why, because a lot of times you need that subject matter expertise to understand that, oh, this particular logo is actually associated with this particular terror group, or hate group, or child pornography studio, or whatever. I see. So I can see how the evolving regulatory landscape ends up being important, probably helpful to you, as you develop these algorithms. We've already talked about harmful content in static, posted content, but there's something that we see increasingly in the news: not just content that's been posted, but content that is streaming in real time. There have been incidents in the US recently of shootings being livestreamed to social media platforms. And this happening in real time must add an extra layer of complexity to some of the work that you're doing. Absolutely. There have been horrific instances of livestreaming, in the US and elsewhere. And there are a couple of ways to approach that. One is that we can put really, really small content moderation models on the edge device, right? Something that does a basic pass to catch the blatant stuff and then, you know, raises it up for human review. Because I think in these cases of livestreams, it's tricky; we're still learning the space, and if we flag it, we would want someone to take a look at it, maybe erring on the side of flagging too much and then having someone take a look. Again, it's always a business question: what is the platform, what are we looking for? So, you know, have some sort of detector of, I don't know, gunshots, something that a small model on an edge device is able to flag right away. We can also do something where, once the content makes it to the servers, we sample frames. There's a question of what you want to moderate: every single frame? Do you want to sample every minute, every second? I think what makes this so challenging is just the scale, right? You have so much data streaming in. And then you do the same thing: sample, and then look for maybe more complex things. So those are the typical approaches, and that's where you're really focusing on the content itself that's coming in. But a lot of the time, when someone is livestreaming something like this, the perpetrator may have, you know, pre-shared it somewhere, and there are people joining the stream, commenting.
And now suddenly you have a much, much richer source of information, right? You can look at: who are the other users? Who is the user that's streaming? What else have they streamed in the past? What other groups have they been in? What are people writing in the comments? And so forth. And suddenly you might be able to catch it, or at least flag it, just from the surrounding information, right? There are enough indicators of risk from the things that are around it. Sure, you want to moderate the content, you want to look at it; but you would also want to look at the other markers of risk around the content itself, to make your job easier and faster and more efficient.

Did you know that Anaconda is the world's most popular platform for developing and deploying secure Python solutions faster? Anaconda's solutions enable practitioners and institutions around the world to securely harness the power of open source, and their cloud platform is a place where you can learn and share within the Python community. Master your Python skills with on-demand courses, cloud-hosted notebooks, webinars, and so much more. See why over 35 million users trust Anaconda by heading to superdatascience.com/anaconda. You'll find the page pre-populated with our special code SDS, so you'll get your first 30 days free. Yep, that's 30 days of free Python training at superdatascience.com/anaconda.

Gotcha. So it obviously is more complex to moderate harmful content when you're thinking about it in a real-time situation, but as you point out, you can have smaller threat detection models on the edge device, so on mobile phones, maybe on laptops, able to detect these issues in real time and potentially flag them to the social media platform. And then also, once the real-time data is reaching the servers of these platforms, you can be sampling at some appropriate interval in order to try to detect harmful content, say the sound of gunshots, so it can be reviewed, whether it's real-world gunshots or just video game gunshots. And probably in a circumstance like that, whether it's real-world gunshots or video game gunshots, we're going to be able to tell more easily because of the contextual information that surrounds it. The kind of text that people post in response is probably going to be quite different, and classifiably different, in a video game, where people... there might be more like... well, I don't want to even speculate. Yeah, but I would say, and I'm kind of thinking out loud and refining a bit what I said earlier: you have something on the edge device, a smaller model that does more basic content moderation, and then, remember that everything is making it to the cloud, and so we're sampling, and things that have been flagged by the edge device can then either get more tightly spaced samples or have deeper analysis run on them. It's basically a funnel, right? And again, it depends what you discover on that device; maybe you want to flag it right away and be like, listen, this is not something that is very likely to be in the gray zone. And then also you can look at the surrounding content; you don't need to wait for the content itself to be uploaded to the servers, right? You have this surrounding content.
It's text, and who the user is, and where — do we recognize this user? Where else have they posted before? And the users that are commenting, where else are they? This is where a graph data model actually comes in handy, right? Because now you have all these relations between users, and you can see what they've liked before, what groups they're in, who they've interacted with, and so forth. And then, if these are people that are known to us, we can say, well, actually, this is a user where, if we see them here, it adds to the probability of risk. All right. So, Matar, you've given us an interesting overview of how content moderation works in your automated platform for detecting harmful content: things like contextual AI, needing to be able to adapt to adversarial opponents, the flywheel between content moderation and threat intelligence that's helpful to you, the database of evil and how, with its flexibility and the way that information is hashed in there, you can detect new variations that are adjacent to existing known harmful content. And then most recently we talked about the specific circumstances of real-time streaming and how you can address harmful content in those circumstances. So, very interesting. And I'm curious to what extent you can tell us about the kinds of technologies that you use to make the platform happen. So, you know, what kinds of programming languages — obviously you can't get into a level of detail that would allow adversarial actors to be more effective at evading you. Yeah. And again, I'm giving this with the caveat that the parts I deal with are the data, the MLOps, the engineering, the API; the world of frontend is beautiful and mysterious to me, so I can list technologies that they use there, but they don't mean that much to me. I've always been sort of a backend geek. So in terms of what we do, obviously the data people do Python stuff. We also use Node and TypeScript. We serve our models on Kubernetes. We've done a lot of work in-house on selecting the correct instance type for a given model, so that you get the best utilization. We're also working on model-in versus model-out, meaning: do we bake the model into the image, or, when we spin up the pod, do we bring the model in from outside, basically in order to minimize our spin-up time, because we need to be able to deal with really, really high throughput and low latency. And also, we don't want to just be burning money on machines that are up for no reason. So we have HPA, horizontal pod autoscaling, that we've tuned, and we can spin up and spin down our machines as we need, and be smart about which machines we're spinning up, and also, where possible, put multiple models on a machine, or batch the requests to the machine. We do all sorts of optimizations to make sure that we're hitting high throughput within a low-latency SLA. Nice. Very interesting. Thank you for being able to go into even that level of detail. So clearly you have a really deep understanding of not just data science and modeling but of backend engineering: scaling, being able to meet SLAs, Kubernetes. Super interesting. I didn't know from my research beforehand that you had that kind of expertise as well.
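As one concrete illustration of the throughput-versus-latency tradeoff Matar mentions, here is a hedged sketch of request micro-batching in front of a model server. The queue, the budget values, and the `model.predict`/`req.reply` hooks are assumptions for the sketch, not ActiveFence's serving code.

```python
# Illustrative micro-batching loop for model serving: group incoming
# requests into batches to raise GPU throughput, while capping how long
# any request waits for batch-mates so the latency SLA still holds.
import queue
import time

request_q = queue.Queue()  # filled by the API layer (assumed)
MAX_BATCH = 32             # largest batch the chosen instance handles well
MAX_WAIT_S = 0.010         # latency budget spent waiting for batch-mates

def serve_forever(model):
    while True:
        batch = [request_q.get()]          # block until one request arrives
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < MAX_BATCH and time.monotonic() < deadline:
            try:
                remaining = max(0.0, deadline - time.monotonic())
                batch.append(request_q.get(timeout=remaining))
            except queue.Empty:
                break
        # One model call amortized across the whole batch:
        results = model.predict([r.payload for r in batch])
        for req, res in zip(batch, results):
            req.reply(res)                 # hypothetical response hook
```

Batching amortizes one GPU call across many requests, raising throughput, while `MAX_WAIT_S` caps how much latency any single request pays for the privilege; tuning that pair alongside an autoscaler is the kind of optimization work described above.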
So let's dig into your background a little bit to see how this all came about. You did a neuroscience PhD at UC Berkeley, which is, I think, a great decision; I also did a neuroscience PhD. For me, it was something that I got into because I was fascinated by how chemicals, biology, physics create a conscious experience. Everything that you think, everything that you do, in some way that we are obviously nowhere near fully elucidating, can be reduced down to physical processes, and I wanted to dig into that as much as I could. But then, as I got started in the PhD, I was like, wow, dataset sizes are getting really big; it seems like there are really interesting things that we could be doing there, detecting patterns in data, identifying causal direction in data. And so I went down this road of focusing on programming and machine learning, because I knew that whether I stayed in academia or not, those would be transferable skills. And I'm not surprised, I guess, that that ended up being true. However, in your PhD, I know that you were recording activity from surgically implanted electrodes in human brains. And I made this joke before we started recording about how I really feel like I made the right choice in sticking to in silico experiments, analyzing data, as opposed to learning how to implant electrodes into a ferret. I was making this joke about how people in my cohort in my PhD were doing that kind of thing, but the exact person I was thinking of now has a really nice job at Google DeepMind, so it seems that path can work out too. Anyway, long runway, but here we are. So tell us about your PhD and how it relates to what you're doing today. There's also the Insight Data Science fellowship program that you used to transition from your academic background into industrial data science; it would be interesting to hear about that. And then, to finally make sense of why I started off on this entire long preamble: I mentioned how you have a rich understanding of the backend of a software platform, so tell us how that all came about, your rich depth of knowledge in the field. Yes. Let's start with the first question. So, my PhD. Yes, in my PhD I was recording data from electrodes surgically implanted in human brains. Basically, what I wanted was, you know, animal-quality data from humans, right? With animal research, you can stick electrodes where you want and get really beautiful data. You can do it in slices. You can do it from monkeys trained for years to do a task, and get just beautiful recordings, hours and hours of neurons at work. With humans, you're often limited to things that are either slow, like fMRI, where you're many degrees removed — you're actually measuring blood flow, you're not even measuring direct brain activity, so you're measuring a side effect of thinking — or you can do EEG, which is electrical signals filtered through the scalp. And even with technology like MEG, which is magnetoencephalography, it's not the same; you're not on the brain. And then this really unique opportunity opened up in the laboratory of Dr.
Robert Knight at Berkeley, which was basically to work with patients that are undergoing brain surgery, often for epilepsy. So, in epilepsy that can't be treated with medicine, where they keep having recurrent seizures, the only solution is to go in and surgically remove the problematic area of the brain. However, before that's done, you need to map out the brain to ensure that you stop the seizures without the person being left, you know, aphasic, unable to speak, or you stop the seizures and suddenly they're blind. So what you want to do is map out the areas of the cortex around the region of interest and ensure, first of all, that you can localize exactly where the seizure is, because remember, until you really get in there, everything is filtered through the scalp and you really can't tell. So you figure out where the focal point is and also what's around it. And so what happens is these people come in for surgery, they have a craniotomy, part of the skull is removed, the electrodes are implanted, and then they're bandaged up, and then they're in the hospital for a week with electrodes coming out of the brain, hooked up to an amplifier and a preamp. Then their meds are tapered, and many, many things are done to try to induce a seizure, because the best-case scenario is that on the first day they have seizures, you localize it, you figure out exactly what's around it, and then the next day they're in surgery, they remove it, and you're done. Oftentimes that's not the case; you need to work to induce a seizure — sleep deprivation, strobe lights, all sorts of things to trigger them. So most of the time these people are sitting in the hospital, just kind of waiting around, right, watching TV or whatever. And then we come in, and if they consent, we can run all sorts of tasks. We ensure that the task is matched to the regions of the brain that are mapped, right? There's no point in doing a memory task if the mapped parts of the brain are only motor, and so forth, or a motor task if they're only looking at language regions. And I found this job very, very meaningful in terms of science outreach: explaining to them the value that they have for science and why I'm doing what I'm doing. I spent a lot of time just talking to them about the brain and about my research and so forth. I also found it emotionally incredibly difficult, because you're meeting these people at pretty much the worst time of their lives, right? This is just a terrible situation to be in. And so it's challenging, it's rewarding, in a way that you don't expect your PhD to be, right? You go in and you're like, give me data, I analyze data, great; and then there's this. And so I would come in, and the task that I was interested in — I was basically interested in tracking the path of a decision in the brain. In the beginning I just had my one task, but then I realized that basically all the tasks that we record are decision tasks. So I dipped my foot into the world of, you know, big data, where I could take all the tasks that were run.
Anytime there's a decision to be made, you have sort of a motor decision, right? You see a stimulus, so that goes into your visual cortex, or you hear something, so that goes in through your auditory cortex. Then it needs to make it to the decision-making area of the brain, the prefrontal cortex, which is further forward in the brain; a decision needs to be made, the decision is made, and then it needs to go back to the motor cortex to execute the decision. And what I did was basically track that loop in the brain. I was able to look at the activity in the prefrontal cortex and say, aha, look, the sustained activity that I see in the prefrontal cortex correlates with, you know, the reaction time, right? I'm able to see when they'll trigger a reaction without looking at the motor cortex. I can see errors; the amplitude is also correlated with errors. Basically, tracking a thought through the brain. And because I was on the brain, I had extremely fast recordings of what was going on; it wasn't filtered through anything.

The future of AI shouldn't be just about productivity. An AI agent with the capacity to grow alongside you long-term could become a companion that supports your emotional well-being. Paradot, an AI companion app developed by WithFeeling.AI, reimagines the way humans interact with AI. Using their proprietary large language models, Paradot AI agents store your likes and dislikes in a long-term memory system, enabling them to recall important details about you and incorporate those details into dialogue without LLMs' typical context window limitations. Explore what the future of human-AI interactions could be like this very day by downloading the Paradot app from the Apple App Store or Google Play, or by visiting Paradot.ai on the web.

That is so fascinating. I really did just sit at a computer and learn programming languages and machine learning algorithms, which was interesting, but wow, you were really doing valuable real-world work. And I think some of that work goes back to, like, Wilder Penfield. Yeah, I mean, he's like the grandfather of everything that we did. The electrodes that I was using — and again, things here move really, really fast, and my PhD was a while ago; I'm not dating myself, but it wasn't yesterday — electrodes have gotten smaller since then. There's also a lot of single-unit recording now, where you can actually put in an electrode and record from smaller populations, right? My electrodes were pretty big and kind of far apart, so I was recording from larger populations, but yes, it all comes back to classic neuroscience. And as I was doing it... so that's the data collection, right? But then you collect the data, and it's such a rich dataset that it can sustain you forever. The datasets that I collected are still being used for studies, because it's rare data, it's expensive data, it's rich data; you can look at it in many different ways. And so I took my data, and data from other studies, and as I was working on it, I said, you know, the brain is amazing, right? I don't think anyone in our field would argue with that.
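For readers who want a feel for the style of analysis Matar just described, here is a hedged sketch: extract a band-limited power envelope from an electrode channel and correlate trial-wise sustained activity with reaction time. The synthetic data, the high-gamma band choice, and the analysis window are assumptions for illustration, not her actual pipeline.

```python
# Sketch: band-limited power envelope from an ECoG-style channel,
# correlated trial-by-trial with reaction time. Synthetic stand-in data.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert
from scipy.stats import pearsonr

fs = 1000                                  # sample rate (Hz), assumed
n_trials, n_samples = 50, 2000             # 2 s of data per trial
rng = np.random.default_rng(0)
trials = rng.standard_normal((n_trials, n_samples))   # stand-in recordings
reaction_times = rng.uniform(0.3, 0.9, n_trials)      # stand-in RTs (s)

# High-gamma (70-150 Hz) is a common proxy for local population activity.
b, a = butter(4, [70, 150], btype="bandpass", fs=fs)
envelope = np.abs(hilbert(filtfilt(b, a, trials, axis=1), axis=1))

# One number per trial: mean sustained high-gamma power after stimulus onset.
sustained = envelope[:, 500:1500].mean(axis=1)
r, p = pearsonr(sustained, reaction_times)
print(f"r={r:.2f}, p={p:.3f}")  # real data would show the reported effect
```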
However, what I found myself drawn to were the algorithms, the machine learning, the statistics, the signal processing, the programming languages — all of the things that maybe you would say, oh, those are just the methods. Those were the parts of the papers I was reading that I was most drawn to. And so it was kind of a natural transition from there to be like, okay, this is what I'm actually more interested in most of the time. So that was how I found myself there. I did an NLP class towards the end of my PhD, kind of in secret, and then the rest is history. Nice. Yeah. And then the Insight program. So we've had guests on in the past that did this fellowship as well. It's intended primarily, I think, for people who have already done an academic PhD and have a strong quantitative background like you did. You were doing tons of, like you say, machine learning and data science techniques: things like time series analysis, dimensionality reduction, clustering, regression, permutation testing. So you had all this existing experience. Was the Insight Data Science program a useful transition after all that you'd done, including the secret NLP course? Was it still useful for making the transition to industry? Oh, absolutely. Insight Data Science is wonderful. What they do is basically take people who already have all of these skills — we've done data science, we've done machine learning, we've done programming, but we're totally, totally clueless about the real world because we're academics, right? We know nothing — and they help us frame what we've done in the context of industry. So talking about startups and funding and jobs and what it's like to work, and then also things like best practices, right? Like version control and things like that, which some people do in their PhD and some people don't; some people's PhD is 100% in a single Jupyter notebook. So they kind of get you up to speed on whatever gaps you have there, but also frame what you've done in the context of, okay, this is industry; here is why what you have is valuable, here is how you can use the things that you've done in industry. People come in and show, hey, look, here's fraud detection at this company, and you're like, oh hey, I've done that — it kind of ties it together for you. Also salary negotiation, all of these things. And then what you end up doing is, they say, okay, in your PhD you had five, six, seven years to work on something; now you have three weeks to put together a project that brings concrete impact and that you can pitch. And that puts you in the mindset of what a POC is. That also helps employers get around the bias of, God, I don't want to hire an academic, they're not going to get anything done. Right, right, right. And I think there are a lot of relationships between Insight and future employers; lots of employers are looking for great data science talent like the people taken into the Insight Data Science program.
And so, yes, you end up with this flywheel. And I just want to quickly go back, before we transition away from what you were doing in your PhD and how you got to what you're doing today, to talk a little bit more about Wilder Penfield. I mentioned how he was, I guess, the first person to map the human cortex at that level of detail. And I think it was the same situation — it was many decades ago, like the 1950s or something, but the same kind of thing: epilepsy patients, open skulls, recording from individual electrodes over the whole brain. And that gave this map of the whole cortex: there's the somatosensory homunculus and the motor homunculus. It's really cool; I encourage our listeners, and I'll try to remember to include in the show notes, a link to some images of this homunculus. I think homunculus is Latin for "little man." The idea is that as you go over the motor cortex or the sensory cortex in the brain, there's this map of your body, and it isn't anatomically proportional in terms of scale. So, for example, for both the sensory and the motor homunculus, the hands are huge, because you have so much detailed sensory perception as well as motor control in your hands. And I remember the lips are huge, but the back is small. Yeah, exactly. I think we had the same textbook. One thing you can do is take a paper clip and bend it so its two points are some distance apart, and find the closest distance at which you can tell the points apart on your back; at some point you're just not sensitive enough on your back to tell them apart. Then you put it on your lips, and immediately you're like, oh wow, these are super far apart. That reflects the fact that you have less sensory representation of your back than of your lips in your sensory strip. Cool. So yeah, I wanted to recap on that, and there's a very specific reason why I brought this back up, not just because it's really interesting, which I think it is in and of itself. You were talking about recording electrodes, and I was thinking about how the recording electrodes that Wilder Penfield would have been working with many decades ago — not many centuries ago, many decades ago; that was just a few years ago... this podcast will be listened to for millennia, it'll be confusing — would have been much bigger than the ones you were working with, and you were talking about how in recent years they've become even smaller. And so that got me thinking about how there is this push, with companies like Elon Musk's Neuralink, to have brain-computer interfaces that eventually aren't just for people who have serious issues and have their skulls opened; there's this move towards, in our lifetime, potentially having some way of having recording electrodes on our brains without needing invasive surgery. I don't know if you have any thoughts on that. And I'm also going to be asking you after this episode whether you know anyone we should have on the show to dig deep into that topic. Yeah, I have some ideas. So actually, even today, not all brain-computer interfaces are super invasive.
I mean, it's invasive in the sense that you have an electrode in the brain — that's invasive — but not for all of them do you need a craniotomy that opens everything up. So, for example, for Parkinson's patients you have deep brain stimulation, where they basically drill a small borehole and put in an electrode very precisely into the substantia nigra, which is a place in the brainstem... The "black substance." Yeah, there you go, so you do remember some of your anatomy... which basically produces dopamine, and when cells start to die, you need to stimulate it in order to get around the fact that it's not functioning. So that's just an electrode that's brought in and used to stimulate, and there's a device that you can calibrate to decide how much to stimulate. That doesn't require a massive craniotomy, and it's already a feedback loop that has been around for a really long time. You also have things that are more invasive but long-term. The patients I was discussing basically have the electrodes in temporarily, right? They have the craniotomy, the electrodes are put in, wires come out of the head; once they have seizures and everything's localized, in the best case, they remove it, the skull goes back on, electrodes gone. However, there are also companies, for example NeuroPace, that actually permanently implant an electrode strip in the person's brain, and then they're able to record ongoing activity, and the idea is that they're able to predict a seizure, or give some sort of lead time before it happens, and then stimulate to stop the seizure. And what they record is uploaded to their servers. That's also already here; there are patients walking around with it right now. And so I think, you know, Elon Musk's Neuralink is the next step of that, where it's like, okay, it's not clinical, and let's say we can get it smaller and smaller — if we think about it Moore's-law-wise, with things getting smaller and smaller and more and more fitting.
I think we'll be there shortly, and then it's a matter of how you make it minimally invasive, because at some point the question is how long it can stay in before cells start to die or the body starts to reject it. And there's a difference between just recording versus stimulating: what does it mean to stimulate, and at what frequency? I think there are some really, really interesting questions there, because, and here's another tangent, we see that brains oscillate. We have oscillations in the brain, different frequency bands associated with different processes. So alpha, which is between 8 and 12 hertz, is often associated with visual cortex, and you have beta, which is roughly 15 to 30 hertz, associated with motor movement: when you initiate a movement, you get beta suppression. But what we also see is that there's individual variability in these frequency bands. My beta is not your beta, my alpha is not your alpha, my theta is not your theta, and so forth. So any time we're going to go in and start stimulating, you say, okay, I'm going to stimulate at a particular frequency, but what is that frequency? How do I determine it, and how do I know what frequency is ideal for me versus for you? Right now this is still far away, but there are already stimulation protocols, even completely non-invasive ones; there are things with TMS, transcranial magnetic stimulation, that people have been experimenting with, and there's also research on using stimulation for psychiatric disorders and so forth. So it's a huge, huge field, and hopefully that wasn't too much of a tangent.
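Since "my alpha is not your alpha" is exactly the kind of thing you would want to quantify before choosing a stimulation frequency, here is a minimal Python sketch of estimating an individual's peak alpha frequency from EEG. The band edges, sampling rate, and simulated subject are all illustrative assumptions:

```python
import numpy as np
from scipy.signal import welch

# Canonical band definitions mentioned above; real analyses often
# individualize these, since "my alpha is not your alpha".
BANDS = {"alpha": (8, 12), "beta": (15, 30)}

def peak_frequency(eeg: np.ndarray, fs: int, band: tuple) -> float:
    """Estimate this person's peak frequency inside a canonical band
    from the Welch power spectrum."""
    freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs)
    lo, hi = band
    mask = (freqs >= lo) & (freqs <= hi)
    return float(freqs[mask][np.argmax(psd[mask])])

if __name__ == "__main__":
    fs = 250
    t = np.arange(60 * fs) / fs
    rng = np.random.default_rng(1)
    # Simulated subject whose alpha actually peaks at 10.5 Hz, plus noise.
    eeg = np.sin(2 * np.pi * 10.5 * t) + rng.normal(0, 1, t.size)
    print("individual alpha peak:", peak_frequency(eeg, fs, BANDS["alpha"]), "Hz")
```

The same search-within-a-band approach extends to beta or theta, and the recovered individual peak, rather than a textbook band center, is what a personalized stimulation protocol would target.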
No, not at all, I obviously found it super fascinating. I think any of these discussions around how we can use technology to adapt our brains, whether to resolve some negative issue, like you're saying, from strokes through to psychiatric conditions, or potentially to provide enhancements, are worth having. I know some of these brain-computer interface, BCI, technologies are designed not just to resolve issues but potentially to augment human capabilities in ways we probably can't predict yet. So I think it's super, super interesting, and yes, I will be following up with you for recommendations of people who could be on a kind of BCI episode. So, Matar, you mentioned how your PhD was more intense than some other neuroscience PhDs, and certainly orders of magnitude more intense than my PhD was, in terms of being out in the real world and dealing with patients. But that wasn't your only intense job. Am I reading this correctly, that you were teaching children how to use tanks? A preschool tank instructor? No wait, those are two separate items. You were a preschool teacher, and you were also a tank instructor. So I'm curious whether those experiences helped prepare you for your career. And in particular, and this might seem tangential, but it wouldn't surprise me if it somehow ties into your answer, I know that you're passionate about expanding leadership opportunities for women in STEM careers, including data science, so I wonder if we can somehow tie those two topics together. Yeah, sure, why not? So, my military service: in Israel there's mandatory military service, and I actually took kind of a strange route. Normally you start your military service when you're 18, but I actually did mine... You went to Berkeley first. Right, I went to Berkeley for my undergrad, then back to Israel to be a tank instructor, and then back to Berkeley to do my PhD. Correct, true story. And now I'm back in Israel again. Why can't I make a decision, I guess? So I did it backwards, and I decided for my military service, there's not a ton of flexibility in what you do, but there's some, that I wanted to try out for something very different from anything I would ever do otherwise. I figured I'm probably going to be in an office for the rest of my life, so I wanted to do something very different, and also something that was kind of scary to me, where I was pretty sure I'd fail, that would take me completely out of my comfort zone, because I think that's important. And the risk wasn't that high: worst case, I wouldn't be that great in the military, and that's fine. So I tried out to be an instructor, and then I saw a tank and thought, that machine is amazing, I want that, so I tried out specifically to be a tank instructor. The way it worked, at least then, was that women trained the soldiers. In a tank you have the gunner, the driver, the loader, and the commander, and my role was to train the gunners, and also the commanders and the officers, who have to know all of the roles. For training the gunners, I trained them on the weapon subsystems, so basically all of the computers, helping them understand the entire computer system within the tank for the weapon subsystem. And it was kind of tricky to have done my undergraduate degree before my military service, because I would ask my commanders questions like, so the algorithm it uses to determine what angle to fire at, does it learn? Is that reinforcement learning? And they were just like, what planet did you come from? So that's what I did in my military service. It was incredibly physically difficult: we did basic training, had to do a million and one push-ups, run around a lot, be outside, not sleep. I was trained on all of the subsystems of the tank, not only the gunner's, because only after you do basic training and learn everything do they say, okay, now you're going to focus on this. It checked all the boxes of being really, really hard and incredibly challenging, and it turns out the physical part isn't the tough part; mentally it was very, very difficult.
And so that kind of set me up to be less afraid of failure, because it's tough. After that, I was a preschool teacher, which was by far, by far, the most difficult job I have ever had, by far. Oh really? Oh yeah. Wow, even as you were saying that, I was picturing cuddles and laughter. Yes, tons of cuddles, tons of laughter, but so physically draining, and so emotionally consuming. I would dream about my kids; it never leaves you. You dream about these kids, you're thinking about them. And I was also more sick than I've ever been, constantly sick, always on some sort of antibiotics. It's very, very challenging, but it's also another way to do something really tough, and it gives you a different perspective. Both the military and being a preschool teacher were incredibly humbling, very, very humbling, and I think that's the biggest takeaway for me from those experiences. And now, wait, I need to tie it back to women in STEM. So now I'm a mom. I have three kids: a two-year-old, an almost-five-year-old, in less than a month she'll be five, and an eight-year-old. First of all, if I'm tying it all into content moderation and why I do what I do, I think it's extremely obvious: online harm can turn into offline harm, and I want to make all interactions safer. And in terms of my daughter, she's eight, and I see the world she's growing up in and what it means to be a woman and a leader in this world. I want to make sure she has role models, so that she isn't the only woman in her computer science class, been there, so that she isn't the only woman in meetings, been there. I want to make sure she has a much more welcoming environment for whatever she wants to do. And what's really sad to me is that even now I'm hearing things from her like, well, boys are better at that than me. No, not true, very not true, and here's why it's not true. So these are the kinds of things that I want to make sure aren't out there online, speaking of disinformation, but I also want to make sure that the environment she's growing up into is much more welcoming. Nice. Well, it's cool to hear how your passions come through across all aspects of your life, and that you're tying together the personal things you'd like to see in the world with what you're doing professionally, with respect to things like disinformation. So, we were talking about you being in Israel; obviously that's come up a number of times in this episode, including the military service. Another thing that's unique about Israel is that it has very high R&D expenditure per capita, markedly higher than any other nation on the planet, and that probably creates an interesting flywheel with the strong tech startup ecosystem in Israel, which in turn helps generate more things for R&D to be spent on. But another interesting piece related to this, and I remember it from a past podcast conversation, although I think this is the first time we've talked
about it on air: my understanding is that another thing fueling tech startups in Israel is this mandatory military service. You went and did tank instruction, but a lot of people, particularly, I suspect, people who already had undergraduate degrees like you did, end up doing things that aren't training for the front lines; they're training to do intelligence work, they're training to do signal detection, they're using machine learning and data analysis in the field. And having developed that skill set for several years, when they finish, they think, well, what could I do? And one idea a lot of these people have is, I could start a company, I could use these technology skills in industry. So we have these flywheels, I guess two flywheels here: one where the mandatory military service leads people to become tech entrepreneurs, which in turn is probably also helpful for military capabilities in general, and a separate flywheel of R&D, where the strong tech ecosystem is a self-fulfilling prophecy: oh great, we should be investing more in this, and so then more people go into it. I've now talked through a lengthy transition; the floor is yours. So, yes, yes, and yes. We have mandatory military service, currently set at roughly two years for women and three years for men, again with lots and lots of caveats. First of all, there's definitely a big investment by the military in technology, whether it's signal processing or AI or whatever, so you have people who are trained in that, like you said, and then they can go out with that skill set. Everyone's hiring those people, and we hire them too. But even for people who aren't going into those sorts of fields, the fact that there's mandatory military service means that from a young age you're in a place where you're picking up skills necessary to succeed in these companies. For example, leadership skills: in most cases, to become an officer in our military you have to start at the bottom. It's not like in the US, where you have West Point or the Naval Academy and that's how you become an officer. You basically start when you're 18, and then, based on different parameters, you can elect, or be chosen, to do officers' training. So you have people leaving the military with a skill set of being very focused, with leadership skills and managerial skills and time-management skills, all the things that make for a successful manager or a successful CEO. So one of those flywheels is on-the-job training, and the other is these general skills that you pick up. Another thing I think is really positive about mandatory military service is that it's equalizing for us, right? Everyone goes into the military, almost everyone, a huge caveat that is causing a lot of social unrest here right now, but we'll leave that for a different time. You go into the military and you're mixed
with different people, right? So it's also a way of meeting people you wouldn't necessarily otherwise meet, outside of your echo chamber, outside of your specific place, and that can be an incubator for new relationships that can then go off and start new companies. And yes, I think the fact that we have a very, very strong investment in R&D is also, like you said, a self-fulfilling prophecy: what do people want to go into? They go into this field, because that's what we know, that's what we see, and it's also a very good path to upward mobility. And so, with our field in particular, with data science, do you think all of this R&D in Israel will give Israel an edge in AI technology specifically? Yes, absolutely. I think we're already seeing that. One of the people I work with closely used to be very, very senior in the military, in AI; she's our VP of Product, by the way, and she was very senior in the military building AI infrastructure capability. So there's already this sort of cross-pollination. We also have people that, like I said, we hire right out of the military, and in some rarer cases we have people who do their studies first: the military says, okay, you can take this time, we pay for your studies, and you sign on for a certain amount of time in the military afterwards. In some cases we can hire those people while they're still studying, and the skills they learn with us they can then go and use in the military. So there's this definite cross-pollination, and I think it makes AI a very strong, core component of the industry here, because it's so useful, not only in the military but in all of the companies more generally. So there's a very, very rich community here of researchers, practitioners, and so forth. Great answer, crystal clear, and it's exciting to see what ActiveFence and other AI companies will be doing out of Israel in the coming years and decades. This has been an awesome episode, Matar. I was promised that you were an extraordinary speaker, and you have proved to be an amazing communicator; it's been a real joy to speak with you. Thank you. I'm sure our audience loved this conversation as well. We covered a lot of interesting topics: automated harmful-content detection, neuroscience, military service, preschoolers. I'm sure our listeners will want to hear more from you, so first, my penultimate question, the one I always ask: do you have a book recommendation for our audience? Of course. It has nothing to do with anything we talked about, but I really like the book Under the Banner of Heaven. It's by Jon Krakauer, who wrote Into the Wild. I love reading books about other lives, or other places, and Under the Banner of Heaven is a good one. Nice, yeah, Jon Krakauer is an outstanding author based on Into the Wild, so I'm sure that's a great recommendation. He's also an annoying
person for me: when I start typing my name into Google, he's the one who comes up, until I get to the "o" in my last name. Maybe that's what primed me to think of that book of all books. There you go. And then my final question for you: how should people follow you and glean more insights from you after the program? So, I'm on LinkedIn, like everyone. We also have an R&D tech blog, ActiveFence Engineering, at activefence.com, and that's where you can read more about the things we do and dive into some more details. And please feel free to shoot me an email or reach out to me on LinkedIn; I'm always happy to chat. Nice, thank you for making that offer to our listeners, Matar, and thank you so much for being on the program, especially on such short notice; we booked you just days before recording this episode. Oh, don't say that, it makes it seem like I have no life. I mean, it actually just shows how kind you were to make the time, because you have three kids and you're VP of Data and AI at a very fast-growing, high-value company. So thank you for making the time, despite all of that, to fit our Super Data Science listeners in. Happy to. Nice. Well, you mentioned potentially being on the show again in the future, and that sounds great to me; we can hear about how ActiveFence continues to shape this harmful-content-reduction space in the years to come. Thanks, Matar. Thank you for having me; this was fascinating and a lot of fun. I loved this conversation today, and I hope you did too. In today's episode, Matar filled us in on how an ML model such as a binary classifier can become contextual by taking into account additional context: for example, we can pull out a logo from an image, identify the individual in an image and compare them against a database, or examine natural-language comments while considering the content poster's history and graph-network affiliations. She also talked about how real-time streaming of harmful content presents unique challenges that can be addressed by smaller models on edge devices like phones, by sampling on servers, and, again, by taking context into account. She talked about how we can create a flywheel of defensible commercial AI systems by amassing proprietary data curated by internal experts, and she talked about how she uses Python, Node.js, TypeScript, and Kubernetes for developing ML models, deploying them into production, and scaling them up for ActiveFence users.
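To ground that recap in code, here is a minimal Python sketch of one way a binary classifier can "become contextual": concatenating a content embedding with poster-history and graph-affiliation features before classification. The feature names, dimensions, and model choice are illustrative assumptions, not ActiveFence's actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_features(content_emb, poster_history, graph_affiliation):
    """Concatenate a content embedding with context features.

    content_emb:        embedding of the text or image itself
    poster_history:     e.g. [prior_violation_rate, account_age_years]
    graph_affiliation:  e.g. [share of flagged accounts among contacts]
    All three feature groups are illustrative assumptions.
    """
    return np.concatenate([content_emb, poster_history, graph_affiliation])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy training set: 200 items, 16-dim content embeddings + 3 context dims.
    X_content = rng.normal(size=(200, 16))
    X_context = rng.uniform(size=(200, 3))
    X = np.hstack([X_content, X_context])
    # Toy labels correlated with the context features, to show their effect.
    y = (X_context[:, 0] + 0.5 * X_context[:, 2]
         + 0.1 * rng.normal(size=200) > 0.7).astype(int)

    clf = LogisticRegression(max_iter=1000).fit(X, y)
    # Score a new item: same content embedding, but risky context.
    item = build_features(rng.normal(size=16), [0.9, 0.1], [0.8])
    print("P(violative):", clf.predict_proba([item])[0, 1])
```

The design point is the same one made on the show: two identical pieces of content can warrant different decisions once the poster's history and network are part of the feature vector.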
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Matar's social media profiles, as well as my own, at superdatascience.com/683. That's superdatascience.com/683. Your feedback is invaluable, both for spreading the word about this show and for helping me shape future episodes more to your liking, so please rate the show on whichever platform you listen to it through, and feel free to converse with me directly through public posts or comments on LinkedIn, Twitter, and YouTube. All right, thanks to my colleagues at Nebula for supporting me while I create content like this Super Data Science episode for you, and thanks, of course, to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the Super Data Science team for producing another captivating episode for us today. For enabling that super team to create this free podcast for you, we are deeply grateful to our sponsors, whom I hand-selected as partners because I expect their products to be genuinely of interest to you. Please consider supporting this show by checking out our sponsors' links, which you can find in the show notes. And if you yourself are interested in sponsoring an episode, you can get the details on how by making your way to jonkrohn.com/podcast. Finally, thanks, of course, to you for listening; it's because you listen that I'm here. Until next time, my friend, keep on rocking it out there, and I'm looking forward to enjoying another round of the Super Data Science podcast with you very soon.