683: Contextual A.I. for Adapting to Adversaries, with Dr. Matar Haller
This is episode number 683 with Dr. Matar Haller, VP of Data and AI at ActiveFence.
Today's episode is brought to you by Posit, the open-source data science company,
by Anaconda, the world's most popular Python distribution, and by WithFeeling.AI,
the company bringing humanity into AI.
Welcome to the Super Data Science Podcast, the most listened to podcast
in the data science industry. Each week, we bring you inspiring people and ideas to help you
build a successful career in data science. I'm your host, Jon Krohn. Thanks for joining me today.
And now, let's make the complex simple.
Welcome back to the Super Data Science Podcast today. I'm joined by the wildly
intelligent data scientist and communicator, Matar Haller. Matar is the Vice President of Data
and AI at ActiveFence, an Israeli firm that has raised over $100 million in venture capital
to protect online platforms and their users from malicious behavior and malicious content.
She's renowned for her top-rated presentations at leading global data science conferences.
She previously worked as Director of Algorithmic AI at SparkBeyond, an analytics platform.
She holds a PhD in neuroscience from UC Berkeley and prior to data science,
she taught soldiers how to operate tanks. Today's episode has some technical moments that will
resonate particularly well with hands-on data science practitioners, but for the most part,
the episode will be interesting to anyone who wants to hear from a brilliant person on cutting-edge
AI applications. In this episode, Matar details the database of evil that ActiveFence has
amassed for identifying malicious content, how contextual AI considers adjacent and potentially
multimodal information when classifying data, how to continuously adapt AI systems to real-world
adversarial actors, the machine learning model deployment stack she uses, the data she collected
directly from human brains using recording electrodes, and how this research relates to the
brain computer interfaces of the future, and why being a preschool teacher is a more intense job
than the military. All right, you ready for this captivating episode? Let's go.
Matar, welcome to the Super Data Science Podcast. It's awesome to have you on the show.
Where are you calling in from? I'm calling in from Israel, sunny, sunny Israel.
So thanks for having me. Sunny sunny Israel, is that always true?
It's very sunny Israel. Most of the time, it's pretty sunny. We have two seasons. One is really long,
and it's really, really hot, and the other one is shorter and beautiful and not as hot, but still,
we have a lot of sun. And that's not all. We have beautiful beaches. We have
tropical areas that are more green and nice, and forests, wildflowers, mountains,
not all camels and deserts, although we have that too. Cool. Well, I guess it isn't cool,
but I... It sounds hot, but I will have to visit there sometime. I actually... I have a grandmother
who recently visited and said that it was her favorite place she's ever been. Oh, wow.
Nice. So come visit. I'll introduce you to my chickens.
There you go. This episode brought to you by the Israel Tourism Board.
But you do travel a lot as well. So you were recently in New York. You were at
MLConf, the machine learning conference in New York, which I was unable to make it to this year,
but you were a speaker at MLConf and Deborah Williams, who's a friend of mine and the
acquisitions editor at Pearson that I've worked with for the books that I've created,
all the video content I've created, she wrote me a long email summarizing how MLConf had gone
and she said that by far the best speaker hands down, not just her opinion, but the opinion of
quote, everyone that she spoke to was that you, Matar, were by far the best speaker at MLConf.
So I was like, well, we've got to get her on the show. So that's very, very flattering, and now, like, take
your expectations and lower them. Thank you. Very, very flattering. Thank you. That was a fun
conference. There's lots of interesting ideas and good talks. So if she said
that, there was a high bar. So thank you. And so let's dig into what you do. So you are the VP of
data and artificial intelligence at ActiveFence, which is a platform for content moderation,
harmful content detection and threat intelligence. And so to be clear,
ActiveFence is not a company that is doing the content moderating. It's not like there's this
army of people at ActiveFence that are monitoring for harmful content. But you provide tools,
data and AI-enhanced tools, that allow your customers to be able to do that content moderation
themselves more efficiently. And this seems to be quite a good niche. I could see on Crunchbase
that ActiveFence has over $100 million in funding. So yeah, it seems like a very
valuable niche to be filling for your customers. So tell us a bit about what this means. How do you
use AI to moderate content? How is that useful for threat intelligence? That kind of thing.
Sure. So, ActiveFence, you're right, we are a platform where basically our clients are any
company that has user generated content. So whether it's, you know, comments or chats or uploading
videos or audio, or any place that you have a user that's able to upload content, there's a
potential for misuse of that and for uploading malicious content. And our goal or mission is basically
to help platforms ensure that their users are safe, that they have safe online interactions.
And so we provide the tools to help them do that. And really, this is one of
the biggest challenges that face UGC platforms, platforms with user-generated
content is basically how can they detect this malicious behavior? Especially since as we know,
items can be in any format, right? So we need to be able to detect whether it's video, audio, text,
images, all of that. And it also can be in any language, and it can also be any number of
violations, right? So you have sort of these big ones where, you know, you say absolutely
not, like, I do not want child pornography, I do not want terror, I do not want white supremacy. But
there's, like, many, many more. And different companies,
different platforms, have different levels of sensitivity to it, right? Even something that seems
blatant, like, I do not want child pornography; no one wants child pornography on their platform.
But let's define it, right? What does that mean? Is, you know, baby's first bath?
Is that something that we need to be aware of? And so the tools that we provide need to be
sort of contextually aware of, you know, the policy, the way that things are being used or
presented. And so for me, and for all of, you know, my teams, it's a super, super interesting
space to be in, because not only are the algorithms that we use really exciting and sort of interesting,
but I think the application, right, we're not selling air; we're actually making
a real impact on, like, human interactions in a positive way.
Right. So to what extent can you tell us about those exciting algorithms?
So that's an excellent question. Thank you for asking.
So I think that there's many different levels of things that we can do. So the first
thing is that we have our platform, right? And this is a platform that basically
enables users, or, like, moderators, to come in, to view the content, to look at sort of
where it is, and then to make a decision whether or not something should be removed. And
this is a platform that we provide to our users in order to basically ensure that we'll be able to
protect the well-being of the moderators and to make sure they're only seeing things they actually
need to be seeing, in order to be more efficient. There's absolutely no need to review everything.
Most of the things are benign, and even within the things that are harmful,
there isn't really any need to view everything. In that case, basically, you want to make sure
that you have some sort of automated content moderation on top. And that's where
we come in. Yes. I guess that ends up being important for the mental health of the people
who are doing the content moderation as well. Because I've read how people in those roles,
they can find it quite a harrowing experience when you're just watching beheadings and child porn all day.
Absolutely. Absolutely. Like, moderator well-being is a huge, huge, huge issue. It's in the news;
periodically it comes up as, like, this huge thing. And ActiveFence is, like,
very, very concerned about this, right? We deal with data that is not pleasant, right? And so in
the same way that I actively work to protect my data scientists and my engineers from exposure to
this, exposing them only when it's really needed and with a lot of safeguards, we want to make sure that,
you know, we're all human, and want to make sure that moderators are also protected in the same
way. And so if there's things that are sort of blatant, you know, a beheading, why do they need
to watch that? There really isn't a need, right? There's things that are clearly, obviously violations,
clearly, obviously malicious, and should just be removed and banned. And so
the algorithms that we use, basically, are what we call contextual AI.
What this means is that we look at the item in the context in which it is being used.
But also within the item, right: our data model basically enables us to take an
item, even if it's just an image, and start breaking it apart into the components that it has,
so that then we can build those together into a coherent risk score, where this risk score can take
into account, you know, like, do we see any weapons? Do we see any known logos? Do we see
any known people of interest, people who we know, from their history, are,
you know, spewing hate speech or misinformation and so forth? And then all those components
together can combine to basically say, yes, this item is very probable to be risky.
And so that's sort of how we build the full picture.
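To make that concrete, here's a minimal sketch of how per-component detections might roll up into one risk score. The component names, weights, and the noisy-OR aggregation rule are illustrative assumptions, not ActiveFence's actual model:

```python
# Hypothetical sketch: combining component detections into one risk score.
# Component names, weights, and the noisy-OR rule are illustrative only.

def risk_score(detections: dict, weights: dict) -> float:
    """Combine per-component confidences (0-1) into an overall risk score.

    Noisy-OR style: each weighted signal independently raises the
    probability that the item is risky.
    """
    p_benign = 1.0
    for component, confidence in detections.items():
        p_benign *= 1.0 - weights.get(component, 0.0) * confidence
    return 1.0 - p_benign

detections = {"weapon": 0.92, "known_logo": 0.80, "person_of_interest": 0.65}
weights = {"weapon": 0.6, "known_logo": 0.9, "person_of_interest": 0.7}
print(f"risk: {risk_score(detections, weights):.3f}")  # ~0.93: very probable to be risky
```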
And then of course, there's other layers to that, right? Even, for example, for chats, you can say, well, I can just use keywords, right?
Like, if I find the N-word, then clearly this is very violative. But what if it's someone saying,
please don't call me that? Or what if it's a rap song, or what if it's, you know, a
community sort of re-owning a word, like, you know, I'm proud to be
whatever some slur. In those cases, clearly, I don't want to ban that. And if I'm
just doing keywords, which are contextually unaware, then I lose that ability. And so in those
cases, we do need to use language models that are more contextually aware. And these language
models need to be trained and tuned on these specific cases, because these are the cases that are
always interesting.
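As a rough illustration of the difference Matar describes, here's a sketch contrasting context-unaware keyword matching with a contextual classifier. The model name and label scheme below are placeholders for whatever transformer a platform fine-tunes on its own policy data:

```python
# Keyword matching vs. a contextual classifier (sketch; model name is hypothetical).
from transformers import pipeline

SLUR_LIST = {"<slur>"}  # placeholder keyword list

def keyword_flag(text: str) -> bool:
    # Context-unaware: flags "please don't call me <slur>" just as readily
    # as genuinely hateful usage.
    return any(term in text.lower() for term in SLUR_LIST)

# Hypothetical fine-tuned model; the label names depend on the training setup.
classifier = pipeline("text-classification", model="your-org/policy-violation-model")

def contextual_flag(text: str, threshold: float = 0.9) -> bool:
    # Context-aware: reclaimed, quoted, or lyrical usage can score below threshold.
    result = classifier(text)[0]
    return result["label"] == "VIOLATIVE" and result["score"] >= threshold
```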
That does sound really interesting. And it sounds like the kind of thing that, in this brave new world that we have of these really powerful large language models, this
is the kind of thing that they could do really well, that a few years ago, it might have been a lot
tougher. And so it's great that you're presumably able to leverage these kinds of new
technologies, especially these kinds of multimodal technologies that are emerging. So I don't think
it's available to the public yet, but GPT-4 has this image component where
you can provide an image, a photo of your fridge, and ask GPT-4, what can I cook based on the
ingredients that you see in this image. And so that kind of multimodality, it sounds like
it's something that you've been working with for a while. Yeah, we've been looking a lot at
multimodality, because, you know, if I'm going back to the child pornography example,
because that's, for people, something that's, like, so obvious, right? Like, you should be able to know
whether something is child porn or not; we all sort of viscerally know what is bad. And yet,
sometimes you'll see, you know, a picture of a child and it looks fine, but, you know, it's sort of
like only the people that are in the know will know, you know, that that's a face of
a known victim, or in the comments there's links to off-platform sites, or something about
the angle, or a logo: the picture itself is benign, but there's a logo that's
associated with a studio that's been associated with child porn, or the title or the description.
And so sometimes it's enough to look at the image, sometimes it's enough to
look at the surroundings, but oftentimes it's the combination. And I mean,
in terms of, like, this generative AI, right now is sort of a perfect storm
for trust and safety, right? Because we're going to be having US elections soon,
and so political disinformation is something that's very, very current, and now having
these large language models sort of lowers the bar to entry for bad actors, right? It
used to be that things could either be really high quality but low scale, or low quality,
easy to catch, and high scale; now that's not an issue, right? And so it's
like an enabling technology, you know. Obviously, I don't want any fearmongering here;
I think it has a lot of good that it can do, but we need to be aware of how it can be used
and how we can kind of be prepared for it. This episode is brought to you by
Posit, the open source data science company. Posit makes the best tools for data scientists who
love open source period, no matter which language they prefer. Posit's popular RStudio IDE and
enterprise products like Posit Workbench, Connect and Package Manager, these all help
individuals, teams, and organizations scale R and Python development easily and securely.
Produce higher quality analysis faster with great data science tools. Visit Posit.co. That's
POSIT.co to learn more. Yeah, no question. So, you know, famously in the United States,
in the 2016 election cycle, there were foreign actors involved in creating
disinformation in Eastern European kind of troll farms, and you can imagine, exactly like you're saying,
these kinds of tools like GPT-4 make it a lot easier to create a lot more content, because you
don't need to have a human typing out everything. So it's much cheaper, probably several
orders of magnitude less expensive, to be generating malicious content, misleading content, disinformation.
So yeah, it is interesting that heading into, yeah, I mean, I guess we're always heading into
an election cycle somewhere. And so it's like, yeah, it's crazy in the US to me that people in
the lower house have a two-year election cycle. And so, like, you spend a few
months legislating and then it's back to fundraising. Exactly. It's wild. Yeah, but I think it
wasn't just that; disinformation is only one aspect of it, right? We're seeing, like, computer-generated,
you know, AI-generated child pornography. And then at this point,
the question is, is it still violative? And I think yes, right? Like, we don't want that
stuff out there. I don't care whether something is real or fake; it's still
child porn and it should be banned. And then there's a second level
of, like, well, unless I'm trying to find who the victim is, and then I do care. And then there's
another level of detection that needs to be built on top of that. Is it tricky? I mean,
it must be tricky. Something that must add an extra level of complexity to this is that presumably
the nefarious actors out there are constantly shifting and trying to evade detection by you.
So when many of our listeners and myself are building machine learning models,
we don't have to worry about somebody trying to outwit the model. You know, like you can build a
machine learning classifier to detect images of cats and dogs. And it's not like the cats are like
trying to look like dogs and are going to come up with ways of like dressing to look more like dogs.
Totally. So yeah, I mean, I think about it in terms of, like, how is what I do different
from car detection, right? Or, you know, cat detection or anything;
there's a million examples. And so I think, in addition to the fact that it's
evasive and adversarial, right, there's, you know, examples like QAnon, which is a group
that's banned on some platforms, who will, you know, change their text, writing Q as "cue," C-U-E. And then
you have to basically catch it by knowing to look for that. And to know to look for that,
that's already subject matter expertise. And that's one thing that at ActiveFence we have. You know,
you mentioned threat intelligence. And so we have intelligence analysts, and this is what they do,
right? They're experts in, you know, misinformation or in hate speech or in terror. And they research
these groups. They know about sort of the latest hashtags, trends, like what emojis they're using now,
and so forth. And then basically, they're able to sort of surface
data that's relevant. For example, you know, there's a hate group
that was founded, like, this last June or this last October, and it's already, you know,
on different social networks with their logo, spewing hate. And so to catch those, to know those:
those can feed back into our algorithms. And I can know to look for those logos,
to look for those phrases, to look for those actors. And so then I'm able to sort of stay on top of
it, given the fact that yes, it's adversarial, and I have the subject matter expertise. It's extremely
non-stationary, right? I have new actors coming up all the time, right? If I'm looking for cats and
dogs, like, how much have they really changed? They're not going to change, right? They're going to
have four legs, a tail, and ears. Versus here, the landscape, because it's adversarial,
is extremely non-stationary. And so that's why I need to have my subject matter experts that are
constantly feeding me more information, like, oh, this is a new slang term. Oh, this is a new slur.
And so forth. Is there a flywheel that gives ActiveFence a defensible moat between this
content moderation and the threat intelligence? So I just kind of have this
brainwave here. And you can correct me if I'm thinking about this incorrectly. But so you have
the content moderation aspect of the platform. So machine learning models are detecting, hey,
you know, we think that there's, this is high risk content over here automatically. And then
maybe that automation can assist the threat intelligence part of the company. And then the
threat intelligence people in turn are keeping tabs on what's going on through a combination of
more manual intelligence work, as well as this automated, automatically assisted intelligence
work that your content moderation side is helping with. And then they can feed back into the content
moderation, like, hey, like here's something else you need to be able to look for. We need to train
a machine learning model to be able to detect this kind of thing. Because yeah, like, you know,
there's this new logo that you need to be looking out for. Totally. You nailed it. Yeah.
We love flywheels at ActiveFence. We always say, like, no strategy deck is complete
without a flywheel. And so, absolutely. It's exactly as you described. So we have our
intelligence analysts that are, you know, finding things, right? Those feed into the algorithms;
they make the algorithms better. And then we have incoming data collection, whether it's data
we go out and proactively collect, or we get data, you know, clients sending us data,
being like, hey, is this violative or not? And that can then basically be fed back into
the intelligence analysts. So for things that come from clients, sometimes it's things that they
haven't seen before that they don't know. Often they do know it. But because we're also out there
collecting data proactively, then that's basically able to feed back in. And one core component of
this flywheel is our proprietary database of sort of
violative content. And what this means is that basically we have data, whether
it's images or audio or videos or text, that we've already identified as being
violative of a policy or malicious. We can hash that. And then new content that comes in can be
compared to that, right? And then that also helps us first of all to be more efficient. But also,
we can proactively enlarge, we don't need to wait for data to come in, right? We can proactively
enlarge this proprietary database by going out there, going to sources that we know are problematic.
And that's what our intelligence analysts can help us with. And then next time that it comes in,
basically, we've already seen it before. And so there's definitely this sort of interaction,
this flywheel between the intelligence analysts, the humans and the AI, one sort of feeding off
the other. Yes. So this is the database of evil that you've talked about publicly before, right?
And so to give a bit of an analogy, this is kind of like how antivirus solutions have a database
of known viruses. And then if a line of code that's known to be malicious comes across on
your own hardware, it can be compared against this database. And you can say, okay, this is a
threat. We need to remove this part of your file system. So similarly, you have this database of
bad content, of harmful content. And yeah, it's proprietary. So, okay, I think that,
yeah, you may have more to say about that. So the one more thing that I have to say about that is only
to bring us back to the idea that we're in an adversarial space. And if we keep this in mind,
it's like, we're in a space that's adversarial, that requires subject matter expertise,
that's non-stationary, that's multi-dimensional, and that requires context. And once we keep those
in mind, then it sort of helps us frame how we want to use this database of evil,
this proprietary database, but also, like, what it needs to sort of be robust against, right? So
if we're in a place that requires subject matter expertise, we need to make sure that we can keep enlarging
our data, like all of our databases, right, with our intelligence analysts. And it's non-stationary,
so we need to make sure that it's always updated; having a snapshot of this database isn't
enough, right? There's going to be new things. But also, if we're in a place that's inherently adversarial,
then we need to make sure that this database is also robust to adversarial manipulations.
What does that mean? For example, if I have like a very hateful song, like glorifying the Holocaust,
for example, like a love song glorifying the Holocaust, right? These things exist.
Then, if I know that this is banned on platforms, I can speed it up, right? And then in the
comments, or in the title or somewhere, I can say, listen to this at half speed. And now I've
basically gotten it past all kinds of defenses. And so we need to make sure,
and the same thing with images, right? I can rotate it, I can grayscale it, I can mirror it,
I can do all sorts of things. And so I need to make sure that any hashing algorithm
that I have is robust to these manipulations, up to a point, right? Because
it's always this idea of, like, precision versus recall. Do I want to now unfairly capture things
that shouldn't be captured, right, and unfairly say that they are violative? Probably not.
And so it's a tricky line, but that's the line with any content moderation
algorithm; we're always trying to figure out where the boundaries should go.
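For intuition, here's a toy perceptual-hash sketch (an average hash) showing how hash matching can tolerate benign re-encodings like grayscaling or resizing, and how the match threshold trades precision against recall. A production system would need hashes that are also robust to rotation, mirroring, and the audio tricks Matar mentions, which this toy version is not:

```python
# Toy average-hash sketch; not ActiveFence's hashing.
from PIL import Image

def average_hash(path: str, size: int = 8) -> int:
    """64-bit hash: 1 where a downsampled grayscale pixel is above the mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for px in pixels:
        bits = (bits << 1) | (1 if px > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def matches_known_bad(path: str, known_hashes: set, max_dist: int = 10) -> bool:
    # Lower max_dist favors precision; higher max_dist favors recall.
    h = average_hash(path)
    return any(hamming(h, bad) <= max_dist for bad in known_hashes)
```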
And I guess that's key to why having a human in the loop in these kinds of decisions matters, so that
if people unfairly have content removed, there should be some kind of appeals
process in the platform, or, yeah, a human reviewer that can make some final decisions. So I
guess, kind of going back to a point early in the conversation, when something is really flagrant,
there's your risk score, which I assume is kind of similar to, in machine learning, having a binary
classifier where you have a confidence on, yeah, how likely is this to be malicious content,
harmful content. And so if that is very high, if you're like, okay, this is 0.99999,
we're just like, there's no point in sending this to a human to review. But if it's 0.8 or
0.7, then there might be something here; somebody should review before a decision is made.
And, yeah, same thing on the flip side:
if something does get flagged automatically, because everything is still probabilistic
in machine learning, there's going to be cases where the algorithm is very confident, and
still, due to some circumstance like you're describing, where a group has re-owned something
that has been a racial slur historically, there should be the opportunity for that person to say,
no, like, I should be able to post this. Yeah, so it kind of works both ways.
So our risk score is exactly as you describe, and, like, everything is probabilistic, and also it's
a business use case, like how much manual review they want, right? Like, maybe for
a child's platform, like a platform for children, they say, you know, I just ban everything;
so what if the kids, like, can't chat about, you know, whatever; it's fine. But maybe for other
platforms, like a music platform, there might be racial slurs in the songs, but fine, I'm okay
with that. Like, I don't care. For me, my kids can just, like, type three words, that's okay,
and my kids, anyway, they're getting cell phones when they're 35, so until then, deal with it.
But yeah, so it's exactly that, right? Other platforms would be like, you know what,
even if it's at 0.99, for the things that are at 0.99, we want to review them, because we
would rather err on the side of, like, free speech or whatever.
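A tiny sketch of that per-platform routing; the thresholds are the business decision Matar describes, and the values here are made up:

```python
# Hypothetical routing of a risk score under two platform policies.

def route(score: float, auto_remove_at: float, review_at: float) -> str:
    if score >= auto_remove_at:
        return "auto-remove"   # blatant: never shown to a human
    if score >= review_at:
        return "human-review"  # gray zone: a moderator decides
    return "allow"

# A children's platform might remove aggressively...
print(route(0.85, auto_remove_at=0.80, review_at=0.50))  # auto-remove
# ...while another platform reviews even 0.99 before acting.
print(route(0.99, auto_remove_at=1.01, review_at=0.90))  # human-review
```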
And so I think that also, in terms of appeals, that's a super important point; that's something that is definitely critical in this,
because, you know, people are posting things and they don't want to be unfairly
punished. And actually, right now, I think that the world of trust and safety is having its GDPR
moment, right? Like, GDPR, for those of you that are not familiar, was privacy
regulation passed in the EU that ended up having a huge sweeping effect, because basically
any time that a citizen of the EU is on an online platform, then GDPR is in effect
for them, in terms of, like, you know, what cookies and what can be stored and so forth.
You've probably all seen, you know, the notifications in your browsers about
privacy regulations. And so now trust and safety is having its GDPR moment with the DSA,
which is the Digital Services Act. It was passed by the EU last year, and it basically also puts
in protections for trust and safety. It codifies them in law, sort of in a similar way,
with fines and so forth. And while it's still new for smaller tech companies,
for the very big online companies it's already
being rolled out, and there are very strict regulations on them
that they need to follow, and it'll probably trickle down to everyone. And so,
regulation fines are on the table. And so these businesses need these tools to be compliant. And part
of that is also auditing and understanding: part of the DSA is auditing and understanding
why things were banned, and explaining it, and so forth. And so another thing that we
invest in is explainability, right? Like, if I'm giving a score, then I want to be able to explain
why. Because a lot of times, with these things, you need that subject matter expertise to
understand that, oh, this particular logo is actually associated with this particular
terror group or hate group or child pornography studio or whatever.
I see. So I can see how the evolving regulatory landscape ends up being
important, probably helpful to you, as you develop these algorithms. We've talked already
about harmful content kind of in static, you know, posted content.
But something that we hear a lot in the news recently, that we see increasingly
in the news, is not just content that's been posted, but content that is streaming in real time.
So there have been incidents in the US recently of shootings being livestreamed to social media
platforms. And so this happening in real time, that must add an extra layer of complexity to
some of the work that you're doing. Absolutely. There's been, like, horrific instances of
live streaming in the US and elsewhere. And so there's a couple of ways to approach that.
One of which is that we can put really, really small content moderation models sort of on the edge
device, right? So that does something basic to catch sort of the blatant stuff,
and then, you know, raises it up for human review. Because I think in these cases of
live streams, it's tricky. We're still learning the space, and we would want someone,
if we flag it, to take a look at it; maybe err on the side of flagging too much and then having
someone take a look at it. Again, it's always a business question of, like, what's the platform,
what are we looking for? And so, you know, have some sort of detector of, I don't know,
gunshots or something; you know, a small model on the edge device able to flag it right away.
We can also do something where, you know, once the content makes it to the servers, we
sample frames. There's a question of, like, what do you want to moderate? Every single frame?
Do you want to sample every minute, every second? There's a huge question there;
I think what makes this so challenging is just the scale, right? You have so much data streaming in.
And then do the same thing: sample and then look for maybe more complex things, right?
So those are sort of the typical things, and that's where you're really focusing on the content
itself, right, that's coming in. But a lot of the time, when you're live streaming
something like this, then, you know, the perpetrator may have, like, you know,
pre-shared this somewhere. There's people that are, you know, joining the stream,
that are commenting. And now suddenly you have a much, much richer source of information,
right? You can look at: who are the other users? Who is the user that's streaming? What else have
they streamed in the past? What other groups have they been in? What are people writing in the comments?
And so forth. And suddenly now you might be able to catch it, or at least flag it, just from
the surrounding information, right? Like, there's enough indicators of risk from the things
that are around it. Sure, you want to moderate the content, you want to look at it;
however, you would also want to basically look at the other markers of risk
around the content itself to make your job easier and faster and more efficient.
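Here's a rough sketch of that funnel for an incoming stream: sample frames at a coarse interval, run a cheap model, and tighten the sampling and escalate once something is flagged. The two models and the escalation hand-off are hypothetical stand-ins:

```python
# Sketch of a sampling funnel for stream moderation (models are hypothetical stubs).
import cv2

def cheap_model(frame) -> float:
    """Hypothetical lightweight classifier returning a risk confidence."""
    return 0.0

def escalate(frame) -> None:
    """Hypothetical hand-off to deeper analysis and/or human review."""
    pass

def moderate_stream(source: str, base_interval_s: float = 5.0,
                    flagged_interval_s: float = 0.5) -> None:
    cap = cv2.VideoCapture(source)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = int(fps * base_interval_s)  # frames to skip between samples
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0 and cheap_model(frame) > 0.7:
            step = int(fps * flagged_interval_s)  # sample more tightly after a flag
            escalate(frame)
        frame_idx += 1
    cap.release()
```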
Did you know that Anaconda is the world's most popular platform for developing and deploying
secure Python solutions faster? Anaconda solutions enable practitioners and institutions
around the world to securely harness the power of open source. And their cloud platform is a place
where you can learn and share within the Python community. Master your Python skills with on-demand
courses, cloud-hosted notebooks, webinars, and so much more. See why over 35 million users
trust Anaconda by heading to superdatascience.com/anaconda. You'll find the page pre-populated
with our special code SDS so you'll get your first 30 days free. Yep. That's 30 days of free
Python training at superdatascience.com/anaconda. Gotcha. So yeah, so it obviously is more
complex to be moderating harmful content when you're thinking about it in a real-time situation,
but as you point out, having smaller threat detection models on the edge device, so on mobile phones,
maybe on laptops, being able to detect these issues in real time and potentially flag those
to the social media platform. And then also, once the real-time data is reaching the servers
of these platforms, you can be sampling at some appropriate interval in order to try to
detect harmful content, so, say, the sound of gunshots, so it can be reviewed as to
whether this is real gunshots or, like, video game gunshots or something. So probably in a circumstance
like that, whether it's real-world gunshots or video game gunshots, we're going to be able to
tell more easily because of the contextual information that surrounds that. So the kind of text
that people post in response is probably going to be quite different and classifiably different
in a video game, where people, there might be more, like, well, I don't want to even speculate.
Yeah, but I would say that, if we're thinking about it, and, like, I'm kind of thinking out loud
and refining a bit what I said earlier: so you have something on the edge device that does,
like, you know, a bit more basic content moderation, a smaller model,
and then, instead of it flagging a human, remember that everything is making it to the cloud, and
so we're sampling, and so things that have been flagged by the edge device can then either have,
like, more tightly spaced samples or can have deeper analysis on them.
It's basically a funnel, right? And again, it depends what you discover on that device. Like,
maybe you might want to right away flag it to someone and be like, listen, this is not
something that is very likely to be in the gray zone. And then also, you can look at the surrounding
content. You don't need to wait for the content to be uploaded to the server or anything, right? Like, you
have this surrounding content. It's, like, text, and who the user is, and, you know,
do we recognize this user? Where else have they posted before? The users that are commenting,
where else are they? And this is where actually, like, a graphical data model comes in handy,
right? Because now you have all these relations between users, and you can see, like, what have they liked
before, what groups are they in, who have they interacted with, and so forth. And then if these
are people that are known to us, then we can say, well, actually, this is a user that, if we see
them here, it adds to the probability of risk.
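As a toy version of that graph idea, here's a sketch using networkx, where an item's contextual risk gets a bump when known-bad users sit close to it in the interaction graph. The edges, the hop limit, and the bump values are all invented for illustration:

```python
# Toy graph data model: risk bump from proximity to known-bad users.
import networkx as nx

G = nx.Graph()
G.add_edge("streamer", "group_a")      # streamer is a member of group_a
G.add_edge("commenter_1", "group_a")   # so is a commenter...
G.add_edge("commenter_1", "streamer")  # ...who also follows the streamer
KNOWN_BAD = {"commenter_1"}

def context_risk_bump(user: str, graph: nx.Graph, hops: int = 2) -> float:
    """Risk increment based on how many known-bad users are within `hops`."""
    dists = nx.single_source_shortest_path_length(graph, user, cutoff=hops)
    bad_nearby = sum(1 for node, d in dists.items() if node in KNOWN_BAD and d > 0)
    return min(0.2 * bad_nearby, 0.6)  # cap the contextual bump

print(context_risk_bump("streamer", G))  # 0.2: one known-bad user nearby
```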
All right. So, Matar, you've given us an interesting overview of how content moderation works in your automated platform for detecting
harmful content. So things like contextual AI, needing to be able to adapt to adversarial
opponents, the flywheel between content moderation and threat intelligence that's helpful to you,
the database of evil and the flexible way that information is hashed in there,
so that you can be detecting new variations that are adjacent to existing known harmful content.
And then most recently we just talked about the specific circumstances of real time streaming and
how we can be addressing harmful content in those circumstances. So very interesting. And so I'm
curious to what extent you can tell us about the kinds of technologies that you use to make the
platform happen. So, you know, what kinds of programming languages? Obviously,
you can't get into a level of detail that would allow adversarial actors to be more effective
in their adversarial actions. Yeah. And again, I'm giving it with the caveat
that, like, the parts that I deal with are sort of the data, the MLOps, the engineering,
the API; the world of frontend is beautiful and mysterious to me. So I can list technologies
that they use there, but they don't mean that much to me. I've always been sort of a
back-end geek. And so in terms of what we do, obviously the data people do Python stuff. We also use
Node and TypeScript. We serve our models on Kubernetes. We've done a lot of work in
house, basically writing stuff to select the correct instance type for a given
model, so that you get the best utilization. We've also been working on, like, model
in versus model out, being like, do we bake the model into the image, or,
when we spin up the pod, do we bring the model in from outside, basically in order to
minimize our spin-up time? Because we basically need to be able to deal with really, really high
throughput and low latency. And also, we don't want to be just, like, burning money on machines
that are up for no reason. And so we have HPA that we've tuned, and then we can basically
spin up and spin down our machines as we need, and then be smart about which machines we're
spinning up, and also whether we're able to sometimes put multiple models on a machine or batch
the requests to the machine. And we do all sorts of optimizations to make sure that we're
high throughput and low latency within our SLAs.
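To illustrate the request-batching idea she mentions, here's a minimal sketch: collect requests until the batch fills or a short timeout expires, then run one forward pass for the whole batch. Real serving stacks get this from dedicated model servers; the queue layout and numbers here are just assumptions:

```python
# Minimal dynamic request batching sketch for model serving.
import queue

def batching_worker(requests: queue.Queue, model, max_batch: int = 16,
                    timeout_s: float = 0.01) -> None:
    """Each queue item is (input, reply_queue); `model` scores a list of inputs."""
    while True:
        batch = [requests.get()]  # block until at least one request arrives
        try:
            while len(batch) < max_batch:
                batch.append(requests.get(timeout=timeout_s))
        except queue.Empty:
            pass  # timeout hit: run with whatever we have
        outputs = model([item for item, _ in batch])  # one batched forward pass
        for (_, reply_queue), out in zip(batch, outputs):
            reply_queue.put(out)
```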
Nice. Very interesting. Thank you for being able to go into even that level of detail. So clearly you have a really deep understanding of not just data science and modeling,
but of back-end engineering, so, like, scaling, being able to meet SLAs, Kubernetes. So super
interesting. I didn't know from my research beforehand that you had that kind of expertise as well.
So let's dig into your background a little bit to see how this all came about. So
you did a neuroscience PhD at UC Berkeley, which is I think a great decision. I also did a neuroscience
PhD. So for me, it was something that I got into because I was fascinated as to how
chemicals, biology, physics create a conscious experience. And so like everything that you think,
everything that you do in some way that we obviously are nowhere near fully elucidating can be
reduced down to physical processes. And so I wanted to dig into that as much as I could. But
then as I got started in the PhD, I was like, wow, data set sizes are getting really big.
It seems like there's really interesting things that we could be doing there, detecting patterns
in data, identifying causal direction in data. And so I went down this road of focusing on
programming and machine learning because I knew that whether I stayed in academia or not,
those would be transferable skills. And I'm not surprised I guess that that ended up being
true. However, so in your PhD, I know that you were recording activity from surgically implanted
electrodes in human brains. And I made this joke before we started recording about how
I felt, you know, I really feel like I made the right choice in sticking to silicon experiments,
or analyzing data, as opposed to doing things like learning how to implant electrodes into a
ferret. And I was making this joke about how people in my cohort in my PhD were doing that kind
of thing. But then the exact person that I was thinking of has a really nice job at Google DeepMind.
So it seems like you have insight into that. So, where am I going with this?
A long runway, but we'll get there. So tell us about your PhD,
how that relates to what you're doing today. There's also the Insight Data Science
Fellowship program that you used to transition from your PhD, from your academic background, into
industrial data science; it'd be interesting to hear about that. And then, to finally have it all
make sense as to how I started off this entire long question: I mentioned how you have
a rich understanding of the back end of a software platform, and so just kind of
how this all came about, your depth of knowledge in the field. Yes.
Let's start with the first question. So, my PhD: yes, in my PhD I was
recording data from electrodes surgically implanted in human brains. Basically, what I wanted was,
you know, animal-quality data from humans, right? With animal research, you can stick electrodes
where you want, get really beautiful data. You can do it in slices. You can do it, you know,
from monkeys trained for years to do a task, and then get just beautiful recordings of, like,
signals, hours and hours of neurons at work. And with humans, you're often limited to things that
are either slow, so fMRI, where it's like, you know, you're many, many
degrees removed. You're actually measuring blood flow. You're not even measuring direct brain
activity, so you're measuring a side effect of thinking. Or you can do EEG, which is, then,
you know, electrical signals filtered through the scalp. And even with technology like MEG and so
forth, which is magnetoencephalography, it's not the same. You're not
on the brain. And then this really unique opportunity opened up in the laboratory of Dr. Robert
Knight at Berkeley, which is basically to work with patients that are undergoing brain surgery,
often for epilepsy. So, in epilepsy that can't be treated with medicine, right, they keep
having recurrent seizures, and so the only solution is to go in and surgically remove the problematic
area of the brain. However, before that's done, you need to map out the brain to ensure that
you don't, you know, stop the seizures but leave the person, you know, aphasic, where they can't speak,
or stop the seizures and suddenly, you know, they're blind. And so what you want to do
is basically map out the areas of the cortex around the region of interest and
ensure, first of all, that you can localize exactly where the seizure is, because remember,
until you really get in there, everything is filtered through the scalp, and
you really can't tell. And so you figure out where the focal point is and also what's
around it. And so what happens is these people come in for surgery, they have a
craniotomy, part of the skull is removed, the electrodes are implanted, and then they're bandaged up,
and then they're in the hospital for a week with electrodes coming out of the brain, hooked up to
an amp, a preamp. And then their meds are
tapered, and all kinds of things, many, many things, are done to try to induce a seizure,
because the best-case scenario is basically on the first day, they have seizures, you localize it,
you figure out exactly what's around it. And then, like, the next day, they're in surgery, they
remove it, and you're done. Oftentimes, that's not the case. You need to work to induce a seizure:
sleep deprivation, strobe lights, all sorts of things to trigger them. And so most of
the time, these people are, like, sitting in the hospital, just kind of waiting around, right?
Watching TV or, you know, whatever. And then we come in, and if they consent, then we can
come and run all sorts of tasks. We ensure that the task is matched to the regions
of the brain that are mapped, right? There's no point in doing a memory task if, you know, the parts
of the brain covered are only motor and so forth, or a motor task if they're only looking at language
regions. And I found this job very, very meaningful in terms of science outreach, like,
explaining to them the value that they have for science and why I'm doing what I'm
doing. And I spent a lot of time just, you know, talking to them about the brain and about my research
and so forth. I also found it, like, emotionally incredibly difficult, because you're meeting these
people pretty much at the worst time of their lives, right? Like, this is just a terrible situation to be
in. And so it's challenging, it's rewarding, basically, you know, in a way that you don't
expect your PhD to be, right? Like, you go in and you're like, give me data, it's all about data, great.
And then there's this. And so I would come in, and the task that I was interested in,
I was basically interested in sort of tracking the path of a decision in the brain. So I would
basically, in the beginning, I just had my one task, but then I realized that basically all tasks
that we record are decision tasks. So I dipped my foot into, like, the world of, you know,
big data, where I could basically take all the tasks that were run. Anytime that there's sort of
a decision to be made, you have sort of a motor decision, right? Like, you see a stimulus,
so that goes into your visual cortex, or you hear something, so that goes through
auditory cortex. And then it needs to make it to the decision-making area of the brain, right? To the
prefrontal cortex, which is where, as you sort of move forward in the brain, you need to make a decision.
The decision is made, and then it needs to go back to the motor cortex to execute the decision.
And what I did was basically track that loop in the brain, and I was basically
able to look at the activity in the prefrontal cortex and say, aha, look, the
sustained activity that I see in the prefrontal cortex correlates with, you know, the reaction time,
right? I'm able to sort of see when they'll trigger a reaction without looking
at the motor cortex. I can see errors. I can see that the amplitude is also
correlated with errors: basically tracking a thought through the brain. And because I was on the brain,
I had extremely fast recordings of what was going on. So it wasn't filtered through anything.
The future of AI shouldn't be just about productivity. An AI agent with the capacity to grow alongside
you long term could become a companion that supports your emotional well-being. Peridot,
an AI companion app developed by With Feeling AI, reimagines the way humans interact with AI
using their proprietary large language models, Peridot AI agents, store your likes and dislikes
in a long-term memory system, enabling them to recall important details about you and incorporate
those details into dialogue without LLMs' typical context window limitations. Explore what
the future of human-AI interactions could be like this very day by downloading the Peridot app
from the Apple App Store or Google Play, or by visiting Peridot.ai on the web.
That is so fascinating. I really did just sit at a computer and
learn programming languages and machine learning algorithms, which was interesting, but wow,
I mean, yeah, you were really doing real valuable work. And so I think some of that work would go
back to, like, Wilder Penfield. Yeah, so that's when you're actually looking at the brain; yeah,
I mean, he's like the grandfather of everything that we did.
The electrodes that I was using... and again, things here move really, really fast,
and my PhD was a while ago. I'm not dating myself, but it wasn't yesterday. And
electrodes have gotten smaller since then. There's also a lot of single-unit recordings where
you can actually put in an electrode and record from, like, smaller populations, right? My
electrodes were pretty big and kind of far apart. So I'm recording from larger populations,
but yes, it all comes back to, like, classic neuroscience. And as I was doing it, so that's
the data collection, right? But then you collect the data, and it's such a rich data set that
it can sustain you forever. And the data sets that I collected are still being used
for studies, because it's rare data, it's expensive data, it's rich data;
you can look at it in many different ways. And so I took my data and data from other studies
and as I was working on it, I said, you know, the brain is amazing, right? Like,
I don't think anyone in our field can argue about that. However, what I found myself drawn to
were the algorithms, the machine learning, the statistics, the signal processing, the programming
languages, all of the things that maybe you would say, oh, those are just the methods.
I found that those were the parts of the papers that I was reading,
those were the parts that I was most drawn to. And so then it was kind of a natural
transition from there to be like, okay, this is what I'm actually
more interested in most of the time. So that was how I found myself there. I did, like, an
NLP class towards the end of my PhD, kind of in secret. And then, like, the rest is history.
Nice. Yeah. And then the Insight program. So we've had guests on in the past that did this
fellowship as well. So it's intended, I think, primarily for people who have already done an academic
PhD and have a strong quantitative background like you did. You were doing tons of, like you say,
machine learning kind of data science techniques, so things like time series analysis,
dimensionality reduction, clustering, regression, data permutation. So you had all this
existing experience. And then, so, the Insight Data Science program: was that a useful transition
after all that you've done, and the secret NLP course? Was that
still useful for making the transition to industry? Oh, absolutely. Insight Data Science is wonderful.
What they do is basically take people that, you know... we already have all of these
skills, and we've done data science, and, you know, we've done machine learning, and we've
done programming, but we're totally, totally clueless about the real world, because we're academics,
right? We know nothing. And basically, they help us sort of frame what we've done in the context
of industry. So talking about startups and funding and jobs, and what it is like to work. And
then also things just like best practices, right? Like, you know, version control and things
like that, which some people do in their PhD; some people don't. Like, some people's PhD is
100% in, like, a single Jupyter notebook. And so basically, they kind of
get you up to speed for gaps that you have there. But also they just frame
what you've done in the context of, okay, this is industry. Here is why what you have is valuable.
Here is how you can use the things that you've done in industry. So, you know, people come
in, and they're sort of like, you know, hey, look, here's, like, fraud detection at this company.
You're like, oh hey, I've done that. And it sort of ties it together for you.
Also, you know, salary negotiation, all of these things. And then
what you end up doing is, they say, okay, in your PhD you had, like, five, six, seven years to work on
something; now you have three weeks to put together a project that actually brings concrete impact,
that can be impactful, and that you can pitch. And that kind of puts you
in the mindset of, like, what is a POC. And that also helps employers get around the bias of, like,
God, I don't want to hire an academic; they're not gonna get anything done.
Right, right, right. And I think there's a lot of relationships between Insight and
future employers. Lots of employers are looking for great data science talent, like the people that
are taken into the Insight Data Science program. And so, yes, you end up with this
flywheel. So, I just want to quickly go back, before we transition away from what you were
doing with your PhD and how you got to what you're doing today. I just want to talk a little bit
more about... so, I mentioned Wilder Penfield, and how he was, I guess, the first person
to be able to map the human cortex to that level of detail. And I think it was the same
situation. It was many decades ago, like the 1950s or something, but same kind of thing:
epilepsy patients, open skull. And so, recording from these individual electrodes over the whole brain,
that gave this map of the whole cortex; there's this somatosensory homunculus and motor homunculus.
It's really cool. I encourage our listeners, and I'll try to remember to
include in the show notes, a link to some images of this homunculus. I think
homunculus is Latin for, like, little man. And so the idea is that as you go over the
motor cortex or the sensory cortex in the brain, there's this map of your body. And it isn't
anatomically correct in terms of scale. So, for example, for both the sensory as well as the
motor homunculus, the hands are huge, because you have so much detailed sensory
perception as well as motor control in your hands. And I remember, like, the lips are huge.
But, you know, the back is small. Yeah, exactly. I think we had the same textbook.
So one thing you could do is take a paper clip, right, and bend it
so that its points are some distance apart, and you could find
the closest distance on your back at which you can tell them apart, because
at some point you're just not sensitive enough on your back to tell them apart. And then
you put it on your lips, and immediately you're like, oh wow, these are super far apart. And that
basically is reflective of the fact that you have less sensory
representation of your back than of your lips in your sensory strip. Cool.
So yeah, so I wanted to recap on that. Oh, and then, yeah, there's a very specific reason why
I brought this back up, not just because it's really interesting, which I think it is in and of
itself. But you were talking about recording electrodes. So I was thinking about how
the recording electrodes that Wilder Penfield would have been working with many, many decades
ago. Not many centuries. Many decades ago. That was just a few years ago.
This podcast will be listened to for millennia. It'll be confusing. So the recording electrodes
that he was working with would have been much bigger than the ones you were working with, and you were talking
about how in recent years they've become even smaller. And so then that got me thinking about how
there is this push, with companies like Elon Musk's Neuralink, to have brain-computer interfaces
eventually that aren't just for people who have serious issues and have their skull opened.
But there's this move towards, in our lifetime, potentially having some way of having
recording electrodes on our brains without needing to have invasive surgery. And so I don't know if
you have any thoughts on that. And I'm also going to be asking you about this after the episode,
but you may as well dig deep into that topic now.
Yeah, I have some ideas. So actually, even today, not all brain-computer interfaces are super invasive.
I mean, it's invasive in the sense that, like, you have an electrode in the brain; that's invasive.
But not for all of them do you need, like, a craniotomy that opens everything. So, for example,
for Parkinson's patients you have deep brain stimulation, right, where basically,
through a borehole, they put in an electrode very, very specifically to the substantia nigra, which is a place in
the brainstem; it means "the black substance." Yeah, there you go. So you do remember some of your anatomy.
It basically, you know, produces dopamine, and when cells start to die, then you need to
basically stimulate it in order to get around the fact that it's not functioning.
So again, that's just an electrode that's brought in and that's used
to stimulate. And actually, for how do you decide how much to stimulate or whatever, there's,
like, a device that you can calibrate and then decide how much to stimulate. And so
that, you know, doesn't require a massive craniotomy. I mean, that's already like a feedback loop that
you have that's been it's been around for a really long time. You also have things that are
You also have things that are more invasive but long-term. So the patients that I was discussing basically have the electrodes in temporarily, right: they have the craniotomy, the electrodes are put in, wires come out of the head; once they have seizures and everything's localized, in the best case they remove it, the skull goes back in, the electrodes are gone. However, there are also different companies, for example NeuroPace, which actually permanently implants an electrode strip in the person's brain, and then they're able to record ongoing, and the idea is that they're able to predict, or give some sort of lead time before a seizure happens, and then stimulate to stop the seizure. And what they record is uploaded to their servers. So that's also already here; there are patients walking around with it right now.
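For intuition, here's a toy sketch of that record-detect-stimulate loop in Python. The variance-threshold "detector" and every number in it are invented for illustration; this bears no relation to how NeuroPace's actual device works.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def seizure_risk(window: np.ndarray) -> bool:
    """Toy detector: flag a window whose variance spikes above baseline.
    A stand-in for whatever features a real implanted device computes."""
    return window.var() > 2.0

stim_count = 0
for second in range(60):                              # one simulated minute
    burst = 2.0 if second % 20 == 10 else 0.0         # inject occasional "events"
    window = rng.normal(scale=1.0 + burst, size=250)  # 1 s of signal at 250 Hz
    if seizure_risk(window):
        stim_count += 1                               # "stimulate to stop the seizure"

print(f"stimulated in {stim_count} of 60 one-second windows")
```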
And so I think, you know, Elon Musk's Neuralink is like the next step of that: okay, it's not clinical, and let's say we can get it smaller and smaller, and if we think about it in a more Moore's-Law-like way, things getting more and more fitted and smaller and smaller, I think we will be there shortly. And then it's a matter of: how do you make it minimally invasive? Because at some point the question is how long it can be in before cells start to die, before the body starts to reject it. And basically there's a difference between just recording versus stimulating, and what it means to stimulate, and at what frequency.
And I think there are some really, really interesting questions here, because, and here's another tangent, we see that brains oscillate, right? So we have oscillations in the brain, where different frequency bands are associated with different processes. So alpha, which is between 8 and 12 hertz, that's often for visual cortex; but you have beta, which is like 15 to 30 hertz, and that's for motor movement: when you initiate motor movement, you have beta suppression. But what we also see is that there's individual variability in these frequency bands, right? My beta is not your beta, my alpha is not your alpha, my theta is not your theta, and so forth. And so any time we're going to go in and start stimulating, you're going to say, okay, well, I'm going to stimulate at a particular frequency, but what is that frequency, right? And how do I determine that frequency, and how do I know what frequency is ideal for me versus for you, for whatever?
There's also, and again, right now this is far away, but there are already stimulation protocols that we see, even completely non-invasive ones: there are things with TMS, transcranial magnetic stimulation, where people have been experimenting with that, and there's also research about using stimulation for psychiatric disorders and so forth. So it's a huge, huge field, and hopefully that wasn't too much of a tangent.
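To make those frequency bands concrete, here's a minimal Python sketch using NumPy and SciPy on synthetic data: it bandpass-filters a signal into the canonical 15-30 Hz beta band and shows the beta suppression Matar describes after a simulated movement onset. The signal, sampling rate, and band edges are assumptions for illustration, not real recordings.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 250                       # sampling rate in Hz, typical for EEG
t = np.arange(0, 10, 1 / fs)   # ten seconds of synthetic "recording"

# Alpha (10 Hz) plus beta (20 Hz) riding on noise; beta amplitude drops
# after a simulated movement onset at t = 5 s, mimicking beta suppression.
beta_amp = np.where(t < 5, 1.0, 0.3)
signal = (np.sin(2 * np.pi * 10 * t)
          + beta_amp * np.sin(2 * np.pi * 20 * t)
          + 0.5 * np.random.randn(t.size))

def band_power(x, low_hz, high_hz, fs):
    """Mean power in a band via a bandpass filter and Hilbert envelope."""
    b, a = butter(4, [low_hz, high_hz], btype="bandpass", fs=fs)
    envelope = np.abs(hilbert(filtfilt(b, a, x)))
    return envelope ** 2

beta_power = band_power(signal, 15, 30, fs)
print("beta power before movement:", beta_power[t < 5].mean())
print("beta power after movement: ", beta_power[t >= 5].mean())
```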
No, not at all, I obviously found it super fascinating. And I mean, I think any of these kinds of discussions around how we can use technologies to adapt our brains, either to resolve some negative issue, like you're saying, from strokes to psychiatric issues, all the way through to potentially having enhancements, which, I know, is what some of these brain-computer interface (BCI) technologies are designed for, not just resolving issues but also potentially augmenting human capabilities in ways that probably we can't predict yet. So I don't know, I think it's super, super interesting, and yes, I will be following up with you to see if you have recommendations for people who could be guests on a kind of BCI episode. So, Matar, you mentioned how your PhD was more intense
than some other people's PhDs, neuroscience PhDs certainly, orders of magnitude more intense than my PhD was, in terms of being really out in the real world and dealing with patients. But that wasn't your only intense job. Am I reading this correctly: you were teaching children how to use tanks, a preschool tank instructor? No, wait, it's two separate items. Yeah. So you were a preschool teacher, and you were also a tank instructor. So I'm curious as to whether those experiences helped prepare you for your career, and in particular, and maybe this might seem tangential, but it wouldn't surprise me if somehow this ties into an answer, I know that you're passionate about expanding leadership opportunities for women in STEM careers, including data science, and so I wonder if we can somehow tie those two topics together. Yeah, sure, why not. So, my military service: in Israel there's mandatory military service. I actually took kind of a strange route. Normally, when you're 18, that's when you start your military service, but I actually did my... You went to Berkeley. I did, right, yeah. You went to Berkeley for undergrad, then back to Israel to be a tank instructor, and then back to Berkeley to do your PhD. Yes, correct, true story. And now you're back in Israel again. And now... why can't I make a decision, I guess.
And so I did it backwards. And I decided, for my military service, it's not like there's a ton of flexibility in what you do, but there's some, that I wanted to try out for something that was very different from anything I would ever do otherwise. I said, I'm probably going to be in an office for the rest of my life; I want to do something very different, and I also want to do something that's kind of scary to me, that I'm pretty sure I'll fail at, that I think will be really difficult, that goes totally, completely out of my comfort zone, because I think that's important. And I said, you know, the risk here isn't that high: worst case, I won't be that great in the military, and that's fine. And so I tried out to be an instructor, and then specifically, I saw a tank and I was like, that machine is amazing, I want that. And so then I tried out specifically to be a tank instructor.
And the way that it works, at least then, was that they have women who basically train the soldiers. In a tank you have the gunner, you have the driver, you have the loader, and the commander, and my role was basically to train the gunners. And commanders have to do all of the roles, and officers have to know all of them, so I was also training the commanders and the officers. And specifically for training the gunners, I always trained on the weapon subsystems, so basically all of the computers; it basically helps them understand all of the computer systems within the tank for the weapon subsystem. And it was kind of tricky to have done my undergraduate before the military service, because I would ask my commanders questions like, so the algorithm it uses to figure out what angle to open at, does it learn that, is it reinforcement learning? And they were just like, what planet did you come from? And so that's what I did my military service in. And it was incredibly physically difficult, because they basically make you go through, you know, we did basic training, had to do a million and one push-ups, run around a lot outside, not sleep. I was trained on all of the subsystems of the tank, not only the gunner's, basically to learn everything that goes on in there, and then focus in on one, because only after you do basic training, after you learn everything, only after that are they like, okay, now you're going to focus on this. And it basically checked all the boxes of being really, really hard, incredibly challenging. It turns out the physical isn't the tough part; it's mentally very, very difficult. And so that kind of set me up to be less afraid of failure, because it's tough. After that, I was a preschool teacher, which was by far, by far, the most difficult job I ever had. By far, by far. Oh, really?
Oh yeah. Wow. Even as you were saying that, I thought you were going to say... I had this idea of cuddles and laughter. Yes, tons of cuddles, tons of cuddles, tons of laughter, but so physically draining, and so emotional: you know, I would dream about my kids, and it never leaves you; you dream about these kids and you're thinking about them. And I was also more sick than I've ever been, constantly sick; I was always on some sort of antibiotics. It's very, very challenging, but it's also, you know, another way to do something that's really tough, and it gives you a different perspective. And both the military and being a preschool teacher are incredibly, incredibly humbling, very, very humbling. And so I think that's the biggest takeaway for me from those things that I did. And now, wait, wait, I need to tie it back to women in STEM. So, now I'm a mom. I have three kids: a two-year-old, an almost-five-year-old, in less than a month a five-year-old, and an eight-year-old.
And so, first of all, if I'm tying it all into content moderation, why I do what I do, I think it's extremely obvious: online harm can turn into offline harm, and I do want to make all interactions safer. And I see my daughter, she's eight, and I see the world that she's in, what it means to be a woman in this world, and a leader in this world. I want to make sure that she has role models, so that she isn't the only woman in her computer science class (been there), so that she isn't the only woman in meetings (been there). I want to make sure that she has a much more welcoming environment for whatever she wants to do. And what's really sad to me is that even now I'm hearing from her things like, oh, well, boys are better at that than me. No, not true, very not true, and here's why it's not true. And so these are the kinds of things that, first of all, I want to make sure are not out there online, speaking of disinformation, but I also want to make sure that the environment she's growing up into is much more welcoming.
Nice. Well, it's cool to hear how your passions are coming through across all aspects of your life, and that you're tying together the personal things you'd like to see in the world with what you're doing professionally, with respect to things like disinformation. So, we were talking about you being in Israel; obviously that's come up a number of times in this episode, including the military service. Another thing that is unique about Israel is that it has very high R&D expenditure per capita, markedly higher than any other nation on the planet, and that probably creates an interesting flywheel with the strong tech startup ecosystem that there is in Israel, which, you know, helps generate more things that R&D can be spent on. But another interesting piece related to this is, and I can't remember if this was a podcast conversation that I had in the past, I don't think it was, so I think this is the first time we've talked about it on air, but my understanding is that another thing fueling tech startups in Israel is this mandatory military service. So you went and did tank instruction, but a lot of people, particularly, I suspect, a lot of people that already had undergraduate degrees like you did, end up doing things where they're not training to be on the front lines; rather, they're training how to do intelligence, they're training how to do signal detection, they're using machine learning and data analysis in the field. And so, having developed that skill set over several years, when you finish you're like, well, what could I do? And one idea that I guess a lot of these people have is, well, I could be making a startup; I could be using these technology skills in industry. So we have these flywheels; I guess there's two flywheels here. There's one where the mandatory military training leads people to become tech entrepreneurs, and that probably in turn is also helpful for military capabilities in general. And then you have this separate flywheel of R&D, where this strong tech ecosystem is a self-fulfilling prophecy: oh, great, we should be investing more in this, and so then more people go into that. And yeah, I've now talked a lot; a lengthy transition.
The floor is yours. So, yes, yes, and yes. So yeah, we have mandatory military service; it's currently set, in general, roughly speaking, at two years for women and three years for men, again with lots and lots of caveats. And, first of all, there's definitely a big investment by the military in technology, whether it's signal processing or AI or whatever. And so then you have people that are basically trained in that, like you said, and then they can go out with this skill set, and we hire people; everyone's hiring those sets of people. But even for people that aren't going into these sorts of fields, the fact that there's this mandatory military service means that already, from a young age, you're in a place where you're picking up skills that are necessary to succeed in these companies, right? So, for example, leadership skills. In most cases, in order to become an officer in our military, you have to start at the bottom. It's not like in the US, where you have West Point and the Naval Academy or whatever, and that's how you become an officer. You basically start when you're 18, and then, based on different parameters, you can elect, or be chosen, to do officers' training. And so then you have these people leaving the military with a skill set of being very focused, with leadership skills and managerial skills and time-management skills, all these things that basically go toward a successful launch, or a successful CEO. And so, yes, one of them is on-the-job training, and the other one is just, in general, these other skills that you gain. And another thing that I think is really positive about the fact that there's this mandatory military service is that it's equalizing for us, right? So everyone goes into the military. Almost. Huge caveat, which is causing a lot of social unrest here right now, but we'll leave that for a different time. But you go into the military and you're mixed with different people, right? And so that's also a way of meeting people that you wouldn't necessarily otherwise meet, getting out of your echo chamber, out of your specific place, and then that can also be an incubator for new relationships that can then go off and start new companies. And then, yes, I think the fact that we have a very, very strong investment in R&D is also, like you said, a self-fulfilling prophecy: what do people want to go into? What people end up doing is they go into this field, right? That's what we know, that's what we see. And it's also a very good way for upward mobility for people. And so, with our field in particular, with data science, do you think that all of this R&D in Israel will give Israel an edge in AI technology in particular? Yeah, so: yes, absolutely.
I think we're already seeing that. So I have people that I work with, one of whom used to be very, very senior in the military, in AI. I work with her very closely; she's our VP of Product, by the way. She was very, very senior in the military, building AI infrastructure capability, so there's already this sort of cross-pollination. We also have people that, like I said, we hire right out of the military, or, in some rare cases, we have people that start doing their studies first, where the military says, okay, you can take this time, we pay for your studies, and you sign on for a certain amount of time with the military later. And in some cases we can also hire these people while they're in their studies, and then the skills that they learn with us they can then go and use in the military. So there is this definite cross-pollination that we're seeing, and I think it also makes AI a very, very strong and core component of the industry here, because it's so useful, not only in the military but in general, in all of the companies. And so there's this very, very rich community here of researchers, practitioners, and so forth. Great answer, crystal clear,
and exciting to see what ActiveFence and other AI companies will be doing out of Israel in the coming years and the coming decades. This has been an awesome episode, Matar. I was promised that you were an extraordinary speaker, and you have proved to be an amazing communicator; it's been a real joy to speak with you. Thank you. And I'm sure our audience loved this conversation as well. We covered a lot of interesting topics: automated harmful-content detection, neuroscience, military service, preschoolers. So I'm sure our listeners will want to hear more from you. So first, my penultimate question that I always ask...
Yes? ...is whether you have a book recommendation for our audience. Of course. So, it has nothing to do with anything we talked about, but I really like the book Under the Banner of Heaven. It's by Jon Krakauer, who wrote Into the Wild. I love reading books about other lives, or other places, and Under the Banner of Heaven is a good one. Nice, yeah, Jon Krakauer is an outstanding author, based on Into the Wild, so I'm sure that's a great recommendation. He's also an annoying person for me: when I start typing my name into Google, he's the one who comes up, until I get to the "o" in my last name. This is always wonderful. Maybe that's what primed me to think of that book, of all the books I could have picked.
There you go. And then my final question for you is: how should people follow you and glean more insights from you after the program? So, I'm on LinkedIn, like everyone. We also have an R&D tech blog for ActiveFence engineering, at activefence.com, and that's where you can read more about the things that we do and dive into some more details. And please feel free to shoot me an email or reach out to me on LinkedIn; I'm always happy to chat. Nice, thank you for making that offer to our listeners, Matar, and thank you so much for being on the program, especially on such short notice: we booked you just days before recording this episode.
Oh, don't say that, it makes it seem like I have no life.
Yeah, I mean, actually, it just shows how kind you were to make this time, because you've got three kids and you're VP of Data and AI at a very fast-growing, high-value company, so thank you for making the time, despite all that, to fit our Super Data Science listeners in. Happy to. Nice. Well, yeah, you mentioned potentially being on the show again in the future, and that sounds pretty good to me; we can hear about how ActiveFence continues to shape this harmful-content-reduction space in the years to come. Thanks, Matar. Thank you for having me; this was fascinating and a lot of fun.
I loved this conversation today; I hope you did too. In today's episode, Matar filled us in on how an ML model, such as a binary classifier, can become contextual by taking into account additional context: for example, we can pull out a logo from an image, identify the individual in an image and compare them with a database, and examine natural-language comments while considering the content poster's history and graph-network affiliations.
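As a toy illustration of that contextual idea, here's a sketch of a classifier that combines a content-only model score with poster-history and graph signals. This is only my sketch, not ActiveFence's actual pipeline, and the feature names are entirely hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: [content_model_score, prior_violations, account_age_days,
#           share_of_flagged_neighbors] -- hypothetical context features.
X = np.array([
    [0.62, 0, 900, 0.01],   # borderline content, clean history  -> benign
    [0.62, 7,  12, 0.45],   # same content score, risky context  -> harmful
    [0.10, 0, 400, 0.02],
    [0.95, 3,  30, 0.60],
])
y = np.array([0, 1, 0, 1])  # toy labels

clf = LogisticRegression(max_iter=1000).fit(X, y)
# Identical content score, different context, different harm probability:
print(clf.predict_proba([[0.62, 0, 900, 0.01],
                         [0.62, 7,  12, 0.45]])[:, 1])
```

In practice you would scale the features and train on far more data; the point is just that context can flip the decision for identical content.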
She also talked about how real-time streaming of harmful content presents unique challenges that can be addressed by smaller models on edge devices like phones, by sampling on servers, and, again, by taking context into account. She talked about how we can create a flywheel of defensible commercial AI systems by amassing proprietary data curated by internal experts. And she talked about how she uses Python, Node.js, TypeScript, and Kubernetes for developing ML models, deploying them into production, and scaling them up for ActiveFence users. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Matar's social media profiles, as well as my own social media profiles, at superdatascience.com/683. That's superdatascience.com/683. Your feedback is invaluable, both for spreading the word about this show and for helping me shape future episodes more to your liking, so please rate the show on whichever platform you listen to it through, and feel free to converse with me directly through public posts or comments on LinkedIn, Twitter, and YouTube. All right, thanks to my colleagues at Nebula for supporting me while I create content like this Super Data Science episode for you, and thanks, of course, to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the Super Data Science team for producing another captivating episode for us today. For enabling that super team to create this free podcast for you, we are deeply grateful to our sponsors, whom I hand-selected as partners because I expect their products to be genuinely of interest to you. Please consider supporting this show by checking out our sponsors' links, which you can find in the show notes. And if you yourself are interested in sponsoring an episode, you can get the details on how by making your way to jonkrohn.com/podcast. Finally, thanks, of course, to you for listening; it's because you listen that I'm here. Until next time, my friend, keep on rocking it out there, and I'm looking forward to enjoying another round of the Super Data Science Podcast with you very soon.