693: YOLO-NAS: The State of the Art in Machine Vision, with Harpreet Sahota
This is episode number 693 with Harpreet Sahota of Deci AI.
Today's episode is brought to you by AWS Cloud Computing Services
by withfeeling.ai, the company bringing humanity into AI,
and by Modelbit, for deploying models in seconds.
Welcome to the Super Data Science Podcast, the most listened to podcast
in the data science industry.
Each week, we bring you inspiring people and ideas to help you build
a successful career in data science.
I'm your host, Jon Krohn.
Thanks for joining me today.
And now, let's make the complex simple.
Welcome back to the Super Data Science Podcast. Today,
I'm joined by Harpreet Sahota for a special episode
examining the state of the art in machine vision.
It's hard to imagine a better guest than Harpreet to guide us on this journey.
Harpreet leads the deep learning developer community at Deci AI,
an Israeli startup that has raised over $55 million in venture capital
and that recently open-sourced the YOLO-NAS deep learning model architecture.
This YOLO NAS model offers the world's best performance on object detection,
one of the most widely useful machine vision applications.
Through his prolific data science content creation,
including his podcast The Artists of Data Science and his LinkedIn Live streams,
Harpreet also has amassed a social media following in excess of 70,000 followers.
He previously worked as a lead data scientist and as a biostatistician.
He holds a master's in mathematics and statistics from Illinois State.
Today's episode will likely appeal most to technical practitioners like data scientists,
but we did do our best to break down technical concepts so that anyone who'd like to understand
the latest in machine vision can follow along.
In the episode, Harpreet details what exactly object detection is,
how object detection models are evaluated,
how machine vision models have evolved to excel at object detection with an emphasis
on the modern deep learning approaches,
how a neural architecture search algorithm enabled Deci to develop YOLO-NAS,
an optimal object detection model architecture,
the technical approaches that will enable large architectures like YOLO-NAS
to be compute-efficient enough to run on edge devices,
and Harpreet's top down approach to learning deep learning,
including his particular recommended learning path.
All right, you're ready for this eye-opening episode?
Let's go.
Harpreet, my friend, welcome back to the Super Data Science podcast.
It's been over two years since your first episode.
That was way back in episode number 457, in the Northern Hemisphere
spring of 2021, and that was when I met you.
Since then, although we've never met in person, I would say we're friends.
Yeah, absolutely.
So good to be back on the show.
I can't believe it's been two years.
And yeah, like, me and Jon are on a text-message and phone-call kind of basis.
I hit him up whenever I need some advice, whether it's a career
move or whatever, or just to say happy birthday and all that.
So we've been in touch over the last two and a half years,
but can't wait to meet in person someday.
Yeah, now that the pandemic is over, I'm sure we'll run into each other at a conference
somewhere in the world soon, and I'm looking forward to it.
So when you were last on the show, the episode was focused on landing a data science dream job
because you were, I mean, you still are very involved in encouraging people's careers,
but you were doing it explicitly as part of how you were splitting up your week back then.
At that time, you were also full-blown focused on The Artists of Data Science podcast,
which is a very cool concept for a podcast.
It's designed for a data scientist audience,
but it's talking to guests that aren't necessarily data scientists;
they're philosophers and writers.
So you're really trying to cultivate creativity and, possibly,
artistry amongst data scientists.
So yeah, really cool format.
And I know that with all the things you got going on,
you're not releasing those episodes as much anymore,
but you've got a new podcast, Deep Learning Daily.
Is that a daily show? Is it really daily?
So Deep Learning Daily, it's the name of the community that Deci is kind of sponsoring.
And for this community, I've done a number of virtual events and recorded
podcast episodes. Those will all be released in due time.
I've got a backlog of 30-something episodes.
It's just a matter of time to edit it all.
And I'll actually, I'll be at CVPR next week recording interviews with researchers and stuff.
And that'll all be released on that podcast.
But yeah, Deep Learning Daily is like the Discord community.
And then it's also the Substack, which is the newsletter.
And that's where the audio and video will be housed for that.
And then it'll also be on podcast platforms as well.
This podcast is almost a 180 from The Artists of Data Science,
because this is completely technical.
We're just getting into the weeds of deep learning with people, which I've found extremely
fascinating. Big departure from The Artists of Data Science, but it's a good direction.
Well, very cool. I remember over two years ago, when we did record your episode,
you were very interested in deep learning,
but you hadn't formally studied it that much.
I don't think you'd done anything in production at that time with deep learning.
But now, because of working with Deci, you are full-on
into deep learning, because they're such big deep learning specialists.
So it's exciting to see all the posts you've been making.
And I'm looking forward to digging into your technical expertise in this show.
So yeah, back then, you were a data scientist at a company called Price Industries.
And now, a couple of job changes later, you are a deep learning developer relations manager,
currently at Deci AI.
And so Deci AI is a deep learning acceleration platform, a very cool company, actually invested in
by George Mathew, who was a guest on the show here in episode 679.
And so yeah, fast growing company.
What I want to focus on in this episode is specifically the groundbreaking foundation model
that Deci released, called YOLO-NAS. And so YOLO-NAS is a computer vision algorithm.
So we're going to start with like the basics a little bit.
And then we're going to quickly ramp up into more and more technical detail.
So that hopefully all of our listeners can follow along.
But so start us off by just explaining, Harpreet, what machine vision is.
Yeah, so think of just traditional image classification, right?
You'll have, let's say, some image; you present that to your neural network,
and then you'll obtain a class label and some probability associated with that prediction.
So you've got one image in, one class label out.
And that's great when the thing that you're interested in is maybe
characterizing the majority of that image space or it's like the most prominent thing in that image.
But object detection takes that one step further, because it's not only telling you what is in
the image, via the class label, but also where in the image that thing is, right?
And it's telling you that via a bounding box. So you get bounding box coordinates.
So for object detection, you'll input an image (you know, video is still just a series of images),
and you obtain multiple bounding boxes and class labels as the output.
So, at the core of this, any object detection
algorithm follows a similar pattern, right? The input is the image that you want to
detect objects in, and then you'll get as output a list of bounding boxes,
which are the x, y coordinates for each object in the image, the class label, and then
some probability score associated with what the network thinks that thing is.
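To make that input/output contract concrete, here's a minimal Python sketch; the `Detection` structure, the coordinate convention, and the `filter_detections` helper are illustrative names of my own, not from any particular library:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    box: tuple    # (x1, y1, x2, y2) corner coordinates of the bounding box
    label: str    # predicted class label
    score: float  # confidence the network assigns to this prediction

def filter_detections(detections, threshold=0.5):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d.score >= threshold]

# One image in -> a list of boxes, labels, and scores out
raw = [
    Detection(box=(34, 50, 210, 300), label="cat", score=0.92),
    Detection(box=(220, 40, 580, 310), label="dog", score=0.88),
    Detection(box=(10, 10, 40, 40), label="bird", score=0.21),  # low confidence
]
confident = filter_detections(raw, threshold=0.5)
print([d.label for d in confident])  # ['cat', 'dog']
```

In practice the model emits these triples for every image (or video frame), and a confidence cutoff like this is usually the first post-processing step.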
Nice, very cool. So that's the classic machine vision task. Ten years ago,
the machine vision state of the art was mostly focused on classifying the whole image,
like you were just describing. So famous architectures like AlexNet, released in 2012, were working
on the ImageNet Large Scale Visual Recognition Challenge dataset. And in that
competition, you're just trying to get the whole image right. So the image would be primarily
of a cat or a dog or a plane or whatever, and you'd want the algorithm to be able to identify
that. With object detection, like you're describing, it's this more specialized task where
there could be lots of things going on; it could be a cat playing with a dog. And so you have to
be able to first put a bounding box, like you said, which is just a rectangle of any dimensions.
So it could end up being that the image is just a big cat face, and then the
bounding box would basically be the whole image. Or it could be that there's
just a cat in the corner and nothing else, like the whole rest of the
image is just plain brown with nothing going on, so you need a little bounding box
in the corner where the kitten is. Or you could have a bunch of things going on: a bus and a
horse and a cat. And so you have these rectangular bounding boxes all over the image, and then
you're tasked with correctly identifying what's in each bounding box. So it's a much more complicated
task, because you're breaking down the big image into a bunch of smaller images and then
classifying each of the objects that you detect in the bounding boxes.
also to maybe help people kind of understand how there are like different kinds of machine
vision tasks. There's also image segmentation, which is instead of being bounding boxes,
it's pixel by pixel. So every single pixel in the image gets classified as a particular category.
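To contrast the two output formats: where detection returns a short list of boxes, segmentation returns a class for every pixel. A toy sketch (the tiny 4x4 "mask" and the class IDs are invented for illustration):

```python
# A toy 4x4 segmentation mask: every pixel gets a class ID
# (0 = background "stuff", 1 = cat, 2 = dog)
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]

def class_areas(mask):
    """Count how many pixels belong to each class."""
    counts = {}
    for row in mask:
        for pixel_class in row:
            counts[pixel_class] = counts.get(pixel_class, 0) + 1
    return counts

print(class_areas(mask))  # {0: 8, 1: 4, 2: 4}
```

A real segmentation model outputs a mask like this at the full image resolution, which is why it suits tasks where you need exact object outlines rather than rectangles.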
And so yeah, different kinds of approaches. I think image segmentation,
maybe that's better when you have a relatively constrained set of classes, where maybe in
a self-driving car application, you're like, okay, we need to be able to identify pedestrians,
cars, road. And so you have this relatively finite set of
classes that you'd want to be able to bucket those pixels into. But I think that object detection,
and I'm not an expert in this stuff, you might know more than me, but with object detection,
you could have, I mean, with modern deep learning algorithms, it's probably almost limitless.
Like, you could have a crazy number of possible objects. You might even be able to do... whoa,
I just thought of this, right? I don't spend much time thinking about this, Harpreet, but you could
even use object detection to find where the bounding boxes are, and then you could use a CLIP-style
algorithm to... you could literally describe anything, and then you have an infinite number of
classes. So that's kind of... anyway.
That's along the lines of that new model, I think,
that Meta put out: Segment Anything. So I think that's right along the lines of that. I haven't gone too
deep into the paper, but it's essentially what you're describing: a promptable segmentation type of
model. But yeah, segmentation, an interesting thing with segmentation is this concept of
things versus stuff, right? Stuff is just the amorphous stuff in the background, and things are
the groups of pixels that belong together to a particular thing. Yeah.
Are you stuck between optimizing latency and lowering your inference costs as you build your
generative AI applications? Find out why more ML developers are moving toward AWS
Trainium and Inferentia to build and serve their large language models. You can save up to 50%
on training costs with AWS Trainium chips and up to 40% on inference costs with AWS
Inferentia chips. Trainium and Inferentia will help you achieve higher performance,
lower costs, and be more sustainable. Check out the links in the show notes to learn more.
All right. Now back to our show.
Yeah. And so with object detection, which we're going to focus on this episode,
you're identifying the things in these bounding boxes in the image. So tell us a bit about
the YOLO-style architectures of recent years. YOLO, like when you're a college student
going on spring break, is "you only live once"; with these architectures, it's "you only
look once." So there was this original paper a number of years ago on the YOLO architecture. But I
think you probably know enough about this that you can even go back before YOLO and tell us
how object detection has worked historically, at a high level, and then what these YOLO architectures
changed and why they are the dominant architectures for object detection today.
Yeah. Even before deep learning, people were doing object detection,
but they were using something like histogram of oriented gradients features plus maybe a linear
support vector machine on top of that to do whatever they needed to do. But of course, with
classic ML, you have to hand-engineer all those features, which is a pain.
But then right around 2014... you know, the AlexNet moment happened in 2012,
and obviously people started going crazy over neural nets and deep learning. But 2014 saw
one of the earliest deep learning-based approaches to object
detection, called R-CNN. That's the region-based convolutional neural network.
With that one, you would take in an input image, then extract a bunch of
what they call region proposals, then you'd use some CNN to do some type of
feature extraction and then classify whatever the regions are, right?
Yeah. CNN being convolutional
neural network. And actually, just to really quickly digress:
in natural language processing, which is my expertise, really, we have now gone completely from
using recurrent neural networks, RNNs, or LSTMs, long short-term memory networks. We are now
completely in this transformer world, where we're like, wow, transformers are way better because
they can take into account all of the context in a given piece of language. There are trade-offs
that people are working on, like how the compute cost scales quadratically
when you're working with these transformer architectures. But is R-CNN still used a lot
in machine vision today? Or is it also moving over towards transformers?
I think they're still being used, for sure, but transformers have crept into the
computer vision world as well. So there are a lot of new transformer-based architectures
for detection and classification tasks too. So yeah, it's definitely creeping in there.
I haven't done too much research into that. I've only really been doing deep learning for
about a year, and so this is on my long list of things I need to figure out.
That was just a quick digression, because, yeah, you're confirming for me that transformers are
being increasingly used in machine vision, but CNNs are also still being used. So that is a bit
different from natural language processing, where I don't really know anyone who's using
RNNs anymore. But anyway, so you were talking about R-CNN in 2014 being the first big
deep learning architecture for object detection, and then I interrupted you.
Yeah, yeah. And then there were improvements on R-CNN: there's Fast R-CNN, Faster R-CNN.
And the thing with these types of methodologies is that you need to do two passes through your
neural network to classify and then get the bounding box coordinates. But then YOLO came along, and
YOLO kind of just changed all of that. Instead of having to do two passes through your
neural network, you only look once: you only have to look at the image once and do one pass through
the neural network. And so they took off in popularity because of their impressive speed
and accuracy. So I don't know if you want to quickly talk about the anatomy of
an object detection model. You've got the backbone, neck, and head, right? The backbone is
where we extract features from an input image, and typically that's done using
a convolutional neural network. It's capturing hierarchical features at different
scales: lower-level features like edges and textures are being extracted in
the more shallow layers, and then as you go deeper, you get higher-level features,
like more semantic types of information. And then the neck is the part that
connects the backbone to the head, and the head does the actual classification and bounding box prediction
for you. Yeah. So YOLO came around... the paper was published in 2015, but it was presented
at the CVPR conference in 2016. And it just, you know, took the world by storm with how fast it was.
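The backbone/neck/head anatomy Harpreet described can be sketched as a flow of tensor shapes. Everything numeric below (the strides, the channel counts, 80 classes, one anchor per cell) is an illustrative assumption, not the spec of any particular YOLO variant:

```python
def backbone(h, w):
    """Each stage downsamples the image and deepens the channels, yielding
    hierarchical features: shallow maps = edges/textures, deep maps = semantics."""
    strides = [8, 16, 32]        # typical output strides for detection backbones
    channels = [128, 256, 512]   # illustrative channel counts per stage
    return [(c, h // s, w // s) for c, s in zip(channels, strides)]

def neck(features):
    """Fuse multi-scale features; here we just project every map to a common
    channel width, a stand-in for FPN/PAN-style feature fusion."""
    return [(256, h, w) for (_, h, w) in features]

def head(features, num_classes=80, anchors_per_cell=1):
    """For every cell of every fused map, predict 4 box coordinates,
    1 objectness score, and num_classes class scores."""
    preds_per_cell = anchors_per_cell * (4 + 1 + num_classes)
    return [(preds_per_cell, h, w) for (_, h, w) in features]

feats = backbone(640, 640)
fused = neck(feats)
outputs = head(fused)
print(feats)    # [(128, 80, 80), (256, 40, 40), (512, 20, 20)]
print(outputs)  # [(85, 80, 80), (85, 40, 40), (85, 20, 20)]
```

The single-pass trick is visible in the shapes: one forward pass produces box and class predictions for every grid cell at every scale simultaneously, so no second classification pass is needed.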
And it smashed the previous state of the art. So mAP, mean average precision: that's the metric
that we look at when we measure how good an object detection model is. Do you want to talk about
that? We could... I can give it... Yeah, let's talk about that. But just really quickly,
I'm going to get you to elaborate a little bit on CVPR. You've mentioned that,
with your Deep Learning Daily podcast, you're going to be there; at the time of recording it's in the
future, and probably by the time this episode's released, it'll be in the past. But you're going
to the CVPR conference, which is the premier computer vision conference, right?
Yeah. Yeah. Computer Vision and Pattern Recognition, but really it's like a deep learning
type of conference. It's one of the well-known ones, up there with
NeurIPS and... what's the other one? ICML. Yeah. Big conference.
But yeah, I guess this one skews a bit more towards machine vision applications than those
other two necessarily would. Super cool. So yeah, you're saying these kinds of
big breakthroughs like YOLO get published in the CVPR proceedings. And then you are about to tell
us about mAP, which is a key metric for assessing the quality of an object detection model's output.
So I guess there are these two kinds of trade-offs that you're trying to work through:
you want it to be fast, and you've been talking about how, from R-CNN to Fast R-CNN to Faster R-CNN,
obviously they're getting faster. But then YOLO was a big step change, because it didn't require
these multiple passes. It was able, in a single pass, to both identify where objects are
in the image as well as classify those objects, which is kind of a mind-blowing thing to me. You
described it there through the backbone, neck, and head anatomy, but because I haven't
spent enough time with it myself, it's still something that I'm kind of mind-blown by. But
so obviously speed is a key consideration. Then simultaneously, another big consideration is
accuracy, of course. And I guess mAP is the key way to measure that accuracy.
Yeah, and it's based on precision and recall, familiar terms to classical data science
folks. So mAP gives you a balanced assessment. And then there's another metric,
called IoU, intersection over union, that tells you how good your bounding boxes are.
And then there's another thing in object detection that we call non-max suppression, which
helps filter out redundant bounding boxes that you might get at prediction time. But
just breaking it down, right: we all know precision, it's just the model's
ability to make an accurate positive prediction, right? And recall is the number of actual
positive cases that the model correctly identifies, right? So precision, you can think of it like a
sharpshooter who is hitting the target accurately, and recall is like a detective who is
catching all the suspects in a crime. So there's always that tradeoff between precision and
recall. That's... I love that analogy. That is easily the best analogy I've ever heard for
explaining precision and recall. And maybe now I'll be able to remember it and
won't have to keep looking it up, and can be there every time someone's like, "Yeah, I know what
that is, everyone's always talking about that, of course I know what it is." Yeah, yeah, that stuff trips
me up too. I get confused all the time. Don't ask me for the formula; I can't give you the formula
off the top of my head. But what mean average precision does: you know, we have the precision-recall
curve, and mAP, mean average precision, is calculating the area under the precision-recall curve to give
us a balanced assessment of how good this particular model is. So what it does is it's
incorporating the precision-recall curve, like I mentioned, and plotting precision against recall
at different thresholds, right? So when you do it this way, you get a more balanced
assessment, because you're considering the area under the curve. And then you have to think
about the multiple objects that are being detected in an image. So mAP is able to handle
multiple object categories by calculating each category's average precision separately and then
taking the mean across all the categories, right? So it's measuring average precision for one
object category; you do that for all the different categories, and then you average that, and you get the mean
average precision. Now, there's another concept to measure how good the object detection model is: intersection
over union, and this just measures the quality of the predicted bounding box by comparing the
intersection and union of areas. Got it. So mAP, the mean average precision, is for figuring out,
within the bounding boxes, how accurate the predictions are, and the IoU is more about
how accurate the placements of the bounding boxes are. Yeah, yeah, exactly. That makes perfect sense.
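Here's a minimal sketch of the metrics just discussed, precision, recall, and IoU, plus the greedy form of the non-max suppression step mentioned earlier. The function names and the sample boxes are my own for illustration:

```python
def precision(tp, fp):
    """Of all positive predictions, how many were right (the 'sharpshooter')."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Of all actual positives, how many we caught (the 'detective')."""
    return tp / (tp + fn) if tp + fn else 0.0

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression: keep the highest-scoring box,
    drop remaining boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two near-duplicate detections collapse to the stronger one; the distant box survives
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

Average precision then comes from sweeping the confidence threshold, plotting precision against recall at each setting, and taking the area under that curve; mAP is that value averaged over all object categories.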
Sweet. So you've given us an introduction to YOLO. What has happened since? Because I think the
original YOLO architecture, you said, was like 2016, and since then there have been a bunch of
different versions. There's YOLO version 2 and YOLO version 3, and I don't know if those
were incremental changes, but ultimately it leads to what Deci's been up to, having just
this year made a big splash with the release of this YOLO-NAS architecture. So take us on
the journey to YOLO-NAS.
the journey to YOLO NAS. Yeah so yeah the first YOLO YOLO V1 like the paper was published
2015 it was presented at CVPR 2016 so it's you know it's been around for a while. And since then
it's been 16 different YOLO models have been released. There's this really really good paper on
archive called a comprehensive review of YOLO that goes from YOLO V1 and beyond by Juan
Terven and Diana Cordova Esperanza and they recently just updated it a few days ago to include
YOLO NAS on it as well. 35 page research paper but I summarized their findings in a edition of
the Deep Learning Daily newsletter but yeah like there's been a bunch of YOLOs and you know
what characterizes all of them is just speed and accuracy. So you know the first three YOLOs
YOLO V1, YOLO V2, YOLO V3. These were all created by some renamed Joseph Redman and Ali
Faradia, I believe his name is. These are the original creators of YOLO. Redman left
computer vision research for you know ethical principles after YOLO V3 but people have kind of
just adopted that name as you know as a framework there's there's this brand affinity with
with the YOLO name. Did you say you know so I don't I don't know if it was just like a like a glitch
in the recording or what, but just as you were saying why Joseph Redmon left, did you say for ethical
reasons? Yeah, for ethical reasons. He was not happy about the application of his research
for military purposes. And you can obviously envision the military purposes: somebody would use
object detection to target things. Yeah, yeah, stuff like, okay, detect, you know,
what kind of plane this is: oh, it's a passenger plane versus a Russian MiG. And yeah,
it can then be used to... I don't know the extent to which these are automated systems,
in terms of actually firing, or if there's a human in the loop, but obviously, yeah, there are
some pretty concerning ethical implications. Yeah, yeah. And so he left computer vision research, and
then somebody named Alexey... Boch... I can't say his last name... Bochkovskiy, he started off with
YOLOv4. And so YOLOv4 hit the ground running, and after YOLOv4 was YOLOv5. So, okay, YOLOv3 was
originally, I think, in the Darknet framework, in C, but then an engineer named Glenn
Jocher took YOLOv3 and ported it over to PyTorch, so it became available to the PyTorch community. So then
Glenn created YOLOv5, a completely new architecture, and released it
as a PyTorch model. There's a bunch of controversy about that; we won't delve into that too much.
But since then, there have been a number of YOLOs: Scaled-YOLOv4; YOLOR, "you only learn one
representation"; YOLOX, which was "exceeding YOLO series in 2021"; the YOLOs that came out
of the PaddlePaddle research group in China. Then Glenn and Ultralytics published YOLOv8
earlier in 2023, January 2023. But really, prior to YOLO-NAS, the real state
of the art was YOLOv6, YOLOv7, and YOLOv8, right? So our model was inspired by YOLOv6 and YOLOv8; some
of the blocks that they had in there, we fed to our neural architecture search algorithm
and ended up with YOLO-NAS. So what is YOLO-NAS? YOLO-NAS, in a nutshell, is an
object detection model, a new state of the art, and it's outperforming YOLOv6 and YOLOv8 in terms of
mean average precision and inference latency. So that means it's more accurate and it's faster,
and it's improving upon some of the limitations of the previous YOLOs. You know, previous YOLOs
didn't really have adequate quantization support, the tradeoff between accuracy and latency
wasn't the best, and we're able to now be faster in real-time detection as well,
not to mention that YOLO-NAS supports INT8 quantization. So it's just the natural next step.
And it's not going to remain state of the art forever; object detection is
this super competitive field of research. There are people around the world working on this;
I'm sure somebody will beat us soon enough, but we'll be back at it.
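The INT8 quantization mentioned here maps 32-bit floats into 8-bit integers via a scale and zero point, so inference can use fast integer arithmetic. A toy sketch of the standard affine scheme (this is the generic textbook recipe, not Deci's actual implementation):

```python
def quantize_params(xmin, xmax, qmin=-128, qmax=127):
    """Derive the affine mapping float -> int8 for a tensor's value range."""
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = round(qmin - xmin / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to the nearest representable int8 value, clamped to range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original float."""
    return (q - zero_point) * scale

# Weights in [-1, 1] round-trip with at most half a quantization step of error
scale, zp = quantize_params(-1.0, 1.0)
x = 0.4231
q = quantize(x, scale, zp)
x_hat = dequantize(q, scale, zp)
assert abs(x - x_hat) <= scale / 2
```

The design challenge Harpreet alludes to is keeping that rounding error from degrading mAP, which is why architectures built with quantization in mind (quantization-friendly blocks) hold their accuracy better after conversion.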
That's why it was so urgent that I get you on the show now.
I knew this episode would be live while you guys still have the number one object detection algorithm.
Exactly.
The future of AI shouldn't be just about productivity. An AI agent with the capacity to grow alongside
you long-term could become a companion that supports your emotional well-being.
Paradot, an AI companion app developed by With Feeling AI,
reimagines the way humans interact with AI. Today, using their proprietary large language models,
Paradot AI agents store your likes and dislikes in a long-term memory system, enabling them to
recall important details about you and incorporate those details into dialogue
without LLMs' typical context window limitations. Explore what the future of human-AI
interactions could be like this very day by downloading the Paradot app via the Apple App Store
or Google Play, or by visiting paradot.ai on the web.
So yeah, super cool. So YOLO-NAS is the fastest, most accurate object detection algorithm
yet. Awesome that you're saying that; it's so fast now that it can be used in
real-time applications. And so I think a key thing here is, we know YOLO stands for "you only look
once," but what does the NAS stand for in YOLO-NAS?
Yeah, so that stands for neural architecture search, because the way this architecture was
discovered was through this AutoML kind of technology called neural architecture
search. Typically, people discover architectures by doing tons of research
and all that; well, we just looked at what was out there, what worked, input that into our giant
AutoNAC engine, and got this architecture. So let's talk a little bit more about
neural architecture search. What is this thing trying to do? It's trying to find
an optimal network architecture for a specific task; for example, that task could be
detection, classification, segmentation, whatever. And what neural architecture search does
is automatically search through possible architectures. In the case of YOLO-NAS,
our search space (we'll talk about search spaces in a second here) was 10^14
different architectures. A ton of architectures. Yeah, so instead
of relying on manual trial and error or human intuition, NAS is using
optimization algorithms to find the architecture, so that we're balancing accuracy, FLOPs
(floating-point operations, that's computational complexity), and the actual size of the model.
So how is it doing this? Well, the search algorithm could be as simple as grid search or
random search, or more complex, like Bayesian optimization, genetic algorithms, or reinforcement learning.
But let's just talk through neural architecture search, though. We need basically
three things to make this happen: a search space, a search strategy, and then some way to estimate
the performance of the architecture that we end up with. So the search space: this defines
all the possible architectures that our algorithm can explore. And what does
the search space consist of? It could be as simple as the number of layers in a network, or it could be
as complex as the types of layers, the types of blocks, the connections between layers, and various other
hyperparameters. Imagine, you know, building a Lego house,
right? There's a myriad of different Lego pieces that we have, right? And you're trying to
maximize the square footage of your Lego house by using the optimal blocks. This is what
neural architecture search is doing at a high level: intuitively, we're trying to find the right
blocks to maximize something. So at Deci, the thing that we have, the AutoNAC engine, it
takes it just a step further, because in addition to everything I mentioned before, we also consider
the hardware that you're deploying on and your data characteristics. So the hardware: in
the case of YOLO-NAS, we optimized it for the T4 GPU, which is an industry standard for
detection. You can even look at compilers and quantization as well. But this search
space influences how the end architecture ends up being. So then we have a search
space, but then we need a way to search through this space, because that's a ton of different
pieces. And so, again, there are various methods: random search, Bayesian search, reinforcement learning,
evolutionary algorithms, gradient methods, whatever. And this impacts how long you're searching for.
Once you've got those in place, then you need to have some way to estimate the performance of
your outcome architecture. And this could be as simple as just training each architecture that
you end up with on the dataset that you're intending to use it for and measuring the performance,
or you can do more advanced techniques that I really don't know how they work, but like curve
extrapolation, one-shot NAS, weight sharing, things like that. But you put all that in, right? That's
what goes into NAS: the search space, the search strategy, the performance estimation strategy. And then the
output is an architecture that's optimal or near-optimal according to whatever metric you have. That's NAS in a nutshell.
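The three ingredients (search space, search strategy, performance estimation) can be sketched with the simplest possible strategy, random search. Here the `estimate` function is a stand-in for actually training and evaluating each candidate, and its latency penalty only gestures at the hardware-awareness described above; every knob and number is invented for illustration:

```python
import random

# 1) Search space: like a bin of Lego pieces, each knob is a choice.
SEARCH_SPACE = {
    "depth": [2, 3, 4, 5],                  # number of stages
    "width": [64, 128, 256],                # channels per stage
    "block": ["conv", "bottleneck", "rep"], # block type
}

def sample_architecture(rng):
    """Draw one candidate architecture from the space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

# 3) Performance estimation: in reality you'd train each candidate (or use
# one-shot / weight-sharing shortcuts); this fake proxy score just balances
# accuracy against latency on the target hardware.
def estimate(arch):
    fake_accuracy = 0.5 + 0.05 * arch["depth"] + 0.0005 * arch["width"]
    fake_latency_ms = 0.8 * arch["depth"] + 0.01 * arch["width"]
    return fake_accuracy - 0.02 * fake_latency_ms

# 2) Search strategy: plain random search over the space.
def random_search(trials=100, seed=0):
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(trials)]
    return max(candidates, key=estimate)

best = random_search()
print(best)
```

Real NAS systems replace the random sampler with Bayesian, evolutionary, reinforcement-learning, or gradient-based strategies, and replace the proxy score with measured (or predicted) accuracy and latency, but the three-part structure stays the same.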
Very cool, that was a great explanation, and I love the Lego analogy. You are the king
of analogies, taking difficult-to-visualize concepts and making them
instantly visualizable. So, very cool. So all of this neural architecture search, all of this
NAS concept this is something that desi has developed right yeah so yeah neural architecture search
it's it's an active field of research the thing that differentiates desi's neural architecture
search is the actual algorithm itself so that's what's proprietary for desi is our
algorithm got it got neural architecture search yeah so NAS is both like the name of a field of
research as well as like in capital letters a specific algorithm that desi's developed
yeah in our case we call it auto-nac auto-nac architecture construction that way we can put the
TM on there got it nice so auto-nac is desi's proprietary NAS algorithm perfect that makes a lot of
sense. So that was an amazing tour, Harpreet, of computer vision, object detection, and then the architectures that have led us to the state-of-the-art YOLO-NAS architecture today, including the AutoNAC approach that allowed neural architecture search to identify this optimal architecture. And so if there are people out there who would like to learn more from you about computer vision, I understand that you're working on an intro to computer vision course, which is expected to be out in September or October of this year on LinkedIn Learning. That's cool. — Yeah, doing it on LinkedIn Learning — it's going to be a cool course. The audience for this course is people who are like me before I got into deep learning. So if you're comfortable with statistics, math, Python programming, classical ML — if you're good with all that, and you're looking at this deep learning thing and wondering, okay, how can I get into this? — then this is the course that I made for you. I made it for an earlier version of me. I start with a history of computer vision for image classification, and I talk about important concepts — the things that I felt I needed to understand before I got into deep learning — so I kind of structured it that way. I start from pre-deep-learning methods, just briefly touching on those. I talk about the importance of ImageNet — because I didn't know what the hell ImageNet was when I first got into deep learning — the importance it had and why it matters. And then going on to AlexNet, I talk about a few different fundamental architectures. I pay homage to AlexNet and LeNet, but we also talk about ResNet, EfficientNet, MobileNet, and RegNet as well, I think. And I do it devoid of much math — I try to make it as math-unintensive as possible — and the example projects are all done using the SuperGradients training library. — Nice, I've never heard of that before, the SuperGradients training library.
Yeah, it's Deci's library — it's the official home of YOLO-NAS — and it's a PyTorch-based training library. It includes a bunch of training tricks right out of the box, and it abstracts a lot of the workflow, a lot of the boilerplate, so you don't have to write it. I know you're a huge fan of PyTorch Lightning — you can think of it kind of like that. — Cool, yeah, so it's like a wrapper around PyTorch that lets you do things more quickly when you need to be training a model. So in base PyTorch, for example, you need to write the structure of your loop through epochs of training, and then each step within the epoch, but PyTorch Lightning abstracts that away. So this does a similar kind of thing. And is it particularly for computer vision, or is it quite general? — Theoretically, you can use any PyTorch model — any nn.Module — with SuperGradients and it'll work just fine. But the pre-trained models that we have are all mostly computer vision at the moment: we've got pre-trained models for classification, detection, segmentation, and pose estimation as well, and we'll be expanding further in the near future. — Nice, very cool. Yeah, you can see how that's kind of CV-focused with stuff like pose estimation, which I'm assuming is predicting what kind of pose a person has in an image or video, which is obviously going to be specific to CV. Very cool. So a concept that you've mentioned a couple of times,
and that I've touched on in recent episodes of the show as well, but that I'd love to hear you tell us more about in the specific context of YOLO-NAS, is this idea of quantization. So quantization allows us to reduce the size of our model: instead of using full-precision data types, we use half precision or maybe even less precision, so you're reducing the amount of memory and compute required for, say, the parameters in your model. So yeah, fill us in on the YOLO-NAS approach. My vague understanding of YOLO-NAS is that it uses something called a hybrid quantization method, so maybe fill us in on what that means relative to standard quantization. — Yeah, so exactly like you said, quantization is where we take the weights in our model from some high-precision floating-point representation — 32-bit, for example — to some lower-precision representation: 16-bit, 8-bit, whatever. Two-bit, that'd be really impressive. It reduces the model size and increases inference speed, but you suffer in accuracy — you pay for it somewhere. But this is a great approach because you're going to be able to deploy that model on hardware with limited computational resources. So that's the standard, naive quantization method. Then there's the hybrid quantization method, which is a little more advanced. In this case, you're selectively applying quantization to different parts of the architecture, based on the impact on the model's performance on that hardware. So it's a more selective approach that helps maintain the performance of your model, while you still get the benefits: reduced model size and increased inference speed. So within the context of YOLO-NAS, this hybrid quantization method is selectively quantizing specific layers in the model, trying to optimize the accuracy-latency trade-off while still maintaining performance — as opposed to standard quantization, which is going to uniformly quantize all the layers, and that causes more accuracy loss.
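A minimal sketch of the difference, in plain Python rather than a real framework: symmetric INT8 quantization of a weight vector, applied selectively. The layer names and the choice of which layer counts as "sensitive" are invented for illustration — a real hybrid scheme would measure each layer's accuracy impact before deciding what to quantize.

```python
def quantize_int8(weights):
    """Symmetric INT8 quantization: map floats onto [-127, 127] with one scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

# "Model" as named layers -> weights. Hybrid quantization: skip the
# (hypothetical) accuracy-sensitive layer, quantize the rest.
model = {
    "stem": [0.50, -1.20, 0.33],
    "head": [0.01, -0.02, 0.015],   # assume this layer is sensitive
}
sensitive = {"head"}

for name, w in model.items():
    if name in sensitive:
        print(name, "kept in float")
        continue
    q, s = quantize_int8(w)
    w_hat = dequantize(q, s)
    err = max(abs(a - b) for a, b in zip(w, w_hat))
    print(name, "-> int8, max round-trip error:", round(err, 4))
```

Standard quantization is the same loop with `sensitive` empty: every layer pays the rounding error, which is where the extra accuracy loss comes from.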
And there are pros and cons to both approaches, right? Standard quantization, of course, as we mentioned, reduces model size and increases inference speed, but drops accuracy — it treats every part of the model the same, which is probably not always ideal. With hybrid quantization, we're maintaining the model performance while still reducing the model size and increasing inference speed, but the thing is, it's more complex to implement: it involves identifying which parts of the model should be quantized, and to what degree. — Very cool, that was a crystal-clear explanation, and you even did the summarization back to the audience that I usually do, so I don't really have anything else to add. It makes perfect sense. So yeah, quantization is reducing the complexity of the data type, and that does have this trade-off of accuracy going down as you're gaining the speed increase. And hybrid quantization is this cool way of selectively quantizing only parts of the model, so that you get the speed increases without the accuracy loss. Super cool. And so I guess that kind of quantization could come in handy alongside choosing your model size. I know that there are a few different YOLO-NAS versions — there's a small, a medium, and a large version — and there are particular circumstances where you might want to use one or another. I'm guessing that something like large is going to be slightly more accurate but a little bit slower, that kind of thing?
Yeah, the only difference between small, medium, and large is just the number of parameters the architecture has. The small version has about 19 million parameters, the medium version has about 51 million, and the large version has about 67 million, something like that. That's all small, medium, and large means. — Deploying machine learning models into production doesn't need to require hours of engineering effort or complex homegrown solutions. In fact, data scientists may now not need engineering help at all. With Modelbit, you deploy ML models into production with one line of code. Simply call modelbit.deploy() in your notebook, and Modelbit will deploy your model, with all its dependencies, to production in as little as 10 seconds. Models can then be called as a REST endpoint in your product, or from your warehouse as a SQL function. Very cool. Try it for free today at modelbit.com — that's m-o-d-e-l-b-i-t dot com.
Nice. And so there might be listeners out there who are thinking of particular object detection use cases that could be useful. Maybe they work for a municipal government somewhere and they're like, oh, we could be using this to monitor traffic; or maybe they work for a national park and they're like, we could be using object detection to monitor wildlife. And so they might want to deploy their model onto a very small, low-energy device — maybe something that can be solar powered. So they might want to have their object detection model on something like a Raspberry Pi or an Nvidia Jetson, these very, very small processors. Does YOLO-NAS support that kind of thing? Particularly paired with quantization, does YOLO-NAS get small enough for those kinds of edge devices? — Yeah, so we offer a quantized version of YOLO-NAS. The full YOLO-NAS itself was optimized for the Nvidia T4 GPU. We're working on making a version for the Jetson devices — we're almost there, there's a little more research work to go into it — but we offer the INT8-quantized model that could be used. But in general, we can talk about the things to consider when deploying on these edge devices, right? Because, like you mentioned, they're small devices — they're typically not a full-blown laptop, for example; you can only fit certain hardware and certain memory on there. So the first thing you have to consider is model size, because the model needs to be small enough to fit in the memory of the edge device, and deep learning models get huge — they can get into the gigabytes. So in order to get that model to fit — you could be in the lab building the most accurate model ever, and it's amazing, but then you go to deploy and you're like, oh, this thing does not fit on the actual device — you can look into techniques like model pruning, where you're getting rid of layers in the network, quantization, and knowledge distillation — we can talk about knowledge distillation a little bit if you like. These techniques help reduce the size of the model without too much of a loss in performance. So like I mentioned, YOLO-NAS is quantizable to INT8, which means you can reduce the model size to get it onto those edge devices. But that's not the only challenge you have.
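As a back-of-the-envelope for that "does it fit" question, here is the weights-only memory footprint of the three YOLO-NAS sizes mentioned earlier (roughly 19M, 51M, and 67M parameters) at different precisions. Activations, buffers, and runtime overhead are ignored, so real numbers will be higher.

```python
# Rough memory footprint of the weights alone, at different precisions.
BYTES = {"fp32": 4, "fp16": 2, "int8": 1}                     # bytes per parameter
models = {"YOLO-NAS-S": 19e6, "YOLO-NAS-M": 51e6, "YOLO-NAS-L": 67e6}  # approx params

for name, params in models.items():
    sizes = {p: params * b / 1e6 for p, b in BYTES.items()}   # megabytes
    print(f"{name}: " + ", ".join(f"{p} ≈ {mb:.0f} MB" for p, mb in sizes.items()))
```

Going from fp32 to INT8 cuts the weight footprint by 4x — e.g. the small model drops from about 76 MB to about 19 MB, which is the difference between fitting and not fitting on many microcontroller-class devices.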
You also have inference speed, because the model needs to run quickly enough to process incoming data in real time or near real time. And this you can achieve by using efficient architectures and some of the optimization techniques we talked about — in the case of YOLO-NAS, we used AutoNAC to find the optimal architecture that was hardware-aware for the T4, considering all the components in the inference stack: compilers, quantization, and all that. Then the other thing you've got to worry about is power efficiency, because like I mentioned, you might not have this thing plugged in — it might be just hanging somewhere, it might be solar powered, whatever. So there's a trade-off between accuracy and power consumption. Why? Because high model accuracy usually means more complex computations, and more complex computations mean more power needed to do them. And then finally, there's software compatibility, because you have to deploy this thing on that device — what runtime are you going to deploy it on? YOLO-NAS is compatible with Nvidia TensorRT, which is a common one. So those are the considerations to make in those scenarios. — Nice, very cool. That was a really nice, in-depth breakdown of the kinds of considerations we'd want to make as we deploy to these smaller devices. I didn't know before I asked you the question that you had this level of expertise in it, so that's awesome. — Definitely not an expert, just learning and listening and taking notes. — Nice — well, clearly it's been awesome, the level of depth that you've gotten into on all the topics so far. So we've obviously talked about YOLO-NAS a lot in this episode, and people might be excited to use it. Are there commercial restrictions for people using it, either out of the box or fine-tuned to some particular object detection task that might be relevant to their users?
Yeah, so it's a bit of a non-standard license for YOLO-NAS. If you want to take our pre-trained model with its weights, you're free to use them, but if you're using it for commercial purposes, then you need to get permission from Deci. So the pre-trained model weights themselves are not open for commercial use. However, the architecture itself — you could take the YOLO-NAS architecture, start from scratch, and train it on your own data. And that has its own pros and cons as well. But long story short: the weights for YOLO-NAS, as they are, are not available for commercial use, but you can take the untrained model with this architecture, train it on your own data, and you're off. — Nice, that's cool, that's a nice compromise. And we see that kind of thing with very large models released today. Meta's LLaMA architecture, for example, for natural language processing: they released it for academic use, but then someone they'd given permission to made the model weights available to torrent. But if you're a responsible business owner — and you should be, if you don't want to get into a legal quagmire in the future — you then can't use LLaMA for commercial purposes; you'd be insane to. And so this kind of approach that you're describing with YOLO-NAS — where people can take the general architecture, and then, if they want to put in the considerable expense that you guys have put in and deserve some reward for, train the model themselves and fine-tune it to their particular use case, go for it — I think that's a really nice compromise. — Yeah, and I mean, it's not always easy to train models like that right off the bat, so sometimes it makes sense to just fork over whatever it is and get the pre-trained weights. — Yeah, and there would have been a huge amount of compute required in the AutoNAC neural architecture search that you guys did, and now that all of that compute has already been done and this optimal object detection architecture exists, you've already left people in a really great place, even if they do want to just be going from there and
coming up with a commercial use case. Very cool. Now, a term that I hear a lot, and that I talk about a lot in the context of these very large models — whether it's the LLaMA architecture we were just talking about, which is specific to natural language processing, or a big machine vision model like YOLO-NAS — is that we can talk about these as foundation models. So Harpreet, can you fill us in on what that means to you, and what makes YOLO-NAS a foundation model? — Yeah, so for a foundation model, I kind of just go by the definition from that paper: any model that was trained on broad data using semi-supervised or self-supervised techniques, and YOLO-NAS fits that bill. It was pre-trained on the Objects365 dataset, which has two million images, 365 categories, and some crazy number of bounding boxes. That large number of images and categories gives the architecture a wide range of examples to learn from, and this improves its ability to predict on downstream tasks. But we didn't stop with Objects365 for pre-training — we actually went through another pre-training round after that, on pseudo-labeled COCO images. So, okay, pseudo-labeling: what does that mean? It's a semi-supervised learning technique — and semi-supervised learning is when you use a small amount of labeled data and lots of unlabeled data, pretty much. We had a model that we'd already trained on COCO — our version of YOLOX, which we have available in the SuperGradients models. We took our version of YOLOX, and on the COCO test set, which nobody knows the labels to, we generated labels for that dataset. That gave us more data to train on — 123,000 more images. So doing this, we were able to use that test set to generate labels for further training, and this improves the model's predictions and its ability to work on new data, because now we're giving it even more data to learn from. And because it was trained on such an extensive and diverse set of data, that's why YOLO-NAS has its high performance and generalization abilities.
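The pseudo-labeling idea can be sketched in a few lines: a trained "teacher" predicts on unlabeled images, and only its confident predictions are kept as new training labels. The toy teacher and the 0.9 confidence threshold here are invented for illustration — confidence thresholding is a common ingredient of these pipelines, but this sketch doesn't reproduce the exact recipe Deci used.

```python
def pseudo_label(teacher, unlabeled, threshold=0.9):
    """Keep only predictions the teacher is confident about as new labels."""
    labeled = []
    for x in unlabeled:
        label, confidence = teacher(x)
        if confidence >= threshold:
            labeled.append((x, label))
    return labeled

# Stand-in "teacher": pretends to be confident about even-numbered inputs.
def toy_teacher(x):
    if x % 2 == 0:
        return "object", 0.95
    return "background", 0.6

new_training_data = pseudo_label(toy_teacher, range(10))
print(len(new_training_data), "confident pseudo-labels kept out of 10")
```

The student then trains on the original labeled set plus these kept examples — more data, at the cost of trusting the teacher's mistakes on anything above the threshold.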
I guess the intuition behind that is that these architectures that you train on large enough datasets end up serving as a kind of generic model that generalizes well to different types of downstream tasks. We tested how well it works on the Roboflow-100 dataset, which is this new benchmark made up of 100 different datasets from Roboflow Universe, and we performed well on that — we beat pretty much all the modern YOLOs. But then, taking the training a step further, we used some more training techniques, like knowledge distillation, and then something called distribution focal loss. — So, I was promised a quick primer on knowledge distillation. You offered it, and I was planning on circling back, so let's do it right now. — Yeah, it's the process where you have a smaller, simpler model that you call the student, and you train this student to reproduce the behavior of a larger, more complex model, which you call the teacher. The idea is to transfer knowledge from the larger model to the smaller one, so that the student achieves higher performance than it would learning on its own. This helps create models that are faster and more efficient, but that still have that high level of accuracy.
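A minimal sketch of a distillation loss, assuming the classic soft-target formulation: soften both models' logits with a temperature, then penalize the student for diverging from the teacher's distribution. The logits and temperature here are arbitrary; in practice this term is combined with the ordinary task loss on the true labels.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax: higher T spreads probability mass out."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T*T as in the usual soft-target recipe."""
    p = softmax(teacher_logits, T)   # soft targets from the teacher
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * T * T

# A student matching the teacher exactly incurs zero loss; a mismatch is penalized.
teacher = [4.0, 1.0, 0.2]
print(distillation_loss(teacher, teacher))
print(distillation_loss([1.0, 3.0, 0.2], teacher) > 0)
```

The soft targets are the point: the teacher's near-misses ("this looks 20% like a cat") carry information that hard one-hot labels throw away, and that is what the student inherits.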
And then distribution focal loss — if your audience is interested, I could talk a little bit about that too. — Yeah, go for it. — Okay, so let's talk about distribution focal loss. Focal loss: what does that do? It modifies the standard cross-entropy loss that we use for classification, and it's specifically designed to address class imbalance problems in a dataset. The focal loss is a function that deals with class imbalance where there are a lot of easy negative examples — for example, the background — and relatively few hard positive examples, which are the examples that you want to detect. During training, a model might come across a large number of negative examples — and again, negative examples are just parts of an image where there's really no object of interest, compared to parts of an image that do have an object of interest. This leads to imbalance, because the model is being overwhelmed by easy negative examples and ends up not paying enough attention to the hard positive examples. So the focal part of the focal loss function gives a higher loss to hard, misclassified examples and a lower loss to correct, easy examples — it's focusing on the hard, misclassified examples. This way, it prevents the huge number of easy negatives from overwhelming the detector during training.
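In code, the "focal" part is just a modulating factor on cross-entropy. This sketch is for a single positive example with predicted probability p; γ = 2 is a commonly used value, and with γ = 0 it reduces to plain cross-entropy.

```python
import math

def cross_entropy(p):
    """Standard cross-entropy for a positive example predicted with probability p."""
    return -math.log(p)

def focal_loss(p, gamma=2.0):
    """Focal loss: the (1 - p)**gamma factor down-weights easy,
    already-well-classified examples."""
    return -((1 - p) ** gamma) * math.log(p)

# Easy example (model already confident) vs hard example (model struggling):
for p in (0.95, 0.30):
    print(f"p={p}: CE={cross_entropy(p):.3f}  focal={focal_loss(p):.3f}")
```

The easy example's loss shrinks by a factor of (1 − 0.95)² = 0.0025, while the hard example keeps about half its cross-entropy — so the sea of easy background patches contributes almost nothing to the gradient.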
Then the distribution part of distribution focal loss: instead of calculating loss based on individual classes, it calculates loss based on the distribution of classes, which improves the model's ability to handle class imbalance. — Very cool. The focus there on getting the misclassified cases correct reminds me of the way that gradient boosting works. For example, the way that we can take a forest of decision trees, but at each step of the way with gradient boosting, we identify situations where the model was wrong and correct specifically those instances — that's what allows XGBoost to be such a top-performing model approach, particularly for tabular data. And if people want to hear a lot more about that, in episode 681 with Matt Harrison we focused on XGBoost. But it's interesting: that conceptual idea is obviously being implemented in a completely different way here, but the idea of taking where a model is wrong and specifically focusing on those instances, to shore up the model in those cases — it makes a lot of sense. — Yeah. — Awesome. So: foundation models,
knowledge distillation, distribution focal loss — you've filled us in on a lot of technical concepts in the last few minutes, Harpreet, and that exemplifies, in a nutshell, this enormous amount of knowledge about deep learning models and model training, particularly related to computer vision, that you've developed in the relatively short period of time you've been at Deci. So clearly you're studying super hard. And it's interesting, this role that you have as a developer relations manager — some people might think of that in their heads as being almost like a marketing role, and it is, in a way: the purpose is to help develop the community. But clearly there's a deep technical aspect to what you're doing — you're able to come on this show and go into seemingly any level of depth on these complex deep learning topics. So I find that interesting, and, relatively quickly: what kind of data scientist would you recommend transition from a data science role to this kind of deeply technical developer relations role like you have? — Yeah, developer relations is an umbrella term, much like data science is an umbrella term. When I think of data science, I think of, well, there's data engineering, there's business intelligence, there are data analysts, there are traditional data scientists, machine learning engineers, deep learning engineers, and so forth. Developer relations is the same way — it's an umbrella term for a set of skills. Obviously community building is one of them; advocating for the needs of your users on the product roadmap — the advocate — is another; there's the developer evangelist type of role, where you're doing one-to-many communications; and then the role I love is the developer educator type of role. So the type of data scientist that I think would fit well: you have to be comfortable doing what I'm doing right now — coming up on stage, on screen, multiple times a week, and being okay and comfortable with your own lack of knowledge, because you have to learn all the time. Your community is interested in a number of different topics, and you have to help them by creating content on those topics or bringing in experts on them — and when you bring in those experts, you've got to get yourself knowledgeable so you can have a good conversation with them. So you really have to be the type of person that loves being uncomfortable, loves feeling like they don't know anything, and is driven by that. But then also just curious — curious about what's happening in the industry, the tools that are out there, keeping up on trends, all that stuff. I have a hard time articulating the exact type of data scientist, but look: if you hear me talking and you feel like you're like me, then you might be a good fit for developer relations. — Yeah, I think this ability to communicate clearly, both orally and in writing, is going to be essential, along with being comfortable continuously learning new topics across a really broad range, because your community is
going to be interested in a lot of different topics. That makes a lot of sense to me. So at Deci, you've specifically specialized in deep learning, and you've been doing that as this generative AI cornucopia of possibilities has been taking off. So what's that been like, and what kinds of trends do you think will drive the future growth of artificial intelligence? — It's been exciting being in the space as this generative wave has been taking over. We're working on generative use cases at Deci — when we're at CVPR, we'll be showcasing our version of Stable Diffusion, which runs faster than normal Stable Diffusion, and I'm looking forward to us open-sourcing those models so that I can talk about them, create content about them, and learn more about them. It's just been a whirlwind — my Twitter feed is like the best Twitter feed ever, nothing but cutting-edge research happening, and you should see my read-it-later list, it's deep. But I love it. And in terms of trends — man, I wish I were insightful enough to see what kinds of trends will drive the future of AI, but I think deploying on resource-constrained devices is going to be a thing. We all have phones attached to us, but do we always want to send our data to OpenAI? I'm curious to see what Apple's going to come up with that's going to run locally, right here on my phone, and allow me to take advantage of these generative models on a small device like this. So the interplay between the Internet of Things and generative models — that's what really excites me. — Yeah, that's nice, that makes a lot of sense. And I've been blown away by the incredible capability of relatively small open-source large language models. For me, in the natural language processing space — as I've talked about in a lot of recent episodes, and as I'm working on daily at my company Nebula — we're taking open-source architectures, and these can be very small relative to the kinds of OpenAI LLMs that you're describing. The OpenAI LLMs — GPT-3, we know, was 175 billion model parameters, and GPT-4 was probably larger; they never released that, but it takes longer to get GPT-4 results, so presumably it's an even bigger model. So these kinds of state-of-the-art private models are hundreds of billions of parameters. But if you have a relatively constrained set of use cases that you need your model to handle — you don't need it to work in every imaginable language, for example — you can take these open-source models, which are often 3 billion, 7 billion, 13 billion parameters, so from a hundredth to a tenth of the size of OpenAI's private models, and fine-tune them very efficiently using parameter-efficient fine-tuning approaches like low-rank adaptation.
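A tiny sketch of the low-rank adaptation idea in plain Python: freeze the pretrained weight W and learn only a low-rank update B @ A. The dimensions here are toy-sized (real layers are thousands wide, with ranks around 4-64), and initializing B to zero — so the adapted model starts out identical to the base model — follows the usual LoRA setup.

```python
import random

def matmul(A, B):
    """Plain-Python matrix multiply."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

d, r = 4, 1  # layer width vs. low rank (toy values)
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # frozen pretrained weight
B = [[0.0] for _ in range(d)]                                # trainable d x r, starts at zero
A = [[rng.gauss(0, 0.1) for _ in range(d)]]                  # trainable r x d

# Effective weight is W + B @ A; only B and A (2*d*r values) get gradients,
# versus all d*d values for full fine-tuning.
W_eff = add(W, matmul(B, A))

full_params, lora_params = d * d, 2 * d * r
print(f"full fine-tune params: {full_params}, LoRA params: {lora_params}")
```

At d = 4096 and r = 8, that same ratio is about 65,000 trainable values per layer instead of 16.8 million, which is why a single GPU can fine-tune models of this size.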
And then you have this model that can be deployed on a single GPU. But as you point out, they've got to get even smaller: fitting on a big cloud GPU — hey, it's great that it can even be that small, but that isn't nearly small enough. We need to be able to have these models run on people's phones, or on Raspberry Pis, or the Nvidia Jetsons that we were talking about earlier. So I think you're absolutely right — I think the future of AI is smaller models that approach the kind of performance we see from models like GPT-4, but are constrained to more refined use cases. That makes a lot of sense to me.
So, obviously you're learning a lot about deep learning in particular, and I know that you have a particular philosophy of learning deep learning that you describe as top-down. Do you want to describe for listeners what that means, and why it might also be the way that they should be learning complex concepts like deep learning? — Yeah, so I'll preface this by saying that I've got a master's in mathematics and statistics, I was a biostatistician, I was an actuary, I was a data scientist — so this is coming from that perspective. But even as somebody with that background, my approach is to just skip the math first. Skip the math, right? Ignore it when you're starting out, because looking at equations is going to demotivate you. What I instead implore people to do is look for applications of deep learning. Pick up YOLO-NAS and run it on some image — see the power of it. Open up ChatGPT or any of the other language models and start playing with it, interacting with it. Start interacting with models, try to build something with them, try to do cool stuff. Learn some LangChain and see what you can build. Just see the magic happen, get inspired. Then, once you're inspired — if you think it's cool; some people probably don't think it's cool, and that's fine — but if you think it's cool and you're interested, dig a little deeper. There are a couple of places I recommend. One of them: Andrew Glassner has this deep learning crash course. It's like three and a half hours long, but it gives you proper intuition for how all this works — a very good return on time investment. And he wrote this book, the deep learning illustrated guide — a huge, massive book — no, wait: Deep Learning: A Visual Approach. — Oh yes, Deep Learning: A Visual Approach. — Yes — Deep Learning Illustrated is another book I'm about to talk about, but Deep Learning: A Visual Approach is a great book. And
then, once you do that, start learning some PyTorch. You need to move away from scikit-learn — going from scikit-learn to PyTorch is a bit of a mental shift — but Daniel Bourke — I'm not sure if you've interviewed him on your podcast or not; he's awesome, he's based out of Australia, I highly recommend him, he's @mrdbourke on Twitter — he's got the Zero to Mastery PyTorch course. Go through that, because you're going to get a bit of intuition about what's happening under the hood, you're getting your fingers on the keyboard, you're getting your hands dirty, you're coding. And it's nice because it's completely self-paced, and you're going to learn how to code in PyTorch — which I think is the best framework for deep learning — in about a week; you'll be comfortable with PyTorch in a week. Now, once you've got that, you pick up this book by Jon Krohn called Deep Learning Illustrated, and this will get you more of the math — get more into the math with this book. Then Jon's got YouTube content that teaches you the basics of deep learning math — a two, two-and-a-half-hour time investment, and you're learning from an Oxford PhD; come on, that's amazing. And then, I think, one of the most important things:
you just have to understand backpropagation. I think once you've gotten to this point, just spend some time understanding backpropagation. Just make sure you understand intuitively what's happening, and, you know, maybe a little bit mathematically. I like Andrej Karpathy's series called Neural Networks: Zero to Hero on YouTube, a great, great resource for that. You actually end up building a mini version of PyTorch, I think he calls it micrograd or something like that, but it's amazing, it's great. Once you've done that, then get an understanding of more foundational architectures. You could, you know, once my LinkedIn Learning course is out, go through that, go through some of the foundational computer vision architectures. Yannic Kilcher has a great YouTube series on classical papers; he breaks them down in an easy-to-understand manner. And then just join some community, you know, be around other people who are into the same stuff. You want to be around people who have a broad range of experience, from learners to experts. And then finally, just projects. Just do projects, get on Kaggle, do some projects. I think that's the best way to go about it.
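To make the backpropagation step of that roadmap concrete, here is a pared-down sketch in the spirit of Karpathy's micrograd (this is our own illustration, not his actual code): a scalar Value class supporting only + and *, which records the computation graph on the forward pass and replays the chain rule in reverse on backward().

```python
# A micrograd-style scalar autograd engine, pared down to + and *.
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1, so just pass the
            # upstream gradient through to both inputs
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward():
            # each input's gradient is the upstream gradient scaled by the
            # other input (product rule)
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# z = x*y + x, so dz/dx = y + 1 = 3 and dz/dy = x = 3
x, y = Value(3.0), Value(2.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 3.0 3.0
```

Accumulating into .grad with += rather than overwriting is the same design choice PyTorch makes: a value used in two places (like x above, once in the product and once in the sum) receives gradient contributions from both paths.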
Nice, that was a great roadmap there, and yeah, for people getting into deep learning that sounds like a really great flow, starting from Andrew Glassner's Deep Learning: A Visual Approach. I didn't know that; I guess he doesn't have very much math at all in there? I haven't read his book, but you can confirm. Yeah, not too much math; you're understanding the concepts through illustrations, which is amazing. Nice. Yeah, and it's interesting: with my Deep Learning Illustrated book, and thank you very much for the shout-out there, I set out to have it be as unmathy as possible, but it does have some math, certainly. And so it's cool to me that somebody else has come up with an approach that is even more visual, so the people who want that completely visual approach can do Andrew Glassner's first, and then, yeah, I've got some of the math in my Deep Learning Illustrated book. After I wrote Deep Learning Illustrated, I realized that a big gap is people's ability to apprehend the underlying linear algebra and calculus of deep learning, like the calculus associated with backprop. So since I wrote Deep Learning Illustrated, a lot of my content creation has been around these foundational subjects. That's completely different from the idea of foundation models, where these foundation models are the huge models that you were describing earlier. Foundational subjects, completely different: I'm just talking about the math and computer science and probability and statistics, these foundational subjects that you need to know to be able to understand machine learning well. And so I've put a lot of time into this, and I think, based on feedback, I can confidently recommend it. If you don't understand exactly how backprop works today, I have no doubt that Andrej Karpathy's resource is great, all of Andrej Karpathy's resources are awesome,
but I have a Calculus for Machine Learning YouTube playlist, and maybe, you know, if you already know calculus well, you can skip some of the beginning videos where I show you how calculus works. I explain in an intuitive way, and with lots of hands-on code demos, how calculus works, how partial-derivative calculus in particular works, and then how we can use partial-derivative calculus to do backpropagation. And so my guess is that it's a longer journey than Andrej Karpathy's, because I think it's something like seven hours of videos, and then if you do the exercises as well, you're looking at even more time invested, because I give you exercises and solutions at different checkpoints throughout the curriculum. But yeah, whether you're wanting to just get started on the underlying calculus to understand backprop, or you want to jump to later videos and get deep into the weeds on how backprop works using calculus principles, and do it in a hands-on, Python-based way, yeah, I know it's my own resource, but I've linked to that resource many times; it's a great YouTube course.
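The partial-derivative machinery described there can be sanity-checked numerically: for a loss, the analytic partial derivatives from the chain rule should agree with finite-difference estimates. The toy squared-error function below is our own illustration (not taken from the playlist):

```python
# Squared error of a one-parameter linear model: f(w, b) = (w*x + b - y)**2
def f(w, b, x=2.0, y=5.0):
    return (w * x + b - y) ** 2

# Analytic partial derivatives via the chain rule:
# df/dw = 2*(w*x + b - y)*x   and   df/db = 2*(w*x + b - y)
def grad_f(w, b, x=2.0, y=5.0):
    err = w * x + b - y
    return 2 * err * x, 2 * err

# Central finite-difference approximation of each partial derivative,
# nudging one variable at a time while holding the other fixed
def numeric_grad(w, b, h=1e-6):
    dw = (f(w + h, b) - f(w - h, b)) / (2 * h)
    db = (f(w, b + h) - f(w, b - h)) / (2 * h)
    return dw, db

analytic = grad_f(1.0, 1.0)      # err = 1*2 + 1 - 5 = -2
numeric = numeric_grad(1.0, 1.0)
print(analytic)  # (-8.0, -4.0)
```

This "gradient check" pattern is exactly how hand-derived backprop code is traditionally validated: if the analytic and numeric gradients diverge, the chain-rule derivation has a bug.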
And another kind of interesting resource I like: there's this series of manga books that touch on a wide range of topics. I've got the entire set, but there are books there on calculus and on linear algebra, and they're, you know, proper comic books, but they teach you calculus. Yeah, The Manga Guide to Calculus and The Manga Guide to Linear Algebra. Super. Awesome. So, near the end of every episode I ask people for book recommendations, but you've just given us a ton, so I think we've covered that question, unless you have any other books
you'd like to add? You know, I used to read a lot of books when I was recording The Artists of Data Science podcast, because I had so many authors on, and since I've kind of put the podcast on hold for now, I spend most of my time reading research papers in the morning, whenever I have free mornings. I have not read a book in like six months, sadly, but the one that I have currently gone back to rereading is Deep Work by Cal Newport. I think that's a good book. Yeah, Deep Work is an important book for people who are in roles like ours that are knowledge-intensive and require a lot of thinking. Yeah, it's so important to be able to carve out a little bit of time every day to be able to work deeply; it's absolutely essential. All right, Harpreet, so,
awesome episode today. Thank you so much for taking the time, and hopefully you can even consider this to be some of your deep work for the day. I actually usually do; otherwise, if I didn't include filming podcast episodes, I'd have an embarrassingly small amount of deep work done every day. And so, if people want to be gleaning amazing insights from you after this episode, obviously we know that your Deep Learning Daily podcast is something that they can be checking out. What other ways should people be following you? Nowadays I'm mostly on Twitter, as DataScienceHarp, so find me there. You know, I still have a huge following on LinkedIn, I'm just not as active on LinkedIn, because the algorithm has been unfair to me far too much. But Twitter is just cool: if you're wanting to get into deep learning and keep up on trends and on research papers and things like that, Twitter is the place to be. LinkedIn, I find, is more classical ML, data analyst, data engineering, business kind of focused, but Twitter is where all the cool stuff that I'm into at the moment is, so find me there. Nice, so we'll be sure to include those links in the show notes. Harpreet, thanks again for taking the time, and maybe in a couple of years we'll be catching up with you again. Absolutely, man, looking forward to it. Who knows where we'll be in a couple of years, but, you know, I'd always be happy to come back on. Nice, catch you in a bit, Harpreet. Cheers.
Nice. So great to catch up with Harpreet on air; he's clearly flourishing in his deep learning developer relations manager role and making a big impact. In today's episode, Harpreet filled us in on how object detection is a machine vision task that involves drawing bounding boxes around objects in an image and then classifying each of those objects. He talked about how object detection models have become much faster in recent years by requiring only a single pass over the image, as with the renowned You Only Look Once (YOLO) series of model architectures. He talked about how Deci leveraged their AutoNAC neural architecture search to converge on YOLO-NAS, an architecture optimized for both object detection accuracy and speed. He talked about how hybrid quantization selectively quantizes parts of a model architecture in order to increase inference speed without adversely impacting accuracy, and how the future of AI may lie at the intersection of the Internet of Things and smaller generative models. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Harpreet's social media profiles, as well as my own, at superdatascience.com/693. That's superdatascience.com/693. If you live in the New York area and you would like
to engage with me more than just online, you'd like to engage in person: on July 14th, I'll be filming a Super Data Science episode live on stage at the New York R Conference. My guest will be Chris Wiggins, who's Chief Data Scientist at The New York Times as well as a faculty member at Columbia University focused on applications of machine learning to computational biology. So not only can we meet and enjoy a beer together, but you can also contribute to an episode of this podcast directly by asking Professor Wiggins your burning questions on stage. All right, thanks to my colleagues at Nebula for supporting me while I create content like this Super Data Science episode for you, and thanks of course to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the Super Data Science team for producing another eye-opening episode for us today. For enabling that super team to create this free podcast for you, we are of course deeply grateful to our sponsors. Please consider supporting the show by checking out our sponsors' links, which you can find in the show notes. Finally, thanks of course to you for listening. I'm so grateful to have you tuning in, and I hope I can continue to make episodes you love for years and years to come. Well, until next time, my friend, keep on rocking it out there, and I'm looking forward to enjoying another round of the Super Data Science Podcast with you very soon.