693: YOLO-NAS: The State of the Art in Machine Vision, with Harpreet Sahota

This is episode number 693 with Harpreet Sahota of Deci AI. Today's episode is brought to you by AWS Cloud Computing Services, by withfeeling.ai, the company bringing humanity into AI, and by Modelbit for deploying models in seconds. Welcome to the Super Data Science Podcast, the most listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I'm your host, Jon Krohn. Thanks for joining me today. And now, let's make the complex simple. Welcome back to the Super Data Science Podcast. Today I'm joined by Harpreet Sahota for a special episode examining the state of the art in machine vision. It's hard to imagine a better guest than Harpreet to guide us on this journey. Harpreet leads the deep learning developer community at Deci AI, a startup that has raised over $55 million in venture capital and that recently open-sourced the YOLO-NAS deep learning model architecture. This YOLO-NAS model offers the world's best performance on object detection, one of the most widely useful machine vision applications. Through his prolific data science content creation, including his podcast The Artists of Data Science and his LinkedIn live streams, Harpreet has also amassed a social media following in excess of 70,000 followers. He previously worked as a lead data scientist and as a biostatistician. He holds a master's in mathematics and statistics from Illinois State. Today's episode will likely appeal most to technical practitioners like data scientists, but we did our best to break down technical concepts so that anyone who'd like to understand the latest in machine vision can follow along.
In the episode, Harpreet details what exactly object detection is, how object detection models are evaluated, how machine vision models have evolved to excel at object detection with an emphasis on the modern deep learning approaches, how a neural architecture search algorithm enabled Deci to develop YOLO-NAS, an optimal object detection model architecture, the technical approaches that will enable large architectures like YOLO-NAS to be compute-efficient enough to run on edge devices, and Harpreet's top-down approach to learning deep learning, including his particular recommended learning path. All right, you ready for this eye-opening episode? Let's go. Harpreet, my friend, welcome back to the Super Data Science podcast. It's been over two years since your first episode. That was way back in episode number 457, in the Northern Hemisphere spring of 2021, and that was when I met you. Since then, although we've never met in person, I would say we're friends. Yeah, absolutely. So good to be back on the show. I can't believe it's been two years. And yeah, like, me and Jon are on a text message and phone call kind of basis. I hit him up whenever I need some advice, whether it's a career move or whatever, or just to say happy birthday and all that. So we've been in touch over the last two and a half years, but I can't wait to meet in person someday. Yeah, now that the pandemic is over, I'm sure we'll run into each other at a conference somewhere in the world soon, and I'm looking forward to it. So when you were last on the show, the episode was focused on landing a data science dream job, because you were, I mean, you still are very involved in encouraging people's careers, but you were doing it explicitly as part of how you were splitting up your week back then. At that time, you were also full-blown focused on The Artists of Data Science podcast, which is a very cool concept for a podcast.
It's designed for a data scientist audience, but it's talking to guests that aren't necessarily data scientists; they're philosophers and writers. So you're really trying to cultivate creativity and possibility, artistry, amongst data scientists. So yeah, really cool format. And I know that with all the things you've got going on, you're not releasing those episodes as much anymore, but you've got a new podcast, Deep Learning Daily. Is that a daily show? Is it really daily? So Deep Learning Daily, it's the name of the community that Deci's kind of sponsoring. And for this community, I've done a number of virtual events and recorded, like, podcast episodes. Those will all be released in due time. I've got like a backlog of 30-something episodes; it's just a matter of time to edit it and all that. And actually, I'll be at CVPR next week recording interviews with researchers and stuff, and that'll all be released on that podcast. But yeah, Deep Learning Daily is, like, the Discord community. And then it's also the Substack, which is the newsletter, and that's where the audio and video will be housed for that. And then also it'll be on podcast platforms as well. Like, this podcast is almost a 180 from The Artists of Data Science, because this is completely technical, and we're just getting into the weeds of deep learning with people, which I found extremely fascinating. Big departure from The Artists of Data Science, but it's a good direction. Well, very cool. I remember, over two years ago when we did record your episode, you were very interested in deep learning, but you hadn't formally studied it that much. I don't think you'd done anything in production at that time with deep learning. But now, because of working with Deci, you are full-on into deep learning, because they're such big deep learning specialists. So it's exciting to see all the posts you've been making.
And I'm looking forward to digging into your technical expertise in this show. So yeah, back then you were a data scientist at a company called Price Industries, and now, a couple of job changes later, you are Deep Learning Developer Relations Manager, am I getting that right, currently at Deci AI. And so Deci AI is a deep learning acceleration platform, a very cool company, actually invested in by George Mathew, who was a guest on the show here in episode 679. And so yeah, a fast-growing company. What I want to focus on in this episode is specifically the groundbreaking foundation model that Deci released called YOLO-NAS. And so YOLO-NAS is a computer vision algorithm. So we're going to start with, like, the basics a little bit, and then we're going to quickly ramp up into more and more technical detail, so that hopefully all of our listeners can follow along. But so, start us off by just explaining, Harpreet, what machine vision is. Yeah, so when you think of just traditional image classification, right, you'll have, let's say, some image, you present that to your neural network, and then you'll obtain a class label and some probability associated with that prediction. So you've got one image in, one class label out. And that's great when the thing that you're interested in is maybe characterizing the majority of that image space, or it's the most prominent thing in that image. But object detection takes that one step further, because it's not only telling you what is in the image via the class label, but also where in the image that thing is, right? And it's telling you that via a bounding box. So you get bounding box coordinates. So for object detection, you'll input an image, you know, video is still just a series of images, but you input an image and you obtain multiple bounding boxes and class labels as the output. So, just kind of, you know, at the core of this, any object detection algorithm kind of follows a similar pattern, right?
The input is the image that you want to detect objects in, and then you'll get an output of a list of bounding boxes, which are the x, y coordinates for each object in the image, the class label, and then some probability score associated with what the network thinks that thing is. Nice, very cool. So that's kind of the classic machine vision task. So 10 years ago, machine vision researchers, the state of the art, were mostly focused on classifying the whole image, like you were just describing. So famous architectures like AlexNet, released in 2012, were working on the ImageNet Large Scale Visual Recognition Challenge dataset, and in that competition you're just trying to get the whole image right. So the image would be primarily of a cat or a dog or a plane or whatever, and you'd want the algorithm to be able to identify that. With object detection, as you're describing, it's this more specialized task where there could be lots of things going on; there could be a cat playing with a dog. And so you have to be able to first put a bounding box, like you said, which is just a rectangle of any dimensions. So it could end up being that the image is just a big cat face, and then the bounding box would basically be the whole image. Or it could be that there's just a kitten in the corner and nothing else, the whole rest of the image is just plain brown with nothing going on, so you need to have a little bounding box in the corner where the kitten is. Or you could have a bunch of things going on, a bus and a horse and a cat, and so you have these rectangular bounding boxes all over the image, and then you're tasked with correctly identifying what's in each bounding box. So it's a much more complicated task, because you're breaking down the big image into a bunch of smaller images and then classifying each of those objects that you detect in the bounding boxes.
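The input/output contract described above can be sketched in a few lines of Python. Everything here is illustrative: the boxes, labels, and threshold are made up for this example, and no particular detection library's API is assumed.

```python
# Hypothetical raw output of an object detector for one image:
# each detection is a bounding box (x1, y1, x2, y2), a class label,
# and a confidence score, exactly the three pieces described above.
detections = [
    {"box": (34, 50, 210, 240), "label": "cat", "score": 0.92},
    {"box": (220, 80, 470, 300), "label": "dog", "score": 0.81},
    {"box": (10, 10, 40, 35), "label": "bird", "score": 0.30},
]

def filter_detections(dets, score_threshold=0.5):
    """Keep only detections above a confidence threshold,
    a standard post-processing step before drawing boxes."""
    return [d for d in dets if d["score"] >= score_threshold]

confident = filter_detections(detections)
# The low-confidence "bird" box is dropped; "cat" and "dog" remain.
```

Real detectors emit tensors rather than dictionaries, but the information content per detection is the same.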
There's also, to maybe help people understand how there are different kinds of machine vision tasks, there's also image segmentation, which, instead of being bounding boxes, is pixel by pixel. So every single pixel in the image gets classified as a particular category. And so, yeah, different kinds of approaches. I think image segmentation, maybe that's better when you have a relatively constrained set of classes, where, like, maybe in a self-driving car application, you're like, okay, we need to be able to identify pedestrians, cars, road. And so you have this relatively finite set of classes that you'd want to be able to bucket the pixels into. But I think that with object detection, and I'm not an expert in this stuff, you might know more than me, but with object detection, with modern deep learning algorithms, it's probably almost limitless. Like, you could have a crazy number of possible objects. You might even be able to do, whoa, I just thought of this, right? I don't spend much time thinking about this, Harpreet, but you could even use object detection to find where the bounding boxes are, and then you could use, like, a CLIP-style algorithm to, you could literally describe anything, like, then you have an infinite number of classes. So that's kind of a, anyway. That's along the lines of that new model, I think, that Meta put out, Segment Anything. So I think that's right along the lines of that. I haven't gone too deep into the paper, but it's essentially what you're describing, a promptable segmentation type of model. But yeah, segmentation, that's an interesting thing: segmentation has this concept of things versus stuff, right? Stuff is just the amorphous stuff in the background, and things are the groups of pixels that are together that belong to a particular thing. Yeah. Are you stuck between optimizing latency and lowering your inference costs as you build your generative AI applications?
Find out why more ML developers are moving toward AWS Trainium and Inferentia to build and serve their large language models. You can save up to 50% on training costs with AWS Trainium chips and up to 40% on inference costs with AWS Inferentia chips. Trainium and Inferentia will help you achieve higher performance, lower costs, and be more sustainable. Check out the links in the show notes to learn more. All right. Now back to our show. Yeah. And so with object detection, which we're going to focus on this episode, you're identifying the things in these bounding boxes in the image. And so tell us a bit about, in recent years, YOLO-style architectures. YOLO, like when you're a college student going on spring break, I guess, is you only live once; with these architectures, it's you only look once. So there was this original paper a number of years ago on YOLO architectures. But I think you probably know enough about this that you can even go back before YOLO and tell us kind of how object detection has worked historically at a high level, and then what these YOLO architectures changed and why those are, like, the dominant architecture for object detection today. Yeah. Even before deep learning, people were doing object detection, but they were using something like histogram of oriented gradients plus maybe some linear support vector machines on top of that to do whatever they need to do. But of course, with classic ML, you have to hand-engineer all those features, which is a pain. But then right around 2014, you know, the AlexNet moment happened in 2012, and obviously people started going crazy over neural nets and deep learning, but 2014 was when one of the earliest deep learning based approaches to object detection came out, which was called R-CNN. So that's region-based convolutional neural network.
So with that one, you would take in an input image, then you extract a bunch of what they call region proposals, then you'd use some CNN to do some type of feature extraction and then classify whatever the regions are, right? Yeah. CNN being convolutional neural network. So just to really quickly digress: in natural language processing, which is my expertise, really, we have now gone completely from using recurrent neural networks, RNNs, or LSTMs, long short-term memory networks; we are now completely in this transformer world, where we're like, wow, transformers are way better because they can take into account all of the context in a given piece of language. There are trade-offs that people are working on, where the compute complexity scales quadratically when you're working with these transformer architectures. But is R-CNN still used a lot in machine vision today? Or are those also moving over towards transformers? I think they're still being used for sure, but transformers have crept into the computer vision world as well. So there's a lot of new transformer-based architectures for detection and classification tasks as well. So yeah, it's definitely creeping in there. I haven't done too much research into that. Like, I've only really been doing deep learning for about a year, and so this is on my long list of things I need to figure out. It was just a quick digression, because, yeah, you're confirming for me that transformers are being increasingly used in machine vision, but CNNs are also still being used. So that is a bit different from natural language processing, where, like, I don't really know anyone who's using RNNs anymore. But anyway, so, CNN. So you were talking about R-CNN in 2014 being the first big deep learning architecture for object detection, and then I started interrupting you. Yeah, yeah. And then there were improvements on R-CNN.
There's Fast R-CNN, Faster R-CNN. And the thing with these types of methodologies is that you need to do two passes through your neural network to classify and then get the bounding box coordinates. But then YOLO came along, and YOLO kind of just changed all of that. So instead of having to have two passes through your neural network, you only look once. You only have to look at the image once and do one pass through the neural network. And so they took off in popularity because of their impressive speed and accuracy. So I don't know if you want to quickly just talk about kind of the anatomy of an object detection model. Like, you've got the backbone, neck, and head, right? So the backbone is where we extract features from an input image. And, you know, typically that's done using a convolutional neural network, and it's capturing kind of hierarchical features at different scales. So lower-level features, like edges and textures, are being extracted in the more shallow layers, and then as you get deeper, you get more higher-level features, like more kind of semantic type of information. And then the neck is the part that connects the backbone to the head, and the head kind of does that actual classification and bounding box prediction for you as well. Yeah. So YOLO came around, the paper was published in 2015, but it was presented at the CVPR conference in 2016. And it just, you know, took the world by storm with how fast it was, and it smashed the previous state of the art. So mAP, mean average precision, that's the metric that we look at when we measure how good an object detection model is. Do you want to talk about that? We could, I can give it, you know, yeah, let's talk about that. Just really quickly, I'm going to get you to elaborate a little bit on CVPR. So you mentioned that with your Deep Learning Daily podcast, you're going to be there, at the time of recording it's in the future.
And so probably by the time this episode's released, it'll be in the past. But you're going to the CVPR conference, which is the premier computer vision conference, right? Yeah. Yeah. Computer Vision and Pattern Recognition, but really it's like a deep learning type of conference. And yeah, one of the well-known ones, kind of up there with NeurIPS and, what's the other one? ICML. Yeah. Big conference. But yeah, and I guess this one skews a bit more towards machine vision applications than those other two necessarily would. Super cool. So yeah, so you're saying these kinds of big breakthroughs like YOLO get published in the CVPR proceedings. And then you were about to tell us about mAP, which is a key metric for assessing the quality of an object detection model's output. So I guess there's these two kinds of trade-offs that you're trying to work through: you want it to be fast, and so you've been talking about how from R-CNN to Fast R-CNN to Faster R-CNN, obviously they're going faster. But then YOLO was a big step change, because it didn't require these multiple passes. It was able to, in a single pass, both identify where objects are in the images as well as classify those objects, which is kind of a mind-blowing thing to me. Like, you described it there through the backbone, neck, and head anatomy, but because I haven't spent enough time with it myself, it's still something that I'm kind of mind-blown by. But so obviously speed is a key consideration, but then simultaneously another big consideration is accuracy, of course. And so I guess mAP is the key way to measure that accuracy. Yeah, and it's based on precision and recall, so familiar terms to classical data science folks. So mAP gives you kind of a balanced assessment. And then there's another assessment called IoU, intersection over union, that tells you how good your bounding boxes are.
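The IoU idea just mentioned is simple enough to write out directly. Here's a minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) format; the example box coordinates are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes,
    each given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero so disjoint boxes give zero intersection area.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A perfect prediction scores 1.0; disjoint boxes score 0.0.
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```

This same function is also the workhorse inside non-max suppression, which compares candidate boxes pairwise and discards near-duplicates.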
And then there's another thing in object detection that we call non-max suppression, which helps filter out redundant bounding boxes that you might get during inference. But so, just kind of breaking it down, right: we all know precision, right, it's just the model's ability to make an accurate positive prediction, right? And then recall is the number of actual positive cases that the model correctly identifies, right? So precision, you can think of it as like a sharpshooter hitting the target accurately, and recall is like a detective who is catching all the suspects in the crime. So there's always that tradeoff between precision and recall. I love that analogy; that is easily the best analogy I've ever heard for explaining precision and recall. And maybe now I'll be able to remember it; I won't have to keep looking it up, and I can be there every time someone's like, yeah, I know what that is, everyone's always talking about that, of course I know what it is. Yeah, yeah, that trips me up, like, I get confused all the time. Don't ask me for the formula; I don't know the formula off the top of my head. But so, what mean average precision does is, you know, we have the precision-recall curve, and mAP, mean average precision, is calculating the area under the precision-recall curve to give us a balanced assessment of how good this particular model is. So what it does is it incorporates the precision-recall curve, like I mentioned, and plots precision against recall for different thresholds, right? So when you do it this way, you get kind of a more balanced assessment, because you're considering the area under the curve, and then you have to think about the multiple objects that are being detected in an image. So mAP is able to handle multiple object categories by calculating each category's average precision separately and then taking the average across all the categories, right?
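The "area under the precision-recall curve, averaged over categories" idea described here can be sketched in a few lines. This is a rough illustration only: the PR points below are made up, and real mAP implementations (the COCO evaluator, for instance) additionally match predictions to ground truth at one or more IoU thresholds before computing these curves.

```python
def average_precision(recalls, precisions):
    """Approximate the area under a precision-recall curve with the
    rectangle rule, summing precision over each recall step.
    Assumes recalls are sorted ascending, as in a standard PR sweep."""
    ap = 0.0
    prev_recall = 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

# Illustrative PR points for two classes at different score thresholds.
ap_cat = average_precision([0.2, 0.5, 1.0], [1.0, 0.8, 0.6])
ap_dog = average_precision([0.5, 1.0], [0.9, 0.5])

# mAP is then just the mean of the per-class average precisions.
map_score = (ap_cat + ap_dog) / 2
```

Different benchmarks interpolate the curve differently (11-point, all-point, COCO 101-point), but the structure is the same: one AP per class, averaged into mAP.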
So it's measuring average precision for an object, you do that for all the different objects, and then you average that, and you get the mean average precision. Now, there's another concept to measure how good the object detection model is, intersection over union, and this just measures the quality of the predicted bounding box by comparing kind of the intersection and union of areas. Got it. So the mAP, the mean average precision, is for figuring out, within the bounding boxes, how accurate the predictions are, and the IoU is more about how accurate the placements of the bounding boxes are. Yeah, yeah, exactly. That makes perfect sense. Sweet. So you've given us an introduction to YOLO. What has happened since? Because I think the original YOLO architecture was like 2016, and since then there have been a bunch of different versions. There's YOLO version 2 and YOLO version 3, and I don't know if those were incremental changes, but ultimately it leads to what Deci's been up to, which just this year made a big splash with the release of this YOLO-NAS architecture. So, like, take us on the journey to YOLO-NAS. Yeah, so the first YOLO, YOLOv1, the paper was published in 2015, it was presented at CVPR 2016, so it's been around for a while. And since then, 16 different YOLO models have been released. There's this really, really good paper on arXiv called A Comprehensive Review of YOLO that goes from YOLOv1 and beyond, by Juan Terven and Diana Cordova-Esparza, and they recently just updated it a few days ago to include YOLO-NAS in it as well. A 35-page research paper, but I summarized their findings in an edition of the Deep Learning Daily newsletter. But yeah, there have been a bunch of YOLOs, and what characterizes all of them is just speed and accuracy. So the first three YOLOs, YOLOv1, YOLOv2, YOLOv3, these were all created by someone named Joseph Redmon, and Ali Farhadi, I believe his name is.
These are the original creators of YOLO. Redmon left computer vision research for, you know, ethical principles after YOLOv3, but people have kind of just adopted that name as a framework; there's this brand affinity with the YOLO name. Did you say, you know, I don't know if it was just, like, a glitch in the recording or what, but just as you were saying why Joseph Redmon left, did you say for ethical reasons? Yeah, for ethical reasons. He was not happy about the application of his research for military purposes, and obviously you can envision the military purposes, that somebody would use object detection to target things, and yeah, yeah. This stuff, like, okay, detect what kind of plane this is, oh, it's a passenger plane versus it's a Russian MiG. And yeah, and then it can be used to, yeah, I don't know the extent to which there are, like, automated systems in terms of actually firing, or if there's a human in the loop, but obviously, yeah, there are some pretty concerning ethical implications.
Yeah, yeah. And so he left computer vision research, and then somebody named Alexey Bochkovskiy, I can't say his last name, Bochkovskiy, he started off with YOLOv4. And so YOLOv4 hit the ground, and after YOLOv4 was YOLOv5. So, okay, so YOLOv3 was originally, I think it was in the Darknet framework, in C++, but then an engineer named Glenn Jocher took YOLOv3 and ported it over to PyTorch, so it became available to the PyTorch community. So then Glenn created YOLOv5, a completely new architecture, and released it as a PyTorch kind of model; there's a bunch of controversy about that, I won't delve into that too much. But since then, there have been a number of YOLOs, like Scaled-YOLO, YOLOR (you only learn one representation), YOLOX (exceeding YOLO series in 2021), there's the YOLOs that came out of the PaddlePaddle research group in China, then Glenn and Ultralytics published YOLOv8 earlier in 2023, January 2023. But really, prior to YOLO-NAS, the real state of the art was YOLOv6, YOLOv7, and YOLOv8, right? So our model was inspired by YOLOv6 and YOLOv8, some of the blocks that they had in there; we kind of fed that to our neural architecture search algorithm and ended up with YOLO-NAS. So what is YOLO-NAS? YOLO-NAS, in a nutshell, is an object detection model, a new state of the art, and it's outperforming YOLOv6 and YOLOv8 in terms of mean average precision and inference latency. So that means it's more accurate and it's faster. And it's improving upon some of the limitations of the previous YOLOs: previous YOLOs didn't really have adequate quantization support, the tradeoff between accuracy and latency wasn't the best, and we're able to now be faster in real-time detection as well, now that YOLO-NAS supports INT8 quantization. So it's just the natural next step, and it's not going to remain state of the art forever; like, object detection is
this super competitive field of research. There are people around the world working on this; I'm sure somebody will beat us soon enough, but, you know, we'll be back at it. That's why it was so urgent that I get you on the show now; I knew this episode would be live while you guys are still the number one object detection algorithm. Exactly. The future of AI shouldn't be just about productivity. An AI agent with the capacity to grow alongside you long-term could become a companion that supports your emotional well-being. Paradot, an AI companion app developed by With Feeling AI, reimagines the way humans interact with AI today. Using their proprietary large language models, Paradot AI agents store your likes and dislikes in a long-term memory system, enabling them to recall important details about you and incorporate those details into dialogue without LLMs' typical context window limitations. Explore what the future of human-AI interactions could be like this very day by downloading the Paradot app via the Apple App Store or Google Play, or by visiting paradot.ai on the web. So yeah, so, super cool. So YOLO-NAS is the fastest, most accurate object detection algorithm yet. Awesome that you're saying that means it's so fast now that it can be used in kind of real-time applications. And so I think a key thing here is, we know YOLO stands for you only look once, but what does the NAS stand for in YOLO-NAS? Yeah, so that stands for neural architecture search, because the way this architecture was discovered was through this AutoML kind of technology called neural architecture search. Typically, people discover architectures by doing tons of research and all that; well, we just looked at what was out there, what worked, input that into our giant AutoNAC engine, and got this architecture. So let's just talk a little bit more about neural architecture search. So what is this thing trying to do, right? It's trying to find, like, an optimal
network architecture for a specific task. Like, for example, that task could be detection, classification, segmentation, whatever. And what neural architecture search does is it automatically searches through possible architectures. So in the case of YOLO-NAS, our architecture search space, well, we'll talk about search space in a second here, but our architecture search space was 10 to the 14 different architectures. A ton of architectures. Yeah. So instead of relying on manual trial and error or human intuition, NAS is using optimization algorithms to find the architecture, so that we're balancing accuracy, FLOPs, you know, floating point operations, that's computational complexity, and the actual size of the model. So how is it doing this? Well, the search algorithm could be as simple as grid search or random search, or it could be more complex, like Bayesian optimization, genetic algorithms, or reinforcement learning. But let's just kind of talk about neural architecture search, though. So we need basically three things to make this happen: a search space, a search strategy, and then some way to estimate the performance of the architecture that we end up with. So the search space itself, this defines the set of all possible architectures that our algorithm can explore. And so what does the search space consist of? It could be as simple as the number of layers in a network, or it could be as complex as the types of layers, the types of blocks, the connections between layers, various other hyperparameters, all the different, you know, imagine, like, building a Lego house, right? There's a myriad of different Lego pieces that we have, right? You're trying to maximize the square footage of your Lego house by using the optimal blocks, right? This is what neural architecture search is kind of doing at a high level; intuitively, we're trying to find the right blocks to maximize something. So at Deci, like, the thing that we have, the
AutoNAC engine, it takes it just a step further, because in addition to everything I mentioned before, we also consider the hardware that you're deploying on and your data characteristics. So the hardware could be, you know, in the case of YOLO-NAS, we optimized it for the T4 GPU, which is industry standard for detection. You can even look at compilers and quantization as well. But, you know, this search space, it influences how the end architecture ends up being. So then we have a search space, but now we need a way to search through this space, because that's a ton of different pieces. And so, again, the various methods: random search, Bayesian search, reinforcement learning, evolutionary algorithms, gradient methods, whatever. And this impacts how long you're searching for. Once you've got those in place, then you need to have some way to estimate the performance of your outcome architecture. And so this could be as simple as just training each architecture that you end up with on the dataset that you're intending to use it for and just measuring the performance, or you can do more advanced techniques, that I really don't know how these work, but like curve extrapolation, one-shot NAS, weight sharing, things like that. But you put all that in, right, that's what goes into NAS: the search space, the search strategy, the performance estimation strategy, and then the output is an architecture that's optimal or near-optimal according to whatever metric you have. In a nutshell. Very cool. That was a great explanation, and I love the Lego analogy. You are the king of analogies, for taking difficult-to-visualize concepts and making them suddenly, instantly visualizable. So, very cool. So all of this neural architecture search, all of this NAS concept, this is something that Deci has developed, right? Yeah, so neural architecture search, it's an active field of research. The thing that differentiates Deci's neural architecture search is the actual algorithm itself. So that's
what's proprietary for Deci is our algorithm. Got it, got it — neural architecture search. Yeah, so NAS is both, like, the name of a field of research as well as, like, in capital letters, a specific algorithm that Deci's developed. Yeah, in our case we call it AutoNAC — Automated Neural Architecture Construction — that way we can put the TM on there. Got it, nice. So AutoNAC is Deci's proprietary NAS algorithm. Perfect, that makes a lot of sense. So that was an amazing tour, Harpreet, of computer vision, object detection, and then the architectures that have led us to the state-of-the-art YOLO-NAS architecture today, including the AutoNAC approach that allowed the neural architecture search to identify this optimal architecture. And so if there are people out there who would like to learn more from you about computer vision, I understand that you are working on an intro to computer vision course, which is expected to be out in September or October of this year on LinkedIn Learning. That's cool. Yeah, yeah, doing it on LinkedIn Learning. It's going to be a cool course. So, like, the audience for this course is people who are like me before I got into deep learning. So if you're comfortable with statistics, math, Python programming, classical ML — if you're good with all that and you're looking at this deep learning thing and wondering, okay, how can I get into this, then this is the course that I made for you. I made it for an earlier version of me. And it goes through — like, I start with a history of computer vision for image classification, and I talk about important concepts, the things that I felt I needed to understand before I got into deep learning, so I kind of structured it that way. I start from pre-deep-learning methods, just briefly touch on those. I talk about the importance of ImageNet, because I didn't know what the hell ImageNet was when I first got into deep learning, and the importance it had and why it's important, and then I go on to AlexNet.
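To make those three NAS ingredients concrete — a search space, a search strategy, and a performance estimation strategy under a hardware constraint — here is a minimal, purely illustrative sketch. The search space, the scoring functions, and the latency budget are all invented for the example; a real system like AutoNAC would profile candidates on actual hardware and would need a far smarter search strategy than exhaustive enumeration.

```python
import itertools

# Search space (toy): an "architecture" is just a choice of depth and width.
SEARCH_SPACE = {
    "depth": [2, 4, 6, 8],        # number of blocks
    "width": [32, 64, 128, 256],  # channels per block
}

def estimate_accuracy(arch):
    """Stand-in performance estimator. In real NAS you'd train each
    candidate, or use proxies like curve extrapolation / weight sharing."""
    return 0.5 + 0.04 * arch["depth"] + 0.0005 * arch["width"]

def estimate_latency_ms(arch):
    """Stand-in hardware awareness: pretend cost grows with depth x width,
    as if profiled on the target device (e.g. a T4 GPU)."""
    return 0.01 * arch["depth"] * arch["width"]

def search(latency_budget_ms):
    """Search strategy: exhaustive, only because this toy space has 16
    points. Real spaces are astronomically large, hence random / Bayesian /
    evolutionary / gradient-based strategies."""
    best, best_acc = None, -1.0
    for depth, width in itertools.product(*SEARCH_SPACE.values()):
        arch = {"depth": depth, "width": width}
        if estimate_latency_ms(arch) > latency_budget_ms:
            continue  # reject candidates that blow the hardware budget
        acc = estimate_accuracy(arch)
        if acc > best_acc:
            best, best_acc = arch, acc
    return best, best_acc

best_arch, best_acc = search(latency_budget_ms=10.0)
```

Note how the hardware budget changes the answer: without it, the biggest network wins; with it, the search lands on a deeper-but-narrower design that fits under the latency cap.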
Then I talk about a few different architectures there — kind of fundamental architectures. You know, I pay homage to AlexNet and LeNet, but we also talk about ResNet, EfficientNet, MobileNet, and RegNet as well, I think. And we do everything — you know, I do it devoid of much math, I try to make it as math-unintensive as possible, and the example projects are all done using the SuperGradients training library. Nice, I've never heard of that before, the SuperGradients training library. Yeah, yeah, it's Deci's library. It's the official home of YOLO-NAS, and it's a PyTorch-based training library. It includes a bunch of training tricks right out of the box, and it just abstracts a lot of the workflow, a lot of the boilerplate, so you don't have to write it. So, I mean, I know you're a huge fan of PyTorch Lightning — you can think of it kind of like that. Cool, yeah, so it's like a wrapper around PyTorch that lets you do things more quickly when you're needing to be training a model. So, like, in base PyTorch, for example, you need to even be writing the structure of your loop through epochs of training, and then each step within the epoch, but PyTorch Lightning abstracts that away. So this does a similar kind of thing — and is it particularly for computer vision, or is it quite general? Yeah, theoretically you can use any PyTorch model — any nn.Module — with SuperGradients and it'll work just fine. But the pre-trained models that we have, they're all mostly computer vision at the moment — we've got pre-trained models for classification, detection, segmentation, and pose estimation as well — and then we'll be expanding further in the near future. Nice, very cool. Yeah, you can see how that's kind of CV-focused with stuff like pose estimation, which I'm assuming is like predicting what kind of pose a person has in an image or video, which is obviously going to be specific to CV. Very cool.
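To illustrate what "abstracting the boilerplate" means — this is not the SuperGradients or PyTorch Lightning API, just a toy Trainer in plain Python showing the epoch/step loop that such libraries write for you, so that the user only supplies the per-batch logic. The train step here (gradient descent on a single weight toward a target) is invented purely for the demonstration.

```python
class Trainer:
    """Toy stand-in for what training libraries abstract away: the loop
    over epochs and batches you'd otherwise hand-write in base PyTorch."""

    def __init__(self, train_step):
        self.train_step = train_step  # user supplies only per-batch logic

    def fit(self, batches, epochs):
        history = []
        for _ in range(epochs):
            epoch_loss = sum(self.train_step(b) for b in batches) / len(batches)
            history.append(epoch_loss)  # mean loss per epoch
        return history

# Hypothetical per-batch step: one gradient-descent update of a single
# weight w toward a target value, returning the squared-error loss.
state = {"w": 0.0}

def step(batch):
    target, lr = batch
    error = state["w"] - target
    state["w"] -= lr * 2 * error  # d/dw of (w - target)^2
    return error ** 2

history = Trainer(step).fit(batches=[(1.0, 0.1)] * 5, epochs=3)
```

The user never writes the epoch loop; they hand the framework a step function (in real libraries, an `nn.Module` plus data loaders) and get back the training history.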
concept that you've mentioned a couple of times, and that I've touched on in recent episodes of the show as well, but that I'd love to hear you tell us more about in the specific context of YOLO-NAS, is this idea of quantization. So quantization allows us to reduce the size of our model by, instead of using full-precision data types, using half precision or maybe even less precision. So you're reducing the amount of memory and compute required for, say, the parameters in your model. And so, yeah, fill us in on the YOLO-NAS approach — from my vague understanding of YOLO-NAS, it uses something called a hybrid quantization method, so maybe fill us in on what that means relative to standard quantization. Yeah, yeah. So, exactly like you said, quantization is where we're taking the weights in our model from some high-precision floating-point representation — like 32-bit, for example — to some lower-precision representation: 16-bit, 8-bit, whatever. Two-bit, that'd be really impressive. But, you know, it reduces the model size and increases inference speed, though you suffer in accuracy — like, you pay for it somewhere. But this is a great approach because you're going to be able to deploy that model on hardware with limited computational resources. So that's kind of the standard, naive quantization method. Then there's the hybrid quantization method, which is a little bit more advanced form of quantization, and in this case you're selectively applying quantization to different parts of the architecture based on the impact on the model's performance on that hardware. So it's more of a selective approach that helps maintain the performance of your model, but you still get the benefits — you know, reduced model size and increased inference speed. So within the context of YOLO-NAS, this hybrid quantization method is selectively quantizing specific layers in the model, and so it's trying to optimize the accuracy-latency trade-off while still maintaining
performance, as opposed to, you know, standard quantization, which is going to uniformly quantize all the layers, and that causes more accuracy loss. And there's pros and cons to both approaches, right? So standard quantization, of course, as we mentioned, reduces model size and increases inference speed, but drops accuracy, right? It treats every part of the model the same, which is probably not always ideal. With hybrid quantization we're maintaining the model performance while still reducing the model size and increasing inference speed, but the thing is, it's more complex to implement, and it involves identifying which parts of the model should be quantized and to what degree they should be quantized. Very cool, that was a crystal-clear explanation, and you even kind of did the summarization back to the audience that I usually do, so I don't even have anything else really to add. Yeah, it makes perfect sense. So, yeah, quantization is reducing the complexity of the data type, and that does have this trade-off of accuracy going down when you're getting the speed increase, and so, yeah, hybrid quantization is this cool way of selectively quantizing only parts of the model so that you get the speed increases without the accuracy loss. Super cool. And so I guess that kind of quantization could come in handy along with potentially choosing your model size — so I know that there are a few different YOLO-NAS versions, there's a small and a medium and a large version — and so, yeah, there's particular circumstances where you might want to be using one or another. I'm guessing that something like large is going to be slightly more accurate but a little bit slower, that kind of thing. Yeah, yeah. The only difference between small, medium, and large is just the number of parameters that the architecture has. So the small version has about 19 million parameters, the medium version has like 51 million, and the large version has like 67 million, something like that. So that's all that small, medium, and large mean.
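A back-of-the-envelope illustration of the two ideas just described — uniform INT8 quantization versus selectively keeping sensitive layers in float. The "model", its weights, and the per-layer sensitivity scores are all invented for the sketch; in practice sensitivity would be measured, for example by quantizing each layer alone and checking the accuracy drop.

```python
def quantize_int8(weights):
    """Uniform (symmetric) int8 quantization: map floats onto [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

def quantization_error(weights):
    """Worst-case round-trip error introduced by quantizing this tensor."""
    q, scale = quantize_int8(weights)
    return max(abs(a - b) for a, b in zip(weights, dequantize(q, scale)))

# Hypothetical model: layer name -> (weights, sensitivity score).
model = {
    "stem": ([0.4, -1.2, 0.9], 0.08),     # tolerant layer: safe to quantize
    "head": ([0.02, -0.015, 0.03], 0.91), # sensitive layer: keep in float
}

def hybrid_quantize(model, sensitivity_threshold=0.5):
    """Selective quantization: only layers below the threshold go to int8;
    the rest stay at full precision to protect accuracy."""
    plan = {}
    for name, (weights, sensitivity) in model.items():
        if sensitivity < sensitivity_threshold:
            q, scale = quantize_int8(weights)
            plan[name] = ("int8", q, scale)
        else:
            plan[name] = ("fp32", weights, None)
    return plan

plan = hybrid_quantize(model)
```

Uniform quantization would apply `quantize_int8` to every layer regardless; the hybrid plan trades a little of the size/speed win for keeping the sensitive layer accurate.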
Deploying machine learning models into production doesn't need to require hours of engineering effort or complex homegrown solutions. In fact, data scientists may now not need engineering help at all. With Modelbit, you deploy ML models into production with one line of code: simply call modelbit.deploy in your notebook, and Modelbit will deploy your model, with all its dependencies, to production in as little as 10 seconds. Models can then be called as a REST endpoint in your product, or from your warehouse as a SQL function. Very cool. Try it for free today at modelbit.com — that's m-o-d-e-l-b-i-t dot com. Nice. And so there might be listeners out there who are thinking of particular object detection use cases that could be useful — like maybe they work for a municipal government somewhere and they're like, oh, we could be using this to be monitoring traffic, or maybe they work for a national park and they're like, we could be using object detection to be monitoring wildlife. And so they might want to be able to deploy their model onto a very small, low-energy device, maybe something that can be, like, solar-powered. So they might want to have their object detection model on, like, a Raspberry Pi or an NVIDIA Jetson, these very, very small processors. Does YOLO-NAS support that kind of thing — or, particularly paired with quantization, does YOLO-NAS get small enough for those kinds of edge devices? Yeah, fill us in! Yeah, yeah, so we offer an INT8-quantized version of YOLO-NAS. So the full YOLO-NAS itself was optimized for the NVIDIA T4 GPU; we're working on making a version for the Jetson devices, so we're almost there — it's a little bit more research work to go into it — but, you know, we offer the INT8-quantized model that could be used. But in general, we can talk about the things to consider when we're deploying on these edge devices, right? Because, like you mentioned, they're small devices, right? They're typically not a full-blown
laptop, for example, right? You can only fit certain hardware on there and certain memory on there. So the things you have to consider are, first, model size, right? Because the model that you have needs to be small enough to fit in the memory of the edge device, and deep learning models, they get huge, right? They can get into the gigabytes. So in order for you to get that model to fit — right, you could be in the lab building the most accurate model ever, and it's amazing, but then you go to deploy and you're like, oh, this thing does not fit on the actual device — so then you can look at techniques like model pruning, where you're just, you know, getting rid of layers in the network, quantization, knowledge distillation — we can talk about knowledge distillation a little bit if you like — but, you know, these techniques help reduce the size of the model without too much of a loss in performance. So like I mentioned, YOLO-NAS is quantizable to INT8, which means that you can reduce the model size to get it onto those edge devices. But that's not the only challenge you have. You also have inference speed, right? Because the model needs to be able to run quickly enough to process incoming data in real time or near real time, right? And this you can achieve by using efficient architectures and some of the optimization techniques that we talked about — in the case of YOLO-NAS, we used AutoNAC to find the optimal architecture that was hardware-aware for the T4, and it's considering all the components in the inference stack, you know, compilers and quantization and all. Then the other thing you've got to worry about is power efficiency, because, like I mentioned, you might not have this thing plugged in — it might be running just hanging somewhere, it might be solar-powered, whatever. So there's a trade-off between accuracy and power consumption. Why? Because high model accuracy usually means more complex computations, and more complex computations mean more power needed to do those computations.
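To put rough numbers on the "will it fit in memory" question: weight storage is approximately parameter count times bytes per parameter. A quick sketch, using the approximate YOLO-NAS parameter counts quoted above (activations, runtime overhead, and everything else are ignored — this is only the weights):

```python
def weights_size_mb(num_params, bits_per_param):
    """Rough size of the weights alone: params x bits / 8 bits-per-byte."""
    return num_params * bits_per_param / 8 / 1e6

# Approximate parameter counts mentioned for the three YOLO-NAS variants.
variants = {"small": 19e6, "medium": 51e6, "large": 67e6}

for name, n in variants.items():
    print(f"{name}: {weights_size_mb(n, 32):.0f} MB at fp32, "
          f"{weights_size_mb(n, 8):.0f} MB at int8")
```

So even the large variant drops from roughly 268 MB of fp32 weights to about 67 MB at INT8 — the kind of 4x reduction that makes fitting on a memory-constrained edge device plausible.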
And then, finally, it's just software compatibility, because you have to deploy this thing on that device, and there's, you know, what runtime are you going to deploy it on? So YOLO-NAS is compatible with, like, NVIDIA TensorRT, which is a common one. Yes, so those are the considerations to make in those scenarios. Nice, very cool. That was a really nice, in-depth breakdown of the kinds of considerations we'd want to make as we deploy to these smaller devices. Very cool — I didn't know before I asked you the question that you had this level of expertise in it, so that's awesome. Definitely not an expert, just learning and listening and taking notes. Nice, yeah — clearly very well. It's been awesome, the level of depth that you've gotten into on all the topics so far. So we've obviously talked about YOLO-NAS a lot in this episode; people might be excited to use it. Are there commercial restrictions for people using it, either, you know, out of the box or fine-tuned to some particular object detection task that might be relevant to their users? Yeah, so it's a bit of a non-standard license for YOLO-NAS, right? So if you want to take our pre-trained model with its weights, you're free to use them, but if you're using it for commercial purposes, then you need to get permission from Deci. So the pre-trained model weights themselves are not open for commercial use. However, the architecture itself — the architecture itself is, like, you could take the YOLO-NAS architecture, start from scratch, and train it on your own data, right? And so that has kind of its own — you know, that affects a lot of things; there's pros and cons with that as well. But, yeah, long story short, the weights for YOLO-NAS as they are are not available for commercial use, but you could take the untrained model with this architecture, train it on your own data, and you're off. Nice, that's cool. That's a nice compromise, and we see that kind of thing with very large models that are released today. So
Meta's LLaMA architecture, for example, for natural language processing — they were releasing it for people to use for academic use, but then someone that they gave permission to do that with released the model weights to be available to torrent. But, you know, if you're a responsible business owner — and you should be, if you don't want to get into a legal quagmire in the future — you then can't use LLaMA for commercial purposes; you'd be insane to. And so this kind of approach that you're suggesting with YOLO-NAS, where people can take the general architecture, and then, if they want to put in the considerable expense themselves — the kind of considerable expense that you guys have put in and deserve some reward for — to train the model or fine-tune it to their particular use case, go for it. I think that that's a really nice compromise. Yeah, yeah. And I mean, it's not always easy to train models like that right off the bat, so, you know, sometimes it makes sense to kind of just fork over whatever it is and get the pre-trained weights. Yeah. And there would have been a huge amount of compute required in the AutoNAC neural architecture search that you guys did, and so now that all of that huge amount of compute has already been done and this optimal object detection architecture exists, you've already left people in a really great place, even if they want to just be going from there and coming up with a commercial use case. Very cool. Now, a term that I hear a lot, that I talk about a lot, in the context of these very large models — whether it's like the LLaMA architecture we were just talking about, which is specific to natural language processing, or whether it's a big machine vision model like YOLO-NAS — is that we can talk about these as foundation models. So, Harpreet, can you fill us in on what that means to you and what makes YOLO-NAS a foundation model? Yeah, so foundation model — to me, I kind of just go by
the definition from that paper, and it's any model that was trained on broad data using semi-supervised or self-supervised techniques, and YOLO-NAS fits that bill. So it was pre-trained on the Objects365 dataset, and this has, you know, two million images and 365 categories, and some crazy number of bounding boxes in that dataset. So this large number of images and categories gives you a wide range of examples for this architecture to learn from, and this improves its ability to predict on downstream kinds of tasks. But we didn't stop with Objects365 for pre-training — we actually went another pre-training round after that on pseudo-labeled COCO images. So, okay, pseudo-labeling: what does that mean? That's a semi-supervised learning technique. So what's semi-supervised learning? That's when you use a small amount of labeled data and lots of unlabeled data, pretty much, right? We had a pre-trained model, a model that we trained already on COCO — it was our version of YOLOX, which we have available in the models for SuperGradients. We took our version of YOLOX, and then, on the COCO test set — which nobody knows the labels to — we labeled it using our version of YOLOX and generated labels for that dataset. So that gave us more data to train on: 123,000 more images to train on. So doing this, you know, we were able to use that test set to generate labels for further training. So this improves the model's predictions and ability to work on new data, because now we're giving it even more data to learn from. So because it was trained on, you know, such an extensive and diverse set of data, that's why YOLO-NAS has its high performance and generalization kind of abilities. Yeah, so I guess the intuition behind that is that these architectures that you train on large enough datasets end up serving as kind of a generic model that generalizes well for different types of downstream tasks.
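A bare-bones sketch of the pseudo-labeling loop just described — run an already-trained model over unlabeled images and keep only its confident predictions as new training labels. The mock predictor and the 0.9 confidence cutoff are invented for the example; in the real pipeline the predictor was Deci's YOLOX variant running over unlabeled COCO test images.

```python
def mock_predict(image):
    """Stand-in for running a trained detector over an image id,
    returning a label and a confidence score."""
    return {"label": "cat" if image % 2 == 0 else "dog",
            "confidence": 0.95 if image % 3 else 0.55}

def pseudo_label(unlabeled_images, predict, min_confidence=0.9):
    """Semi-supervised trick: the model's own confident predictions on
    unlabeled data become labels for a further round of training."""
    labeled = []
    for image in unlabeled_images:
        pred = predict(image)
        if pred["confidence"] >= min_confidence:  # keep only confident ones
            labeled.append((image, pred["label"]))
    return labeled

extra_training_data = pseudo_label(range(10), mock_predict)
```

The confidence filter is the important design choice: low-confidence predictions are more likely to be wrong, and training on wrong pseudo-labels would reinforce the model's mistakes.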
We tested how well it works on the Roboflow 100 dataset, which is this new benchmark made up of 100 different datasets from Roboflow Universe, and we performed well on that — we beat pretty much all the modern YOLOs. But then, taking it a step further for the training, we used some more training techniques, like knowledge distillation, and then something called distribution focal loss. So, I promised a quick primer on knowledge distillation. Yeah, you offered it, and I was planning on circling back — let's do it right now. Yeah, yeah. It's just the process where you have a smaller, simpler model that you call the student, and you train this student to reproduce the behavior of a larger, more complex model, which you call the teacher. And so the idea is to transfer knowledge from the larger model to the smaller one. This way the student achieves higher performance than it would learning on its own, so this helps create models that are faster and more efficient but still have that high level of accuracy. And then distribution focal loss — if your audience is interested, I could talk a little bit about that. Yeah, go for it. Okay, so let's talk about that distribution focal loss. So focal loss: what does that do? It modifies the standard cross-entropy loss that we use for classification, and it's specifically designed to address class imbalance problems in a dataset. So the focal loss is a function that deals with the class imbalance where there's a lot of easy negative examples — for example, the background — and then there's relatively few hard positive examples, which are the examples that you want to detect. So during training, a model might come across a large number of negative examples — and again, negative examples are just parts of an image where there's really no object
of interest, you know, compared to parts of an image that have an object of interest. So this leads to imbalance, because the model is being overwhelmed by easy negative examples, and it ends up not paying enough attention to these hard positive examples. So the focal part of this focal loss function gives a higher loss for hard, misclassified examples and a lower loss for correct, easy examples — so it's focusing on the hard, misclassified examples. This way it prevents the number of easy negatives from overwhelming the detector during training. Then, just the distribution part of that distribution focal loss: instead of calculating loss based on individual classes, it calculates loss based on the distribution of classes, which improves the model's ability to handle class imbalance. Very cool. The focus there on the negative misclassified cases — or, you know, getting these misclassified cases correct — reminds me of the way that gradient boosting works. So, for example, the way that we can take a forest of decision trees, but at each step of the way with gradient boosting we identify situations where the model was wrong and correct specifically those instances, and that's what allows XGBoost to be such a top-performing model approach, particularly for tabular data. And if people want to hear a lot about that, in episode 681 with Matt Harrison we focused on XGBoost. But yeah, it's interesting — that conceptual idea obviously is being implemented in a completely different way here, but that conceptual idea of taking where a model is wrong and specifically focusing on those instances to shore up the model in those cases, it makes a lot of sense. Yeah. Awesome. So foundation models, knowledge distillation, distribution focal loss — you've filled us in on a lot of technical concepts in the last few minutes, Harpreet, and, you know, that exemplifies, in a nutshell, the enormous amount of knowledge about deep learning models and model training, particularly related to computer vision, that you've developed.
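The "higher loss for hard, misclassified examples" behavior is easy to see in the binary focal loss formula, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). A quick illustrative sketch — the gamma = 2 and alpha = 0.25 values are the common defaults from the focal loss paper, not anything specific to YOLO-NAS:

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy examples via (1 - p_t)^gamma.
    p is the predicted probability of the positive class; y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p        # prob assigned to the true class
    a_t = alpha if y == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy, correctly classified background (negative) example...
easy_negative = focal_loss(p=0.01, y=0)
# ...versus a hard, misclassified object (positive) example.
hard_positive = focal_loss(p=0.1, y=1)
```

With gamma = 0 this collapses back to alpha-weighted cross-entropy; raising gamma shrinks the loss on easy examples far faster than on hard ones, which is exactly the re-focusing described above — the sea of easy negatives contributes almost nothing, so the detector's training signal comes from the hard positives.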
In the relatively short period of time that you've been at Deci — so clearly you're studying super hard. And it's interesting, this role that you have as a developer relations manager. Some people might kind of think of that in their heads as being almost like a marketing role, and it is — like, the purpose of this is to help develop the community — but clearly there is a deep technical aspect to what you're doing. Like, you are able to come on this show and go into seemingly any level of depth on these complex deep learning topics. And so, yeah, I just find that interesting, and so, I guess, relatively quickly: what kind of data scientist would you recommend transition from being a data scientist to this kind of deeply technical developer relations role like you have? Yeah, yeah. Developer relations, it's like an umbrella term, much like data science is an umbrella term, right? Because when I think of data science, I think of, well, there's data engineering, there's business intelligence, there's data analysts, there's traditional data scientists, and so on and so forth, right? Machine learning engineer, deep learning engineer, so on and so forth. Developer relations is the same way — it's an umbrella term for a set of skills. And, you know, obviously community building is one of them; advocating for the needs of your users on the product roadmap, the advocate role, is another; there's the developer evangelist type of role, where you're doing one-to-many communications, right; and then the role I love is the developer educator type of role. So the type of data scientist that I think would fit well in this role is — like, you have to be comfortable doing what I'm doing right now, right? Just coming up on stage, on screen, multiple times a week, and just being there and being okay and comfortable with your own lack of knowledge, because you have to learn all the time, right? Your community
is interested in a number of different topics; you have to help them by creating content on those topics or bringing in experts on those topics. But when you bring in those experts, you've got to get yourself knowledgeable so you can have a good conversation with them, right? So it's really — you know, you have to be the type of person that just loves being uncomfortable, loves feeling like they don't know anything, and be driven by that. But then also just, you know, curious — you've got to be curious about what's happening in the industry, the tools that are out there, keeping up on trends, all that stuff. Yeah, I have a hard time articulating the exact type of data scientist, but, look, if you hear me talking and you feel like you're like me, then you might be a good fit for developer relations. Yeah, yeah. I think this ability to communicate clearly, both orally as well as in writing, is going to be essential, and, yeah, being comfortable continuously learning new topics across a really broad range, because your community is going to be interested in a lot of different topics. That makes a lot of sense to me. So at Deci you have specifically specialized in deep learning, and you've been doing that as this generative AI cornucopia of possibilities has been taking off. So what's that been like, and what kinds of trends do you think will drive the future growth of artificial intelligence? It's been exciting being, you know, kind of in the space as this generative wave has been taking over. Like, you know, we're working on generative use cases at Deci — like, when we're at CVPR we'll be showcasing our version of Stable Diffusion, which runs faster than normal Stable Diffusion, and I'm looking forward to us open-sourcing those models so that I can talk about them and create content about them and learn more about them. So it's just been a whirlwind. Like, you know, my Twitter feed is like the best Twitter feed ever — it's just nothing but cutting-edge research happening, and
you look at my read-it-later list and it's just deep, but I love it, I love it. And I guess in terms of trends — man, I wish I was insightful enough to see what kind of trends will drive the future of AI, but I think, you know, deploying on resource-constrained devices is going to be a thing, right? Like, we all have phones attached to us, but, you know, do we always want to send our data to OpenAI, right? I'm curious to see what Apple's going to come up with that's going to run locally, right here on my phone, that's going to allow me to take advantage of these generative models on a small device like this. So kind of the interplay between the Internet of Things and generative models — that's what really, really excites me. Yeah, that's nice. Yeah, that makes a lot of sense, and I've been blown away by the incredible capability of relatively small open-source large language models. So for me, in the natural language processing space, as I've talked about in a lot of recent episodes, and as I'm working on daily at my company, Nebula, we're taking open-source architectures, and these could be very small relative to the kinds of OpenAI LLMs that you're describing out there. So the OpenAI LLMs — like GPT-3, we know, was 175 billion model parameters, and GPT-4 was probably larger; they never released that, but it takes longer to get GPT-4 results, so presumably it's an even bigger model. So these kinds of state-of-the-art private models are hundreds of billions of parameters. But if you have a relatively constrained set of use cases that you need your model to be able to handle — you know, you don't need it to work in every imaginable language, for example — you can take these open-source models that are often like 3 billion, 7 billion, 13 billion parameters, so they're like from a hundredth to a tenth of the size of OpenAI's private models, and you're able to fine-tune them very efficiently using parameter-efficient fine-tuning approaches like
low-rank adaptation, and then you have this model that can be deployed on a single GPU. But, as you point out, they've got to get even smaller — like, fitting on a big cloud GPU, even being that small isn't nearly small enough. We need to be able to have these models run on people's phones, or run on Raspberry Pis or the NVIDIA Jetsons that we were talking about earlier. And so I think you're absolutely right — I think that the future of AI is smaller models that approach the kind of performance that we see from models like GPT-4 but are constrained to more refined use cases. That makes a lot of sense to me. So, obviously, you're learning a lot about deep learning in particular, and I know that you have a particular philosophy of learning deep learning that you describe as top-down. Do you want to describe for listeners what that means and why it might also be the way that they should be learning complex concepts like deep learning? Yeah, yeah. So I'll preface this by saying that, you know, I've got a master's in mathematics and statistics, I was a biostatistician, I was an actuary, I was a data scientist — so this is coming from that perspective. But even as somebody who has that background, my approach is to just skip the math first. Skip the math, right? Ignore it when you're starting out, because looking at equations is going to demotivate you, right? So what I instead implore people to do is just look for applications of deep learning, right? So, you know, pick up YOLO-NAS and run it on some image — see the power of it. Open up ChatGPT or any of the other language models and start playing with it, interacting with it. Start interacting with models, trying to build something with them, trying to do cool stuff with them, right? You know, learn some LangChain and see what you can build, right? Just see the magic happen, get inspired. Then, once you're kind of inspired, right — if you think
it's cool, right — some people probably won't think it's cool, they'll just be like, okay, cool, whatever, and that's fine — but if you think it's cool and you're interested, then dig a little bit deeper. And to dig in deeper, there's a couple of places I recommend. One of them: Andrew Glassner has this deep learning crash course — it's like three and a half hours long, but it gives you just proper intuition for how all this works, a very good return on time investment. So, like, he wrote this book, the deep learning… illustrated guide — huge, huge, massive book, right — no, Deep Learning: A Visual Approach. Oh yes, Deep Learning: A Visual Approach — yes, Deep Learning Illustrated is another book I'm about to talk about — but Deep Learning: A Visual Approach, great book. And then once you do that, start learning some PyTorch, right? You need to move away from scikit-learn; going from scikit-learn to PyTorch is a bit of a mental shift. But, you know, Daniel Bourke — I'm not sure if you've interviewed him on your podcast or not; he's awesome, he's based out of Australia, I highly recommend him, mrdbourke on Twitter — he's got the Zero to Mastery PyTorch course. Go through that, because you're going to get a bit of intuition about what's happening under the hood. Then you're getting your fingers on the keyboard, you're getting your hands dirty, you're coding, right? This is nice because it's completely self-paced, and you're going to learn how to code in PyTorch — which I think is the best for deep learning — in about a week, right? You'll be comfortable with PyTorch in a week. So now, once you've got that, then you pick up this book by Jon Krohn called Deep Learning Illustrated, and this will get you more of the math, right? So get more into the math with this book. Then Jon's got stuff on YouTube that teaches you the basics of deep learning math — you know, a two, two-and-a-half-hour time investment, and you're learning from an Oxford PhD, come on, that's amazing. And then, I think one of the most important things: you just have to understand backpropagation. I think once you've gotten to this point, right, just spend some time understanding backpropagation. Just make sure you kind of understand intuitively what's happening, and, you know, maybe a little bit mathematically. I like Andrej Karpathy's series called Neural Networks: Zero to Hero on YouTube — a great, great resource for that. You actually end up building, like, a mini version of PyTorch — I think he calls it micrograd or something like that — but it's amazing, it's great. Once you've done that, then get an understanding of more foundational architectures — you could, you know, once my LinkedIn Learning course is out, go through that, go through some of the foundational computer vision architectures. Yannic Kilcher has a great YouTube series on classical papers; he breaks them down in an easy-to-understand manner. And then just join some community, you know — be around other people who are into the same stuff. You want to be around people who have a broad range of
experience, from learners to experts. And then, finally, just projects: do projects, get on Kaggle, do projects. I think that's the best way to go about this. Nice, that was a great roadmap there, and for people getting into deep learning that sounds like a really great flow, starting from Andrew Glassner's Deep Learning: A Visual Approach. I didn't know that; I guess he doesn't have very much math at all in there? I haven't read his book, but you can confirm. Yeah, not too much math; you're understanding the concepts through illustrations, which is amazing. Nice. And it's interesting with my Deep Learning Illustrated book, and thank you very much for the shout-out there: I set out to have it be as unmathy as possible, but it does have some math, certainly. So it's cool to me that somebody else has come up with an approach that is even more visual, so the people who want that completely visual approach can do Andrew Glassner's first, and then I've got some of the math in my Deep Learning Illustrated book. After I wrote Deep Learning Illustrated, I realized that a big gap is people's ability to apprehend the underlying linear algebra and calculus of deep learning, like the calculus associated with backprop. So since I wrote Deep Learning Illustrated, a lot of my content creation has been around these foundational subjects. That's completely different from the idea of foundation models, the huge models that you were describing earlier; foundational concepts are completely different. I'm just talking about the math, computer science, probability, and statistics that you need to know to be able to understand machine learning well. I've put a lot of time into it, and based on feedback I can confidently recommend it: if you don't understand exactly how backprop works today, I have no
doubt that Andrej Karpathy's resource is great, and all of Andrej Karpathy's resources are awesome, but I have a Calculus for Machine Learning YouTube playlist. If you already know calculus well, you can skip some of the beginning videos, where I show you how calculus works. I explain in an intuitive way, and with lots of hands-on code demos, how calculus works, how partial-derivative calculus in particular works, and then how we can use partial-derivative calculus to do backpropagation. My guess is that it's a longer journey than Andrej Karpathy's, because it's something like seven hours of videos, and if you do the exercises as well, you're looking at even more time invested, because I give you exercises and solutions at checkpoints throughout the curriculum. But whether you want to just get started on the underlying calculus to understand backprop, or you want to jump to later videos and get deep into the weeds on how backprop works using calculus principles, and do it in a hands-on, Python-based way, I think it's worth it. It's my own resource, but I've linked to it many times; it's a great YouTube course. Another kind of interesting resource I like: there's this series of manga books that touch on a wide range of topics. I've got the entire set, and there's a book there on calculus and one on linear algebra. They're proper comic books, but they teach you calculus. Yeah, The Manga Guide to Calculus and The Manga Guide to Linear Algebra. Super. Awesome. So, near the end of every episode I ask people for book recommendations, but you've just given us a ton, so I think we've covered that question, unless you have any other books you'd like to add. You know, when I was recording The Artists of Data Science podcast, I read a lot of books, because I had so many authors on, but since I kind of put the
podcast on hold for now, I spend most of my time reading research papers in the morning, whenever I have free mornings. I have not read a book in like six months, sadly, but the one that I have currently gone back to rereading is Deep Work by Cal Newport. I think that's a good book. Yeah, Deep Work is an important book for people who are in roles like ours that are knowledge-intensive; it's so important to be able to carve out a little bit of time every day to work deeply. It's absolutely essential. All right, Harpreet, awesome episode today. Thank you so much for taking the time, and hopefully you can even consider this to be some of your deep work for the day. I actually usually do; otherwise, if I didn't include filming podcast episodes, I'd have an embarrassingly small amount of deep work done every day. So if people want to be gleaning amazing insights from you after this episode, obviously we know that your Deep Learning Daily podcast is something they can be checking out; what are the other ways people should be following you nowadays? I'm mostly on Twitter, @datascienceharp on Twitter, so find me there. You know, I still have a huge following on LinkedIn, I'm just not as active on LinkedIn, because the algorithm has been unfair to me far too much. But Twitter is just cool: if you're wanting to get into deep learning and keep up on trends and research papers and things like that, Twitter is the place to be. LinkedIn, I find, is more classical ML, data analyst, data engineering, business focused, but Twitter is where all the cool stuff I'm into at the moment is, so find me there. Nice, so we'll be sure to include those links in the show notes. Harpreet, thanks again for taking the time, and maybe in a couple of years we'll be catching up with you again. Absolutely, man, looking forward to it. Who knows where we'll be in a couple of years, but you know I'd always be happy to
come back on. Nice, catch you in a bit, Harpreet. Cheers. So great to catch up with Harpreet on air; he's clearly flourishing in his deep learning developer relations role and making a big impact. In today's episode, Harpreet filled us in on how object detection is a machine vision task that involves drawing bounding boxes around objects in an image and then classifying each of those objects. He talked about how object detection models have become much faster in recent years by requiring only a single pass over the image, as with the renowned You Only Look Once (YOLO) series of model architectures. He talked about how Deci leveraged their AutoNAC neural architecture search to converge on YOLO-NAS, an architecture optimized for both object detection accuracy and speed. He talked about how hybrid quantization selectively quantizes parts of a model architecture in order to increase inference speed without adversely impacting accuracy, and about how the future of AI may lie at the intersection of the Internet of Things and smaller generative models. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Harpreet's social media profiles, as well as my own, at superdatascience.com/693. That's superdatascience.com/693. If you live in the New York area and you would like to engage with me in person rather than just online, on July 14th I'll be filming a Super Data Science episode live on stage at the New York R Conference. My guest will be Chris Wiggins, who is Chief Data Scientist at The New York Times as well as a faculty member at Columbia University focused on applications of machine learning to computational biology. So not only can we meet and enjoy a beer together, but you can also contribute to an episode of this podcast directly by asking Professor Wiggins your burning questions on stage. All right, thanks to my colleagues at Nebula for supporting me
while I create content like this Super Data Science episode for you, and thanks of course to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the Super Data Science team for producing another eye-opening episode for us today. For enabling that super team to create this free podcast for you, we are of course deeply grateful to our sponsors; please consider supporting the show by checking out our sponsors' links, which you can find in the show notes. Finally, thanks of course to you for listening. I'm so grateful to have you tuning in, and I hope I can continue to make episodes you love for years and years to come. Well, until next time, my friend, keep on rocking it out there, and I'm looking forward to enjoying another round of the Super Data Science podcast with you very soon.
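Editor's note for readers of this transcript: the micrograd project Harpreet mentions, Karpathy's "mini version of PyTorch," boils down to a scalar value type that records its local derivatives so the chain rule can be replayed backward through the computation graph. Here is a minimal sketch of that idea in plain Python; it is an illustration of the concept only, not Karpathy's actual code.

```python
# Minimal scalar autograd engine in the spirit of Karpathy's micrograd
# (an illustrative sketch, not his implementation).

class Value:
    def __init__(self, data, _children=()):
        self.data = data              # the scalar this node holds
        self.grad = 0.0               # d(output)/d(this node), set by backward()
        self._backward = lambda: None
        self._prev = set(_children)   # nodes that fed into this one

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad     # d(a+b)/da = 1
            other.grad += out.grad    # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# y = x*x + x, so dy/dx = 2x + 1 = 7 at x = 3
x = Value(3.0)
y = x * x + x
y.backward()
print(y.data, x.grad)  # 12.0 7.0
```

Karpathy's Neural Networks: Zero to Hero series builds this out operation by operation into a working neural-network trainer, which is why it is such an effective way to internalize backprop.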
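The partial-derivative calculus for backprop that the episode keeps returning to can also be sanity-checked numerically: apply the chain rule by hand to one sigmoid neuron with a squared-error loss, then compare against a finite-difference approximation. The neuron, the values, and the function names below are illustrative assumptions, not material from either Karpathy's or Jon's course.

```python
import math

# Chain rule applied by hand to one sigmoid neuron with squared-error loss,
# checked against a central finite-difference approximation.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    # L = (a - y)^2 with activation a = sigmoid(w*x + b)
    return (sigmoid(w * x + b) - y) ** 2

def dloss_dw(w, b, x, y):
    # Chain rule: dL/dw = dL/da * da/dz * dz/dw
    #                   = 2*(a - y) * a*(1 - a) * x
    a = sigmoid(w * x + b)
    return 2.0 * (a - y) * a * (1.0 - a) * x

w, b, x, y = 0.5, -0.25, 1.5, 1.0
analytic = dloss_dw(w, b, x, y)

# Central difference: (L(w+h) - L(w-h)) / (2h) approximates dL/dw
h = 1e-6
numeric = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)

print(abs(analytic - numeric) < 1e-8)  # True: the hand-derived gradient checks out
```

This "gradient check" trick scales to any hand-derived gradient and is a standard way to verify a backprop implementation.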
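Finally, the bounding-box side of object detection recapped above, checking a predicted box against the ground truth, is conventionally scored with intersection over union (IoU). A minimal sketch follows, assuming the common (x_min, y_min, x_max, y_max) box format; this is a generic illustration, not Deci's or YOLO-NAS's code.

```python
# Intersection over Union (IoU) for two axis-aligned boxes in
# (x_min, y_min, x_max, y_max) format: the standard score for how well
# a predicted bounding box matches a ground-truth box.

def iou(box_a, box_b):
    # Corners of the overlap rectangle (empty if the boxes are disjoint)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

pred  = (2.0, 2.0, 6.0, 6.0)   # predicted box, 4x4
truth = (4.0, 4.0, 8.0, 8.0)   # ground-truth box, 4x4
print(iou(pred, truth))  # 2x2 overlap / (16 + 16 - 4) union = 4/28
```

Detection metrics such as mean average precision build on exactly this quantity: a prediction typically only counts as correct when its IoU with a ground-truth box clears a threshold such as 0.5.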