VIDEO TRANSCRIPTION
Hey, welcome back. In today's video, I'm going to show you another large language model that you can install on your computer, no high-end hardware required, just your CPU. Not only do you get the fully quantized model, you can also download the weights and the raw data. So I really, really appreciate the work by Nomic AI. I'm going to show you how to install this on your computer step by step, and then I'm going to show you some examples of it actually in use. And this is absolutely free. Let's get into it. But before we get into the install, let's look at the actual performance.
A lot of people from the previous video complained that the Alpaca model really wasn't performing well. But what I've found is that the GPT4All LoRA quantized model actually performs really well, on par, in my opinion, with GPT-3.5. So here are a few examples. One thing to note is that everything is done through the terminal, so there's no UI yet. I did put in a suggestion that maybe they should integrate it into Dalai. I think that would be really cool. But let's take a look. So I typed, tell me a joke, and it did okay.
What's brown and sticky? Before I cleared my screen, I actually said, tell me a joke, and it gave me the full joke. Here it's starting with a joke, but it's actually telling me to insert the punchline, which is kind of weird. Next, I said, write Ruby code to count to 10. And it did: it uses puts in a loop to count up, and this is good Ruby code. Not only that, it actually outputs a description of what's happening in the code, which is really cool. Next, I said, write a poem in the style of Shel Silverstein about LLMs. And here it is. I am an LLM.
I have been through hell. My life has not always been so grand, but now it'll never be again. So I won't read this whole thing, but you can see it. I asked it, who is Elon Musk? It got most of this information right, which is pretty cool. Next, I asked it what ChatGPT is, and it didn't quite understand that I was asking about the OpenAI version of ChatGPT, but it is saying character-level text generation with a pre-trained transformer, which is accurate. Next, I asked what the cutoff date for the data was, and it did not have an answer. That was interesting.
I did jump into the Discord and ask what the cutoff date was, and they said it was the same as GPT-3.5 Turbo. And then last, I said, tell me a story about peanut butter, and it output a story about peanut butter. So let's test one more thing before actually going into the install. What are the first three constitutional amendments? The First Amendment protects freedom of speech. The Second Amendment guarantees a citizen's right to bear arms. And finally, the Third Amendment prohibits quartering troops in private homes during peacetime without consent from the owner or occupant. So there it is. It seems to do really, really well.
The output is fast, it's accurate, and I've tested a bunch of stuff, and it seems to work really well. So let's actually get into the install. Here it is. This is the GitHub repo for GPT4All by Nomic AI. The first thing you need to do is clone this repository. To do so, come to the GitHub page, click the green code button, copy the URL, and then switch over to your terminal. Now you can put this folder anywhere you want, but what you're going to do is type git clone and then paste that URL. Now I've already downloaded it, so I'm not going to do it again.
But what you would do is hit enter, and it's going to clone it to your local computer. It should only take a few seconds, because you're not actually downloading the model itself. You do that separately, which I'll show you now. So on the GitHub page, scroll down and look for the "Try it yourself" section. Right there it says to clone this repository, which we already did, and download the CPU quantized GPT4All model, and that's a link right here. Now this is a pretty big file, but it's not huge, only about four gigabytes. So you can download it.
It'll take a few minutes. So this is the gpt4all folder that we cloned. We're going to open it up, and you're going to place that gpt4all-lora-quantized.bin file that you just downloaded in the chat folder. The next thing it says is that there are actually only two commands: one if you're running on an M1 Mac, and one if you're running on Windows or Linux. So what we're going to do is grab this command and copy it.
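The setup up to this point, sketched as terminal commands (the repo URL is the Nomic AI GitHub page shown in the video; the model download link is the one in the README's "Try it yourself" section):

```shell
# Clone the repository -- this is quick, since the model isn't in the repo.
git clone https://github.com/nomic-ai/gpt4all.git

# Download the ~4 GB quantized model from the link in the "Try it
# yourself" section of the README, then place it in the chat folder:
#   gpt4all/chat/gpt4all-lora-quantized.bin
```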
Now I did run into a hiccup, and I think they're already fixing this if it's not already fixed, but you'll actually get an error loading the model if you don't navigate to the chat folder. I tried to run it from the root of the cloned repository and got an error, so I went into the Discord and asked how to fix it. They said to just navigate to the actual chat folder and then run just this part of the command. I'll show you that now. Okay. So I'm in the chat folder, as you can see right here.
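The fix they described, condensed into commands (the binary names follow the repo's chat folder; the M1 build is the one I'm running, and there are Linux and Windows variants listed in the README):

```shell
# Run the binary from inside the chat folder -- launching it from the
# repo root is what triggers the model-loading error.
cd gpt4all/chat
./gpt4all-lora-quantized-OSX-m1        # Apple Silicon build
# ./gpt4all-lora-quantized-linux-x86   # Linux build (see the README)
```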
So what I'm going to do is type dot slash, then that command right there, and hit enter. It's loading the model, and there it is. We're in. So this is the prompt right here, and we can type anything we want. Let's say, what is the day after Saturday? Sunday. And we can ask, how many days in February? 28 days in February, except when it's a leap year. That's cool. It actually gives you much more information than just what I asked. So one thing I'd like to be able to do is actually set the temperature.
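As an aside, that February answer is easy to sanity-check in the same terminal; here's a quick shell version of the leap-year rule (the days_in_feb helper is just my own name for it):

```shell
# Days in February for a given year, per the Gregorian leap-year rule:
# divisible by 4, except centuries, unless also divisible by 400.
days_in_feb() {
  y=$1
  if [ $((y % 4)) -eq 0 ] && { [ $((y % 100)) -ne 0 ] || [ $((y % 400)) -eq 0 ]; }; then
    echo 29
  else
    echo 28
  fi
}

days_in_feb 2023   # -> 28
days_in_feb 2024   # -> 29
days_in_feb 1900   # -> 28 (century year not divisible by 400)
```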
So I see it's outputting the temperature right here, but I want to be able to set that temperature. When I went into the Discord, they said that the only way to actually configure temperature is if you're running this on a GPU. I'll dig more into that and leave information in the description if that's not true and we can actually do it from here, but I haven't found a way yet. One other cool thing to point out is that this was actually trained on 800,000 GPT-3.5 Turbo generations used to fine-tune LLaMA. So they basically took what Alpaca did and reproduced it: they're using GPT-3.5 Turbo generations on top of the LLaMA base model to train it to be an assistant-style large language model, and it works extremely well. I also just want to give a shout-out to Nomic AI and the people there, because they open sourced it, and I really believe in open source. I think one company holding the keys to large language models is not the way to go. So I appreciate all of these different engineers putting out open source models. So if you see here, you can actually run this yourself.
You can fine-tune it on your own machine if you have the right GPU for it, and here it is: the trained LoRA weights. You can download the weights from Hugging Face. You can also download the raw data. Right now it says they aren't distributing the 7B checkpoint, though, so that's just something to note. They also give instructions for setting it up locally and training it. So here's how to train it. You're probably going to update this finetune-7b.yaml config with your own custom data, and then you're going to run this command, and that should do it.
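As a rough sketch of that training flow (the yaml filename comes from the video; the exact launch command and flags may differ, so treat this as illustrative and check the repo's training instructions):

```shell
# 1. Edit the fine-tune config to point at your own custom data:
#      finetune-7b.yaml  ->  update the dataset path fields
# 2. Kick off training (needs a capable GPU, e.g. an A100):
accelerate launch train.py --config finetune-7b.yaml
```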
I'm not going to be able to do it because I don't have the right GPU for it, but that's how you do it. So a few videos ago I made a tutorial about how to install Dalai, which allows you to use the LLaMA and Alpaca models locally on your computer. A lot of people gave really great responses and feedback on that video, and one of the questions that I got over and over again is how to fine-tune your models. I did a lot of research into this, and the bottom line is you need a very high-end GPU, which I don't have.
I think you can minimally get away with an RTX 4090, but a lot of people are saying the A100, which is a multi-thousand-dollar GPU, and I just don't have that. I saw another YouTuber, Martin Thissen, actually use a GPU in the cloud, which was really interesting, and he had a video about that, so I may end up recreating that. But regardless, if you want to fine-tune these models, there doesn't seem to be a way around having to actually fork over some cash to do so. And that's it. I hope you got this working, and if you didn't, feel free to leave a comment below.
I'll try to help you out, and of course you can always dive into the Discord.