789

July 1st, 2024 × #AI#LLM#Tokens

Do More With AI - LLMs With Big Token Counts

Discussion on using large language models with greater token counts to provide more context, allowing for better and more complex outputs to aid software development.

Topic 0 00:00

Transcript

Guest 1

Not much. Not much. How are you doing, Scott? Oh, I'm doing good. Just,

Guest 1

Pretty good. Yeah. We had a little get-together on Saturday, and I grilled some plant-based burgers and baked based,

Topic 1 02:13

Sanity experimental feature allows showing potential solutions for errors

Guest 1

Yeah. So, it is a unit of text. So, I mean, I like to think about it as, like, a word. I guess it depends on the LLM that you're using. It could be, like, a single character or a whole word. But you might think of any block of text: count the words in it, and that could be the token count. Or count the number of characters in it, and that could be the token count.

Guest 1

And so, essentially, when the model is processing it, it's going to break down your prompt and anything you provide in that text into tokens so that it can process it. And you've probably seen that any AI tool you come across, like ChatGPT or Claude or Perplexity, or the one we're gonna be talking about, Gemini 1.5 Pro, has this token limit. So that's essentially the maximum number of tokens that you can provide in both your input and the output that comes from it.
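
The "count the words or count the characters" idea can be sketched in a few lines of Python. This is an illustration only: real tokenizers split text into subword pieces, so neither count below will match what a given model reports. The function names are made up for this example.

```python
def count_word_tokens(text: str) -> int:
    """Rough token count: one token per whitespace-separated word."""
    return len(text.split())


def count_char_tokens(text: str) -> int:
    """Rough token count: one token per character."""
    return len(text)


def fits_token_limit(prompt_tokens: int, expected_output_tokens: int, token_limit: int) -> bool:
    """The limit covers the input and the output together."""
    return prompt_tokens + expected_output_tokens <= token_limit


prompt = "Explain what a context window is"
print(count_word_tokens(prompt))  # 6
print(count_char_tokens(prompt))  # 32
```

The third helper reflects the point made above: the budget is shared, so a long prompt leaves less room for the response.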

Topic 2 04:31

Context window is number of tokens model can access at a time, limiting previous questions

Guest 1

Definitely. And I think the other thing to think about is this context window. Most people have probably interacted with ChatGPT a little bit. So the way to think about this is, let's say you're having a conversation with ChatGPT, and you give it a prompt. It gives you a response. You do another prompt. It gives you another response. The context window is basically the token limit. And so for GPT-3.5, I believe, it's, like, roughly 4,000 tokens. We'll try to get a table of, like, exact token counts. But think of a token limit of 4,000 tokens. That means that the model can only look at 4,000 tokens at a time. And so as you ask it each new question, technically, it can only go back in time 4,000 tokens. So it's kind of losing context for previous questions that you asked it or previous output that it gave. And so that's kind of what you have to worry about when you have a smaller token count and a smaller context window.
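
A minimal sketch of "it can only go back in time so many tokens", assuming one token per word (real tokenizers differ) and assuming the oldest messages are the ones dropped. The function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    return len(text.split())


def visible_history(messages: list[str], context_window: int) -> list[str]:
    """Return the most recent messages that fit inside the context window."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message and stop once the budget is spent.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > context_window:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    return kept


history = ["first question here", "first answer here", "second question", "second answer"]
print(visible_history(history, context_window=7))
# ['first answer here', 'second question', 'second answer']
```

With a window of 7 tokens, the very first question no longer fits, which is the loss of context described above.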

Topic 3 07:15

More tokens gives models greater sense of context for better output

Guest 1

Definitely. And it also ties into preventing hallucinations.

Guest 1

So that's when the output is just completely made up, or it's, like, not even related to what you're asking it. So, like, an example that Scott gave: if your very first prompt is, this is Node version 20 in ESM, and let's say... so I just looked up the table here. And we have GPT-3.5 Turbo.

Guest 1

They have roughly a context window of 16,000 tokens. And so let's say you've had some back and forth with this, and now there's maybe, like, over 20,000 tokens in this conversation. The next prompt no longer has the context of that very first message that said Node version 20 and ESM. So now it might output, like, require statements, and it might do something from, like, Node version 16. It's not gonna have that context anymore, so it's gonna, like, hallucinate things or do things that you didn't expect. Yeah. You could think of it as, like, a virtualization window. You have a, like, a window,
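
One common way to soften this problem, which is not something described in the conversation, is to always re-send the first instruction and fill the remaining budget with the most recent messages. A rough sketch, assuming one token per word and hypothetical function names:

```python
def count_tokens(text: str) -> int:
    # Assumption: one token per word; a real tokenizer would give different numbers.
    return len(text.split())


def build_context(pinned: str, messages: list[str], context_window: int) -> list[str]:
    """Always keep the pinned instruction, then fill the rest with recent messages."""
    budget = context_window - count_tokens(pinned)
    recent: list[str] = []
    for message in reversed(messages):
        cost = count_tokens(message)
        if cost > budget:
            break
        recent.append(message)
        budget -= cost
    recent.reverse()
    return [pinned] + recent


pinned = "This is Node version 20 in ESM"
messages = ["write a server", "here is a server", "add a route"]
print(build_context(pinned, messages, context_window=14))
# ['This is Node version 20 in ESM', 'here is a server', 'add a route']
```

The oldest exchange still falls out of the window, but the Node version and ESM instruction stays in view for every new prompt.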

Topic 4 08:55

Gemini 1.5 Pro has a 1 million token context window

Guest 1

Yeah. And we'll share this table in the show notes. But, basically, it has all of the models that you can use with, specifically, OpenAI. I know that's one of the most popular ones that people use. But, specifically, GPT-3.5 Turbo can support 16,385 tokens.

Guest 1

But GPT-4 Turbo can support 128,000 tokens, so that's much larger.

Guest 1

And then the latest, GPT-4o, which is Omni, apparently has the same context window, but it's potentially fine-tuned a little bit more. But with GPT-4, if you're paying for, like, the GPT Plus, your max context window is 128,000 tokens. I guess, in contrast, we can look at Claude. And have you used Claude before? I've only used it a couple of times. Yeah. I pay for

Guest 1

And, yeah, and that one looks like on their pricing page, they're saying that it has 200,000 tokens.

Guest 1

If you're using Claude 2.1, it also has 200,000. And then Claude 2.0 has a context window of about 100,000.

Guest 1

So, fairly big, and I think if you're at least using 2.1 or 3.0, it's almost twice the size of GPT-4. So it definitely gives you a much bigger window to make sure that you maintain that context in your prompts. Yeah. Yeah. I've always liked that you can paste in
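
The sizes quoted so far can be put side by side with a small lookup. The numbers are the ones mentioned in the conversation and may be out of date; the helper names are made up for this sketch.

```python
# Context window sizes as quoted in the conversation (tokens).
CONTEXT_WINDOWS = {
    "GPT-3.5 Turbo": 16_385,
    "GPT-4 Turbo": 128_000,
    "Claude 2": 100_000,
    "Claude 2.1": 200_000,
}


def models_that_fit(total_tokens: int) -> list[str]:
    """Names of the models whose context window can hold the given token count."""
    return [name for name, window in CONTEXT_WINDOWS.items() if total_tokens <= window]


def size_ratio(model_a: str, model_b: str) -> float:
    """How many times larger model_a's window is than model_b's."""
    return CONTEXT_WINDOWS[model_a] / CONTEXT_WINDOWS[model_b]


print(models_that_fit(20_000))
# ['GPT-4 Turbo', 'Claude 2', 'Claude 2.1']
print(round(size_ratio("Claude 2.1", "GPT-4 Turbo"), 1))  # 1.6
```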

Guest 1

that's maths. And whatever that value is, I can't do in my head. That is the maximum output that the AI could give you for that specific prompt if it was 2048.

Topic 5 13:22

Generated Swagger docs from SQL schema for restaurant API

Guest 1

Definitely. And, just to put this in context, like, 1,000,000 tokens is 10 times more tokens than GPT-3.5 Turbo can handle, or 5 times more than, like, Claude 3.0. So, like, we're dealing with inputs and outputs that are much, much bigger than those models specifically. But one of the first things I tried with this was, I wanted to generate some OpenAPI documentation, or Swagger docs, if you've ever heard of those, given a database schema.

Guest 1

So I had the SQL data definition language for a complex database that has, like, roughly 12 or so tables, and I wanted it to generate an API based on that data. And it did really well. And so one of the things that kind of, like, floored me initially is just how big the responses can be.

Guest 1

Now, I think sometimes there are timeout issues. Like, sometimes it'll be responding, and then it'll just stop, just like you might get with ChatGPT, where it's giving you a response, and then all of a sudden it just stops. And then you have to say continue or tell it to keep outputting. In my experience with Gemini 1.5 Pro, it was able to output for, like, almost 2 minutes straight, just giving me all the stuff.

Guest 1

And they may or may not change that, because that could technically be, like, an API rate limit thing where, maybe at a certain point, they just completely stop it from generating. But I do wanna show you what was generated here. So, it needs to describe all of the schema, so all of the possible data and things that will be involved in inputs and outputs.

Guest 1

And so given the SQL code that I gave it, it output a schema for category and city and comment. And, specifically, this was around a restaurant API. So if you've watched the video that I did on the Syntax YouTube channel about Drizzle, I showed a complex database there, but I wanted to see, could Gemini output some API documentation automatically based on that database schema. And so, it's pretty slick. If you're watching the video now, basically, within Swagger Docs, we can see for every type of thing, it has a specific schema. This is, like, in line with the schemas that are actually in the database. And then we can see all of the different route groupings. So we see endpoints for restaurants, for menu items, for offers.

Guest 1

And then what's cool about this is each one of these also has, like, validation for input. So if we take a look at the POST request for restaurants, restaurant ID, and then menu items, this has a specific schema that describes what a menu item would have. And then whenever we're making a POST request to this endpoint, the Swagger UI is gonna basically give us that schema, give us some examples, and we can kind of, like, play around with the API as well. So the API doc definition itself is roughly 1,400 lines of code, and Gemini 1.5 Pro was able to output those 1,400 lines in a single response. I did kind of do some back and forth because, initially, it gave me, like, a very flat schema that was just, like, one route grouping for every table. But then I was like, well, maybe we should group these together as, like, menu items by restaurant. And so after it output the full one, and after I asked it to reconfigure it, it output another full one, but with more context of, like, oh, okay, so let's actually, like, group these endpoints together. I tried a similar thing in just ChatGPT 4, and it could never get past just, like, generating schemas and then, like, one endpoint. It would just stop. It wouldn't... Yep. It wouldn't generate anymore. And so, yeah, this was pretty cool for me because I'm just imagining working in much larger codebases. Right? So, like, if I were building this app for a customer or something like that, this is a huge jumping-off point. Right? Like, doing the menial work of, like, thinking of what are all the route endpoints and what are the schemas, it's done that. And then now I can go in and do the more complex work of, like, implementing these endpoints, or even further prompt the AI to implement some of these endpoints for me as well. Wow. Yeah.
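
For readers who have not seen an OpenAPI document, here is a tiny hand-written fragment showing what "menu items grouped by restaurant" with input validation can look like. It is a hypothetical sketch: the path, field names, and types are illustrative and are not copied from the generated document described above.

```python
import json

# Hypothetical fragment: path, field names, and types are illustrative only.
openapi_fragment = {
    "openapi": "3.0.3",
    "info": {"title": "Restaurant API (sketch)", "version": "0.1.0"},
    "paths": {
        # Nested route: menu items live under a specific restaurant.
        "/restaurants/{restaurantId}/menu-items": {
            "post": {
                "parameters": [
                    {
                        "name": "restaurantId",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "integer"},
                    }
                ],
                "requestBody": {
                    "required": True,
                    "content": {
                        "application/json": {
                            "schema": {"$ref": "#/components/schemas/MenuItem"}
                        }
                    },
                },
                "responses": {"201": {"description": "Menu item created"}},
            }
        }
    },
    "components": {
        "schemas": {
            # The request body is validated against this schema.
            "MenuItem": {
                "type": "object",
                "required": ["name", "price"],
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                },
            }
        }
    },
}

print(json.dumps(openapi_fragment, indent=2))
```

A flat layout would instead expose something like a top-level menu items route for every table; the nested path is what "grouping by restaurant" refers to.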

Topic 6 17:13

Entire codebase in context allows focused questions about code

Guest 1

Definitely.

Guest 1

And, I mean, for me, that's huge. And because you mentioned this earlier, like, maybe you paste a library in the context. Or what I was thinking is you could even paste documentation for, like, newer libraries. I think one of the... Yep. ...issues that, like, ChatGPT has, or any of these other AIs have, is that they've been trained on older data. So, for instance, if you ask it for help generating a Drizzle schema, it's gonna hallucinate most of the time because it actually doesn't have the docs or the latest info about it.

Guest 1

But if you can simply just paste in the documentation or the API examples as part of your prompt, because now you have, like, a huge context window, now you can ask it to do things that it wasn't necessarily even trained on, and you don't have to, like, generate the embeddings for it. You can literally just include all of the things you wanted it to know about in your prompt itself, which is slick. Yeah. It's super slick. And that to me is great because that's where these things should be, you know, should be existing for us right now in this space. I think too oftentimes people look at this stuff as, it's gonna take your job because it's gonna do everything for you. And, you know, now, I know this conversation has been had to death. But... Yeah.
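
Pasting documentation into the prompt can be as simple as string assembly plus a sanity check on size. A minimal sketch, assuming one token per word as a rough estimate; the function name, delimiters, and wording of the instruction are all made up for this example.

```python
def build_prompt_with_docs(question: str, docs: list[str], max_prompt_tokens: int) -> str:
    """Prepend documentation to a question, refusing if the rough budget is exceeded."""
    docs_block = "\n\n".join(docs)
    prompt = (
        "Use the following documentation when answering.\n\n"
        f"--- DOCUMENTATION ---\n{docs_block}\n--- END DOCUMENTATION ---\n\n"
        f"Question: {question}"
    )
    # Assumption: one token per word, only good enough for a sanity check.
    rough_tokens = len(prompt.split())
    if rough_tokens > max_prompt_tokens:
        raise ValueError(
            f"Prompt is roughly {rough_tokens} tokens, over the {max_prompt_tokens} budget"
        )
    return prompt


docs = ["Page one of some library docs.", "Page two of some library docs."]
print(build_prompt_with_docs("How do I define a table?", docs, max_prompt_tokens=1_000))
```

With a small context window the budget check fails quickly, which is why this approach only becomes practical with the larger windows discussed here.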

Guest 1

Definitely. And I think, for me, that's why I am getting benefit out of AI now. It's like, for these little, like, simple, like, one-off, like, solve this LeetCode or just, like... Yeah. ...build a to-do app. Like, it's not useful for me, but I know what I wanna do. I just want it to give me a kick start. Right? So, like, do a lot more that would, for me, just be, like, very mundane work that it could figure out very quickly.

Topic 7 20:40

Knowing expected output allows models to aid development

Guest 1

Exactly.

Guest 1

Yeah. That's the big one. And so the next example I have that I used was generating seed data for a complex database. So... Yep. So if you saw over on the Syntax channel, I did a video about Drizzle and basically implementing a complex database schema.

Guest 1

And I'll be honest. I wrote the schema myself. I didn't use AI for it. So I actually, like, typed out all the code. But what I did use AI for was generating the seed data. So if you take a look at this repo, it's github.com/w3cj/bitedash.

Guest 1

And then if you go into the seeds directory under DB, there's a data folder, and then I have a bunch of JSON files. Each one of these JSON files was generated using Gemini 1.5 Pro. So, basically, I pasted in my SQL schema.

Guest 1

I then told it that I was looking to seed data here, and I told it to output as JSON so that I could then write my own code to import these JSON files. But for instance, if we look at, like, the simplest one, it's categories.json. It's very simple. It's just an array with 6 categories. We have appetizers, lunch, dinner, salads, sides, and desserts.

Guest 1

Easy enough. But it output that. I didn't have to think too hard about what are the different kinds of categories of food. But what was really cool is I told it to generate restaurants as well, and then from there, generate restaurant items. So if you look at this restaurants.json file, this is over 1,600 lines of code in here. And, basically, for each restaurant, it has a street address, a ZIP code, a city name. And I told it to come up with fake restaurant names. I didn't wanna use anything in the real world. So that's one thing that AI is good at, is kind of just coming up with interesting things that you don't have to ideate yourself.

Guest 1

And then from there, it created menu items for each one of these restaurants that were, like, themed to that restaurant. And so if we were to do this ourselves, there are tools like Faker, and there are other tools that can, like, generate this kind of data. But what's nice about this is it looks like real data. Right? These look like legitimate menu items and prices and ingredient lists and stuff like that.

Guest 1

And it was able to generate all of that for all of my restaurants all in a single go. I think, actually, for this one, I did kinda do it, like, one restaurant at a time because trying to generate all of that at once potentially could've caused some issues. I definitely had some back and forth, but I first had it generate a list of restaurants, and then I was like, alright, for this restaurant, generate 20 menu items that fall into these categories. And so you can take a look at this repo of all these JSON files. Each one of these was just generated using AI. And I think just having this much data generated automatically, to be able to, like, test out your app without needing to, like, run everything through Postman initially, it was pretty sweet.
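
The "one restaurant at a time" workflow amounts to building one prompt per restaurant. A rough sketch follows; the category list and the count of 20 come from the conversation, while the prompt wording, function names, restaurant names, and the one-line schema are placeholders invented for this example.

```python
CATEGORIES = ["appetizers", "lunch", "dinner", "salads", "sides", "desserts"]


def menu_item_prompt(sql_schema: str, restaurant_name: str, item_count: int = 20) -> str:
    """Build one seed-data prompt for a single restaurant."""
    return (
        f"Given this SQL schema:\n{sql_schema}\n\n"
        f"For the restaurant '{restaurant_name}', generate {item_count} menu items "
        f"that fall into these categories: {', '.join(CATEGORIES)}. "
        "Output only JSON."
    )


def prompts_for(sql_schema: str, restaurant_names: list[str]) -> list[str]:
    """One prompt per restaurant keeps each response small enough to finish."""
    return [menu_item_prompt(sql_schema, name) for name in restaurant_names]


# Placeholder inputs for illustration only.
schema = "CREATE TABLE menu_item (id INTEGER PRIMARY KEY, name TEXT, price NUMERIC);"
for prompt in prompts_for(schema, ["Example Diner", "Sample Bistro"]):
    print(prompt, end="\n\n")
```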

Topic 8 23:39

Generated complex fake data for testing without much effort

Guest 1

episode well, summaries that get generated by AI. But I guess I don't know the exact details of it, but I know when Wes was talking about it a while back, he basically had to create summaries of summaries because the context window is only so big.

Guest 1

But for this, I literally pasted in an 8-hour transcript of React Conf day 1, and it spit out a summary with chapter markers and summaries of, like, each talk. And this was crazy for me. So I have a YouTube channel called Coding Garden where I do live streams. And one of the things I've wanted to do for the longest time is to take a 4-hour or 5-hour stream and then easily summarize it, easily come up with, like, timestamps that I can post on YouTube. And I've never been able to find an easy way to do that with AI.

Guest 1

And this can do it with Gemini 1.5. So let me pull up the example really quick. Jeez. If you're watching the video pod, you can see the example here. So the prompt I give it is, given the following autogenerated transcript of day 1 of React Conf 2024, and then I paste in the transcript. And you can see in the token count, this is 348,970 tokens... Wow. ...for my overall, like, prompt series here. So this is huge.

Topic 9 25:32

Summarized 8 hour conference transcript with timestamps

Guest 1

And, initially, I just tried this. I just said, generate a summary with timestamps given this transcript, and it hallucinated a bunch. Like, all I gave it was the transcript, and it would output, like, random timestamps. It would come up with people's names.

Guest 1

Like, it came up with, like, 3 different names for Dan Abramov. Yeah. Because I think one of the issues here is you're dealing with auto captions from YouTube. So it doesn't necessarily have the right spellings or whatever else.

Guest 1

So to really make it work, I provided even more context. So, basically, we have the full transcript. And then right after that, I also plugged in let's see.

Guest 1

Yeah. So this is what I said. I said, the transcript might have names or technical topics wrong since it's auto generated.

Guest 1

The agenda, talks, and correct speaker names for the day are, and then I gave it those things. Perfect. Yes. Yeah. So I basically pulled this from the React Conf site, and I said, these are the speakers. These are the talks that they gave. And so now it has a little more context for how to fit this, like, unstructured transcript data into a more, like, structured summary. And that's why context matters because... Yeah. That context is really what enabled this to
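
The prompt structure described here, transcript first and then a trusted agenda to correct it, can be sketched as a small helper. The instruction wording paraphrases what was said above; the function name and the placeholder agenda entries are assumptions, not the actual prompt or the real conference schedule.

```python
def build_summary_prompt(transcript: str, agenda: list[tuple[str, str]]) -> str:
    """Combine an auto-generated transcript with a trusted agenda of (speaker, talk title)."""
    agenda_lines = "\n".join(f"- {speaker}: {title}" for speaker, title in agenda)
    return (
        "Given the following autogenerated transcript:\n\n"
        f"{transcript}\n\n"
        "The transcript might have names or technical topics wrong since it's auto generated.\n"
        "The agenda, talks, and correct speaker names for the day are:\n"
        f"{agenda_lines}\n\n"
        "Generate a summary with timestamps."
    )


# Placeholder data for illustration only.
agenda = [("Speaker A", "Talk A"), ("Speaker B", "Talk B")]
print(build_summary_prompt("0:00 welcome everyone ...", agenda))
```

Putting the correction list after the transcript mirrors the order described in the conversation; whether that ordering matters for output quality is not something the conversation establishes.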

Guest 1

Definitely.

Guest 1

And, again, I think, basically, we've taken an output that had a lot of hallucinations and that was like, okay, I'll try my best, and we've kind of, like, started to align it. And with bigger context windows, this is something you can start to do, definitely. And another piece of context I gave it was the timestamps of the start of each talk. I also tried it without giving it this, and it actually did a pretty good job of figuring out when a talk started.

Guest 1

But by giving it this context, we're basically saying the intro starts at 16 minutes 13.

Guest 1

The talk on what's new in React starts at 2 hours and 28 minutes. That gave it even more of a box to fit into. So that way, in the actual output, for each of the sections, it knows to start at that specific timestamp and then kind of, like, summarizes each thing that happened within that section. And so if you're watching the video pod, I'm not gonna, like, read the transcript, but I will link to the generated summary that it gave me. But it's pretty insane because it has the starting timestamp of each talk. It then has bullet points of everything that happened in that talk and then a timestamp that links to that specific section in the talk. And so if you're someone that's trying to very easily review, like, 8 hours of a video, this is huge because now you have some starting points. Right? You have some bullet points you can go by. You can dive into some of these timestamps. And, yeah, it just makes looking at a large amount of information easier, to kind of, like, distill it, but in a way that isn't hallucinating, as far as I could tell so far by verifying it, and was generating, like, good data. Yeah. I think that it
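
Turning timestamps like these into links that jump to a point in a video is straightforward arithmetic. A small sketch, assuming "16 minutes 13" means 16:13 and using a placeholder video ID; the function names are made up.

```python
def timestamp_to_seconds(timestamp: str) -> int:
    """Convert 'H:MM:SS' or 'MM:SS' to a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def youtube_link(video_id: str, timestamp: str) -> str:
    """Build a YouTube URL that starts playback at the given timestamp."""
    return f"https://www.youtube.com/watch?v={video_id}&t={timestamp_to_seconds(timestamp)}s"


print(timestamp_to_seconds("16:13"))    # 973
print(timestamp_to_seconds("2:28:00"))  # 8880
print(youtube_link("VIDEO_ID", "2:28:00"))
# https://www.youtube.com/watch?v=VIDEO_ID&t=8880s
```

The same conversion is handy for spot-checking a generated summary: jump to each timestamp and confirm the talk really starts there.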

Guest 1

earlier on, I was asking it to generate, like, YouTube transcripts in the style of CJ. It thinks I'm I mean, I'm a pretty nice guy, but it thinks I'm just, like, way too nice and bubbly. I don't know.

Guest 1

But some of the ideas I had for, like, personal AI are basically asking it about things in your own life. Right? So, like, what if you could preface a prompt with your entire calendar and all of your agenda, like, your meetings that are coming up or your previous meetings? What if you could preface a prompt with, like, every note you've taken on a specific topic? And so now it has context to answer in relation to your own notes and your own information about your own life. Yeah. I keep a lot of detailed notes in Obsidian, and that's all just straight markdown. It'd be really interesting to pass it some of my Obsidian notes. Definitely. And I think, especially, like, if you're doing research on a topic and, like, you've taken your own notes and, like, pulled in from various resources, then you can start to distill down and ask it questions, based even on, like, the notes that you've taken. Word.
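
Since the notes are plain Markdown files, prefacing a prompt with them is mostly a matter of reading a folder. A minimal sketch, assuming the notes live under one directory; the function names, the heading format, and the example path are hypothetical.

```python
from pathlib import Path


def notes_as_context(notes_dir: str) -> str:
    """Concatenate every Markdown file under notes_dir, labeled by relative path."""
    root = Path(notes_dir)
    sections = []
    for path in sorted(root.rglob("*.md")):
        body = path.read_text(encoding="utf-8")
        sections.append(f"## {path.relative_to(root)}\n{body}")
    return "\n\n".join(sections)


def prompt_from_notes(notes_dir: str, question: str) -> str:
    """Preface a question with the contents of the notes folder."""
    return f"Here are my notes:\n\n{notes_as_context(notes_dir)}\n\nQuestion: {question}"


# Example usage (the path is a placeholder):
# print(prompt_from_notes("/path/to/notes", "What did I write about context windows?"))
```

Labeling each note with its file path gives the model something to cite, which makes answers easier to check against the original note.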

Guest 1

So with all of this, we haven't talked about cost. And so right now, as of the recording of this podcast, Gemini 1.5 Pro is technically free to use in the dev console. So if you go to aistudio.google.com, you can use Gemini 1.5 Flash and Gemini 1.5 Pro. We haven't talked about the differences, but I think Flash, eventually, when they start charging, will incur less cost. It's a little bit faster. But in the dev console, you can just have a conversation with it. It's not gonna charge you anything. But when they do start charging, it does look like it's gonna be somewhat expensive. So we'll have to be careful with how many 400,000-token prompts we give it. And I guess it also depends on how often you're querying it as well. But if we look at the pricing page, it is saying that for Gemini 1.5 Pro, if you're using it via the API, it will be $1.05 per 1,000,000 tokens, and that's for prompts up to 128,000 tokens.

Guest 1

So summarizing this 8-hour transcript probably would have cost me $2 if I had used the API. Like, I guess we don't know for sure because a lot of these models, they keep free when you're at least using the API console, because I think you're kind of, like, trying it out, making sure that it'll work before you actually make API calls with it. But, yeah, that is something to note, that this probably woulda cost, like, $2 or so if they were charging for the API. Get your free queries in while you can, folks. Yeah. That's the message.
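
The per-token arithmetic behind an estimate like this is a one-liner. The sketch below assumes a single flat input rate; the rate for prompts above 128,000 tokens and the rate for output tokens are not quoted in the conversation, so the price is a parameter to fill in from the current pricing page. The function name is made up.

```python
def estimate_cost(prompt_tokens: int, price_per_million: float) -> float:
    """Input cost in dollars for a prompt at a flat per-million-token rate."""
    return prompt_tokens / 1_000_000 * price_per_million


# Rate quoted in the conversation for prompts up to 128,000 tokens.
# Assumption: a single flat rate; longer prompts and output tokens may be
# priced differently, so check the current pricing page.
quoted_rate = 1.05

print(f"${estimate_cost(128_000, quoted_rate):.2f}")  # $0.13
```

For a prompt the size of the 348,970-token transcript, the applicable rate would be whatever the pricing page lists for prompts above 128,000 tokens.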

Guest 1

Yeah. Yeah. And I am excited to see if and when ChatGPT comes out with bigger context windows, and also Claude.

Guest 1

So that way, we can do some of these things with the subscription that we're already paying for and not necessarily have to pay, like, a dollar per 1,000,000 tokens or whatever else. Word. Cool. Well, this has been really super neat, and thank you for,