How good is GPT-4?
Watch the video
GPT-4 may be the most hyped language model we’ve had, with tons of rumors and news even prior to its announcement, and it wasn’t for nothing.
If you want a short tl;dr: GPT-4 is ChatGPT’s big brother.
The OpenAI team has worked on improving little brother since its release in November 2022, thanks to the feedback from millions of users, including you and me.
They argue that the newly published model has better problem-solving abilities and a broader general knowledge, which makes it the best language model to date, and by far.
There’s even more: it’s better than 90% of lawyers on the bar exam! It can even help you with your taxes if you are willing to share all your information with it. Now is the time we can be scared…
If you haven’t tried it yet, I think you should pause this and go play with it right now. You’ll be amazed. The link to try it is in the references below.
But don’t forget to come back! What’s coming is the most interesting part!
Let’s dive into how the OpenAI team could make their ChatGPT model even better at everything. What changes did they make? With what data did they train it? How did they train it? Here are the answers to all those questions based on what we have access to.
First, what do I mean by “having a broader general knowledge”.
This means multiple things. The first meaning is obviously that the model is just more knowledgeable and creative. GPT-4 is better at discussing in a human way and iterating within a discussion with a user. It’s also better at lots of different tasks people tried to use ChatGPT for, like composing music or assimilating someone’s writing style, which makes the model even more powerful, or more dangerous depending on how you see it..!
It can also work with longer text than ever before. GPT-4 can handle more than twenty-five thousand words of text! This means it can easily handle around a hundred-page book or 14 times this current article. Asking it to summarize an article is no longer a challenge. Be ready to see a flood of AI-generated summaries of books, podcasts, and movies in the upcoming days and weeks. You can easily send it your favorite podcast or meeting transcripts if you want easy summaries. It should work like a charm. Try it with this article, and let me know if it got it right!
Get posts like this one directly in your emails!
Now even better is that GPT-4 understands images. Yes, you can upload images, and it can help you generate captions, produce analyses or even classify them. You don’t need to build a specific AI to classify an image anymore. It just replaced my whole master’s thesis work. Something both extremely cool and scary! Yep, just use GPT-4 for everything; this is where we seem to be going! Does this scare you or excite you more? I couldn’t say for myself, it’s definitely a blend, but I’m optimistic about the progress and usage of those models. We’ll also dive deeper into the ethical problems and considerations in an upcoming episode of my podcast if this is a topic you are into! Understanding images is a big reason why GPT-4 is much more capable: it has another way of seeing the world!
And last but not least… GPT-4 has much better reasoning abilities than ChatGPT. We’ve all seen failure cases where ChatGPT’s answers wouldn’t make sense, especially with math or numbers. Well, they made lots of improvements to this new model. Go challenge it, and let me know where it fails! I hope it is indeed much, much better.
GPT-4 is not only more powerful, but it is also safer to use. Something cool is that it is 40% more likely to respond with facts, which is something I personally struggled with as it was hallucinating fake authors and fake facts when trying to summarize my articles or podcast episodes.
But how could they go from an already incredibly powerful model to this mind-blowing fourth version of the GPT language model set?
This is mainly by incorporating even more humans in the training of the new model. Yes, they made it more intelligent by using more humans, more manual work, and more expert human hours. How funny is this?! The best way to improve AI is by using more humans. We can easily doubt how intelligent those models really are, being so dependent on human-curated data and training. Still, it has its upsides since it allows us to have better control of its outputs and capabilities.
By the way, if you are enjoying the article, please consider following the blog. I share weekly videos like this one covering exciting approaches in AI. If you are like me and 5-10 minutes of AI weekly isn’t enough, follow my What’s AI podcast, where we dive deeper into such topics with expert guests. I’m sure you will enjoy these too!
They could also make it better by collaborating with various companies implementing GPT-4 into their product to gather even more feedback and iteratively improve it, like Duolingo for better conversational skills or Khan Academy as a student’s customized tutor. Just this single-use case already makes the model amazing enough to me. Imagine having a personal tutor for any class you are taking. This is an industry game-changer.
All this help from involving more actual people, and more experts and implementing the model in more applications are possible thanks to the training scheme of the model based on reinforcement learning, which is the same used in ChatGPT that you can learn from in my video about it. It is a way of training the model progressively based on feedback we give it, in this case, through human feedback. OpenAI also insists that the most valuable training part is the pre-training they do on pretty much the whole internet, which they then further train and align with what we want to receive using such human feedback. They also highlight the importance of using prompt engineering to receive what you want from the model. Something I believe is an important skill to develop for anyone to better use those powerful models, and it is why we are building learnprompting.org along with my friend Sander and Towards AI. You can see this project as a free Wikipedia for talking with AIs.
Oh, and not to mention that they have access to a supercomputer with Microsoft, which greatly help iterate and train this enormous language model faster. Just a small irrelevant detail.
They also added a cool new feature called System. System can be used to prescribe a style and tasks to achieve to the model instead of having to add it in the text itself, which will be very useful for people building applications using the API. Just tell it to act like a math professor replying to students not too seriously with some puns, and it will do so. It will reenact almost anything you want!
Another cool thing with this new OpenAI publication, even though everything about GPT-4 and ChatGPT is proprietary and thus closed access; don’t ask me where they got their name from, is that they released their evaluation framework, which means open-source approaches will be able to use it to compare themselves to the new OpenAI model’s performances. This can be pretty useful to advance progress, helping compare everyone’s approach in a single broad benchmark.
Of course, GPT-4 is not perfect, but neither are we. What’s cool is that you can ask it to describe its reasoning and understand how it got its answer. It still has many known limitations shared by other language models that they are still working on, like social biases, hallucinations, and adversarial prompts where you “hack” the model when sending your instructions to receive something it is not supposed to do, like asking to give personal information if the model has access to the company’s database. All this to say that you still need to be careful when using language models like GPT-4 and not trust them blindly. Just act as if it was a Wikipedia page; you can probably trust it, but double-check to be sure when it’s an important matter. The main problem is that it looks confident even when creating fake facts; it’s pretty much the perfect liar. OpenAI will keep improving the model over time. So if you use it, either through ChatGPT Plus or their API (which you can apply now to join the waitlist), you are actually contributing to making it better and safer!
Unfortunately, this is pretty much all information we have access to for now, but I invite you to learn more with their technical report and blog post, where you can find tons of graphs and statistics on how good GPT-4 is compared to the previous models. Or just play with it and learn by trial and error, which is lots of fun.
I hope you’ve enjoyed this article. If so, please give it a like and let me know what you think of ChatGPT and this new GPT-4 model in the comments below. I’d love to hear about how you use it! It would also be fantastic if you could share the video with a friend or your favorite group chat to help the channel. Oh, and I will host an episode related to the GPT language models with an expert soon on my podcast to discuss the different challenges, capabilities, ethical concerns, and more. I invite you to check it out and follow it on Spotify, Apple Podcast, or YouTube!
I will see you next week with another amazing AI application!
References
►Try GPT-4 now: https://chat.openai.com/
►API waitlist: https://openai.com/waitlist/gpt-4
►OpenAI blog post: https://openai.com/product/gpt-4
►GPT-4 research: https://openai.com/research/gpt-4
►ChatGPT video: https://youtu.be/AsFgn8vU-tQ
►What is prompting: https://youtu.be/pZsJbYIFCCw
►Learn Prompting: https://learnprompting.org/