OpenAI GPT-4 – large multimodal model

#OpenAI has just released #GPT-4, their latest model. Being multimodal, it can process both text and images, which makes things very interesting. Once a model can process text and images (it's unclear whether video/audio are handled), it opens up a whole range of problems that can be tackled with vast amounts of useful data. The models get bigger and better, and so do the results. There are limitations and a lot of caveats, but the current capabilities are quite good. I wonder what can be achieved in the next few years.
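
To make "multimodal input" concrete, here is a minimal sketch of sending a text prompt alongside an image using OpenAI's Python SDK. The model name, placeholder image URL, and the availability of image inputs on a given account are assumptions for illustration; check OpenAI's documentation for what is actually enabled.

```python
# Minimal sketch: a text + image prompt to a vision-capable GPT-4 model.
# Assumes the openai Python SDK (v1+) and an API key in OPENAI_API_KEY;
# model name and image-input availability may differ on your account.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name; use whichever vision-capable model you can access
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},  # placeholder URL
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```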

The speed at which new #NLP, #deeplearning, and #reinforcementlearning models are being released is amazing. I expect #Google, #Facebook, and others to keep the competitive improvement cycles going. It will be interesting to see how we can apply these models to different industry problems. Eventually, what we are getting close to is #AGI: a human-like ability to process all possible types of information for better decision making, with the benefit of access to vast amounts of data and training.

The paper will make for some interesting, if heavy, weekend reading!

“GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%.”

https://openai.com/research/gpt-4

“GPT-4 is a Transformer-style model pre-trained to predict the next token in a document, using both publicly available data (such as internet data) and data licensed from third-party providers. The model was then fine-tuned using Reinforcement Learning from Human Feedback (RLHF).”

https://cdn.openai.com/papers/gpt-4.pdf
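
To make the "predict the next token" objective concrete, here is a minimal PyTorch sketch of the pre-training loss: inputs are shifted by one position and the model is trained with cross-entropy against the next token. This is a generic illustration of the objective, not OpenAI's actual training code; the tiny embed-then-project model and the sizes are made up, and the RLHF fine-tuning stage (a reward model plus policy optimization layered on top) is omitted.

```python
# Minimal sketch of next-token prediction (the GPT-style pre-training objective).
# Generic illustration only; model, vocab size, and data here are made up.
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, batch = 1000, 64, 16, 4

# Toy stand-in for a Transformer language model: embed tokens, project to logits.
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),
)

tokens = torch.randint(0, vocab_size, (batch, seq_len))  # fake token ids

inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens up to t
logits = model(inputs)  # shape: (batch, seq_len - 1, vocab_size)

loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()  # one gradient step on the next-token objective
print(f"next-token loss: {loss.item():.3f}")
```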