By Naveen Narayanan
In recent years, artificial intelligence (AI) has advanced significantly, particularly in generative AI and large language models (LLMs). Generative AI uses machine learning algorithms to produce new content based on patterns learned from large datasets. LLMs are a type of generative AI model that focuses on natural language processing (NLP), which involves understanding and generating human language.
A major milestone for LLMs and generative AI came in 2018, when OpenAI released GPT-1 (Generative Pre-trained Transformer 1), one of the first large language models. GPT-1 was trained on a large corpus of text and could generate coherent new text. Since then, researchers have continued to refine and improve LLMs, and OpenAI has released GPT-3, GPT-3.5 (which powers ChatGPT), and GPT-4. These models can generate text in multiple languages and styles and perform a wide range of other tasks, such as translation, summarization, and question answering.
One application of these models to video is generating actionable descriptors or tags, for example identifying a scene in which “a woman is standing in front of the Eiffel Tower” or “on a moving train.” Such tags are particularly useful for content discovery and recommendation, as well as for search engines and other content classification systems.
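To make the search use case concrete, here is a minimal sketch of how scene tags could back a simple lookup. It assumes the tags have already been produced by a model. The scene records, field names, and the `build_index` and `search` helpers are hypothetical illustrations, not the Quickplay implementation.

```python
from collections import defaultdict

# Hypothetical scene records. In practice the tags would come from a model;
# the videos, timestamps, and tags here are made up for illustration.
SCENES = [
    {"video": "movie_1", "start_s": 120, "tags": {"woman", "eiffel tower", "landmark"}},
    {"video": "movie_1", "start_s": 900, "tags": {"woman", "train", "moving"}},
    {"video": "movie_2", "start_s": 45, "tags": {"man", "train"}},
]


def build_index(scenes):
    """Map each tag to the set of positions of scenes that carry it."""
    index = defaultdict(set)
    for position, scene in enumerate(scenes):
        for tag in scene["tags"]:
            index[tag].add(position)
    return index


def search(index, scenes, *tags):
    """Return the scenes that carry every requested tag, in catalog order."""
    if not tags:
        return []
    matches = set.intersection(*(index.get(tag, set()) for tag in tags))
    return [scenes[position] for position in sorted(matches)]


if __name__ == "__main__":
    index = build_index(SCENES)
    for scene in search(index, SCENES, "woman", "train"):
        print(scene["video"], scene["start_s"])
```

An inverted index like this keeps lookups fast as the catalog grows, because each query only touches the scenes that share at least one requested tag.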
Another application is in video summarization, where Generative AI can analyze a movie or TV show and automatically generate a summary of its plot. This can be used to provide quick overviews of the content, or to help viewers quickly find the most interesting parts of a longer video.
Scene recognition is another area where Generative AI and LLMs can be used. By analyzing the audio and visual components of a video, AI models can identify specific scenes or segments that feature certain characteristics, such as action scenes or scenes that feature a particular actor. This information can then be used to recommend similar content to viewers based on their preferences.
Content recommendation is another area where Generative AI and LLMs can be particularly useful. By analyzing the viewing history and preferences of individual viewers, AI models can suggest content that is likely to be of interest, based on similarities to other content that the viewer has enjoyed in the past. This can help viewers discover new content that they might not have otherwise found, and can also help media companies to retain viewers and increase engagement.
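As a rough illustration of similarity-based recommendation, the sketch below scores unseen titles by the cosine similarity between their tags and a profile built from the viewer's history. The catalog, the tag sets, and the `recommend` helper are assumptions made for this example. A production system would likely use learned embeddings and many more signals.

```python
from collections import Counter
from math import sqrt

# Hypothetical catalog mapping titles to descriptive tags (illustrative data only).
CATALOG = {
    "Spy Thriller A": ["action", "spy", "chase", "landmark"],
    "Spy Thriller B": ["action", "spy", "explosion"],
    "Romantic Comedy": ["romance", "comedy", "landmark"],
    "Nature Documentary": ["documentary", "nature"],
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two tag-count vectors."""
    dot = sum(a[tag] * b[tag] for tag in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(history, catalog=CATALOG, top_n=2):
    """Rank titles the viewer has not seen by similarity to their history."""
    # The viewer profile is the combined tag counts of everything watched.
    profile = Counter(tag for title in history for tag in catalog[title])
    scores = {
        title: cosine(profile, Counter(tags))
        for title, tags in catalog.items()
        if title not in history
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    print(recommend(["Spy Thriller A"]))
```

With this toy data, a viewer who watched "Spy Thriller A" is offered "Spy Thriller B" first, because the two share the most tags.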
During my April 17 (3:00 – 3:20 PM PT) presentation at NAB’s Broadcast Engineering and IT Conference, I will demonstrate and explain how our Quickplay team has used LLMs to develop an application that enables users to search for and discover celebrities, objects, genres, and landmarks. One of our examples is https://www.imdb.com/title/tt1229238/, a thriller with Tom Cruise, a rich international supporting cast, and readily identifiable landmarks, as well as spy intrigue, prison escapes, chase scenes, explosions (of course!), and more.
Much like the intricate plot and seemingly impossible tasks of the “Mission Impossible” franchise, the field of Generative AI continues to push boundaries and exceed expectations. With each new breakthrough, we move one step closer to unlocking the full potential of artificial intelligence, and the possibilities are truly limitless.
See you on Monday, April 17 in Room W216-W218 at NAB!