In this video, we look at the new Interactions API from the Gemini API team and how you can use it to build applications and carry out various tasks with both Gemini models and agents.
Blog: https://blog.google/technology/develo...
Colab: https://dripl.ink/mY4vF
For more tutorials on using LLMs and building agents, check out my Patreon
Patreon: / samwitteveen
Twitter: https://x.com/Sam_Witteveen
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes
👨‍💻 GitHub:
https://github.com/samwit/llm-tutorials
⏱️ Timestamps:
00:00 Intro
00:09 Google Gemini Interactions API Blog
00:15 Gemini AI Interactions API Docs
00:22 History of LLM APIs
03:40 Core Capabilities of Interactions API's generateContent
04:28 Build with Gemini Deep Research Blog
06:18 Demo
07:57 Stream Responses Back
08:10 Reasoning or Thinking
09:27 Chat Style with State Server Side
11:52 Retrieving Past Interactions
12:46 Multimodal Understanding
13:20 Audio Understanding, Video Understanding, and PDF Understanding
13:52 Multimodal Generation: Images, Audio and Video
15:04 Structured Outputs
16:10 Tools and Function Calling
16:51 Built-in Tools: Grounding with Google Search
17:18 Built-in Tools: Code Execution
17:45 Built-in Tools: URL Context
18:18 Remote MCP
19:15 Using Gemini Agents