In this video, I look at Microsoft's recent release of OmniParser, a tool that lets agents read the screens of various UIs and produce output that an LLM can use to interact with those screens.
For more tutorials on using LLMs and building agents, check out my Patreon.
Patreon: / samwitteveen
Twitter: / sam_witteveen
OmniParser: https://microsoft.github.io/OmniParser/
Colab: https://drp.li/rsuVh
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes
👨‍💻 GitHub:
https://github.com/samwit/langchain-t... (updated)
https://github.com/samwit/llm-tutorials
⏱️ Timestamps: