Announcement · May 20, 2026 · 4 min read

Poly Goes Multi-Modal: Voice, Vision, and Canvas

PolyChat is no longer just text. Starting today, you can speak to Poly, show it images, and arrange your ideas on an infinite canvas — all within the same workspace. We've added three major capabilities that fundamentally expand what you can do with AI.

Voice brings real-time speech-to-text and text-to-speech to every conversation. Choose from dozens of AI voices, set your preferred language, and have natural back-and-forth conversations with AI agents. Voice calls support live transcription, interruption handling, and context-aware responses. Perfect for brainstorming sessions, language practice, or hands-free productivity.

Vision lets you share images directly in chat. Poly understands what's in the image — text, objects, charts, code — and can analyze, describe, or build on it. Paste a screenshot of a UI bug and ask Poly to write the fix. Show it a whiteboard sketch and get a structured document. Upload a chart and ask for insights.

The Canvas is where everything comes together. It's an infinite spatial workspace where you can place chat streams, documents, images, code editors, and more — arrange them how you think, not how tabs dictate. Draw connections between related items, group by project, zoom out for the big picture. Your AI workspace, finally spatial.
