# Meeting the Hackathon Requirements: Creative Storyteller
**Category:** Creative Storyteller
**Focus:** Multimodal Storytelling with Interleaved Output
## How "KidStory" Fits the Challenge
KidStory (ai.kidstory.app) is a purpose-built Agentic Storytelling Director that embodies the "Creative Storyteller" category. It doesn't just generate text — it acts as a full-stack creative studio, autonomously weaving together narration, illustration, voice interaction, and interleaved AI output into a single cohesive experience for children aged 3-10.
## The Interleaved Output Implementation
The hackathon requires: "an agent that seamlessly weaves together text, images, audio, and video in a single, fluid output stream."
### True Interleaved Output: Story Generation with Optimized Quiz Model Selection
The application demonstrates sophisticated model orchestration by using the right model for each task:
#### 1. Interleaved Story Generation (Core Hackathon Requirement)
Unlike traditional apps that generate text then images separately, KidStory generates the entire storybook in a single multimodal stream using Gemini's native interleaved output.
- **The Code:** `/app/api/generate-story/route.ts`
- **The Model:** `gemini-2.5-flash-image` via Vertex AI
- **The Flow:** Gemini streams the story JSON chunks interleaved with full image data for every page.
- **UX Impact:** The UI displays each page's illustration the moment it's generated by Gemini, even while the rest of the story's text is still being written. This creates a "magical painting" effect.
#### 2. Optimized Magic Quiz (Model Orchestration)
The Magic Quiz demonstrates intelligent model selection by using different Gemini models for different tasks:
- **The Code:** `/app/api/live-quiz/route.ts`
- **Question Generation:** `gemini-2.5-flash` (text-only) for fast, cost-effective quiz generation
- **Audio Narration:** `gemini-2.5-flash-preview-tts` for spoken questions
- **Multimodal Feedback:** The agent suggests sound effects (e.g., `[sparkle]`) in the text stream, which the UI triggers as audio
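As a minimal sketch of the feedback mechanism described above, a small parser could strip sound-effect tags like `[sparkle]` out of each streamed text chunk and hand them to the audio layer. The helper and type names here are illustrative, not the app's actual API:

```typescript
// Split a streamed text chunk into plain narration text and sound-effect cues.
// The [sparkle]-style tag syntax follows the convention described above;
// parseSoundEffects is a hypothetical helper, not KidStory's real code.
interface ParsedChunk {
  text: string;      // narration text with cue tags stripped
  effects: string[]; // sound-effect names to trigger, in order of appearance
}

function parseSoundEffects(chunk: string): ParsedChunk {
  const effects: string[] = [];
  const text = chunk
    .replace(/\[([a-z-]+)\]/g, (_match, name: string) => {
      effects.push(name); // collect the cue name
      return "";          // remove the tag from the spoken text
    })
    .replace(/\s{2,}/g, " ") // collapse the gaps left by removed tags
    .trim();
  return { text, effects };
}
```

The UI would then play each named effect as the cleaned text is rendered, keeping the spoken feedback and celebratory audio in sync.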
**Technical Evidence (Story Generation):**

```typescript
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image", // Interleaved TEXT+IMAGE model via Vertex AI
  contents: prompt,
  config: {
    responseModalities: ["TEXT", "IMAGE"], // Core hackathon requirement
  },
});
```

**What Happens in Story Generation:**
- Gemini generates structured story JSON
- Gemini generates matching illustrations for each page
- Both arrive in the same interleaved stream
- The frontend displays visuals instantly as they arrive
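The routing step in that list can be sketched as a pure function over the response parts. The `text` / `inlineData` part shape mirrors the `@google/genai` response format; the dispatch callback is a hypothetical stand-in for the real page renderer:

```typescript
// Walk the interleaved response parts in order, accumulating text into the
// story JSON buffer and routing inline image bytes to the renderer.
// `Part` mirrors the @google/genai part shape; `onImage` is an assumed
// callback, not KidStory's actual handler.
interface Part {
  text?: string;
  inlineData?: { mimeType: string; data: string }; // base64-encoded image
}

function splitInterleaved(
  parts: Part[],
  onImage: (mimeType: string, base64: string) => void,
): string {
  let storyJson = "";
  for (const part of parts) {
    if (part.text !== undefined) {
      storyJson += part.text; // text chunks concatenate into the story JSON
    } else if (part.inlineData) {
      onImage(part.inlineData.mimeType, part.inlineData.data); // show page art
    }
  }
  return storyJson;
}
```

Because images are dispatched the moment their part is reached, each illustration can appear before the remaining text parts have finished streaming.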
**The Full Multimodal Loop:**
- Text: Quiz question and options (from text-only model)
- Audio: Spoken narration of the question (Gemini 2.5 Flash TTS)
- Voice Input: Child speaks their answer (Web Speech API)
- Feedback: Encouragement/correction with sound effect suggestions
This creates a fluid, conversational experience where a child hears a question read aloud, speaks their answer, and gets immediate audio feedback with celebratory sound effects.
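One piece of that loop, checking a spoken answer against the expected option, could look like the sketch below. The normalization handles the casing and punctuation a Web Speech API transcript typically carries; the feedback strings and cue tags are illustrative, not the app's exact copy:

```typescript
// Compare the child's spoken answer (a Web Speech API transcript) against
// the expected quiz option, ignoring case and stray punctuation.
// gradeSpokenAnswer and its feedback strings are assumptions for this
// sketch, not KidStory's real implementation.
function gradeSpokenAnswer(transcript: string, expected: string): string {
  const norm = (s: string) =>
    s.toLowerCase().replace(/[^a-z0-9 ]/g, "").trim();
  return norm(transcript) === norm(expected)
    ? "That's right! [sparkle]"
    : `Good try! The answer was "${expected}". [try-again]`;
}
```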
## The Story Generation Stream
When a story is created, our Agentic Orchestrator coordinates specialized Gemini models to produce a multimedia storybook:
- **Core Generation (The Creative Director):** `gemini-2.5-flash-image` generates the story JSON and ALL illustrations in a single interleaved stream via Vertex AI.
- **Narration (The Voice):** `gemini-2.5-flash-preview-tts` generates expressive narration for each page in parallel.
The result is a complete storybook where text, images, and audio are woven together page-by-page.
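The per-page parallelism mentioned above reduces to a `Promise.all` over the pages. In this sketch, `synthesize` is a placeholder for the `gemini-2.5-flash-preview-tts` call, injected so the orchestration pattern itself is runnable:

```typescript
// Generate narration for every page concurrently rather than sequentially,
// preserving page order in the returned array. `synthesize` stands in for
// the real TTS request; it is an assumed parameter, not KidStory's API.
async function narrateAllPages(
  pages: string[],
  synthesize: (text: string) => Promise<Uint8Array>,
): Promise<Uint8Array[]> {
  // Promise.all resolves in input order, so audio[i] belongs to pages[i]
  // even though the requests complete in any order.
  return Promise.all(pages.map((text) => synthesize(text)));
}
```

Firing all TTS requests at once keeps total narration latency close to the slowest single page instead of the sum of all pages.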
## Mandatory Tech Checklist
| Requirement | How We Meet It | Evidence |
|---|---|---|
| Gemini Model | Three specialized Gemini models via Vertex AI | `gemini-2.5-flash-image` (interleaved story), `gemini-2.5-flash-preview-tts` (narration), `gemini-2.5-flash` (quiz) |
| Google GenAI SDK | Vertex AI integration | `@google/genai` with `vertexai: true` configuration |
| Interleaved/Mixed Output | Native `responseModalities: ["TEXT", "IMAGE"]` | `/app/api/generate-story/route.ts` — single call returns story JSON + page illustrations |
| Hosted on Google Cloud | Google Cloud Run deployment | Standalone Next.js container on Cloud Run |
| Google Cloud Services | Vertex AI + Firebase + GCS | Vertex AI (models), Firestore (database), Firebase Auth, Cloud Storage (media) |
| Public Code Repository | GitHub | Repository with setup instructions in README |
| Architecture Diagram | Mermaid diagrams | See `document/all_diagram.md` and README |
| Demo Video | Under 4 minutes | Shows story creation, reading, and Magic Quiz interaction |
