Meeting the Hackathon Requirements: Creative Storyteller

Category: Creative Storyteller

Focus: Multimodal Storytelling with Interleaved Output


How "KidStory" Fits the Challenge

KidStory (ai.kidstory.app) is a purpose-built Agentic Storytelling Director that embodies the "Creative Storyteller" category. It doesn't just generate text — it acts as a full-stack creative studio, autonomously weaving together narration, illustration, voice interaction, and interleaved AI output into a single cohesive experience for children aged 3-10.


The Interleaved Output Implementation

The hackathon requires: "an agent that seamlessly weaves together text, images, audio, and video in a single, fluid output stream."

True Interleaved Output: Story Generation with Optimized Quiz Model Selection

The application demonstrates sophisticated model orchestration by using the right model for each task:

1. Interleaved Story Generation (Core Hackathon Requirement)

Unlike traditional apps that generate text then images separately, KidStory generates the entire storybook in a single multimodal stream using Gemini's native interleaved output.

  • The Code: /app/api/generate-story/route.ts
  • The Model: gemini-2.5-flash-image via Vertex AI
  • The Flow: Gemini streams story JSON chunks interleaved with full image data for every page.
  • UX Impact: The UI displays each page's illustration the moment it's generated by Gemini, even while the rest of the story's text is still being written. This creates a "magical painting" effect.

2. Optimized Magic Quiz (Model Orchestration)

The Magic Quiz demonstrates intelligent model selection by using different Gemini models for different tasks:

  • The Code: /app/api/live-quiz/route.ts
  • Question Generation: gemini-2.5-flash (text-only) for fast, cost-effective quiz generation
  • Audio Narration: gemini-2.5-flash-preview-tts for spoken questions
  • Multimodal Feedback: The agent suggests sound effects (e.g., [sparkle]) in the text stream, which the UI triggers as audio
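The model split above can be expressed as a small routing helper. This is an illustrative sketch, not the app's actual code; only the model IDs come from this document, and the task names are assumptions:

```typescript
// Hypothetical helper mirroring the Magic Quiz model split described above.
// Task names are illustrative; the model IDs are the ones this document names.
type QuizTask = "question-generation" | "audio-narration";

function modelForTask(task: QuizTask): string {
  switch (task) {
    case "question-generation":
      return "gemini-2.5-flash"; // text-only: fast, cost-effective quiz generation
    case "audio-narration":
      return "gemini-2.5-flash-preview-tts"; // spoken questions
  }
}
```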

Technical Evidence (Story Generation):

```typescript
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image", // Interleaved TEXT+IMAGE model via Vertex AI
  contents: prompt,
  config: {
    responseModalities: ["TEXT", "IMAGE"], // Core hackathon requirement
  },
});
```

What Happens in Story Generation:

  1. Gemini generates structured story JSON
  2. Gemini generates matching illustrations for each page
  3. Both arrive in the same interleaved stream
  4. The frontend displays visuals instantly as they arrive
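The four steps above hinge on routing each part of the mixed stream to the right handler. A minimal sketch, assuming the `@google/genai` part shape (`text` vs. `inlineData`); the handler callbacks are hypothetical, not the app's actual code:

```typescript
// Sketch: split interleaved response parts into text chunks and images.
// The part shape follows @google/genai; the callbacks are illustrative.
interface Part {
  text?: string;
  inlineData?: { mimeType: string; data: string }; // base64-encoded image bytes
}

function routeParts(
  parts: Part[],
  onText: (chunk: string) => void,
  onImage: (mimeType: string, base64: string) => void
): void {
  for (const part of parts) {
    if (part.text) {
      onText(part.text); // accumulate story JSON as it streams in
    } else if (part.inlineData) {
      // Display the illustration the moment it arrives ("magical painting" effect)
      onImage(part.inlineData.mimeType, part.inlineData.data);
    }
  }
}
```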

The Full Multimodal Loop:

  • Text: Quiz question and options (from text-only model)
  • Audio: Spoken narration of the question (Gemini 2.5 Flash TTS)
  • Voice Input: Child speaks their answer (Web Speech API)
  • Feedback: Encouragement/correction with sound effect suggestions

This creates a fluid, conversational experience where a child hears a question read aloud, speaks their answer, and gets immediate audio feedback with celebratory sound effects.
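The `[sparkle]`-style cues can be pulled out of the feedback text with a small parser. A sketch under the assumption that effects are single bracketed words; the function itself is illustrative:

```typescript
// Sketch: extract bracketed sound-effect cues (e.g. [sparkle]) from feedback
// text, returning the clean text plus the effects for the UI to play.
// The bracket convention comes from this document; the function is hypothetical.
function splitFeedback(raw: string): { text: string; effects: string[] } {
  const effects: string[] = [];
  const text = raw
    .replace(/\[(\w+)\]/g, (_match, name: string) => {
      effects.push(name); // queue the effect for the audio layer
      return "";          // strip the marker from the displayed text
    })
    .replace(/\s{2,}/g, " ")
    .trim();
  return { text, effects };
}
```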

The Story Generation Stream

When a story is created, our Agentic Orchestrator coordinates specialized Gemini models to produce a multimedia storybook:

  • Core Generation (The Creative Director): gemini-2.5-flash-image generates the story JSON and ALL illustrations in a single interleaved stream via Vertex AI.
  • Narration (The Voice): gemini-2.5-flash-preview-tts generates expressive narration for each page in parallel.

The result is a complete storybook where text, images, and audio are woven together page-by-page.
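The per-page narration described above can be sketched with `Promise.all`. The TTS call is abstracted behind a caller-supplied function, since this is a pattern sketch rather than the app's actual code:

```typescript
// Sketch: narrate all pages concurrently. `ttsFn` stands in for a call to
// gemini-2.5-flash-preview-tts and is injected to keep the sketch self-contained.
async function narratePages(
  pages: string[],
  ttsFn: (text: string) => Promise<Uint8Array>
): Promise<Uint8Array[]> {
  // Fire one TTS request per page and await them together,
  // instead of narrating pages one after another.
  return Promise.all(pages.map((pageText) => ttsFn(pageText)));
}
```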


Mandatory Tech Checklist

| Requirement | How We Meet It | Evidence |
| --- | --- | --- |
| Gemini Model | Three specialized Gemini models via Vertex AI | gemini-2.5-flash-image (interleaved story), gemini-2.5-flash-preview-tts (narration), gemini-2.5-flash (quiz) |
| Google GenAI SDK | Vertex AI integration | @google/genai with vertexai: true configuration |
| Interleaved/Mixed Output | Native responseModalities: ["TEXT", "IMAGE"] | /app/api/generate-story/route.ts — single call returns story JSON + page illustrations |
| Hosted on Google Cloud | Google Cloud Run deployment | Standalone Next.js container on Cloud Run |
| Google Cloud Services | Vertex AI + Firebase + GCS | Vertex AI (models), Firestore (database), Firebase Auth, Cloud Storage (media) |
| Public Code Repository | GitHub | Repository with setup instructions in README |
| Architecture Diagram | Mermaid diagrams | See document/all_diagram.md and README |
| Demo Video | Under 4 minutes | Shows story creation, reading, and Magic Quiz interaction |
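For reference, wiring @google/genai to Vertex AI looks roughly like this; the project ID and region below are placeholders, not this app's real values:

```typescript
import { GoogleGenAI } from "@google/genai";

// Sketch of the vertexai: true configuration named in the checklist.
// Project and location are placeholders.
const ai = new GoogleGenAI({
  vertexai: true,                 // route calls through Vertex AI
  project: "your-gcp-project-id", // placeholder
  location: "us-central1",        // placeholder region
});
```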

Released under the MIT License.