Meeting the Hackathon Requirements: Creative Storyteller

Category: Creative Storyteller

Focus: Multimodal Storytelling with Interleaved Output


How "KidStory" Fits the Challenge

KidStory (ai.kidstory.app) is a purpose-built Agentic Storytelling Director that embodies the "Creative Storyteller" category. It doesn't just generate text — it acts as a full-stack creative studio, autonomously weaving together narration, illustration, voice interaction, and interleaved AI output into a single cohesive experience for children aged 3-10.


The Interleaved Output Implementation

The hackathon requires: "an agent that seamlessly weaves together text, images, audio, and video in a single, fluid output stream."

True Interleaved Output: Story Generation with Optimized Quiz Model Selection

The application demonstrates sophisticated model orchestration by using the right model for each task:

1. Interleaved Story Generation (Core Hackathon Requirement)

Unlike traditional apps that generate text then images separately, KidStory generates the entire storybook in a single multimodal stream using Gemini's native interleaved output.

  • The Code: /app/api/generate-story/route.ts
  • The Model: gemini-2.5-flash-image via Vertex AI
  • The Flow: Gemini streams story JSON chunks interleaved with full image data for every page.
  • UX Impact: The UI displays each page's illustration the moment it's generated by Gemini, even while the rest of the story's text is still being written. This creates a "magical painting" effect.

2. Optimized Magic Quiz (Model Orchestration)

The Magic Quiz demonstrates intelligent model selection by using different Gemini models for different tasks:

  • The Code: /app/api/live-quiz/route.ts
  • Question Generation: gemini-2.5-flash (text-only) for fast, cost-effective quiz generation
  • Audio Narration: gemini-2.5-flash-preview-tts for spoken questions
  • Multimodal Feedback: The agent suggests sound effects (e.g., [sparkle]) in the text stream, which the UI triggers as audio
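The model split above can be expressed as a small routing helper. This is an illustrative sketch, not the app's actual code; only the model IDs come from this document, and the task names are assumptions:

```typescript
// Hypothetical helper mirroring the Magic Quiz model split described above.
// Task names are illustrative; the model IDs are the ones this document names.
type QuizTask = "question-generation" | "audio-narration";

function modelForTask(task: QuizTask): string {
  switch (task) {
    case "question-generation":
      return "gemini-2.5-flash"; // text-only: fast, cost-effective quiz generation
    case "audio-narration":
      return "gemini-2.5-flash-preview-tts"; // spoken questions
  }
}
```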

Technical Evidence (Story Generation):

```typescript
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-image", // Interleaved TEXT+IMAGE model via Vertex AI
  contents: prompt,
  config: {
    responseModalities: ["TEXT", "IMAGE"], // Core hackathon requirement
  },
});
```

What Happens in Story Generation:

  1. Gemini generates structured story JSON
  2. Gemini generates matching illustrations for each page
  3. Both arrive in the same interleaved stream
  4. The frontend displays visuals instantly as they arrive
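The four steps above hinge on routing each part of the mixed stream to the right handler. A minimal sketch, assuming the `@google/genai` part shape (`text` vs. `inlineData`); the handler callbacks are hypothetical, not the app's actual code:

```typescript
// Sketch: split interleaved response parts into text chunks and images.
// The part shape follows @google/genai; the callbacks are illustrative.
interface Part {
  text?: string;
  inlineData?: { mimeType: string; data: string }; // base64-encoded image bytes
}

function routeParts(
  parts: Part[],
  onText: (chunk: string) => void,
  onImage: (mimeType: string, base64: string) => void
): void {
  for (const part of parts) {
    if (part.text) {
      onText(part.text); // accumulate story JSON as it streams in
    } else if (part.inlineData) {
      // Display the illustration the moment it arrives ("magical painting" effect)
      onImage(part.inlineData.mimeType, part.inlineData.data);
    }
  }
}
```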

The Full Multimodal Loop:

  • Text: Quiz question and options (from text-only model)
  • Audio: Spoken narration of the question (Gemini 2.5 Flash TTS)
  • Voice Input: Child speaks their answer (Web Speech API)
  • Feedback: Encouragement/correction with sound effect suggestions

This creates a fluid, conversational experience where a child hears a question read aloud, speaks their answer, and gets immediate audio feedback with celebratory sound effects.
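The `[sparkle]`-style cues can be pulled out of the feedback text with a small parser. A sketch under the assumption that effects are single bracketed words; the function itself is illustrative:

```typescript
// Sketch: extract bracketed sound-effect cues (e.g. [sparkle]) from feedback
// text, returning the clean text plus the effects for the UI to play.
// The bracket convention comes from this document; the function is hypothetical.
function splitFeedback(raw: string): { text: string; effects: string[] } {
  const effects: string[] = [];
  const text = raw
    .replace(/\[(\w+)\]/g, (_match, name: string) => {
      effects.push(name); // queue the effect for the audio layer
      return "";          // strip the marker from the displayed text
    })
    .replace(/\s{2,}/g, " ")
    .trim();
  return { text, effects };
}
```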

The Story Generation Stream

When a story is created, our Agentic Orchestrator coordinates specialized Gemini models to produce a multimedia storybook:

  • Core Generation (The Creative Director): gemini-2.5-flash-image generates the story JSON and ALL illustrations in a single interleaved stream via Vertex AI.
  • Narration (The Voice): gemini-2.5-flash-preview-tts generates expressive narration for each page in parallel.

The result is a complete storybook where text, images, and audio are woven together page-by-page.
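The per-page narration described above can be sketched with `Promise.all`. The TTS call is abstracted behind a caller-supplied function, since this is a pattern sketch rather than the app's actual code:

```typescript
// Sketch: narrate all pages concurrently. `ttsFn` stands in for a call to
// gemini-2.5-flash-preview-tts and is injected to keep the sketch self-contained.
async function narratePages(
  pages: string[],
  ttsFn: (text: string) => Promise<Uint8Array>
): Promise<Uint8Array[]> {
  // Fire one TTS request per page and await them together,
  // instead of narrating pages one after another.
  return Promise.all(pages.map((pageText) => ttsFn(pageText)));
}
```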


Mandatory Tech Checklist

| Requirement | How We Meet It | Evidence |
| --- | --- | --- |
| Gemini Model | Three specialized Gemini models via Vertex AI | gemini-2.5-flash-image (interleaved story), gemini-2.5-flash-preview-tts (narration), gemini-2.5-flash (quiz) |
| Google GenAI SDK | Vertex AI integration | @google/genai with vertexai: true configuration |
| Interleaved/Mixed Output | Native responseModalities: ["TEXT", "IMAGE"] | /app/api/generate-story/route.ts — single call returns story JSON + page illustrations |
| Hosted on Google Cloud | Google Cloud Run deployment | Standalone Next.js container on Cloud Run |
| Google Cloud Services | Vertex AI + Firebase + GCS | Vertex AI (models), Firestore (database), Firebase Auth, Cloud Storage (media) |
| Public Code Repository | GitHub | Repository with setup instructions in README |
| Architecture Diagram | Mermaid diagrams | See document/all_diagram.md and README |
| Demo Video | Under 4 minutes | Shows story creation, reading, and Magic Quiz interaction |
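For reference, wiring @google/genai to Vertex AI looks roughly like this; the project ID and region below are placeholders, not this app's real values:

```typescript
import { GoogleGenAI } from "@google/genai";

// Sketch of the vertexai: true configuration named in the checklist.
// Project and location are placeholders.
const ai = new GoogleGenAI({
  vertexai: true,                 // route calls through Vertex AI
  project: "your-gcp-project-id", // placeholder
  location: "us-central1",        // placeholder region
});
```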

Released under the MIT License.