KidStory: Storybook for Kids

AI-Powered Interactive Storybooks for Children
Speak a story. Watch it come to life.

Overview

KidStory is an AI-powered platform that transforms a child's spoken or typed story idea into a fully illustrated, narrated, and interactive storybook. The app orchestrates multiple AI models to create a cohesive multimedia experience — generating story text, watercolor illustrations, expressive narration, and interactive quizzes.

Story text and illustrations are generated together in a single interleaved AI stream, so pictures appear while the story is still being written.

For Parents

For Kids

Let your child speak their own story idea and watch the AI bring it to life with pictures and narration.

Key Features

Voice & Text Input: Children can speak or type their story ideas
AI Story Generation: Creates complete stories with 4-6 pages based on the child's input
Interleaved Illustration Generation: Story text and page illustrations generated together in one AI stream
Character Consistency: Upload character photos to maintain appearance across all illustrations
Multiple Narrator Voices: Choose from 3 AI narrator personalities with different voices
Interactive Storybooks: Page-turning interface with audio narration for each page
Magic Quiz: Post-story interactive quiz with voice input and AI-generated questions
Progress Tracking: Real-time visual feedback during story generation
Child-Safe Content: Built-in guardrails keep content age-appropriate for children aged 3-10
User Library: Save and revisit created stories

Technology Stack

Frontend

Framework: Next.js 16.1.6 (App Router)
UI Library: React 19.2.3
Language: TypeScript 5
Styling: Tailwind CSS 4
Animations: Framer Motion 12.35.0
State Management: @tanstack/react-query 5.90.21

Backend & AI

AI Models: Google Gemini via Vertex AI
- gemini-2.5-flash-image (interleaved story + image generation)
- gemini-2.5-flash-preview-tts (text-to-speech narration)
- gemini-2.5-flash (text-only quiz generation)
AI SDKs:
- @google/genai 1.44.0+ (Vertex AI integration with vertexai: true)
Single SDK Architecture: Uses only @google/genai
Database: Firebase Firestore
Authentication: Firebase Auth (Google OAuth)
Storage: Google Cloud Storage (images & audio files)
Cloud Services: Google Cloud Platform

Development Tools

Package Manager: npm
Linting: ESLint 9
Build Tool: Next.js
Containerization: Docker

System Architecture

Click here to view the full diagram in a new tab.

AI Generation Pipeline

1. User Input → Voice/Text Story Idea
2. Story Generation → Gemini 2.5 Flash Image (interleaved text + images)
3. Parallel Processing →
   ├─ Images → Generated via interleaved output
   └─ Audio Narration → Gemini 2.5 Flash TTS
4. Assembly → Combine images + audio + text
5. Storage → Save to Firestore + GCS
6. Delivery → Interactive storybook with quizzes
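
The parallel step in the pipeline above can be sketched as follows. This is a minimal illustration, not the app's actual code: the `StoryPage` and `AssembledPage` shapes and the injected `narrate` function are hypothetical stand-ins for the real page type and the TTS call.

```typescript
// Hypothetical shapes; the app's real types may differ.
interface StoryPage {
  text: string;
  imageUrl: string; // assumed to come from the interleaved story call
}

interface AssembledPage extends StoryPage {
  audioUrl: string;
}

// narrate() stands in for a request to the TTS route. It is injected so the
// sketch does not depend on any particular SDK.
async function assembleStory(
  pages: StoryPage[],
  narrate: (text: string) => Promise<string>,
): Promise<AssembledPage[]> {
  // Start narration for every page at once, then wait for all of them.
  const audioUrls = await Promise.all(pages.map((p) => narrate(p.text)));
  // Promise.all preserves input order, so index i still refers to page i.
  return pages.map((p, i) => ({ ...p, audioUrl: audioUrls[i] }));
}
```

Because `Promise.all` rejects as soon as one narration fails, a real implementation might prefer `Promise.allSettled` to keep the pages that succeeded.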

Key Technical Components

Frontend Hooks:

useStoryGenerator: Manages streaming story generation with real-time updates
useStoryAssembler: Coordinates parallel image/audio generation and assembly
useVoiceInput: Handles speech recognition for story input and quiz answers
useUserStories: Manages Firestore story library operations
useAuth: Firebase authentication wrapper

API Routes:

/api/generate-story: Streams interleaved story JSON + images from Gemini
/api/generate-image: Creates individual page illustrations with character reference support
/api/generate-audio: Generates speech using Gemini TTS with WAV output
/api/live-quiz: Creates interactive quizzes with text-only questions and TTS narration
/api/save-story: Persists completed stories to Firestore
/api/save-quiz-score: Records quiz results
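
The WAV output of /api/generate-audio implies wrapping the model's audio in a WAV container. The sketch below assumes the TTS model returns raw 16-bit little-endian mono PCM at 24 kHz; that format, and the `pcmToWav` helper name, are assumptions for illustration rather than details confirmed by this project.

```typescript
// Wrap raw little-endian PCM samples in a 44-byte RIFF/WAVE header.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000, // assumed TTS output rate
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);

  const writeAscii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) out[offset + i] = s.charCodeAt(i);
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // total size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // size of the sample data
  out.set(pcm, 44);
  return out;
}
```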

Core Components:

StoryBook: Main story reader with page-turning animations
LiveQuizModal: Interactive quiz with voice input and AI feedback
CreateStoryPage: Story creation wizard with voice/text input
Dashboard: User story library and statistics
StoryPage: Individual story page with image and text

Getting Started

Prerequisites

Node.js 20+
npm or yarn
Google Cloud project with the Vertex AI API enabled (for Gemini access)
Firebase Project with Firestore and Authentication
Google Cloud Storage bucket

Installation

Clone and navigate to project
```bash
cd storybook-for-kids-app
```
Install dependencies
```bash
npm install
```

Configure environment variables: create .env.local in the project root.

```env
# Firebase Client Configuration (from Firebase Console)
NEXT_PUBLIC_FIREBASE_API_KEY=your_firebase_api_key
NEXT_PUBLIC_FIREBASE_AUTH_DOMAIN=your_project.firebaseapp.com
NEXT_PUBLIC_FIREBASE_PROJECT_ID=your_project_id
NEXT_PUBLIC_FIREBASE_STORAGE_BUCKET=your_project.appspot.com
NEXT_PUBLIC_FIREBASE_MESSAGING_SENDER_ID=your_sender_id
NEXT_PUBLIC_FIREBASE_APP_ID=your_app_id
NEXT_PUBLIC_FIREBASE_MEASUREMENT_ID=your_measurement_id

# Firebase Admin (Service Account)
FIREBASE_CLIENT_EMAIL=service-account@project.iam.gserviceaccount.com
GOOGLE_CLOUD_PROJECT=your_project_id

# Google Cloud Storage
GCS_BUCKET_NAME=storybook-for-kids-media

# Gemini Models (via Vertex AI)
GEMINI_IMAGE_MODEL=gemini-2.5-flash-image
GEMINI_TTS_MODEL=gemini-2.5-flash-preview-tts
GEMINI_QUIZ_MODEL=gemini-2.5-flash

# App Configuration
NEXTAUTH_URL=http://localhost:3000
NEXTAUTH_SECRET=your_secret_key_here
```

Set up Firebase
- Enable Google Sign-In in Firebase Authentication
- Create Firestore database in native mode
- Set up storage rules for your GCS bucket
Set up Google Cloud
- Enable Vertex AI API in Google Cloud Console
- Create service account with appropriate permissions
- Set up Application Default Credentials for Vertex AI access
Run the development server
```bash
npm run dev
```
Open http://localhost:3000 in your browser

Development

```bash
# Development server
npm run dev

# Build for production
npm run build

# Start production server
npm start

# Lint code
npm run lint
```

Deployment (NO DOCKER NEEDED!)

The app deploys through Google Cloud Build, which builds it remotely and ships it to Cloud Run, so no local Docker installation is needed:

```bash
# Deploy with ONE command (no Docker needed!)
gcloud builds submit --config=cloudbuild.yaml
```

What happens:

✅ Your code uploads to Google Cloud
✅ Dependencies installed with npm ci
✅ Next.js app built with npm run build
✅ Deployed directly to Cloud Run
✅ Environment variables set automatically
✅ HTTPS setup automatically

After deployment:

```bash
# Get your app URL
gcloud run services describe storybook-for-kids --format='value(status.url)'

# Check logs
gcloud run services logs read storybook-for-kids
```

Project Structure

storybook-for-kids-app/
├── app/                          # Next.js App Router
│   ├── (auth)/                   # Authentication routes
│   │   └── login/                # Login page
│   ├── (dashboard)/              # Dashboard routes
│   │   └── dashboard/            # User dashboard
│   ├── api/                      # API routes
│   │   ├── generate-audio/       # TTS narration
│   │   ├── generate-image/       # Image generation
│   │   ├── generate-story/       # Story generation (interleaved)
│   │   ├── live-quiz/            # Interactive quiz
│   │   ├── save-quiz-score/      # Quiz results
│   │   └── save-story/           # Story persistence
│   ├── create/                   # Story creation page
│   ├── story/[id]/               # Story viewer
│   ├── layout.tsx                # Root layout
│   ├── page.tsx                  # Home page
│   └── globals.css               # Global styles
├── components/                   # React components
│   ├── auth/                     # Authentication components
│   ├── providers/                # Context providers
│   ├── story/                    # Story-related components
│   │   ├── LiveQuizModal.tsx     # Interactive quiz
│   │   ├── MicButton.tsx         # Voice input button
│   │   ├── PageTurn.tsx          # Page turning animation
│   │   ├── StoryBook.tsx         # Main story viewer
│   │   ├── StoryCard.tsx         # Story library card
│   │   └── StoryPage.tsx         # Individual story page
│   └── ui/                       # UI components
│       ├── AudioPlayer.tsx       # Audio playback
│       ├── LoadingSparkles.tsx   # Loading animation
│       └── Toast.tsx             # Notification system
├── lib/                          # Utility libraries
│   ├── firebase/                 # Firebase configuration
│   │   ├── admin.ts              # Server-side Firebase
│   │   └── client.ts             # Client-side Firebase
│   ├── gcp/                      # Google Cloud services
│   │   ├── gemini.ts             # Gemini model configuration
│   │   ├── imagen.ts             # Image generation
│   │   ├── storage.ts            # Cloud Storage operations
│   │   └── tts.ts                # Text-to-speech synthesis
│   ├── hooks/                    # Custom React hooks
│   │   ├── useAuth.ts            # Authentication hook
│   │   ├── useStoryAssembler.ts  # Story assembly coordination
│   │   ├── useStoryGenerator.ts  # Story generation state
│   │   ├── useUserStories.ts     # Story library management
│   │   └── useVoiceInput.ts      # Speech recognition
│   └── utils/                    # Utility functions
│       ├── imageCompressor.ts    # Image compression
│       └── utils.ts              # General utilities
├── public/                       # Static assets
│   ├── audio/                    # Audio samples
│   ├── sounds/                   # Sound effects
│   └── animation/                # Lottie animations
├── types/                        # TypeScript definitions
│   ├── index.ts                  # Main type definitions
│   └── speech.d.ts               # Speech recognition types
├── document/                     # Documentation
├── middleware.ts                 # Next.js middleware
├── next.config.mjs               # Next.js configuration
├── package.json                  # Dependencies
├── Dockerfile                    # Container configuration
└── cloudbuild.yaml              # CI/CD configuration

Documentation

For judges and reviewers, we provide comprehensive documentation covering every aspect of the project:

| Document | Description |
| --- | --- |
| How This App Works | Agentic workflow, model orchestration, and step-by-step explanation of story creation and quiz flows |
| System Architecture & Diagrams | Complete architecture diagram, sequence diagrams, state machines, and data flow visualizations |
| Database Schema | Firestore data models, collection structure, and Cloud Storage layout |
| Hackathon Requirements | How the project meets every Gemini Live Agent Challenge requirement with technical evidence |
| Deployment Guide | Step-by-step Google Cloud Run deployment instructions, IAM setup, and troubleshooting |

Usage Guide

Creating a Story

Sign in with Google account
Click "New Story" from dashboard
Choose input method: Voice or text
Speak or type your story idea
Optional: Upload character photos and name them
Select narrator voice (Luna, Stella, or Kiko)
Choose page count (4, 5, or 6 pages)
Click "Create My Story"
Watch the magic happen as AI generates your story
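
The choices in these steps (a non-empty idea, one of three narrators, 4 to 6 pages) lend themselves to a small server-side check before generation starts. The sketch below is illustrative only: `parseStoryRequest` and the `StoryRequest` shape are hypothetical names, not part of the documented API.

```typescript
const NARRATORS = ["Luna", "Stella", "Kiko"] as const;
const PAGE_COUNTS = [4, 5, 6] as const;

type Narrator = (typeof NARRATORS)[number];
type PageCount = (typeof PAGE_COUNTS)[number];

interface StoryRequest {
  idea: string;
  narrator: Narrator;
  pageCount: PageCount;
}

// Validate untrusted input (for example a parsed JSON body) and return a
// typed request, or throw with a message suitable for showing to the user.
function parseStoryRequest(input: {
  idea?: unknown;
  narrator?: unknown;
  pageCount?: unknown;
}): StoryRequest {
  const idea = typeof input.idea === "string" ? input.idea.trim() : "";
  if (idea.length === 0) throw new Error("Story idea is required");

  const narrator = NARRATORS.find((n) => n === input.narrator);
  if (narrator === undefined) throw new Error("Narrator must be Luna, Stella, or Kiko");

  const pageCount = PAGE_COUNTS.find((n) => n === input.pageCount);
  if (pageCount === undefined) throw new Error("Page count must be 4, 5, or 6");

  return { idea, narrator, pageCount };
}
```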

Reading a Story

Open story from your library
Use arrow keys or click arrows to turn pages
Listen to narration with audio player
Complete the quiz after finishing the story
Save your score and share your achievement

Quiz Features

Voice input for answering questions
AI-generated illustrations for each question
Sound effects based on AI suggestions
Score tracking and saving
Celebratory animations for correct answers

Technical Implementation Details

Interleaved Generation Flow

Interleaved Generation

The app uses Gemini's native interleaved output capability for story generation via Vertex AI:

```typescript
// Story generation with interleaved output via Vertex AI
const result = await ai.models.generateContentStream({
  model: "gemini-2.5-flash-image",
  contents: [{ role: "user", parts }],
  config: {
    responseModalities: ["TEXT", "IMAGE"], // Single call, mixed output
  },
});
```
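
A stream like the one above yields chunks whose parts are either text or inline image data. The following sketch shows one way to separate the two while iterating. The `StreamChunk` and `StreamPart` types are simplified local mirrors of the response shape, written here so the example has no SDK dependency, and `collectInterleaved` is a hypothetical helper name.

```typescript
// Simplified local mirrors of the streamed response shape (assumption).
interface StreamPart {
  text?: string;
  inlineData?: { mimeType?: string; data?: string }; // data is base64
}

interface StreamChunk {
  candidates?: { content?: { parts?: StreamPart[] } }[];
}

interface Collected {
  text: string;
  images: { mimeType: string; base64: string }[];
}

// Walk the stream once, appending text and collecting images in arrival order.
async function collectInterleaved(stream: AsyncIterable<StreamChunk>): Promise<Collected> {
  const result: Collected = { text: "", images: [] };
  for await (const chunk of stream) {
    const parts = chunk.candidates?.[0]?.content?.parts ?? [];
    for (const part of parts) {
      if (part.text) {
        result.text += part.text;
      } else if (part.inlineData?.data) {
        result.images.push({
          mimeType: part.inlineData.mimeType ?? "image/png", // fallback is a guess
          base64: part.inlineData.data,
        });
      }
    }
  }
  return result;
}
```

A streaming UI would more likely emit each part to the client as it arrives rather than buffering everything; the buffering version is shown because it is easier to follow.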

Character Consistency

When users upload character photos, the app:

Compresses images to base64
Sends them as reference images to Gemini
Ensures consistent character appearance across all illustrations
Uses character names in image prompts for accurate representation
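
The steps above can be sketched as building the `parts` array for the request. The prompt wording, the `CharacterRef` shape, and the `buildParts` helper are assumptions made for illustration; the app's real prompt is not shown in this document.

```typescript
interface CharacterRef {
  name: string;
  mimeType: "image/jpeg" | "image/png";
  base64: string; // compressed photo, base64 without a "data:" prefix
}

type PromptPart =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

// Pair each reference photo with a text label so the character's name is
// tied to the image, then append the story idea last.
function buildParts(storyIdea: string, characters: CharacterRef[]): PromptPart[] {
  const parts: PromptPart[] = [];
  for (const c of characters) {
    parts.push({ text: `Reference photo for the character named ${c.name}:` });
    parts.push({ inlineData: { mimeType: c.mimeType, data: c.base64 } });
  }
  parts.push({ text: `Story idea: ${storyIdea}` });
  return parts;
}
```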

Real-time Progress

Streaming updates during story generation
Parallel processing of images and audio
Visual progress indicators for each page
Error handling with user-friendly messages
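
Per-page progress of this kind can be modeled as a small immutable state update. The sketch below is one possible shape, assuming three tracked steps per page (text, image, audio); the type and function names are hypothetical.

```typescript
type Step = "text" | "image" | "audio";
type PageProgress = Record<Step, boolean>;

// One entry per page, with every step initially pending.
function createProgress(pageCount: number): PageProgress[] {
  return Array.from({ length: pageCount }, () => ({
    text: false,
    image: false,
    audio: false,
  }));
}

// Return a new array so React state comparisons see the change.
function markDone(progress: PageProgress[], page: number, step: Step): PageProgress[] {
  return progress.map((p, i) => (i === page ? { ...p, [step]: true } : p));
}

// Share of finished steps across all pages, as a whole-number percentage.
function percentComplete(progress: PageProgress[]): number {
  const total = progress.length * 3;
  if (total === 0) return 0;
  const done = progress.reduce(
    (n, p) => n + Number(p.text) + Number(p.image) + Number(p.audio),
    0,
  );
  return Math.round((done / total) * 100);
}
```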

Performance Optimizations

Image compression before upload
Parallel API calls for faster generation
Cached signed URLs for media files
Lazy loading of story components
Optimized animations with Framer Motion
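
As an illustration of the signed-URL caching item, the sketch below keeps each URL until shortly before it expires. The injected `sign` function stands in for whatever produces the signed URL; the 15-minute lifetime and 1-minute safety margin are arbitrary assumptions, and `createSignedUrlCache` is a hypothetical helper.

```typescript
function createSignedUrlCache(
  sign: (path: string, ttlMs: number) => Promise<string>,
  ttlMs: number = 15 * 60 * 1000, // assumed URL lifetime
  now: () => number = () => Date.now(), // injectable clock for testing
): (path: string) => Promise<string> {
  const entries = new Map<string, { url: string; expiresAt: number }>();
  const safetyMarginMs = 60 * 1000;

  return async function getUrl(path: string): Promise<string> {
    const hit = entries.get(path);
    // Reuse the cached URL only while it has more than the margin left.
    if (hit && hit.expiresAt - safetyMarginMs > now()) return hit.url;
    const url = await sign(path, ttlMs);
    entries.set(path, { url, expiresAt: now() + ttlMs });
    return url;
  };
}
```

This version does not deduplicate concurrent requests for the same path, so two simultaneous misses would sign twice.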

Environment Variables Reference

| Variable | Description | Required |
| --- | --- | --- |
| `NEXT_PUBLIC_FIREBASE_API_KEY` | Firebase web API key | Yes |
| `NEXT_PUBLIC_FIREBASE_AUTH_DOMAIN` | Firebase auth domain | Yes |
| `NEXT_PUBLIC_FIREBASE_PROJECT_ID` | Firebase project ID | Yes |
| `NEXT_PUBLIC_FIREBASE_STORAGE_BUCKET` | Firebase storage bucket | Yes |
| `NEXT_PUBLIC_FIREBASE_MESSAGING_SENDER_ID` | Firebase messaging sender ID | Yes |
| `NEXT_PUBLIC_FIREBASE_APP_ID` | Firebase app ID | Yes |
| `FIREBASE_CLIENT_EMAIL` | Service account email | Yes |
| `GOOGLE_CLOUD_PROJECT` | GCP project ID | Yes |
| `GCS_BUCKET_NAME` | Cloud Storage bucket name | No (default: storybook-for-kids-media) |
| `GEMINI_IMAGE_MODEL` | Interleaved story model | No (default: gemini-2.5-flash-image) |
| `GEMINI_TTS_MODEL` | Text-to-speech model | No (default: gemini-2.5-flash-preview-tts) |
| `GEMINI_QUIZ_MODEL` | Quiz generation model | No (default: gemini-2.5-flash) |
| `NEXTAUTH_URL` | NextAuth URL | Yes |
| `NEXTAUTH_SECRET` | NextAuth secret | Yes |
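
The defaults listed above could be applied in one place when the server starts. This is a minimal sketch covering only the server-side Google Cloud variables; `loadConfig` and `AppConfig` are hypothetical names, and the environment is passed in as a plain record so the function is easy to test.

```typescript
interface AppConfig {
  project: string;
  bucket: string;
  imageModel: string;
  ttsModel: string;
  quizModel: string;
}

// Read required values and fall back to the documented defaults otherwise.
// Empty strings are treated the same as unset variables.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const project = env["GOOGLE_CLOUD_PROJECT"];
  if (!project) throw new Error("GOOGLE_CLOUD_PROJECT is required");
  return {
    project,
    bucket: env["GCS_BUCKET_NAME"] || "storybook-for-kids-media",
    imageModel: env["GEMINI_IMAGE_MODEL"] || "gemini-2.5-flash-image",
    ttsModel: env["GEMINI_TTS_MODEL"] || "gemini-2.5-flash-preview-tts",
    quizModel: env["GEMINI_QUIZ_MODEL"] || "gemini-2.5-flash",
  };
}
```

In a Node.js server this would typically be called once as `loadConfig(process.env)`.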

Troubleshooting

Common Issues

Firebase Authentication Fails
- Check Firebase project configuration
- Verify Google Sign-In is enabled
- Ensure correct API keys in environment variables
Vertex AI API Errors
- Verify Vertex AI API is enabled in Google Cloud
- Check service account permissions
- Ensure billing is enabled for the project
- Verify Application Default Credentials are set up
Image Generation Failures
- Check character photo format (JPEG/PNG)
- Verify image size (compressed to base64)
- Ensure Gemini has image generation permissions
Audio Playback Issues
- Check browser audio permissions
- Verify TTS API is enabled
- Test with different narrator voices

Development Tips

Use Chrome for best voice recognition support
Monitor browser console for API errors
Check Firebase console for authentication logs
Use .env.example as template for environment variables

Made with ❤️ for children everywhere
