
Google GenAI Processors: Accelerating AI App Development with an Open-Source Library for Python

Background

Developing AI applications means handling multimodal inputs such as text, audio, and images in real time, while chaining together steps like preprocessing, model invocation, and postprocessing. Wiring these steps together by hand quickly leads to tangled asynchronous code and dependency management, hurting readability and maintainability. Google's open-source GenAI Processors library tackles this by modeling the entire pipeline as composable, stream-based processors.


Core Concept: The Processor Interface

  • Stream-Based Abstraction
    All inputs and outputs are treated as bidirectional streams of ProcessorParts, with a consistent API for streaming data chunks (such as text tokens, audio frames, and image frames) together with their metadata.

  • Unified Pipeline Definition
    Input → Preprocessing → Model Invocation → Output Processing are all represented by the same Processor type, so multiple steps can be connected intuitively with the + operator (see the sketch after this list).
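
To make the composition model concrete, here is a minimal pure-Python sketch of the idea, not the library's actual classes: a processor is an asynchronous transformation over a stream of parts, and the + operator chains two processors by feeding one's output stream into the next.

import asyncio
from typing import AsyncIterator, Callable

class Processor:
    """Toy model of a stream processor; not the library's real API."""

    def __init__(self, fn: Callable[[AsyncIterator], AsyncIterator]):
        self._fn = fn

    def __call__(self, stream: AsyncIterator) -> AsyncIterator:
        return self._fn(stream)

    def __add__(self, other: "Processor") -> "Processor":
        # Chaining: the output stream of `self` becomes the input of `other`.
        return Processor(lambda stream: other(self(stream)))

def tokenize(stream):
    # Split each incoming text chunk into word parts.
    async def gen():
        async for chunk in stream:
            for token in chunk.split():
                yield token
    return gen()

def upper(stream):
    # Uppercase each part as it arrives.
    async def gen():
        async for token in stream:
            yield token.upper()
    return gen()

pipeline = Processor(tokenize) + Processor(upper)

async def main():
    async def chunks():
        yield "hello world"
        yield "genai processors"

    async for part in pipeline(chunks()):
        print(part)  # HELLO, WORLD, GENAI, PROCESSORS

asyncio.run(main())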


Key Features

  1. Automatic Asynchronous and Parallel Execution

    • Utilizes Python’s asyncio to analyze dependencies and execute parts concurrently when possible.
    • Maintains output stream order while achieving minimal “Time To First Token”.
    • Delivers high throughput without requiring the user to manage async mechanics.
  2. Seamless Integration with Gemini APIs

    • Built-in Processors for calling various Google GenAI APIs, including the Gemini Live API.
    • Enables quick setup for speech recognition/synthesis, video streaming, and conversational agent development.
  3. Extensibility

    • Separates core functionality into a core/ directory and community extensions into a contrib/ directory.
    • Supports user-defined Processor implementations and custom combinations of existing components.
  4. Multimodal Support
    Text, audio, image, and arbitrary binary data are all treated as “parts”, enabling mixed processing in a single flow (see the sketch after this list).
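
Extensibility and multimodal support combine naturally. The sketch below, which again uses a toy Part/processor model rather than the real library API, defines a custom processor that rewrites only text parts and passes audio, image, or binary parts through unchanged.

import asyncio
from dataclasses import dataclass

# Toy stand-in for the library's ProcessorPart: a payload plus a MIME type.
@dataclass
class Part:
    mimetype: str
    data: object

def redact_text(stream):
    # Custom processor sketch: rewrite text parts, pass all others through.
    async def gen():
        async for part in stream:
            if part.mimetype.startswith("text/"):
                yield Part(part.mimetype, part.data.replace("secret", "[redacted]"))
            else:
                # Audio, image, and arbitrary binary parts flow through unchanged.
                yield part
    return gen()

async def main():
    async def parts():
        yield Part("text/plain", "the secret token")
        yield Part("image/jpeg", b"\xff\xd8")  # placeholder JPEG header bytes

    async for part in redact_text(parts()):
        print(part.mimetype, part.data)

asyncio.run(main())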


Simple Usage Example

from genai_processors.core import audio_io, live_model, video

# Camera + microphone input → Gemini Live API → audio playback.
# Constructor arguments are omitted here; see the repository for full examples.
input_processor = video.VideoIn() + audio_io.PyAudioIn(...)
live_processor = live_model.LiveProcessor(...)
play_output = audio_io.PyAudioOut(...)

live_agent = input_processor + live_processor + play_output

async for output_part in live_agent(input_stream):
    process(output_part)

As shown, composing processors with the + operator and running the resulting pipeline is all it takes; parallel execution is optimized in the background.


Architecture and Parallel Optimization

  • The dependency graph of each ProcessorPart is analyzed automatically, and each part is executed concurrently as soon as all of its predecessors are complete.
  • Output order strictly follows input order, while the earliest possible token generation is prioritized.
  • Internally, a task scheduler handles execution, so users never have to write their own scheduling or synchronization code; a minimal sketch of the pattern follows below.
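
The following pure-asyncio sketch illustrates the scheduling idea in miniature (it is not the library's actual scheduler): parts are processed concurrently, yet results are yielded strictly in input order, so the first output is emitted as soon as the first part finishes.

import asyncio

async def run_ordered(parts, worker):
    # Launch one task per part so all parts are processed concurrently.
    tasks = [asyncio.create_task(worker(p)) for p in parts]
    # Yield results strictly in input order. The first result is emitted as
    # soon as the first task finishes, minimizing time to first output even
    # while later parts are still in flight.
    for task in tasks:
        yield await task

async def main():
    async def transcribe(chunk: int) -> str:
        await asyncio.sleep(0.1)  # stand-in for a model call
        return f"part-{chunk}"

    async for result in run_ordered(range(5), transcribe):
        print(result)  # part-0 ... part-4, in order, ~0.1s total

asyncio.run(main())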

Future Outlook and Community

  • Current Status: Python only; still an early-stage release (around v1.0).
  • Planned: More diverse Processors, enhanced documentation, and potential multi-language support.
  • Contributions: Issues and PRs welcome on GitHub.
    Repository: https://github.com/google-gemini/genai-processors
  • Acknowledgments: The library is a collaborative effort by engineers and PMs including Juliette Love, KP Sawhney, and Antoine He.
