Google GenAI Processors: Accelerating AI App Development with an Open-Source Library for Python
Background
Developing AI applications requires the real-time handling of multimodal inputs—such as text, audio, and images—while combining multiple steps like preprocessing, model invocation, and postprocessing. However, implementing these steps individually can lead to complicated asynchronous handling and dependency management, reducing code readability and maintainability.
Core Concept: The Processor Interface
- Stream-Based Abstraction
  All inputs and outputs are treated as bidirectional streams (ProcessorParts), with a consistent API that streams data chunks (such as text tokens, audio frames, and image frames) and metadata.
- Unified Pipeline Definition
  Input → Preprocessing → Model Invocation → Output Processing are all represented using the same Processor type. This allows multiple steps to be intuitively connected using the + operator.
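The composition idea can be illustrated with a minimal sketch. It does not use the genai_processors package: ToyProcessor, the helper functions, and the use of plain strings as parts are all hypothetical stand-ins, chosen only to show how a + operator can chain steps that each map one async stream to another.

```python
import asyncio
from typing import AsyncIterator, Callable

# Hypothetical stand-in for a part type: here a part is just a string.
Part = str
StreamFn = Callable[[AsyncIterator[Part]], AsyncIterator[Part]]


class ToyProcessor:
    """Wraps a function that maps an async stream of parts to another stream."""

    def __init__(self, fn: StreamFn) -> None:
        self._fn = fn

    def __call__(self, stream: AsyncIterator[Part]) -> AsyncIterator[Part]:
        return self._fn(stream)

    def __add__(self, other: "ToyProcessor") -> "ToyProcessor":
        # Chaining: the output stream of self becomes the input stream of other.
        return ToyProcessor(lambda stream: other(self(stream)))


async def strip_parts(stream: AsyncIterator[Part]) -> AsyncIterator[Part]:
    async for part in stream:
        yield part.strip()


async def upper_parts(stream: AsyncIterator[Part]) -> AsyncIterator[Part]:
    async for part in stream:
        yield part.upper()


async def source(items) -> AsyncIterator[Part]:
    for item in items:
        yield item


async def collect(stream: AsyncIterator[Part]) -> list:
    return [part async for part in stream]


# Two steps joined with +, then driven from synchronous code.
pipeline = ToyProcessor(strip_parts) + ToyProcessor(upper_parts)
result = asyncio.run(collect(pipeline(source([" hello ", " world "]))))
print(result)  # ['HELLO', 'WORLD']
```

Because every step has the same stream-in, stream-out shape, the composed object is itself a processor and can be chained further.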
Key Features
- Automatic Asynchronous and Parallel Execution
  - Utilizes Python’s asyncio to analyze dependencies and execute parts concurrently when possible.
  - Maintains output stream order while achieving minimal “Time To First Token”.
  - High throughput is ensured without requiring user awareness of async mechanics.
- Seamless Integration with Gemini APIs
  - Built-in Processors for calling various Google GenAI APIs, including the Gemini Live API.
  - Enables quick setup for speech recognition/synthesis, video streaming, and conversational agent development.
- Extensibility
  - Separates core functionality into a core/ directory and community extensions into a contrib/ directory.
  - Supports user-defined Processor implementations and custom combinations of existing components.
- Multimodal Support
  Text, audio, image, and arbitrary binary data are all treated as “parts”, enabling mixed processing in a single flow.
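To make the extensibility and multimodal points concrete, here is a small sketch of a user-defined step that handles mixed parts in one flow. ToyPart, describe_parts, and the sample MIME types are assumptions made for illustration; they are not the library's actual classes.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator


@dataclass
class ToyPart:
    """Hypothetical part: raw bytes plus a MIME type and free-form metadata."""
    data: bytes
    mimetype: str
    metadata: dict = field(default_factory=dict)


async def describe_parts(stream: AsyncIterator[ToyPart]) -> AsyncIterator[ToyPart]:
    """User-defined step: transform text parts, pass other modalities through."""
    async for part in stream:
        if part.mimetype.startswith("text/"):
            text = part.data.decode("utf-8")
            yield ToyPart(
                data=text.upper().encode("utf-8"),
                mimetype=part.mimetype,
                metadata={"words": len(text.split())},
            )
        else:
            # Audio, image, and arbitrary binary parts flow through unchanged.
            yield part


async def mixed_source() -> AsyncIterator[ToyPart]:
    yield ToyPart(b"hello streaming world", "text/plain")
    yield ToyPart(b"\x00\x01\x02\x03", "audio/wav")
    yield ToyPart(b"\x89PNG", "image/png")


async def main() -> list:
    return [(p.mimetype, p.metadata) async for p in describe_parts(mixed_source())]


summary = asyncio.run(main())
print(summary)
# [('text/plain', {'words': 3}), ('audio/wav', {}), ('image/png', {})]
```

Tagging each part with a MIME type lets a single step decide per part what to do, so text, audio, and image data can share one stream.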
Simple Usage Example
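from genai_processors.core import audio_io, live_model, video

# Audio input → Gemini Live model → video output.
# Illustrative only: the class and method names here are simplified and may not match the library's actual API.
pipeline = audio_io.Input() + live_model.GeminiLive() + video.Output()

# Must run inside a coroutine (async def); input_stream and process() are placeholders.
async for output_part in pipeline.run(input_stream):
    process(output_part)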
As shown, simply running a pipeline composed with the + operator optimizes parallel execution in the background.
Architecture and Parallel Optimization
- The dependency graph of each ProcessorPart is automatically analyzed, and any part is executed in parallel as soon as all its predecessors are complete.
- Output order strictly follows input order, while prioritizing the earliest possible token generation.
- Internally, a task scheduler is used, so users do not need to write their own concurrency orchestration beyond iterating over the output stream.
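One way to get both properties described above, concurrent execution and output in input order, is sketched below with plain asyncio. This is not the library's scheduler; ordered_concurrent_map, slow_upper, and the simulated delays are hypothetical names and values chosen for the example.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def ordered_concurrent_map(
    stream: AsyncIterator[str],
    worker: Callable[[str], Awaitable[str]],
) -> AsyncIterator[str]:
    """Start a task per part as it arrives, but yield results in input order."""
    queue: asyncio.Queue = asyncio.Queue()

    async def producer() -> None:
        async for part in stream:
            # Each part starts processing immediately, concurrently with others.
            await queue.put(asyncio.create_task(worker(part)))
        await queue.put(None)  # Sentinel: no more parts.

    producer_task = asyncio.create_task(producer())
    while (task := await queue.get()) is not None:
        # Awaiting tasks in FIFO order preserves the input order.
        yield await task
    await producer_task


finished = []  # Records the order in which the simulated work completes.


async def slow_upper(part: str) -> str:
    # Simulated work: earlier parts take longer than later ones.
    await asyncio.sleep({"a": 0.06, "b": 0.04, "c": 0.02}[part])
    finished.append(part)
    return part.upper()


async def source() -> AsyncIterator[str]:
    for part in ["a", "b", "c"]:
        yield part


async def main() -> list:
    return [out async for out in ordered_concurrent_map(source(), slow_upper)]


outputs = asyncio.run(main())
print(outputs)  # ['A', 'B', 'C']
print(finished)  # completion order, which may differ from the input order
```

The first result is yielded as soon as the first task finishes, while later parts are already being processed, which is the general idea behind keeping time to first output low without reordering the stream.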
Future Outlook and Community
- Current Status: Python only (early stage, approximately v1.0).
- Planned: More diverse Processors, enhanced documentation, and potential multi-language support.
- Contributions: Issues and PRs welcome on GitHub.
  Repository: https://github.com/google-gemini/genai-processors
- Acknowledgments: This is a collaborative result by engineers and PMs including Juliette Love, KP Sawhney, and Antoine He.