Capability families
Text and embeddings
LLM chat/completions, embeddings, streaming responses, prompt caching, tool calling, per-model context, quantization, and model capability discovery.
Image
Text-to-image, image-to-image, inpainting, upscaling, deblur, unpixelate, outfit change, face swap, depth estimation, segmentation, ControlNet-style conditioning, and restoration workflows.
Video
Text-to-video, image-to-video, video-to-video, Ti2V, interpolation, upscaling, subtitles, dubbing, Wav2Lip/SadTalker lip sync, and 3D video processing.
Audio
Kokoro TTS, Whisper STT, MusicGen/AudioGen/AudioLDM2, F5-TTS voice cloning, Seed-VC voice conversion, singing mode, saved voice profiles, and stem separation.
Profiles and consistency
CoderAI uses named profile collections to keep a character's identity or a scene's look consistent across generations.
- Character profiles: reference images for appearance conditioning via IP-Adapter. Up to six profiles can be selected per generation.
- Environment profiles: reference scene/background images for environmental style, selected through the same multi-slot model as character profiles.
- Voice profiles: reference audio plus transcript for reusable voice cloning and conversion workflows.
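The selection limit described above can be checked on the client before a request is sent. The following is a minimal sketch, not CoderAI's own code: `GenerationRequest`, `validate_profiles`, and the field names are hypothetical, and it assumes the six-profile limit stated for character profiles also applies to environment profiles.

```python
from dataclasses import dataclass, field

# Limit stated in the text for character profiles; applying it to
# environment profiles as well is an assumption of this sketch.
MAX_PROFILES_PER_GENERATION = 6


@dataclass
class GenerationRequest:
    """Hypothetical client-side request holder (not a CoderAI type)."""
    prompt: str
    character_profiles: list = field(default_factory=list)
    environment_profiles: list = field(default_factory=list)


def validate_profiles(request: GenerationRequest) -> None:
    """Raise ValueError if a profile selection is too large or has duplicates."""
    selections = (
        ("character", request.character_profiles),
        ("environment", request.environment_profiles),
    )
    for kind, names in selections:
        if len(names) > MAX_PROFILES_PER_GENERATION:
            raise ValueError(
                f"{len(names)} {kind} profiles selected; "
                f"the limit is {MAX_PROFILES_PER_GENERATION}"
            )
        if len(set(names)) != len(names):
            raise ValueError(f"duplicate {kind} profile in selection")
```

Validating locally keeps an oversized selection from reaching the server, where the failure would only surface after the request is queued.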
2D / 3D conversion
- Image → stereo pair, anaglyph, depth map, or mesh.
- 3D model → rendered image from a specified viewpoint.
- Video → frame-by-frame 3D/depth processing.
- 3D model → turntable video.
- Text/image → GLB model with compatible 3D generation models.
Pipelines
Built-in pipelines chain common multi-step workflows into a single request. The custom pipeline builder can chain many step types, passing data between steps with variables such as {{input}}, {{stepN.output}}, and {{stepN.url}}.
| Endpoint | Description |
|---|---|
| POST /v1/pipelines/image-to-video | Generate an image, animate it, optionally add audio. |
| POST /v1/pipelines/video-dub | Transcribe, translate, TTS dub, and optionally burn subtitles. |
| POST /v1/pipelines/story | LLM script, images per scene, video, and narration. |
| POST /v1/pipelines/audio-dub | Transcribe audio/video, translate, clone voice, replace audio. |
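To illustrate how the pipeline variables {{input}}, {{stepN.output}}, and {{stepN.url}} could be resolved, here is a minimal sketch of placeholder substitution. It is not CoderAI's implementation: the `resolve` function, the shape of `step_results` (a dict keyed by step number), and the example URL are all assumptions made for illustration.

```python
import re

# Matches {{input}}, {{stepN.output}} and {{stepN.url}}, allowing inner spaces.
_PLACEHOLDER = re.compile(r"\{\{\s*(input|step(\d+)\.(output|url))\s*\}\}")


def resolve(template: str, pipeline_input: str, step_results: dict) -> str:
    """Replace pipeline placeholders in `template`.

    `step_results` maps a step number to a dict with "output" and/or "url"
    keys (a hypothetical structure). Unknown placeholders are left untouched.
    """
    def substitute(match: re.Match) -> str:
        if match.group(1) == "input":
            return str(pipeline_input)
        step_number = int(match.group(2))
        field_name = match.group(3)
        try:
            return str(step_results[step_number][field_name])
        except KeyError as exc:
            raise KeyError(
                f"step{step_number}.{field_name} is not available"
            ) from exc

    return _PLACEHOLDER.sub(substitute, template)


# Example: step 1 produced an image, step 2 refers to it by URL.
results = {1: {"output": "a red fox", "url": "http://localhost:8000/files/fox.png"}}
print(resolve("Animate {{step1.url}} showing {{ input }}", "a forest at dusk", results))
# prints: Animate http://localhost:8000/files/fox.png showing a forest at dusk
```

Raising on a reference to a step that has not produced a result makes ordering mistakes in a custom pipeline visible early instead of passing an unresolved placeholder to the next model.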
Bundled demo/example tools
The repository also includes three demo/example web applications in tools/. They are not required to use the API, but they show how CoderAI can act as a backend for larger media workflows. The Docker / OCI image exposes all three through nginx on the same published port.
tools/video_editor.py
Browser video editor using CoderAI TTS plus local ffmpeg/ffprobe for timeline editing, generated voiceover, music tracks, speed ramps, uploads, and final rendering.
Docker / OCI route: /editor/
tools/videogen.py
VideoGen Studio manages character/environment profiles and builds multi-clip short movies with video generation, speech/lip-sync, music, and sound effects.
Docker / OCI route: /videogen/
tools/gen_township_fighters.py
Township Fighters is an example app that generates MMA-style fighter-match videos. Its flow covers characters, environments, fight clips, progress tracking, and output review.
Docker / OCI route: /township/
Model capability indicators
The repository documents capability detection in the model UI and cache scanner. Search results and local model tables can show compact badges such as Text, T2I, I2T, T2V, STT, TTS, embeddings, lip sync, and video dubbing. This matters operationally: users can choose models by capability before downloading or routing work.
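As a rough illustration of choosing models by capability before downloading or routing work, the sketch below filters a small in-memory catalog by badge. The `ModelEntry` type, the `models_with` helper, and the model names are hypothetical; only the badge labels come from the text above, and how CoderAI actually exposes this data is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical catalog row: a model name plus its capability badges."""
    name: str
    capabilities: frozenset


def models_with(models, *required):
    """Return names of models that carry every required badge (case-insensitive)."""
    needed = {badge.lower() for badge in required}
    return [
        model.name
        for model in models
        if needed <= {badge.lower() for badge in model.capabilities}
    ]


# Placeholder entries; the names are invented for the example.
catalog = [
    ModelEntry("example-chat-model", frozenset({"Text", "embeddings"})),
    ModelEntry("example-image-model", frozenset({"T2I", "I2T"})),
    ModelEntry("example-speech-model", frozenset({"STT", "TTS"})),
]

print(models_with(catalog, "tts"))         # prints: ['example-speech-model']
print(models_with(catalog, "T2I", "I2T"))  # prints: ['example-image-model']
```

Requiring all requested badges, rather than any of them, fits routing decisions where a single model must cover the whole task.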
AISBF