by ahmedeltaher
Voice AI SDK is a reusable Android library that gives any app a full voice-driven AI conversation pipeline in minutes. Voice Assistant + Android Voice AI + SDK + MVVM + Kotlin
Clone the repository:
```shell
git clone https://github.com/ahmedeltaher/Android-MVVM-Architecture-Android-Voice-AI-SDK
```

```mermaid
flowchart LR
Microphone --> AudioRecord --> VAD --> STT --> ClaudeAI["Claude AI"] --> TTS --> Speaker
```
The Android Voice AI SDK is a reusable Android library that gives any app a full voice-driven AI conversation pipeline in minutes. It captures audio from the device microphone, transcribes speech to text, sends the transcript to Anthropic Claude for a response, and speaks the reply back to the user through text-to-speech, all wired together with a single `VoiceAISDK.Builder` call. The SDK ships drop-in Jetpack Compose UI components, swappable STT/TTS engine adapters, on-device emotion detection, and security utilities including PII redaction and encrypted key storage.
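The paragraph above describes one linear turn: transcribe, ask Claude, speak the reply. The sketch below models such a turn with three hypothetical single-method interfaces (`SttStage`, `AiStage`, `TtsStage`) and a hypothetical `VoicePipelineSketch` class. These are illustrations only; the README does not show the real signatures of the SDK's `SpeechToTextEngine`, `AIEngine`, or `TextToSpeechEngine`. The optional `transformTranscript` hook marks where a step such as PII redaction could run before the AI call.

```kotlin
// Hypothetical stage interfaces; NOT the SDK's real engine APIs.
fun interface SttStage { fun transcribe(pcm: ShortArray): String }
fun interface AiStage { fun reply(transcript: String): String }
fun interface TtsStage { fun speak(text: String) }

// Hypothetical orchestrator for a single conversational turn.
class VoicePipelineSketch(
    private val stt: SttStage,
    private val ai: AiStage,
    private val tts: TtsStage,
    // Optional transcript filter (e.g. a PII redactor) applied before the AI call.
    private val transformTranscript: (String) -> String = { it },
) {
    /** Runs one turn and returns the AI reply, or null if nothing was transcribed. */
    fun runTurn(pcm: ShortArray): String? {
        val transcript = stt.transcribe(pcm).trim()
        if (transcript.isEmpty()) return null // silence: skip the AI round-trip
        val reply = ai.reply(transformTranscript(transcript))
        tts.speak(reply)
        return reply
    }
}
```

Because every stage sits behind an interface, a test can pass in-memory fakes, which is the same property that lets engines be swapped.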
| Layer | Capability |
|-------|-----------|
| Audio Input | Voice Activity Detection (VAD), noise handling, streaming PCM capture |
| Recognition | Speech-to-Text (STT), language detection, speaker diarization |
| Understanding | Intent extraction, entity recognition, conversation context |
| Action | API orchestration, workflow execution, task automation |
| Response | LLM answer generation (Anthropic Claude) |
| Voice Output | Text-to-Speech (TTS), voice style selection, audio streaming |
| Safety | User consent, authentication, abuse prevention |
| Analytics | Conversation logs, session summaries, quality metrics |
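The Audio Input row lists voice activity detection. A common minimal approach, shown here as an assumption rather than the SDK's actual detector, is an energy threshold: compute each frame's RMS level and end the utterance after a run of quiet frames. The 1200 ms default matches `silenceTimeoutMs` in the configuration table; the 500.0 RMS threshold and 20 ms frame length are illustrative values, and `EnergyVadSketch` is a hypothetical name.

```kotlin
import kotlin.math.sqrt

// Hypothetical energy-based VAD; a sketch, not the SDK's VAD implementation.
class EnergyVadSketch(
    private val threshold: Double = 500.0,     // RMS on the 16-bit PCM scale (illustrative)
    private val silenceTimeoutMs: Long = 1200, // mirrors the silenceTimeoutMs default
    private val frameMs: Long = 20,            // duration of each frame passed to onFrame
) {
    private var inSpeech = false
    private var silentMs = 0L

    /** Root-mean-square level of a 16-bit PCM frame; also usable for level metering. */
    fun rms(frame: ShortArray): Double {
        if (frame.isEmpty()) return 0.0
        var sum = 0.0
        for (s in frame) {
            val d = s.toDouble()
            sum += d * d
        }
        return sqrt(sum / frame.size)
    }

    /** Feeds one frame; returns true exactly when an utterance has just ended. */
    fun onFrame(frame: ShortArray): Boolean {
        if (rms(frame) >= threshold) {
            inSpeech = true
            silentMs = 0L // any loud frame resets the silence counter
            return false
        }
        if (!inSpeech) return false // ignore silence before speech starts
        silentMs += frameMs
        if (silentMs >= silenceTimeoutMs) {
            inSpeech = false
            silentMs = 0L
            return true
        }
        return false
    }
}
```

A fixed threshold is the simplest choice; production detectors usually adapt to the background noise floor instead.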
| Requirement | Version |
|---|---|
| Android Studio | Meerkat or newer |
| Minimum SDK | 24 (Android 7.0) |
| Kotlin | 2.0+ (project uses 2.3.21) |
| Anthropic API key | Required; obtain one at console.anthropic.com |
In your app module's `build.gradle.kts`:
```kotlin
dependencies {
    implementation("com.sdk:voice-ai-sdk:1.0.0")
}
```
In `app/src/main/AndroidManifest.xml`, inside the `<manifest>` element:
```xml
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
```
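Annotate your `Application` class with `@HiltAndroidApp` (and register it with `android:name=".MyApp"` on the manifest's `<application>` element), then annotate your Activity with `@AndroidEntryPoint`:
```kotlin
import android.app.Application
import androidx.activity.ComponentActivity
import dagger.hilt.android.AndroidEntryPoint
import dagger.hilt.android.HiltAndroidApp

@HiltAndroidApp
class MyApp : Application()
@AndroidEntryPoint
class MainActivity : ComponentActivity() { /* your Activity code */ }
```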
Add your key to `local.properties`, which the default Android Studio `.gitignore` excludes, so the key never ends up in source control:
```properties
ANTHROPIC_API_KEY=sk-ant-...
```
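Gradle does not expose `local.properties` entries through `project.findProperty()`, so read the file explicitly and expose the key via `BuildConfig` in `app/build.gradle.kts`:
```kotlin
import java.util.Properties

// Load local.properties by hand; Gradle only reads gradle.properties as project properties.
val localProperties = Properties().apply {
    val file = rootProject.file("local.properties")
    if (file.exists()) file.inputStream().use { load(it) }
}

android {
    defaultConfig {
        buildConfigField(
            "String",
            "ANTHROPIC_API_KEY",
            "\"${localProperties.getProperty("ANTHROPIC_API_KEY", "")}\"",
        )
    }
    buildFeatures {
        buildConfig = true
    }
}
```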
Provide the SDK through Hilt by creating an AppModule:
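```kotlin
import android.content.Context
import dagger.Module
import dagger.Provides
import dagger.hilt.InstallIn
import dagger.hilt.android.qualifiers.ApplicationContext
import dagger.hilt.components.SingletonComponent
import javax.inject.Singleton
// VoiceAIConfig, VoiceAISDK and your app's BuildConfig must also be imported.

@Module
@InstallIn(SingletonComponent::class)
object AppModule {
    @Provides
    @Singleton
    fun provideVoiceAIConfig(): VoiceAIConfig =
        VoiceAIConfig(anthropicApiKey = BuildConfig.ANTHROPIC_API_KEY)

    @Provides
    @Singleton
    fun provideVoiceAISDK(
        @ApplicationContext context: Context,
        config: VoiceAIConfig,
    ): VoiceAISDK = VoiceAISDK.Builder(context)
        .anthropicApiKey(config.anthropicApiKey)
        .debugLogging(BuildConfig.DEBUG) // verbose SDK logs in debug builds only
        .build()
}
```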
Or construct the SDK directly without Hilt:
```kotlin
val sdk = VoiceAISDK.Builder(context)
    .anthropicApiKey(BuildConfig.ANTHROPIC_API_KEY)
    .debugLogging(true)
    // Override individual defaults on a copy of VoiceAIConfig.
    .config { copy(systemPrompt = "You are a concise voice assistant.") }
    .build()

val session: VoiceAISession = sdk.createSession()
session.start()
```
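The `.config { copy(...) }` call above suggests a lambda whose receiver is the current config and whose result replaces it. The sketch below shows that pattern with a hypothetical `DemoVoiceConfig` data class (a few field names borrowed from the configuration table) and a hypothetical `DemoSdkBuilder`; it is one way such a builder can work, not the SDK's source.

```kotlin
// Hypothetical mirror of a few VoiceAIConfig fields; not the SDK's own type.
data class DemoVoiceConfig(
    val anthropicApiKey: String = "",
    val systemPrompt: String? = null, // placeholder default for this sketch
    val silenceTimeoutMs: Long = 1200,
    val maxHistoryTurns: Int = 20,
    val piiRedaction: Boolean = false,
)

// Hypothetical builder illustrating the receiver-lambda override pattern.
class DemoSdkBuilder {
    private var current = DemoVoiceConfig()

    fun anthropicApiKey(key: String): DemoSdkBuilder =
        apply { current = current.copy(anthropicApiKey = key) }

    // The block sees the current config as `this` and returns a modified copy,
    // so `config { copy(systemPrompt = "...") }` changes only the named fields.
    fun config(block: DemoVoiceConfig.() -> DemoVoiceConfig): DemoSdkBuilder =
        apply { current = current.block() }

    fun build(): DemoVoiceConfig {
        require(current.anthropicApiKey.isNotBlank()) { "anthropicApiKey is required" }
        return current
    }
}
```

Because data-class `copy` leaves unnamed fields untouched, the caller overrides only what it names and every other field keeps its default.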
Use VoiceSessionPermissionGate to handle the RECORD_AUDIO runtime permission automatically, then place VoiceButton and ConversationView inside:
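```kotlin
import androidx.compose.foundation.layout.Arrangement
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.foundation.layout.padding
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.unit.dp
import androidx.hilt.navigation.compose.hiltViewModel
import androidx.lifecycle.compose.collectAsStateWithLifecycle
// VoiceViewModel is your own ViewModel; also import the SDK's ui/ components.

@Composable
fun VoiceScreen(viewModel: VoiceViewModel = hiltViewModel()) {
    // Handles the RECORD_AUDIO runtime permission for the content below.
    VoiceSessionPermissionGate(
        rationale = "Microphone access is required for voice conversations.",
    ) {
        // Lifecycle-aware collection of the conversation messages.
        val messages by viewModel.messages.collectAsStateWithLifecycle()
        Column(
            modifier = Modifier
                .fillMaxSize()
                .padding(16.dp),
            verticalArrangement = Arrangement.SpaceBetween,
        ) {
            ConversationView(
                messages = messages,
                modifier = Modifier.weight(1f),
            )
            VoiceButton(
                session = viewModel.session,
                modifier = Modifier.align(Alignment.CenterHorizontally),
            )
        }
    }
}
```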
The SDK is organised into six layers, each with a single responsibility:
| Layer | Package | Responsibility |
|---|---|---|
| Audio | audio/ | Raw PCM capture via AudioRecord, voice activity detection (VAD), audio level metering, and PCM-to-WAV conversion |
| STT | stt/ | SpeechToTextEngine interface with a drop-in Android built-in implementation; plug in Whisper or any other engine |
| AI | ai/ | AIEngine interface backed by ClaudeAIEngine, which wraps the official Anthropic Java SDK and maintains conversation history |
| TTS | tts/ | TextToSpeechEngine interface with a drop-in Android built-in implementation; plug in ElevenLabs for premium voices |
| Session | VoiceAISession | Orchestrates the full pipeline — audio in, transcript out, AI reply, speech out — as a single coroutine-based lifecycle |
| UI | ui/ | Ready-to-use Jetpack Compose components: VoiceButton, ConversationView, VoiceSessionPermissionGate, WaveformVisualizer, LiveCaptionBanner, VoiceStatusIndicator |
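The AI layer maintains conversation history, and the configuration table caps it with `maxHistoryTurns` (default 20). A minimal sketch of such a cap follows, assuming a "turn" is one user message plus one assistant reply and that the oldest turns are dropped first. `BoundedHistory`, `ChatMessage`, and `Role` are hypothetical names, not types from the SDK or the Anthropic Java SDK.

```kotlin
// Hypothetical message types for this sketch only.
enum class Role { USER, ASSISTANT }

data class ChatMessage(val role: Role, val text: String)

// Hypothetical bounded history; assumes one turn = user message + assistant reply.
class BoundedHistory(private val maxTurns: Int = 20) {
    private val turns = ArrayDeque<Pair<ChatMessage, ChatMessage>>()

    init {
        require(maxTurns > 0) { "maxTurns must be positive" }
    }

    fun addTurn(userText: String, assistantText: String) {
        turns.addLast(ChatMessage(Role.USER, userText) to ChatMessage(Role.ASSISTANT, assistantText))
        // Evict the oldest turns so the context sent to the model stays bounded.
        while (turns.size > maxTurns) turns.removeFirst()
    }

    /** Messages in chronological order, ready to send with the next request. */
    fun messages(): List<ChatMessage> =
        turns.flatMap { (user, assistant) -> listOf(user, assistant) }
}
```

Evicting whole turns keeps user and assistant messages paired, so the trimmed history never starts with an orphaned reply.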
| Category | Engine | Class | Notes |
|---|---|---|---|
| STT | Android built-in | AndroidSttEngine | Default; free; uses android.speech.SpeechRecognizer; requires network |
| STT | OpenAI Whisper | WhisperSttEngine | Higher accuracy; POSTs PCM/WAV to OpenAI REST API; requires OpenAI key |
| AI | Anthropic Claude | ClaudeAIEngine | Default and only AI engine; uses com.anthropic:anthropic-java; model is configurable |
| TTS | Android built-in | AndroidTtsEngine | Default; free; uses android.speech.tts.TextToSpeech |
| TTS | ElevenLabs | ElevenLabsTtsEngine | High-quality natural voices; POSTs to ElevenLabs REST API; requires ElevenLabs key |
| Emotion | On-device | built-in | Lightweight on-device audio feature analysis; no external key required |
| Emotion | Hume AI | HumeEmotionDetector | Cloud-based; high accuracy across 7 emotions; requires Hume API key |
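The Whisper row notes that audio is POSTed as PCM/WAV, and the Audio layer lists PCM-to-WAV conversion. Wrapping raw 16-bit PCM in WAV only requires the canonical 44-byte RIFF/WAVE header. The sketch below assumes 16-bit little-endian samples (interleaved if multi-channel); `pcm16ToWav` is a hypothetical helper, not an SDK function.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper: prepends a canonical 44-byte WAV header to 16-bit PCM.
fun pcm16ToWav(pcm: ShortArray, sampleRate: Int = 16_000, channels: Int = 1): ByteArray {
    require(sampleRate > 0 && channels > 0) { "sampleRate and channels must be positive" }
    val bitsPerSample = 16
    val blockAlign = channels * bitsPerSample / 8 // bytes per sample frame
    val byteRate = sampleRate * blockAlign        // bytes per second
    val dataSize = pcm.size * 2                   // 2 bytes per 16-bit sample

    val buf = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN)
    buf.put("RIFF".toByteArray(Charsets.US_ASCII))
    buf.putInt(36 + dataSize)                     // RIFF chunk size = file size - 8
    buf.put("WAVE".toByteArray(Charsets.US_ASCII))
    buf.put("fmt ".toByteArray(Charsets.US_ASCII))
    buf.putInt(16)                                // fmt chunk size for PCM
    buf.putShort(1.toShort())                     // audio format 1 = linear PCM
    buf.putShort(channels.toShort())
    buf.putInt(sampleRate)
    buf.putInt(byteRate)
    buf.putShort(blockAlign.toShort())
    buf.putShort(bitsPerSample.toShort())
    buf.put("data".toByteArray(Charsets.US_ASCII))
    buf.putInt(dataSize)
    for (sample in pcm) buf.putShort(sample)      // samples stored little-endian
    return buf.array()
}
```

For one second of 16 kHz mono audio this yields 32,044 bytes: 32,000 bytes of samples plus the 44-byte header.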
All options are fields on VoiceAIConfig. Pass a config { } block to VoiceAISDK.Builder to override defaults.
| Field | Type | Default | Description |
|---|---|---|---|
| anthropicApiKey | String | — | Required. Your Anthropic API key. Never hardcode; read from BuildConfig or encrypted storage. |
| aiModel | String | "claude-3-5-sonnet-20241022" | Claude model ID used for all AI turns. |
| systemPrompt | String? | "You are a helpful voice assistant…" | System instruction prepended to every conversation. |
| inputMode | InputMode | HANDS_FREE | HANDS_FREE activates VAD; PUSH_TO_TALK records only while button is held. |
| locale | Locale | Locale.getDefault() | Locale passed to the STT engine for language hints. |
| silenceTimeoutMs | Long | 1200 | Milliseconds of silence after speech before the STT turn is finalised. |
| maxHistoryTurns | Int | 20 | Maximum number of conversation turns kept in the Claude context window. |
| piiRedaction | Boolean | false | When true, strips phone numbers, emails, and credit-card numbers from transcripts before sending to the AI. |
| `emotionDete