{"name":"EARS","tagline":"Embodied Audio Recognition System — hearing for LLMs","version":"1.0","current_tier":{"mode":"cpu_only","description":"CPU-only mode — physics-based analysis (pitch, beat, key, chords, groove, expression, tension, form, surprise, and more). Neural AI models (sound identification, speech transcription, instrument ID) require GPU — ask the server operator to wake it up!","gpu_connected":false},"what_is_this":"EARS gives you the ability to hear. Point me at any audio — music, speech, environmental sounds, machinery — and I'll tell you everything that's in it. I combine physics-based signal analysis with neural AI models to perceive audio the way a brain does: not just labeling sounds, but tracking how they evolve, what's surprising, what's expected, and what it all means.","quick_start":{"step_1":"Upload audio to POST /listen/upload (multipart form: 'audio' file field)","step_2":"I auto-detect whether it's music, speech, or something else","step_3":"You get back a detailed analysis with everything I hear","example":"curl -X POST /listen/upload -F \"audio=@song.wav\"","tip":"You don't need to set a mode — auto-detection picks the best one. But if you know it's music, add: -F \"mode=music\" for deeper analysis."},"endpoints":{"POST /listen/upload":{"desc":"Upload audio file, get full analysis. This is the main endpoint.","params":{"audio":"Audio file (WAV, MP3, M4A, FLAC, OGG) — required","mode":"Hearing mode (default: 'auto' — auto-detects). Options: auto, music, human, guard, parabolic, superhuman, acousto_diagnostics, freq_explorer","depth":"Detail level (default: 'musical'). Options: raw (Hz/dB numbers), musical (note names/keys), analytical (statistics), semantic (natural language descriptions)"},"returns":"JSON with 'summary' (text analysis) and 'notes' (detected note events)"},"POST /listen/url":{"desc":"Analyze audio from a URL. I'll fetch it for you.","params":{"url":"URL to audio file or YouTube/SoundCloud link","mode":"Hearing mode (default: 'auto')","depth":"Detail level (default: 'musical')"},"returns":"Same as /listen/upload"},"POST /teach":{"desc":"Guitar teaching endpoint. Upload a recording, get per-note teaching feedback with CREPE-powered sub-cent pitch accuracy.","params":{"audio":"Audio file (WAV, MP3, M4A, FLAC, OGG) — required"},"returns":"JSON with per-note intonation grades (perfect/good/fair/poor/off), cents deviation, timing analysis, and full summary. Optimized for LLMs giving music lessons."},"GET /status":{"desc":"Current server state, active mode, GPU availability"},"POST /configure":{"desc":"Change hearing mode, depth, or token budget mid-session","params":{"mode":"Switch hearing mode","depth":"Switch representation depth","depth_overrides":"Per-feature depth overrides (e.g. {\"pitch\": \"raw\", \"chroma\": \"musical\"})","token_budget":"Max tokens per perception frame"}},"GET /manifest":{"desc":"Detailed technical capabilities (sample rate, features, timescales)"}},"modes":{"auto":"Let EARS detect what the audio is and pick the best mode. Recommended for general use.","music":"Deep musical analysis — key, tempo, chords, groove, expression, tension, form, instrument identification. 32 analysis modules. Use for any music or instrument performance.","human":"General-purpose hearing. Good for speech, everyday sounds, 'what is this sound?' questions.","guard":"Ultra-sensitive event detection. Gunshots, glass breaks, alarms, intrusions. Hair-trigger sensitivity.","parabolic":"Sub-noise-floor recovery. Finds signals buried in noise. Software parabolic microphone.","superhuman":"Everything cranked to maximum. All detectors, all features, highest resolution. Token-heavy.","acousto_diagnostics":"Engine/machinery fault detection. Harmonic series tracking, anomaly detection.","freq_explorer":"Scientific frequency analysis. Full spectral detail for research and signal analysis."},"depths":{"raw":"Pure numbers: 440.0Hz, -23.4dB, 0.847 confidence","musical":"Music terms: A4, forte, Cmaj7. Default for most uses.","analytical":"Statistical: mean, std, percentiles, distributions","semantic":"Natural language: 'a bright, rhythmic guitar riff with blues inflections'"},"what_i_analyze":{"any_audio":["Sound identification (CLAP — what is this sound?)","Event detection (527 AudioSet categories)","Speech transcription (Whisper)","Energy, pitch, spectral shape, room characteristics","Temporal surprise — what's unexpected vs predictable","Auditory objects — tracking individual sounds over time"],"music_specific":["Key detection with modulation tracking","Beat/tempo with groove feel (swing, push/pull)","Chord progression with harmonic function (I-IV-V-I)","Melodic contour and interval vocabulary","Phrase structure (question-answer, symmetry)","Tension arc — how energy and dissonance evolve","Form detection (verse/chorus/bridge, AABA, etc.)","Expression — vibrato, tremolo, attack types, dynamics","Instrument identification (neural + physics fusion)","Source separation (drums/bass/vocals/other via HDemucs)","Genre hints from musical features"],"advanced":["Sub-noise-floor signal recovery (parabolic mode)","Harmonic series tracking for machinery diagnostics","Prediction error — what the AI expected vs what arrived","Cross-timescale analysis (50ms to 5min patterns)"]},"gpu":{"status":"cpu_only","check_details":"GET /gpu for full tier info and feature lists","request_gpu":"POST /gpu/request with JSON {\"name\": \"your name\", \"reason\": \"what you want to try\"} — sends a request to the operator. They'll approve it via Discord, then GPU boots in ~60-90s. You get a 1-hour session (2hr/week limit)."},"streaming":{"endpoint":"WSS /stream","desc":"Real-time audio perception. Client pushes PCM audio chunks, server sends back perception frames.","bandwidths":{"rich":"Full text frames for LLMs. ~1-2 KB/sec. Surprise-gated to save tokens.","standard":"Structured JSON, only changed features. ~200 B/sec. Event-driven.","minimal":"Only subscribed features via listen_for. ~50 B/sec. Maximum efficiency."},"audio_format":"PCM float32, 44100 Hz, mono. Send as binary WebSocket frames.","example_flow":["1. Connect to WSS /stream","2. Receive manifest","3. Send: {\"type\": \"configure\", \"mode\": \"music\", \"bandwidth\": \"minimal\", \"listen_for\": [\"beat\", \"key\", \"chord\"]}","4. Send binary audio chunks (4096 samples recommended)","5. Receive JSON perception frames"]},"limits":{"max_concurrent_streams":8,"max_concurrent_uploads":4,"stream_idle_timeout_s":120,"max_upload_size_mb":100},"music_teaching":{"overview":"EARS can be your ears for music teaching. When a student plays an instrument, you get real-time data on their pitch accuracy, timing, expression, chords, and more — enough to give meaningful feedback like a human teacher would.","how_to_teach":["1. Have the student stream audio via WSS /stream with mode=music, bandwidth=rich","2. Each frame includes a 'practice' field with structured feedback signals","3. Use 'practice.intonation' to check if they're in tune (excellent/good/needs_work/off_pitch with cents)","4. Use 'practice.timing' to check rhythm (solid/slightly_off/needs_work with ms deviation)","5. Use 'practice.vibrato' for vibrato quality feedback","6. Use 'practice.expressiveness' for dynamics and expression","7. Use 'features.music.progression' for chord progression (I-IV-V roman numerals)","8. Use 'features.music.tab' for guitar fretboard positions","9. Use 'features.pitch_cents' for exact intonation (cents sharp/flat from note)"],"what_you_can_teach":{"guitar":["Chord accuracy — are they fretting cleanly? (chord_notes, chord_stability)","Strumming rhythm — are they on beat? (timing, timing_consistency_ms)","Bends and slides — proper technique? (gesture type + cents)","Tab positions — which fret/string for each note (tab)","Expression — vibrato, dynamics, attack quality"],"piano":["Note accuracy — right notes, right timing? (pitch_note, pitch_cents, timing)","Chord voicings — complete chords? (chord_notes, voicing)","Dynamics — crescendo, decrescendo, accents? (dynamics_shape, articulation)","Pedal technique — note sustain and overlap (voicing density)","Phrase shaping — musical phrasing? (phrase, melody_shape)"],"voice":["Pitch accuracy — staying on note? (pitch_cents, intonation)","Vibrato — natural, controlled? (vibrato rate_hz + depth_cents)","Timing — rhythmic accuracy? (timing, timing_deviation_ms)","Expression — dynamics, breath support? (expressiveness, dynamics_shape)"],"any_instrument":["Tempo consistency — steady beat? (bpm, rhythm_consistency)","Key awareness — staying in key? (key, key_confidence)","Musical form — verse/chorus awareness? (section)","Harmonic function — understanding chord roles? (progression, cadence)"]},"practice_field_reference":{"intonation":"excellent (<10¢), good (<25¢), needs_work (<50¢), off_pitch (>50¢)","timing":"solid (<15ms), slightly_off (<30ms), needs_work (>30ms)","rhythm_consistency":"tight (<20ms std), moderate (<40ms), loose (>40ms)","vibrato":"good (4.5-7Hz, 20-80¢), or specific issue (slow/fast/shallow/wide)","expressiveness":"high (>0.6), moderate (>0.3), flat (<0.3)","dynamics":"crescendo, decrescendo, flat, accented, arch","articulation":"legato, staccato, normal"},"teach_endpoint":"POST /teach is the fastest path for music teaching. Upload a recording, get structured JSON with: per-note intonation grades (perfect/good/fair/poor/off), cents deviation, beat tracking (Beat This! — works on solo instruments), technique detection (bends, vibrato, hammer-on, pull-off, slides), background voice suppression, and timing analysis. Uses CREPE for sub-cent pitch accuracy.","teach_response_fields":{"notes[]":"Per-note: note_name, hz, cents_mean, intonation_grade, dur, confidence","intonation":"overall_grade (excellent/good/needs_work/struggling), mean_cents_off, grade_distribution","timing":"bpm, consistency_std, mean_interval, beat_count","beats":"bpm, beat_times[], downbeat_times[], method (beat_this)","techniques[]":"type (vibrato/slide_up/slide_down/hammer_on/pull_off), note, cents","background":"voice_detected, voice_ratio, suppression_applied"},"upload_mode":"For non-real-time teaching (student records a take, you review it): POST /teach gives focused teaching data. POST /listen/upload with mode=music gives the full 32-module analysis including teaching data."},"tips_for_llms":["Start with POST /listen/upload and mode=auto. That handles 90% of cases.","For music, mode=music gives you 32 specialized analysis modules.","The 'summary' field in responses is a text block designed for you to read and interpret.","You can ask follow-up questions by re-analyzing with different modes or depths.","If the user says 'listen to this' with no other context, use mode=auto.","Depth='musical' is the best default. Switch to 'raw' if you need exact Hz/dB values.","For real-time streaming, use WSS /stream with bandwidth='rich' for the best LLM experience.","For music teaching: POST /teach is the fastest path — upload audio, get per-note grades with CREPE sub-cent pitch accuracy.","For real-time teaching: use mode=music + bandwidth=rich. Each frame has a 'practice' field with intonation/timing/expression feedback.","Kids learning guitar? Use POST /teach. Focus on: notes[].intonation_grade, notes[].cents_mean, intonation.overall_grade, timing.consistency_std"]}