Stepista: Building an AI MIDI Pattern Generator as a VST Plugin
I've been obsessed with one idea for months: what if you could have an AI co-pilot sitting right inside your DAW, generating MIDI patterns that actually make musical sense? Not random notes. Not generic loops. Real, style-aware patterns that understand scales, tension, swing, and rhythm.
So I built it. Meet Stepista.
What Is Stepista?
Stepista is an AI-powered MIDI pattern generator that runs as a VST3/AU plugin. You load it on a MIDI track in Ableton (or Logic, FL Studio, whatever you use), pick a style, set your parameters, hit Gen — and it spits out a MIDI pattern directly into your DAW. No file export needed, no copy-paste, no leaving your workflow.
It generates patterns across 12 electronic music styles: Electro, House, Techno, Minimal, Trap, Lo-Fi, Ambient, Drum & Bass, Breakbeat, Acid, Dub, and IDM. Each style has its own rhythmic DNA — the accent patterns, note densities, swing amounts, and harmonic behaviors that make house sound like house and acid sound like acid.
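The actual templates aren't shown here, but a style's rhythmic DNA can be pictured as a small typed record per style. Everything below (the StyleDNA shape, the field names, and the numbers) is an illustrative assumption, not Stepista's real data:

```typescript
// A style's "rhythmic DNA" as a typed record. All shapes, names, and values
// here are illustrative assumptions, not Stepista's actual templates.
interface StyleDNA {
  accentSteps: number[];   // 16th-note steps (0-15) that receive accents
  defaultDensity: number;  // typical notes-per-bar fill, 0..1
  defaultSwing: number;    // typical swing amount, 0..1
  scaleBias: string;       // the scale flavor the style leans toward
}

const STYLE_DNA: Record<string, StyleDNA> = {
  house: { accentSteps: [0, 4, 8, 12], defaultDensity: 0.5, defaultSwing: 0.15, scaleBias: "minor" },
  acid:  { accentSteps: [0, 3, 6, 10], defaultDensity: 0.8, defaultSwing: 0.3, scaleBias: "minorPentatonic" },
};
```

The point of a table like this is that the generator never starts from nothing: every style carries defaults that the user's parameters then push around.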
The Weird Part: A Web UI Inside a VST
Here's where it gets interesting. The entire interface is a Next.js web app running inside the plugin via JUCE's WebBrowserComponent. Yeah — React, Tailwind, TypeScript, all running inside a native web view (WKWebView on macOS, WebView2 on Windows) embedded in your DAW.
Why? Because building complex, responsive UIs in C++ is painful. And I wanted an interface that felt modern — more like Revolut or Vercel than a 2005 hardware emulation. Dark, minimal, geometric. Custom knobs, a proper piano roll visualization, smooth Framer Motion animations. All things that are trivial in web tech and nightmarish in native GUI frameworks.
The magic happens through a bridge: JavaScript calls native C++ functions to send patterns, update parameters, and poll the DAW's transport state. The C++ side handles the actual MIDI output via processBlock() — the standard JUCE audio processor callback that every DAW expects.
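To make that concrete, here is a hedged TypeScript sketch of what the bridge surface might look like from the web side. The interface and function names (sendPattern, getTransport) are assumptions for illustration, not JUCE's actual API; a mock implementation keeps the UI code runnable outside the DAW:

```typescript
interface MidiNote { pitch: number; velocity: number; startBeat: number; lengthBeats: number; }

// Hypothetical bridge surface as seen from the web UI. These function names
// are assumptions, not JUCE API: the real plugin registers native callbacks
// through WebBrowserComponent, and the C++ side emits MIDI in processBlock().
interface NativeBridge {
  sendPattern(notes: MidiNote[]): void;
  getTransport(): { playing: boolean; ppqPosition: number; bpm: number };
}

// A mock so UI code can run and be tested outside the DAW.
function makeMockBridge(): NativeBridge & { sent: MidiNote[] } {
  const sent: MidiNote[] = [];
  return {
    sent,
    sendPattern(notes: MidiNote[]) { sent.push(...notes); },
    getTransport() { return { playing: false, ppqPosition: 0, bpm: 120 }; },
  };
}

const bridge = makeMockBridge();
bridge.sendPattern([{ pitch: 48, velocity: 100, startBeat: 0, lengthBeats: 0.5 }]);
```

Typing the bridge once means the rest of the React code never cares whether it's talking to C++ or to a mock.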
How the AI Works
Stepista uses a hybrid approach. The base layer is a pure TypeScript music theory engine — scales, chords, intervals, rhythmic templates. It knows that a minor pentatonic over a Techno groove should hit different notes than a Dorian mode over an Ambient pad.
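The kind of scale lookup such an engine needs can be sketched in a few lines of TypeScript. The interval tables are standard music theory; the function name is mine:

```typescript
// Semitone intervals from the root for a few scales (standard music theory).
const SCALE_INTERVALS: Record<string, number[]> = {
  minorPentatonic: [0, 3, 5, 7, 10],
  dorian:          [0, 2, 3, 5, 7, 9, 10],
  minor:           [0, 2, 3, 5, 7, 8, 10],
};

// Build MIDI pitches for a scale from a root note (e.g. 48 = C3).
function scalePitches(rootMidi: number, scale: string, octaves = 1): number[] {
  const intervals = SCALE_INTERVALS[scale];
  const out: number[] = [];
  for (let oct = 0; oct < octaves; oct++)
    for (const i of intervals) out.push(rootMidi + 12 * oct + i);
  return out;
}

scalePitches(48, "minorPentatonic"); // [48, 51, 53, 55, 58] -> C, Eb, F, G, Bb
```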
On top of that, there's an optional AI layer powered by Claude via the Vercel AI SDK. When you hit Gen, the AI receives your current context — style, key, mode, density, tension, swing, everything — and generates a musically coherent pattern. The output is structured (typed MIDI note arrays), not free text. So it never hallucinates a note outside your scale or a velocity above 127.
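A hedged sketch of that constraint step, with helper names of my own: snap each generated pitch to the allowed pitch-class set and clamp velocity into MIDI range, so nothing out-of-scale or out-of-range survives:

```typescript
interface Note { pitch: number; velocity: number; startBeat: number; lengthBeats: number; }

// Snap a pitch to the nearest allowed pitch class (circular distance on the
// 12-tone wheel) and clamp velocity to 1..127. Illustrative, not Stepista's code.
function constrain(note: Note, allowedPitchClasses: number[]): Note {
  const pc = ((note.pitch % 12) + 12) % 12;
  let best = allowedPitchClasses[0];
  let bestDist = 12;
  for (const a of allowedPitchClasses) {
    const d = Math.min(Math.abs(pc - a), 12 - Math.abs(pc - a));
    if (d < bestDist) { bestDist = d; best = a; }
  }
  return {
    ...note,
    pitch: note.pitch - pc + best,
    velocity: Math.max(1, Math.min(127, Math.round(note.velocity))),
  };
}

constrain({ pitch: 49, velocity: 140, startBeat: 0, lengthBeats: 1 }, [0, 3, 5, 7, 10]);
// C# (pitch class 1) snaps to C (pitch class 0); velocity clamps to 127
```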
The inspiration comes from a research paper called NEA (New Electronic Assistant) — a computational co-creativity system that used constrained Markov models for symbolic music generation. Stepista takes that idea and replaces the Markov chains with LLMs, while keeping the core philosophy: the machine suggests, the human decides.
The Parameters
This is where you shape the music. Twelve parameters that control every aspect of the generated pattern:
- Key & Mode — Set the harmonic foundation. C Minor, F# Dorian, whatever you need
- Density — How many notes per bar. From sparse ambient textures to dense IDM madness
- Velocity & Accent — Dynamic control. How hard the notes hit and where the emphasis falls
- Swing — From dead straight to deeply shuffled
- Note Length — From 1/32 notes to whole notes
- Octave — Shift the pitch range up or down
- Loop Length — 1 to 16 bars
- Tension — My favorite. Controls harmonic complexity — higher values add dissonance, chromatic notes, extensions. Low values keep it safe and consonant
- Improvise — How much the AI departs from strict patterns. 0% is predictable, 100% is chaos
- Speed — BPM for the preview, though in the plugin it locks to your DAW's tempo
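As an example of how one of these parameters might be applied, here is a common way to implement swing (delaying every off-beat 16th). This is an assumed formula for illustration, not necessarily Stepista's exact one:

```typescript
// Delay odd-numbered 16th-note steps by up to half a step at swing = 1.
// A common swing formula, assumed here for illustration.
function applySwing(startBeat: number, swing: number, stepsPerBeat = 4): number {
  const step = Math.round(startBeat * stepsPerBeat); // nearest 16th-note index
  if (step % 2 === 1) {
    return startBeat + (swing * 0.5) / stepsPerBeat;
  }
  return startBeat;
}

applySwing(0.25, 0.6); // off-beat 16th at beat 0.25 lands at 0.325
applySwing(0.5, 0.6);  // on-beat 16th is untouched: 0.5
```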
The Playhead Problem
One of the trickiest engineering challenges was synchronizing the piano roll playhead with the DAW. The WebView polls the DAW's transport state every 50ms via native functions, but rendering only at those 50ms intervals (roughly 20fps) looks choppy — you need 60fps for smooth animation.
The solution is a two-layer system: the polling updates a sync reference without triggering React re-renders, and a separate requestAnimationFrame loop interpolates smoothly from that reference using the current BPM. It sounds simple, but getting there involved a lot of flickering playheads and race conditions.
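A minimal sketch of that two-layer scheme, with hypothetical names: the poll writes into a plain object outside React state (so no re-renders), and the animation loop extrapolates from it using the BPM:

```typescript
// Mutable sync reference updated by the 50ms poll; deliberately not React
// state, so writes here never trigger a re-render. Names are illustrative.
interface TransportSync { ppqAtPoll: number; bpm: number; polledAtMs: number; }

const sync: TransportSync = { ppqAtPoll: 0, bpm: 120, polledAtMs: 0 };

// Called every ~50ms with fresh transport data from the native bridge.
function onPoll(ppq: number, bpm: number, nowMs: number): void {
  sync.ppqAtPoll = ppq;
  sync.bpm = bpm;
  sync.polledAtMs = nowMs;
}

// Called at 60fps inside requestAnimationFrame to position the playhead:
// extrapolate forward from the last poll using beats-per-millisecond.
function interpolatedPpq(nowMs: number): number {
  const beatsPerMs = sync.bpm / 60000;
  return sync.ppqAtPoll + (nowMs - sync.polledAtMs) * beatsPerMs;
}

onPoll(16, 120, 1000);
interpolatedPpq(1250); // 250ms at 120 BPM = 0.5 beats -> 16.5
```

Because the rAF loop only reads the reference, the playhead glides at display rate while the poll merely corrects its drift every 50ms.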
MIDI Drag to DAW
Once you've generated a pattern you like, you can drag it directly from the plugin into your DAW's arrangement as a MIDI clip. The plugin generates a standard .mid file, triggers an OS-level drag operation via JUCE's performExternalDragDropOfFiles, and your DAW catches it like any other file drop. One gesture, pattern in your arrangement.
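For illustration, here is a minimal Standard MIDI File (format 0) writer in TypeScript, roughly the kind of file a plugin would generate for such a drag. The byte layout follows the SMF spec; the helper names are mine, not Stepista's:

```typescript
interface FileNote { pitch: number; velocity: number; startTick: number; lengthTicks: number; }

// Encode a MIDI variable-length quantity: 7 bits per byte, MSB = continuation.
function varLen(value: number): number[] {
  const bytes = [value & 0x7f];
  while ((value >>= 7) > 0) bytes.unshift((value & 0x7f) | 0x80);
  return bytes;
}

// Build a format-0 SMF: one "MThd" header chunk plus one "MTrk" track chunk.
function writeMidi(notes: FileNote[], ticksPerQuarter = 96): Uint8Array {
  const events: [number, number[]][] = [];
  for (const n of notes) {
    events.push([n.startTick, [0x90, n.pitch, n.velocity]]);        // note on, ch 1
    events.push([n.startTick + n.lengthTicks, [0x80, n.pitch, 0]]); // note off
  }
  events.sort((a, b) => a[0] - b[0]);

  const track: number[] = [];
  let lastTick = 0;
  for (const [tick, msg] of events) {
    track.push(...varLen(tick - lastTick), ...msg); // delta time, then message
    lastTick = tick;
  }
  track.push(0x00, 0xff, 0x2f, 0x00); // end-of-track meta event

  const be16 = (v: number) => [v >> 8, v & 0xff];
  const be32 = (v: number) => [v >>> 24, (v >> 16) & 0xff, (v >> 8) & 0xff, v & 0xff];
  return Uint8Array.from([
    0x4d, 0x54, 0x68, 0x64, ...be32(6), ...be16(0), ...be16(1), ...be16(ticksPerQuarter), // "MThd"
    0x4d, 0x54, 0x72, 0x6b, ...be32(track.length), ...track,                              // "MTrk"
  ]);
}
```

Once those bytes are on disk, handing the path to performExternalDragDropOfFiles is all the OS needs to treat it like any other dragged file.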
Multi-Instance Design
Each Stepista instance is independent. Load three of them on three tracks — one doing a Techno bassline in C Minor, another doing an Ambient pad in F Major, another doing a Breakbeat hi-hat pattern. They don't share state, they don't conflict. This mirrors NEA's approach where multiple instances could create polyphonic arrangements.
Try It
Stepista is free and available as both VST3 and AU. The website has everything you need.
Download the plugin, load it in your DAW, and start generating. If you're a producer who's curious about what AI can do for your workflow — without leaving your DAW, without breaking your creative flow — this is it.
The intersection of AI and music production keeps getting more interesting. First AURA for mastering and stems, now Stepista for MIDI generation. The tools are getting better, and they're getting free. That's the future I want to build.