A year ago I set out to build something that probably shouldn't run in a browser: a full music notation editor with multi-instrument playback, real-time staff rendering, and export to PDF, WAV, and MIDI. No plugins, no desktop install, no backend processing. Just a URL.
The result is ScoreInk — a browser-based music notation editor that now supports 26 instruments, from piano and guitar to trumpet, flute, clarinet, and harp. You click notes onto a staff, hear them played back with real instrument samples, and export your finished score in three formats.
Here's what was actually hard about building it, and what I'd do differently.
The Hard Part Nobody Warns You About: Rendering a Music Staff
Music notation looks simple. Five horizontal lines, some dots with stems. How hard can it be?
Extremely hard, it turns out. A music staff isn't a grid. Notes don't sit at fixed positions — they're spaced proportionally based on their rhythmic duration. A whole note takes more horizontal space than a quarter note. Accidentals (sharps, flats) need room to the left of the notehead without colliding with the previous note. Ledger lines appear dynamically when notes fall above or below the staff. Stems flip direction depending on the note's position relative to the middle line.
And that's before you add multiple staves for different instruments, bar lines, time signatures, key signatures, ties, triplets, dotted notes, and rests — each with its own symbol and spacing rules.
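A couple of those rules are simple enough to sketch. Here's a minimal version (hypothetical helpers, not ScoreInk's actual code), counting staff positions in half-steps from the middle line:

```javascript
// Staff positions are counted in half-steps from the middle line:
// 0 = middle line, positive = above, negative = below.
// The five staff lines sit at positions -4, -2, 0, 2, 4.

// Stems point down for notes on or above the middle line, up below it.
function stemDirection(staffPosition) {
  return staffPosition >= 0 ? "down" : "up";
}

// Ledger lines appear every two half-steps beyond the outer staff lines.
function ledgerLineCount(staffPosition) {
  return Math.max(0, Math.floor((Math.abs(staffPosition) - 4) / 2));
}
```

Middle C in treble clef sits at position -6, which correctly yields one ledger line below the staff. These two rules are trivial in isolation; the difficulty is that dozens of them interact.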
I render everything on an HTML Canvas element. The rendering pipeline works in three passes:
- Layout pass — calculates the horizontal position of every note, rest, and bar line based on rhythmic values and the current measure width
- Collision pass — adjusts positions to prevent overlapping accidentals, ties, and triplet brackets
- Draw pass — renders noteheads, stems, beams, flags, bar lines, clefs, and all annotations to the canvas
Each pass handles hundreds of edge cases. Two notes an interval of a second apart? Their accidentals might collide. A tie that spans a bar line? It needs to curve across the break. An eighth-note rest between two beamed eighth notes? The beam group has to split. Every one of these cases required custom logic.
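To make the layout pass concrete, here's a minimal sketch of proportional spacing — a common engraving heuristic, not necessarily ScoreInk's exact formula. Durations get sub-linear (square-root) weights so a whole note doesn't take literally sixteen times the width of a sixteenth note:

```javascript
// Assign an x position to each note in a measure, spaced proportionally
// to rhythmic duration. Durations are in quarter-note units
// (1 = quarter, 4 = whole, 0.5 = eighth).
function layoutMeasure(durations, measureWidth, leftPadding = 10) {
  // sqrt gives longer notes more room without letting them dominate
  const weights = durations.map((d) => Math.sqrt(d));
  const total = weights.reduce((a, b) => a + b, 0);
  const usable = measureWidth - leftPadding;
  let x = leftPadding;
  return weights.map((w) => {
    const pos = x;
    x += (w / total) * usable;
    return pos;
  });
}
```

Four quarter notes in a 110px measure land at evenly spaced positions; mix in an eighth note and the spacing compresses around it. The collision pass then nudges these positions wherever accidentals or ties would overlap.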
Don't try to build a general-purpose music layout engine from scratch. Start with the simplest possible rendering (monospaced note spacing, no beaming) and add complexity one feature at a time. I rewrote the layout engine three times before landing on something maintainable.
Web Audio API: Surprisingly Powerful, Surprisingly Tricky
The Web Audio API is the reason ScoreInk can play back 26 instruments in real time without any server involvement. Every sound plays client-side using pre-recorded instrument samples loaded into AudioBuffer nodes.
The basic architecture is straightforward: decode an audio sample into an AudioBuffer, create an AudioBufferSourceNode for each note, connect it to the destination (speakers), and schedule playback at the correct time using AudioContext.currentTime.
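The scheduling arithmetic itself is tiny. A sketch (the browser-only calls are shown as comments, since they need a live AudioContext):

```javascript
// Convert a note's position in beats to an absolute AudioContext time.
// 'startTime' is audioContext.currentTime captured when playback begins.
function beatToTime(beat, bpm, startTime) {
  return startTime + (beat * 60) / bpm;
}

// In the browser, each note is then scheduled roughly like:
//   const source = audioContext.createBufferSource();
//   source.buffer = sampleBuffer;              // decoded instrument sample
//   source.connect(audioContext.destination);
//   source.start(beatToTime(note.beat, tempo, playbackStart));
```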
The tricky parts:
- Sample loading strategy. 26 instruments times multiple octaves of samples per instrument equals hundreds of audio files. Loading them all upfront would make the editor take minutes to start. Instead, ScoreInk lazy-loads samples on first play — only fetching the instruments actually present in the current score. Samples are cached in memory after the first load, so subsequent plays are instant.
- Precise timing. Using setTimeout for musical timing is a disaster — JavaScript timers are accurate to about 4ms at best, which is audible as rhythmic wobble. The Web Audio API's built-in scheduler (source.start(time)) uses the audio hardware clock, which is sample-accurate. The trick is scheduling notes in small lookahead windows rather than all at once, so you can still respond to tempo changes mid-playback.
- Polyphony limits. Playing a full ensemble score with 8+ instruments, each playing chords, can mean 30+ simultaneous AudioBufferSourceNodes. On underpowered devices (old phones, cheap Chromebooks), this causes audio glitching. I added a voice-stealing system that silently kills the oldest playing note when the polyphony count exceeds a device-specific threshold.
- Autoplay restrictions. Every modern browser suspends a newly created AudioContext until a user gesture (click or keypress). The first time someone hits Play, the AudioContext has to be created or resumed inside that click handler. If you create it on page load, it sits in a "suspended" state and produces silence — a confusing bug that took me longer to diagnose than I'd like to admit.
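The lookahead window logic is the part worth showing. A sketch of the core selection step (the surrounding timer and audio calls are browser-only): on each tick of a coarse JS timer, pick out only the notes whose start time falls within the next window and hand them to the sample-accurate audio scheduler.

```javascript
// Notes are sorted by start time (seconds on the audio clock).
// Each tick (e.g. a 25ms interval) schedules only notes starting within
// the next 'lookahead' seconds; the JS timer just keeps the window
// filled, while actual start times come from the audio hardware clock.
function notesToSchedule(notes, nextIndex, currentTime, lookahead = 0.1) {
  const due = [];
  let i = nextIndex;
  while (i < notes.length && notes[i].time < currentTime + lookahead) {
    due.push(notes[i]);
    i++;
  }
  return { due, nextIndex: i };
}
```

Because only ~100ms of audio is ever committed, a tempo change mid-playback just means recomputing the times of the notes that haven't been handed off yet.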
Export: PDF, WAV, and MIDI From Pure JavaScript
Export is where the "no backend" constraint gets really interesting.
PDF Export
PDF export re-renders the entire score onto a hidden canvas at print resolution (300 DPI vs. the screen's 96 DPI), then converts the canvas to an image and embeds it in a PDF using jsPDF. The main challenge is pagination — figuring out where to break pages so measures don't get split mid-bar. The layout engine calculates break points based on cumulative measure widths, then renders each page to a separate canvas before assembling the final PDF.
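The break-point calculation boils down to a greedy scan over cumulative widths. A simplified sketch (ScoreInk's real version also accounts for clef and key-signature reprints at each break):

```javascript
// Greedy pagination: accumulate measure widths until the next measure
// would overflow the printable width, then start a new page.
// Returns the index of the first measure on each page, so measures are
// never split mid-bar.
function pageBreaks(measureWidths, printableWidth) {
  const breaks = [0];
  let used = 0;
  for (let i = 0; i < measureWidths.length; i++) {
    if (used + measureWidths[i] > printableWidth && used > 0) {
      breaks.push(i);
      used = 0;
    }
    used += measureWidths[i];
  }
  return breaks;
}
```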
WAV Export
WAV export uses an OfflineAudioContext — a Web Audio API feature that renders audio at faster-than-real-time into a buffer. The entire score gets scheduled against the offline context, which processes all the samples and mixing in one pass. The resulting buffer gets encoded to WAV format (PCM, 44.1kHz, 16-bit) and offered as a download. A 3-minute score exports in about 2 seconds on a modern laptop.
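The encoding step is plain byte-pushing: a 44-byte RIFF header followed by the samples converted from float to 16-bit integers. A minimal mono sketch (the real exporter handles stereo by interleaving channels):

```javascript
// Encode mono Float32 samples (range -1..1) as a 16-bit PCM WAV file.
// This mirrors what you'd do with the AudioBuffer returned by
// OfflineAudioContext rendering.
function encodeWav(samples, sampleRate = 44100) {
  const dataSize = samples.length * 2;        // 2 bytes per 16-bit sample
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeString = (offset, s) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);     // RIFF chunk size
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true);               // fmt chunk size
  view.setUint16(20, 1, true);                // audio format: PCM
  view.setUint16(22, 1, true);                // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true);   // byte rate
  view.setUint16(32, 2, true);                // block align
  view.setUint16(34, 16, true);               // bits per sample
  writeString(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));  // clamp before converting
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

Wrap the resulting buffer in a Blob with type audio/wav and it's ready for a download link.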
MIDI Export
MIDI is the simplest export format conceptually — it's just a list of note-on and note-off events with timestamps. But getting the details right matters: each instrument needs the correct MIDI program number (piano = 0, acoustic guitar = 25, trumpet = 56), note velocities should reflect the dynamics in the score, and tempo changes need to be encoded as MIDI tempo events. I use a minimal MIDI writer that builds the binary file format from scratch — about 200 lines of JavaScript to handle the variable-length encoding that MIDI uses for delta times.
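That variable-length encoding is the fiddliest part of the whole writer, so here's what it looks like (a standard implementation of the format's rule, not ScoreInk's exact code): seven bits per byte, most significant group first, with the high bit set on every byte except the last.

```javascript
// Encode a non-negative integer as a MIDI variable-length quantity.
// Used for delta times between events in a standard MIDI file.
function encodeVLQ(value) {
  const bytes = [value & 0x7f];      // last byte: high bit clear
  value >>>= 7;
  while (value > 0) {
    bytes.unshift((value & 0x7f) | 0x80);  // continuation bytes: high bit set
    value >>>= 7;
  }
  return bytes;
}
```

A delta time of 0 is a single 0x00 byte; 128 becomes the two bytes 0x81 0x00; the format maxes out at four bytes (0x0FFFFFFF).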
The OfflineAudioContext is an underappreciated feature of the Web Audio API. It lets you render hours of audio in seconds, entirely client-side. If you're building any kind of audio export feature in the browser, this is the API you want.
Performance: Making It Feel Instant
A music notation editor has to re-render the staff on every interaction — every note placed, every note moved, every duration changed. If rendering takes more than about 16ms (one frame at 60fps), the editor feels sluggish.
The two biggest performance wins:
- Dirty-region rendering. Instead of re-rendering the entire canvas on every edit, track which measures changed and only repaint those regions. This took rendering time from 40ms (full repaint) to under 5ms for typical edits.
- Pre-computed glyph paths. Music notation symbols (noteheads, rests, clefs, accidentals) are drawn using pre-computed path data rather than runtime font rendering. This avoids the overhead of font loading and text measurement for every symbol on the staff.
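The dirty-region bookkeeping is simpler than it sounds. A minimal sketch (hypothetical class; the real version also maps measure indices to canvas bounding boxes):

```javascript
// Track which measures changed since the last frame; the render loop
// repaints only those measures' regions instead of the full canvas.
class DirtyTracker {
  constructor() {
    this.dirty = new Set();       // Set dedupes repeated edits to one measure
  }
  markDirty(measureIndex) {
    this.dirty.add(measureIndex);
  }
  // Returns the measures to repaint (sorted) and clears the set
  // for the next frame.
  flush() {
    const toRepaint = [...this.dirty].sort((a, b) => a - b);
    this.dirty.clear();
    return toRepaint;
  }
}
```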
The editor now handles scores with 100+ measures across multiple instruments without noticeable lag on a mid-range laptop. Mobile devices are the constraint — phones with less than 4GB of RAM can struggle with large ensemble scores. But for the typical use case (1–4 instruments, 20–60 measures), performance is solid across all devices.
What I'd Do Differently
If I were starting over:
- Use MusicXML as the internal data model from day one. I designed a custom JSON schema for scores that now needs a translation layer for interop with other notation software. Starting with MusicXML (the industry standard) would have made import/export trivial.
- Build the layout engine test-first. Music layout has thousands of edge cases. I added tests after the fact and found dozens of bugs in note spacing, beam grouping, and tie rendering. A test-first approach with visual regression snapshots would have caught these earlier.
- Invest in sample quality earlier. The instrument samples make or break the playback experience. I initially used lower-quality samples to save loading time, but users immediately noticed. Higher-quality samples with smart compression (OGG Vorbis) ended up being about the same file size with dramatically better sound.
Building a music notation editor in the browser was the hardest project I've worked on. The intersection of music theory, audio engineering, real-time rendering, and file format encoding makes it a uniquely multi-disciplinary challenge. But the result — a tool that lets anyone with a browser write, hear, and export sheet music — is worth the complexity.
If you want to try the finished product: open ScoreInk in your browser. Free 3-day trial, no credit card, no install. Twenty-six instruments, real-time playback, PDF/WAV/MIDI export.