First off, apologies for the long radio silence. It’s been far too long since I’ve made any updates! But I just had to share a recent that I’m currently probably the most excited about.
[ Scroll down for TL;DR ]
To begin, a little background. For the past several months, I’ve been captioning an intensive web design class at General Assembly, a coding academy in New York City. Our class in particular utilized multiple forms of accommodation with four students using realtime captioning or sign interpreters depending on the context, as well as one student who used screen sharing to magnify and better view content projected in the classroom. A big shout-out to GA for stepping up and making a11y a priority, no questions asked.
On the realtime captioning front, it initially proved to be somewhat of a logistical challenge mostly because the system my colleague and I were using to deliver captions is commercial deposition software, designed primarily with judicial reporting in mind. But the system proved to be less than ideal for this specific context for a number of reasons.
- Commercial realtime viewing software either fails to address the needs of deaf and hard of hearing individuals who rely on realtime transcriptions for communication access or addresses them half-assedly as an afterthought.
The UI is still clunky and appears antiquated. Limited in its ability to change font sizes, line spacing, and colors, it makes it unnecessarily difficult to access options that are frequently crucial when working with populations with diverse sensory needs. Options are sequestered behind tiny menus and buttons that are hard to hit on a tablet. One of the most glaring issues was the inability to scroll with a flick of the finger. Having not been updated since the widespread popularization of touch interfaces, there is practically no optimization for this format. To scroll, the user must drag the tiny scroll bar on the far edge of the screen with high precision. Menus and buttons clutter the screen and take up too much space and everything just looks ugly.
- Though the software supports sending text via the Internet or locally via Wi-Fi, most institutional Wi-Fi is either not consistently strong enough, or too encumbered by security restrictions to send captions reliably to external devices.
Essentially, unless the captioner brought his or her own portable router, text would be unacceptably slow or the connection would drop. Additionally, unless the captioner either has access to an available ethernet port into which to plug said router or has a hotspot with a cellular subscription, this could mean the captioner is without Internet during the entire job.
Connection drops are handled ungracefully. Say I were to briefly switch to the room Wi-Fi to google a term or check my email. When I switch back to my router and start writing, only about half of the four tablets usually survive and continue receiving text. The connection is very fragile so you had to pretty much set it and leave it alone.
- Makers of both steno translation software and realtime viewing software alike still bake in lag time between when a stroke is hit by the stenographer and when it shows up as translated text.
A topic on which Mirabai has weighed in extensively — most modern commercial steno software runs on time-based translation (i.e. translated text is sent out only after a timer of several milliseconds to a second runs out). This is less than ideal for a deaf or hard of hearing person relying on captions as it creates a somewhat awkward delay, which, to a hearing person merely watching a transcript for confirmation as in a legal setting would not notice.
- Subscription-based captioning solutions that send realtime over the Internet are ugly and add an additional layer of lag built-in due to their reliance on Ajax requests that ping for new text on a timed interval basis.
Rather than utilizing truly realtime technologies which push out changes to all clients immediately as the server receives them, most subscription based captioning services rely on the outdated tactic of burdening the client machine with having to repeatedly ping the server to check for changes in the realtime transcript as it is written. Though not obviously detrimental to performance, the obstinate culture of “not fixing what ain’t broke” continues to prevail in stenographic technology. Additionally, the commercial equivalents to Aloft are cluttered with too many on-screen options without a way to hide the controls and, again, everything just looks clunky and outdated.
- Proprietary captioning solutions do not allow for collaborative captioning.
At Kickstarter, I started providing transcriptions of company meetings using a combo of Plover and Google Docs. It was especially helpful to have subject matter experts (other employees) be able to correct things in the transcript and add speaker identifications for people I hadn’t met yet. More crucially, I felt overall higher sense of integration with the company and the people working there as more and more people got involved, lending a hand in the effort. But Google Docs is not designed for captioning. At times, it would freeze up and disable editing when text came in too quickly. Also, the user would have to constantly scroll manually to see the most recently-added text.
With all these frustrations in mind, and with the guidance of the GA instructors in the class, I set off on building my own solution to these problems with the realtime captioning use case specifically in mind. I wanted a platform that would display text instantaneously with virtually no lag, was OS and device agnostic, touch interface compatible, extremely stable in that it had a rock solid connection that could handle disconnects and drops on both the client and provider side without dying a miserable death, and last but not least, delivered everything on a clean, intuitive interface to boot.
How I Went About It
While expressing my frustrations during the class, one of the instructors informed me of the fairly straightforward nature of solving the problem, using the technologies the class had been covering all semester. After that discussion, I was determined to set out to create my own realtime text delivery solution, hoping some of what I had been transcribing every day for two months had actually stuck.
Well, it turned out not much did. I spent almost a full week learning the basics the rest of the class had covered weeks ago, re-watching videos I had captioned, putting together little bits of code before trying to pull them together into something larger. I originally tried to use a library called CodeMirror, which is essentially a web based collaborative editing program, using Socket.io as its realtime interface. It worked at first but I found it incapable of handling the volume and speed of text produced by a stenographer, constantly crashing when I transcribed faster than 200 WPM. I later discovered another potential problem — that Socket.io doesn’t guarantee order of delivery. In other words, if certain data were received out of order, the discrepancy between what was sent to the server versus what a client received would cause the program to freak out. There was no logic that would prioritize and handle concurrent changes.
I showed Mirabai my initial working prototype and she was instantly all for it. After much fuss around naming, Mirabai and I settled on Aloft as it’s easy to spell, easy to pronounce, and maintains the characteristic avian theme of Open Steno Project.
I decided to build Aloft for web so that any device with a modern browser could run it without complications. The core server is written in Node. I used Express and EJS to handle routing and layouts, and JQuery with JavaScript for handling the dynamic front-end bits.
I incorporated the ShareJS library to handle realtime communication, using browserchannel as its WebSockets-like interface. Additionally, I wrapped ShareJS with Primus for more robust handling of disconnects and dissemination of updated content if/when a dropped machine comes back online. Transcript data is stored in a Mongo database via a wrapper, livedb-mongo, which allows ShareJS to easily store the documents as JSON objects into Mongo collections. On the front end, I used Bootstrap as the primary framework with the Flat UI theme. Aloft is currently deployed on DigitalOcean.
Current Features
- Fast AF™ text delivery that is as close to realtime as possible given your connection speed.
- OS agnostic! Runs on Mac, Windows, Android, iOS, Linux, anything you can run a modern Internet browser on.
- User login which allows captioner to create new events as well as visualize all past events by author and event title.
- Captioner can delete, modify, reopen previous sessions, and view raw text with a click of a link or button.
- Option to make a session collaborative or not if you want to let your viewers directly edit your transcription.
- Ability for the viewer to easily change font face, font size, invert colors, increase line spacing, save the transcription as .txt, and hide menu options.
- Easy toggle button to let viewers turn on/off autoscrolling so they can review past text but be quickly able to snap down to what is currently being written.
- Ability to run Aloft both over the Internet or as a local instance during cases in which a reliable Internet connection is not available (daemonize using pm2 and access via your machine’s IP addy).
In the Works
- Plugin that allows captioners using commercial steno software that typically output text to local TCP ports to send text to Aloft without having to focus their cursor on the editing window in the browser. Right now, Aloft is mostly ideal for stenographers using Plover.
- Ability for users on commercial steno software to make changes in the transcript, with changes reflected instantly on Aloft.
- Ability to execute Vim-like commands in the Aloft editor window.
- Angularize front-end elements that are currently accomplished via somewhat clunky scripts.
- “Minimal Mode” which would allow the captioner to send links for a completely stripped-down, nothing-but-the-text page that can be modified via parameters passed in to the URL (e.g. aloft.nu/stanley/columbia?min&fg=white&bg=black&size=20 would render a page that contains text from job name columbia with 20px white text on a black background.
That’s all I have so far but for the short amount of time Aloft has been in existence, I’ve been extremely satisfied with it. I haven’t used my commercial steno software at all, in favor of using Aloft with Plover exclusively for about two weeks now. Mirabai has also begun to use it in her work. I’m confident that once I get the add-on working to get users of commercial stenography software on board, it’ll really take off.
TL;DR
I was captioning a web development course when I realized how unsatisfied I was with every commercial realtime system currently available. I consulted the instructors and used what I had learned to build a custom solution designed to overcome all the things I hate about what’s already out there and called it Aloft because it maintains the avian theme of the Open Steno Project.
Try it out for yourself: Aloft.nu: Simple Realtime Text Delivery
Special thanks to: Matt Huntington, Matthew Short, Kristyn Bryan, Greg Dunn, Mirabai Knight, Ted Morin