Using AI to enable interactive online storytelling with Southern Dialect Auslan
Ravi and Emma is an interactive love story told in Southern Dialect Auslan, produced by the Australian multicultural broadcaster SBS together with artists from the Deaf community.
The project is a world first: it allows audience members to interact with the story online by learning 14 selected Auslan signs using custom artificial intelligence technology.
SBS asked us at Silverpond to partner with them on the Ravi and Emma interactive experience and create an AI system to recognise the 14 Auslan signs. We did this using our product HighLighter.
SBS gathered 300 videos of diverse volunteers signing the selected Auslan signs. Training on diverse appearances is vital in order for the computer to recognise everyone. SBS also ensured that many of these volunteers were fluent in Auslan.
We fed the videos into HighLighter to train the AI. The AI first detects hands and then classifies the movements of those hands to determine which Auslan sign is being performed.
Auslan is a full-body language, involving much more than the hands, so future versions of the system will incorporate facial and body key points as well.
The fluent Auslan signers helped ensure that the AI recognised the initial 14 signs with a high level of accuracy.
We then built the Ravi and Emma website to house the technology, enabling the audience to see the signs in action and participate using their device's camera.
Throughout the Ravi and Emma documentary the user is encouraged to sign. Their participation activates on-screen animations and also allows them to switch perspectives at any time.
As an SBS project, the site is free and open to the public, so there is the potential to serve a large number of users over time and for many users to access the website simultaneously.
To manage system performance and resources at this scale, we designed the system so that all AI processing happens in the user's browser. This also means that no video of the audience is stored or transmitted as they participate in the experience.
With Ravi and Emma, we've used AI not only to allow people who are Deaf or hearing impaired to interact with an online story, but also as a creative way to introduce Auslan to new audiences.
We see huge potential for this technology: in entertainment, health services and education, to name just a few areas.
The technology behind the scenes
The development of an AI that can recognise Southern Dialect Auslan gestures began in September 2018 when Matt Smith from SBS reached out to Silverpond with a concept to use AI for interactive storytelling. After discussions between Expressions Australia, SBS and Silverpond in late 2018, Silverpond embarked on internal research to see what it would take to develop a model suitable for interactive storytelling.
The model would need to be trained on diverse data, and it would need to be performant, running quickly enough to support an interactive storytelling experience. A number of challenges needed to be solved, not least teaching the model the 14 gestures required for the story.
Research and planning occurred throughout 2019, with data collection towards the end of that year. SBS gathered 300 videos of diverse volunteers signing the selected Auslan signs. These were then uploaded to Silverpond's AI platform HighLighter to train and evaluate the AI model. Training on diverse appearances was vital for the system to recognise everyone. SBS also ensured that many of these volunteers were fluent in Auslan so that the data collected was of suitable quality.
AI development began in earnest in 2020, with the first test model ready in the second half of the year. Silverpond developed the Auslan Gesture Model across three phases. The initial phase was a video classification prototype that worked only on pre-recorded clips. The next phase demonstrated the model working on video streamed from a browser to a server; this prototype ran on four NVIDIA T4 GPUs and supported 12 concurrent users. The third phase was getting the Auslan Gesture Model to run in the browser without sending any video to a server, which was essential both to scale to a large number of users and to improve privacy.
Silverpond’s AI system detects Auslan signs in two stages. First, it detects key points in the user’s hands using Google MediaPipe. It then tracks and classifies the movements of those hands with a HighLighter classifier, which uses temporal information across multiple frames to work out which Auslan sign is being performed.
Future versions of the system will incorporate facial and body key points as well. This gesture recognition is performed with the webcam video never leaving the user’s browser.
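The two-stage design above can be sketched in a few lines of Python. This is a minimal illustration, not the actual HighLighter model: the window length, the sign labels, and the toy movement-based classifier are all placeholder assumptions. The only detail taken from the real pipeline is that MediaPipe Hands emits 21 (x, y, z) key points per detected hand, which are buffered across frames so the classifier can use temporal information.

```python
from collections import deque
from typing import Optional, Sequence

WINDOW = 30      # frames of temporal context (assumed, roughly 1 s at 30 fps)
LANDMARKS = 21   # MediaPipe Hands emits 21 (x, y, z) key points per hand
SIGNS = ["sign_a", "sign_b"]  # placeholder labels; the real model knows 14 signs


class GestureWindow:
    """Buffers per-frame hand key points and classifies the motion
    once enough temporal context has accumulated."""

    def __init__(self, window: int = WINDOW):
        self.frames: deque = deque(maxlen=window)

    def push(self, keypoints: Sequence[float]) -> Optional[str]:
        """Add one frame of flattened (x, y, z) landmarks.

        Returns a sign label once the window is full, else None.
        """
        if len(keypoints) != LANDMARKS * 3:
            raise ValueError("expected 21 (x, y, z) landmarks per frame")
        self.frames.append(list(keypoints))
        if len(self.frames) < self.frames.maxlen:
            return None  # not enough temporal context yet
        return self._classify()

    def _classify(self) -> str:
        # Stand-in for the trained HighLighter classifier: here we just
        # compare the wrist's net horizontal movement across the window.
        dx = self.frames[-1][0] - self.frames[0][0]
        return SIGNS[0] if dx >= 0 else SIGNS[1]


if __name__ == "__main__":
    gw = GestureWindow(window=5)
    result = None
    for i in range(5):
        # Simulate five frames of a hand drifting right (x increasing).
        result = gw.push([i * 0.01] + [0.0] * 62)
    print(result)  # a label only once 5 frames have accumulated
```

In the real system this loop runs per webcam frame inside the browser, and the classification step is the trained model rather than a threshold; the sliding `deque` simply makes concrete how temporal information across multiple frames feeds a single prediction.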
Once the AI was deemed suitable, Silverpond was ready to tackle building the rest of the storytelling experience with SBS. Part of the challenge was to integrate the Auslan gesture recognition seamlessly into the storytelling experience, so that it wasn't disruptive and remained fun for the audience. A key requirement of the Ravi and Emma site was therefore access to the GPU on the user's computer to speed up computation, keeping the experience snappy and responsive to the user's input.
The AI and the site were developed by the team at Silverpond using their AI platform HighLighter.
‘WOW! So many things I loved about it. It’s fantastic. Beautiful story. I’m teary and want to learn Auslan now. The interactivity was awesome too and super cute illustrations’ – Audience quote
Experience the site here: raviandemma.sbs.com.au