This German nonprofit is building an open voice assistant that anyone can use

There have been many attempts at open source AI-powered voice assistants (see Rhasspy, Mycroft, and Jasper, to name a few), all founded with the goal of creating privacy-preserving, offline experiences that don’t compromise on functionality. But progress has proven exceptionally slow. That’s because, in addition to all the usual challenges that come with open source projects, programming an assistant is hard. There are years, if not decades, of R&D behind technologies like Google Assistant, Siri, and Alexa, and huge infrastructure to boot.

But that’s not stopping the folks at the Large-Scale Artificial Intelligence Open Network (LAION), the German nonprofit responsible for maintaining some of the world’s most popular AI training data sets. This month, LAION announced a new initiative, BUD-E, which aims to create a “completely open” voice assistant capable of running on consumer hardware.

Why launch a brand new voice assistant project when countless others already exist in various states of abandonment? Wieland Brendel, Ellis Institute Fellow and contributor to BUD-E, believes that there is no open assistant with a sufficiently extensible architecture to take full advantage of emerging GenAI technologies, particularly large language models (LLMs) along the lines of OpenAI’s ChatGPT.

“Most conversations with [assistants] rely on chat interfaces that are cumbersome to interact with, [and] interactions with those systems feel awkward and unnatural,” Brendel told TechCrunch in an email interview. “Those systems are fine for controlling your music or giving commands to turn on the lights, but they are not the basis for long and engaging conversations. BUD-E aims to provide the foundation for a voice assistant that sounds more natural to humans and that mimics the natural speech patterns of human conversations and remembers past conversations.”

Brendel said LAION also wants to ensure that every component of BUD-E can eventually be integrated license-free, even commercially, with apps and services, which is not necessarily the case for other open assistant efforts.

In collaboration with the Ellis Institute in Tübingen, tech consultancy Collabora and the Tübingen AI Center, BUD-E (recursive shorthand for “Buddy for Understanding and Digital Empathy”) has an ambitious roadmap. In a blog post, the LAION team outlines what they hope to accomplish over the next few months, primarily building “emotional intelligence” into BUD-E and ensuring it can handle conversations involving multiple speakers at once.

“There is a huge need for a well-functioning natural voice assistant,” Brendel said. “LAION has shown in the past that it is very good at building communities, and ELLIS Institute Tübingen and the Tübingen AI Center are committed to providing resources to develop the assistant.”

BUD-E is up and running (you can download and install it from GitHub today on an Ubuntu or Windows PC, and macOS support is coming), but it’s very clearly in the early stages.

LAION combined several open models to assemble the MVP, including Microsoft’s Phi-2 LLM, Columbia’s text-to-speech StyleTTS2, and Nvidia’s FastConformer for speech-to-text. As a result, the experience is a bit unoptimized. Getting BUD-E to respond to commands within about 500 milliseconds, in the range of commercial voice assistants like Google Assistant and Alexa, requires a strong GPU like Nvidia’s RTX 4090.

Collabora is working for free to adapt its open source speech recognition and text-to-speech models, WhisperLive and WhisperSpeech, for BUD-E.

“Building text-to-speech and speech recognition solutions ourselves means we can customize them to an extent that is not possible with closed models exposed through APIs,” Jakub Piotr Cłapa, an AI researcher at Collabora and member of the BUD-E team, said in an email. “Collabora initially started working on [open assistants] partly because we were struggling to find a good text-to-speech solution for an LLM-based voice agent for one of our customers. We decided to join forces with the broader open source community to make our models more widely accessible and useful.”

In the near future, LAION says it will work to make BUD-E’s hardware requirements less onerous and reduce the assistant’s latency. A longer-term undertaking is building a data set of dialogues to fine-tune BUD-E, as well as a memory mechanism to allow BUD-E to store information from past conversations and a speech processing pipeline that can keep track of several people talking at once.

I asked the team if accessibility was a priority, given that speech recognition systems have historically not performed well with languages other than English and with accents that are not transatlantic. A Stanford study found that speech recognition systems from Amazon, IBM, Google, Microsoft and Apple were almost twice as likely to mishear Black speakers as white speakers of the same age and gender.

Brendel said that LAION is not ignoring accessibility, but that it is not an “immediate focus” for BUD-E.

“The first focus is really on redefining how we interact with voice assistants, before generalizing that experience to more diverse accents and languages,” Brendel said.

To that end, LAION has some big ideas for BUD-E, ranging from an animated avatar to personify the assistant to support for analyzing users’ faces via webcam in order to take their emotional state into account.

The ethics of that last part, the facial analysis, are questionable, to say the least. But LAION co-founder Robert Kaczmarczyk stressed that LAION will remain committed to safety.

“[We] strictly follow the safety and ethical guidelines drawn up by the EU AI Act,” he told TechCrunch via email, referring to the legal framework governing the sale and use of AI in the EU. The EU AI Act allows EU member states to adopt more restrictive rules and safeguards for “high-risk” AI, including emotion classifiers.

“This commitment to transparency not only facilitates early identification and correction of potential biases, but also supports the objective of scientific integrity,” Kaczmarczyk said. “By making our data sets accessible, we enable the broader scientific community to engage in research that maintains the highest standards of reproducibility.”

LAION’s previous work has not been pristine in the ethical sense, and it is currently pursuing a somewhat controversial separate project on emotion detection. But maybe BUD-E will be different; we will have to wait and see.
