
Act 1: The Snapshot
We begin with our feet firmly planted in the present and reflect on where the language services industry currently stands (spoiler: there’s no way to avoid this thing called GenAI).
But do we understand where we are and where we are headed? Are we asking the right questions, or should we just start with the “Why”?
A sense of urgency drives market players of every kind. We explore new pathways, test new solutions, learn from the mistakes made along the way, course-correct, and start again.
The industry is evolving, and we are in motion. In the following pages, we examine what this movement looks like.
When you open the Spotify app and scroll through your home screen, you’ll see a rich tapestry of content: a curated playlist to match your mood, a gripping true crime podcast, a relaxing audiobook, and now, perhaps even a short eLearning course on something you never knew you were curious about. It all feels personalized, relevant — and it’s in your native language.
What you don’t see is the complex, innovative engine behind the scenes that makes this seamless experience possible. At Spotify, we aren’t just delivering content; we’re delivering it globally, and that means making every element, from the interface to the voice in your headphones, feel natural, local, and human.
In this article, I want to share how our team is pioneering a new era of localization — because as audio becomes the next frontier of global connection, we believe that everyone, everywhere, deserves to experience content that feels native, personal, and deeply human — no matter the language.
Because behind every playlist that lifts your mood, every podcast that pulls you in, and every voice that feels like it’s speaking just to you, there’s a quiet revolution unfolding. One where language is no longer a barrier, where culture is honored, and where Spotify is reimagining what it means to feel truly seen and heard, no matter where you are in the world.
A different kind of scale
Spotify currently operates in 180 markets and supports 74 languages. Behind this scale is a centralized localization team focused on far more than translation. We bring cultural context, linguistic expertise, and a deep understanding of how our platform works to everything from marketing campaigns in Japan to platform design in Brazil. Our work ensures that a Spotify user in Indonesia or Poland experiences the platform as if it were made just for them. To start, the team has been reimagining the very foundation of how content is localized at scale. In the last 18 months, we’ve gone from virtually no machine translation (MT) in production to MT supporting 90% of our localization pipeline. This massive shift has allowed us to increase speed, reduce cost, and accelerate go-to-market timelines, all while maintaining our high quality bar through layered review and quality assurance frameworks.
But our use of AI doesn’t stop at classic localization production. Quality assurance is undergoing its own transformation. We’re now experimenting with AI in both linguistic and functional QA testing, using models that can automatically flag inconsistencies, detect placeholder issues, and even simulate user flows across languages. For instance, when a translated string doesn’t fit the UI in German or a currency symbol renders incorrectly in Arabic, these systems can identify the issue before it ever reaches a user. Combined with human expertise, this hybrid model ensures that scale doesn’t come at the cost of experience.
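To make the placeholder and UI-fit checks more concrete, here is a minimal Python sketch of the kind of rule-based pre-check such QA systems might run alongside model-based review. It is not Spotify’s implementation. The placeholder syntaxes, the check_string helper, and the max_chars budget are all hypothetical, and a character count is only a rough stand-in for rendering the string in the real UI.

```python
import re
from collections import Counter
from typing import List, Optional

# Hypothetical placeholder formats: {named} tokens and printf-style %s / %d / %1$s.
# A real pipeline would match whatever formats its own resource files use.
PLACEHOLDER_RE = re.compile(r"\{[A-Za-z0-9_]+\}|%(?:\d+\$)?[sd]")


def placeholders(text: str) -> Counter:
    """Count how many times each placeholder token appears in a string."""
    return Counter(PLACEHOLDER_RE.findall(text))


def check_string(source: str, target: str, max_chars: Optional[int] = None) -> List[str]:
    """Return a list of QA issues found in one translated UI string."""
    issues = []
    src, tgt = placeholders(source), placeholders(target)
    missing = src - tgt  # placeholders dropped or mistyped in the translation
    extra = tgt - src    # placeholders that appear only in the translation
    if missing:
        issues.append(f"missing placeholders: {sorted(missing.elements())}")
    if extra:
        issues.append(f"unexpected placeholders: {sorted(extra.elements())}")
    # Hypothetical length budget as a crude proxy for "does it fit the UI?"
    if max_chars is not None and len(target) > max_chars:
        issues.append(f"too long: {len(target)} > {max_chars} chars")
    return issues


if __name__ == "__main__":
    # A German string where {artist} was mistyped and the text overflows a 20-char label.
    for issue in check_string("Play {artist} now", "Jetzt {artsit} abspielen", max_chars=20):
        print(issue)
```

Reordered placeholders such as "%1$s of %2$s" translated as "%2$s von %1$s" pass, because the check compares placeholder counts rather than positions. Catching a wrongly rendered currency symbol or an awkward phrasing is where model-based review and human linguists would take over.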
And we’re not stopping at how content is localized: we’re also rethinking how localization itself is run. Looking ahead, we’re exploring a bold new model, one where content isn’t translated after it’s created but generated in multiple languages simultaneously from the outset. Technically, this means working with large language models (LLMs) and multilingual generation pipelines that can produce semantically aligned versions of content in parallel. Instead of a single source language driving the process, we envision a system where prompts, intent, and creative direction are fed into AI models that generate content natively in each target language, reducing reliance on sequential translation workflows. Human-in-the-loop validation would still play a critical role, with expert linguists and reviewers fine-tuning tone, cultural nuance, and context before content goes live. This paradigm shift could enable faster go-to-market timelines, more consistent quality, and an inclusive user experience where content feels truly native from day one, in every market.
It’s no cliché to say that AI has revolutionized our localization ecosystem and forced us to rethink every aspect of how we approach localization.
AI redefining content interaction
And while we have made terrific progress in localization production, we have also started exploring how localization, powered by AI, can help transform Spotify as a whole. As the platform expanded beyond music into podcasts, audiobooks, video content, and even live interactions, our localization strategy had to evolve fast. It’s not just about text anymore; it’s about voice, sound, and experience.
For example, take podcasts. Traditionally, if a podcast was in English, only English speakers could access it unless the creators produced separate versions in other languages. That meant many people missed out simply because they didn’t speak the language. Now, Spotify is exploring AI dubbing for podcasts using synthetic voices. Imagine a podcast originally recorded in English by a host in New York, automatically dubbed into Spanish, Brazilian Portuguese, or Hindi with a synthetic voice that matches the speaker’s tone and emotion. That’s exactly what’s happening now: we’re testing it in internal pilots and developing a quality framework to ensure authenticity and cultural accuracy.

For video content, including music videos and video podcasts, we’re also generating multilingual subtitles — making sure that when an artist shares a story on camera, audiences across the world can understand, engage, and connect. This is especially powerful for creators who want to reach new markets without losing the personal voice and pacing that make their content unique.
Audiobooks bring even more complexity. Some titles exist in many languages but don’t have corresponding audio versions. In these cases, Spotify can offer synthetic voice-over recordings to fill the gap, making stories more accessible without the long production cycles of traditional voice-actor recordings. For books that exist in only one language, we’re testing full pipelines that combine AI translation with synthetic voice narration. Imagine a popular advice book available only in French: with the help of our AI systems, it can now reach a Korean listener for the first time. These efforts don’t just improve accessibility; they open doors for authors and publishers to connect with truly global audiences.
And we’re not stopping there. Live experiences are the next big frontier. When artists speak directly to fans during a live-streamed “Listening Party,” we’re working to provide real-time multilingual captions and instant chat translations. This means a fan in Berlin can ask a question in German and an artist in Los Angeles can answer in English, yet everyone in the chat sees the interaction in their own language. It turns one live event into a global dialogue.
To orchestrate all this, we’re building what we call MAP: the Multilingual AI Platform. It’s our centralized system for managing every AI-powered localization flow — from dubbing and voice synthesis to live captioning and beyond. This platform allows us to scale quickly, test new features efficiently, and ensure quality across every touchpoint. It’s our way of future-proofing localization for a world where content is increasingly multimodal and multilingual by default.
Of course, localization is not just about language — it’s about culture. Spotify’s localization team doesn’t just run content through a machine. We continuously test UX with real users in each market. We adjust product labels that may feel awkward in Japanese or fine-tune search terms in Turkish. When we market a new artist in Mexico, we tailor the message, imagery, and tone to reflect local music culture — not just translate a global campaign.
A recent example: when we launched our audiobook platform in Germany, we worked closely with local publishers to ensure metadata was structured the way German users expect. We even updated our content categories to reflect regional preferences like “Krimi” (crime fiction) or “Ratgeber” (self-help). These details require human insight, linguistic sensitivity, and deep local knowledge.
This dual approach — traditional linguistic rigor combined with AI innovation — positions Spotify not only as a leader in music streaming but also as a true pioneer in global audio experiences. By embracing cutting-edge tools while staying rooted in cultural understanding, we’re not just translating, we’re transforming how people experience stories, music, and ideas.
Spotify changed how people experience music and audio, and now we’re transforming how people access and understand that content across languages and cultures. And this is only the beginning. In the past, music was universal. Now, everything can be. Spotify is making audio borderless: one voice, one story, one language at a time.

Read the full 132-page Global Ambitions: (R)Evolution in Motion publication featuring vital perspectives from 31 industry leaders on the ongoing AI-spurred (r)evolution.
