Marketing Mix Modeling Package Robyn: An In-Depth Look at Its Capabilities and Shortcomings

What are MMMs and what are they used for?

An important part of any business is marketing. By allocating your marketing budget to the right marketing channels, you can increase your revenue, or reduce your marketing costs while keeping revenue the same. But deciding how to allocate your marketing budget optimally is easier said than done. Marketing mix models (MMMs) are designed for exactly this purpose. An MMM estimates how effective (or ineffective) each marketing channel is at generating revenue. Once you know this relationship, you can optimize your marketing budget accordingly.

One of the leading approaches to MMM is the open-source package Robyn, developed by Meta. Robyn makes it easy for anyone with a basic understanding of programming in R to set up an MMM and run a budget optimization. You load in your marketing expenditures and your revenue, and Robyn does the rest. Thanks to this ease of use, Robyn is becoming the industry standard for MMM. Unfortunately, Robyn also has quite a few shortcomings and limitations you should consider.

To demonstrate these shortcomings and limitations, we put Robyn to the test on a simulated dataset. Because we know the underlying mechanics of a simulated dataset, we can measure how well Robyn performs. For example, we can configure exactly how each marketing channel impacts revenue, which lets us check whether the Robyn models recover the same degree of importance for each channel.

Simulated Dataset

The simulated dataset consists of five components that we expect to find in any real dataset. Each component contributes to revenue, and together they are used to simulate three years of revenue. Below you can see a breakdown of the simulated dataset:

 

The first component is the general trend, or your baseline sales. It is generated independently of the time of year and of your marketing spend. The second component is the yearly season, which captures the fact that some parts of the year generate more revenue than others. The third component is the weekly season, which works the same way as the yearly season but on a weekly time scale. The fourth component is the one we are interested in: the effect of your marketing spend on revenue. We use 8 different marketing channels. Lastly, we included 3 promotional events per year during which a lot of extra revenue is generated. An overview of each of the components can be seen below:
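To make the five components concrete, here is a minimal Python sketch of how daily revenue could be assembled from them. It is not the simulation used for this article: the functional forms (sine waves for the two seasons, a simple saturating response per channel, fixed promotion days) and every number in it are assumptions chosen for illustration, and it leaves out any carry-over of spend from one day to the next.

```python
import math
import random

# All parameters below are hypothetical, chosen only to illustrate the structure.
DAYS = 3 * 365                   # three years of daily data
N_CHANNELS = 8                   # eight marketing channels
PROMO_DAYS = {100, 200, 330}     # hypothetical day-of-year of the 3 promotional events


def saturation(spend: float, half_sat: float) -> float:
    """Diminishing-returns response in [0, 1); equals 0.5 when spend == half_sat."""
    return spend / (spend + half_sat)


def simulate_revenue(seed: int = 0) -> tuple[list[list[float]], list[float]]:
    rng = random.Random(seed)
    max_effect = [rng.uniform(500, 3000) for _ in range(N_CHANNELS)]  # max revenue per channel
    half_sat = [rng.uniform(200, 1500) for _ in range(N_CHANNELS)]    # spend at half of max effect
    spend_rows: list[list[float]] = []
    revenue: list[float] = []
    for day in range(DAYS):
        trend = 10_000 + 2.0 * day                            # 1. baseline sales
        yearly = 1_500 * math.sin(2 * math.pi * day / 365)    # 2. yearly season
        weekly = 500 * math.sin(2 * math.pi * day / 7)        # 3. weekly season
        spend = [rng.uniform(0, 2000) for _ in range(N_CHANNELS)]
        marketing = sum(                                      # 4. saturating channel effects
            m * saturation(s, h) for m, s, h in zip(max_effect, spend, half_sat)
        )
        promo = 5_000.0 if day % 365 in PROMO_DAYS else 0.0   # 5. promotional events
        noise = rng.gauss(0, 200)
        spend_rows.append(spend)
        revenue.append(trend + yearly + weekly + marketing + promo + noise)
    return spend_rows, revenue
```

Because the true channel effects (`max_effect`, `half_sat`) are known inside such a simulation, any model fitted to `spend_rows` and `revenue` can be checked against them.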

The relationship between ad spend & revenue

At the heart of an MMM lies its budget optimization capability: it lets you find the relationship between your marketing spend and your revenue. However, this relationship is not linear. As you spend more and more in one channel, the returns from that channel start to diminish; this is known as channel saturation. This effect can be modeled with a saturation curve.
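A common way to model a saturation curve is the Hill function. Robyn's saturation step is also based on a Hill function, though its exact parameterization differs (for example in how the inflection point is scaled). The sketch below, with our own parameter names `alpha`, `gamma` and `beta`, shows the diminishing returns described above: each additional $100 buys less revenue than the previous one.

```python
def hill_saturation(spend: float, alpha: float, gamma: float, beta: float) -> float:
    """Hill-type saturation curve: beta * spend^alpha / (spend^alpha + gamma^alpha).

    alpha: shape (alpha <= 1 is concave, alpha > 1 is S-shaped)
    gamma: spend at which half of the maximum effect is reached
    beta:  maximum revenue the channel can generate
    """
    if spend <= 0:
        return 0.0
    return beta * spend**alpha / (spend**alpha + gamma**alpha)


# Marginal revenue of each extra $100 for a hypothetical channel (alpha=1, gamma=300, beta=1000)
for spend in (0, 100, 200, 300):
    gain = hill_saturation(spend + 100, 1.0, 300, 1000) - hill_saturation(spend, 1.0, 300, 1000)
    print(f"${spend} -> ${spend + 100}: +${gain:.1f}")
# $0 -> $100: +$250.0
# $100 -> $200: +$150.0
# $200 -> $300: +$100.0
# $300 -> $400: +$71.4
```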

The main task Robyn carries out is estimating these saturation curves. If you identify these curves correctly, you can also find the budget allocation that maximizes your revenue. If, on the other hand, you get the saturation curves wrong, you won’t find the best budget allocation. Even worse, your new budget allocation can lead to a decrease in revenue, exactly the opposite of what you want.

We let Robyn estimate the saturation curves for our 8 marketing channels. Note that Robyn does not produce just one model when it optimizes your budget; instead, it finds anywhere between 2 and 14 “best” models. We selected the top 4 models Robyn found for the simulated dataset. The results can be seen in the figure below:

 

Looking at the figures, it is clear that Robyn was unable to accurately estimate the true saturation curves for almost all channels. Most of the time, it greatly underestimates the influence each channel has on revenue. This is most apparent for channel 3, which Robyn deemed completely insignificant. In reality, the exact opposite is true: channel 3 has one of the strongest effects on revenue (see the y-axis). These misestimated saturation curves have a severe impact on the budget optimization.
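Why does Robyn return several “best” models instead of one? As we understand its documentation, Robyn scores candidate models on more than one objective (model fit via NRMSE, and DECOMP.RSSD) and keeps the Pareto-optimal ones: models that no other candidate beats on both objectives at once. A minimal sketch of such a Pareto filter, with hypothetical model records, is:

```python
def pareto_front(models: list[dict]) -> list[dict]:
    """Keep models that no other model beats on both NRMSE and DECOMP.RSSD (lower is better)."""

    def dominates(a: dict, b: dict) -> bool:
        # a dominates b if it is no worse on both objectives and strictly better on one
        no_worse = a["nrmse"] <= b["nrmse"] and a["decomp_rssd"] <= b["decomp_rssd"]
        better = a["nrmse"] < b["nrmse"] or a["decomp_rssd"] < b["decomp_rssd"]
        return no_worse and better

    return [m for m in models if not any(dominates(o, m) for o in models)]


candidates = [  # hypothetical candidate models
    {"id": "A", "nrmse": 0.10, "decomp_rssd": 0.50},
    {"id": "B", "nrmse": 0.20, "decomp_rssd": 0.20},
    {"id": "C", "nrmse": 0.30, "decomp_rssd": 0.60},  # worse than A on both objectives
]
print([m["id"] for m in pareto_front(candidates)])  # ['A', 'B']
```

Models A and B both survive because each wins on one objective, which is why a single dataset can yield several “best” models with quite different saturation curves.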

Missed opportunity

How exactly do these saturation curves translate into an optimal budget? Let us demonstrate with a small example. Assume your total marketing budget is $1000 and you have two marketing channels to spend it on. You spend most of the $1000 on Channel 1, say $900, and a little on Channel 2, $100. Using the saturation curve of each channel, we can see that this results in an expected revenue of $652 + $202 = $854 (see left plot below).

As you might have already noticed, there is a lot of untapped potential in Channel 2: spending a little extra on this channel results in a lot of extra revenue. So we reallocate the budget while keeping the same total. We now spend $340 on Channel 1 and $660 on Channel 2, which results in a revenue of $463 + $539 = $1002. That is an increase of about 17% ($1002 / $854 ≈ 1.17) just by reallocating your marketing budget (see right plot).
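The reallocation principle can be sketched in Python. Because the curves behind the $1000 example are not given, the two channels below use hypothetical concave curves (a Hill curve with alpha = 1), so the numbers will not match the plots. The sketch hands out the budget in small steps, each time to the channel with the largest marginal revenue; for concave curves this greedy rule finds the best allocation on that step grid. It illustrates the principle only and is not Robyn's optimizer.

```python
def response(spend: float, beta: float, gamma: float) -> float:
    """Concave saturation curve (Hill with alpha = 1): expected revenue for a given spend."""
    return beta * spend / (spend + gamma)


def total_revenue(alloc: list[float], curves: list[tuple[float, float]]) -> float:
    return sum(response(a, beta, gamma) for a, (beta, gamma) in zip(alloc, curves))


def allocate(budget: int, curves: list[tuple[float, float]], step: int = 10) -> list[int]:
    """Greedily assign each `step` of budget to the channel with the highest marginal revenue."""
    alloc = [0] * len(curves)
    for _ in range(budget // step):
        gains = [
            response(a + step, beta, gamma) - response(a, beta, gamma)
            for a, (beta, gamma) in zip(alloc, curves)
        ]
        best = max(range(len(curves)), key=lambda i: gains[i])
        alloc[best] += step
    return alloc


curves = [(900.0, 400.0), (1200.0, 600.0)]  # hypothetical (beta, gamma) for Channel 1 and 2
before = [900, 100]
after = allocate(1000, curves)
print(after, round(total_revenue(before, curves)), round(total_revenue(after, curves)))
```

With these made-up curves, the optimizer moves budget from Channel 1 to Channel 2 and raises expected revenue. The catch is that it trusts the curves completely: feed it wrongly estimated curves and it will just as confidently move budget in the wrong direction.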

We applied the same principle to the saturation curves Robyn found for each of our 8 marketing channels. Because Robyn performs poorly at finding the saturation curves, its budget allocation is unfortunately also quite poor. In 3 of the 4 models, following Robyn’s budget reallocation leads to a decrease in revenue compared to the original revenue of the simulated dataset. The results can be seen in the table below:

Model | Revenue before | Revenue after (Robyn) | Change (Robyn)
Robyn_0 | € 2,591,259.16 | € 2,673,925.98 | +3.19%
Robyn_1 | € 2,591,259.16 | € 2,416,672.25 | -6.74%
Robyn_2 | € 2,591,259.16 | € 2,471,183.28 | -4.63%
Robyn_3 | € 2,591,259.16 | € 2,516,982.67 | -2.87%
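The change column is the relative difference between the revenue after following a model’s reallocation and the original revenue. A short Python check reproduces the percentages from the revenue figures in the table:

```python
BEFORE = 2_591_259.16  # original revenue of the simulated dataset (€)
AFTER = {              # revenue after following each model's reallocation (€)
    "Robyn_0": 2_673_925.98,
    "Robyn_1": 2_416_672.25,
    "Robyn_2": 2_471_183.28,
    "Robyn_3": 2_516_982.67,
}


def pct_change(before: float, after: float) -> float:
    """Relative change in percent; negative values are revenue decreases."""
    return 100 * (after - before) / before


for model, revenue_after in AFTER.items():
    print(f"{model}: {pct_change(BEFORE, revenue_after):+.2f}%")
# Robyn_0: +3.19%
# Robyn_1: -6.74%
# Robyn_2: -4.63%
# Robyn_3: -2.87%
```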

To make sure this was not a fluke, we repeated the process described above for 100 different datasets. For each dataset, we let Robyn estimate the saturation curves and used those curves to compute the resulting increase or decrease in revenue. Since we also know the true saturation curves, we could compute the maximum achievable revenue increase for each dataset as well. The results can be seen in the figure below:

 

On average, the potential revenue increase is around 9%, as the left bar of the figure shows. Robyn found an average increase of only 0.7%. And as mentioned earlier, its reallocations often led to a decrease in revenue: across the 100 datasets, Robyn produced 758 best models, and 315 of them (around 42%) would lead to a decrease in revenue.

Because Robyn produces multiple best models per dataset, it can be hard to decide which one to use. One way to address this is to look at the three fit metrics Robyn reports for its models: NRMSE, R-squared (R2), and DECOMP.RSSD (DRSSD). For each of these three metrics, we computed the average revenue increase of the top 10% of the 758 models. As the figure shows, the average revenue increase goes up but is still far from the optimal increase. And once again, many of these top 10% models would also lead to a revenue decrease.
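The three fit metrics and the top-10% selection can be sketched as follows. The definitions are our reading of how these metrics are commonly defined (NRMSE as RMSE divided by the range of the actual values; DECOMP.RSSD, the DRSSD metric above, as the distance between each channel’s share of the modeled effect and its share of spend), and Robyn’s own implementation may differ in details. The model records and their `revenue_change` field are hypothetical.

```python
import math
from statistics import mean


def nrmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error divided by the range of the actual values."""
    rmse = math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))
    return rmse / (max(actual) - min(actual))


def r_squared(actual: list[float], predicted: list[float]) -> float:
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def decomp_rssd(effect_share: list[float], spend_share: list[float]) -> float:
    """Distance between each channel's share of modeled effect and its share of spend."""
    return math.sqrt(sum((e - s) ** 2 for e, s in zip(effect_share, spend_share)))


def top_fraction_mean(models: list[dict], metric: str, higher_is_better: bool,
                      fraction: float = 0.1) -> float:
    """Average revenue change of the best `fraction` of models ranked by `metric`."""
    ranked = sorted(models, key=lambda m: m[metric], reverse=higher_is_better)
    n = max(1, round(len(ranked) * fraction))
    return mean(m["revenue_change"] for m in ranked[:n])


def share_decreasing(models: list[dict]) -> float:
    """Fraction of models whose reallocation lowers revenue."""
    return sum(m["revenue_change"] < 0 for m in models) / len(models)
```

Lower is better for NRMSE and DECOMP.RSSD, so those are ranked with `higher_is_better=False`, while R-squared uses `higher_is_better=True`. Applied to the 758 models above, `share_decreasing` corresponds to 315 / 758 ≈ 0.42.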

Conclusion

As demonstrated, optimally allocating your marketing budget can be challenging. Even large tech companies like Meta face difficulties in implementing robust solutions: in our tests, their MMM approach often resulted in decreased revenue. Fortunately, there are alternatives to Robyn. If you’re interested in exploring better solutions or discussing your approach to MMM, feel free to get in touch.

Experimentation: The Essential Engine for Today’s Business Success

Introduction: The Necessity of Experimentation

In the digital age, companies like Amazon, Netflix, and Airbnb have paved the way for a new mindset: test and fail quickly. That’s where measurement and experimentation come in: techniques to quickly verify whether you’re heading in the right direction.

As aviation stands on the edge of change, exemplified by ventures like Flyr and Fetcherr, its leaders face a pivotal choice: evolve or risk becoming obsolete. Experimentation, a practice embraced by tech titans such as Uber and Booking.com, can guide the way. It enables businesses to test ideas in controlled settings and continually refine strategies based on tangible results.

This cycle of continuous learning matters: Microsoft and Booking.com found that only about one-third of their experiments had a positive impact on key metrics. While this may seem disappointing, it is a significant leap compared to the estimated 5-10% success rate of initiatives undertaken without systematic testing.

For the airline sector, this presents a huge opportunity, from designing dynamic pricing strategies to overhauling loyalty programs based on real-time feedback. The imperative isn’t merely to experiment, but to value each outcome as a stepping stone and to put the learnings from every test to use.

A culture worth aspiring to

Uber: Optimizing Ride Pricing Through Surge Algorithms

Background

One of the main challenges for ride-sharing platforms is ensuring that supply matches demand. Too few drivers during peak times can lead to missed revenue opportunities and disgruntled customers, while too many drivers without sufficient rider demand can demotivate and financially strain the driver community.

Experimentation

Uber introduced dynamic pricing, more commonly known as “surge pricing,” to address this. However, determining the right surge multiplier is complex. Uber ran experiments by tweaking its pricing algorithms in real time in various cities and monitoring the results.

Outcome

Through this iterative process, Uber was able to refine its surge algorithms to better predict when and where surges should be applied, ensuring a balance of driver availability and rider demand. This not only improved customer satisfaction by reducing wait times but also increased revenue during peak times.

Booking.com: Enhancing User Experience Through A/B Testing

Background

With a myriad of options for accommodations, presenting the most relevant options to users based on their preferences and browsing behavior is crucial for platforms like Booking.com. The challenge lies in understanding which site features and presentation styles resonate most with different user segments.

Experimentation

Booking.com is known for its rigorous A/B testing culture. They continuously run thousands of experiments, tweaking everything from button colors to the order of hotel listings. For instance, they tested different call-to-action phrases, image placements, and review presentation formats to understand what influences a user’s booking decision.

Outcome

These experiments have led to a more personalized and efficient user experience, resulting in higher conversion rates. They’ve found that even minor changes, such as a slight alteration in phrasing or button positioning, can have significant impacts on user engagement and bookings.
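Whether a small change really moved conversion is usually decided with a significance test. As an illustration only (not Booking.com’s actual tooling), a standard two-proportion z-test on hypothetical conversion counts looks like this:

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in conversion rate between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # conversion rate assuming no difference
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical test: 10,000 visitors per variant, 5.0% vs 5.6% conversion
z, p = two_proportion_z_test(500, 10_000, 560, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

In this made-up case, a 12% relative lift (5.0% to 5.6%) gives a p-value just above 0.05, so it would not pass the usual 5% threshold with 10,000 visitors per variant. Detecting small but valuable changes reliably requires large samples.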

Instacart: Optimizing Delivery Logistics with Advanced A/B Testing, Boosting Efficiency by 3%

Background

Instacart, a grocery delivery service, experienced rapid growth, expanding from 30 to 190 markets in just a year. Their primary challenge was efficiently dispatching shoppers to fulfill and deliver orders. The goal was to deliver groceries on time while maximizing the earnings potential for shoppers.

Experimentation

Because conventional A/B testing is difficult to apply to a logistics system, Instacart devised an innovative approach. Instead of splitting samples by customer or shopper, they divided their service areas into “zones” and split samples by zone and day. They compared two algorithm variants: the existing one (A) and a new one (B). Initial tests included simulations, before-and-after analysis, and difference-in-differences, but these methods were not conclusive. Finally, they used multivariate regression, which controlled for variables such as zone, day of the week, and week number, to assess the efficiency of the new algorithm.
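The zone-day design and the regression can be sketched in Python on simulated data. Everything here is hypothetical (10 zones, 28 days, a built-in lift of 0.03 on a baseline efficiency of about 1.0, the noise level), and the plain least-squares solver stands in for a proper statistics package. It only illustrates how zone, day-of-week, and week dummies let the treatment coefficient isolate the effect of algorithm B.

```python
import random


def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares coefficients from the normal equations (X'X) b = X'y, via Gauss-Jordan."""
    k = len(X[0])
    m = [  # augmented matrix [X'X | X'y]
        [sum(r[i] * r[j] for r in X) for j in range(k)] + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(m[row][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def simulate(n_zones: int = 10, n_days: int = 28, lift: float = 0.03, seed: int = 1):
    """Hypothetical efficiency per zone-day; algorithm B is randomly assigned per zone-day."""
    rng = random.Random(seed)
    zone_base = [rng.uniform(0.9, 1.1) for _ in range(n_zones)]
    dow_effect = [rng.uniform(-0.05, 0.05) for _ in range(7)]
    X, y = [], []
    for zone in range(n_zones):
        for day in range(n_days):
            treated = rng.random() < 0.5
            eff = zone_base[zone] + dow_effect[day % 7] + 0.005 * (day // 7) + lift * treated
            y.append(eff + rng.gauss(0, 0.01))
            row = [1.0, float(treated)]                                   # intercept, variant B
            row += [float(zone == z) for z in range(1, n_zones)]          # zone dummies
            row += [float(day % 7 == d) for d in range(1, 7)]             # day-of-week dummies
            row += [float(day // 7 == w) for w in range(1, n_days // 7)]  # week dummies
            X.append(row)
    return X, y


coef = ols(*simulate())
print(f"estimated effect of algorithm B: {coef[1]:+.4f}")  # should be close to +0.03
```

Randomizing at the zone-day level keeps both variants from competing for the same shoppers in the same zone on the same day, while the dummies absorb differences between zones, weekdays, and weeks that would otherwise blur the comparison.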

Outcome

Multivariate regression showed that the new algorithm led to a significant 3.0% improvement in delivery efficiency. With this data-driven approach, Instacart not only improved efficiency but also enhanced shopper satisfaction. The approach also allowed the company to innovate faster and optimize its logistics engine rigorously.

Applications in the Airline Industry

As the aviation sector faces the challenges of a digital age, it’s becoming increasingly clear that traditional methods are no longer enough. To remain competitive, airlines must adopt strategies akin to those employed by tech giants like Uber, Booking.com, and Airbnb. These companies have set benchmarks in personalization, dynamic pricing, and loyalty programs through a culture rooted in experimentation.

Tech giants have found that only about one-third of their experiments actually succeed. If you aren’t learning from both hits and misses, you’re effectively wasting over 60% of your efforts. There’s no room for this kind of inefficiency in a sector as competitive as airlines.

A few examples of what’s worth measuring:

1. Hyper-Personalization

Airbnb and the like have made personalization more than just a buzzword; they’ve made it a science. Airlines can do the same. Imagine in-flight services tailored to individual passenger preferences, or special promotions based on previous trips. The catch? Experimentation. Data alone can’t create magic; it’s through iterative tests that you identify what truly resonates with your customers.

2. Dynamic Pricing

Airlines are no strangers to variable pricing, but the game has changed. Look at Uber’s real-time pricing fluctuations based on demand. How about allowing passengers to bid on fares? It’s more than just a new feature; it’s a potential revolution in how aviation approaches pricing. But you won’t know unless you try, test, and iterate.

3. Loyalty Programs

In a digital world, loyalty is currency, and stale programs won’t suffice. Experiment with offering rewards that go beyond miles. How about partnerships with e-commerce platforms or local businesses? Again, it’s about running experiments to determine what’s most engaging for your customer base.

None of these advancements can occur in a vacuum. Like tech giants, airlines must embrace a culture of experimentation. Whether you’re altering your boarding procedures to improve on-time performance or partnering with unconventional companies to expand your loyalty program, every initiative must be viewed as an experiment.

Key Challenges: Legacy Tools, Lack of New Skills, Mindset & Culture

In the airline industry, there’s an undercurrent of tension between time-tested methods and the urgent need for agile, forward-thinking practices. The industry’s ability to readily adapt to business experimentation isn’t just a question of new technology; it’s fundamentally a challenge of mindset, dated tools, and resistant culture.

Challenge 1: Skill Gaps – The Pitfalls of Simple Metrics

Let’s start with the basics. Many industry players default to simple pre/post comparisons to assess the impact of a new initiative. The problem? This method often misses the bigger picture. Such analyses overlook external variables, like seasonality or competitor strategies, that can distort results and misguide decision-making.
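A toy example shows how a pre/post comparison can mistake seasonality for impact. The numbers are hypothetical: daily bookings rise by 10 in a comparable control market purely because of the season, and by 12 in the market where the initiative launched. A difference-in-differences estimate subtracts the control market’s change; it assumes both markets would otherwise have followed the same trend.

```python
def pre_post(pre: float, post: float) -> float:
    """Naive impact estimate: attributes the entire change to the initiative."""
    return post - pre


def diff_in_diff(treat_pre: float, treat_post: float, ctrl_pre: float, ctrl_post: float) -> float:
    """Impact estimate net of the change seen in a comparable control group."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)


# Hypothetical daily bookings before and after a launch during the high season
print(pre_post(100, 112))                # 12: season and initiative combined
print(diff_in_diff(100, 112, 100, 110))  # 2: initiative only, if trends are parallel
```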

More advanced methods, such as the synthetic control method, may seem like an upgrade, but they are not without flaws. They offer a semblance of control, yet they struggle to pick up marginal gains, say a 1-2% uplift, that can spell the difference between red and black on a balance sheet.

Challenge 2: Legacy Tools – When ‘Tried-and-True’ Becomes ‘Outdated’

Beyond skills, the industry also holds onto legacy tools, whether software or contractual agreements, that often impede full-scale experimentation. Built for a different era, these tools simply can’t offer the granularity or adaptability needed for agile, data-driven decision-making.

Challenge 3: The Cultural Inertia – Hesitation Over Innovation

Then there’s culture—the not-so-small elephant in the room. The aviation industry has a storied history and an often cautious approach to change. While this caution may have its merits, it also fosters an environment that’s hesitant to wholeheartedly embrace experimentation. It’s not just about the technology; it’s about preparing minds to steer the ships differently.

A few pointers on getting started

Embarking on a journey of business experimentation in the aviation sector calls for more than just good intentions. It demands a strategic approach, designed to generate quick wins while building sustainable, long-term capabilities.

Step 1: Choose Your Battle Wisely: Scope a High-Impact Initiative

Don’t boil the ocean. Identify a single, high-impact area where the benefits of an experimental approach can be easily demonstrated. This could be your frequent flyer program or an e-commerce project—ideally, a data-rich and agile area of the business. A successful pilot in one of these domains can be a powerful catalyst for wider change.

Step 2: Assemble the Dream Team: Gather a Cross-Functional Unit

Put together a nimble team of key stakeholders: operations, data science, and finance. Make sure the group is not just diverse but also empowered with the right skill sets to implement and measure the experiment.

Step 3: Engage the Corner Office: Secure Senior Leadership Buy-In

Pick a project that already has the ear of the C-suite. When top management isn’t just signing off but is genuinely invested in the outcome, the ripple effect through the organization can be transformative.

Step 4: Seek External Guidance: Consult Industry Experts

Why reinvent the wheel? Partner with external firms or consultants who have been down this road before. A fresh perspective and best practices from sectors where experimentation is the norm can greatly accelerate your progress.

By strategically focusing on these key areas, you set the stage for a culture where experimentation isn’t a one-off initiative but a business imperative. With targeted projects, a cross-disciplinary team, C-suite buy-in, and valuable external insights, you’re not just keeping up—you’re setting the pace.

Why Embracing Experimentation is No Longer Optional for Success

In today’s ultra-competitive business landscape, the rule of the game is adapt or get left behind. Companies like Uber and Booking.com have risen to the top not just by innovating but by rigorously testing and validating each of those innovations. These firms have discovered that only about one-third of their experiments lead to positive results. Knowing which third of your investments work, and where to invest going forward, is a huge edge.

Let’s put it bluntly: If you’re not part of the experimentation wave, you’re missing out on refining your intuition and doubling down on what’s working for your business and your customers. That’s like leaving money on the table in an industry where margins are razor-thin. Your competitors are leapfrogging, not by doing more, but by continuously learning through experimentation and getting a bit better every day.

In sectors like aviation, where change is vital, embracing a culture of experimentation isn’t a ‘nice to have.’ It’s a lifeline. If the status quo remains unchallenged in your organization, you’re not merely stagnating; you’re effectively retreating in a race that waits for no one.