The core inputs
MMM needs three categories of data:
Marketing activity data: Spend by channel. Impressions for digital channels. GRPs for TV. Promotional calendars (dates and details of discounts and offers). For paid media, add frequency and reach if available.
Business outcome data: Sales revenue. Units sold. Leads. Conversions. Signups. Whatever you're trying to drive. You need one primary metric. That's the outcome variable the model tries to explain.
External factors: Seasonality. Weather. Competitor activity. Pricing changes. Macroeconomic indicators. Anything that affects demand beyond your marketing. These variables help the model isolate the true effect of your marketing from everything else.
These three categories together tell the model: here's what we spent, here's what we got, and here's the context. Now find the relationship.
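As a rough illustration of what "model-ready" means here, the sketch below joins the three categories into one row per week. All names and numbers are hypothetical, and it assumes each source is already keyed by the Monday that starts the week.

```python
from datetime import date

# Hypothetical weekly inputs, keyed by week-start date (a Monday).
spend = {
    date(2024, 1, 1): {"search": 1200.0, "tv": 5000.0},
    date(2024, 1, 8): {"search": 1500.0, "tv": 0.0},
    date(2024, 1, 15): {"search": 1100.0, "tv": 4000.0},
}
revenue = {
    date(2024, 1, 1): 42000.0,
    date(2024, 1, 8): 39500.0,
    date(2024, 1, 15): 44100.0,
}
context = {
    date(2024, 1, 1): {"avg_price": 19.99, "promo_active": 0},
    date(2024, 1, 8): {"avg_price": 19.99, "promo_active": 0},
    date(2024, 1, 15): {"avg_price": 17.99, "promo_active": 1},
}


def build_model_table(spend, revenue, context):
    """Join the three input categories into one row per week.

    Only weeks present in all three sources are kept.
    """
    weeks = sorted(set(spend) & set(revenue) & set(context))
    rows = []
    for week in weeks:
        row = {"week": week, "revenue": revenue[week]}
        # Prefix channel names so spend columns are easy to identify.
        row.update({f"spend_{ch}": value for ch, value in spend[week].items()})
        row.update(context[week])
        rows.append(row)
    return rows


table = build_model_table(spend, revenue, context)
```

Each row now holds what was spent, what was earned, and the context for that week, which is the shape most modelling tools expect as a starting point.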
Granularity matters
Weekly data is the standard. Daily is better if you have it. Monthly works but limits what the model can detect.
Why? The model needs to see variation. If you only have 12 data points (monthly for a year), there's limited variation to learn from. Weekly data gives you 52 points a year, which is a material improvement. Daily gives you 365. More data points mean the model can detect patterns that would be hidden in monthly aggregates.
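One practical consequence: fine-grained data can always be rolled up, but coarse data cannot be split back out. A minimal sketch of rolling daily values up to weekly totals, assuming weeks start on Monday and using made-up spend figures:

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def to_weekly(daily_values):
    """Sum a {date: value} mapping into {week_start: total}."""
    weekly = defaultdict(float)
    for day, value in daily_values.items():
        weekly[week_start(day)] += value
    return dict(sorted(weekly.items()))


# Hypothetical daily spend: 100 per day for 28 days from Monday 2024-01-01.
daily_spend = {date(2024, 1, 1) + timedelta(days=i): 100.0 for i in range(28)}
weekly_spend = to_weekly(daily_spend)
```

Here 28 daily points collapse into 4 weekly points. That is the trade-off in miniature: every step up in aggregation leaves the model fewer observations to learn from.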
Minimum history is 2 years. Better is 3+ years. The model needs to see enough seasonal cycles and enough variation in spend. If you've only run certain channels for 6 months, you might not have enough history to model them confidently.
The principle is straightforward: more history and more variation in spend give the model more to learn from.
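A quick way to spot channels with too little history is to count the weeks since each channel first had spend. The sketch below is an assumption about how you might measure that, not a fixed rule: the channel names and figures are hypothetical, and the 104-week threshold is simply two years of weekly data.

```python
MIN_WEEKS = 104  # two years of weekly data, the minimum discussed above


def weeks_of_history(series):
    """Count weeks from the first non-zero spend to the end of the series."""
    for i, value in enumerate(series):
        if value > 0:
            return len(series) - i
    return 0


def history_report(spend_by_channel, min_weeks=MIN_WEEKS):
    """Flag channels whose history is shorter than `min_weeks`."""
    return {
        channel: {
            "weeks": weeks_of_history(series),
            "enough_history": weeks_of_history(series) >= min_weeks,
        }
        for channel, series in spend_by_channel.items()
    }


# Hypothetical data: three years of weekly spend per channel.
spend_by_channel = {
    "search": [1000.0] * 156,                   # active for all 156 weeks
    "new_channel": [0.0] * 130 + [500.0] * 26,  # launched about 6 months ago
}
report = history_report(spend_by_channel)
```

A channel flagged here can still go into the model, but its estimates deserve less confidence than those for long-running channels.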
The common gaps
Offline media data. Often stuck with media agencies. You may not hold the spend or impression data for TV, radio, or outdoor yourself. The data exists, but it's locked behind the agency. Getting it requires coordination and sometimes contract negotiation. Plan for this.
Competitor spend. You rarely have exact numbers. You can estimate from Adbeat, Semrush, or similar tools, but it's not ground truth. Use estimates but flag them as such. The model can still work with imperfect competitive data.
Promotional data. Often buried in spreadsheets in different formats. What was the discount? When did it run? Which channels did you promote it on? This data usually exists somewhere but in messy form. Spend time getting this right: promotions are a major driver of sales, and the model needs to account for them.
Brand metrics. Awareness. Consideration. Purchase intent. These are nice to have but not essential. If you run brand tracking studies, include them. If not, don't worry. MMM works without them.
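For the promotional data gap in particular, the goal is to turn a messy calendar into per-week values a model can use. The sketch below assumes a simple, hypothetical promo record (name, start date, end date, discount percentage) and derives two illustrative weekly features; real promotions may need richer fields.

```python
from datetime import date, timedelta

# Hypothetical promotional calendar after basic cleanup.
promos = [
    {"name": "Winter sale", "start": date(2024, 1, 10),
     "end": date(2024, 1, 21), "discount_pct": 20},
    {"name": "Flash offer", "start": date(2024, 2, 1),
     "end": date(2024, 2, 2), "discount_pct": 10},
]


def promo_features(week_start, promos):
    """Return promo days and deepest discount for the 7 days from week_start."""
    promo_days = 0
    max_discount = 0
    for offset in range(7):
        day = week_start + timedelta(days=offset)
        # Discounts of every promotion running on this day (dates inclusive).
        active = [p["discount_pct"] for p in promos
                  if p["start"] <= day <= p["end"]]
        if active:
            promo_days += 1
            max_discount = max(max_discount, max(active))
    return {"promo_days": promo_days, "max_discount_pct": max_discount}


partial_week = promo_features(date(2024, 1, 8), promos)
full_week = promo_features(date(2024, 1, 15), promos)
quiet_week = promo_features(date(2024, 1, 22), promos)
```

Counting promo days, rather than using a plain yes/no flag, keeps a two-day flash offer from looking the same as a full week of discounting.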
Format doesn't matter (anymore)
Historically, you'd spend weeks getting everything into a single clean spreadsheet. Different files. Different formats. Different time periods. Reconciling it all was painful.
Modern automated data prep tools change this. They can take CSVs, Excel files, platform exports, even PDFs, and map them into model-ready format automatically. Format becomes less of a constraint. Messiness matters less than completeness.
This is what Rix (our data ingestion tool) does. You don't need a perfectly formatted dataset. You need the right data. We handle the formatting.
A practical checklist
Before you start an MMM, gather:
- Media spend by channel, weekly. Include all paid channels: search, social, display, TV, radio, outdoor, direct mail.
- Sales or revenue data, weekly. The metric you're trying to explain.
- Pricing data if it changes. Price variations affect demand and need to be modelled.
- Promotional calendar. Discount dates, offer details, which channels promoted it.
- Any external factors you track. Weather data if relevant. Competitor activity if you track it. Macroeconomic data.
Don't worry about format. Don't hold out for a perfect dataset either. A good MMM partner works with what you have, and the data you have is almost always good enough to start.
The rule: If you have 2+ years of weekly spend data and weekly sales data, you can build an MMM. Format, gaps, and messy sources aren't blockers. They're just things to work around.
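The rule can be checked mechanically. The sketch below is one possible reading of it, with hypothetical function and field names: it counts the weeks where both spend and sales exist, lists any gap weeks inside that span, and compares the count with a 104-week (two-year) threshold.

```python
from datetime import date, timedelta


def check_rule(spend_weeks, sales_weeks, min_weeks=104):
    """Check for at least `min_weeks` weeks with both spend and sales data.

    Both arguments are collections of week-start dates.
    """
    shared = set(spend_weeks) & set(sales_weeks)
    if not shared:
        return {"shared_weeks": 0, "gap_weeks": [], "can_build": False}
    first, last = min(shared), max(shared)
    n_weeks = (last - first).days // 7 + 1
    span = [first + timedelta(weeks=i) for i in range(n_weeks)]
    # Gaps are reported rather than treated as blockers.
    gaps = [week for week in span if week not in shared]
    return {
        "shared_weeks": len(shared),
        "gap_weeks": gaps,
        "can_build": len(shared) >= min_weeks,
    }


# Hypothetical example: 110 weeks of spend, with one week of sales missing.
start = date(2022, 1, 3)  # a Monday
spend_weeks = [start + timedelta(weeks=i) for i in range(110)]
sales_weeks = [week for i, week in enumerate(spend_weeks) if i != 50]
result = check_rule(spend_weeks, sales_weeks)
```

In this example the missing week shows up in `gap_weeks`, but the dataset still clears the threshold, which matches the point above: gaps are something to work around, not a reason to stop.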