Lead scoring: why single-score setups are dead in 2026

A VP of marketing showed me her lead scoring dashboard last month. Every contact at a 95 plus score for the last three quarters. The sales team had stopped looking at the field entirely. "We built this with our HubSpot consultant two years ago," she said. "The reps say it is useless. I think they are wrong, but I cannot prove it."

I pulled her closed-won data for the prior year. Of the 142 closed deals, 38 came from contacts with scores under 30. Twelve came from contacts with no score at all because they were added by SDRs after a cold call. Her scoring model was not wrong. It was measuring the wrong thing. It rewarded marketing engagement, which is a behavior, while her real buyers were companies that fit a specific profile and had triggered a specific event.

This is the most common lead scoring failure I see, and it has almost nothing to do with the tool. I have built dozens of these models inside HubSpot, Marketo, and a few custom builds in Clay plus n8n. The pattern is the same. Teams score the activity that is easy to track, not the activity that predicts revenue. So let me walk through what actually works and what to throw out.

What a lead scoring model is supposed to do

A lead scoring model has one job: tell a sales rep which contacts to call this week. That is it. Not "lead quality." Not "marketing efficiency." Not "MQL volume."

If the score does not change the rep's behavior, it is decorative. If the rep ignores the score and works from a list of their own making, the model is dead and should be rebuilt or removed.

The second job, which is downstream, is to give marketing a feedback loop. When you can see that scores above X close at Y rate, you can tell finance which channels and campaigns generate the contacts that actually pay you.

The point

If your reps do not work the list in score order, your scoring model is decoration.

The whole point of the score is to change who gets called first. If that is not happening, the model is broken or your sales team has stopped trusting it. Both are fixable.

The two-axis model that actually works

Every working lead scoring model I have built or seen has two axes. Not one. Two.

The first axis is fit. Does this contact match my ideal customer profile? Right industry, right size, right role, right region. This is mostly company data and contact data, and it does not change unless the company changes.

The second axis is intent. Has this contact done something that suggests they are looking to buy soon? This is behavioral data and signal data, and it changes hour by hour.

A contact at a perfect-fit company with no intent signals is a future deal. A contact at a poor-fit company with high intent is a waste of an SDR hour. You only call when both axes are lit up.

The broken single-score model

Opened 3 emails: +10

Visited pricing page: +20

Downloaded ebook: +15

Job title is VP: +25

Total: one number that means nothing

The two-axis model

Fit score: industry, headcount, role, tech stack

Intent score: pricing visits, demo requests, intent provider signals

A grid that tells reps which quadrant to work

Decay rules so old intent does not stay hot forever

In HubSpot you can build this with two separate properties: fit_score and intent_score. In Attio you can model it as two attributes on the company record. In a custom build with Clay and n8n, you compute both in a workflow and write them back to the CRM.

How to build the fit score

The fit score is the easier of the two. It is mostly static company and contact data. The trap is treating it like a survey.

Pull your last 24 months of closed-won deals. Then pull your last 24 months of closed-lost deals. Compare the company attributes. The fit score should reward attributes that are statistically over-represented in closed-won and under-represented in closed-lost.

For most of my B2B clients, the attributes that matter end up being:

Industry or sub-industry, not the broad NAICS code
Headcount band, which is usually narrower than people think (often 50 to 250 or 250 to 1000)
Tech stack signals such as "uses Snowflake" or "runs HubSpot Marketing Pro"
Geography, often a tax or compliance constraint you did not realize you had
Role and seniority of the contact, with a cap on how many points seniority earns

Each attribute gets a weight. Industry might be worth 30 points, headcount 20, tech stack 25, region 10, role 15. The numbers are arbitrary. What matters is that the total fit score correlates with close rate when you check the model against past data.

A working fit score has clear bands. I usually set them at 0 to 40 (poor fit, do not work outbound), 40 to 70 (decent fit, work if intent is high), 70 to 100 (priority accounts, work always).

If you skip this step and start with intent, your reps will end up chasing high-intent contacts at companies that will never buy. I have seen this kill SDR teams. The reps work hard, hit activity targets, and produce zero pipeline.

How to build the intent score

The intent score is the moving part. It is also where most teams go wrong because they measure marketing engagement and call it intent.

Email opens are not intent. Webinar attendance is not intent. Whitepaper downloads are not intent. They correlate weakly at best, and the correlation has gotten worse since Apple Mail Privacy Protection started auto-opening emails for users in 2021.

The behaviors that actually predict purchase intent in 2026 are:

Visiting the pricing page more than once in 14 days
Booking or rescheduling a demo
Visiting any page in the documentation or developer section if you have one
Repeated visits from the same company on different days
A second or third contact at the same company starting to engage

You weight each behavior, but more importantly you apply decay. Intent rots fast. A pricing page visit from yesterday is worth ten times a pricing page visit from 90 days ago. If you do not apply decay, every active contact ends up at the score ceiling and the rep loses the signal.

External signal data adds another layer. Tools like 6sense, Bombora, and G2 buyer intent give you third-party signals when accounts research your category off your site. These are noisy. I weight them at about a third of what I weight first-party site behavior. If you want a walkthrough of how to filter and act on these, I wrote about B2B buyer intent data signals you can act on last quarter.

4.2x

close rate on contacts

14 days

intent half-life I use

61%

of MQL volume that was noise

Those numbers are from a recent rebuild for a Series B fintech. Their old single-score MQL definition was pulling 480 contacts a week into the sales queue. After we split into fit and intent and applied decay, the queue dropped to 187 contacts and close rate went up 4.2 times because reps stopped wasting cycles on poor-fit accounts that had opened a few nurture emails.

The grid is what reps actually use

A score is a number. A grid is a decision. Reps want a decision.

The grid I deploy in HubSpot has four cells:

01 / High fit, high intent

Call today

Hot. Auto-route to a rep with a 30 minute SLA. Slack ping. This is where pipeline gets made.

02 / High fit, low intent

Nurture and watch

Future deal. Marketing owns these. ABM ads, light sequences, listening for intent signals.

03 / Low fit, high intent

Disqualify or PLG

Often a researcher or wrong segment. Route to a self-serve product flow if you have one, otherwise close the loop.

04 / Low fit, low intent

Ignore

Suppress from sales. Suppress from marketing if budget is tight. Stop spending on these.

When reps see a grid, they know what to do. When reps see a single score of 73, they ask "what does 73 mean again?" and walk away.

Where most teams break the model

The five mistakes I see almost every time I audit an existing model.

Mistake one: scoring every activity. Adding a behavior because you can track it. Email opens, social follows, video plays at 30 percent. Most of these have no correlation with revenue. They inflate scores and create false hot leads.

Mistake two: never recalibrating. A model built in 2023 on 2022 data is wrong in 2026. Your ICP shifted. Your product shifted. Your competitors shifted. You should re-fit the model against the last 12 months of closed-won at least twice a year.

Mistake three: no decay. Behavior scores keep climbing. Everyone is hot. The model becomes useless.

Mistake four: marketing owns the model alone. The score has to be agreed by sales and marketing together. If the reps did not help define what a hot lead looks like, they will not work the queue in score order.

Mistake five: scoring lives in a spreadsheet, not the CRM. I have seen great models that live in a marketing ops tool but never write back to HubSpot or Salesforce. The rep does not see them. The grid does not appear in the deal record. The model does not influence behavior.

The plumbing: how to wire this in 2026

Most modern stacks I build look like this.

Step 01

Enrich

Clay waterfall pulls firmographics, tech stack, and contact role on every new lead.

Step 02

Score fit

n8n workflow computes the fit score from the enriched data and writes to HubSpot.

Step 03

Score intent

HubSpot tracks site behavior. n8n pulls intent provider signals nightly and merges.

Step 04

Route

Grid cell drives lead routing. Quadrant 1 contacts auto-assign with a 30 minute SLA.

You do not need a fancy data warehouse for this. I have built it for teams as small as five people using HubSpot, Clay, and a single n8n instance. The cost is around $500 per month all-in for that size of team.

If you are on Attio, the same wiring works. Intent comes from Attio's own behavior tracking plus webhook events from your site, and you compute the score either inside Attio or in n8n and write it back.

For larger teams with custom data needs, see how we approach CRM and RevOps engagements and the AI automation work that often sits alongside scoring builds.

How to know your model is working

A lead scoring model is a hypothesis. You should be able to test it.

The single metric that matters: close rate by score band. Pull the last 90 days of closed deals. Group by score band at the time the deal was created. If your top band closes at 4 to 8 times the rate of your bottom band, the model is working. If the bands close at similar rates, the model is not predictive and needs to be rebuilt.

Run this check monthly. Have the report on a dashboard. If close rate by band starts to compress, your ICP has shifted and the model needs to be re-fit.

The second metric: queue health. Average score of contacts being worked by sales this week. If reps are working low-score contacts and ignoring high-score ones, you have a trust problem with sales, not a model problem.

What to do this week if your current model is broken

Three steps, in order.

First, run the close rate by score band check on your existing model. If it is flat, the model is broken and you should not patch it. You need to rebuild.

Second, pull your closed-won and closed-lost from the last 12 months into a spreadsheet. Look at the company attributes side by side. The fit score weights will become obvious within an hour of looking at the data.

Third, talk to your top two sales reps. Ask which contacts they call first and why. Their answer is your intent score. Encode their instinct into a model and they will trust the output.

Your scoring model is probably costing you pipeline.

If your reps do not trust the score, the model is dead. Book a free 30 minute audit and we will show you the three changes that would matter most.

Book a scoring audit →

FAQ

How long should a lead scoring rebuild take?

For a 5 to 50 person GTM team, plan on 2 weeks from kickoff to live. Week one is data analysis and weight definition with sales and marketing in the room. Week two is build, test against historical data, and deploy. Anything taking longer than 4 weeks is over-engineered.

Should I use AI or machine learning to score leads?

For most teams under 200 SDRs and AEs, no. A rule-based two-axis model is more transparent, easier to explain to reps, and easier to recalibrate. ML models become worth it when you have over 50,000 inbound leads per quarter and enough closed-won volume to train on. Below that, the rule-based model will outperform because you can iterate on it fast.

How often should I recalibrate the model?

At minimum twice a year, and any time you change your ICP, your pricing, or your product positioning. I run a close rate by band check monthly and a full re-fit every six months for active clients.

Does this work for product-led growth motions?

Yes, with one change. In a PLG motion, in-product behavior replaces some of the intent score. Things like activation events, feature usage, and team size inside the product are stronger signals than site behavior. The grid still works. The grid quadrants just get fed different data.

Should sales be able to override the score?

Yes. Always. The score is a starting point, not a final answer. The override should be a logged event so you can study which overrides paid off and improve the model. A score system that reps cannot override gets ignored within a month.

The honest summary

Lead scoring is not a technology problem. It is an alignment problem. The model is the artifact that proves sales and marketing agree on what a good lead looks like, and the grid is the daily decision tool that turns the agreement into rep behavior.

Build it on two axes. Apply decay to intent. Recalibrate against closed-won twice a year. Test by checking close rate by band. If you do those four things, your scoring model will earn its place in the CRM. If you skip any of them, the model becomes the field nobody looks at, and the rep with the spreadsheet wins.

If you want help running the audit on your current model, see how we work with RevOps teams on CRM and scoring builds or get in touch on the contact page.