RevOps · GTM Engineering · Pipeline systems

I translate go-to-market strategy into reliable systems that teams actually run.

RevOps and GTM engineer building the high-leverage systems revenue teams depend on.

01 Selected Projects
CASE_01 · pipeline risk scoring

Deal Triage

B2B SaaS pipeline risk scoring — heuristic signals + Claude analysis for AEs and RevOps managers.

Stack
Python · Streamlit · pandas · Anthropic API
Demo
100 synthetic deals · scoring view requires no API key
My role
Designed & directed the build
Two-layer scoring architecture: cheap heuristics rank everything → the model explains only the top 10
Layer 1 · deterministic · $0 API: stage velocity vs the team's own history · activity recency (last meaningful touch) · close-date pressure (slip risk vs quarter) → 0–100 score per deal → rank all 100
Layer 2 · Claude · top 10 only: analysis tiered by confidence. Strong signal → full memo + verbatim transcript quotes. Thin signal → short brief that says what's missing and invents nothing.
Benchmarks computed from the team's own pipeline, not hardcoded · prompt lives in editable markdown

The problem

Deals don’t die suddenly; they slip quietly, and the usual review tools all look the wrong way. Spreadsheet pipeline reviews are slow and only as good as the rep’s memory. Dashboards report lagging indicators — by the time a number turns red, the deal’s already gone. And the early signals that actually predict slippage — a close date that keeps drifting, activity going stale, a buried hesitation in a discovery call — live scattered across CRM fields and call transcripts no one reads in aggregate. So risk surfaces at forecast time — too late to act on.

The decision

Split the work into two layers that play to different strengths: cheap deterministic heuristics to rank every open deal, and expensive LLM reasoning to explain only the ones that matter. A composite 0–100 score from three signals — stage velocity, activity recency, close-date pressure — runs over the entire pipeline at zero API cost and surfaces the top 10 at-risk deals. Only then does Claude read those deals’ fields and transcripts to produce the analysis. It’s a cost-and-trust decision, not just an architecture one: running an LLM over every deal is expensive, slow, and buries the signal in noise. Heuristics-to-rank, LLM-to-explain keeps spend proportional to value and keeps the cheap layer auditable.
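A minimal Python sketch of the cheap layer, assuming each signal has already been normalized to 0–1 (higher means riskier). The case study names the three signals and the 0–100 range but not how they are blended, so the `WEIGHTS` values, the `Deal` fields, and the function names are hypothetical. Closed Won/Lost deals are dropped before ranking, since only open pipeline is rankable.

```python
from dataclasses import dataclass

# Hypothetical weights: the three signals come from the case study, the blend does not.
WEIGHTS = {"velocity": 0.4, "recency": 0.3, "close_pressure": 0.3}
CLOSED_STAGES = {"Closed Won", "Closed Lost"}


@dataclass
class Deal:
    deal_id: str
    stage: str
    velocity: float        # 0-1, higher = slower than the team's own stage benchmark
    recency: float         # 0-1, higher = longer since the last meaningful touch
    close_pressure: float  # 0-1, higher = more slip risk against the quarter


def clamp(x: float) -> float:
    """Keep a signal inside 0-1 so one bad field can't blow up the score."""
    return min(max(x, 0.0), 1.0)


def risk_score(deal: Deal) -> int:
    """Weighted blend of the three normalized signals, scaled to 0-100."""
    raw = (WEIGHTS["velocity"] * clamp(deal.velocity)
           + WEIGHTS["recency"] * clamp(deal.recency)
           + WEIGHTS["close_pressure"] * clamp(deal.close_pressure))
    return round(100 * raw)


def top_at_risk(deals: list[Deal], k: int = 10) -> list[Deal]:
    """Drop closed deals, score every open one, return the k riskiest for the LLM layer."""
    open_deals = [d for d in deals if d.stage not in CLOSED_STAGES]
    return sorted(open_deals, key=risk_score, reverse=True)[:k]
```

Because the blend is one small deterministic function, any score can be recomputed by hand, which is what keeps this layer auditable.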

The stage-velocity benchmark is computed from the team’s own pipeline history rather than from hardcoded thresholds. “Slow” for a deal in Negotiation means slow relative to how this team’s Negotiation-stage deals actually move, so the tool adapts to the team’s sales motion instead of imposing a generic one. That’s the difference between a score a rep argues with and a score a rep trusts.
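One way to make “slow” relative to the team’s own history, sketched in Python. It assumes the export includes past days-in-stage per deal, uses the median per stage as the benchmark, and maps time in stage to a 0–1 signal that starts rising once a deal passes the team’s median and saturates at three times it. The median, the linear ramp, the `cap` of 3.0, and the function names are illustrative choices, not the tool’s published method.

```python
from collections import defaultdict
from statistics import median


def stage_benchmarks(history: list[tuple[str, float]]) -> dict[str, float]:
    """Median days-in-stage per stage, learned from this team's own past deals."""
    by_stage: dict[str, list[float]] = defaultdict(list)
    for stage, days in history:
        by_stage[stage].append(days)
    return {stage: median(days) for stage, days in by_stage.items()}


def velocity_signal(stage: str, days_in_stage: float,
                    benchmarks: dict[str, float], cap: float = 3.0) -> float:
    """0.0 at or below the team's median for the stage, rising linearly to 1.0 at cap x the median."""
    bench = benchmarks.get(stage)
    if not bench:
        return 0.0  # no history for this stage, so no evidence of slowness
    ratio = days_in_stage / bench
    return min(max((ratio - 1.0) / (cap - 1.0), 0.0), 1.0)
```

Using the median rather than the mean keeps one stalled outlier from dragging the benchmark for the whole stage.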

What I built

A CRM export in → every open deal scored on the three dimensions (Closed Won/Lost filtered out first, since only open pipeline is rankable) → top 10 surfaced → Claude analysis tiered by confidence, so the depth of the output matches the strength of the evidence. A deal with a strong score and a transcript gets a full memo with verbatim call quotes; a thin deal gets a focused two-to-three-sentence brief and one concrete next action — never false confidence dressed up as a full memo. Around that: one-click follow-up email drafts grounded in the analysis, a thumbs up/down feedback capture for prompt tuning, an enriched CSV export, and an in-app methodology tab so the scoring model is transparent to the reps using it. The analysis prompt lives in an editable markdown file — the logic is legible and tunable, not buried in code.
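A hedged sketch of the confidence routing, assuming the prompt file uses the `## When signal is STRONG` / `## When signal is THIN` headings shown in this case’s prompt excerpt. The `STRONG_SCORE` cutoff of 70, the `RankedDeal` shape, and the request dictionary are hypothetical; the actual Claude call is left out so the routing logic stands on its own.

```python
from dataclasses import dataclass

PROMPT_PREFIX = "## When signal is "
STRONG_SCORE = 70  # hypothetical cutoff; the case study does not publish one


@dataclass
class RankedDeal:
    deal_id: str
    score: int             # 0-100 composite from the heuristic layer
    transcript_count: int  # call transcripts attached to the deal


def prompt_sections(markdown: str) -> dict[str, str]:
    """Split the prompt file on '## When signal is X' headings into {X: instructions}."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith(PROMPT_PREFIX):
            # The tier name is the first word after the prefix; trailing notes are ignored.
            current = line[len(PROMPT_PREFIX):].split()[0]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {tier: "\n".join(body).strip() for tier, body in sections.items()}


def signal_tier(deal: RankedDeal) -> str:
    """A full memo needs both a high score and transcript evidence to quote from."""
    if deal.score >= STRONG_SCORE and deal.transcript_count > 0:
        return "STRONG"
    return "THIN"


def analysis_request(deal: RankedDeal, sections: dict[str, str]) -> dict:
    """Pair the deal with the instructions for its tier."""
    tier = signal_tier(deal)
    return {"deal_id": deal.deal_id, "tier": tier, "instructions": sections[tier]}
```

Because the instructions are read from the markdown file at run time, editing the prompt changes the output without touching this code.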

Where it’s going

The roadmap is where this stops being a tool and becomes a system. v2 replaces the manual CSV with an n8n workflow that pulls open opportunities from HubSpot on a schedule and posts a top-risk digest to Slack — the insight goes to where the team already works instead of waiting to be opened. v3 is a dbt project modeling the deal-signal history — stage velocity, activity patterns, win/loss by segment — so the heuristic weights can eventually be tuned against real outcomes rather than my rules of thumb. The current tool is the v1 that earns the right to build v2 and v3.

Numbers describe the bundled demo, not customer outcomes — it runs on 100 synthetic deals. The scoring view loads with no API key; the Claude analysis layer requires one. On this sample, deep analysis routes to only the top 10 — keeping LLM spend proportional to deal risk.

Roadmap

v2 n8n pulls HubSpot deals on a schedule → Slack risk digest.
v3 dbt model tunes scoring weights against real closed-won / lost outcomes.
deal-memo.prompt.md EDITABLE · OWNED BY GTM
# Deal memo prompt — runs only on the top-10 at-risk deals

You are reviewing a single at-risk deal. You have the deal record,
the activity timeline, and the call transcripts.

## When signal is STRONG  # clear risk in the transcripts
Write a full memo:
- one-line risk thesis
- 2–3 verbatim quotes from the transcript, attributed + dated
- the single play the rep should run next

## When signal is THIN
Three lines, no quotes. Say plainly what's missing to make a call.
Don't invent risk that isn't in the data.
CASE_02 · inbound lead scoring

Lead Triage

Scores inbound leads against a plain-language ICP rubric the GTM team owns — then reads its own output to recommend tighter targeting.

Stack
TypeScript · Next.js · Anthropic API
Runs
Local-only · not hosted
Rubric
Markdown · in version control
Scoring flow with a feedback loop: output feeds back into who you prospect next
Inbound lead (from prospecting + targeting filters) → Claude scores it vs rubric.md (plain-language ICP, no code change to retune) → tiered output: score · confidence · rationale
Tiers: Tier 1 (pursue now) · Tier 2 (nurture) · Tier 3 (watch) · DQ (a first-class tier)
Matched signals + disqualifiers returned per lead · DQ is a decision, not a leftover
2nd Claude pass reads the full scored list → "tighten these filters", closing the loop back to targeting

The problem

SDR time is the scarcest resource at the top of the funnel, but lead lists from Apollo, ZoomInfo, or a CRM export arrive undifferentiated — the ideal-fit accounts and the no-hopes sit in the same queue, worked in roughly the order they land. The two usual fixes both have a catch. Keyword and rules-based scoring is brittle: it can’t tell that “Head of People at a Series B SaaS” carries context about company stage, HR maturity, and budget authority that no title-match will surface. A trained predictive model needs clean historical conversion data most teams don’t have, and it’s a black box the GTM team can’t read or edit. So in practice, most lists don’t get triaged at all.

The decision

Score each lead with an LLM reading a plain-language ICP rubric, rather than with rules or a trained model. That buys two things and costs one. It buys semantic judgment on incomplete data — the model reads the same between-the-lines context an experienced rep would. And, the part that matters most operationally, it puts the scoring logic in a single markdown file that anyone on the GTM team can read and edit, in version control, with no code change and no redeploy. The rubric a human edits is the exact artifact the model scores against — the people who own the ICP own the scoring. What it costs is per-lead API spend and some non-determinism, versus the zero-marginal-cost determinism of rules. I treated that as a budget to manage, not a dealbreaker: model selection is a one-line swap — a cheap, fast model for high-volume pre-screening, a high-reasoning model reserved for short lists where a single misclassification is expensive.
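What “a one-line swap” can look like, sketched in Python for illustration (the tool itself is TypeScript). The model IDs are placeholders rather than real model names, and the 50-lead cutoff for a “short list” is an assumption.

```python
# Placeholder model IDs: swap in the real fast and high-reasoning model names you use.
FAST_MODEL = "fast-model-id"   # cheap, high-volume pre-screening
DEEP_MODEL = "deep-model-id"   # reserved for short, high-stakes lists
SHORT_LIST_MAX = 50            # hypothetical cutoff for a "short list"


def pick_model(lead_count: int) -> str:
    """Route big lists to the cheap model and short lists to the deep one."""
    return DEEP_MODEL if lead_count <= SHORT_LIST_MAX else FAST_MODEL
```

The rubric and the scoring call stay identical whichever model is chosen; only the constant changes.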

What I built

A CSV in from any list source → core fields auto-detected, everything else passed through as additional signal so there’s no pre-cleaning step → every lead scored in parallel against the rubric → tier (1 / 2 / 3 / DQ), a 0–100 score, a confidence flag tied to data completeness, a one-line rationale, matched signals, and disqualifiers → filter by tier and export the enriched CSV. Two design choices I’d call out. DQ is a first-class tier, not a leftover bucket: knowing who not to pursue protects SDR time as much as knowing who to chase. And I deliberately left CRM write-back out — auth and field mapping are too org-specific to bake into a general tool, so the output is structured to upsert on email through any CRM API or an n8n flow. Knowing where the tool should stop is part of the design.
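A minimal Python sketch of the ingest step, assuming a header-alias table decides which columns are core. The aliases, the four core fields, and the completeness cutoffs behind the confidence flag are hypothetical; passing every other non-empty column through untouched mirrors the no-pre-cleaning design described above.

```python
import csv
import io

# Hypothetical alias table: the case study says core fields are auto-detected, not which headers.
CORE_ALIASES = {
    "email": {"email", "work email", "email address"},
    "name": {"name", "full name", "contact name"},
    "title": {"title", "job title"},
    "company": {"company", "company name", "account name", "organization"},
}


def detect_columns(headers: list[str]) -> dict[str, str]:
    """Map each core field to the first source header whose name matches an alias."""
    mapping: dict[str, str] = {}
    for header in headers:
        key = header.strip().lower()
        for field, aliases in CORE_ALIASES.items():
            if key in aliases and field not in mapping:
                mapping[field] = header
    return mapping


def completeness_confidence(core: dict[str, str]) -> str:
    """Confidence flag from the share of core fields present (cutoffs are assumptions)."""
    share = sum(1 for f in CORE_ALIASES if core.get(f)) / len(CORE_ALIASES)
    if share >= 0.75:
        return "high"
    return "medium" if share >= 0.5 else "low"


def load_leads(csv_text: str) -> list[dict]:
    """Parse a raw list export into core fields, extra signals, and a confidence flag."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapping = detect_columns(reader.fieldnames or [])
    core_headers = set(mapping.values())
    leads = []
    for row in reader:
        core = {field: (row.get(header) or "").strip() for field, header in mapping.items()}
        # Every non-core, non-empty column rides along as extra signal for the scorer.
        extras = {h: v for h, v in row.items() if h is not None and h not in core_headers and v}
        leads.append({**core, "extra_signals": extras, "confidence": completeness_confidence(core)})
    return leads
```

Keeping `email` as a normalized core field is what lets the enriched export be upserted on email through a CRM API or an n8n flow.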

Closing the loop

A second Claude pass reads the whole scored list and recommends how to tighten the upstream prospecting filters — which segments are landing as Tier 1, which search criteria are dragging in DQs. The sorted list isn’t the deliverable; a tighter top-of-funnel is. That closes the loop back to targeting instead of stopping at a clean export.
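The case study has a second Claude pass read the whole list. A deterministic pre-aggregation like this Python sketch (an assumption, not the author's method) shows the kind of evidence that pass works from: tier counts per value of one upstream filter field. The `segment_key`, the minimum sample size, both cutoffs, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def tier_mix(scored_leads: list[dict], segment_key: str) -> dict[str, Counter]:
    """Count tiers ('1', '2', '3', 'DQ') per value of one upstream filter field."""
    mix: dict[str, Counter] = defaultdict(Counter)
    for lead in scored_leads:
        mix[lead.get(segment_key) or "unknown"][lead["tier"]] += 1
    return mix


def targeting_notes(mix: dict[str, Counter], min_leads: int = 5,
                    dq_cutoff: float = 0.5, tier1_cutoff: float = 0.5) -> list[str]:
    """Flag segments dominated by DQs (tighten) or by Tier 1 (lean in). Cutoffs are assumptions."""
    notes = []
    for segment, counts in sorted(mix.items()):
        total = sum(counts.values())
        if total < min_leads:
            continue  # too few leads to say anything about this segment
        dq_share, t1_share = counts["DQ"] / total, counts["1"] / total
        if dq_share >= dq_cutoff:
            notes.append(f"tighten {segment}: {dq_share:.0%} DQ across {total} leads")
        elif t1_share >= tier1_cutoff:
            notes.append(f"lean into {segment}: {t1_share:.0%} Tier 1 across {total} leads")
    return notes
```

Segments below the minimum sample are skipped rather than judged, in the same spirit as the thin-signal rule: no call without enough evidence.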

icp-rubric.md GTM-OWNED
# ICP rubric — edited by GTM in git

## Tier 1 — pursue now
- 50–500 employees, Series B–D
- Named RevOps / Rev leader on team
- Trigger in last 90d (funding, VP Sales)

## Disqualifiers  # DQ is a tier
- Sub-10 employees / pre-seed
- No sales motion (pure PLG)
- Agency or reseller, not end customer
02 About
Julian Ross

I build for the rep because I've spent years working alongside them. I came up through the GTM front lines — SDR, then leading an SDR team — and spent much of that time on the work between the lines: deep org research on key Fortune 100 accounts, and partnering with AEs and sales leaders to target and strategize new business and expansion. I've worked a queue that was never triaged, and watched good opportunities stall because the right signals never surfaced in time. That's the lens I bring to ops.

Over the past few years I've moved into operations full-time, with a heavy bet on agentic AI and automation to take friction out of the stack. How I work shows up the same way across the projects here. I keep the logic that matters — what makes a lead worth pursuing, what makes a deal look risky — in plain language a GTM team can read and edit, not buried in code only an engineer can touch. I match the depth of an answer to the strength of the evidence behind it, rather than dressing up a guess as a confident call. And I treat knowing what not to pursue as worth as much as knowing what to chase. Those aren't features I bolted on; they're how I think about the work.

So: a revenue operations manager who thinks like a GTM engineer. I figure out what to build and in what order, and build it myself when that's the fastest path to an answer. I work across the modern RevOps stack — Python, TypeScript, the Anthropic API, n8n, HubSpot, SQL, dbt — but the tooling is downstream of the judgment. What I'm good at is finding where the leverage is in a go-to-market system, and building something a team will genuinely use.

↓ Download résumé (PDF)

03 Contact

Reach out to discuss your RevOps strategy and uncover ways to improve your operating cadence.

// select consulting engagements considered