Nearly every Marketo instance will eventually have duplicate leads.

They can come from list uploads, form fills, CRM syncs, manual lead creation, and much more.

And since Marketo subscription costs scale with database size, duplicates are a major problem for your budget.

While manual deduplication is time-consuming and impractical, leaving the issue unaddressed will impact analytics, campaign efficiency, and ultimately, your bottom line.

Fortunately, the Marketo API solves this directly by allowing us to perform cost-effective, mass deduplication of leads.

Here’s a step-by-step guide on how it’s done!

(This guide is for those with a foundational understanding of the Marketo API and is designed for Marketo users. If you want to learn more about the Marketo API, check out our API webinar and an extensive API course.)


1. Extract Duplicates from Marketo Using a List Export

The first step in the deduplication process is to identify and extract the duplicate leads from your Marketo instance.

In Marketo, create a Smart List that identifies duplicate leads based on your chosen criteria (e.g., email address). Once your Smart List is populated, select all the leads and export them to a CSV file. You should have something similar to the screenshot below before you export.

This exported list will serve as your reference for the deduplication process (and provide a backup in case of any issues).
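Before touching the API, it can help to confirm which rows in the export really are duplicates of each other. The sketch below is one way to do that with Python's standard library. It assumes the export has the `Id` and `Email Address` columns used later in this guide, and that emails should be compared case-insensitively (our assumption, adjust to your own criteria). The function name `group_duplicates` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def group_duplicates(csv_text, email_column="Email Address", id_column="Id"):
    """Map each normalized email to the lead IDs that share it (duplicates only)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Normalize so "Ana@Example.com" and "ana@example.com" land in one group.
        email = (row.get(email_column) or "").strip().lower()
        if email:
            groups[email].append(int(row[id_column]))
    # Keep only emails that appear on more than one lead.
    return {email: ids for email, ids in groups.items() if len(ids) > 1}


# Made-up rows standing in for a real Smart List export.
sample = """Id,Email Address,Lead Source
101,ana@example.com,Webinar
205,Ana@Example.com,Paid Media
309,bo@example.com,Organic
"""

print(group_duplicates(sample))  # {'ana@example.com': [101, 205]}
```

For a real export, read the file's contents into a string and pass it in place of `sample`.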
 

2. Define Your Winner Criteria and Deduplication Rules

Before we start merging leads, it’s crucial to establish clear rules for determining which lead will be the “winner” (the consolidated lead that will remain after merging several duplicates). We must also define rules on how to handle conflicting data.

In our case, we want the oldest lead to be chosen as the winner to preserve the original acquisition date. At the same time, if any of the duplicates has Paid Media as its lead source, we want the winner to keep Paid Media as its lead source after the merge.

Feel free to define your rules and criteria as needed – you may want to maintain the most recent phone number, job title, subscription status, and other characteristics as well.
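Writing the rules down as a small function makes them easy to review and test before any lead is merged. Here is a minimal sketch of our two rules. It assumes each lead is a dictionary with the `Id` and `Lead Source` columns from the export, and it treats the lowest lead ID as the oldest lead (the same shortcut the final script in this guide takes with `min`). The function name `plan_merge` and the shape of its return value are hypothetical.

```python
def plan_merge(leads):
    """Apply the winner criteria to one group of duplicate leads.

    leads: list of dicts with "Id" and "Lead Source" keys (export column names).
    Returns the winner ID, the loser IDs, and any field values to restore.
    """
    # Rule 1: the oldest lead wins; lowest ID is used as a proxy for oldest.
    winner = min(leads, key=lambda lead: lead["Id"])
    losers = sorted(lead["Id"] for lead in leads if lead["Id"] != winner["Id"])

    # Rule 2: if any duplicate came from Paid Media, the winner keeps that source.
    updates = {}
    has_paid_media = any(lead["Lead Source"] == "Paid Media" for lead in leads)
    if has_paid_media and winner["Lead Source"] != "Paid Media":
        updates["Lead Source"] = "Paid Media"

    return {"winner": winner["Id"], "losers": losers, "updates": updates}


# Made-up group of three duplicates.
group = [
    {"Id": 205, "Lead Source": "Paid Media"},
    {"Id": 101, "Lead Source": "Webinar"},
    {"Id": 400, "Lead Source": "Organic"},
]
print(plan_merge(group))
```

Extra rules (most recent phone number, job title, and so on) could be added as further entries in `updates`.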
 

3. Deduplicate Leads Based on the Winner Criteria

By now, we should have an exported list of duplicates and a winner criterion. The next step is to perform mass deduplication through the Marketo API (learn more about accessing the Marketo API here) using the following Python script:

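import requests

def deduplicate(winner, losers):
    # Merge the losing leads into the winner.
    # winner: ID of the lead to keep; losers: comma-separated string of lead IDs.
    # get_access_token() and BASE_URL come from your own Marketo API setup.
    access_token = get_access_token()
    url = f"{BASE_URL}/rest/v1/leads/{winner}/merge.json"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    # The merge endpoint reads the losing IDs from the query string.
    params = {"leadIds": losers}
    response = requests.post(url, params=params, headers=headers)
    return response.json()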

 
This function merges the losing leads into the winner through the Marketo API's merge endpoint. Grouping the duplicates and choosing the winner based on your criteria happens in the final script at the end of this guide.
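The script relies on a `get_access_token()` helper that this guide doesn't show. As a rough sketch, Marketo issues tokens from the identity endpoint of your instance using the client credentials of a LaunchPoint custom service. The function names, the `fetch` parameter (there only so the logic can be tried without a network call), and the placeholder values below are our own assumptions, not part of the original script.

```python
import json
import urllib.parse
import urllib.request


def build_token_url(identity_url, client_id, client_secret):
    """Build the OAuth client-credentials URL for a Marketo identity endpoint."""
    query = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return f"{identity_url.rstrip('/')}/oauth/token?{query}"


def request_access_token(identity_url, client_id, client_secret, fetch=None):
    """Return the access token string from the identity endpoint.

    fetch: optional callable taking a URL and returning the decoded JSON body.
    When omitted, a real HTTP GET is made with urllib.
    """
    url = build_token_url(identity_url, client_id, client_secret)
    if fetch is None:
        with urllib.request.urlopen(url) as response:
            payload = json.load(response)
    else:
        payload = fetch(url)
    return payload["access_token"]


# Offline demonstration with a stand-in for the HTTP call (all values are placeholders).
fake_fetch = lambda url: {"access_token": "example-token", "expires_in": 3599}
print(request_access_token("https://example.mktorest.com/identity", "my-id", "my-secret", fetch=fake_fetch))
```

Tokens expire after a short time, so a long deduplication run should request a fresh one when needed rather than reuse a single token for the whole job.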
 

4. Update the Winner Lead with Data from the Losing Leads

After merging, we need to ensure that any data we want to keep from “losing” leads (such as paid media as a source) is consolidated into the single “winner”.

Here’s a Python script that will update the winner based on our deduplication rules:

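import json
import requests

def update_lead(leadid, field, value):
    # Write a single field value to an existing lead, looked up by its ID.
    access_token = get_access_token()
    url = f"{BASE_URL}/rest/v1/leads.json"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    payload = {
        "action": "updateOnly",  # never create a new lead
        "lookupField": "id",
        "input": [{"id": leadid, field: value}],
    }
    response = requests.post(url, data=json.dumps(payload), headers=headers)
    body = response.json()
    # Failed requests carry "errors" instead of "result".
    print(body.get("result", body.get("errors")))
    return body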
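Both API calls return a JSON body, and Marketo reports many failures inside that body (a `success` flag set to false plus an `errors` list) rather than through the HTTP status code. Lead updates can also succeed overall while individual records come back with a `skipped` status. A small check like the sketch below can stop a mass run before it silently fails. The names `MarketoError`, `check_response`, and `skipped_records` are hypothetical, and the sample payloads are made up.

```python
class MarketoError(Exception):
    """Raised when a Marketo response body reports a failed request."""


def check_response(payload):
    """Return the payload unchanged, or raise if Marketo reported a failure."""
    if not payload.get("success", False):
        errors = payload.get("errors", [])
        details = "; ".join(f"{e.get('code')}: {e.get('message')}" for e in errors)
        raise MarketoError(details or "request failed without error details")
    return payload


def skipped_records(payload):
    """List the per-record results of a lead update that were skipped."""
    return [r for r in payload.get("result", []) if r.get("status") == "skipped"]


# Made-up response bodies for illustration.
ok = {"requestId": "abc#123", "success": True, "result": [{"id": 101, "status": "updated"}]}
partial = {"requestId": "abc#124", "success": True, "result": [
    {"status": "skipped", "reasons": [{"code": "1004", "message": "Lead not found"}]},
]}

check_response(ok)
print(skipped_records(partial))
```

Wrapping the return values of the merge and update functions in `check_response` is one option; logging the failures and continuing with the next group is another, depending on how cautious you want the run to be.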

 
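The last thing we need to do is run a Python script that executes these functions together. For each duplicated email address in the export, it picks the lead with the lowest ID (the oldest lead) as the winner, merges the others into it, and sets Paid Media as the lead source if any of the duplicates had it: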

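import pandas as pd

df = pd.read_csv("duplicates.csv")  # placeholder name for the list exported in step 1
for email in df['Email Address'].unique():
    df_aux = df[df['Email Address'] == email]
    leads = [int(lead_id) for lead_id in df_aux['Id']]
    if len(leads) < 2:
        continue  # nothing to merge for this email
    winner = min(leads)  # lowest ID = oldest lead
    leads.remove(winner)
    deduplicate(winner, ",".join(str(lead_id) for lead_id in leads))
    if (df_aux['Lead Source'] == 'Paid Media').any():
        # Use the field's REST API name; confirm it for your instance.
        update_lead(winner, 'leadSource', 'Paid Media')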

 
And that’s it! Now, all of our duplicates have successfully been merged with data from our deduplication rules preserved for each one.


Using the Marketo API to deduplicate your database keeps your data clean, which improves marketing effectiveness and directly reduces costs.

And while this guide provides a solid foundation for deduplication, every organization’s needs are unique. You may need to adjust the winner criteria and deduplication rules to best suit your specific requirements.

If you need help implementing this deduplication process or have questions about optimizing your Marketo instance, book a 30-minute chat with us here!
