This post was written by: Christina Blake & Ethan Lyon.
There are a lot of great keyword rank tracking tools on the market – but at Seer, we found that we had many use cases for organic rank data that weren’t being addressed by those tools.
Many of our analyses pull large amounts of data (think 100K, 200K, 500K+ keywords at a time!) for a snapshot of insights – maybe we’re analyzing content gaps or trying to research a new line of business for a client. We don’t need daily rankings for this – we need a massive, one-time data pull.
For some clients, we might have search terms that we want to keep a pulse on, but do we need daily data? We might want to mix up our tracking – priority keywords daily, other keywords to help us understand shifts in the landscape weekly or monthly.
If a client was planning on a migration or a major change in their site, we might even want to switch weekly or monthly keywords to track daily so that we can keep a closer eye on changes during those high-risk time periods.
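A per-keyword frequency field makes that kind of switch a one-line change. Here’s a minimal sketch of the idea (hypothetical names, not our actual schema), with a helper that temporarily escalates weekly and monthly keywords to daily tracking during a high-risk window:

```python
from dataclasses import dataclass

# Hypothetical model: each tracked keyword carries its own frequency,
# so priority terms can run daily while the long tail runs monthly.
@dataclass
class TrackedKeyword:
    term: str
    frequency: str  # "daily", "weekly", "monthly", or "one-time"

def escalate_for_migration(keywords: list[TrackedKeyword]) -> list[TrackedKeyword]:
    """During a site migration, promote weekly/monthly keywords to daily."""
    return [
        TrackedKeyword(k.term, "daily") if k.frequency in ("weekly", "monthly") else k
        for k in keywords
    ]

keywords = [
    TrackedKeyword("seo agency", "daily"),
    TrackedKeyword("keyword rank tracker", "weekly"),
    TrackedKeyword("what is a content gap", "monthly"),
]
print([k.frequency for k in escalate_for_migration(keywords)])
# ['daily', 'daily', 'daily'] -- every keyword tracks daily for the migration
```

When the migration window closes, the same pattern runs in reverse to drop keywords back to their normal cadence.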
💡In this post, we’ll walk through how we approached building our own keyword tracking tool from the ground-up.
The keyword tracking tools we were using didn’t give us the flexibility to change tracking frequency. Team members had to submit large keyword sets and then monitor closely so they could turn off tracking as soon as the data returned – otherwise we’d be tracking 500K keywords daily for an analysis we only needed to run twice a year. If a team member forgot to turn off tracking, a very simple mistake could rack up some pretty high costs.
Let’s say a team member wanted to submit 50,000 search terms for a snapshot analysis, and that the cost of our old tool was ~$400/day for 50K keywords.
Using our old tool, our team member would submit their keywords, then every day they’d check to see when the data had returned – usually within ~3-5 days. Then they’d need to shut off tracking immediately and download the data, or those keywords would continue to incur costs.
What if one of that team member’s clients had a major issue and they needed to jump in to help them out? It would be easy (and understandable) to forget to turn off tracking if that team member had a major analysis and presentation coming up, and then a client fire that they needed to address on top of it. (Christina can vouch for this – she’s been one of those team members who forgot to turn off tracking before).
Using our old tools, those keywords would continue incurring costs to the tune of $400/day until that team member remembered to turn them off, or until someone monitoring for major overages caught it. Now apply that risk to an entire team. Human errors happen – one thing we could do to help our team was take away the responsibility of remembering to turn off keywords.
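The arithmetic of that failure mode is simple enough to sketch – this toy calculation just uses the figures from the example above (~$400/day for 50K keywords):

```python
# Toy calculation using the figures from the example above:
# ~$400/day to keep 50K keywords tracking in the old tool.
DAILY_COST_USD = 400

def overage(days_forgotten: int) -> int:
    """Extra spend for each day tracking stays on after the data returned."""
    return DAILY_COST_USD * days_forgotten

print(overage(14))  # 5600 -- forgetting for two weeks costs $5,600
```

That’s the cost of one person getting pulled onto a client fire at the wrong moment.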
When analyzing snapshot (one-time ranking) keywords from our old tool, we found that we were tracking (and paying for) 4-5x the rankings we actually needed, mainly because data sometimes took several days to return and team members didn’t always turn off tracking immediately. Approximately 80% of those rankings were redundant.
| Year | Count of snapshot keywords | Count of deduplicated snapshot keywords | % of waste |
| --- | --- | --- | --- |
| 2019 | 12M | 2.4M | ~80% |
| 2020 (pre-migration) | 5M | 1M | ~80% |
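Measuring that waste is just a deduplication pass. Here’s an illustrative sketch (not our actual pipeline) of how the redundancy percentage works out:

```python
# Illustrative sketch: how much of a snapshot pull is redundant
# re-tracking of the same keywords?
def waste_pct(rankings: list[str]) -> float:
    """Share of ranking rows that are duplicate pulls of a keyword."""
    return 1 - len(set(rankings)) / len(rankings)

# Two keywords, each tracked 5 days in a row while waiting for data:
rankings = ["seo agency"] * 5 + ["rank tracker"] * 5
print(f"{waste_pct(rankings):.0%}")  # 80%
```

One keyword re-pulled over a ~5-day return window is exactly how a snapshot analysis ends up paying for 4-5x the rankings it needs.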
With our own keyword tracking tool, that same team member submits their 50K search terms using the “one-time” frequency and that’s it. When their data is available, they can use it – no extra steps needed.
By partnering with Traject Data, we’re also able to submit more keywords than ever before – we went from 26M rankings in 2019 to 61M rankings in 2020, the year we launched our internal rank tracking tool.
The day this blog post was written (2/10/22) we had 4.6 billion rows of keyword data in our rank tracking data lake.
Of all the rankings processed with Traject, we see the following frequency breakdown:
[Table: count of tracked keywords and % of total, by tracking frequency]
That’s over 80% of all total search terms tracked at a one-time frequency in the past 2 years! Imagine all of the time saved and cost overages prevented by giving users the flexibility to track at different frequencies.
When we were still using other rank tracking tools, we’d have to export the data if we wanted to customize anything outside of their software. Even if we built templates in visualization tools like Power BI or Google Data Studio to speed up build time, those templates would expect very specific inputs – if a tool changed the name of a column or if a team member following a list of steps to export data missed a step, it could cause errors and confusion.
For example, a team member performing an analysis might follow 10-20 instructions for exporting the data a specific way – maybe it calls for a specific report or filtering the data a certain way before exporting. The team member exports their CSV to their computer, opens up the template, and selects their CSV as a source.
All of a sudden, everything breaks and they’re hit with a glaring error message: “The column ‘Landing Page’ wasn’t found.” All of the other data sources in the analysis fail as well, thanks to that one missing “Landing Page” column.
The team member isn’t sure what went wrong – after all, they followed all of the steps! They jump into a chat and ask some other team members for help. One team member asks for a screenshot of the error message. Another team member says they can jump on a video chat to help troubleshoot.
Finally, someone who ran into this problem before asks our original team member to open up the CSV they downloaded. The “Landing Page” column isn’t in the CSV – it has been replaced by the field name “URL”.
“Sometimes this tool just changes column names and we don’t find out until it breaks something – I’ll tap the person who owns the template and instructions to make an update” our more seasoned team member says.
Problem solved – but the process took several team members maybe 30 minutes to resolve. And our original team member didn’t want to waste anyone’s time, so they might have spent an hour trying to fix it themselves before they even asked for help.
By controlling our data, we ensure that changes like that never reach our team members. Even if a vendor makes a change, we can “hide” it in our transformation layer – for example, renaming the “URL” column back to “Landing Page” in a cleaning step before the data gets into the hands of our team.
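A cleaning step like that can be as small as a column-alias map. This is a simplified sketch (hypothetical code, not our actual transformation layer):

```python
# Hypothetical cleaning step: map vendor column names back to the
# canonical names our templates expect, so renames never reach the team.
COLUMN_ALIASES = {"URL": "Landing Page"}

def normalize_columns(row: dict) -> dict:
    """Rename any aliased vendor columns; pass everything else through."""
    return {COLUMN_ALIASES.get(col, col): val for col, val in row.items()}

vendor_row = {"Keyword": "seo agency", "URL": "https://example.com/services"}
print(normalize_columns(vendor_row))
# {'Keyword': 'seo agency', 'Landing Page': 'https://example.com/services'}
```

Downstream dashboards and templates keep seeing “Landing Page” no matter what the vendor calls the field this quarter.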
Keeping our data in our warehouse also gives us the opportunity to reuse data. We might have multiple products that have a data source in common – if those could use the same data, we could create more value without increasing costs.
We can also multiply those savings by creating queues to microservices that deduplicate and cache data – cutting costs and shortening the turnaround time for team members to get data.
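Here’s a minimal, in-memory sketch of that dedupe-and-cache idea (hypothetical code, not our actual microservice): before queueing a keyword for the ranking vendor, skip anything fetched recently.

```python
# Minimal dedupe-and-cache sketch: skip vendor calls for keywords
# we've already fetched within the cache window.
from datetime import datetime, timedelta

CACHE_TTL = timedelta(days=1)
_last_fetched: dict[str, datetime] = {}

def enqueue(keywords: list[str], now: datetime) -> list[str]:
    """Return only the keywords that actually need a vendor call."""
    to_fetch = []
    for kw in keywords:
        last = _last_fetched.get(kw)
        if last is None or now - last > CACHE_TTL:
            to_fetch.append(kw)
            _last_fetched[kw] = now  # duplicates in the same batch are skipped
    return to_fetch

now = datetime(2022, 2, 10)
print(enqueue(["seo agency", "seo agency", "rank tracker"], now))
# ['seo agency', 'rank tracker'] -- the duplicate submission is deduped
print(enqueue(["seo agency"], now))
# [] -- already cached, so no extra vendor cost
```

If two teams submit overlapping keyword sets in the same window, only one vendor call per keyword goes out and both teams get the data.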
Our data can flow into scalable dashboards and reports for insights that every SEO team member wants to know (like “how has my rank for priority keywords changed week over week?” or “how did our rank improve after implementing that content audit?”) but it can also be used in custom analyses by our analytics teams where we join in a client’s paid search or CRM data.
Within a few months of launching our internal rank tracking tool, we started to hit data size limitations in our visualization tool – and with our data growing exponentially, we had to move to a data platform that could query petabytes rather than gigabytes. We bought ourselves some runway by implementing incremental refreshes and by filtering out data that wasn’t a must-have in each dashboard (at the cost of additional build time for each data product).
At the end of 2021, we migrated our data products (including rank tracking data) from one data platform to our own web application powered by Looker’s embedded analytics.
Because the data was in our warehouse, we were able to transform the data and rebuild using best practices for our new data platform. We kept our old platform running until our new platform was ready to launch – something that might not have been possible without the ability to use the same data in multiple applications.
A major value of creating our own rank tracking tool is the ability to build robust security into the system, not only by keeping our data safe in our warehouse but by using permissions to create a better experience for our team members.
By joining our clients’ organic ranking data with data from our CRM (like which team member is assigned to which client), we can use permissioning to ensure that client marketing data is only visible to the team members working on that client. When a Seer team member opens one of our data products, they only see their clients’ data, making it easier to navigate.
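Conceptually, that’s row-level filtering on a CRM join. This simplified sketch (all names hypothetical) shows the shape of it:

```python
# Simplified row-level permissioning sketch: join CRM assignments
# against ranking rows so each team member only sees their clients.
assignments = {  # from the CRM: team member -> clients they work on
    "christina": {"client_a", "client_b"},
    "ethan": {"client_c"},
}

rankings = [
    {"client": "client_a", "keyword": "seo agency", "rank": 3},
    {"client": "client_c", "keyword": "rank tracker", "rank": 7},
]

def visible_rows(team_member: str, rows: list[dict]) -> list[dict]:
    """Filter ranking rows down to the requester's assigned clients."""
    allowed = assignments.get(team_member, set())
    return [row for row in rows if row["client"] in allowed]

print(visible_rows("christina", rankings))
# only the client_a row comes back; client_c stays hidden
```

In practice a platform like Looker applies this kind of filter at query time, so the restriction can’t be bypassed by a clever dashboard.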
This also gives us opportunities to anonymize data and combine it for industry-level trends and insights, without relying on manual methods more prone to user-error.
Having our own rank tracking tool means that all of that data flows into our data warehouse, where we can direct it many different ways for our team to consume it – we can democratize our data by enabling our team members to make data-informed decisions and feel confident about data, regardless of their technical expertise.
Remember our team member with the CSV issue?
They also had that data stored locally in a CSV on their computer. If another team member wanted to build a different analysis with the same data, they’d have to go through those steps all over again or ask the first team member to email them their CSV. All of that work and data is decentralized.
Storing data in our warehouse gives us the flexibility to grant team members access to the same centralized data in different formats. Data products that scale to the entire team might use data lakes, but we also create bite-sized tables and curated views that can be visualized in Power BI, Google Data Studio, Tableau, or any other tool a team member might want to use – we don’t force them into a specific tool, but encourage team members to use whatever tools they’re confident using.
This also cuts down on training and build time – for reports that every team member, regardless of experience or role, should have access to, we’ll create a dashboard that they just click to open. They don’t need to download the data, clean and transform it, and then build visualizations on top of it. They just log in, open the dashboard, and voila – it’s there!
For team members who build custom dashboards as part of their role, we’ll provide them with structured data that can help them quickly build the foundation of their analysis, and then they use their skills to customize it.
In 2019, our team obtained 24M rankings at a daily frequency. We tracked 12M snapshot rankings, but ~80% were redundant (only 2.4M unique rankings).
In the first half of 2020, we obtained 14M rankings at a daily frequency and another 5M snapshot rankings (again, ~80% of snapshot rankings were redundant – we only needed 1M). After we migrated to our rank tracking tool in July 2020, we processed 46M total rankings across multiple frequencies.
In 2021, we returned 56M rankings. 76% of total rankings were run at a one-time frequency; only 9% were tracked daily.
We’re only a few weeks into 2022 and we’ve returned 4.6M rankings so far, with 65% of rankings tracked at a one-time frequency. We can process up to 1M keywords per day, and that data flows into multiple products and tools across our centralized platform.
And not a single team member is doing daily checks to see if their data was returned so that they can turn off tracking.
If you’re a natural-born consultant who loves solving problems at scale using big data — explore Seer Careers & apply now (we’re hiring folks just like you!)
Source: www.seerinteractive.com, originally published on 2022-02-16 14:34:19