Define a problem well and you will be pointed to its solution.
Click on a case study title below to jump to the study.
- Wells Fargo: Why finding 30,000 candidates was easier than finding three
- GTT: Validating addresses for free with the US Census web site
- GTT: How many traffic lights in every town in America?
Wells Fargo Home Mortgage (WFHM) contracted my services to help them identify prospective candidates for mortgage loan officer positions across the country. The criteria were usually very specific, involving foreign language fluency, specialization in a loan type such as construction, and geographic proximity to the office for which WFHM was recruiting. They always wanted the best-performing loan officers they could get, and they always wanted at least three candidates to consider. Experiencing the difficulty of this task firsthand, I understood why they had contracted a professional researcher.
I realized my situation posed the classic question of short-term versus long-term gains: should I spend my time finding three good candidates for each individual position, or (potentially) capture the entire population of U.S. mortgage loan officers in a single searchable database?
Ideally, a long-term gain approach would:
- Identify every mortgage loan officer in the United States.
- Capture all available information on skills and sales awards.
- Capture information quickly, inexpensively, and with as little data cleansing as possible.
- Make my results easily searchable by my client's recruiters.
Using web scraping software, I automated searches of mortgage loan officer web sites, running searches by ZIP code using a table of all U.S. ZIP codes. Out of professional courtesy (but now a requirement of U.S. law), I minimized the demand on the web site's server by running searches only during overnight hours and spacing each search by one minute. During the day, I cleaned up the data with VBA and often by hand, populating a growing table of prospective candidates. I published each batch of results as soon as it was ready so that my recruiters could benefit from them.
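The throttling described above — searching only overnight and pausing a minute between requests — can be sketched in Python. The locator URL here is hypothetical, and the overnight window (10 p.m. to 6 a.m.) is an assumption for illustration:

```python
import time
import urllib.request
from datetime import datetime

# Hypothetical loan-officer locator endpoint; not a real URL.
LOCATOR_URL = "https://example.com/loan-officer-search?zip={zip}"

def is_overnight(hour: int) -> bool:
    """True if the hour falls in the assumed 10 p.m.-6 a.m. polite-crawling window."""
    return hour >= 22 or hour < 6

def crawl_zips(zip_codes, delay_seconds=60):
    """Fetch the locator page for each ZIP code, one request per minute,
    waiting whenever the clock is outside the overnight window."""
    results = {}
    for zip_code in zip_codes:
        while not is_overnight(datetime.now().hour):
            time.sleep(300)  # check again in five minutes
        with urllib.request.urlopen(LOCATOR_URL.format(zip=zip_code)) as resp:
            results[zip_code] = resp.read().decode("utf-8", errors="replace")
        time.sleep(delay_seconds)  # courtesy spacing between searches
    return results
```

The one-minute spacing keeps the load on the target server negligible; the overnight window further ensures the crawl never competes with daytime users.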
For the interface I created CANINE: the Candidate Identification & News Database. The news component, which employed my competitive intelligence skills, would come later. Because a SQL Server database was not available, CANINE was built as a split MS Access application: a shared back-end data file with a front end distributed to each recruiter.
With the user interface, recruiters could:
- Specify a radius (in miles) around the ZIP code of the office for which they were hiring and pull every candidate in that radius.
- Filter their searches by foreign language skills, loan specialization, and sales awards.
- Quickly see whether any candidate was being, or had been, considered for employment. This feature tied into an HR database.
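The radius pull can be sketched with a standard great-circle (haversine) distance check, assuming each candidate record carries a latitude/longitude looked up from a ZIP centroid table (the data layout here is illustrative, not CANINE's actual schema):

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in statute miles between two lat/lon points."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def candidates_within(office_latlon, candidates, radius_miles):
    """candidates: iterable of (name, lat, lon) tuples from the candidate table.
    Returns the names of everyone inside the requested radius."""
    lat0, lon0 = office_latlon
    return [name for name, lat, lon in candidates
            if haversine_miles(lat0, lon0, lat, lon) <= radius_miles]
```

In Access the same idea would typically live in a VBA function called from a query, but the geometry is identical.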
With CANINE, recruiters could spend more time recruiting and almost no time researching. I, in turn, was able to move on to the competitive intelligence part of the tool, which fed the recruiters talking points about each prospective candidate's current company.
After my assignment was completed, I was contracted periodically to repeat my nationwide research and replace the candidate table with updated information. At its largest, the database recorded over 30,000 mortgage loan officers around the United States.
Click the graphic at the top of this case study to see a slideshow about CANINE.
After a trade show, my client would have 100+ names and addresses of potential government customers. First hand-written, then typed into a spreadsheet, these records needed to be verified and deduplicated before being uploaded to the CRM database. For-fee services could be used, as could the US Postal Service's address-lookup API, but the first would be too expensive for small counts and the second is limited by its terms of use, a required user account, and caps on the number of searches.
There was no budget for doing this and no administrative employees who could do the address verification and data clean-up manually. Since part of my contracting work was deduping their CRM system, I would work against myself if I did not find an effective way of verifying and cleaning up this prospect information.
My solution would therefore need to be:
- Easy to use and fast
- Free or low-cost
The US Census Bureau website has a no-cost feature that lets you upload a file of addresses (no names) along with whatever identification number you wish to include. The feature runs a fuzzy search on each address against the Bureau's database of every household and business in the United States, then returns the file with the following information appended to each record:
- The results of each matching attempt: EXACT, NO_MATCH, or NON_EXACT.
- The "official" address the Bureau matched to each of your addresses (if EXACT or NON_EXACT).
- Geocoding info: census tract & block, county, and latitude/longitude.
Some of this information is useful for other purposes. For my verification purposes, the "EXACT" designation told me my address was clean.
The Census data was reliable because the Census Bureau tracks individual homes and offices. Although the Census is performed only every 10 years, there are regular updates of address information as the Bureau makes estimates and prepares for the next decennial Census. Post office boxes, however, are not tracked and will therefore always be unverifiable with this method.
In some recent work for another client, I noted that in one batch of 1,000 residential addresses, the feature verified all but 202. Of those 202, 39 were post office boxes or blank addresses, which are always unverifiable. Omitting these obvious exceptions, the Census Bureau bulk lookup had an 83% match rate at no cost. Applied to very large customer databases, this could mean significant savings over for-fee address correction services. The only downside is the limit of 1,000 addresses per upload; there is, however, no limit on how often you can upload.
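Working within the 1,000-address limit just means splitting the list into batches and formatting each one as the simple ID-plus-address CSV the bulk lookup expects. A minimal sketch (the record field names here are my own; check the Bureau's current instructions for the exact required column order):

```python
import csv
import io

BATCH_LIMIT = 1000  # the bulk lookup accepts at most 1,000 addresses per file

def chunk(records, size=BATCH_LIMIT):
    """Split a list of address records into upload-sized batches."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def to_batch_csv(batch):
    """Format one batch as a no-header CSV: unique ID, street, city, state, ZIP."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for rec in batch:
        writer.writerow([rec["id"], rec["street"], rec["city"],
                         rec["state"], rec["zip"]])
    return buf.getvalue()

# Each CSV can then be uploaded through the website form (or posted to the
# Bureau's batch geocoding service), and the returned file inspected for the
# EXACT / NON_EXACT / NO_MATCH result on each record.
```

Because the unique ID travels with each row, the returned match results can be joined straight back to the spreadsheet or CRM records.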
Click the graphic at the top of this case study to see a slideshow on how to use this Census Bureau feature.
GTT sells traffic control equipment, software and services to municipalities of all sizes in the United States and abroad. For both strategic and tactical sales planning, they needed to know how many traffic-light-regulated intersections every U.S. town and city had (as opposed to simply counting all intersections, which is comparatively easy, I hear). Surprisingly, many municipalities (even large ones) do not have a comprehensive table of such intersections. Those that do often consider it sensitive information and will not share it. As part of my contracting and consulting services to GTT, I was asked to consider the problem.
My first idea was to investigate how much might be achieved with Google Maps, which has a feature designed to plan and quickly "drive" a virtual tour through a town. I could then count all the traffic-light-regulated intersections along the way. Following my principle of "failing quickly," I chose Boston as my test case: a mid-size city (#21 in the U.S. for population in 2017) whose road network does not use a grid system and so would present a significant challenge. Boston required two hours to plan and execute my "driving survey." Scaled up to the almost 30,000 towns and cities in the U.S., this approach was cost-prohibitive.
I therefore began to consider what might serve as an analog: something I could measure that was analogous to the thing I wanted to measure.
To identify an analog, I thought about why a traffic light would exist at an intersection as opposed to a stop sign. The reason is a high volume of traffic in a small period of time. From this I deduced two things:
- Population density (count of people per square mile) could be used to estimate relative traffic density. An East Coast city with high population density would have a much higher vehicle traffic density per square mile than a sparsely populated Midwestern town at the same time of day.
- Land area (square miles) could be used to estimate the magnitude for vehicle traffic density. Two cities might have the same number of vehicles per square mile at the same time of day, but the city that is larger in size would have a greater total number of such vehicles.
Of course, some sparsely populated towns do have traffic lights, if only because of commuters, visitors, and business vehicles. Then there are towns like Carmel, Indiana, that were populated enough to need traffic lights but have since eliminated them. For the most part, though, one can assume that a town larger in both population density and land area will have more traffic-light-regulated intersections than a town smaller in both respects. On that assumption, the two metrics can be used in an algorithm to rank towns and cities in relative terms, and the algorithm could be tested and refined against the intersection data my client did have. That was my recommendation to the client.
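One simple way to operationalize that recommendation is a score that rises with both metrics, with tuning exponents to be calibrated against the towns whose traffic-light counts are known. This is an illustrative sketch, not the client's actual algorithm, and the exponents shown are placeholders:

```python
def intersection_score(pop_density, land_area, alpha=1.0, beta=1.0):
    """Relative score for a town: higher population density (people per sq mi)
    and larger land area (sq mi) both push the score up. alpha and beta are
    tuning exponents, to be fitted against towns with known light counts."""
    return (pop_density ** alpha) * (land_area ** beta)

def rank_towns(towns, alpha=1.0, beta=1.0):
    """towns: list of (name, pop_density, land_area) tuples.
    Returns names ordered from highest to lowest estimated count."""
    return [name for name, d, a in
            sorted(towns,
                   key=lambda t: intersection_score(t[1], t[2], alpha, beta),
                   reverse=True)]
```

With equal exponents the score reduces to density times area; calibration against the known intersection counts would adjust alpha and beta to capture effects such as commuter traffic in low-density towns.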
That was the approach my client chose to use to rank the 30,000 municipalities in the United States for purposes of marketing and sales efforts. See my case study on how I identified those 30,000 municipalities and obtained their population and area information.