Pomegra Wiki

Alternative Data Sources Used by Quant Funds

Quantitative funds use alternative data sources—satellite imagery, credit-card transaction volumes, mobile phone geolocation, web scraping, supply-chain records—to construct trading signals that traditional financial statements and news flow miss. These datasets compress months of business intelligence into days or hours, and the edge comes from speed and uniqueness rather than the raw information itself.

How quants monetize alternative data sources

A quant fund builds an edge by reducing information lag. Traditional hedge funds read 10-K filings, wait for earnings calls, and react to news, by which time the market has already repriced. A quant with satellite data of shipping containers at a logistics hub can infer Q3 sales volume before the company reports its quarterly results. A credit-card processor feeding near-real-time transaction counts can signal consumer health weeks before official retail-sales figures land.

The workflow is mechanical: data arrives, a machine-learning model scores it, and if the signal is statistically meaningful, the fund trades. The profit pool is shallow—dozens of quants are mining the same datasets—so speed and cost discipline matter. A fund that licenses expensive proprietary data but reacts three days late will underperform a bare-bones operation that trades the next morning.
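The score-then-trade loop can be sketched in a few lines. This is a minimal illustration, not any fund's actual method: it swaps the machine-learning model for a plain correlation t-test between a signal and next-period returns, and the function names and the threshold of 2.0 are assumptions chosen for the example.

```python
import math
from statistics import mean


def correlation_t_stat(signal, next_returns):
    """t-statistic of the Pearson correlation between a signal and next-period returns."""
    n = len(signal)
    if n != len(next_returns) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mx, my = mean(signal), mean(next_returns)
    cov = sum((x - mx) * (y - my) for x, y in zip(signal, next_returns))
    var_x = sum((x - mx) ** 2 for x in signal)
    var_y = sum((y - my) ** 2 for y in next_returns)
    if var_x == 0 or var_y == 0:
        raise ValueError("series must not be constant")
    r = cov / math.sqrt(var_x * var_y)
    r = max(-0.999999, min(0.999999, r))  # guard against division by zero on a perfect fit
    return r * math.sqrt((n - 2) / (1 - r * r))


def decide(signal, next_returns, latest_value, t_threshold=2.0):
    """Return 'long', 'short', or 'no trade' (t_threshold is an illustrative cutoff)."""
    t = correlation_t_stat(signal, next_returns)
    tilt = latest_value - mean(signal)  # is the newest reading above or below its history?
    if abs(t) < t_threshold or tilt == 0:
        return "no trade"
    return "long" if t * tilt > 0 else "short"
```

A weak historical relationship yields "no trade" regardless of how extreme the latest reading is, which is the statistical gate the workflow describes.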

Common alternative data sources

Satellite imagery tracks physical asset utilization: container volume at ports, ore piles at mining sites, vehicles in retailer parking lots. Computer-vision models measure the footprint in each image and trend the sequence over weeks. A fund betting on an industrial-equipment maker can watch its biggest customer’s factory utilization in real time. Oil traders monitor tank levels at refineries. Real-estate investors track construction progress from orbit.
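The "measure the footprint, then trend it" step might look like the sketch below. It assumes an upstream computer-vision model has already turned each image into a binary mask (1 = occupied cell, 0 = empty); the mask format and both function names are hypothetical.

```python
def occupancy_fraction(mask):
    """Share of cells marked occupied in a binary mask (list of rows of 0/1)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("empty mask")
    return sum(sum(row) for row in mask) / total


def trend_per_observation(values):
    """Ordinary least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den
```

Feeding one occupancy fraction per weekly image into `trend_per_observation` gives the average change per week; a positive slope means the lot, port, or stockpile is filling up.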

Credit-card and payment flows are sourced from card networks like Mastercard and Visa, or aggregated from smaller point-of-sale providers. The quant sees spending by merchant category, geography, and time period. If credit-card spend in restaurants sinks during a week in July (vs. the same week in the prior three Julys), it signals economic weakness before jobless claims arrive. Conversely, a spike in hardware-store spending flags that a pickup in housing starts may be coming.
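The July comparison is a same-period seasonal baseline. A minimal sketch, with a hypothetical function name and made-up spend figures:

```python
from statistics import mean


def seasonal_deviation(current, prior_same_periods):
    """Fractional gap between this period's spend and the mean of the same period in prior years."""
    baseline = mean(prior_same_periods)
    if baseline == 0:
        raise ValueError("baseline spend must be non-zero")
    return (current - baseline) / baseline


# Illustrative numbers only: restaurant spend index for one July week,
# compared with the same week in the three preceding years.
gap = seasonal_deviation(90.0, [100.0, 98.0, 102.0])
```

Comparing against the same calendar week, rather than the previous week, keeps ordinary seasonality (holidays, school breaks) from masquerading as a signal.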

Mobile geolocation comes from apps that ask permission to share location. Aggregators (Foursquare, Cuebiq, Gravy) sell heatmaps of foot traffic by store, state, or mall. A pharmacy chain that sees declining store visits for months in a row may face revenue headwinds. A discount retailer seeing traffic growth in recession-hit counties signals relative resilience.
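"Declining store visits for months in a row" can be made concrete by counting the current run of month-over-month declines. The function name and the visit counts in the usage note are hypothetical.

```python
def trailing_decline_streak(monthly_visits):
    """Number of consecutive month-over-month declines ending at the latest month."""
    streak = 0
    # Walk backwards through (previous, current) pairs until a non-decline appears.
    pairs = zip(reversed(monthly_visits[:-1]), reversed(monthly_visits[1:]))
    for previous, current in pairs:
        if current < previous:
            streak += 1
        else:
            break
    return streak
```

For example, visits of 100, 105, 101, 98, 95 give a streak of 3; a fund might only treat the pattern as a revenue warning once the streak passes a length it has chosen in advance.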

Web scraping harvests pricing, product reviews, inventory levels, and search trends from e-commerce sites. A quant fund monitoring an online retailer’s price-list HTML can detect competitive undercutting, product launches, and out-of-stock signals without waiting for earnings commentary. Some funds track website traffic (via tools like Similarweb or, before its retirement in 2022, Alexa) to infer customer acquisition costs or user engagement.
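Detecting price cuts, launches, and stock-outs amounts to diffing two snapshots of the same page. The sketch below uses Python's standard `html.parser`; the markup it expects (`<li class="product">` with `data-sku`, `data-price`, and `data-stock` attributes) is invented for illustration, and a real retailer's page would need its own parsing rules.

```python
from html.parser import HTMLParser


class PriceListParser(HTMLParser):
    """Collects {sku: (price, in_stock)} from hypothetical <li class="product"> tags."""

    def __init__(self):
        super().__init__()
        self.products = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "li" and a.get("class") == "product":
            self.products[a["data-sku"]] = (float(a["data-price"]), a["data-stock"] == "in")


def parse_snapshot(html):
    parser = PriceListParser()
    parser.feed(html)
    parser.close()
    return parser.products


def diff_snapshots(old, new):
    """List (sku, event) pairs for launches, price cuts, and stock-outs."""
    events = []
    for sku, (price, in_stock) in new.items():
        if sku not in old:
            events.append((sku, "launch"))
            continue
        old_price, old_in_stock = old[sku]
        if price < old_price:
            events.append((sku, "price cut"))
        if old_in_stock and not in_stock:
            events.append((sku, "out of stock"))
    return events
```

Running `diff_snapshots(parse_snapshot(yesterday_html), parse_snapshot(today_html))` on two daily captures produces the event list a model would consume.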

Supply-chain and logistics data include freight volumes, shipping rates, port activity, and trucking movements. A fund short a shipping company can sense demand collapse within days of satellite imagery showing cranes idle at container terminals. Insurance claims data, shipping manifests, and rail-car positioning systems all feed this intelligence.

Job postings and hiring data signal management’s growth expectations. A software company listing 200 engineer roles in Q2 (vs. 30 historically) implies either rapid scaling or high churn, and both are tradeable signals. Firms like Thinknum and Burning Glass (now Lightcast) scrape Indeed and LinkedIn to track hiring volume by company and job title.

FDA and regulatory filings are parsed in real time. Drug approvals, clinical-trial delays, and enforcement actions often appear in regulatory databases before any press release is issued. Patent applications and USPTO office actions are similarly mined for early signals about tech and pharma pipelines.

Why alternative data fails (or gets overused)

Not all alternative data is profitable. A common pitfall is overfitting: the signal looked great on historical backtests because the model learned quirks of the past, not true economic relationships. Real-time data is also messier than historical databases: a satellite image may be cloudy, or the camera angle may change between passes. Credit-card data lags by days in many cases, so the “real-time” advantage can be illusory.
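The overfitting trap can be reproduced with pure noise. The sketch below (all names and parameter values are illustrative assumptions) generates random "returns" and many random "signals", picks the signal with the best in-sample correlation, and then checks the same signal on the held-out half.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_of_many(n_candidates=200, n_obs=60, seed=0):
    """Return (in_sample_corr, out_of_sample_corr) for the best-looking noise signal."""
    rng = random.Random(seed)
    returns = [rng.gauss(0, 1) for _ in range(n_obs)]
    half = n_obs // 2
    best = None
    for _ in range(n_candidates):
        signal = [rng.gauss(0, 1) for _ in range(n_obs)]
        in_sample = pearson(signal[:half], returns[:half])
        if best is None or abs(in_sample) > abs(best[0]):
            # Record how the same signal does on data it was not selected on.
            best = (in_sample, pearson(signal[half:], returns[half:]))
    return best
```

Because every candidate is noise, a strong in-sample figure here is purely an artifact of searching many candidates, and the out-of-sample figure has no reason to match it. Holding out data the selection step never sees is the standard check.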

Survivorship bias inflates historical returns: funds that bought expensive alternative data and lost money have closed. The survivors—those promoting their edge via podcasts and investor pitches—are the lucky or skillful few. A fund that joined the satellite-imagery bandwagon in 2019 and saw diminishing returns by 2021 (because everyone had the data by then) is less visible than a survivor touting 15% annual outperformance.

Privacy and regulatory risk are rising. The GDPR limits how much personal geolocation data can be sold in Europe. The SEC and FINRA have begun scrutinizing market-manipulation allegations tied to coordinated scraping or unusual data sources. A fund whose edge relies on web-scraped data that breaks a retailer’s terms of service may face legal pressure.

Data quality is inconsistent. A satellite vendor’s imagery may cover a region only once every five days; a credit-card aggregator may miss fringe processors. Crowding compounds the problem: as more funds pile into the same alternative datasets, the signal is priced in almost immediately and the alpha evaporates.

Cost vs. expected return

A single alternative dataset typically costs $10,000–$50,000 per month; premium or proprietary sources run $200,000–$500,000 monthly. A mid-sized hedge fund with $500 million under management might spend $2–$5 million per year on alternative data licenses, a significant technology cost. The bet is that the edge the data generates will pay for the licenses and still produce excess returns. A fund trading S&P 500 constituents on satellite imagery of quarterly container volumes needs to earn enough alpha to cover licensing, compute, researcher salaries, and trading costs.
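The break-even hurdle implied by these figures is simple arithmetic. The sketch below uses only the numbers quoted above ($2–$5 million of annual license spend against $500 million under management); the function name is hypothetical, and compute, salaries, and trading costs would raise the hurdle further.

```python
def breakeven_alpha(annual_data_cost, assets_under_management):
    """Annual excess return (as a fraction of AUM) needed just to cover data licenses."""
    if assets_under_management <= 0:
        raise ValueError("AUM must be positive")
    return annual_data_cost / assets_under_management


aum = 500_000_000
low = breakeven_alpha(2_000_000, aum)   # license spend at the low end of the range
high = breakeven_alpha(5_000_000, aum)  # license spend at the high end of the range

print(f"Break-even alpha: {low:.1%} to {high:.1%} of AUM per year")
```

On these assumptions the data has to add roughly 0.4% to 1.0% of assets per year before it contributes anything to net returns.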

This forces discipline: funds that license every trendy dataset go broke. Winners are selective, often developing proprietary pipelines (their own scrapers, custom satellite orders) or focusing on narrow sectors (industrial metals, aerospace supply chains) where the data truly moves prices in advance of consensus.
