Quantitative Methods

Question 1

How do word embeddings such as Word2Vec differ from TF-IDF in representing words for natural language processing?

Accepted Answer

Word embeddings represent words as dense vectors that capture semantic similarity; TF-IDF is a statistical weighting scheme reflecting term importance within and across documents but ignoring semantic relationships

Answer

Word embeddings and TF-IDF are functionally equivalent for financial text because financial documents use a specialised vocabulary where all terms are equally weighted

Answer

Word embeddings assign a numerical score from 0 to 1 to each word based on its domain-specific importance; TF-IDF produces multi-dimensional vectors that encode positional information

Answer

Word embeddings weight terms by their frequency in a document relative to their frequency across all documents; TF-IDF requires neural network training on large corpora to produce vector representations
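Illustration for Question 1

The contrast in the accepted answer can be sketched with the Python standard library alone. This is a minimal illustration, not a production pipeline: the three short documents are invented, the TF-IDF uses the plain unsmoothed form tf × log(N / df), and the two-dimensional "embeddings" are hand-picked hypothetical numbers standing in for what a trained model such as Word2Vec would learn.

```python
import math
from collections import Counter

# Invented toy corpus; documents 0 and 1 mean similar things but share no words.
docs = ["profit rose", "earnings increased", "profit fell"]
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})


def tf_idf(doc_tokens, all_docs):
    """Unsmoothed TF-IDF weights for one document: tf * log(N / df)."""
    counts = Counter(doc_tokens)
    n_docs = len(all_docs)
    weights = {}
    for term, count in counts.items():
        tf = count / len(doc_tokens)
        df = sum(1 for d in all_docs if term in d)
        weights[term] = tf * math.log(n_docs / df)
    return weights


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# One sparse vector per document, with one axis per vocabulary term.
doc_vectors = [
    [tf_idf(doc, tokenized).get(term, 0.0) for term in vocab]
    for doc in tokenized
]

# Hypothetical dense vectors (NOT trained): similar words are placed close together.
toy_embeddings = {
    "profit": [0.90, 0.10],
    "earnings": [0.85, 0.20],
    "fell": [-0.70, 0.60],
}

tfidf_similarity = cosine(doc_vectors[0], doc_vectors[1])
embedding_similarity = cosine(toy_embeddings["profit"], toy_embeddings["earnings"])

print("TF-IDF cosine, 'profit rose' vs 'earnings increased':", tfidf_similarity)
print("Toy embedding cosine, 'profit' vs 'earnings':", round(embedding_similarity, 3))
```

Because every term is its own axis under TF-IDF, the two documents with no shared words have a cosine similarity of exactly zero, while the toy dense vectors place "profit" and "earnings" close together.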

Question 2

A researcher has a macroeconomic time series that appears to trend upward visually, but the ADF test with a constant and trend specification fails to reject the unit root null (p = 0.12). A second ADF test, run on the first differences, rejects the unit root null (p = 0.01). Based on these results, should the researcher detrend or first-difference the original series, and why?

Accepted Answer

The researcher should first-difference: the ADF results indicate the series is I(1), with a unit root in levels but not in first differences; the trend is therefore stochastic, and first differencing removes it; detrending would be incorrect because it removes only deterministic trends

Answer

The ADF test results are contradictory and no transformation should be applied until further tests, such as the KPSS test, confirm the order of integration

Answer

The researcher should detrend and then first-difference in sequence: detrending removes the visible linear trend while first differencing removes any remaining stochastic non-stationarity identified by the ADF test

Answer

The researcher should apply a log transformation rather than first differencing; log transformation stabilises an exponential trend and renders the series stationary without requiring differencing
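Illustration for Question 2

The reasoning in the accepted answer can be illustrated with a simulated I(1) series. The sketch below is an assumption-laden stand-in, not the ADF test itself: it simulates a random walk with drift (the seed, drift, and sample size are arbitrary choices) and uses the lag-1 autocorrelation as a crude persistence check. The helper names `detrend` and `lag1_autocorr` are hypothetical.

```python
import random

random.seed(0)   # arbitrary seed, only for reproducibility
n = 500          # arbitrary sample size
drift = 0.2      # arbitrary drift, produces the visible upward trend

# Random walk with drift: y_t = drift + y_{t-1} + e_t, an I(1) process.
y = [0.0]
for _ in range(n - 1):
    y.append(y[-1] + drift + random.gauss(0.0, 1.0))


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation (crude persistence measure, not an ADF test)."""
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den


def detrend(x):
    """Residuals from an OLS fit of x on a constant and a linear time index."""
    t = range(len(x))
    t_mean = sum(t) / len(x)
    x_mean = sum(x) / len(x)
    slope = (
        sum((ti - t_mean) * (xi - x_mean) for ti, xi in zip(t, x))
        / sum((ti - t_mean) ** 2 for ti in t)
    )
    intercept = x_mean - slope * t_mean
    return [xi - (intercept + slope * ti) for ti, xi in zip(t, x)]


detrended = detrend(y)                                   # deterministic trend removed
differenced = [b - a for a, b in zip(y[:-1], y[1:])]     # first differences

print("lag-1 autocorrelation, detrended:  ", round(lag1_autocorr(detrended), 3))
print("lag-1 autocorrelation, differenced:", round(lag1_autocorr(differenced), 3))
```

In this simulation the detrended residuals remain highly persistent, because the accumulated shocks are still in the series, while the first differences equal the drift plus white noise. In practice the formal check would be a unit root test on each transformed series rather than this autocorrelation shortcut.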

Question 3

In the context of alternative data, what does 'point-in-time data alignment' mean, and why is it essential for backtesting?

Accepted Answer

Point-in-time alignment ensures data is used in a backtest only as of the date it was actually available to investors, preventing the use of information not yet known at the time of the simulated decision

Answer

Point-in-time alignment normalises alternative data to a common unit of measurement to allow direct comparison across different data vendors and geographies

Answer

Point-in-time alignment synchronises all alternative data feeds to a single calendar date, eliminating differences in reporting frequency across data vendors

Answer

Point-in-time alignment adjusts historical alternative data for survivorship bias by re-including companies that subsequently failed into the historical dataset
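Illustration for Question 3

The accepted answer can be made concrete with a small lookup that respects release dates. The records, field names, and helper functions below are hypothetical and exist only for illustration; a real dataset would also need to handle restatements and vendor delivery lags.

```python
from datetime import date

# Hypothetical quarterly data points: the period they describe and when they became public.
records = [
    {"period_end": date(2023, 3, 31), "released": date(2023, 5, 10), "value": 1.20},
    {"period_end": date(2023, 6, 30), "released": date(2023, 8, 9), "value": 1.35},
]


def latest_known(records, decision_date):
    """Point-in-time lookup: only records already released on the decision date."""
    known = [r for r in records if r["released"] <= decision_date]
    return max(known, key=lambda r: r["period_end"]) if known else None


def naive_latest(records, decision_date):
    """Naive lookup by period end; this leaks data that was not yet published."""
    ended = [r for r in records if r["period_end"] <= decision_date]
    return max(ended, key=lambda r: r["period_end"]) if ended else None


decision_date = date(2023, 7, 15)
pit = latest_known(records, decision_date)
naive = naive_latest(records, decision_date)

print("point-in-time value:", pit["value"])    # 1.2, the Q1 figure released in May
print("naive value:        ", naive["value"])  # 1.35, not released until August
```

On the simulated decision date of 15 July 2023, the naive lookup uses the second-quarter figure roughly three weeks before its release, which is the look-ahead bias that point-in-time alignment is meant to prevent.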

What’s in it.

Multiple Regression: Advanced Topics

Time-Series Analysis

Machine Learning Methods

Big Data and Investment Applications

Simulation Methods

Panel Data Regression

Logistic Regression and Classification

Backtesting and Model Evaluation

Sample questions