Apache Spark in Financial Modeling at BlackRock


Andrew Rothstein

Andrew Rothstein, Managing Director, is a member of BlackRock Solutions’ Financial Modeling Group and currently leads the Advanced Data Analytics team. Mr. Rothstein’s primary responsibilities include managing a team of quantitative analysts and software engineers who practice data science in pursuit of higher-quality, more predictive residential mortgage prepayment and credit models. Prior to focusing on mortgage modeling, he held many positions within the various teams that make up the Financial Modeling Group. Since starting at BlackRock as an analyst in 2001, he has made significant contributions to BlackRock’s core fixed income risk management platform in a variety of areas, including software infrastructure, inflation risk modeling, and corporate credit risk modeling.

Forest Fang

Forest Fang, Analyst, joined the Financial Modeling Group (FMG) within BlackRock Solutions in 2013. He was a founding member of the Advanced Data Analytics team within the FMG. His primary focus is solving large-scale data science problems for BlackRock’s clients. He has developed many big data analysis and visualization tools and is passionate about the computing challenges inherent in visualizing large datasets. Prior to joining BlackRock, Forest earned a Bachelor of Arts degree from Cornell University, double majoring in Mathematics and Computer Science.

David Durst

David Durst, Analyst, is a member of the Advanced Data Analytics team in BlackRock Solutions’ Financial Modeling Group. Mr. Durst currently develops user-friendly tools for understanding and modeling large datasets. In particular, he focuses on creating intuitive, interactive visualizations.


Andrew Rothstein and Forest Fang of the Financial Modeling Group within BlackRock Solutions, together with David Durst, a member of the Advanced Data Analytics team at BlackRock, presented an overview of how their groups use Apache Spark to explore data and better understand the financial and economic behavior of debtors.

They walked through several use cases showing how they use Spark and D3 to visualize a large loan-level mortgage dataset and to extract distributions and cluster boundaries from which meaningful insights can be drawn. They also discussed their use of K-Means clustering to reveal groups of similar borrowers and the attributes that distinguish those groups. The talk also covered how they used several sbt plugins to streamline building, deploying, and running Spark analyses.

This event was particularly useful for participants who want to understand how large datasets can be leveraged for financial modeling using big data technology that increases speed, hides complexity, and produces result sets that are easy to interpret and visually appealing, and for those who want to keep their knowledge of machine and human learning concepts up to date.