
BlackRock

Data Science

Teams Work
[Diagram: Custodian and Aladdin each hold a NAV with the breakdown Expenses, Reclaims, Assets, Cash]

Problem Statement
When the NAV or any part of its breakdown does not match, an Exception is raised.
Exceptions are shown on the Exception Monitor.
For 1 year, a person classified Exceptions manually.
Exceptions are generated every day and are not classified automatically.

Problem Statement: Classify Exceptions

Data: NAV and its breakdown per Portfolio [1 year of data]

Data Extraction
Data from table port_group
Portfolio and its aggregated sleeves [portfolio_code]

Data from table port_Nav


Portfolio NAV and its breakdown

Data from Exception Monitor


Java code for combining all data
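
The slide only states that Java code combined the extracted data. As a minimal sketch of what such a step could look like, the following joins two sets of rows on portfolio_code. The class name PortfolioJoinSketch, the method joinOnKey, and the map-per-row representation are assumptions for illustration, not the original code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PortfolioJoinSketch {
    // Hypothetical helper: inner-joins two lists of rows on a shared key column.
    // Each row is a column-name -> value map. Columns from the right row
    // overwrite same-named columns from the left row.
    static List<Map<String, String>> joinOnKey(List<Map<String, String>> left,
                                               List<Map<String, String>> right,
                                               String keyColumn) {
        // Index the right-hand rows by key; if a key repeats, the last row wins.
        Map<String, Map<String, String>> rightByKey = new HashMap<>();
        for (Map<String, String> row : right) {
            rightByKey.put(row.get(keyColumn), row);
        }
        List<Map<String, String>> joined = new ArrayList<>();
        for (Map<String, String> row : left) {
            Map<String, String> match = rightByKey.get(row.get(keyColumn));
            if (match == null) {
                continue; // no matching row on the right side: skip
            }
            Map<String, String> merged = new LinkedHashMap<>(row);
            merged.putAll(match);
            joined.add(merged);
        }
        return joined;
    }
}
```

Rows from port_group would be passed as left, rows from port_Nav as right, and "portfolio_code" as the key column. The same call could then attach the Exception Monitor rows to the result.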

Data Cleaning
Removing Outliers
Removed ambiguity [Accured Expenses, Accrued Expenses, Accrued Expense: all the same label]
Comment levels: 97
Reduced to 30 levels
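
The ambiguity removal above can be sketched as a lookup from cleaned-up comment text to one canonical label. This is an illustrative sketch only: the class name CommentLabelNormalizer and the alias table are assumptions, and the table holds just the three variants named on the slide, not the full mapping from 97 to 30 levels.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CommentLabelNormalizer {
    // Alias table: cleaned variant -> canonical label (hypothetical contents,
    // limited to the variants named on the slide).
    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("accured expenses", "Accrued Expenses");
        ALIASES.put("accrued expenses", "Accrued Expenses");
        ALIASES.put("accrued expense", "Accrued Expenses");
    }

    static String normalize(String rawComment) {
        // Trim, collapse runs of whitespace, and lower-case before the lookup.
        String cleaned = rawComment.trim()
                .replaceAll("\\s+", " ")
                .toLowerCase(Locale.ROOT);
        // Unknown labels are returned in cleaned form so they can be reviewed by hand.
        return ALIASES.getOrDefault(cleaned, cleaned);
    }
}
```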

Data
Portfolio_name
Portfolio_code
Portfolio_group
Company Name
P. NAV [continuous]
Q. NAV [continuous]
P. Exclaims [continuous]
P. Reclaims [continuous]

P. Asset [continuous]
P. Cash [continuous]
Q. Exclaims [continuous]
Q. Reclaims [continuous]
Q. Asset [continuous]
Q. Cash [continuous]
Start time
End time
Status [Class]

Feature Selected
P. NAV [continuous]
Q. NAV [continuous]
P. Exclaims [continuous]
P. Reclaims [continuous]
P. Asset [continuous]
P. Cash [continuous]
Q. Exclaims [continuous]
Q. Reclaims [continuous]
Q. Asset [continuous]
Q. Cash [continuous]

Features Extracted
f1: NAV Diff (BPS) = [(NAV1 - NAV2) / NAV1] * 1000 [continuous]
f2 - f5: Expense Diff (BPS) = [(Expense1 - Expense2) / NAV1] * 1000 [continuous]
f6 - f7: Diff(Asset - Cash), Diff(Exclaim - Reclaim) [continuous]
f8 - f11: Is Expense/Rec/Asset/Cash from Aladdin? [binary]
f12 - f15: Is Expense/Rec/Asset/Cash from Custodian? [binary]
f16 - f19: % contribution of Expense in total NAV [continuous]
f20 - f24: Impact of Expense? [binary]
f25: NAV diff > 10?
f26: NAV diff > 20?
Normalization on f1 - f7
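
A minimal sketch of the difference features (f1, f2 - f5) and the threshold flags (f25, f26) follows. The class and method names are hypothetical. The scale factor of 1000 is copied from the slide; a basis point is conventionally 1/10000, so the factor is kept as a named constant that can be changed. Applying the thresholds to the absolute difference is an assumption, since the slide does not say how negative differences are handled.

```java
final class ExceptionFeaturesSketch {
    // Scale factor as written on the slide. A basis point is conventionally
    // 1/10000, so 10_000 may be the intended value.
    static final double SCALE = 1000.0;

    // f1 and f2..f5: difference between the two sources, relative to NAV1.
    static double scaledDiff(double value1, double value2, double nav1) {
        if (nav1 == 0.0) {
            throw new IllegalArgumentException("NAV1 must be non-zero");
        }
        return (value1 - value2) / nav1 * SCALE;
    }

    // f25 / f26: threshold flags on the NAV difference.
    // Assumption: the threshold applies to the absolute difference.
    static boolean exceeds(double navDiff, double threshold) {
        return Math.abs(navDiff) > threshold;
    }
}
```

For example, f1 would be scaledDiff(nav1, nav2, nav1), and f25 and f26 would be exceeds(f1, 10) and exceeds(f1, 20).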

Model Used
Random Forest
Accuracy - 74% on testing data

BlackRock
Software Development

Problem Statement
Given 2 data sets
Compute the difference between them.
Generate a report (.csv)
Useful for business people
Difference between 2 .csv files

Make a shell script for regression testing in the local environment
Useful for tech people
See the effect of a code change before it goes into the tst [test] environment

Product features
Diffs a data set of 40 lakh (4 million) entries in 5 minutes
Can define Ignore columns
Ignore the time column [it will differ in data from an SQL query]

Can define a Match ruleset
If the difference between src_a [Net_Asset] and src_b [Net_Asset] is < 0.01, treat it as a Match

Can rename columns
Match col1 from src_a to col2 from src_b
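
The Match rule above (a difference below 0.01 counts as a Match) could be sketched as follows. The class name ToleranceMatchSketch and the method matches are hypothetical, and comparing the absolute difference is an assumption; the slide only says "difference". BigDecimal is used so that decimal values read from a .csv file compare exactly.

```java
import java.math.BigDecimal;

final class ToleranceMatchSketch {
    // Returns true when the two cell values should be reported as a Match.
    // Numeric cells match when |a - b| < tolerance; cells that do not parse
    // as numbers fall back to exact string equality.
    static boolean matches(String a, String b, BigDecimal tolerance) {
        try {
            BigDecimal diff = new BigDecimal(a.trim())
                    .subtract(new BigDecimal(b.trim()))
                    .abs();
            return diff.compareTo(tolerance) < 0;
        } catch (NumberFormatException e) {
            return a.equals(b);
        }
    }
}
```

With a rename rule in place, the caller would read a from col1 of src_a and b from col2 of src_b before calling matches.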

Code Input
Path of Src_a
Path of Src_b
Key_column: uniquely identifies the row [e.g. portfolio_code, transactional_id]
Ignore_column: creation time
Integer_column
Match set: src_a [column_name] operator src_b [column_name] operator value
Rename_column: src_a column and src_b column
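
To illustrate how Key_column and Ignore_column could drive the comparison, here is a minimal sketch of a keyed diff. It is not the DiffTool itself: the class name KeyedDiffSketch, the map-per-row representation, and the report line layout are assumptions, and every row is assumed to carry a non-null key value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

final class KeyedDiffSketch {
    // Index rows (column name -> value) by the value of the key column.
    static Map<String, Map<String, String>> indexByKey(List<Map<String, String>> rows,
                                                       String keyColumn) {
        Map<String, Map<String, String>> byKey = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            byKey.put(row.get(keyColumn), row);
        }
        return byKey;
    }

    // Returns one line per difference in the form "key,column,valueA,valueB".
    // Rows present on one side only are reported with the column "<row>".
    static List<String> diff(List<Map<String, String>> srcA,
                             List<Map<String, String>> srcB,
                             String keyColumn,
                             Set<String> ignoreColumns) {
        Map<String, Map<String, String>> a = indexByKey(srcA, keyColumn);
        Map<String, Map<String, String>> b = indexByKey(srcB, keyColumn);
        Set<String> allKeys = new TreeSet<>(a.keySet());
        allKeys.addAll(b.keySet());
        List<String> report = new ArrayList<>();
        for (String key : allKeys) {
            Map<String, String> rowA = a.get(key);
            Map<String, String> rowB = b.get(key);
            if (rowA == null || rowB == null) {
                report.add(key + ",<row>,"
                        + (rowA == null ? "missing" : "present") + ","
                        + (rowB == null ? "missing" : "present"));
                continue;
            }
            Set<String> columns = new TreeSet<>(rowA.keySet());
            columns.addAll(rowB.keySet());
            for (String column : columns) {
                if (ignoreColumns.contains(column)) {
                    continue; // e.g. a creation time that always differs
                }
                String valueA = rowA.get(column);
                String valueB = rowB.get(column);
                if (!Objects.equals(valueA, valueB)) {
                    report.add(key + "," + column + "," + valueA + "," + valueB);
                }
            }
        }
        return report;
    }
}
```

The sketch compares values as plain strings and does not escape commas in the report lines; a Match set or Integer_column rule would replace the Objects.equals check for the affected columns.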

Code Output
Sources

Hyperlink

Difference integer

Regression Testing
Shell Script
[Diagram: Run Code [Before] -> Save DB (.csv); Run Code [After] -> Save DB (.csv); both .csv files -> DiffTool (.java) -> Report]
Code: Make changes in DB
Report: Expected Changes, Actual Difference
