HR Analytics Framework Development: Turnover Prediction Models Using Workday Prism and Python Integration
Employee turnover is one of the most measurable and most preventable sources of cost in an enterprise workforce. The data to predict it exists inside your Workday environment already. The challenge is not collecting that data – it is structuring it correctly, extracting it in a usable form, and applying the right modelling approach to generate predictions that HR and business leaders can actually act on. This post walks through how to build a practical turnover prediction framework using Workday Prism as the data layer and Python as the modelling environment, including what the integration looks like at a technical level, which features drive predictive accuracy, and how to operationalise the output so it informs real decisions rather than sitting in a dashboard nobody checks.
Why Turnover Prediction Deserves a Proper Analytical Framework
Gut-feel approaches to attrition risk have a poor track record. Managers identify flight risks based on visible behavioural signals – quieter in meetings, fewer interactions with the team – by which point the employee has already mentally moved on. A data-driven prediction model can surface risk signals weeks or months earlier, at a point when intervention is still viable.
The business case for this investment is well established. According to the Society for Human Resource Management (SHRM), the average cost of replacing an employee ranges from 50 to 200 percent of that employee’s annual salary, depending on seniority and role complexity. For a mid-sized enterprise with 5,000 employees and a 15 percent annual turnover rate, even a modest reduction in attrition through earlier intervention produces measurable financial return.
Workday is now used by over 10,500 organisations globally, according to Workday’s own investor reporting, and a significant proportion of those organisations are sitting on years of HCM data that has never been used for predictive analytics. Prism Analytics, Workday’s native data management and analytics layer, changed the accessibility of that data considerably by enabling blending of Workday data with external sources and exposing it via structured datasets. Combined with Python’s machine learning ecosystem, the two tools together make a turnover prediction capability achievable without a data science team of ten people and a six-month build timeline.
Understanding Workday Prism as the Data Foundation
Before building any model, you need a reliable, structured data source. Workday Prism is the right layer for this because it sits inside the Workday security model, respects domain-level access controls, and allows you to blend Workday HCM data with external datasets without extracting raw data to an uncontrolled environment.
Prism works by creating datasets – structured tables built from Workday report data or uploaded external files – which are then queryable via Prism’s reporting layer or via the Prism Data API. For a turnover prediction use case, the Prism dataset you build becomes the feature store that feeds the Python model.
The article on Workday Prism data management and analytics on the Sama site covers how Prism is structured and how datasets are built, which is useful background before working through the technical setup described in this post.
Building the Core Dataset in Prism
The dataset for a turnover model needs to be built at the worker level, with one row per worker per time period (typically monthly snapshots) and a binary outcome variable indicating whether the worker left the organisation during that period.
The raw data sources in Workday that feed this dataset include:
- Worker profile data (tenure, age band, gender, job family, job level, location)
- Compensation data (current salary, time since last pay change, position in grade range)
- Performance data (last performance rating, trend of ratings over prior periods)
- Promotion and career movement data (time since last promotion, number of role changes)
- Manager data (manager tenure, span of control, team turnover rate)
- Leave and absence data (leave frequency, unplanned absence rate)
- Learning and development data (training completion rate, certifications earned)
- Recruiting source data (how the worker was originally hired)
Each of these is available within Workday and can be surfaced via Workday reports that feed into Prism datasets. The key discipline is building these as point-in-time snapshots rather than current-state records. A model trained only on current data will not capture how attributes have changed over time, which is where much of the predictive signal lives.
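As a rough illustration of the one-row-per-worker-per-period structure, the sketch below expands a worker list into monthly snapshot rows and derives the binary outcome from the termination date. It is a minimal pandas example rather than a Prism feature: the function name `build_snapshot_panel` and the column names `hire_date`, `termination_date` and `snapshot_month` are assumptions made for the example, and in a real build the snapshots would come from effective-dated Workday reports.

```python
import pandas as pd


def build_snapshot_panel(workers: pd.DataFrame, months: pd.PeriodIndex) -> pd.DataFrame:
    """Return one row per worker per month in which the worker was employed.

    `workers` needs the columns worker_id, hire_date and termination_date
    (NaT for workers who are still active). All names are illustrative.
    """
    rows = []
    for worker in workers.itertuples(index=False):
        hire_month = worker.hire_date.to_period("M")
        term_month = (
            worker.termination_date.to_period("M")
            if pd.notna(worker.termination_date)
            else None
        )
        for month in months:
            if month < hire_month:
                continue  # not yet hired in this period
            if term_month is not None and month > term_month:
                continue  # already left before this period
            tenure = (month.year - hire_month.year) * 12 + (month.month - hire_month.month)
            rows.append({
                "worker_id": worker.worker_id,
                "snapshot_month": month,
                "tenure_months": tenure,
                # 1 only in the month the worker left, 0 otherwise
                "terminated_in_period": int(term_month is not None and term_month == month),
            })
    return pd.DataFrame(rows)
```

The row-by-row loop is written for readability; for a large workforce a vectorised cross join filtered by date would be the more practical choice.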
Prism’s dataset blending capability allows you to join these Workday-sourced datasets with external data that Workday does not hold natively. Relevant external datasets for turnover modelling include local unemployment rates (a proxy for labour market conditions), industry-level pay benchmarks, and commute distance data derived from worker postcode or ZIP code. Uploading these as external datasets in Prism and joining them to the worker-level HCM dataset produces a significantly richer feature set than Workday data alone provides.
For teams who are using calculated fields within Workday to derive additional metrics before they reach Prism, the post on leveraging Workday HCM calculated fields for dynamic workforce reporting covers how to build those fields in a way that makes them consistently available across reporting and analytics outputs.
Ready to make Workday Workforce Scheduling your competitive edge in 2026?
Sama delivers senior Workday Workforce Scheduling expertise — from schedule assignments and labor rules to payroll integration and compliance automation — helping large shift-based organizations cut labor cost variance and eliminate manual reconciliations.
Extracting Data from Prism for Python Modelling
Workday Prism exposes its datasets through the Prism Data API, which is a REST API that allows programmatic access to dataset contents. The authentication flow uses OAuth 2.0, consistent with Workday’s REST API standard. Once authenticated, you can query a specific Prism dataset and retrieve its contents as a paginated JSON response, which you then parse into a Pandas DataFrame for modelling in Python.
The connection flow from Python looks like this:
```python
import requests
import pandas as pd

# Placeholders: replace with the values for your own tenant and dataset.
tenant = "your_tenant"
dataset_id = "your_dataset_id"

# Step 1: Obtain OAuth token
# The host and path are illustrative; confirm the token endpoint
# registered for your API client in your Workday tenant.
token_url = f"https://wd2-impl-services1.workday.com/ccx/oauth2/{tenant}/token"
token_payload = {
    "grant_type": "client_credentials",
    "client_id": "your_client_id",
    "client_secret": "your_client_secret",
}
token_response = requests.post(token_url, data=token_payload, timeout=30)
token_response.raise_for_status()  # fail loudly if the credentials are rejected
access_token = token_response.json()["access_token"]

# Step 2: Query the Prism dataset
prism_url = f"https://api.workday.com/prism/v1/datasets/{dataset_id}/data"
headers = {
    "Authorization": f"Bearer {access_token}",
    "Accept": "application/json",
}
response = requests.get(prism_url, headers=headers, timeout=60)
response.raise_for_status()
data = response.json()

# Step 3: Load into DataFrame
df = pd.DataFrame(data["data"])
```

In production, the token refresh logic needs to be built around this call. The client credentials grant provides a token with a defined expiry, and the data extraction process – especially for large Prism datasets – may outlast a single token’s validity if pagination is involved. Wrapping the API calls in a token refresh handler prevents silent authentication failures mid-extraction.
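One way to structure such a refresh handler is a small cache that requests a new token shortly before the old one expires. The sketch below is an assumption-laden illustration, not Workday-specific code: `TokenCache`, `fetch_token` and `margin_seconds` are hypothetical names, and it relies only on the standard OAuth 2.0 `access_token` and `expires_in` response fields.

```python
import time
from typing import Callable, Dict, Optional


class TokenCache:
    """Caches an OAuth access token and refreshes it shortly before expiry."""

    def __init__(
        self,
        fetch_token: Callable[[], Dict],
        clock: Callable[[], float] = time.monotonic,
        margin_seconds: float = 60.0,
    ) -> None:
        self._fetch_token = fetch_token  # performs the actual token request
        self._clock = clock              # injectable so the logic can be tested
        self._margin = margin_seconds    # refresh this long before expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            # expires_in is the token lifetime in seconds; fall back to a
            # short lifetime if the server omits it (an assumption)
            self._expires_at = now + float(payload.get("expires_in", 300))
        return self._token
```

In an extraction script, `fetch_token` would wrap the token request shown earlier, and each page request would call `cache.get()` when building its `Authorization` header, so a long-running extraction picks up a fresh token automatically.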
The Prism Data API returns data in pages. For a workforce dataset covering several years of monthly snapshots for a large organisation, the full dataset may span hundreds of thousands of rows across multiple pages. The extraction script needs to iterate through pages using the offset parameter until the response contains no further records.
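The offset-based iteration can be sketched as a generator that keeps requesting pages until one comes back empty. This is a minimal sketch under stated assumptions: `iter_rows` and `fetch_page` are hypothetical names, and the page-size parameter is assumed to be called `limit`, which should be checked against the API you are actually calling.

```python
from typing import Callable, Dict, Iterator, List


def iter_rows(
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 1000,
) -> Iterator[Dict]:
    """Yield every row from a paginated source.

    fetch_page(offset, limit) must return the list of rows for that page,
    or an empty list once the offset is past the end of the dataset.
    """
    offset = 0
    while True:
        rows = fetch_page(offset, page_size)
        if not rows:
            break  # no further records
        yield from rows
        # Advance by the number of rows received, so a short page
        # does not cause rows to be skipped
        offset += len(rows)
```

In practice `fetch_page` would wrap the `requests.get` call, pass the offset and page size as query parameters, and return the row list from the JSON body. The full DataFrame can then be built with `pd.DataFrame(list(iter_rows(fetch_page)))`.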
Once the data is in a Pandas DataFrame, standard feature engineering and model training can proceed using Python’s scikit-learn, XGBoost, or LightGBM libraries.
Feature Engineering for Turnover Prediction
Raw Workday data fields rarely feed directly into a model with high predictive value. Feature engineering – transforming raw fields into derived variables that better represent the underlying patterns – is where much of the modelling work sits.
The Features That Actually Matter
Research on workforce attrition published by Cornell University’s ILR School identifies tenure, time since last promotion, relative pay position, and manager quality as the strongest predictors of voluntary turnover. These align closely with what is available in Workday, but they need to be expressed as engineered features rather than raw values.
Tenure-related features. Raw tenure in months is useful, but its relationship with turnover risk is non-linear. The highest turnover risk typically clusters in the first 12 months and again around the three to five year mark. Creating tenure band indicators (0 to 12 months, 13 to 36 months, and so on) captures this non-linearity better than a continuous tenure variable.
Compensation positioning. A worker’s absolute salary is less predictive than their position within the pay range for their grade. A worker paid in the bottom quartile of their grade range is at significantly higher attrition risk than one at the midpoint, regardless of how their absolute salaries compare. Calculate the compa-ratio (actual salary divided by the grade midpoint) as a derived feature. Time since the last pay increase is equally important.
Manager stability. Voluntary turnover is strongly correlated with manager relationship quality. Since you cannot directly measure relationship quality, proxy variables derived from Workday data work well. Manager tenure in role, the manager’s own team’s trailing 12-month turnover rate, and the manager’s span of control are all derivable from Workday data and carry genuine predictive weight.
Career velocity. Time since last promotion and number of role changes in the past three years together capture whether a worker perceives their career as progressing. Workers who have not had a promotion or a meaningful role change in over three years show elevated turnover risk, particularly in mid-career.
Absence patterns. Unplanned absence frequency in the six months prior to termination is a reliable behavioural signal available in Workday’s time and absence data. An increase in unplanned absences relative to a worker’s personal baseline is more informative than the raw absence count.
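The career velocity features can be derived from a job history extract. The sketch below is illustrative only: it assumes a hypothetical `events` table with one row per job event and the columns `worker_id`, `effective_date` and `event_type` (with values `"promotion"` and `"role_change"`), none of which are actual Workday field names.

```python
import pandas as pd


def career_velocity(events: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Per-worker career velocity features as of a snapshot date.

    Workers with no events on or before `as_of` do not appear in the result;
    workers with no promotion get NaN for months_since_last_promotion.
    """
    # Use only events known at the snapshot date to keep the feature point-in-time
    past = events[events["effective_date"] <= as_of]

    last_promo = (
        past[past["event_type"] == "promotion"]
        .groupby("worker_id")["effective_date"]
        .max()
    )
    months_since = (as_of.year - last_promo.dt.year) * 12 + (
        as_of.month - last_promo.dt.month
    )

    window_start = as_of - pd.DateOffset(years=3)
    recent_changes = (
        past[(past["event_type"] == "role_change") & (past["effective_date"] > window_start)]
        .groupby("worker_id")
        .size()
    )

    out = pd.DataFrame({
        "months_since_last_promotion": months_since,
        "role_changes_36m": recent_changes,
    })
    out["role_changes_36m"] = out["role_changes_36m"].fillna(0).astype(int)
    return out
```

Filtering to events on or before the snapshot date is the important design choice here: it stops a promotion that happened after the snapshot from leaking into the features for that period.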
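```python
import numpy as np
import pandas as pd

# Tenure bands; include_lowest keeps workers with 0 months of tenure in the first band
df["tenure_band"] = pd.cut(
    df["tenure_months"],
    bins=[0, 12, 36, 60, 120, np.inf],
    labels=["0-12m", "13-36m", "37-60m", "61-120m", "120m+"],
    include_lowest=True,
)

# Compa-ratio: actual salary relative to the grade midpoint
df["compa_ratio"] = df["current_salary"] / df["grade_midpoint"]

# Manager team trailing turnover (average monthly rate over the prior 12 periods).
# Aggregate to one row per manager per period first, so the rolling window spans
# 12 monthly snapshots rather than 12 arbitrary worker rows.
# Assumes a "snapshot_month" column identifies the period of each row.
mgr_monthly = (
    df.groupby(["manager_id", "snapshot_month"])["terminated_in_period"]
    .mean()
    .rename("mgr_team_turnover_12m")
    .reset_index()
    .sort_values(["manager_id", "snapshot_month"])
)
# shift(1) excludes the current period, so the feature does not leak the outcome
mgr_monthly["mgr_team_turnover_12m"] = (
    mgr_monthly.groupby("manager_id")["mgr_team_turnover_12m"]
    .transform(lambda s: s.shift(1).rolling(12, min_periods=1).mean())
)
df = df.merge(mgr_monthly, on=["manager_id", "snapshot_month"], how="left")

# Absence rate change vs personal baseline
df["absence_rate_delta"] = df["absence_rate_6m"] - df["absence_rate_prior_6m"]
```

Handling Class Imbalance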
In most organisations, the turnover rate in any given month is between 1 and 3 percent. This means the dataset is heavily imbalanced: the vast majority of rows represent workers who did not leave. Training a model on imbalanced data without correction produces a classifier that predicts “no turnover” for almost every worker and achieves high overall accuracy while being useless for the actual task.
The standard correction approach is SMOTE (Synthetic Minority Oversampling Technique), available in Python’s imbalanced-learn library, which generates synthetic examples of the minority class (workers who left) to rebalance the training set. Alternatively, adjusting class weights in the model training step achieves a similar effect without generating synthetic rows, which keeps computational overhead lower. Use one correction or the other, not both at once.
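```python
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# SMOTE interpolates between rows, so features must be numeric with no missing
# values; encode categorical columns such as tenure_band and impute gaps first
X = df.drop(columns=["worker_id", "terminated_in_period"])
y = df["terminated_in_period"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Resample the training split only, so the test set keeps the real class ratio
smote = SMOTE(random_state=42)
X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)
```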
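For the class-weight alternative, a common starting point for XGBoost’s `scale_pos_weight` parameter is the ratio of negative to positive rows in the original, unresampled training labels. The helper below is a minimal sketch of that calculation; the function name `positive_class_weight` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def positive_class_weight(labels: Iterable[int]) -> float:
    """Ratio of stayers (0) to leavers (1) in the training labels."""
    counts = Counter(labels)
    positives, negatives = counts[1], counts[0]
    if positives == 0:
        raise ValueError("no positive (leaver) rows in the training labels")
    return negatives / positives
```

With a 2 percent monthly turnover rate this gives a weight of 49. The weight would be computed from `y_train`, not from the SMOTE-balanced labels, because combining the two corrections over-weights the minority class.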
Selecting and Training the Prediction Model
For turnover prediction in an HR context, the choice of model involves a balance between predictive performance and interpretability. HR business partners and people managers are the end consumers of the model output. A model they cannot understand well enough to trust will not influence their decisions.
Gradient Boosting as the Primary Approach
Gradient boosting models – specifically XGBoost or LightGBM – typically outperform logistic regression and random forests on tabular HR data with mixed feature types. They handle non-linear relationships well, tolerate missing values (which are common in HR datasets where not every worker has performance ratings or learning records), and produce feature importance scores that help explain predictions to non-technical stakeholders.
```python
import xgboost as xgb
from sklearn.metrics import roc_auc_score

model = xgb.XGBClassifier(
    n_estimators=300,
    max_depth=5,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    # The training set is already balanced by SMOTE, so keep the positive-class
    # weight at 1; raise it only when training on the original, imbalanced data
    scale_pos_weight=1,
    eval_metric="auc",
    # Set on the constructor, as required by xgboost 2.0 and later
    early_stopping_rounds=20,
    random_state=42,
)

# Early stopping monitors X_test here for brevity; use a separate validation
# split if the reported ROC-AUC needs to be an unbiased estimate
model.fit(
    X_train_balanced, y_train_balanced,
    eval_set=[(X_test, y_test)],
    verbose=False,
)

y_pred_proba = model.predict_proba(X_test)[:, 1]
print(f"ROC-AUC: {roc_auc_score(y_test, y_pred_proba):.3f}")
```

A well-trained XGBoost model on Workday HR data typically achieves a ROC-AUC score between 0.75 and 0.85 for 90-day turnover prediction, depending on the richness of the feature set and the quality of historical data. A ROC-AUC above 0.80 is generally sufficient to drive useful intervention prioritisation.
Model Interpretability with SHAP
SHAP (SHapley Additive exPlanations) values are the standard approach for explaining gradient boosting model outputs at both the individual worker level and the population level. For an HR use case, SHAP allows you to tell a business partner not just that Worker X has a high predicted turnover probability, but which specific factors are driving that prediction for that individual.
```python
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Summary plot - shows which features drive predictions across the population
shap.summary_plot(shap_values, X_test, plot_type="bar")

# Individual explanation for a single worker
shap.initjs()
shap.force_plot(
    explainer.expected_value,
    shap_values[0],
    X_test.iloc[0]
)
```
This produces outputs that translate directly into manager conversations: “This worker is flagged as high risk primarily because their compa-ratio is in the bottom quartile, they have had no promotion in 48 months, and their unplanned absence rate has increased in the last quarter.” That is actionable. A raw probability score is not.
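To turn a row of SHAP values into that kind of statement, the largest positive contributions can be pulled out per worker. The helper below is a minimal sketch: the function name `top_factors` and the feature names in the usage note are hypothetical, and it treats a positive SHAP value as pushing the prediction towards higher turnover risk.

```python
from typing import List, Sequence, Tuple


def top_factors(
    feature_names: Sequence[str],
    shap_row: Sequence[float],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to k features that push this worker's risk score up the most."""
    ranked = sorted(zip(feature_names, shap_row), key=lambda pair: pair[1], reverse=True)
    # Keep only features that increase the predicted risk
    return [(name, float(value)) for name, value in ranked[:k] if value > 0]
```

Calling `top_factors(list(X_test.columns), shap_values[0])` would give the leading risk drivers for the first worker in the test set, which can then be mapped to plain-language descriptions for the HR business partner.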
Operationalising the Model Output
A model that runs once in a notebook and produces an interesting result is not an analytics framework. Operationalising means running the model on a regular cadence, writing outputs back to a place where HR and managers can consume them, and building governance around how the predictions are used.
Scheduling and Automation
The Python model should run on a monthly schedule, aligned with the Prism dataset refresh cadence. The extraction script pulls the latest Prism snapshot, the feature engineering pipeline transforms it, the trained model scores each worker, and the output – a ranked list of workers by predicted turnover probability, with SHAP-based explanations – is written to a structured format that feeds the reporting layer.
For teams already using Workday’s advanced reporting infrastructure, Prism datasets can receive the model output scores as an uploaded external dataset, which then feeds Workday Prism reports accessible to HR business partners inside the Workday interface. This keeps the output within the Workday security model, ensuring that managers see only the workers within their scope and that data access follows existing Workday role-based security controls.
The post on mastering real-time KPI tracking and optimising Workday Prism Analytics covers how Prism reporting is structured for operational use, which is directly relevant to how the model output gets surfaced to end users.
Governance and Ethical Use
Turnover prediction models carry ethical obligations that are not present in most other analytics use cases. The model output is personal data. It makes inferences about individual workers’ intentions that they have not explicitly disclosed. Used well, it enables earlier, more supportive conversations. Used poorly, it becomes the basis for discriminatory decisions or self-fulfilling prophecies where flagged workers are managed out rather than supported.
Governance requirements for this kind of model should include: a defined use policy that specifies what actions the output can and cannot inform, a regular fairness audit that checks whether the model produces systematically higher risk scores for groups defined by protected characteristics such as gender, ethnicity, or age, and a process for workers to understand in general terms that predictive analytics are used in workforce planning without exposing individual scores.
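A simple starting point for the fairness audit is to compare how often each group is flagged as high risk. The sketch below is one possible check, not a complete audit: the function names are hypothetical, the 0.70 threshold is an assumption chosen for illustration, and what counts as an acceptable gap between groups is a policy decision the code does not make.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def flag_rate_by_group(
    records: Iterable[Tuple[str, float]],
    threshold: float = 0.70,
) -> Dict[str, float]:
    """Share of workers in each group whose score exceeds the threshold.

    `records` is an iterable of (group_label, predicted_probability) pairs.
    """
    totals: Dict[str, int] = defaultdict(int)
    flagged: Dict[str, int] = defaultdict(int)
    for group, score in records:
        totals[group] += 1
        if score > threshold:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


def max_rate_ratio(rates: Dict[str, float]) -> float:
    """Ratio of the highest group flag rate to the lowest (1.0 means parity)."""
    lowest, highest = min(rates.values()), max(rates.values())
    if highest == 0:
        return 1.0  # nobody flagged in any group
    return float("inf") if lowest == 0 else highest / lowest
```

A ratio well above 1.0 does not prove the model is unfair, since groups can differ in tenure or role mix, but it identifies where a closer review of the features and training data is warranted.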
The European Union’s AI Act, which came into force in 2024, classifies AI systems used to make employment-related decisions as high-risk AI systems subject to specific transparency, accuracy, and human oversight requirements. Organisations operating in the EU or processing data of EU-based workers need to ensure their turnover prediction framework complies with these obligations from the design stage, not as an afterthought.
For teams working with sensitive workforce data across multiple reporting dimensions, the article on securing sensitive data in Workday Analytics with role-based access controls is directly relevant to how access to prediction outputs should be governed inside Workday.
Retraining and Model Maintenance
A turnover model trained on data from 2022 may perform poorly in 2025 if the workforce composition, labour market conditions, or company culture have changed substantially. Schedule a quarterly model evaluation that compares predicted turnover rates against actual outcomes. If the ROC-AUC on recent data has declined by more than five percentage points from the original benchmark, retrain the model on the updated historical dataset.
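The quarterly check can be reduced to two small functions: one that computes ROC-AUC on recent outcomes and one that applies the retraining rule. This is a dependency-free sketch with hypothetical function names; the pairwise AUC calculation is written for clarity and is slow on large datasets, where scikit-learn’s `roc_auc_score` is the practical choice.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a leaver is scored above a stayer."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one leaver and one stayer")
    # Count pairs where the leaver outranks the stayer; ties count as half
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def needs_retraining(benchmark_auc: float, recent_auc: float, max_drop: float = 0.05) -> bool:
    """True when ROC-AUC has fallen by more than max_drop from the benchmark."""
    return (benchmark_auc - recent_auc) > max_drop
```

For example, a model benchmarked at 0.82 that scores 0.76 on the latest quarter has dropped by 0.06 and would be flagged for retraining.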
Feature importance can also shift over time. A period of wage inflation may make compa-ratio a stronger predictor than it was historically. A change in management structure may reduce the signal from manager-level features. Reviewing SHAP summary plots quarterly alongside model performance metrics gives early warning of when the model is becoming stale.
Building the Reporting Layer in Workday Prism
With the model running and producing monthly output, the reporting layer determines whether the predictions actually influence decisions. A well-built Prism report surfaces the right information to the right person in a format that supports action, not just observation.
The recommended structure for an HR analytics reporting layer for turnover prediction includes three levels:
An executive summary view, accessible to senior HR leadership, showing overall workforce attrition risk by business unit, function, and location, with trend lines across the past six months. This informs workforce planning conversations at the leadership level without exposing individual worker data.
An HR business partner view, scoped to the worker population the HRBP supports, showing individual risk scores ranked from highest to lowest, with the top three contributing SHAP factors for each high-risk worker. This informs the HRBP’s conversation agenda with people managers.
A manager view, accessible to line managers within their span of control, showing team-level risk indicators without individual probability scores. Individual scores carry interpretation risks when surfaced directly to managers without adequate context and training.
For teams working on building composite reports that span multiple Workday data sources alongside Prism datasets, the post on mastering advanced Workday reporting with composite reports covers how to structure multi-source reports in a way that keeps performance and maintenance manageable.
Connecting Prediction to Action
The final step of the framework is closing the loop between a risk score and a tangible HR intervention. Identifying a high-risk worker is only useful if it triggers a specific, proportionate action at the right time.
A practical intervention tiering model works as follows. Workers with a predicted 90-day turnover probability above 70 percent are flagged for immediate HRBP engagement, focused on understanding the root cause and whether a structured retention action is appropriate. Workers between 40 and 70 percent are flagged for manager awareness, with suggested conversation topics derived from the SHAP explanation for that worker. Workers below 40 percent are monitored and reassessed monthly without direct intervention.
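The tiering rule translates directly into a small function. The sketch below uses the thresholds described above; the function name and tier labels are hypothetical, and a score of exactly 70 percent is assigned to the middle tier because the top tier is defined as above 70 percent.

```python
def intervention_tier(probability: float) -> str:
    """Map a predicted 90-day turnover probability to an intervention tier."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability > 0.70:
        return "hrbp_engagement"    # immediate HRBP engagement
    if probability >= 0.40:
        return "manager_awareness"  # manager conversation with SHAP-based topics
    return "monitor"                # reassess monthly, no direct intervention
```

Keeping the thresholds in one function makes it easier to adjust them later, for example if HRBP capacity limits how many workers can realistically be engaged each month.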
The intervention itself should be informed by the SHAP explanation, not by the score alone. A worker flagged primarily due to compensation positioning calls for a different conversation than one flagged primarily due to manager-level factors. Matching the intervention type to the predicted cause is what separates a prediction framework that reduces attrition from one that produces a ranked list of names that nobody acts on.
For HR teams that want to understand how Workday’s broader analytics capabilities support the kind of data-driven HR decision-making this framework enables, the article on transforming HR data into strategic intelligence through Workday Discover covers how Workday’s analytics tools connect at the platform level.
Conclusion
Building a turnover prediction framework on Workday Prism and Python is a technically achievable project for any organisation with a reasonably mature Workday implementation and access to Python skills, either internally or through a specialist. The data exists. Prism makes it accessible without requiring raw data exports to uncontrolled environments. Python provides the modelling and interpretability tools to turn that data into predictions that HR teams can act on.
The differentiator between a framework that works and one that does not is operational discipline: building the feature engineering pipeline to produce point-in-time snapshots rather than current-state records, handling class imbalance correctly, using SHAP to make individual predictions explainable, building governance around how the output is used, and committing to quarterly model evaluation so the predictions stay accurate as the workforce evolves.
If your organisation wants to build this capability on your existing Workday environment, or if you are dealing with Prism performance or data quality issues that are making analytics work difficult, Sama’s senior Workday consultants work directly in live environments to solve exactly these kinds of challenges. Reach out to discuss what your current data landscape looks like and where the framework should start.