Salesforce Data Scientist Interview Questions

31 practice questions for Salesforce Data Scientist interviews

Salesforce data scientist interviews test statistical reasoning, ML model design, SQL proficiency, A/B testing methodology, and Python-based algorithm implementation.

ml design Hard ml pipeline #1

1. [OA] Feature Aggregator — Design a feature aggregator for Salesforce's machine learning models

Salesforce is building a machine learning platform and needs an efficient feature aggregation system to process user interaction events as they arrive. This involves creating a class that ingests events and aggregates selected features over a time window.
Problem Statement: Design a class FeatureAggregator that aggregates features from incoming streams of events over a specified time window.
- Method Signature: def add_event(self, event: Dict[str, Any]) -> None — Accepts an event with features.
- Method Signature: def get_aggregated_features(self, features: List[str]) -> Dict[str, float] — Returns the average of specified features over the time period.
Example 1:
Input:
- aggregator = FeatureAggregator(time_window=60)
- aggregator.add_event({'a': 10, 'b': 20, 'timestamp': 1})
Output: aggregator.get_aggregated_features(['a', 'b']) -> {'a': 10.0, 'b': 20.0}
Example 2:
Input:
- aggregator = FeatureAggregator(time_window=60)
- aggregator.add_event({'a': 30, 'b': 40, 'timestamp': 1})
- aggregator.add_event({'a': 10, 'b': 20, 'timestamp': 30})
Output: aggregator.get_aggregated_features(['a', 'b']) -> {'a': 20.0, 'b': 30.0}
Constraints:
- Each event contains a timestamp and at least the features to be aggregated.
- Events are timestamped in seconds.
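
One way to approach this is a deque of recent events that is trimmed on every insert. The sketch below is a minimal version under several assumptions that the problem statement does not confirm: events arrive in non-decreasing timestamp order, the window covers events whose timestamp is greater than the latest timestamp minus time_window, and an empty window yields 0.0 for every requested feature.

```python
from collections import deque
from typing import Any, Deque, Dict, List


class FeatureAggregator:
    def __init__(self, time_window: int) -> None:
        self.time_window = time_window
        self._events: Deque[Dict[str, Any]] = deque()

    def add_event(self, event: Dict[str, Any]) -> None:
        # Assumption: events arrive in non-decreasing timestamp order,
        # so expired events are always at the left end of the deque.
        self._events.append(event)
        cutoff = event["timestamp"] - self.time_window
        while self._events and self._events[0]["timestamp"] <= cutoff:
            self._events.popleft()

    def get_aggregated_features(self, features: List[str]) -> Dict[str, float]:
        # Assumption: with no events in the window, report 0.0 per feature.
        if not self._events:
            return {name: 0.0 for name in features}
        count = len(self._events)
        return {
            name: sum(e[name] for e in self._events) / count
            for name in features
        }
```

Each query here costs time proportional to the number of events in the window. Keeping a running sum per feature would make queries cheaper, at the cost of fixing the feature set in advance.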

ml design Hard heap #2

2. [OA] Priority Queue — Implement Salesforce's lead prioritization engine

Salesforce is enhancing its lead management system to prioritize leads based on scoring metrics. A priority queue is essential for this feature to determine which leads to contact first.
Problem Statement: Design a class LeadPrioritizer that utilizes a priority queue to manage leads. The class should support adding leads, providing the highest priority lead, and removing that lead from the queue.
- Method Signature: def add_lead(self, lead: Tuple[str, int]) -> None — Adds a new lead with a lead_id and priority_score.
- Method Signature: def get_highest_priority(self) -> str — Returns the lead_id with the highest priority.
- Method Signature: def remove_highest_priority(self) -> None — Removes the lead with the highest priority from the queue.
Example 1:
Input: lead_prioritizer = LeadPrioritizer()
Sequence: add_lead(('lead1', 5))
Output: get_highest_priority() -> 'lead1' (the only lead in the queue, with score 5)
Example 2:
Input: lead_prioritizer = LeadPrioritizer()
Sequence: add_lead(('lead2', 10)), add_lead(('lead3', 3)), remove_highest_priority()
Output: get_highest_priority() -> 'lead3' (lead2, which had the highest score of 10, was removed)
Constraints:
- The number of leads will not exceed 10^5.
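
A binary heap is a natural fit for this interface. The sketch below uses Python's heapq module and assumes that a larger priority_score means higher priority, which the statement does not say explicitly. Because heapq is a min-heap, scores are stored negated. The insertion counter used to break ties and the IndexError raised on an empty queue are illustrative choices, not part of the original problem. Under that assumption, removing the top lead after adding ('lead2', 10) and ('lead3', 3) leaves 'lead3' at the front.

```python
import heapq
from typing import List, Tuple


class LeadPrioritizer:
    def __init__(self) -> None:
        # Heap entries are (-priority_score, insertion_order, lead_id).
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = 0

    def add_lead(self, lead: Tuple[str, int]) -> None:
        lead_id, priority_score = lead
        # Negate the score so the largest score sits at the heap root.
        heapq.heappush(self._heap, (-priority_score, self._counter, lead_id))
        self._counter += 1

    def get_highest_priority(self) -> str:
        if not self._heap:
            raise IndexError("no leads in the queue")
        return self._heap[0][2]

    def remove_highest_priority(self) -> None:
        if not self._heap:
            raise IndexError("no leads in the queue")
        heapq.heappop(self._heap)
```

Adding and removing a lead each take O(log n) time and peeking takes O(1), which is comfortable for the stated bound of 10^5 leads.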

coding Hard database #3

3. [OA] SQL Window Function — Analyze recurring revenue retention at Salesforce

Salesforce aims to measure the retention rate of its recurring revenue streams effectively. To analyze client financial behavior, we need to use SQL Window Functions to calculate month-over-month retention rates based on client payments.
Problem Statement: Given a table payments with the columns client_id, payment_date, and amount, write a query to compute a retention rate for each client on a month-over-month basis. Consider retention as clients who paid in the current month compared to previous months and return the client_id, month, and retention_rate (as a percentage).
Example 1:
Input:
client_id | payment_date | amount
----------|--------------|-------
1         | 2021-01-10   | 100
1         | 2021-02-15   | 150
2         | 2021-01-20   | 200
2         | 2021-03-10   | 100
1         | 2021-03-12   | 200

Output:
client_id | month   | retention_rate
----------|---------|---------------
1         | 2021-01 | NULL
1         | 2021-02 | 100.00
1         | 2021-03 | 50.00
2         | 2021-01 | NULL
2         | 2021-02 | NULL
2         | 2021-03 | 100.00

Constraints:
- There will be valid payment entries for at least one month.
- payment_date will be strictly in the format YYYY-MM-DD.
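
The window-function part of this task can be illustrated with LAG. The sketch below runs a query through Python's sqlite3 module (window functions require SQLite 3.25 or newer) and, for each month in which a client paid, flags whether that client also paid in the immediately preceding calendar month. This is only one possible reading of "retention": the statement does not define the rate precisely, so the sketch does not try to reproduce the retention_rate values in the expected output. The function name monthly_retention_flags and the retained column are hypothetical.

```python
import sqlite3

QUERY = """
WITH monthly AS (
    -- One row per client per month with at least one payment.
    SELECT DISTINCT client_id, strftime('%Y-%m', payment_date) AS month
    FROM payments
),
with_prev AS (
    SELECT client_id,
           month,
           LAG(month) OVER (
               PARTITION BY client_id ORDER BY month
           ) AS prev_paid_month
    FROM monthly
)
SELECT client_id,
       month,
       CASE
           WHEN prev_paid_month IS NULL THEN NULL
           WHEN prev_paid_month =
                strftime('%Y-%m', date(month || '-01', '-1 month')) THEN 1
           ELSE 0
       END AS retained
FROM with_prev
ORDER BY client_id, month
"""


def monthly_retention_flags(rows):
    """Load (client_id, payment_date, amount) rows and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments "
        "(client_id INTEGER, payment_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Turning these flags into a percentage requires choosing a denominator, for example the number of prior months or the number of clients active in the previous month. That choice is worth confirming before writing the final query.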

coding Hard sliding window #4

4. [OA] Sliding Window — Build a customer referral tracking system for Salesforce

Using Salesforce's platform, there is a need to analyze customer referral activities over a sliding time window to better understand engagement trends. This tracker will help identify top referrers and tailor rewards accordingly.
Problem Statement: Design a function that takes a list of referrals, where each entry is a tuple containing customer_id, timestamp, and reward_points. The function should return a mapping of customer_id to total reward_points earned within a specified time_window (in seconds).
- Method Signature: def track_referrals(referrals: List[Tuple[str, int, int]], time_window: int) -> Dict[str, int]: — The method returns a dictionary mapping customers to their total reward points within the time window.
Example 1:
Input: referrals = [('A', 1, 10), ('B', 2, 20), ('A', 3, 10), ('A', 4, 20)], time_window = 3
Output: {'A': 20, 'B': 20}
Explanation: Only customer 'A''s referrals at timestamps 1 and 3 are counted (10 + 10 = 20); the referral at timestamp 4 falls outside the window.
Example 2:
Input: referrals = [('A', 1, 10), ('B', 2, 20), ('A', 5, 30), ('C', 6, 40)], time_window = 5
Output: {'A': 10, 'B': 20, 'C': 40}
Constraints:
- 1 <= len(referrals) <= 10^4
- 0 <= timestamp <= 10^9
- 1 <= reward_points <= 100
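
The statement leaves the window's anchor point open. The sketch below assumes each customer's window starts at that customer's earliest referral and covers timestamps in [first, first + time_window). This is the reading that matches Example 1, although it is a fixed per-customer window rather than a window that slides with the latest event. Under this reading Example 2's listed value for 'A' is not reproduced (the sketch yields 40, because the referral at timestamp 5 is inside the window), so the intended semantics are worth confirming.

```python
from typing import Dict, List, Tuple


def track_referrals(
    referrals: List[Tuple[str, int, int]], time_window: int
) -> Dict[str, int]:
    first_seen: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    # Sort by timestamp so the first referral seen per customer is the earliest.
    for customer_id, timestamp, reward_points in sorted(
        referrals, key=lambda r: r[1]
    ):
        # Assumption: the window is anchored at the customer's first referral.
        start = first_seen.setdefault(customer_id, timestamp)
        if timestamp < start + time_window:
            totals[customer_id] = totals.get(customer_id, 0) + reward_points
    return totals
```

Sorting dominates the cost at O(n log n), which is well within the bound of 10^4 referrals.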
