Job Sequencing: When Greedy Fails and DSU Saves the Day

This One Looked Like A Classic Greedy Setup
you get a list of jobs each job has a deadline and a profit you can only do one job per time unit finish the job before or on its deadline to get the profit goal do as many high-profit jobs as you can but within their deadlines
cool
so you sort the jobs by profit descending try to place each one in the latest free slot before its deadline done right
that's exactly what i did
for job in sorted_jobs_desc:
slot = deadline
while slot > 0:
if schedule[slot] is free:
assign it
break
slot -= 1
this worked for a few hundred jobs maybe a few thousand
then i tested it with 10^5 jobs
boom tle
Why The Loop Is A Bottleneck
the while loop scanning for free slots is the killer
imagine a job with a deadline of 99999 and every earlier slot is filled you just did 99999 iterations for that one job and you might have to do this for 50k other jobs too you're looking at a worst-case of o(n^2)
that's when i remembered something i hadn't touched in a while disjoint set union dsu union find the thing you use in kruskal's algorithm or to track connected components the idea isn't new i saw a version of this optimization in some older cp forums dsu is old but it's fast
How Dsu Entered The Picture
so yeah the greedy logic was obvious but the tle wasn't
- at first i blamed the input size maybe the test case was bogus
- then i saw it: the
while free_spot > 0
loop was the real bottleneck - that's linear scan per job - if deadlines are high, that loop turns into death by a thousand cuts
i hit the usual tricks:
- tried limiting deadline to
min(d n)
- minor gain - preallocated array to avoid dynamic resizing - nothing changed
- tried reverse filling instead of forward - same issue
nothing moved the needle
then i had this random mental flashback to kruskal's algorithm
i used to compete in college level contests where dsu was our go to when you needed to track connected components with minimal cost
it's what lets you merge groups and find roots fast
wait that's exactly what this job thing is
instead of looking for the nearest available free slot manually what if each slot points to the next free one
that's what find()
in dsu is for
assign a job to slot 4
cool now slot 4 is used point it to slot 3
next time you call find(4)
it returns 3
call find(3)
if that's used it returns 2
path compression kicks in and flattens everything
you now get the next available free slot in near constant time
Final Code With Dsu
class Solution:
def jobSequencing(self, deadline, profit):
n = len(deadline)
jobs = [ [i, deadline[i], profit[i]] for i in range(n) ]
jobs.sort(key=lambda x: -x[2])
max_deadline = max(deadline)
parent = [i for i in range(max_deadline + 2)]
def find(slot):
if parent[slot] == slot:
return slot
parent[slot] = find(parent[slot])
return parent[slot]
total_jobs = total_profit = 0
for jobid, d, p in jobs:
available = find(d)
if available > 0:
parent[available] = find(available - 1)
total_jobs += 1
total_profit += p
return [total_jobs, total_profit]
The Numbers
- old brute loop ~o(n * max(deadline))
- new dsu o(n * α(n)) and α(n) is practically constant
- passed every large test case instant jump from tle to ac
Final Takeaway
if a greedy algorithm starts scanning linearly too much especially in scheduling like problems dsu might be what you're looking for
nothing fancy here - just leveraging the power of efficient set operations
and some muscle memory from kruskal