Lessons Learned


If you have 4 parallel workers and need to process all rows of a SQL table, it's probably better to split the work with LIMIT/OFFSET into 4 parts and consume them with 4 tasks than to choose a smaller LIMIT and generate more than 4 tasks. OFFSET can get really slow, since the database generally still has to walk past every skipped row, and it doesn't build up speed across consecutive queries like OFFSET 1000, OFFSET 1100, OFFSET 1200, etc.: each one starts from scratch. In Python: limit = math.ceil(n_rows / n_workers); offsets = range(0, n_rows, limit)
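A slightly fuller sketch of the same split, in case it helps. The function names, the table name my_table, and the key column id are made up for illustration; the ORDER BY is my own addition, on the assumption that you want the chunks to be stable and non-overlapping between queries.

```python
import math


def split_offsets(n_rows, n_workers):
    """Return (limit, offset) pairs that together cover all n_rows.

    Produces at most n_workers chunks; the last chunk may be shorter.
    """
    if n_rows <= 0:
        # Nothing to process; also avoids range() with a zero step.
        return []
    limit = math.ceil(n_rows / n_workers)
    return [(limit, offset) for offset in range(0, n_rows, limit)]


def build_queries(n_rows, n_workers, table="my_table", key="id"):
    # table and key are hypothetical names, not taken from any real schema.
    return [
        f"SELECT * FROM {table} ORDER BY {key} LIMIT {limit} OFFSET {offset}"
        for limit, offset in split_offsets(n_rows, n_workers)
    ]


if __name__ == "__main__":
    for query in build_queries(n_rows=1000, n_workers=4):
        print(query)
```

Each worker then runs exactly one of these queries, so the cost of a large OFFSET is paid once per worker rather than once per small chunk.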
