Lessons Learned

Daily drops of knowledge from the Vinta team

Postgres, SQL

If you have 4 parallel workers and need to process all rows of a SQL table, it's probably better to split the work with LIMIT/OFFSET into exactly 4 parts and consume them with 4 tasks than to choose a smaller LIMIT and generate more than 4 tasks. OFFSET can get really slow, because the server still has to read every skipped row before discarding it. Consecutive queries such as OFFSET 1000, OFFSET 1100, OFFSET 1200, etc. don't get faster by reusing earlier work: each one skips its rows from the beginning again. In Python, the LIMIT and the list of OFFSETs can be computed like this:

    import math

    limit = math.ceil(n_rows / n_workers)
    offsets = range(0, n_rows, limit)
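Here is a minimal sketch of the whole plan. It assumes a hypothetical table `items` with a unique `id` column to order by (neither name comes from the tip), and `chunk_bounds()`, `chunk_query()` and `rows_read()` are illustrative helpers, not part of any library. The ORDER BY is an addition: PostgreSQL doesn't guarantee a row order without one, so chunks taken with LIMIT/OFFSET could overlap or miss rows. `rows_read()` is a rough cost model that counts each skipped row as one row read. That is enough to compare 4 big chunks against many small ones.

```python
import math
from typing import List, Tuple


def chunk_bounds(n_rows: int, n_workers: int) -> List[Tuple[int, int]]:
    """Split n_rows into at most n_workers (limit, offset) chunks."""
    if n_rows == 0:
        return []  # nothing to process; also avoids range() with step 0
    limit = math.ceil(n_rows / n_workers)
    return [(limit, offset) for offset in range(0, n_rows, limit)]


def chunk_query(table: str, key: str, limit: int, offset: int) -> str:
    # Hypothetical table/key names. ORDER BY a unique key so every query sees
    # the same row order; without it, chunks may overlap or skip rows.
    # Only interpolate trusted identifiers here, never user input.
    return f"SELECT * FROM {table} ORDER BY {key} LIMIT {limit} OFFSET {offset}"


def rows_read(n_rows: int, limit: int) -> int:
    """Rough cost model: each query reads the rows it skips (its OFFSET)
    plus the rows it returns."""
    return sum(min(offset + limit, n_rows) for offset in range(0, n_rows, limit))


if __name__ == "__main__":
    n_rows, n_workers = 1_000_000, 4
    for limit, offset in chunk_bounds(n_rows, n_workers):
        # Each of these queries would be consumed by one worker task.
        print(chunk_query("items", "id", limit, offset))

    print(rows_read(n_rows, 250_000))  # 4 chunks -> 2500000
    print(rows_read(n_rows, 100))      # 10,000 chunks -> 5000500000
```

With 1,000,000 rows, 4 chunks make the server read about 2.5 million rows in total. Chunks of 100 rows push that to about 5 billion under the same model, which is the slowdown the tip warns about.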

About Flávio Juvenal

Controversial software developer who questions everything: "Are we really going forward?". Python enthusiast, but is afraid JavaScript will conquer the world. Enjoys working with Django and now wants to write system checks for everything on it.