Question for Researchers about waiting for results

Timo · Joined: 9 Jan 12 · Posts: 185 · Credit: 45,661,974 · RAC: 0
Message 81503 - Posted: 20 Apr 2017, 16:02:21 UTC

Hi David and team.

Some quick background: yesterday I spent almost all of my work day waiting. I write 'big data' blending jobs in Hive and Pig (and am starting to learn Spark), but the transformations I'm working on involve many BILLIONS of records, so even with my company's 160+ node Hadoop cluster, some steps of my transformation take a couple of hours to crunch. This waiting time really slows down my ability to iterate and test some aspects of my logic. Where possible I try to find a subset of data that can serve as a test case, but there are some use cases where this strategy cannot be applied.

So, my question for you is: with the days- or weeks-long turnaround times of Rosetta jobs, how the heck do you manage to iterate in your experiments efficiently? And, perhaps more importantly, how do you ensure that you don't spend a whole two weeks waiting for a run to complete only to find out that there was a typo in the input sequences somewhere?

Secondly, what do you do while waiting for jobs to finish?

David E K · Volunteer moderator, Project administrator, Project developer, Project scientist · Joined: 1 Jul 05 · Posts: 1480 · Credit: 4,334,829 · RAC: 0
Message 81504 - Posted: 20 Apr 2017, 18:57:16 UTC - in response to Message 81503.

"how the heck do you manage to iterate in your experiments efficiently?"

We typically submit large batches of jobs per iteration. When we are satisfied with the results, we cancel the jobs that are still queued; jobs that are already on clients will continue to run. Having short turnaround times and machines that are continually crunching and networked would obviously make this more efficient.

"how do you ensure that you don't spend a whole two weeks waiting for a run to complete only to find out that there was a typo in the input sequences somewhere?"

We try to be careful :) and we almost never have to manually type sequences.

"what do you do while waiting for jobs to finish?"

There's always stuff to do. Depending on the researcher, one can prepare more jobs, analyze data, develop new methods, write, refactor, and debug code, think of and do other experiments (computational and/or wet lab), write papers, go to meetings, respond to forum posts, etc.
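
For readers wondering what "being careful" about input sequences could look like in practice, here is a minimal sketch of a pre-submission check. It is not the project's actual tooling and is not part of Rosetta or BOINC: the function names `find_invalid_residues` and `validate_batch` are hypothetical, and it assumes inputs are protein sequences written in the 20 standard one-letter amino acid codes.

```python
# Hypothetical pre-submission sanity check; not part of Rosetta or BOINC.
# Assumption: sequences use only the 20 standard one-letter amino acid codes.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_residues(sequence):
    """Return (1-based position, character) pairs for non-standard characters."""
    cleaned = sequence.strip().upper()
    return [
        (position, char)
        for position, char in enumerate(cleaned, start=1)
        if char not in STANDARD_AMINO_ACIDS
    ]


def validate_batch(sequences):
    """Map each problematic sequence name to its list of problems.

    An empty result means every sequence in the batch passed the check.
    """
    problems = {}
    for name, sequence in sequences.items():
        if not sequence.strip():
            # Flag empty inputs explicitly instead of silently accepting them.
            problems[name] = [(0, "<empty>")]
            continue
        invalid = find_invalid_residues(sequence)
        if invalid:
            problems[name] = invalid
    return problems


if __name__ == "__main__":
    batch = {"design_a": "MKTAYIAKQR", "design_b": "MK1TAY", "design_c": ""}
    for name, issues in validate_batch(batch).items():
        print(f"{name}: {issues}")
```

A check like this only catches characters that cannot be residues at all. It would not catch a valid but wrong residue, which is one reason generating sequences programmatically rather than typing them by hand is the stronger safeguard.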