
Re: Request for additional work space for Hall D



All,

It was probably me who first raised the question of re-generating on the 
fly, and what prompted my question was the very high cost of storing all 
the simulated data compared to the fairly modest cost of generating it.  
Depending upon how many times each simulated event is used, 
re-generation makes good sense.  Otherwise the cost of tape and the tape 
library dwarfs all other costs for GlueX.  But of course it depends upon 
evolving costs of computing and that magic number of how many times an 
event is re-used.
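
A minimal sketch of that break-even, assuming a very simple per-event cost model. The function names and the numbers in the usage note are hypothetical and are not taken from the GlueX spreadsheet. Storing means paying to generate an event once and then to keep it; regenerating means paying the generation cost on every use.

```python
def break_even_uses(store_cost: float, generate_cost: float) -> float:
    """Number of uses at which storing and regenerating cost the same.

    store_cost    -- hypothetical cost of keeping one simulated event (tape, library)
    generate_cost -- hypothetical cost of generating one simulated event (CPU)
    Both must be in the same units.
    """
    if generate_cost <= 0:
        raise ValueError("generate_cost must be positive")
    # Storing:      generate_cost + store_cost  (generate once, keep it)
    # Regenerating: n_uses * generate_cost      (generate on every use)
    # Equal when n_uses = 1 + store_cost / generate_cost.
    return 1.0 + store_cost / generate_cost


def regenerate_is_cheaper(n_uses: int, store_cost: float, generate_cost: float) -> bool:
    """True if regenerating on every use costs strictly less than storing."""
    return n_uses * generate_cost < generate_cost + store_cost
```

For example, if storing an event cost three times as much as generating it, `break_even_uses(3.0, 1.0)` gives 4.0, so under this model regeneration would win only for events used fewer than four times.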

Chip

David Lawrence wrote:

>
> Hi Elke,
>
>    Thanks for the response. I'll just throw in that in the discussions 
> at the time, we used a factor of 3 rather than 10 since the 10 came 
> from earlier, lower statistics experiments which had statistically 
> driven error bars. Clearly you're right that something will need to be 
> saved, but the question was whether saving the raw data was faster 
> and/or cheaper than regenerating and reconstructing on the farm. The 
> logic was the following: Reconstruction of GlueX data takes longer 
> than the simulation. Saving the simulated data to disk would be for 
> the purpose of re-reconstructing it, assuming whatever problem 
> motivated this was limited to the reconstruction and not the 
> simulation itself. Saving some kind of simulation DSTs was thought to 
> be the way to go, but they would have a tiny footprint on the disk by 
> comparison. RHIC is probably a whole other beast, especially Au Au 
> scattering where the number of tracks per event is orders of magnitude 
> greater than GlueX. Anyway, I'm not really disagreeing (even though it 
> may sound like it!) I'm just trying to convey the earlier reasoning. 
> I'll let others revisit the plan and revise it for the future as they 
> see fit.
>
> Regards,
> -David
>
> elke-caroline aschenauer wrote:
>
>> On Tue, 26 May 2009, David Lawrence wrote:
>>
>> Dear David et al.,
>>
>> okay, I cannot keep myself from replying. For HERMES we need between 
>> 0.5 and 2.5 s for a fully reconstructed event in Geant-3; at RHIC, an 
>> Au on Au event takes between 1 and 2 min of CPU time.
>> So I'm not sure what you mean by saying it costs nothing to produce MC 
>> events. It costs time, and the rule of thumb is at least 10 times the 
>> MC statistics compared to the data. So with the data statistics GlueX 
>> is expecting, you will need to generate MC continuously to reach 10 
>> times the data statistics. These uDSTs need to be stored somewhere, so 
>> you will need disk space or tape space.
>>
>> I would revise the xls sheets to include this in your estimate.
>> I know the only valid comparison is CLAS; I have no idea what they 
>> do, but from my limited knowledge of CLAS I know they are normally 
>> not heavy on MC studies.
>>
>> Cheers elke
>>
>>
>>   Date: Tue, 26 May 2009 15:34:42 -0400
>>   From: David Lawrence <davidl@jlab.org>
>>   To: Mark M. Ito <marki@jlab.org>
>>   Cc: HallD Software Group <halld-offline@jlab.org>
>>   Subject: Re: Request for additional work space for Hall D
>>       Hi Mark,
>>
>>   This may be a little late, but here is the latest spreadsheet (and its
>>   explanation) used to compute the predicted GlueX disk space. The main
>>   thing you'll notice is that there is virtually no disk space allocated
>>   for simulation. This was mainly due to a calculation that it costs more
>>   to store the simulated data than to reproduce it. As such, we wanted the
>>   IT division to focus their budget on more CPU power for the farm as
>>   opposed to work disk space. This is not to say that philosophy should
>>   still be followed, just that that was the motivation in the past. You
>>   are likely to get questions from Sandy and Chip referring back to the
>>   spreadsheet.
>>
>>   Regards,
>>   -David
>>
>>   Mark M. Ito wrote:
>>   > In recent months we have run out of space on our work disk from time
>>   > to time. Our activity generating and reconstructing simulated data is
>>   > ramping up. We discussed the situation in the Hall D Offline Meeting
>>   > and we would like to request more space. Our current allocation is
>>   > 2.7 TB. We request that this be increased to 5 TB as soon as possible
>>   > and that the total be increased to about 10 TB over the next year.
>>   >
>>   > Once in operation, the GlueX detector will generate a large volume of
>>   > data. To be ready to analyze real data once it comes in, we will have
>>   > to generate large amounts of simulated data to develop and test all of
>>   > the necessary tools well in advance of the arrival of real data. As we
>>   > approach operations we will likely need an amount of space equal to or
>>   > exceeding the amount used by Hall B (about 35 TB). Recall that raw data
>>   > is generally not stored on the work disks in any case; work space is
>>   > used for intermediate files needed to analyze raw or simulated data.
>>   > The fact that GlueX has not taken real data yet does not imply our
>>   > current disk needs are insignificant. We note that this request is
>>   > quite modest when compared to the total amount of work disk space
>>   > deployed at present.
>>   >
>>   --
>> ------------------------------------------------------------------------
>>   David Lawrence Ph.D.
>>   Staff Scientist                 Office: (757)269-5567   [[[  [   [ [
>>   Jefferson Lab                   Pager:  (757)584-5567   [  [ [ [ [ [
>>   http://www.jlab.org/~davidl     davidl@jlab.org         [[[  [[ [[ [[[
>>   
>> ------------------------------------------------------------------------
>>      ( `,_' )+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=
>>  )    `\                                                     -
>> /    '. |                                                     +
>> |       `,              Elke-Caroline Aschenauer               =
>>  \,_  `-/                                                       -
>>  ,&&&&&V         Brookhaven National Lab                         +
>> ,&&&&&&&&:       Physics Dept.,            8 Shore Road           =
>> ,&&&&&&&&&&;      Bldg. 510D / 2-195        East Patchogue, NY,     -
>> |  |&&&&&&&;\     20 Pennsylvania Avenue                 11772-5963  +
>> |  |       :_) _  Upton, NY 11973                                     =
>> |  |       ;--' | Tel.:  001-631-344-4769   Tel.:  001-631-569-4290    -
>> '--'   `-.--.   | Fax.:  001-631-344-1334   Cell:  001-757-256-5224     +
>>   \_    |  |---'                                                        =
>>     `-._\__/     Mail: elke@jlab.org                                     -
>>            =-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=-+=
>
>