
Re: Request for additional work space for Hall D



Mark et al,

Granted, disk space is necessary for all these activities.  What I have 
seen from HEP experiments that I have reviewed is a rather complete 
model of what they intend to do, quantifying the largest terms in their 
computing model.  The point of the 12 GeV Computing Plan is to move 
Halls B and D in that direction, and I would say, "so far, so good".

I actually don't think that 35 TB in 5 years is much to care about, as I 
expect the lab to be running more than an order of magnitude above that, 
and 35 TB will be easily held in a 2U server.  I'm more concerned about 
the rigor in the process, and the total cost of the computing model's 
implementation.

In the short run (FY2009 and FY2010), make sure that you have made your 
case to the Physics division, which pays for this disk space.  We will 
continue to plan capacity increases based upon observed trends AND 
robust planning documents.  5 TB this fiscal year for Hall D is probably 
not possible without additional funds from the Physics division, but 
we'll see how the end of year looks in August.

Chip

Mark M. Ito wrote:

> Chip,
>
> The focus of this work disk request is the set of files that need to 
> exist on disk, all at one time, to do an analysis, multiplied by the 
> number of such analyses that are underway. This is quite different 
> from the total volume of raw events (either real or simulated). It may 
> involve highly compressed data, such as ntuples or ROOT trees, or it 
> may involve staging reconstructed data on its way to compression, or 
> it may involve keeping a set of raw data on hand for development of 
> reconstruction algorithms. In our case it may also involve studies of 
> how to generate our raw data (via simulation), which is not a 
> completely solved problem yet (simulating detector effects properly in 
> particular). In each of these cases the raw data itself may or may not 
> be stored somewhere else. Suffice it to say there are a lot of use 
> cases for disk, and there will be a lot of instances of each case. A 
> lot of the growth we are seeing lately comes from the "number of 
> analyses" factor ramping up, i.e., more people doing stuff (which had 
> better be the case by now).
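>
> To make that bookkeeping concrete, here is a toy sizing sketch (the
> per-use-case volumes and the analysis count below are invented
> placeholders, not measurements; only the structure of the estimate
> matters):
>
>     # Rough work-disk sizing: space that must coexist on disk,
>     # summed over use cases, times the number of analyses underway.
>     # All numbers are illustrative placeholders.
>     use_case_tb = {
>         "compressed ntuples / ROOT trees": 0.3,   # TB per analysis
>         "staged reconstructed data":       0.5,
>         "raw-data sample for recon dev":   0.8,
>         "simulation / detector studies":   0.4,
>     }
>     n_analyses = 6   # the factor that has been ramping up
>     total_tb = n_analyses * sum(use_case_tb.values())
>     print(f"rough work-disk need: {total_tb:.1f} TB "
>           f"for {n_analyses} analyses")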
>
> The other angle, mentioned in the CCPR, is that the work disk used by 
> CLAS is of this same ilk. Note that the 35 TB they use for work is 
> tiny compared to what would accumulate at an instantaneous rate of 
> nearly a petabyte per day when CLAS is running flat out.
>
> Finally, predicting the "right" amount of disk is ridiculously 
> difficult. What fraction of the raw data volume it should be, the 
> number of instances of each use case, the number of use cases 
> (including those yet to be invented), and the effort required to make 
> disk use more efficient on a use-case-by-use-case basis are all 
> factors, and all are hard to estimate. That's why we have always 
> relied on historical use data and planned disk expansion incrementally.
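>
> As a sketch of what "planning from historical use" looks like in
> practice (the monthly usage numbers below are made up for
> illustration), one can fit a growth trend to recent usage and project
> forward:
>
>     # Fit an exponential growth trend to recent monthly work-disk
>     # usage and project ahead.  Usage numbers are illustrative only.
>     import math
>
>     usage_tb = [1.2, 1.4, 1.7, 2.0, 2.3, 2.7]   # last six months, TB
>     n = len(usage_tb)
>     ys = [math.log(u) for u in usage_tb]
>     xbar = (n - 1) / 2.0
>     ybar = sum(ys) / n
>     slope = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys)) \
>             / sum((x - xbar) ** 2 for x in range(n))
>     for months in (3, 6, 12):
>         print(f"{months:2d} months out: "
>               f"~{usage_tb[-1] * math.exp(slope * months):.1f} TB")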
>
>  -- Mark
>
> Chip Watson wrote:
>
>> All,
>>
>> It was probably me who first raised the question of re-generating on 
>> the fly, and what prompted my question was the very high cost of 
>> storing all the simulated data compared to the fairly modest cost of 
>> generating it.  Depending upon how many times each simulated event is 
>> used, re-generation can make good sense; otherwise the cost of tape 
>> and the tape library dwarfs all other costs for GlueX.  But of course 
>> it depends upon the evolving costs of computing and that magic number 
>> of how many times an event is re-used.
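>>
>> A back-of-envelope version of that trade-off (all the costs below are
>> invented placeholders; only the ratio matters) looks something like
>> this:
>>
>>     # Toy break-even for "store simulated events on tape" versus
>>     # "regenerate on the fly".  Numbers are illustrative placeholders.
>>     cpu_sec_per_event   = 1.0      # CPU time to simulate one event
>>     dollars_per_cpu_sec = 2e-6     # amortized farm cost per CPU-second
>>     bytes_per_event     = 1.0e6    # stored size of one simulated event
>>     dollars_per_byte    = 40e-12   # tape + library cost (~$40/TB)
>>
>>     cost_generate = cpu_sec_per_event * dollars_per_cpu_sec
>>     cost_store    = bytes_per_event * dollars_per_byte
>>
>>     # Using each event N times costs N * cost_generate if we regenerate,
>>     # or cost_generate + cost_store if we simulate once and keep it.
>>     n_break_even = 1 + cost_store / cost_generate
>>     print(f"regenerating wins unless each event is re-used more than "
>>           f"~{n_break_even:.0f} times")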
>>
>> Chip
>>
>> David Lawrence wrote:
>>
>>>
>>> Hi Elke,
>>>
>>>    Thanks for the response. I'll just throw in that in the 
>>> discussions at the time, we used a factor of 3 rather than 10, since 
>>> the 10 came from earlier, lower-statistics experiments whose error 
>>> bars were statistically driven. Clearly you're right that something 
>>> will need to be saved, but the question was whether saving the raw 
>>> data was faster and/or cheaper than regenerating and reconstructing 
>>> on the farm. The logic was the following: reconstruction of GlueX 
>>> data takes longer than the simulation. Saving the simulated data to 
>>> disk would be for the purpose of re-reconstructing it, assuming 
>>> whatever problem motivated this was limited to the reconstruction 
>>> and not to the simulation itself. Saving some kind of simulation 
>>> DSTs was thought to be the way to go, and they would have a tiny 
>>> footprint on disk by comparison. RHIC is probably a whole other 
>>> beast, especially Au+Au collisions, where the number of tracks per 
>>> event is orders of magnitude greater than in GlueX. Anyway, I'm not 
>>> really disagreeing (even though it may sound like it!); I'm just 
>>> trying to convey the earlier reasoning. I'll let others revisit the 
>>> plan and revise it for the future as they see fit.
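>>>
>>> To put rough numbers on that time argument (the per-event times below
>>> are invented; only their ratio matters): if a problem forces
>>> re-reconstruction anyway, regenerating the simulation adds only a
>>> modest overhead when simulation is faster than reconstruction.
>>>
>>>     # Toy comparison: re-reconstruct stored simulation vs. regenerate
>>>     # and reconstruct from scratch.  Times are illustrative only.
>>>     t_sim   = 1.0   # CPU time to simulate one event (assumed)
>>>     t_recon = 3.0   # CPU time to reconstruct one event (assumed larger)
>>>
>>>     redo_from_storage = t_recon           # read stored sim, re-reconstruct
>>>     redo_from_scratch = t_sim + t_recon   # regenerate, then re-reconstruct
>>>
>>>     print(f"regenerating costs {redo_from_scratch / redo_from_storage:.2f}x "
>>>           f"the CPU of re-reconstruction alone")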
>>>
>>> Regards,
>>> -David
>>>
Elke-Caroline Aschenauer wrote:
>>>
>>>> On Tue, 26 May 2009, David Lawrence wrote:
>>>>
>>>> Dear David et al.,
>>>>
>>>> Okay, I cannot keep myself from replying. For HERMES, a fully 
>>>> reconstructed event in GEANT-3 needs between 0.5 and 2.5 s; at 
>>>> RHIC, an Au+Au event takes between 1 and 2 minutes of CPU time.
>>>> So I'm not sure what you mean by saying it costs nothing to produce 
>>>> MC events. It costs time, and the rule of thumb is at least 10 
>>>> times the MC statistics compared to the data. So with the data 
>>>> statistics GlueX is expecting, you will need to generate MC 
>>>> continuously to reach 10 times the data statistics. These uDSTs 
>>>> need to be stored somewhere, so you will need disk space or tape 
>>>> space.
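>>>>
>>>> To give a feel for what that rule of thumb implies in CPU terms (the
>>>> event count and per-event time below are illustrative, not a GlueX
>>>> estimate):
>>>>
>>>>     # Rough CPU cost of the "10x the data statistics" rule of thumb.
>>>>     # Numbers are illustrative placeholders only.
>>>>     data_events      = 1.0e9   # real-data events (assumed)
>>>>     mc_factor        = 10      # MC statistics relative to data
>>>>     sec_per_mc_event = 2.0     # simulate + reconstruct one event (assumed)
>>>>
>>>>     cpu_seconds    = data_events * mc_factor * sec_per_mc_event
>>>>     cpu_core_years = cpu_seconds / (365 * 24 * 3600)
>>>>     print(f"~{cpu_core_years:.0f} CPU-core-years of Monte Carlo, "
>>>>           f"before any of it is stored anywhere")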
>>>>
>>>> I would revise the xls sheets to include this in your estimate.
>>>> I know the only valid comparison is CLAS. I have no idea what they 
>>>> do, but from my limited knowledge of CLAS I know they are normally 
>>>> not heavy on MC studies.
>>>>
>>>> Cheers, Elke
>>>>
>>>>
>>>>   Date: Tue, 26 May 2009 15:34:42 -0400
>>>>   From: David Lawrence <davidl@jlab.org>
>>>>   To: Mark M. Ito <marki@jlab.org>
>>>>   Cc: HallD Software Group <halld-offline@jlab.org>
>>>>   Subject: Re: Request for additional work space for Hall D
>>>>
>>>>   Hi Mark,
>>>>
>>>>   This may be a little late, but here is the latest spreadsheet (and
>>>>   its explanation) used to compute the predicted GlueX disk space.
>>>>   The main thing you'll notice is that there is virtually no disk
>>>>   space allocated for simulation. This was mainly due to a
>>>>   calculation that it costs more to store the simulated data than to
>>>>   reproduce it. As such, we wanted the IT division to focus their
>>>>   budget on more CPU power for the farm as opposed to work disk
>>>>   space. This is not to say that this philosophy should still be
>>>>   followed, just that it was the motivation in the past. You are
>>>>   likely to get questions from Sandy and Chip referring back to the
>>>>   spreadsheet.
>>>>
>>>>   Regards,
>>>>   -David
>>>>     Mark M. Ito wrote:
>>>>   > In recent months we have run out of space on our work disk from
>>>>   > time to time. Our activity generating and reconstructing
>>>>   > simulated data is ramping up. We discussed the situation in the
>>>>   > Hall D Offline Meeting and we would like to request more space.
>>>>   > Our current allocation is 2.7 TB. We request that this be
>>>>   > increased to 5 TB as soon as possible and that the total be
>>>>   > increased to about 10 TB over the next year.
>>>>   >
>>>>   > Once in operation, the GlueX detector will generate a large
>>>>   > volume of data. To be ready to analyze real data once it comes
>>>>   > in, we will have to generate large amounts of simulated data to
>>>>   > develop and test all of the necessary tools well in advance of
>>>>   > the arrival of real data. As we approach operations we will
>>>>   > likely need an amount of space equal to or exceeding the amount
>>>>   > used by Hall B (about 35 TB). Recall that raw data is generally
>>>>   > not stored on the work disks in any case; work space is used for
>>>>   > intermediate files needed to analyze raw or simulated data. The
>>>>   > fact that GlueX has not yet taken real data does not imply that
>>>>   > our current disk needs are insignificant. We note that this
>>>>   > request is quite modest when compared to the total amount of
>>>>   > work disk space deployed at present.
>>>>
>>>>   --
>>>>   David Lawrence, Ph.D.
>>>>   Staff Scientist, Jefferson Lab
>>>>   Office: (757) 269-5567   Pager: (757) 584-5567
>>>>   http://www.jlab.org/~davidl   davidl@jlab.org
>>>>
>>>> --
>>>> Elke-Caroline Aschenauer
>>>> Brookhaven National Lab, Physics Dept.
>>>> Bldg. 510D / 2-195, 20 Pennsylvania Avenue, Upton, NY 11973
>>>> 8 Shore Road, East Patchogue, NY 11772-5963
>>>> Tel.: 001-631-344-4769 / 001-631-569-4290
>>>> Fax: 001-631-344-1334   Cell: 001-757-256-5224
>>>> Mail: elke@jlab.org
>>>
>>>
>>>
>