
Re: Request for additional work space for Hall D



Mark,

One additional note: the obligation to you, if you expand beyond your 
guarantee, is that you shrink upon demand.  We plan to work out a 
protocol in which we call a script you provide to shrink your usage.  If 
that fails to free sufficient space, we will roll out part of your usage 
to tape, based upon an algorithm of our choosing.  This should be 
developed during the coming 6 months.
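
To make the shrink idea concrete, here is a minimal sketch of what such a
script might look like.  It is illustrative only: the path, the target
size, and the delete-oldest-first policy below are placeholders, not the
agreed protocol.

#!/usr/bin/env python
# Minimal sketch of a "shrink on demand" script (illustrative only --
# the real protocol, path, target size, and deletion policy are still
# to be worked out).  It frees space by removing the oldest regular
# files under WORK_DIR until usage drops below TARGET.

import os
import stat
import sys

WORK_DIR = "/work/halld"      # hypothetical work-disk path
TARGET = 5 * 10**12           # hypothetical post-shrink usage: 5 TB

def files_oldest_first(top):
    """Return a list of (mtime, size, path) for regular files under top."""
    entries = []
    for root, dirs, names in os.walk(top):
        for name in names:
            path = os.path.join(root, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if stat.S_ISREG(st.st_mode):
                entries.append((st.st_mtime, st.st_size, path))
    entries.sort()            # oldest modification time first
    return entries

def main():
    entries = files_oldest_first(WORK_DIR)
    usage = sum(size for _, size, _ in entries)
    for _, size, path in entries:
        if usage <= TARGET:
            break
        print("removing %s (%d bytes)" % (path, size))
        os.remove(path)
        usage -= size
    # Non-zero exit tells the caller it could not free enough space,
    # i.e. that the fallback of rolling files out to tape is needed.
    sys.exit(0 if usage <= TARGET else 1)

if __name__ == "__main__":
    main()

A real script would presumably be smarter about what it removes (pin
lists, notifying owners, preferring files already on tape); the non-zero
exit status is what would trigger the roll-out-to-tape fallback described
above.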

Chip

Sandy Philpott wrote:

> Mark,
>
> One thing we can do now even if we can't guarantee the full 5TB to 
> Hall D is to set its quota to 5TB, leaving the reservation at 2.5TB.  
> What's nice about the ZFS filesystems is exactly this feature -- we
> can have both a guarantee and a max.  Because there's still room
> in the rest of the zpool that's not being used by Hall A, C, or MSS 
> staging, it's available.
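>
> (With a hypothetical pool/dataset name -- the real one isn't spelled
> out here -- the two knobs are just the standard ZFS properties,
> something like:
>
>     zfs set reservation=2.5T pool1/work/halld   # guaranteed minimum
>     zfs set quota=5T pool1/work/halld           # hard maximum
>
> and "zfs get reservation,quota pool1/work/halld" shows what is
> currently set.)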
>
> This should help for the short term while we work toward more accurate 
> allocations in the future.
>
> It is done.
>
> Sandy
>
>
>> Dear all,
>>
>> I just bought 1 TB for the RCF farm here and it was 4.3 k$, so 5 TB is
>> ~20 k$.  I think DOE and JLab are doing something significantly wrong
>> if they have all the expensive equipment like detectors and so on, but
>> then have no money to allow the data to be analysed and simulated
>> properly.  The upgrade is 310 M$, so 20 k$ is about 6.5x10^-5 of it,
>> i.e. roughly 0.0065%.
>>
>> Also, this year's funding was not too bad, so I assume JLab is still
>> able to pay 20 k$ for its key program in the 12 GeV upgrade.
>> elke
>>
>>
>>    Date: Wed, 27 May 2009 09:14:38 -0400
>>    From: Chip Watson <watson@jlab.org>
>>    To: Mark M. Ito <marki@jlab.org>
>>    Cc: HallD Software Group <halld-offline@jlab.org>
>>    Subject: Re: Request for additional work space for Hall D
>>
>>    Mark et al,
>>
>>    Granted, disk space is necessary for all these activities.  What I
>>    have seen from HEP experiments that I have reviewed is a rather
>>    complete model of what they intend to do, quantifying the largest
>>    terms in their computing model.  The point of the 12 GeV Computing
>>    Plan is to move Halls B and D in that direction, and I would say,
>>    "so far, so good".
>>
>>    I actually don't think that 35 TB in 5 years is much to care about,
>>    as I expect the lab to be running more than an order of magnitude
>>    above that, and 35 TB will be easily held in a 2U server.  I'm more
>>    concerned about the rigor in the process, and the total cost of the
>>    computing model's implementation.
>>
>>    In the short run (FY2009 and FY2010) make sure that you have made
>>    your case to Physics division, which is the payer for this disk
>>    space.  We will continue to plan increases in capacity based upon
>>    observed trends AND robust planning documents.  5 TB this fiscal
>>    year for Hall D is probably not possible without additional funds
>>    from Physics division, but we'll see how the end of year looks in
>>    August.
>>
>>    Chip
>>
>>    Mark M. Ito wrote:
>>
>>    > Chip,
>>    >
>>    > The focus of this work disk request is to address files that need
>>    > to exist on disk, all at one time, to do an analysis, times the
>>    > number of such analyses which are underway.  This is quite
>>    > different from the total volume of raw events (either real or
>>    > simulated).  It may involve highly compressed data, such as
>>    > ntuples or ROOT trees, or it may involve staging reconstructed
>>    > data on its way to compression, or it may involve having a set of
>>    > raw data for development of reconstruction algorithms.  In our
>>    > case it may also involve studies of how to generate our raw data
>>    > (via simulation), which is not a completely solved problem yet
>>    > (simulating detector effects properly in particular).  In each of
>>    > these cases the raw data itself may not (or may) be stored
>>    > somewhere.  Suffice it to say there are a lot of use cases for
>>    > disk use, and there will be a lot of instances of such cases.  A
>>    > lot of the growth we are seeing lately comes from the "number of
>>    > analyses" factor ramping up, i.e., more people doing stuff (which
>>    > had better be the case by now).
>>    >
>>    > The other angle, mentioned in the CCPR, is that the work disk
>>    > used by CLAS is of this ilk.  Note that the 35 TB that they use
>>    > for work is tiny compared to the accumulation of an instantaneous
>>    > rate of nearly a petabyte per day when CLAS is running flat out.
>>    >
>>    > Finally, predicting what the "right" amount of disk is, is
>>    > ridiculously difficult to do.  What that tiny fraction of the raw
>>    > data volume should be, the number of instances of each use case,
>>    > the number of use cases (including those yet to be invented), and
>>    > the effort required to make disk use more efficient on a
>>    > use-case-by-use-case basis are all factors, and all are hard to
>>    > estimate.  That's why we have always relied on historical use
>>    > data and planned disk expansion incrementally.
>>    >
>>    >  -- Mark
>>    >
>>    > Chip Watson wrote:
>>    >
>>    > > All,
>>    > >
>>    > > It was probably me who first raised the question of
>>    > > re-generating on the fly, and what prompted my question was the
>>    > > very high cost of storing all the simulated data compared to
>>    > > the fairly modest cost of generating it.  Depending upon how
>>    > > many times each simulated event is used, re-generation makes
>>    > > good sense.  Otherwise the cost of tape and the tape library
>>    > > dwarfs all other costs for GlueX.  But of course it depends
>>    > > upon evolving costs of computing and that magic number of how
>>    > > many times an event is re-used.
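>>    > >
>>    > > (As a back-of-the-envelope way to put it: re-generation wins
>>    > > roughly when N_reuse x t_sim x (cost per CPU-second) is smaller
>>    > > than (event size) x (cost per byte of tape plus library); the
>>    > > numbers on both sides are exactly the evolving quantities
>>    > > referred to above.)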
>>    > >
>>    > > Chip
>>    > >
>>    > > David Lawrence wrote:
>>    > > > Hi Elke,
>>    > > >
>>    > > > Thanks for the response.  I'll just throw in that in the
>>    > > > discussions at the time, we used a factor of 3 rather than 10,
>>    > > > since the 10 came from earlier, lower-statistics experiments
>>    > > > which had statistically driven error bars.  Clearly you're
>>    > > > right that something will need to be saved, but the question
>>    > > > was whether saving the raw data was faster and/or cheaper than
>>    > > > regenerating and reconstructing on the farm.  The logic was
>>    > > > the following: reconstruction of GlueX data takes longer than
>>    > > > the simulation.  Saving the simulated data to disk would be
>>    > > > for the purpose of re-reconstructing it, assuming whatever
>>    > > > problem motivated this was limited to the reconstruction and
>>    > > > not the simulation itself.  Saving some kind of simulation
>>    > > > DSTs was thought to be the way to go, but they would have a
>>    > > > tiny footprint on the disk by comparison.  RHIC is probably a
>>    > > > whole other beast, especially Au Au scattering, where the
>>    > > > number of tracks per event is orders of magnitude greater than
>>    > > > in GlueX.  Anyway, I'm not really disagreeing (even though it
>>    > > > may sound like it!); I'm just trying to convey the earlier
>>    > > > reasoning.  I'll let others revisit the plan and revise it for
>>    > > > the future as they see fit.
>>    > > >
>>    > > > Regards,
>>    > > > -David
>>    > > >
>>    > > > elke-caroline aschenauer wrote:
>>    > > >
>>    > > > > On Tue, 26 May 2009, David Lawrence wrote:
>>    > > > >
>>    > > > > Dear David et al.,
>>    > > > >
>>    > > > > Okay, I cannot keep myself from replying.  For HERMES, a
>>    > > > > fully reconstructed event in Geant-3 takes between 0.5 and
>>    > > > > 2.5 s; at RHIC, an Au on Au event takes between 1 and 2 min
>>    > > > > of CPU time.  So I'm not sure what you mean when you say it
>>    > > > > costs nothing to produce MC events.  It costs time, and the
>>    > > > > rule of thumb is at least 10 times the MC statistics
>>    > > > > compared to the data.  So with the data statistics GlueX is
>>    > > > > expecting, you will need to generate MC continuously to
>>    > > > > reach 10 times the data statistics.  These uDSTs need to be
>>    > > > > stored somewhere, so you will need disk space or tape space.
>>    > > > >
>>    > > > > I would revise the xls sheets to include this in your
>>    > > > > estimate.  I know the only valid comparison is CLAS; I have
>>    > > > > no idea what they do, but from my limited knowledge of CLAS
>>    > > > > I know they are normally not heavy on MC studies.
>>    > > > >
>>    > > > > Cheers, elke
>>    > > > >
>>    > > > >   Date: Tue, 26 May 2009 15:34:42 -0400
>>    > > > >   From: David Lawrence <davidl@jlab.org>
>>    > > > >   To: Mark M. Ito <marki@jlab.org>
>>    > > > >   Cc: HallD Software Group <halld-offline@jlab.org>
>>    > > > >   Subject: Re: Request for additional work space for Hall D
>>    > > > >
>>    > > > >   Hi Mark,
>>    > > > >
>>    > > > >   This may be a little late, but here is the latest
>>    > > > >   spreadsheet (and its explanation) used to compute the
>>    > > > >   predicted GlueX disk space.  The main thing you'll notice
>>    > > > >   is that there is virtually no disk space allocated for
>>    > > > >   simulation.  This was mainly due to a calculation that it
>>    > > > >   costs more to store the simulated data than to reproduce
>>    > > > >   it.  As such, we wanted the IT division to focus their
>>    > > > >   budget on more CPU power for the farm as opposed to work
>>    > > > >   disk space.  This is not to say that philosophy should
>>    > > > >   still be followed, just that that was the motivation in
>>    > > > >   the past.  You are likely to get questions from Sandy and
>>    > > > >   Chip referring back to the spreadsheet.
>>    > > > >
>>    > > > >   Regards,
>>    > > > >   -David
>>    > > > >
>>    > > > >   Mark M. Ito wrote:
>>    > > > >
>>    > > > >   > In recent months we have run out of space on our work
>>    > > > >   > disk from time to time.  Our activity generating and
>>    > > > >   > reconstructing simulated data is ramping up.  We
>>    > > > >   > discussed the situation in the Hall D Offline Meeting,
>>    > > > >   > and we would like to request more space.  Our current
>>    > > > >   > allocation is 2.7 TB.  We request that this be increased
>>    > > > >   > to 5 TB as soon as possible and that the total be
>>    > > > >   > increased to about 10 TB over the next year.
>>    > > > >   >
>>    > > > >   > Once in operation, the GlueX detector will generate a
>>    > > > >   > large volume of data.  To be ready to analyze real data
>>    > > > >   > once it comes in, we will have to generate large amounts
>>    > > > >   > of simulated data to develop and test all of the
>>    > > > >   > necessary tools well in advance of the arrival of real
>>    > > > >   > data.  As we approach operations we will likely need an
>>    > > > >   > amount of space equal to or exceeding the amount used by
>>    > > > >   > Hall B (about 35 TB).  Recall that raw data is generally
>>    > > > >   > not stored on the work disks in any case; work space is
>>    > > > >   > used for intermediate files needed to analyze raw or
>>    > > > >   > simulated data.  The fact that GlueX has not taken real
>>    > > > >   > data yet does not imply our current disk needs are
>>    > > > >   > insignificant.  We note that this request is quite
>>    > > > >   > modest when compared to the total amount of work disk
>>    > > > >   > space deployed at present.
>>    > > > >
>>    > > > >   --
>>    > > > >   ----------------------------------------------------------
>>    > > > >   David Lawrence Ph.D.
>>    > > > >   Staff Scientist      Office: (757)269-5567
>>    > > > >   Jefferson Lab        Pager:  (757)584-5567
>>    > > > >   http://www.jlab.org/~davidl     davidl@jlab.org
>>    > > > >   ----------------------------------------------------------
>>    > > > >
>>    > > > >   Elke-Caroline Aschenauer
>>    > > > >   Brookhaven National Lab, Physics Dept.
>>    > > > >   Bldg. 510D / 2-195, 20 Pennsylvania Avenue, Upton, NY 11973
>>    > > > >   8 Shore Road, East Patchogue, NY 11772-5963
>>    > > > >   Tel.: 001-631-344-4769 / 001-631-569-4290
>>    > > > >   Fax:  001-631-344-1334    Cell: 001-757-256-5224
>>    > > > >   Mail: elke@jlab.org
>>   
>