Terry Barnaby wrote:
> Hi Jeroen,
>
> I am looking at the software APIs and so am looking a bit deeper
> into the data access and data processing requirements.
> I think it is worth looking a bit at the proposed system and its
> features so we understand the data measurement possibilities better.
>
> The TMS system design, so far, has 3 separate modules. Each module has
> a cPCI bus with a cPCI controller and 4 or 5 PUPE engines. Each module
> therefore has 12 or 15 individual Pick-Up engines.
> The cPCI bus will be 64-bit and probably operate at 33 MHz (we may be
> able to run at 66 MHz?) and thus support a data transfer rate of about
> 200 MBytes/sec.
> Given a PS cycle time of 1.2 seconds we can thus transfer to the cPCI
> controller about 200 MBytes from 15 Pick-Ups, or 13 MBytes per Pick-Up,
> during this duration (data collected during the next processing cycle).
> Each bunch's data will occupy about 3.8 MBytes, so we could transfer to
> the cPCI controller memory a maximum of about 50 complete bunches.
> As there are 3 modules in the TMS, this gives in total 150 bunches.
> The cPCI controller will be able to perform a limited amount of
> software processing on the data.
>
> The System Controller is connected to the Module Controllers by
> Gigabit Ethernet. This will provide about 90 MBytes/sec data rate. This
> would allow the transfer of the complete data of 23 bunches to the TMS
> System Controller and hence to CERN's systems.
>
> So some observations:
>
> 1. It would be good to do as much processing as possible, as early as
> possible in the data chain, to reduce the amount of data being passed
> through the system.
>
> 2. It would be good to do some of the main data processing functions in
> the FPGA if possible. We could, for example, implement the mean of all
> bunches and the mean of one bunch in the FPGA. Perhaps we could use two
> block RAMs for this calculated data: one for the mean of all bunches
> and one for a defined function, which could be the mean of one bunch.
> New defined functions could then be added in the future.
>
> 3. Some processing could be done on the cPCI module controllers. These
> have been specified as using Mobile Pentium 4 processors. We may be
> able to use faster processors on these to improve the processing
> performance, at the expense of heat, if required.
>
> 4. We should make sure the cPCI module controllers have a PMC site
> allowing an Alpha Data FPGA module to be installed in the future for
> faster data post-processing.
>
> 5. The cPCI Module Controllers' software API should allow for software
> post-processing algorithms on the Module Controllers, together with a
> "plugin" interface to allow additional post-processing algorithms to be
> installed.
>
> 6. It would probably be best to use the MMX instruction set on the
> cPCI module controller for data post-processing where possible. MMX
> accesses and manipulates data in 64-bit chunks, so having each
> Sigma, DeltaX, DeltaY element padded to 64 bits could well be useful.
> It might be useful, in the future, to be able to pack data from
> multiple bunches into single 64-bit values. For example, the DeltaX
> values from bunches 0, 1, 2 and 3 could be stored in an array of 64-bit
> values for easy MMX manipulation.
>
> 7. If we use a Data Pattern Engine to write data into block RAM, where
> it is then DMA'ed to the host, then we will need to DMA to the host in
> relatively small chunk sizes, say 64 KBytes. At 200 MBytes/sec this
> gives a host interrupt and DMA setup rate of 3.2 kHz, or 312 us between
> chunks.
> There will be some DMA and interrupt latency involved here, as well as
> CPU task switch times. This may restrict the overall data rate we can
> achieve. It may be better to use larger Data Pattern Engine buffers
> (which could be in DDR) or to use a demand or master DMA scheme to
> reduce this overhead.
>
> 8. The system controller has two Gigabit Ethernet interfaces. It would
> be worth making sure the systems could be upgraded to 4 Gigabit
> Ethernet ports in the future.
>
> Any observations or comments?
>

Hi Terry,

Sorry for the delay. Here we go:

I fully agree that data processing (and therefore data reduction) should
be done as close to the source as practical, provided it does not unduly
increase complexity or memory usage. Instead of the calculation of mean
values, I'd rather use low-pass filters, which are much easier to
implement in FPGA hardware.

There is no intention to attempt to transfer all of the data off the
acquisition plug-ins all the time. Most measurements require the
transfer of only small amounts of data. Of course, this is modelled
somewhat on the usage we see of the current system, which is very
sluggish indeed. I fully expect that a more lively TMS will see
increased use rather quickly.

I believe we should transfer data between the PUPEs and the system
controller on demand only. However, the system controller would probably
keep a cache of interesting or frequently made measurements, which it
updates regularly, and that would probably be the main source of
demands. The data being delivered to the accelerator controls network is
reduced still further, consisting only of measurements actually
requested by machine operators.

The largest demand on bandwidth that I envisaged in the specs was the
data required for phase space pictures. That was about 100k measurements
on two PUs, or about 1.2 MB total. Add another 33% for sub-optimal
packing, and it still seems that it would be trivially easy to do. We
have some room for expansion.

Concerning the organisation of data in memory: if I understand
correctly, MMX instructions implement, to some extent, the SIMD model of
data processing? If we can exploit that by judiciously packing data into
64-bit words, then of course that looks advantageous.

Another issue I'd like to keep in mind is that I may want to increase
the persistence of data by storing only the RF buckets actually
containing beam. That would be done by setting the Gate column of the
phase table so that it produces gates only for filled RF buckets. Of
course, that will have its repercussions on the memory layout too.

I like the idea of being able to plug in low-level post-processing
algorithms on demand, but let's be careful not to get carried away,
adding ever more fancy features. ;-)

Regarding the cPCI module controllers, what kind of programming
environment can we expect? A C cross-compiler? Would these things run an
OS?

Best regards,
Jeroen Belleman
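
As a rough illustration of the MMX/SIMD usage discussed above (Terry's
point 6 and Jeroen's question about the SIMD model), here is a minimal
sketch in C. It assumes DeltaX samples are signed 16-bit values packed
four bunches per 64-bit word, and that a C compiler with MMX intrinsics
is available on the module controllers; the names lowpass_deltax, SHIFT
and NWORDS are purely illustrative and not part of any agreed API. It
applies a simple first-order low-pass, y += (x - y) >> SHIFT, of the
kind Jeroen prefers over plain means, to four bunches per instruction.

/* Illustrative sketch only: four signed 16-bit DeltaX samples packed
 * into each 64-bit MMX word, one word per group of four bunches.     */
#include <mmintrin.h>

#define NWORDS 1024  /* e.g. 1024 words = 4096 bunch slots (made up) */
#define SHIFT  3     /* filter time constant of roughly 2^SHIFT turns */

/* acc: running filter state, in: newly acquired turn, nwords: length */
void lowpass_deltax(__m64 *acc, const __m64 *in, int nwords)
{
    for (int i = 0; i < nwords; i++) {
        __m64 x    = in[i];
        __m64 y    = acc[i];
        __m64 diff = _mm_sub_pi16(x, y);   /* x - y on 4 lanes at once */
        y = _mm_add_pi16(y, _mm_srai_pi16(diff, SHIFT));
        acc[i] = y;
    }
    _mm_empty();  /* leave the MMX/FPU register state clean afterwards */
}

Each iteration updates four bunches at once; the same pattern would
apply to Sigma and DeltaY arrays. Note that the arithmetic is plain
(non-saturating) 16-bit, which is fine as long as x and y stay within
range; a real implementation would have to decide on scaling and
saturation behaviour.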