[cdt-dev] Fwd: Opening and reading less to build the CModel

Cross Posted to cdt-core

 While looking at project creation time, I was looking for ways to
 "defer" some of the IO that occurs when you open/close/
 import a project. Using the ELF parser as an example, this is
 what happens when you look at a particular file:
 
 
 if(fileHasSourceEnding()) {
  --> Optimized import of source creates object and early outs here.
 } else {
  --> Extract the binary parser
  --> Open file, read 128 bytes, close file
  --> Pass the 128 bytes and filename to binary parser to determine
      if this file is a binary object or not
  if(isABinaryObjectSaysTheParser) {
   --> Pass the filename to the binary parser to create an object
       to put into the CModel for this entry
   --> For an ELF object this means:
     --> Extract the Elf.Attributes to determine if this is an EXE, LIB, etc.
       --> Opens the file, reads the ELF header, closes the file
     --> Create the appropriate container
  }
 }
 
 Obviously you have to look at the contents in many cases to get an 
 idea of what the file really is, its architectural attributes etc.
 The question is, how can we avoid doing this on project creation,
 and can we avoid re-reading the same data over and over again?
 
 In order to optimize this particular case, and to see what effect
 it might have, I did a couple of things:
 
 - Only do the binary detection for extensions we know are likely
   to contain binary things: {.o,.a,.so,.lib,.exe,.com,.dll ...}
   Minor gain, likely not significant with my particular example
   compared to the extra overhead.
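
The extension check in the first item could look roughly like the sketch below. `BinaryExtensionFilter` and `mightBeBinary` are hypothetical names, not CDT API, and the set holds only the extensions listed above (the original list is open-ended). This sketch also assumes that a file with no extension is skipped, which would miss typical Unix executables, so a real filter would need a policy for those.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of the CDT API.
final class BinaryExtensionFilter {
    // Extensions copied from the list in the message; the real list may be longer.
    private static final Set<String> LIKELY_BINARY =
            Set.of("o", "a", "so", "lib", "exe", "com", "dll");

    // Returns true if the file name ends in one of the known binary extensions.
    static boolean mightBeBinary(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // assumption: no extension means "do not probe"
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return LIKELY_BINARY.contains(ext);
    }
}
```

Only files that pass this check would go on to the 128-byte read, so the saving depends on how many non-binary, non-source files the project contains.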
 
 - Cache the byte array passed to the binary parser if
   the match was successful, in anticipation of being asked to
   create an object.
   In ELFParser.getBinary(), check the cached object and, if
   it matches, use the data array to attempt to extract the information
   needed to create an IBinaryFile().
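
A minimal sketch of the caching idea follows, assuming a single-entry cache keyed by file path. `HeaderCache`, `remember`, `take` and `elfType` are hypothetical names and this is not the actual patch. The one part taken from the ELF format itself is the header layout: the magic bytes 0x7f 'E' 'L' 'F', the byte-order flag at offset 5, and the 16-bit `e_type` field at offset 16 (1 = relocatable, 2 = executable, 3 = shared object, 4 = core).

```java
import java.util.Arrays;

// Hypothetical single-entry cache; names are illustrative, not CDT API.
final class HeaderCache {
    private String cachedPath;
    private byte[] cachedHeader;

    // Called after a successful "is this a binary?" check on 'path'.
    synchronized void remember(String path, byte[] header) {
        cachedPath = path;
        cachedHeader = Arrays.copyOf(header, header.length);
    }

    // Returns the cached bytes if they belong to 'path', otherwise null.
    // The entry is cleared on a hit because it is expected to be used once.
    synchronized byte[] take(String path) {
        if (!path.equals(cachedPath)) {
            return null;
        }
        byte[] header = cachedHeader;
        cachedPath = null;
        cachedHeader = null;
        return header;
    }

    // Reads e_type from an in-memory ELF header; returns -1 if not ELF.
    static int elfType(byte[] header) {
        if (header.length < 18 || header[0] != 0x7f || header[1] != 'E'
                || header[2] != 'L' || header[3] != 'F') {
            return -1;
        }
        boolean littleEndian = header[5] == 1; // EI_DATA: 1 = LSB, 2 = MSB
        int b0 = header[16] & 0xff;
        int b1 = header[17] & 0xff;
        return littleEndian ? (b1 << 8) | b0 : (b0 << 8) | b1;
    }
}
```

With something like this, the object-creation step can classify the file from the bytes already in memory and fall back to opening the file only on a cache miss.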
 
 The results of these two changes:
 
 Old Project Open Time: 6minutes 10sec
 New Project Open Time: 3minutes 30sec
 
 Since we now go to disk once rather than twice, and disk access is
 the major cost in opening a new project, the "halving" factor
 (370 s down to 210 s, about 43% less) is about what I expected.
 
 For 2.0 I think that there are a couple of things we should
 consider, other than backgrounding this activity, which may
 not be possible:
 
 - Creating a virtual IBinaryFile() container that could defer much of
   this work/IO until it is actually needed.
 - Augmenting the API for binary parsers to be able to take this
   data cache directly rather than having each one re-read the file.
 - Potentially putting in another check before we read
   the initial 128 bytes so that we can avoid reading the
   data altogether, as we do with the source files.
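
The first item, a container that defers its IO, might be sketched as below. `Lazy` and `DeferredBinary` are hypothetical names. A real IBinaryFile has a much larger interface, and here the expensive header read is stood in for by a `Supplier` that the caller provides.

```java
import java.util.function.Supplier;

// Hypothetical lazy holder: runs the loader at most once, on first use.
final class Lazy<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }
}

// Hypothetical stand-in for a deferred binary entry in the model.
final class DeferredBinary {
    private final String path;
    private final Lazy<String> kind;

    DeferredBinary(String path, Supplier<String> kindLoader) {
        this.path = path;
        this.kind = new Lazy<>(kindLoader);
    }

    // Cheap: answered from what the model already knows, no file access.
    String getPath() {
        return path;
    }

    // Expensive: triggers the loader (for example a header read) on first call only.
    String getKind() {
        return kind.get();
    }
}
```

Creating the entry then costs nothing beyond storing the path. The file is opened only if someone actually asks whether the entry is an executable, a library, and so on.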
 
 Thoughts and comments?
 
 Thanks,
  Thomas ... preparing a 1.2 patch =;-)




