Visiting and caching classes and analyses
Document history:
DHH 7/3/2006: Created
Goal: implement a mechanism for visiting classes (and caching
classfile representations and analyses) that is not tightly coupled
to any particular bytecode framework.
*** Avoid duplicating work if possible.
*** "Multicast" visitors?
Sketch of algorithm
1. user provides list of codebases to analyze, along with
the aux classpath and source path
2. scan to identify classes within the codebases
Need to scan enough information to build the global type repository,
i.e., identify supertype/subtype relationships.
3. compute execution plan
4. run execution plan
*** detectors (really, visitors) are invoked with the
_unique descriptor_ of the class being visited.
(Should identify class name and code base)
*** things a detector can do (these are not mutually exclusive: any combination
is possible):
1. Register as wanting to inspect the class using a DismantleBytecode.
Initially, we can create one DismantleBytecode per detector.
Later we can group detectors wanting to dismantle bytecode that
are contiguous in the execution plan in order to "multicast"
the classfile visitation/dismantling.
[Maybe we should eliminate the distinction between BetterVisitor/
PreorderVisitor/DismantleBytecode, and just have a single implementation
that dismantles bytecode. Detectors that don't care about bytecode
can simply have a no-op sawOpcode(), etc.]
Note that detectors do not extend DismantleBytecode. Instead
they implement an interface (describing the visitation methods)
and register themselves with a DismantleBytecode object.
(Internally, the DismantleBytecode object has a list of visitors (detectors)
that are called back on each event.)
2. Request a tree-based representation of the class. We will
probably have to develop our own framework-independent
representation.
3. Request analyses of the class or methods in the class.
These are cached to avoid computing them redundantly.
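A minimal sketch of how item 3 (cached analyses) might look, assuming
analyses are keyed by the analysis result type plus the class descriptor.
The names AnalysisCache and getClassAnalysis, the two String fields of
ClassDescriptor, and the Function-based "engine" parameter are all
hypothetical; these notes do not define them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical value object: identifies a class by code base and class name.
final class ClassDescriptor {
    final String codeBase;
    final String className;

    ClassDescriptor(String codeBase, String className) {
        this.codeBase = codeBase;
        this.className = className;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ClassDescriptor)) return false;
        ClassDescriptor other = (ClassDescriptor) o;
        return codeBase.equals(other.codeBase) && className.equals(other.className);
    }

    @Override
    public int hashCode() {
        return Objects.hash(codeBase, className);
    }
}

// Hypothetical cache: at most one computed result per (analysis type, class).
final class AnalysisCache {
    private final Map<Class<?>, Map<ClassDescriptor, Object>> results = new HashMap<>();

    // Returns the cached result, running the engine only on a cache miss.
    // A null result is not cached, so the engine would run again next time.
    <T> T getClassAnalysis(Class<T> analysisType, ClassDescriptor descriptor,
                           Function<ClassDescriptor, T> engine) {
        Map<ClassDescriptor, Object> perType =
            results.computeIfAbsent(analysisType, k -> new HashMap<>());
        Object cached = perType.get(descriptor);
        if (cached == null) {
            cached = engine.apply(descriptor);
            perType.put(descriptor, cached);
        }
        return analysisType.cast(cached);
    }
}

class AnalysisCacheDemo {
    public static void main(String[] args) {
        AnalysisCache cache = new AnalysisCache();
        int[] runs = {0};
        // Stand-in analysis: counts how often it is actually executed.
        Function<ClassDescriptor, Integer> engine = d -> { runs[0]++; return 3; };
        ClassDescriptor first = new ClassDescriptor("app.jar", "com/example/Foo");
        ClassDescriptor second = new ClassDescriptor("app.jar", "com/example/Foo");
        cache.getClassAnalysis(Integer.class, first, engine);
        cache.getClassAnalysis(Integer.class, second, engine);
        System.out.println(runs[0]); // prints 1: equal descriptors share one result
    }
}
```

Because the descriptor is a value object, two detectors that ask for the
same analysis of the same class share one result even if they build
their descriptors separately. Method-level analyses could follow the
same pattern with a MethodDescriptor key.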
Classes and infrastructure needed:
CodeBase (a jar file, nested jar file, directory, URL, etc.)
ClassPath (a list of CodeBases)
ClassDescriptor (uniquely identifies a class in a CodeBase)
MethodDescriptor (uniquely identifies a method in a class)
FieldDescriptor (uniquely identifies a field in a class)
TypeDescriptor (a type)
TypeRepository (global type repository)
IClassVisitor (interface defining visitation methods)
IDismantleBytecode (interface allowing an IClassVisitor to register
itself as wanting to dismantle the class that the
IDismantleBytecode object represents)
*** perhaps we should call this something different to distinguish
it from the previous inheritance-based DismantleBytecode class
BetterVisitor
PreorderVisitor
DismantleBytecode
--- These are all abstract implementations of IClassVisitor.
They can be subclassed by detectors to ease the transition
away from the current (BCEL-based) visitor framework.
--- better idea: one class called AbstractClassVisitor that is
a no-op implementation of IClassVisitor. All
visitor-based detectors change to inherit from AbstractClassVisitor
(or possibly a subclass that automates some of the work
of connecting to the IDismantleBytecode object).
ASMDismantleBytecode (dismantle bytecode using the ASM framework)
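A rough sketch of how the registration-based visitor pieces listed above
could fit together. The three visitation methods, the addVisitor method,
and the MulticastDismantler and OpcodeCounter classes are hypothetical
stand-ins; in particular, MulticastDismantler replays events passed in by
the caller instead of parsing a real classfile, so it only illustrates
the "multicast" callback structure, not ASM or BCEL usage.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical subset of visitation methods; the real set is not fixed yet.
interface IClassVisitor {
    void visitClass(String className);
    void visitMethod(String methodName);
    void sawOpcode(int opcode);
}

// No-op implementation: detectors override only the callbacks they care about.
abstract class AbstractClassVisitor implements IClassVisitor {
    @Override public void visitClass(String className) { }
    @Override public void visitMethod(String methodName) { }
    @Override public void sawOpcode(int opcode) { }
}

// Detectors register with the dismantler instead of extending it.
interface IDismantleBytecode {
    void addVisitor(IClassVisitor visitor);
}

// Stand-in dismantler: forwards each event to every registered visitor,
// so the class is traversed once no matter how many detectors listen.
final class MulticastDismantler implements IDismantleBytecode {
    private final List<IClassVisitor> visitors = new ArrayList<>();

    @Override
    public void addVisitor(IClassVisitor visitor) {
        visitors.add(visitor);
    }

    // Events are supplied by the caller here; a real implementation would
    // produce them while reading the classfile.
    void dismantle(String className, String methodName, int[] opcodes) {
        for (IClassVisitor v : visitors) v.visitClass(className);
        for (IClassVisitor v : visitors) v.visitMethod(methodName);
        for (int opcode : opcodes) {
            for (IClassVisitor v : visitors) v.sawOpcode(opcode);
        }
    }
}

// Example detector: only cares about opcodes.
final class OpcodeCounter extends AbstractClassVisitor {
    int count;

    @Override
    public void sawOpcode(int opcode) {
        count++;
    }
}
```

With this shape, grouping detectors that are contiguous in the execution
plan amounts to calling addVisitor for each of them on one dismantler
object before dismantle runs.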