CMS Data Analysis School Pre-Exercises - Second Set
Overview
Teaching: 0 min
Exercises: 30 min
Questions
How to slim a MiniAOD file?
How to know the size of a MiniAOD file?
How to use FWLite to analyze data and MC?
Objectives
Learn how to reduce the size of a MiniAOD by only keeping physics objects of interest.
Learn how to determine the size of a MiniAOD file using EDM standalone utilities
Learn to use FWLite to perform simple analysis.
Introduction
Welcome to the second set of CMSDAS pre-exercises. As you know by now, the purpose of the pre-workshop exercises is for prospective workshop attendees to become familiar with the basic software tools required to perform physics analysis at CMS before the workshop begins. Post the answers in the online response form available from the course web area:
Indico page
The Second Set of exercises begins with Exercise 7. We will use collision data events and simulated (Monte Carlo, MC) events. To work comfortably with these files, we will first make them smaller by selecting only the objects that we are interested in (electrons and muons in our case).
The collision data events are stored in DoubleMuon.root. DoubleMuon here refers to the fact that, when these events were recorded, they appeared to contain two muons. This is true most of the time, but other objects can fake muons, so on closer inspection some events turn out not to contain two muons.
The MC file is called DYJetsToLL. You will need to get used to cryptic names like this if you want to survive in the high energy physics environment! The MC file contains Drell-Yan events that decay to two leptons and may be accompanied by one or more jets.
Exercise 8 and Exercise 9 use FWLite (Framework Lite). This is an interactive analysis tool integrated with the CMSSW EDM (Event Data Model) framework. It automatically loads the shared libraries that define the CMSSW data formats and tools, so that parts of an event stored in the EDM format can be accessed easily within interactive ROOT sessions. It reads ROOT files produced by CMSSW, has full access to the class methods, and does not require writing full-blown framework modules. With an FWLite distribution installed locally on a desktop, one can therefore do CMS analysis outside the full CMSSW framework. In these two exercises, we will analyze the data stored in a MiniAOD sample using FWLite. We will loop over muons and make a Z mass peak.
We assume that, having done the first set of pre-exercises, you are comfortable with logging onto cmslpc-sl7.fnal.gov and setting up the CMS environment.
Exercise 7 - Slim MiniAOD sample to reduce its size by keeping only Muon and Electron branches
In order to reduce the size of the MiniAOD, we would like to keep only the slimmedMuons and slimmedElectrons objects and drop all others. The config files should look like slimMiniAOD_MC_MuEle_cfg.py and slimMiniAOD_data_MuEle_cfg.py. To work with these config files and make the slim MiniAODs, execute the following steps in the directory YOURWORKINGAREA/CMSSW_10_6_18/src
Copy and paste each of the scripts slimMiniAOD_MC_MuEle_cfg.py and slimMiniAOD_data_MuEle_cfg.py in its entirety and save it under the same name. Open these Python files with your favorite editor and take a look at them. The number of events has been set to 1000:
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(1000) )
To run over all events in the sample, one can change it to -1.
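The keep-and-drop idea behind the slimming configs can be illustrated outside CMSSW. The sketch below is not the framework's implementation: it assumes that output commands are applied in order, that a later command overrides an earlier one, and that branch names follow the type_label_instance_process pattern. The branch names and the command list are illustrative assumptions, so check the real config files for the actual commands.

```python
from fnmatch import fnmatchcase

# Illustrative branch names (assumed, in the form type_label_instance_process).
BRANCHES = [
    "patMuons_slimmedMuons__PAT",
    "patElectrons_slimmedElectrons__PAT",
    "patJets_slimmedJets__PAT",
    "patMETs_slimmedMETs__PAT",
]

# Assumed output commands: drop everything, then keep muons and electrons.
COMMANDS = [
    "drop *",
    "keep *_slimmedMuons_*_*",
    "keep *_slimmedElectrons_*_*",
]


def select_branches(branches, output_commands):
    """Return the branches that survive a list of 'keep'/'drop' commands.

    Commands are applied in order, so the last matching command decides.
    A branch that matches no command is kept (assumed default).
    """
    kept = []
    for branch in branches:
        keep = True
        for command in output_commands:
            action, pattern = command.split()
            if fnmatchcase(branch, pattern):
                keep = (action == "keep")
        if keep:
            kept.append(branch)
    return kept


if __name__ == "__main__":
    for name in select_branches(BRANCHES, COMMANDS):
        print(name)
```

Only the muon and electron branches survive here, which mirrors what the slimmed output file is meant to contain.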
Now run the following command:
cmsRun slimMiniAOD_MC_MuEle_cfg.py
This produces an output file called slimMiniAOD_MC_MuEle.root in your $CMSSW_BASE/src area.
Now run the following command:
cmsRun slimMiniAOD_data_MuEle_cfg.py
This produces an output file called slimMiniAOD_data_MuEle.root in your $CMSSW_BASE/src area.
On opening these two MiniAODs one observes that only the slimmedMuons and the slimmedElectrons objects are retained as intended.
To find the size of your MiniAOD, execute the following Linux command:
ls -lh slimMiniAOD_MC_MuEle.root
and
ls -lh slimMiniAOD_data_MuEle.root
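If you prefer an exact number to the rounded value printed by ls -lh, the size can also be computed from the byte count. This is a minimal sketch using only the Python standard library. The helper name file_size_mb is hypothetical. It also assumes that "MB" may mean either 10^6 bytes or 2^20 bytes (GNU ls -lh uses powers of 1024), so it offers both.

```python
import os


def file_size_mb(path, binary=False):
    """Return the size of a file in megabytes.

    binary=False divides by 1000**2 (MB); binary=True divides by
    1024**2 (MiB), which is the convention used by GNU 'ls -lh'.
    """
    size_bytes = os.path.getsize(path)
    divisor = 1024 ** 2 if binary else 1000 ** 2
    return size_bytes / divisor


if __name__ == "__main__":
    for name in ("slimMiniAOD_MC_MuEle.root", "slimMiniAOD_data_MuEle.root"):
        if os.path.exists(name):
            print(f"{name}: {file_size_mb(name):.2f} MB, "
                  f"{file_size_mb(name, binary=True):.2f} MiB")
        else:
            print(f"{name}: not found in the current directory")
```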
You may also try the following:
To know the size of each branch, use the edmEventSize utility as follows (also explained in the First Set of Exercises):
edmEventSize -v slimMiniAOD_MC_MuEle.root
and
edmEventSize -v slimMiniAOD_data_MuEle.root
To see what objects there are, open the ROOT file as follows and browse to the MiniAOD samples as you did in Exercise 6:
Here is how you do it for the output file slimMiniAOD_MC_MuEle.root
root -l slimMiniAOD_MC_MuEle.root;
TBrowser b;
OR
root -l
TFile *theFile = TFile::Open("slimMiniAOD_MC_MuEle.root");
TBrowser b;
To quit the ROOT application, execute:
.q
Remember
For CMSDAS@LPC2023 please submit your answers at the Google Form second set.
Question 7.1a
What is the size of the MiniAOD slimMiniAOD_MC_MuEle.root in MB? Make sure your answer is only numerical (no units).
Question 7.1b
What is the size of the MiniAOD slimMiniAOD_data_MuEle.root in MB? Make sure your answer is only numerical (no units).
Question 7.2a
What is the mean eta of the muons for MC?
Question 7.2b
What is the mean eta of the muons for data?
Question 7.3a
What is the size of the slimmed output file compared to the original sample?
Compare one of your slimmed output files to the original MiniAOD file it came from. To find sizes in EOS, you can use eosls -alh /store/user/filepath/filename.root with the appropriate username, path and filename.
Question 7.3b
Is the mean eta for muons for MC and data the same as in the original sample in Exercise 6?
Exercise 8 - Use FWLite on the MiniAOD created in Exercise 7 and make a Z Peak (applying pt and eta cuts)
FWLite (pronounced “framework-light”) is basically a ROOT session with CMS data format libraries loaded. CMS uses ROOT to persistify data objects. CMS data formats are thus “ROOT-aware”; that is, once the shared libraries containing the ROOT-friendly description of CMS data formats are loaded into a ROOT session, these objects can be accessed and used directly from within ROOT like any other ROOT class!
In addition, CMS provides a couple of classes that greatly simplify the access to the collections of CMS data objects. Moreover, these classes (Event and Handle) have the same name as analogous ones in the Full Framework; this mnemonic trick helps in making the code to access CMS collections very similar between the FWLite and the Full Framework.
In this exercise we will make a ZPeak using our data and MC samples. We will use the corresponding slim MiniAODs created in Exercise 7. To read more about FWLite, have a look at Section 3.5 of Chapter 3 of the WorkBook.
We will first make a ZPeak. We will loop over the slimmedMuons in the MiniAOD and compute the invariant mass of oppositely charged muon pairs. These masses are filled into a histogram that is written to an output ROOT file.
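The pairing logic described above can be sketched in plain Python, independent of CMSSW. This is not the code of the FWLite executable: the muons are plain dictionaries rather than pat::Muon objects, and the default cut values min_pt and max_abs_eta are illustrative assumptions. The invariant mass follows from the summed four-momentum, m^2 = (E1 + E2)^2 - |p1 + p2|^2.

```python
import math

MUON_MASS_GEV = 0.105658  # muon mass in GeV


def four_vector(pt, eta, phi, mass=MUON_MASS_GEV):
    """Convert (pt, eta, phi, mass) to (E, px, py, pz), all in GeV."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(mu1, mu2):
    """Invariant mass of two muons given as dicts with pt, eta, phi."""
    e1, px1, py1, pz1 = four_vector(mu1["pt"], mu1["eta"], mu1["phi"])
    e2, px2, py2, pz2 = four_vector(mu2["pt"], mu2["eta"], mu2["phi"])
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding


def dimuon_masses(muons, min_pt=20.0, max_abs_eta=2.1):
    """Masses of all opposite-charge pairs passing the (assumed) cuts."""
    selected = [m for m in muons
                if m["pt"] > min_pt and abs(m["eta"]) < max_abs_eta]
    masses = []
    for i, first in enumerate(selected):
        for second in selected[i + 1:]:
            if first["charge"] * second["charge"] < 0:
                masses.append(invariant_mass(first, second))
    return masses


if __name__ == "__main__":
    event = [
        {"pt": 45.6, "eta": 0.0, "phi": 0.0, "charge": +1},
        {"pt": 45.6, "eta": 0.0, "phi": math.pi, "charge": -1},
    ]
    print(dimuon_masses(event))
```

In the real exercise each mass would be filled into the histogram instead of being collected in a list.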
First make sure that you have the MiniAODs created in Exercise 7. They should be called slimMiniAOD_MC_MuEle.root and slimMiniAOD_data_MuEle.root.
Go to the src area of the current CMSSW release:
cd $CMSSW_BASE/src
The environment variable CMSSW_BASE points to the base area of the current CMSSW release.
Check out a package from GitHub.
Make sure that your GitHub setup is working, as described in the instructions for obtaining a GitHub account. It is particularly important to set up SSH keys so that you can check out code without problems: https://help.github.com/articles/generating-ssh-keys
To check out the package, run:
git cms-addpkg PhysicsTools/FWLite
Then to compile the packages, do
scram b
cmsenv
Note
You can try
scram b -j 4
to speed up the compilation. Here -j 4 will compile with 4 cores. Using several cores to compile also makes the interactive machine slower for others, since you are using more resources. Use with care!
Note 2
It is necessary to call cmsenv again after compiling this package because it adds executables in the $CMSSW_BASE/bin area.
To make a Z peak, we will use the FWLite executable called FWLiteHistograms. The corresponding code should be in $CMSSW_BASE/src/PhysicsTools/FWLite/bin/FWLiteHistograms.cc
With this executable we will use the command line options. More about these can be learned from SWGuideCommandLineParsing.
To make a ZPeak from this executable, using the MC MiniAOD, run the following command (which will not work out of the box, see below):
FWLiteHistograms inputFiles=slimMiniAOD_MC_MuEle.root outputFile=ZPeak_MC.root maxEvents=-1 outputEvery=100
You will get the following error:
terminate called after throwing an instance of 'cms::Exception'
what(): An exception of category 'ProductNotFound' occurred.
Exception Message:
getByLabel: Found zero products matching all criteria
Looking for type: edm::Wrapper<std::vector<reco::Muon> >
Looking for module label: muons
Looking for productInstanceName:
The data is registered in the file but is not available for this event
This error occurs because your input file slimMiniAOD_MC_MuEle.root is a MiniAOD and does not contain a reco::Muon collection with the label muons. It does, however, contain slimmedMuons (check for yourself by opening the ROOT file with the ROOT browser). The code FWLiteHistograms.cc, on the other hand, contains lines that say:
using reco::Muon;
and
event.getByLabel(std::string("muons"), muons);
This means you need to change reco::Muon to pat::Muon, and muons to slimmedMuons.
To implement these changes, open the code $CMSSW_BASE/src/PhysicsTools/FWLite/bin/FWLiteHistograms.cc. In this code, look at the line that says:
using reco::Muon;
and change it to
using pat::Muon;
Then find this line:
event.getByLabel(std::string("muons"), muons);
and change it to:
event.getByLabel(std::string("slimmedMuons"), muons);
Now you need to re-compile:
scram b
Now again run the executable as follows:
FWLiteHistograms inputFiles=slimMiniAOD_MC_MuEle.root outputFile=ZPeak_MC.root maxEvents=-1 outputEvery=100
You can see that it now runs successfully and you get a ROOT file called ZPeak_MC.root. Open this ROOT file and look at the Z mass peak histogram called mumuMass. Answer the following question.
Question 8.1a
What is the mean mass of the ZPeak for your MC MiniAOD?
Question 8.1b
How can you increase statistics in your ZPeak histogram?
Now a little bit about the command that you executed.
In the command above, slimMiniAOD_MC_MuEle.root is the input file and ZPeak_MC.root is the output file. maxEvents is the number of events you want to run over. You can change it to any other number. The option -1 means running over all the events, which is 1000 in this case. outputEvery sets after how many events the code reports the number of the event being processed. As you may have noticed, with the value you specified, the executable prints processing event: after every 100 events.
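To make the meaning of these options concrete, here is a small Python sketch of how key=value arguments such as maxEvents and outputEvery could be interpreted. It is an illustration only, not the FWLite command-line parser. The function names are hypothetical, and the defaults in DEFAULTS are placeholders that are deliberately not the real defaults in FWLiteHistograms.cc.

```python
# Placeholder defaults for illustration; NOT the values in FWLiteHistograms.cc.
DEFAULTS = {
    "inputFiles": "PLACEHOLDER_input.root",
    "outputFile": "PLACEHOLDER_output.root",
    "maxEvents": -1,
    "outputEvery": 10,
}


def parse_options(argv, defaults=DEFAULTS):
    """Parse 'key=value' arguments, converting to the type of the default."""
    options = dict(defaults)
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep or key not in options:
            raise ValueError(f"unrecognized option: {arg!r}")
        options[key] = type(defaults[key])(value)
    return options


def events_to_process(n_in_file, max_events):
    """A negative maxEvents means 'all events in the file'."""
    return n_in_file if max_events < 0 else min(n_in_file, max_events)


def report_points(n_events, output_every):
    """Event indices at which a progress message would be printed."""
    if output_every <= 0:
        return []
    return [i for i in range(1, n_events) if i % output_every == 0]


if __name__ == "__main__":
    opts = parse_options([
        "inputFiles=slimMiniAOD_MC_MuEle.root",
        "outputFile=ZPeak_MC.root",
        "maxEvents=-1",
        "outputEvery=100",
    ])
    n = events_to_process(1000, opts["maxEvents"])
    for index in report_points(n, opts["outputEvery"]):
        print("processing event:", index)
```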
If you look at the code FWLiteHistograms.cc, it also contains the defaults corresponding to the above command line options. Answer the following question:
Question 8.2
What is the default name of the output file?
Exercise 9 - Re-run the above executable with the data MiniAOD
Re-run the above executable with the data MiniAOD file called slimMiniAOD_data_MuEle.root as follows:
FWLiteHistograms inputFiles=slimMiniAOD_data_MuEle.root outputFile=ZPeak_data.root maxEvents=-1 outputEvery=100
This will create an output histogram ROOT file called ZPeak_data.root
Then answer the following question.
Question 9a
What is the mean mass of the ZPeak for your data MiniAOD?
Question 9b
How can you increase statistics in your ZPeak histogram?
Key Points
A MiniAOD file can be slimmed by just retaining physics objects of interest.
EDM standalone utilities can be used to determine the size of MiniAOD files.
FWLite is a useful tool to perform simple analysis on a MiniAOD file.