{ "cells": [ { "cell_type": "markdown", "id": "c97f8c4d", "metadata": {}, "source": [ "# Histogram of the Dimuon Mass Spectrum\n", "\n", "This notebook implements the dimuon mass spectrum analysis, a \"Hello World!\" example for data analysis in High Energy Physics. It is intended as a technology demonstrator for the use of Apache Spark in High Energy Physics.\n", "\n", "The workload and data:\n", " - The input data is a series of candidate muon events. \n", " - The job output is a histogram of the dimuon mass spectrum, where several peaks (resonances) can be identified corresponding to well-known particles (e.g. the Z boson at 91 GeV).\n", " - The computation is based on https://root.cern.ch/doc/master/df102__NanoAODDimuonAnalysis_8C.html and CERN open data from the CMS collaboration linked there. \n", " - See also https://github.com/LucaCanali/Miscellaneous/Spark_Physics\n", " \n", "Author and contact: Luca.Canali@cern.ch \n", "January, 2022" ] }, { "cell_type": "markdown", "id": "22d54a3d", "metadata": {}, "source": [ "## Dimuon mass spectrum calculation with Spark DataFrame API" ] }, { "cell_type": "code", "execution_count": null, "id": "be0c7ca3", "metadata": {}, "outputs": [], "source": [ "#\n", "# Local mode: run this when using CERN SWAN not connected to a cluster \n", "# or run it on a private Jupyter notebook instance\n", "# Dependency: PySpark (use SWAN or pip install pyspark)\n", "#\n", "# For CERN users: when using CERN SWAN connected to a cluster (analytix or cloud resources)\n", "# do not run this but rather click on the (star) button\n", "\n", "# Start the Spark Session\n", "from pyspark.sql import SparkSession\n", "spark = (SparkSession.builder\n", " .appName(\"dimuon mass\")\n", " .master(\"local[*]\")\n", " .config(\"spark.driver.memory\", \"2g\")\n", " .config(\"spark.sql.parquet.enableNestedColumnVectorizedReader\", \"true\")\n", " .config(\"spark.ui.showConsoleProgress\", \"false\")\n", " .getOrCreate()\n", " )" ] }, { "cell_type": "code", 
"execution_count": 2, "id": "cabed284", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
SparkSession - in-memory
SparkContext
Version: v3.3.1
Master: local[*]
AppName: dimuon mass
|  | nMuon | Muon_pt | Muon_eta | Muon_phi | Muon_mass | Muon_charge |
|---|---|---|---|---|---|---|
| 0 | 2 | [10.763696670532227, 15.736522674560547] | [1.0668272972106934, -0.563786506652832] | [-0.03427272289991379, 2.5426154136657715] | [0.10565836727619171, 0.10565836727619171] | [-1, -1] |
| 1 | 2 | [10.538490295410156, 16.327096939086914] | [-0.42778006196022034, 0.34922507405281067] | [-0.2747921049594879, 2.539781332015991] | [0.10565836727619171, 0.10565836727619171] | [1, -1] |
| 2 | 1 | [3.2753264904022217] | [2.210855484008789] | [-1.2234135866165161] | [0.10565836727619171] | [1] |
| 3 | 4 | [11.429154396057129, 17.634033203125, 9.624728... | [-1.5882395505905151, -1.7511844635009766, -1.... | [-2.0773041248321533, 0.25135836005210876, -2.... | [0.10565836727619171, 0.10565836727619171, 0.1... | [1, 1, 1, 1] |
| 4 | 4 | [3.2834417819976807, 3.64400577545166, 32.9112... | [-2.1724836826324463, -2.18253493309021, -1.12... | [-2.3700082302093506, -2.3051390647888184, -0.... | [0.10565836727619171, 0.10565836727619171, 0.1... | [-1, -1, 1, 1] |
| 5 | 3 | [3.566528081893921, 4.572504043579102, 4.37186... | [-1.371932029724121, -0.703264594078064, -1.03... | [-2.9090449810028076, 2.4552080631256104, -3.0... | [0.10565836727619171, 0.10565836727619171, 0.1... | [-1, 1, -1] |
| 6 | 2 | [57.6067008972168, 53.04507827758789] | [-0.5320892930030823, -1.0041686296463013] | [-0.07179804146289825, 3.089515209197998] | [0.10565836727619171, 0.10565836727619171] | [-1, 1] |
| 7 | 2 | [11.31967544555664, 23.906352996826172] | [-0.7716585397720337, -0.700996994972229] | [-2.2452728748321533, -2.1809616088867188] | [0.10565836727619171, 0.10565836727619171] | [1, -1] |
| 8 | 2 | [10.19356918334961, 14.204060554504395] | [0.4418068528175354, 0.7021172642707825] | [0.6778520345687866, -2.0344009399414062] | [0.10565836727619171, 0.10565836727619171] | [-1, 1] |
| 9 | 2 | [11.470704078674316, 3.4690065383911133] | [2.3417420387268066, 2.3523731231689453] | [3.1309704780578613, 3.0211737155914307] | [0.10565836727619171, 0.10565836727619171] | [-1, 1] |
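The sample rows above carry everything needed for the dimuon mass: per-muon transverse momentum (`Muon_pt`), pseudorapidity (`Muon_eta`), azimuthal angle (`Muon_phi`), mass and charge. As a minimal plain-Python sketch of the underlying physics (not the notebook's Spark implementation), the invariant mass of a muon pair can be computed by converting each muon to a Cartesian four-momentum and summing. The function names `four_momentum`, `dimuon_invariant_mass` and `is_dimuon_candidate` are hypothetical helpers introduced here for illustration. The selection of exactly two muons with opposite charge is assumed to follow the linked ROOT tutorial.

```python
import math


def four_momentum(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to a Cartesian four-momentum (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def is_dimuon_candidate(n_muon, charges):
    """Assumed selection: exactly two muons with opposite charge."""
    return n_muon == 2 and charges[0] != charges[1]


def dimuon_invariant_mass(pt, eta, phi, mass):
    """Invariant mass of the first two muons; inputs are per-event lists."""
    e1, px1, py1, pz1 = four_momentum(pt[0], eta[0], phi[0], mass[0])
    e2, px2, py2, pz2 = four_momentum(pt[1], eta[1], phi[1], mass[1])
    e = e1 + e2
    px, py, pz = px1 + px2, py1 + py2, pz1 + pz2
    # Clamp at zero to guard against tiny negative values from rounding.
    m2 = max(e * e - (px * px + py * py + pz * pz), 0.0)
    return math.sqrt(m2)


# Row 6 of the sample table: two muons with opposite charge.
row = {
    "nMuon": 2,
    "Muon_pt": [57.6067008972168, 53.04507827758789],
    "Muon_eta": [-0.5320892930030823, -1.0041686296463013],
    "Muon_phi": [-0.07179804146289825, 3.089515209197998],
    "Muon_mass": [0.10565836727619171, 0.10565836727619171],
    "Muon_charge": [-1, 1],
}

if is_dimuon_candidate(row["nMuon"], row["Muon_charge"]):
    m = dimuon_invariant_mass(
        row["Muon_pt"], row["Muon_eta"], row["Muon_phi"], row["Muon_mass"]
    )
    print(f"dimuon mass: {m:.2f} GeV")
```

Because the muon mass is small compared with the momenta involved, the result is very close to the massless approximation sqrt(2 * pt1 * pt2 * (cosh(eta1 - eta2) - cos(phi1 - phi2))). Filling a histogram with this value for all selected events gives the mass spectrum described in the introduction.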