{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true, "heading_collapsed": true }, "source": [ "# Measuring Matter Antimatter Asymmetries at the Large Hadron Collider" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "![](http://lhcb-public.web.cern.ch/lhcb-public/en/LHCb-outreach/multimedia/LHCbDetectorpnglight1.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Apache Spark and High Energy Physics Data Analysis\n", "## An example using LHCb open data\n", "\n", "This notebook is an example of how to use Spark to perform a simple analysis using high energy physics data from a LHC experiment. \n", "The exercises, figures and data are from original work developed and published by the **LHCb collaboration** as part of the **opendata** and outreach efforts (see credits below). \n", "**Prerequisites** - This work is intended to be accessible to an audience with some familiarity with data analysis in Python and an interest in particle Physics at undergraduate level. 
\n", "**Technology** - The focus of this notebook is as much on tools and techniques as it is on physics: **Apache Spark** is used for reading and analyzing high energy physics (HEP) data using Python with Pandas and Jupyter notebooks.\n", "\n", "**Credits:**\n", " * The original text of this notebook, including all exercises, analysis, explanations and data have been developed by the LHCb collaboration and are authored and shared by the LHCb collaboration in their opendata project at: \n", " * https://github.com/lhcb/opendata-project/blob/master/LHCb_Open_Data_Project.ipynb\n", " * \"Undergraduate Laboratory Experiment: Measuring Matter Antimatter Asymmetries at the Large Hadron Collide\" https://cds.cern.ch/record/1994172?ln=en\n", " * http://www.hep.manchester.ac.uk/u/parkes/LHCbAntimatterProjectWeb/LHCb_Matter_Antimatter_Asymmetries/Homepage.html \n", "\n", " * The Spark code in this notebook has been developed in the context of the CERN Hadoop and Spark service. \n", "Contact email: Luca.Canali@cern.ch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Get the data\n", "\n", "The original work uses LHCb opendata made available via the CERN opendata portal:\n", " [PhaseSpaceSimulation.root](http://opendata.cern.ch/eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root),\n", " [B2HHH_MagnetDown.root](http://opendata.cern.ch/eos/opendata/lhcb/AntimatterMatters2017/data/B2HHH_MagnetDown.root)\n", " [B2HHH_MagnetUp.root](http://opendata.cern.ch/eos/opendata/lhcb/AntimatterMatters2017/data/B2HHH_MagnetUp.root)\n", "\n", "This notebook uses a version of the data converted to Apache Parquet format and made available in CERN EOS" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Start the Spark Session\n", "# When Using Spark on CERN SWAN, use this and do not select to connect to a CERN Spark cluster\n", "# If you want to use a cluster, please copy the data to a cluster filesystem first\n", 
"\n", "from pyspark.sql import SparkSession\n", "spark = (SparkSession.builder\n", " .appName(\"LHCb opendata\")\n", " .master(\"local[*]\")\n", " .config(\"spark.driver.memory\", \"2g\")\n", " .getOrCreate()\n", " )\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+------------+\n", "|Hello World!|\n", "+------------+\n", "|Hello World!|\n", "+------------+\n", "\n" ] } ], "source": [ "# Test that Spark SQL works\n", "\n", "sql = spark.sql\n", "sql(\"select 'Hello World!'\").show()" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "# Getting Started\n", "\n", "Note: the original text of this exercise in the form relased by LHCb can be found at https://github.com/lhcb/opendata-project\n", "____" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ " Welcome to the first guided LHCb Open Data Portal project! \n", "\n", "
\n", " | B_FlightDistance | \n", "B_VertexChi2 | \n", "H1_PX | \n", "H1_PY | \n", "H1_PZ | \n", "H1_ProbK | \n", "H1_ProbPi | \n", "H1_Charge | \n", "H1_IPChi2 | \n", "H1_isMuon | \n", "... | \n", "H2_IPChi2 | \n", "H2_isMuon | \n", "H3_PX | \n", "H3_PY | \n", "H3_PZ | \n", "H3_ProbK | \n", "H3_ProbPi | \n", "H3_Charge | \n", "H3_IPChi2 | \n", "H3_isMuon | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "0.0 | \n", "1.0 | \n", "3551.84 | \n", "1636.96 | \n", "23904.14 | \n", "1.0 | \n", "0.0 | \n", "-1 | \n", "1.0 | \n", "0 | \n", "... | \n", "1.0 | \n", "0 | \n", "36100.40 | \n", "16546.83 | \n", "295600.61 | \n", "1.0 | \n", "0.0 | \n", "-1 | \n", "1.0 | \n", "0 | \n", "
1 | \n", "0.0 | \n", "1.0 | \n", "-2525.98 | \n", "-5284.05 | \n", "35822.00 | \n", "1.0 | \n", "0.0 | \n", "1 | \n", "1.0 | \n", "0 | \n", "... | \n", "1.0 | \n", "0 | \n", "-8648.32 | \n", "-16617.56 | \n", "98535.13 | \n", "1.0 | \n", "0.0 | \n", "-1 | \n", "1.0 | \n", "0 | \n", "
2 | \n", "0.0 | \n", "1.0 | \n", "-700.67 | \n", "1299.73 | \n", "8127.76 | \n", "1.0 | \n", "0.0 | \n", "-1 | \n", "1.0 | \n", "0 | \n", "... | \n", "1.0 | \n", "0 | \n", "-13483.34 | \n", "10860.77 | \n", "79787.59 | \n", "1.0 | \n", "0.0 | \n", "1 | \n", "1.0 | \n", "0 | \n", "
3 | \n", "0.0 | \n", "1.0 | \n", "3364.63 | \n", "1397.30 | \n", "222815.29 | \n", "1.0 | \n", "0.0 | \n", "1 | \n", "1.0 | \n", "0 | \n", "... | \n", "1.0 | \n", "0 | \n", "1925.16 | \n", "-551.12 | \n", "40420.96 | \n", "1.0 | \n", "0.0 | \n", "1 | \n", "1.0 | \n", "0 | \n", "
4 | \n", "0.0 | \n", "1.0 | \n", "-581.66 | \n", "-1305.24 | \n", "22249.59 | \n", "1.0 | \n", "0.0 | \n", "-1 | \n", "1.0 | \n", "0 | \n", "... | \n", "1.0 | \n", "0 | \n", "-2820.04 | \n", "-8305.43 | \n", "250130.00 | \n", "1.0 | \n", "0.0 | \n", "-1 | \n", "1.0 | \n", "0 | \n", "
5 | \n", "0.0 | \n", "1.0 | \n", "112.84 | \n", "-13297.98 | \n", "51882.87 | \n", "1.0 | \n", "0.0 | \n", "1 | \n", "1.0 | \n", "0 | \n", "... | \n", "1.0 | \n", "0 | \n", "-440.95 | \n", "-13699.42 | \n", "71163.14 | \n", "1.0 | \n", "0.0 | \n", "-1 | \n", "1.0 | \n", "0 | \n", "
6 | \n", "0.0 | \n", "1.0 | \n", "5558.97 | \n", "3913.52 | \n", "56981.08 | \n", "1.0 | \n", "0.0 | \n", "-1 | \n", "1.0 | \n", "0 | \n", "... | \n", "1.0 | \n", "0 | \n", "3457.70 | \n", "780.13 | \n", "28716.94 | \n", "1.0 | \n", "0.0 | \n", "1 | \n", "1.0 | \n", "0 | \n", "
7 | \n", "0.0 | \n", "1.0 | \n", "-15208.03 | \n", "-1783.93 | \n", "265210.55 | \n", "1.0 | \n", "0.0 | \n", "1 | \n", "1.0 | \n", "0 | \n", "... | \n", "1.0 | \n", "0 | \n", "-4478.67 | \n", "-164.39 | \n", "71498.09 | \n", "1.0 | \n", "0.0 | \n", "1 | \n", "1.0 | \n", "0 | \n", "
8 | \n", "0.0 | \n", "1.0 | \n", "-109.04 | \n", "8239.25 | \n", "191486.94 | \n", "1.0 | \n", "0.0 | \n", "-1 | \n", "1.0 | \n", "0 | \n", "... | \n", "1.0 | \n", "0 | \n", "-2083.59 | \n", "11359.35 | \n", "192297.67 | \n", "1.0 | \n", "0.0 | \n", "-1 | \n", "1.0 | \n", "0 | \n", "
9 | \n", "0.0 | \n", "1.0 | \n", "15175.26 | \n", "93142.09 | \n", "379269.30 | \n", "1.0 | \n", "0.0 | \n", "1 | \n", "1.0 | \n", "0 | \n", "... | \n", "1.0 | \n", "0 | \n", "3295.84 | \n", "24950.02 | \n", "105990.48 | \n", "1.0 | \n", "0.0 | \n", "-1 | \n", "1.0 | \n", "0 | \n", "
10 rows × 26 columns
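
As a small worked example of reading such a row, the sketch below computes the total and transverse momentum of the first hadron (H1) from its three momentum components in row 0 of the table above. It assumes the components are given in MeV/c and that the beam axis is along z; neither is stated in the table itself, and the function names are illustrative only.

```python
import math

# H1 momentum components copied from row 0 of the table above.
# Units are assumed to be MeV/c (not stated in the table).
h1_px, h1_py, h1_pz = 3551.84, 1636.96, 23904.14

def momentum_magnitude(px, py, pz):
    """Magnitude of the three-momentum, |p| = sqrt(px^2 + py^2 + pz^2)."""
    return math.sqrt(px * px + py * py + pz * pz)

def transverse_momentum(px, py):
    """Momentum transverse to the z axis (assumed beam direction)."""
    return math.hypot(px, py)

h1_p = momentum_magnitude(h1_px, h1_py, h1_pz)
h1_pt = transverse_momentum(h1_px, h1_py)

print(f"H1 |p| = {h1_p:.2f}")
print(f"H1 pT  = {h1_pt:.2f}")
```

The same two functions can be applied to the H2 and H3 columns of any row.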
\n", "\n", " | B_FlightDistance | \n", "B_VertexChi2 | \n", "H1_PX | \n", "H1_PY | \n", "H1_PZ | \n", "H1_ProbK | \n", "H1_ProbPi | \n", "H1_Charge | \n", "H1_IPChi2 | \n", "H1_isMuon | \n", "... | \n", "H2_IPChi2 | \n", "H2_isMuon | \n", "H3_PX | \n", "H3_PY | \n", "H3_PZ | \n", "H3_ProbK | \n", "H3_ProbPi | \n", "H3_Charge | \n", "H3_IPChi2 | \n", "H3_isMuon | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "6.888037 | \n", "8.426947 | \n", "1207.753798 | \n", "-84.290958 | \n", "10399.473702 | \n", "0.902219 | \n", "0.041574 | \n", "-1 | \n", "998.424410 | \n", "0 | \n", "... | \n", "110.519068 | \n", "0 | \n", "1973.085892 | \n", "-289.150032 | \n", "26771.341608 | \n", "0.915843 | \n", "0.057261 | \n", "1 | \n", "386.493713 | \n", "0 | \n", "
1 | \n", "8.957103 | \n", "3.474719 | \n", "-811.617861 | \n", "-518.300956 | \n", "22338.883014 | \n", "0.942885 | \n", "0.093401 | \n", "1 | \n", "162.677006 | \n", "0 | \n", "... | \n", "1994.119734 | \n", "0 | \n", "-4801.397918 | \n", "1993.340031 | \n", "76466.229808 | \n", "0.806471 | \n", "0.385119 | \n", "-1 | \n", "158.823018 | \n", "0 | \n", "
2 | \n", "9.100733 | \n", "4.113123 | \n", "-2007.925735 | \n", "-2555.080382 | \n", "26601.465958 | \n", "0.917661 | \n", "0.077237 | \n", "-1 | \n", "1355.611615 | \n", "0 | \n", "... | \n", "8500.518262 | \n", "0 | \n", "-1260.859080 | \n", "-2824.663002 | \n", "22365.178510 | \n", "0.947676 | \n", "0.097263 | \n", "1 | \n", "352.235461 | \n", "0 | \n", "
3 | \n", "11.077374 | \n", "2.360357 | \n", "1408.170513 | \n", "-1372.864558 | \n", "66357.093308 | \n", "0.785618 | \n", "0.119467 | \n", "-1 | \n", "2.799029 | \n", "0 | \n", "... | \n", "564.019813 | \n", "0 | \n", "2171.855775 | \n", "-1964.419835 | \n", "92096.742555 | \n", "0.560237 | \n", "0.070540 | \n", "1 | \n", "44.498271 | \n", "0 | \n", "
4 | \n", "17.743006 | \n", "5.116309 | \n", "1457.671574 | \n", "1311.684099 | \n", "8551.692070 | \n", "0.783945 | \n", "0.029395 | \n", "1 | \n", "18266.642863 | \n", "0 | \n", "... | \n", "1098.894225 | \n", "0 | \n", "10985.230400 | \n", "1271.856077 | \n", "62682.682662 | \n", "0.576559 | \n", "0.455894 | \n", "-1 | \n", "360.444011 | \n", "0 | \n", "
5 | \n", "11.554098 | \n", "1.120899 | \n", "-77.919930 | \n", "598.047768 | \n", "18486.604881 | \n", "0.937157 | \n", "0.227115 | \n", "1 | \n", "89.128636 | \n", "0 | \n", "... | \n", "2387.913090 | \n", "0 | \n", "-3786.001956 | \n", "-3050.190096 | \n", "49924.677288 | \n", "0.940656 | \n", "0.096659 | \n", "-1 | \n", "827.264326 | \n", "0 | \n", "
6 | \n", "8.296893 | \n", "7.380471 | \n", "970.351808 | \n", "-490.045926 | \n", "27929.242265 | \n", "0.966135 | \n", "0.095613 | \n", "1 | \n", "135.543971 | \n", "0 | \n", "... | \n", "944.174083 | \n", "0 | \n", "1618.033440 | \n", "-1593.587768 | \n", "45253.841121 | \n", "0.959964 | \n", "0.098093 | \n", "-1 | \n", "260.910241 | \n", "0 | \n", "
7 | \n", "15.875883 | \n", "1.656760 | \n", "2336.165388 | \n", "4166.002188 | \n", "35728.525679 | \n", "0.946878 | \n", "0.060513 | \n", "-1 | \n", "3321.946818 | \n", "0 | \n", "... | \n", "1942.129052 | \n", "0 | \n", "-1708.189185 | \n", "2517.048779 | \n", "27592.747481 | \n", "0.961387 | \n", "0.125535 | \n", "1 | \n", "7027.069112 | \n", "0 | \n", "
8 | \n", "12.265774 | \n", "5.394378 | \n", "2606.155962 | \n", "-4657.669797 | \n", "99824.722050 | \n", "0.755108 | \n", "0.403123 | \n", "1 | \n", "153.705895 | \n", "0 | \n", "... | \n", "4.976051 | \n", "0 | \n", "961.067631 | \n", "892.008741 | \n", "12739.165721 | \n", "0.630626 | \n", "0.030043 | \n", "-1 | \n", "4351.779561 | \n", "0 | \n", "
9 | \n", "7.960260 | \n", "9.528332 | \n", "-1740.175238 | \n", "1060.634895 | \n", "76542.224969 | \n", "0.823950 | \n", "0.096627 | \n", "-1 | \n", "28.678577 | \n", "0 | \n", "... | \n", "309.280125 | \n", "0 | \n", "-2026.920178 | \n", "887.978374 | \n", "47609.309664 | \n", "0.972340 | \n", "0.161073 | \n", "1 | \n", "193.376446 | \n", "0 | \n", "
10 rows × 26 columns
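
The momentum columns above are enough to build two-body quantities by hand. The following is a minimal sketch, not the notebook's own code: it treats H1 and H3 in row 0 as charged kaons (an assumption, using a kaon mass of 493.677 MeV/c² that does not appear in the table), sums their energies and momenta, and derives the invariant mass of the pair from E² - |p|² in natural units. The names `energy` and `pair_kinematics` are illustrative.

```python
import math

# Charged-kaon mass in MeV/c^2. This value is an assumption of the sketch;
# it is not given in the table above.
KAON_MASS_MEV = 493.677

# (PX, PY, PZ) of H1 and H3, copied from row 0 of the table above.
h1 = (1207.753798, -84.290958, 10399.473702)
h3 = (1973.085892, -289.150032, 26771.341608)

def energy(p, mass=KAON_MASS_MEV):
    """Relativistic energy E = sqrt(m^2 + |p|^2), with c = 1."""
    return math.sqrt(mass ** 2 + sum(c * c for c in p))

def pair_kinematics(pa, pb):
    """Return (summed energy, magnitude of summed momentum, invariant mass)."""
    total_e = energy(pa) + energy(pb)
    total_p = math.sqrt(sum((a + b) ** 2 for a, b in zip(pa, pb)))
    inv_mass = math.sqrt(total_e ** 2 - total_p ** 2)
    return total_e, total_p, inv_mass

energy_13, p_13, mass_13 = pair_kinematics(h1, h3)

print(f"E(H1+H3)   = {energy_13:.3f}")
print(f"|p|(H1+H3) = {p_13:.3f}")
print(f"m(H1,H3)   = {mass_13:.3f}")
```

The summed energy and momentum printed here can be compared with the `Energy_K1_K3` and `P_K1_K3` entries in row 0 of the table that follows.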
\n", "\n", " | Energy_K1_K2 | \n", "P_K1_K2 | \n", "Energy_K1_K3 | \n", "P_K1_K3 | \n", "Energy_K2_K3 | \n", "P_K2_K3 | \n", "H1_Charge | \n", "H2_Charge | \n", "H3_Charge | \n", "
---|---|---|---|---|---|---|---|---|---|
0 | \n", "59312.848237 | \n", "59211.181509 | \n", "37331.391551 | \n", "37308.534012 | \n", "75681.554420 | \n", "75585.380172 | \n", "-1 | \n", "1 | \n", "1 | \n", "
1 | \n", "40065.665041 | \n", "40022.232171 | \n", "99009.419207 | \n", "98975.411119 | \n", "94344.925710 | \n", "94244.771077 | \n", "1 | \n", "-1 | \n", "-1 | \n", "
2 | \n", "41944.817064 | \n", "41751.672744 | \n", "49387.243264 | \n", "49369.614780 | \n", "37724.526918 | \n", "37581.719924 | \n", "-1 | \n", "1 | \n", "1 | \n", "
3 | \n", "115729.095355 | \n", "115680.388609 | \n", "158532.678053 | \n", "158529.404702 | \n", "141485.642261 | \n", "141426.952947 | \n", "-1 | \n", "1 | \n", "1 | \n", "
4 | \n", "53399.569782 | \n", "53304.215413 | \n", "72440.132643 | \n", "72359.081204 | \n", "108264.666058 | \n", "108244.552904 | \n", "1 | \n", "-1 | \n", "-1 | \n", "