{
"cells": [
{
"cell_type": "markdown",
"id": "09e245aa",
"metadata": {},
"source": [
"# Pandas DataFrame Examples with Plots\n",
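"Pandas is a popular data manipulation library for Python. This notebook walks through creating, inspecting, summarizing, and saving DataFrames, then plotting their contents with Matplotlib."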
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9db67c57",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "b333e49e",
"metadata": {},
"source": [
"## Creating and Exploring Pandas DataFrames"
]
},
{
"cell_type": "markdown",
"id": "9a73534e",
"metadata": {},
"source": [
"The primary data structure in Pandas is the DataFrame, which represents tabular data with labeled rows and columns. You can create a DataFrame from many sources, such as CSV files, Excel files, SQL databases, or Python lists and dictionaries.\n",
"Here's an example of creating a DataFrame from a Python dictionary, where each key becomes a column name and each list holds that column's values:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8bd230c2",
"metadata": {},
"outputs": [],
"source": [
"data = {'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],\n",
" 'Age': [25, 30, 35, 42, 51, 60],\n",
" 'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']}\n",
"df = pd.DataFrame(data)"
]
},
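{
"cell_type": "markdown",
"id": "7d1e2a01",
"metadata": {},
"source": [
"The same kind of table can also be built from a list of rows or from a list of per-row dictionaries. The sketch below shows both forms using two of the rows above; the names `rows`, `records`, `from_rows` and `from_records` are illustrative only. In both cases Pandas assigns a default integer index (0, 1, ...)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d1e2a02",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# From a list of rows: each inner list is one row; column names are given separately\n",
"rows = [['Alice', 25, 'New York'],\n",
"        ['Bob', 30, 'London']]\n",
"from_rows = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])\n",
"\n",
"# From a list of dictionaries: one dict per row; the keys become column names\n",
"records = [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},\n",
"           {'Name': 'Bob', 'Age': 30, 'City': 'London'}]\n",
"from_records = pd.DataFrame(records)\n",
"\n",
"# Both approaches produce the same DataFrame\n",
"from_rows.equals(from_records)"
]
},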
{
"cell_type": "markdown",
"id": "4dec4a08",
"metadata": {},
"source": [
"## Viewing the data"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9fcd1e41",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Name Age City\n",
"0 Alice 25 New York\n",
"1 Bob 30 London\n",
"2 Charlie 35 Paris\n",
"3 Tom 42 New York\n",
"4 Eve 51 London"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Returns the first rows of the DataFrame (5 by default).\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "04dbe70c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Name Age City\n",
"3 Tom 42 New York\n",
"4 Eve 51 London\n",
"5 Frank 60 Tokyo"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Returns the last n rows of the DataFrame (here n=3; the default is 5).\n",
"df.tail(3)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3cc77f8f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(6, 3)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Returns the dimensions of the DataFrame (number of rows, number of columns).\n",
"df.shape \n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f9ad1ec3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Name', 'Age', 'City'], dtype='object')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Returns the column names of the DataFrame.\n",
"df.columns "
]
},
{
"cell_type": "markdown",
"id": "dbc7451a",
"metadata": {},
"source": [
"## Accessing and selecting data"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "44477572",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 Alice\n",
"1 Bob\n",
"2 Charlie\n",
"3 Tom\n",
"4 Eve\n",
"5 Frank\n",
"Name: Name, dtype: object"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Accesses a specific column by name.\n",
"# df['ColumnName'] \n",
"\n",
"df['Name'] "
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b8d8336a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Alice'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Accesses a single element by row label (index value) and column name.\n",
"# df.loc[row_label, column_name]\n",
"\n",
"df.loc[0, 'Name'] "
]
},
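{
"cell_type": "markdown",
"id": "7d1e2a03",
"metadata": {},
"source": [
"`loc` selects rows by index *label*. In `df` the labels happen to be 0 to 5, the same as the row positions, so the difference is invisible. The sketch below uses a small, hypothetical DataFrame (`people`) with letter labels to contrast `loc` (labels) with `iloc` (integer positions)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d1e2a04",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical example: index labels ('a', 'b', 'c') differ from positions (0, 1, 2)\n",
"people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],\n",
"                       'Age': [25, 30, 35]},\n",
"                      index=['a', 'b', 'c'])\n",
"\n",
"by_label = people.loc['b', 'Name']    # row labelled 'b', column 'Name'\n",
"by_position = people.iloc[1, 0]       # second row, first column\n",
"subset = people.loc[['a', 'c'], ['Name', 'Age']]  # several rows and columns at once\n",
"\n",
"print(by_label, by_position)\n",
"subset"
]
},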
{
"cell_type": "markdown",
"id": "d69a33a5",
"metadata": {},
"source": [
"## Data summary"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "eb0ec27e",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
" Age\n",
"count 6.000000\n",
"mean 40.500000\n",
"std 13.217413\n",
"min 25.000000\n",
"25% 31.250000\n",
"50% 38.500000\n",
"75% 48.750000\n",
"max 60.000000"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Summary statistics of the numeric columns (here only Age)\n",
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "2fcbc7d2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Alice 1\n",
"Bob 1\n",
"Charlie 1\n",
"Tom 1\n",
"Eve 1\n",
"Frank 1\n",
"Name: Name, dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Counts how often each distinct value occurs in a column\n",
"# df['ColumnName'].value_counts()\n",
"\n",
"df['Name'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "8cb0e141",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"City\n",
"London 40.5\n",
"New York 33.5\n",
"Paris 35.0\n",
"Tokyo 60.0\n",
"Name: Age, dtype: float64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Groups rows by City and calculates the mean Age within each group\n",
"# df.groupby('GroupColumn')['ValueColumn'].mean()\n",
"\n",
"df.groupby('City')['Age'].mean()"
]
},
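{
"cell_type": "markdown",
"id": "7d1e2a05",
"metadata": {},
"source": [
"To compute several statistics per group at once, `agg` accepts a list of aggregation names. A sketch that rebuilds the same people data under the illustrative name `people`, so that `df` is left unchanged:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d1e2a06",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],\n",
"                       'Age': [25, 30, 35, 42, 51, 60],\n",
"                       'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']})\n",
"\n",
"# One row per City, one column per statistic of Age\n",
"age_stats = people.groupby('City')['Age'].agg(['mean', 'min', 'max', 'count'])\n",
"age_stats"
]
},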
{
"cell_type": "markdown",
"id": "0f10db66",
"metadata": {},
"source": [
"## Data manipulation"
]
},
{
"cell_type": "markdown",
"id": "ba9bef94",
"metadata": {},
"source": [
"Common operations for changing a DataFrame:\n",
"\n",
"- `df['NewColumn'] = ...` creates a new column from existing data or a calculation.\n",
"- `df.drop('ColumnName', axis=1)` returns a copy of the DataFrame without that column.\n",
"- `df.sort_values('ColumnName')` returns a copy sorted by that column.\n",
"- `df.fillna(value)` returns a copy with missing values replaced by `value`."
]
},
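{
"cell_type": "markdown",
"id": "7d1e2a07",
"metadata": {},
"source": [
"A runnable sketch of these four operations on a small, illustrative DataFrame (`people`), with one missing age added on purpose so that `fillna` has something to fill. `drop`, `sort_values` and `fillna` return new DataFrames and leave the original unchanged unless you assign the result back."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d1e2a08",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],\n",
"                       'Age': [25, 30, np.nan],  # Charlie's age is missing\n",
"                       'City': ['New York', 'London', 'Paris']})\n",
"\n",
"# New column computed from an existing one\n",
"people['AgeIn10Years'] = people['Age'] + 10\n",
"\n",
"# Each of these returns a new DataFrame; `people` itself is not modified\n",
"without_city = people.drop('City', axis=1)\n",
"by_age = people.sort_values('Age', ascending=False)  # missing values go last by default\n",
"filled = people.fillna(0)\n",
"\n",
"print(without_city.columns.tolist())\n",
"print(by_age['Name'].tolist())\n",
"print(filled['Age'].tolist())"
]
},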
{
"cell_type": "markdown",
"id": "89793f9a",
"metadata": {},
"source": [
"## Reading and Writing Data"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "1645ed3e",
"metadata": {},
"outputs": [],
"source": [
"# Write the DataFrame to a CSV file (by default the row index is written as the first column)\n",
"df.to_csv(\"myfile.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "06eadc0a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Unnamed: 0 Name Age City\n",
"0 0 Alice 25 New York\n",
"1 1 Bob 30 London\n",
"2 2 Charlie 35 Paris\n",
"3 3 Tom 42 New York\n",
"4 4 Eve 51 London"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read the CSV file back into a DataFrame; the saved index shows up as the 'Unnamed: 0' column\n",
"df2 = pd.read_csv(\"myfile.csv\")\n",
"df2.head()"
]
},
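{
"cell_type": "markdown",
"id": "7d1e2a09",
"metadata": {},
"source": [
"The `Unnamed: 0` column above is the row index that `to_csv` wrote out by default. Two common ways to avoid it are sketched below: leave the index out with `index=False`, or tell `read_csv` which column holds it with `index_col=0`. The sketch writes to a temporary directory (the file name `people.csv` is arbitrary) so that it does not overwrite `myfile.csv`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d1e2a10",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import tempfile\n",
"\n",
"import pandas as pd\n",
"\n",
"people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],\n",
"                       'Age': [25, 30, 35],\n",
"                       'City': ['New York', 'London', 'Paris']})\n",
"\n",
"with tempfile.TemporaryDirectory() as tmpdir:\n",
"    path = os.path.join(tmpdir, 'people.csv')\n",
"\n",
"    # Option 1: leave the index out of the file\n",
"    people.to_csv(path, index=False)\n",
"    no_index = pd.read_csv(path)\n",
"\n",
"    # Option 2: write the index, then read column 0 back as the index\n",
"    people.to_csv(path)\n",
"    with_index = pd.read_csv(path, index_col=0)\n",
"\n",
"print(no_index.columns.tolist())\n",
"print(with_index.columns.tolist())"
]
},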
{
"cell_type": "markdown",
"id": "e15dcf66",
"metadata": {},
"source": [
"# Plotting examples\n",
"Pandas integrates with Matplotlib, a popular Python plotting library: the `plot` methods of DataFrames and Series draw their charts with Matplotlib. The examples below use both Pandas and Matplotlib directly."
]
},
{
"cell_type": "markdown",
"id": "ac20da12",
"metadata": {},
"source": [
"## Line Plot\n",
"You can create a line plot either with Matplotlib's `plt.plot`, passing DataFrame columns as the x and y data, or with the DataFrame's own `plot` method. Here's an example that uses `plt.plot` to visualize the trend of a numerical column over time:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "7b3aacc2",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Use larger fonts and a larger default figure size for all plots in this session\n",
"plt.rcParams.update({'font.size': 20, 'figure.figsize': [14, 10]})\n",
"\n",
"# Create a DataFrame with time-series data\n",
"data = {'Date': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01'],\n",
" 'Value': [10, 20, 15, 25]}\n",
"df = pd.DataFrame(data)\n",
"\n",
"# Convert the 'Date' column to datetime type\n",
"df['Date'] = pd.to_datetime(df['Date'])\n",
"\n",
"# Plot the line graph\n",
"plt.plot(df['Date'], df['Value'])\n",
"plt.xlabel('Date')\n",
"plt.ylabel('Value')\n",
"plt.title('Line Plot')\n",
"plt.show()\n"
]
},
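{
"cell_type": "markdown",
"id": "7d1e2a11",
"metadata": {},
"source": [
"The same chart can be drawn with the DataFrame's own `plot` method, which creates a line chart by default and returns the Matplotlib `Axes` it drew on. A sketch, using the illustrative name `ts` for a copy of the time-series data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d1e2a12",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"ts = pd.DataFrame({'Date': pd.to_datetime(['2022-01-01', '2022-02-01',\n",
"                                           '2022-03-01', '2022-04-01']),\n",
"                   'Value': [10, 20, 15, 25]})\n",
"\n",
"# kind='line' is the default; x and y name the columns to plot\n",
"ax = ts.plot(x='Date', y='Value', legend=False, title='Line Plot')\n",
"ax.set_ylabel('Value')\n",
"plt.show()"
]
},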
{
"cell_type": "markdown",
"id": "23f9c3a2",
"metadata": {},
"source": [
"## Bar Plot\n",
"You can draw a bar chart with Matplotlib's `plt.bar` or with the DataFrame's `plot.bar` method. Here's an example that uses `plt.bar` to compare values across categories:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "23944375",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Create a DataFrame with categorical data\n",
"data = {'Category': ['A', 'B', 'C', 'D'],\n",
" 'Value': [10, 20, 15, 25]}\n",
"df = pd.DataFrame(data)\n",
"\n",
"# Plot the bar chart\n",
"plt.bar(df['Category'], df['Value'])\n",
"plt.xlabel('Category')\n",
"plt.ylabel('Value')\n",
"plt.title('Bar Plot')\n",
"plt.show()"
]
},
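{
"cell_type": "markdown",
"id": "7d1e2a13",
"metadata": {},
"source": [
"With the DataFrame's `plot.bar` method, the `x` column supplies the bar labels and the `y` column the bar heights. A sketch, using the illustrative name `cats` for a copy of the categorical data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d1e2a14",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"cats = pd.DataFrame({'Category': ['A', 'B', 'C', 'D'],\n",
"                     'Value': [10, 20, 15, 25]})\n",
"\n",
"# One bar per row; rot=0 keeps the category labels horizontal\n",
"ax = cats.plot.bar(x='Category', y='Value', legend=False, rot=0, title='Bar Plot')\n",
"ax.set_ylabel('Value')\n",
"plt.show()"
]
},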
{
"cell_type": "markdown",
"id": "2577cf67",
"metadata": {},
"source": [
"## Histogram\n",
"Pandas provides a simple way to create histograms with the `plot.hist` method of a Series (a single DataFrame column). Here's an example of plotting a histogram to visualize the distribution of a numerical column:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "9fa7fb57",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Create a DataFrame with numerical data\n",
"data = {'Value': [10, 20, 15, 25, 30, 40, 35, 50]}\n",
"df = pd.DataFrame(data)\n",
"\n",
"# Plot the histogram\n",
"df['Value'].plot.hist(bins=10) # 'bins' specifies the number of bins or intervals\n",
"plt.xlabel('Value')\n",
"plt.ylabel('Frequency')\n",
"plt.title('Histogram')\n",
"plt.show()\n"
]
},
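{
"cell_type": "markdown",
"id": "7d1e2a15",
"metadata": {},
"source": [
"To see how `bins=10` splits the data, you can compute the bin edges and counts without drawing anything. Matplotlib's `hist`, which `plot.hist` calls, bins the data with NumPy's `np.histogram`, so this sketch uses that function directly; the names `values`, `counts` and `edges` are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d1e2a16",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"values = pd.Series([10, 20, 15, 25, 30, 40, 35, 50], name='Value')\n",
"\n",
"# 10 equal-width bins spanning the minimum (10) to the maximum (50)\n",
"counts, edges = np.histogram(values, bins=10)\n",
"\n",
"# Each bin includes its left edge; only the last bin also includes its right edge\n",
"for left, right, n in zip(edges[:-1], edges[1:], counts):\n",
"    print(f'{left:4.0f} to {right:4.0f}: {n}')"
]
},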
{
"cell_type": "code",
"execution_count": null,
"id": "ffeb79fd",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}