
Pandas DataFrame Examples with Plots

Pandas is a popular Python library for manipulating and analyzing tabular data.

In [1]:
import pandas as pd

Creating and Exploring Pandas DataFrames

The primary data structure in Pandas is the DataFrame, which represents tabular data. You can create a DataFrame from various data sources, such as CSV files, Excel files, SQL databases, or by using Python lists or dictionaries.
Here's an example of creating a DataFrame from a Python dictionary:

In [2]:
data = {'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],
        'Age': [25, 30, 35, 42, 51, 60],
        'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']}
df = pd.DataFrame(data)
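
The text above also mentions Python lists as a data source. As a minimal sketch, the cell below builds DataFrames from a list of dictionaries and from a list of lists. The variable names (records, rows, df_records, df_rows) are illustrative and are not used elsewhere in this notebook.

```python
import pandas as pd

# Each dictionary is one row; the keys become the column names.
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'London'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Paris'},
]
df_records = pd.DataFrame(records)

# A list of lists carries no column names, so they are passed explicitly.
rows = [['Tom', 42, 'New York'], ['Eve', 51, 'London']]
df_rows = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])
```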

Viewing the data

In [3]:
df.head()  # Returns the first 5 rows of the DataFrame by default.
Out[3]:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
3      Tom   42  New York
4      Eve   51    London
In [4]:
# Returns the last 3 rows of the DataFrame (without an argument, the default is 5).
df.tail(3)
Out[4]:
    Name  Age      City
3    Tom   42  New York
4    Eve   51    London
5  Frank   60     Tokyo
In [5]:
# Returns the dimensions of the DataFrame (number of rows, number of columns).
df.shape
Out[5]:
(6, 3)
In [6]:
# Returns the column names of the DataFrame.
df.columns
Out[6]:
Index(['Name', 'Age', 'City'], dtype='object')

Accessing and selecting data

In [7]:
# Accesses a specific column by name.
# df['ColumnName'] 

df['Name']
Out[7]:
0      Alice
1        Bob
2    Charlie
3        Tom
4        Eve
5      Frank
Name: Name, dtype: object
In [8]:
# Accesses a single element by row label and column name.
# With the default index, the row labels are the integers 0, 1, 2, ...
# df.loc[row_label, column_name]

df.loc[0, 'Name']
Out[8]:
'Alice'
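
Beyond single columns and single elements, a few other selection patterns come up often. The cell below is a minimal sketch that rebuilds the same example DataFrame so it can run on its own. The result names (subset, first_age, over_40, londoners_over_40) are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],
                   'Age': [25, 30, 35, 42, 51, 60],
                   'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']})

# Select several columns at once by passing a list of names.
subset = df[['Name', 'City']]

# Select by integer position with iloc: first row, second column.
first_age = df.iloc[0, 1]

# Keep only the rows where a condition is True.
over_40 = df[df['Age'] > 40]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
londoners_over_40 = df[(df['Age'] > 40) & (df['City'] == 'London')]
```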

Data summary

In [9]:
df.describe()
Out[9]:
             Age
count   6.000000
mean   40.500000
std    13.217413
min    25.000000
25%    31.250000
50%    38.500000
75%    48.750000
max    60.000000
In [10]:
# Counts unique values in a column
# df['ColumnName'].value_counts()

df['Name'].value_counts()
Out[10]:
Alice      1
Bob        1
Charlie    1
Tom        1
Eve        1
Frank      1
Name: Name, dtype: int64
In [11]:
# Groups data by a column and calculates the mean of Age for each group.
# Selecting the numeric column first avoids averaging text columns such as Name.
# df.groupby('ColumnName')['ValueColumn'].mean()

df.groupby('City')['Age'].mean()
Out[11]:
City
London      40.5
New York    33.5
Paris       35.0
Tokyo       60.0
Name: Age, dtype: float64
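
A grouped column can also be summarized with several statistics in one call. The cell below is a minimal sketch using agg with a list of function names on the same example data. The names people, summary and city_counts are illustrative.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],
                       'Age': [25, 30, 35, 42, 51, 60],
                       'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']})

# One row per city, one column per statistic of Age.
summary = people.groupby('City')['Age'].agg(['count', 'mean', 'max'])

# value_counts is more informative on a column with repeated values.
city_counts = people['City'].value_counts()
```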

Data manipulation

df['NewColumn'] = ... - Creates a new column from existing data or calculations (this modifies df in place).
df.drop('ColumnName', axis=1) - Returns a new DataFrame without the given column; df itself is unchanged.
df.sort_values('ColumnName') - Returns a new DataFrame sorted by a column, in ascending order by default.
df.fillna(value) - Returns a new DataFrame with missing values replaced by the specified value.
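
The four operations listed above can be sketched on a small DataFrame as follows. The data (including the missing Age value) and the names people, AgeInMonths, without_city and oldest_first are made up for illustration.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                       'Age': [25, np.nan, 35],
                       'City': ['New York', 'London', 'Paris']})

# fillna returns a new object, so assign it back to keep the result.
# Here the missing Age is replaced by the mean of the known ages.
people['Age'] = people['Age'].fillna(people['Age'].mean())

# Create a new column computed from an existing one.
people['AgeInMonths'] = people['Age'] * 12

# drop and sort_values return new DataFrames and leave `people` unchanged.
without_city = people.drop('City', axis=1)
oldest_first = people.sort_values('Age', ascending=False)
```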

Reading and Writing Data

In [12]:
# Write the DataFrame to a CSV file. The row index is written too, as an unnamed first column.
df.to_csv("myfile.csv")
In [13]:
# Read a CSV file into a Pandas DataFrame. The saved index comes back as the 'Unnamed: 0' column.
df2 = pd.read_csv("myfile.csv")
df2.head()
Out[13]:
   Unnamed: 0     Name  Age      City
0           0    Alice   25  New York
1           1      Bob   30    London
2           2  Charlie   35     Paris
3           3      Tom   42  New York
4           4      Eve   51    London
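
The extra 'Unnamed: 0' column is the row index that to_csv saved alongside the data. As a minimal sketch, there are two common ways to avoid it: skip the index when writing, or tell read_csv that the first column is the index. The temporary directory and file names below are illustrative, chosen so that myfile.csv is not overwritten.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Tom', 'Eve', 'Frank'],
                   'Age': [25, 30, 35, 42, 51, 60],
                   'City': ['New York', 'London', 'Paris', 'New York', 'London', 'Tokyo']})

tmp_dir = tempfile.mkdtemp()

# Option 1: do not write the index at all.
path_no_index = os.path.join(tmp_dir, 'no_index.csv')
df.to_csv(path_no_index, index=False)
df_no_index = pd.read_csv(path_no_index)

# Option 2: write the index, then read the first column back as the index.
path_with_index = os.path.join(tmp_dir, 'with_index.csv')
df.to_csv(path_with_index)
df_index_col = pd.read_csv(path_with_index, index_col=0)
```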

Plotting examples

Pandas integrates conveniently with Matplotlib, a popular plotting library in Python. Here are a few examples of plotting data held in Pandas DataFrames:

Line Plot

You can create a line plot either with the plot method of a DataFrame or by passing DataFrame columns to Matplotlib directly. The example below uses Matplotlib's plt.plot to visualize the trend of a numerical column over time:

In [17]:
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams.update({'font.size': 20, 'figure.figsize': [14,10]})

# Create a DataFrame with time-series data
data = {'Date': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01'],
        'Value': [10, 20, 15, 25]}
df = pd.DataFrame(data)

# Convert the 'Date' column to datetime type
df['Date'] = pd.to_datetime(df['Date'])

# Plot the line graph
plt.plot(df['Date'], df['Value'])
plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Line Plot')
plt.show()
[Figure: line plot titled 'Line Plot', with Date on the x-axis and Value on the y-axis]
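
The same chart can also be drawn with the DataFrame's own plot method, which calls Matplotlib internally and returns the Axes object. This is a minimal sketch on the same data; the variable names ts and ax are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

ts = pd.DataFrame({'Date': pd.to_datetime(['2022-01-01', '2022-02-01',
                                           '2022-03-01', '2022-04-01']),
                   'Value': [10, 20, 15, 25]})

# x and y name the columns to plot; the returned Axes can be customized further.
ax = ts.plot(x='Date', y='Value', kind='line', title='Line Plot', legend=False)
ax.set_xlabel('Date')
ax.set_ylabel('Value')
plt.show()
```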

Bar Plot

Bar plots can be created with the plot method of a DataFrame or with Matplotlib directly. The example below passes DataFrame columns to Matplotlib's plt.bar to compare values across categories:

In [18]:
import pandas as pd
import matplotlib.pyplot as plt

# Create a DataFrame with categorical data
data = {'Category': ['A', 'B', 'C', 'D'],
        'Value': [10, 20, 15, 25]}
df = pd.DataFrame(data)

# Plot the bar chart
plt.bar(df['Category'], df['Value'])
plt.xlabel('Category')
plt.ylabel('Value')
plt.title('Bar Plot')
plt.show()
[Figure: bar chart titled 'Bar Plot', with Category on the x-axis and Value on the y-axis]
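
For comparison, here is a minimal sketch of the same bar chart drawn with the DataFrame's plot.bar method instead of plt.bar. The variable names categories and ax are illustrative, and rot=0 is an optional choice that keeps the category labels horizontal.

```python
import pandas as pd
import matplotlib.pyplot as plt

categories = pd.DataFrame({'Category': ['A', 'B', 'C', 'D'],
                           'Value': [10, 20, 15, 25]})

# One bar per row: x gives the labels, y gives the bar heights.
ax = categories.plot.bar(x='Category', y='Value', title='Bar Plot',
                         legend=False, rot=0)
ax.set_xlabel('Category')
ax.set_ylabel('Value')
plt.show()
```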

Histogram

Pandas provides a simple way to create histograms through the plot.hist method of a Series or DataFrame. Here's an example of plotting a histogram to visualize the distribution of a numerical column:

In [19]:
import pandas as pd
import matplotlib.pyplot as plt

# Create a DataFrame with numerical data
data = {'Value': [10, 20, 15, 25, 30, 40, 35, 50]}
df = pd.DataFrame(data)

# Plot the histogram
df['Value'].plot.hist(bins=10)  # 'bins' specifies the number of bins or intervals
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()
[Figure: histogram titled 'Histogram', with Value on the x-axis and Frequency on the y-axis]