from plotly import express as px
import numpy as np
import pandas as pd
import sqlite3
= sqlite3.connect("temps.db") # connect to database
conn = conn.cursor()
cursor = pd.read_csv("temps_stacked.csv")
df1 #df1.to_sql("temperatures", conn, index=False)
= pd.read_csv("countries.csv")
df2 #df2.to_sql("countries", conn, index=False)
= pd.read_csv("station-metadata.csv")
df3 #df3.to_sql("stations", conn, index=False)
# close connection conn.close()
In this post, we will be working with the data from National Oceanic and Atmospheric Administration (NOAA) to create interesting, interactive geographical visualizations.
1. Creating a Database
First, we will connect to a database and create three separate tables (named temperatures
, countries
, and stations
).
- The
temperatures
table will contain station ID, year of measurement, month of measurement, and average temperature. - The
countries
table will contain country names and their corresponding country codes. - The
stations
table will contain station ID, latitude, longitude, elevation, and the name of the station.
# temperatures df1.head()
ID | Year | Month | Temp | |
---|---|---|---|---|
0 | ACW00011604 | 1961 | 1 | -0.89 |
1 | ACW00011604 | 1961 | 2 | 2.36 |
2 | ACW00011604 | 1961 | 3 | 4.72 |
3 | ACW00011604 | 1961 | 4 | 7.73 |
4 | ACW00011604 | 1961 | 5 | 11.28 |
# countries df2.head()
FIPS 10-4 | ISO 3166 | Name | |
---|---|---|---|
0 | AF | AF | Afghanistan |
1 | AX | - | Akrotiri |
2 | AL | AL | Albania |
3 | AG | DZ | Algeria |
4 | AQ | AS | American Samoa |
# stations df3.head()
ID | LATITUDE | LONGITUDE | STNELEV | NAME | |
---|---|---|---|---|---|
0 | ACW00011604 | 57.7667 | 11.8667 | 18.0 | SAVE |
1 | AE000041196 | 25.3330 | 55.5170 | 34.0 | SHARJAH_INTER_AIRP |
2 | AEM00041184 | 25.6170 | 55.9330 | 31.0 | RAS_AL_KHAIMAH_INTE |
3 | AEM00041194 | 25.2550 | 55.3640 | 10.4 | DUBAI_INTL |
4 | AEM00041216 | 24.4300 | 54.4700 | 3.0 | ABU_DHABI_BATEEN_AIR |
2. Query Function
Now, we will write a query function query_climate_database()
, which filters and returns a new dataframe based on the user’s input. The user will specify the country, starting year, ending year, and the month. Then, the user will receive a new dataframe matching all the inputted information as well as the station name, latitude, longitude, and the average temperature at the station during the specific year and month.
def query_climate_database(country, year_begin, year_end, month):
"""
Query function to filter data
Args:
country: the country of interest
year_begin: earliest year for which should be returned
year_end: latest year for which should be returned
month: a specific month of the year
Return:
a dataframe containing all matches
"""
= sqlite3.connect("temps.db")
conn # use f-string
# left-join stations and countries using country codes
# left-join stations and temperatures using station IDs
= f"SELECT S.NAME, S.LATITUDE, S.LONGITUDE, C.Name Country, T.Year, T.Month, T.Temp \
cmd FROM stations S \
LEFT JOIN countries C on SUBSTRING(S.id, 1, 2) = C.'FIPS 10-4' \
LEFT JOIN temperatures t on S.id = T.id \
WHERE C.Name = '{country}' AND T.Year BETWEEN {year_begin} AND {year_end} AND T.Month = {month}"
= pd.read_sql_query(cmd, conn)
df
conn.close()return df
An example with the function is shown below with the inputs "India"
, 1980
, 2020
, and 1
.
= "India",
query_climate_database(country = 1980,
year_begin = 2020,
year_end = 1) month
NAME | LATITUDE | LONGITUDE | Country | Year | Month | Temp | |
---|---|---|---|---|---|---|---|
0 | PBO_ANANTAPUR | 14.583 | 77.633 | India | 1980 | 1 | 23.48 |
1 | PBO_ANANTAPUR | 14.583 | 77.633 | India | 1981 | 1 | 24.57 |
2 | PBO_ANANTAPUR | 14.583 | 77.633 | India | 1982 | 1 | 24.19 |
3 | PBO_ANANTAPUR | 14.583 | 77.633 | India | 1983 | 1 | 23.51 |
4 | PBO_ANANTAPUR | 14.583 | 77.633 | India | 1984 | 1 | 24.81 |
... | ... | ... | ... | ... | ... | ... | ... |
3147 | DARJEELING | 27.050 | 88.270 | India | 1983 | 1 | 5.10 |
3148 | DARJEELING | 27.050 | 88.270 | India | 1986 | 1 | 6.90 |
3149 | DARJEELING | 27.050 | 88.270 | India | 1994 | 1 | 8.10 |
3150 | DARJEELING | 27.050 | 88.270 | India | 1995 | 1 | 5.60 |
3151 | DARJEELING | 27.050 | 88.270 | India | 1997 | 1 | 5.70 |
3152 rows × 7 columns
3. Geographic Scatter Plots
We will use the above function to write a new function temperature_coefficient_plot()
, which will address the question:
How does the average yearly change in temperature vary within a given country?
The new function will accept the same four arguments as query_climate_database()
, but will have an added explicit argument min_obs
, which will be used to filter out stations that have less years worth of data than this specified number. Also, we will add **kwargs
for the user to pass onto px.scatter_mapbox()
, which is what we will use to create the final visualization.
In order to compute the estimates of yearly increase in temperature, we will use a linear regression model for each station and grab all the corresponding first coefficients.
def temperature_coefficient_plot(country, year_begin, year_end, month, min_obs, **kwargs):
"""
Create a plot containing estimates of yearly temperature increase
Args:
country: the country of interest
year_begin: earliest year for which should be returned
year_end: latest year for which should be returned
month: a specific month of the year
min_obs: minimum required number of years of data for any given station
**kwargs: additional keyword arguments
Return:
a scatter_mapbox plot
"""
# pull the correct data using the query function
= query_climate_database(country = country, year_begin = year_begin,
df = year_end, month = month)
year_end
# filter out stations based on min_obs argument
= df.groupby('NAME').filter(lambda x : x['Year'].count() >= min_obs)
df
# fit the linear model for each station
= px.get_trendline_results(px.scatter(df, x="Year", y="Temp", facet_col="NAME",
lin_model ="ols", facet_col_wrap = 12, facet_row_spacing = 0.001))
trendline= lin_model["px_fit_results"]
result = [list(result.iloc[i].params[[1]]) for i in range(len(result))]
coef = list(np.round(np.concatenate(coef), 3)) # round the results
coef = dict(zip(df["NAME"].unique(),coef))
diff "Estimated Yearly Increase"] = df["NAME"].map(diff) # map the results onto a new column
df[
= {
months 1: "January", 2: "February", 3: "March", 4: "April",
5: "May", 6: "June", 7: "July", 8: "August",
9: "September", 10: "October", 11: "November", 12: "December"
}
# create plot
= px.scatter_mapbox(df, lon = "LONGITUDE", lat = "LATITUDE", color = "Estimated Yearly Increase",
fig = 0, hover_name = "NAME", height = 600,
color_continuous_midpoint = ["LATITUDE", "LONGITUDE", "Estimated Yearly Increase"],
hover_data = "Estimates of Yearly Increase in Temperature in " + months.get(month)
title + "<br>for stations in " + str(country) + ", years " + str(year_begin) + " - " + str(year_end),
**kwargs
)= 'Estimated Yearly <br> Increase (°C)'
fig.layout.coloraxis.colorbar.title =dict(l=50, r=50, t=100, b=50))
fig.update_layout(marginreturn fig
Let’s demonstrate the above function! We will again use the inputs "India"
, 1980
, 2020
, and 1
from Part 2, and add on 10
for the argument min_obs
. We will also add extra arguments to pass onto px.scatter_mapbox()
. If everything goes right, the resulting visualization should contain all stations in India with at least 10 years worth of data between the years 1980 and 2020 for the month of January.
The station data will then be plotted in the form of a scatter plot on a world map. Each data point will contain information about the corresponding station’s estimate of yearly temperature increase, which can be explicitly found by hovering over the data point or estimated by looking at its color.
= px.colors.diverging.RdGy_r # produce a color map
color_map = temperature_coefficient_plot("India", 1980, 2020, 1, 10,
fig = 2,
zoom ="carto-positron",
mapbox_style= color_map)
color_continuous_scale fig.show()
Cool! The above gives us a neat visualization created with Plotly Express that answers our central question. Let’s try passing on different arguments.
= px.colors.diverging.RdGy_r
color_map = temperature_coefficient_plot("France", 1993, 2012, 5, 5,
fig = 2,
zoom ="carto-positron",
mapbox_style= color_map)
color_continuous_scale fig.show()