Create ML Models with BigQuery ML: Challenge Lab [GSP341]

Link to the Lab: https://www.cloudskillsboost.google/focuses/14294?parent=catalog

Link to the Video: https://youtu.be/iYrWLbl1Z6o

Link to the Channel: https://www.youtube.com/channel/UCiWP5KYh4MWj-yp3j-DMIrA

STEPS

Task 1: Create a dataset to store your machine learning models

It can be called any name to pass the test but for the remaining instructions to work use austin as the dataset name.

Go to BigQuery

Click on Project id

Create dataset named austin

Task 2: Create a forecasting BigQuery machine learning model.

Create the first ML model using a JOIN between two bike share tables. Again any names will work but keep them as ‘austin_1’ and ‘austin_2’ for the remaining instructions to work.

CREATE OR REPLACE MODEL austin.location_model

OPTIONS

(model_type='linear_reg', labels=['duration_minutes']) AS

SELECT

start_station_name,

EXTRACT(HOUR FROM start_time) AS start_hour,

EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,

duration_minutes,

address as location

FROM

`bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips

JOIN

`bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations

trips.start_station_name = stations.name

WHERE

EXTRACT(YEAR FROM start_time) = <Training Year>

AND duration_minutes > 0

Task 3: Create the second machine learning model.

CREATE OR REPLACE MODEL austin.subscriber_model

OPTIONS

(model_type='linear_reg', labels=['duration_minutes']) AS

SELECT

start_station_name,

EXTRACT(HOUR FROM start_time) AS start_hour,

subscriber_type,

duration_minutes

FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips

WHERE EXTRACT(YEAR FROM start_time) = <Training Year>

Task 4: Evaluate the two machine learning models.

Query 1

SELECT

SQRT(mean_squared_error) AS rmse,

mean_absolute_error

FROM

ML.EVALUATE(MODEL austin.location_model, (

SELECT

start_station_name,

EXTRACT(HOUR FROM start_time) AS start_hour,

EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,

duration_minutes,

address as location

FROM

bigquery-public-data.austin_bikeshare.bikeshare_trips AS trips

JOIN

bigquery-public-data.austin_bikeshare.bikeshare_stations AS stations

trips.start_station_name = stations.name

WHERE EXTRACT(YEAR FROM start_time) = <Evaluation Year>)

)

QUERY 2:

SELECT

SQRT(mean_squared_error) AS rmse,

mean_absolute_error

FROM

ML.EVALUATE(MODEL austin.subscriber_model, (

SELECT

start_station_name,

EXTRACT(HOUR FROM start_time) AS start_hour,

subscriber_type,

duration_minutes

FROM

bigquery-public-data.austin_bikeshare.bikeshare_trips AS trips

WHERE

EXTRACT(YEAR FROM start_time) = <Evaluation Year>)

)

Task 5: Use the subscriber type machine learning model to predict average trip durations

Use the second model, that model (model austin_2 in this case) to to predict the average duration length of all trips from the busiest rental station in 2019 (based on the number of rentals per station in 2019) where the subscriber_type=’Single Trip’.

The following query will list busiest stations in descending order. The busiest station for 2019 was “21st & Speedway @PCL”.

SELECT

start_station_name,

COUNT(*) AS trips

FROM

bigquery-public-data.austin_bikeshare.bikeshare_trips

WHERE

EXTRACT(YEAR FROM start_time) = <Evaluation Year>

GROUP BY

start_station_name

ORDER BY

trips DESC

Then predict trip length.

SELECT AVG(predicted_duration_minutes) AS average_predicted_trip_length

FROM ML.predict(MODEL austin.subscriber_model, (

SELECT

start_station_name,

EXTRACT(HOUR FROM start_time) AS start_hour,

subscriber_type,

duration_minutes

FROM

bigquery-public-data.austin_bikeshare.bikeshare_trips

WHERE

EXTRACT(YEAR FROM start_time) = <Evaluation Year>

AND subscriber_type = 'Single Trip'

AND start_station_name = '21st & Speedway @PCL'))

Congratulations You have Completed the Challenge Lab!

SAKURA SATIO

Qwiklabs Sakura Satio

Search This Blog

Create ML Models with BigQuery ML: Challenge Lab [GSP341]

Comments

Post a Comment

Popular posts from this blog

A Tour of Google Cloud Hands-on Labs [GSP282]

Insights from Data with BigQuery: Challenge Lab [GSP787]

Ensure Access & Identity in Google Cloud: Challenge Lab [GSP342]