Task 1: Create a dataset to store your machine learning models
It can be called any name to pass the test but for the remaining instructions to work use austin as the dataset name.
Go to BigQuery
Click on Project id
Create dataset named austin
Task 2: Create a forecasting BigQuery machine learning model.
Create the first ML model using a JOIN between two bike share tables. Again any names will work but keep them as ‘austin_1’ and ‘austin_2’ for the remaining instructions to work.
CREATE OR REPLACE MODEL austin.location_model
OPTIONS
(model_type='linear_reg', labels=['duration_minutes']) AS
SELECT
start_station_name,
EXTRACT(HOUR FROM start_time) AS start_hour,
EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,
duration_minutes,
address as location
FROM
`bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
JOIN
`bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations
ON
trips.start_station_name = stations.name
WHERE
EXTRACT(YEAR FROM start_time) = <Training Year>
AND duration_minutes > 0
Task 3: Create the second machine learning model.
CREATE OR REPLACE MODEL austin.subscriber_model
OPTIONS
(model_type='linear_reg', labels=['duration_minutes']) AS
SELECT
start_station_name,
EXTRACT(HOUR FROM start_time) AS start_hour,
subscriber_type,
duration_minutes
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
WHERE EXTRACT(YEAR FROM start_time) = <Training Year>
Task 4: Evaluate the two machine learning models.
Query 1
SELECT
SQRT(mean_squared_error) AS rmse,
mean_absolute_error
FROM
ML.EVALUATE(MODEL austin.location_model, (
SELECT
start_station_name,
EXTRACT(HOUR FROM start_time) AS start_hour,
EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,
duration_minutes,
address as location
FROM
bigquery-public-data.austin_bikeshare.bikeshare_trips AS trips
JOIN
bigquery-public-data.austin_bikeshare.bikeshare_stations AS stations
ON
trips.start_station_name = stations.name
WHERE EXTRACT(YEAR FROM start_time) = <Evaluation Year>)
)
QUERY 2:
SELECT
SQRT(mean_squared_error) AS rmse,
mean_absolute_error
FROM
ML.EVALUATE(MODEL austin.subscriber_model, (
SELECT
start_station_name,
EXTRACT(HOUR FROM start_time) AS start_hour,
subscriber_type,
duration_minutes
FROM
bigquery-public-data.austin_bikeshare.bikeshare_trips AS trips
WHERE
EXTRACT(YEAR FROM start_time) = <Evaluation Year>)
)
Task 5: Use the subscriber type machine learning model to predict average trip durations
Use the second model, that model (model austin_2 in this case) to to predict the average duration length of all trips from the busiest rental station in 2019 (based on the number of rentals per station in 2019) where the subscriber_type=’Single Trip’.
The following query will list busiest stations in descending order. The busiest station for 2019 was “21st & Speedway @PCL”.
SELECT
start_station_name,
COUNT(*) AS trips
FROM
bigquery-public-data.austin_bikeshare.bikeshare_trips
WHERE
EXTRACT(YEAR FROM start_time) = <Evaluation Year>
GROUP BY
start_station_name
ORDER BY
trips DESC
Then predict trip length.
SELECT AVG(predicted_duration_minutes) AS average_predicted_trip_length
FROM ML.predict(MODEL austin.subscriber_model, (
SELECT
start_station_name,
EXTRACT(HOUR FROM start_time) AS start_hour,
subscriber_type,
duration_minutes
FROM
bigquery-public-data.austin_bikeshare.bikeshare_trips
WHERE
EXTRACT(YEAR FROM start_time) = <Evaluation Year>
AND subscriber_type = 'Single Trip'
AND start_station_name = '21st & Speedway @PCL'))
Comments
Post a Comment