Predicting Horse Racing Result
Using TensorFlow
LYU1603 Final Year Project Term 1 Report
CHENG Tsz Tung (1155051298)
LAU Ming Hei (1155051346)
Supervised by Prof. LYU Rung Tsong Michael


Table of Contents

  • Abstract
  • 1 Introduction
  • 1.1 Motivation
  • 1.2 Background
    • 1.2.1 Horse Racing
    • 1.2.2 Hong Kong Jockey Club
    • 1.2.3 Pari-mutuel betting
    • 1.2.4 Types of bets
  • 1.3 Objective
  • 2 Data Collection
  • 2.1 Fast Approach
  • 2.2 Web Crawling
  • 3 Data Storage
  • 3.1 Tools
    • 3.1.1 PostgreSQL
    • 3.1.2 Postico
  • 3.2 Database Structure
  • 4 More Data
  • 4.1 Extract Features
  • 5 Data Analysis
  • 5.1 Jockey
  • 5.2 Horse
  • 5.3 Trainer
  • 5.4 Draw
  • 5.5 Win odds
  • 5.6 Actual Weight
  • 5.7 Declare Weight
  • 5.8 Horse's Age
  • 5.9 Time Since Last Race
  • 5.10 Weight Difference
  • 5.11 Horse Recent Performance on the Same Track
  • 5.12 Public Intelligence
    • 5.12.1 Data Preparation
    • 5.12.2 Closing Odds Model
    • 5.12.3 Accuracy
  • 6 Elo Rating System
  • 6.1 Elo Rating System for Multiple Players
    • 6.1.1 Estimation score function
    • 6.1.2 Scoring function
    • 6.1.3 Ranking Function
  • 6.2 Compute the Elo
    • 6.2.1 Parameter Tuning
    • 6.2.2 Result
  • 7 Possible ways to model the problem
  • 7.1 Strength of a horse
  • 7.2 Probability of a horse to win the race
  • 7.3 Finishing Time
  • 7.4 Which horse will win in a race
  • 8 Data pre-process and normalization
  • 8.1 Data Filtering
  • 8.2 Data Normalization
    • 8.2.1 Real Value Data
    • 8.2.2 Categorical Data
    • 8.2.3 Crossed Categorical Data
    • 8.3.3 Categorize real value data
  • 9 Model Training
  • 9.1 Pattern Matching Model
    • 9.1.1 Build Index File
    • 9.1.2 Find similar k-races
    • 9.1.3 Prediction Race Result
    • 9.1.4 Accuracy
  • 9.2 Linear Model
    • 9.2.1 Data Preparation
    • 9.2.2 Unbalanced dataset
    • 9.2.3 Training
    • 9.2.4 Result
  • 9.3 Deep Neural Network Model
    • 9.3.1 Data Preparation
    • 9.3.2 Unbalanced dataset
    • 9.3.3 Training
    • 9.3.4 Result
  • 10 Models Evaluation
  • 11 Limitation and difficulties
  • 12 Future works
  • 13 Conclusion
  • 14 Acknowledgements
  • 15 References
  • 16 Appendix
    • 15.3 Table References

Abstract

Predicting horse racing results has been a popular topic in the machine learning field, and the possibility of earning a profit depends on how accurately the probability of each horse winning a race is estimated. Thanks to the comprehensive historical data provided by the Hong Kong Jockey Club, many experiments can be done. This report describes how we tackled the problem through a standard data mining process, starting with collecting and analyzing data. One interesting finding is that public intelligence performs reasonably well in horse racing; the objective of the project is therefore to develop a model that performs as well as, or even better than, the public intelligence. This report discusses the methods of feature selection and normalization, the reasons for proposing new features, the possible ways to train the model, the difficulties of handling an unbalanced dataset, the method of evaluating the model, and the results derived from different learning algorithms. We show that it is possible to construct a model that outperforms the public intelligence. We also show that by setting a threshold and not betting on every race, it is possible to generate a profit with both the model trained with a deep neural network and the pattern matching model.

1 Introduction

1.1 Motivation

Horse racing has been a popular topic in the machine learning field. The recent performance of deep neural networks is stunning, and many new machine learning tools have been released recently that let us apply deep learning or other machine learning algorithms easily. We therefore decided to conduct an experiment on predicting horse racing results.

1.2 Background

1.2.1 Horse Racing

Horse racing is a sport in which horses are raced at speed^1. In Hong Kong, horse racing is not purely a sport; it also has a gambling component. A race has around 8 to 14 horses, and there is only one type of race in Hong Kong, in which the fastest horse wins. However, there are different types of bets. One example is the win bet, which is guessing the winner. Another is the Jockey Challenge, which is guessing the best-performing jockey.

1.2.2 Hong Kong Jockey Club

“The Hong Kong Jockey Club (HKJC) is a non-profit organization providing horse racing, sporting and betting entertainment in Hong Kong. It holds a government-granted monopoly in providing pari-mutuel betting on horse racing. The organization is the largest taxpayer in Hong Kong, as well as the largest community benefactor.”^2

(^1) https://global.britannica.com/sports/horse-racing
(^2) https://en.wikipedia.org/wiki/Hong_Kong_Jockey_Club

1.3 Objective

To reduce the complexity introduced by the many types of bets, we restrict our discussion to the win bet and to races in Hong Kong only. Our objective is to use TensorFlow to create a model that predicts the winner of a race. The model should perform as well as, or even better than, the public intelligence in terms of accuracy and profit.
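The comparison against public intelligence can be made concrete. The sketch below is a minimal illustration, not the project's evaluation code. It assumes each race record holds three things: decimal win odds (payout per unit stake, stake included), a model's estimated win probabilities, and the index of the actual winner. It treats the favourite, the horse with the lowest odds, as the public's pick. The function name `evaluate_win_bets` and the record layout are hypothetical.

```python
from typing import Dict, List, Tuple


def evaluate_win_bets(races: List[Dict], stake: float = 1.0) -> Dict[str, Tuple[float, float]]:
    """Compare a model's top pick with the favourite on one win bet per race.

    Each race is a dict (hypothetical layout) with:
      "odds":   decimal win odds per horse (payout per unit stake, stake included)
      "probs":  model-estimated win probability per horse
      "winner": index of the horse that actually won
    Returns {"model": (accuracy, profit), "public": (accuracy, profit)}.
    """

    def settle(pick: int, race: Dict) -> Tuple[int, float]:
        # A winning bet returns stake * odds; profit excludes the returned stake.
        if pick == race["winner"]:
            return 1, stake * (race["odds"][pick] - 1)
        return 0, -stake

    totals = {"model": [0, 0.0], "public": [0, 0.0]}
    for race in races:
        probs, odds = race["probs"], race["odds"]
        model_pick = max(range(len(probs)), key=lambda i: probs[i])  # highest probability
        public_pick = min(range(len(odds)), key=lambda i: odds[i])   # lowest odds = favourite
        for name, pick in (("model", model_pick), ("public", public_pick)):
            hit, profit = settle(pick, race)
            totals[name][0] += hit
            totals[name][1] += profit

    n = len(races)
    return {name: (hits / n, profit) for name, (hits, profit) in totals.items()}
```

The function bets one unit on the model's top pick in every race. It reports the model's accuracy and net profit next to the same two numbers for always backing the favourite.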

2 Data Collection

2.1 Fast Approach

Some companies sell historical horse racing data for Hong Kong. Buying this data would be a fast approach, but the price is considerable, and due to a lack of budget this approach is not suitable.

2.2 Web Crawling^6

Tailor-made Python scripts were created to crawl data from the HKJC website. Historical race data and horse information from the 2001 to 2015 horse racing seasons were collected. The data were structured in CSV^7 format, with 20 features in total. The following table describes the structure of a row record in a race.

| Feature | Description |
|---|---|
| Date | - |
| Location | - |
| Race Number | - |
| Class | - |
| Distance | - |
| Going | Track condition |
| Course | Track |
| Pool | Prize pool |
| Place | - |
| Horse ID | - |
| Horse | - |
| Jockey | - |
| Trainer | - |
| Actual Weight | Carried weight |
| Declare Weight | Overall weight |
| Draw | - |
| LBW | Length behind winner |
| Running Position | - |

(^6) https://www.ciencedaily.com/terms/web_crawler.htm
(^7) http://creativyst.com/Doc/Articles/CSV/CSV01.htm
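The sketch below shows a minimal version of the crawl-to-CSV step using only the Python standard library. The HKJC page markup is not described in this report. The sketch therefore assumes that result rows appear as `<tr>` elements with one `<td>` per field. The class and function names are hypothetical. Fetching the pages (for example with `urllib.request`) is left out.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # list of rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made only of <th> cells end up empty and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_results_to_csv(html, fieldnames):
    """Parse result rows out of an HTML page and return them as CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(fieldnames)
    for row in parser.rows:
        if len(row) == len(fieldnames):  # ignore rows that do not match the expected layout
            writer.writerow(row)
    return out.getvalue()
```

The parser checks each row's length against the expected field list. A change in the page layout then drops the affected rows instead of silently shifting values into the wrong columns.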

3 Data Storage

3.1 Tools

3.1.1 PostgreSQL

Figure 1 PostgreSQL Logo^8

PostgreSQL is an open source relational database management system^9. We chose it because it has a good OS X GUI client, and SQL is well suited to extracting data from a database.

(^8) https://twitter.com/postgresql
(^9) https://www.postgresql.org/about/

3.1.2 Postico^10

Figure 2 Postico User Interface

Postico is a PostgreSQL GUI client for OS X. It allows users to create tables or execute SQL statements with a few clicks, and it provides a good user interface for viewing data.

3.2 Database Structure

Further data analysis requires a data management system, so a local database has been built with the following structure. Many fields use TEXT as their data type because those fields contain missing data.

(^10) https://eggerapps.at/postico/
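The sketch below illustrates storing scraped fields as nullable TEXT and converting them when they are read back. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server. The table name `race_result`, its columns and the helper functions are hypothetical and do not reproduce the project's schema.

```python
import sqlite3

# Hypothetical schema: raw fields that can be missing are kept as TEXT.
SCHEMA = """
CREATE TABLE race_result (
    race_date      TEXT NOT NULL,
    race_no        INTEGER NOT NULL,
    horse_id       TEXT NOT NULL,
    place          TEXT,  -- may hold non-numeric codes or be missing
    lbw            TEXT,  -- may be missing or non-numeric
    declare_weight TEXT   -- may be missing
)
"""


def to_number(value):
    """Convert a TEXT field to float, or None when it is missing or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def load_declared_weights(conn, horse_id):
    """Return (race_date, declared weight or None) pairs for one horse, oldest first."""
    rows = conn.execute(
        "SELECT race_date, declare_weight FROM race_result "
        "WHERE horse_id = ? ORDER BY race_date, race_no",
        (horse_id,),
    )
    return [(race_date, to_number(weight)) for race_date, weight in rows]
```

Create the table with `conn = sqlite3.connect(":memory:")` and `conn.executescript(SCHEMA)`, then insert rows with `conn.executemany`. In PostgreSQL itself, inserting an empty string into an INTEGER column raises an error. This is one reason keeping the raw TEXT and converting it at read time is the simpler route.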

4 More Data

4.1 Extract Features

Now that our data is in the database, we can write functions to extract more features.

| Feature | Description |
|---|---|
| Age | The age of the horse |
| Time since last race | The number of days since the horse's last race |
| Weight difference from last race | The change in the horse's weight since its last race |
| Past place record on the same track | The horse's past performance on the same track |
| Jockey's winning percentage | - |
| Horse's winning percentage | - |
| Trainer's winning percentage | - |

Table 4 Extract Features
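Several features in Table 4 depend on a horse's or a jockey's history. The sketch below is a minimal plain-Python version of three of them. It assumes records with the keys listed in the docstring. The function name and record layout are hypothetical. "Weight difference" is assumed here to mean the change in declared weight.

```python
from collections import defaultdict
from datetime import date


def add_history_features(records):
    """Add days since last race, weight difference from last race and the
    jockey's winning percentage to each record, using only earlier races.

    Each record (hypothetical layout) is a dict with keys: date (datetime.date),
    race_no, horse_id, jockey, declare_weight (number) and place (1 = winner).
    """
    records = sorted(records, key=lambda r: (r["date"], r["race_no"]))
    last_race = {}                                # horse_id -> (date, declare_weight)
    jockey_record = defaultdict(lambda: [0, 0])   # jockey -> [wins, rides]
    result = []
    for r in records:
        prev = last_race.get(r["horse_id"])
        wins, rides = jockey_record[r["jockey"]]
        result.append({
            **r,
            "days_since_last_race": (r["date"] - prev[0]).days if prev else None,
            "weight_diff": r["declare_weight"] - prev[1] if prev else None,
            "jockey_win_pct": wins / rides if rides else None,
        })
        # Update the history only after this record's features are computed,
        # so a race's own result never feeds into its own features.
        last_race[r["horse_id"]] = (r["date"], r["declare_weight"])
        jockey_record[r["jockey"]][0] += int(r["place"] == 1)
        jockey_record[r["jockey"]][1] += 1
    return result


# Illustrative records, not real HKJC data.
sample = [
    {"date": date(2015, 1, 1), "race_no": 1, "horse_id": "H1",
     "jockey": "J1", "declare_weight": 1100, "place": 1},
    {"date": date(2015, 1, 15), "race_no": 2, "horse_id": "H1",
     "jockey": "J1", "declare_weight": 1105, "place": 3},
]
features = add_history_features(sample)
```

The function processes races in chronological order and updates the history after each record. Each feature is therefore based on earlier races alone, which avoids leaking the outcome being predicted into the inputs.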

5 Data Analysis

5.1 Jockey

The following figure shows the distribution of jockey participation over the past 15 years. A small portion of the jockeys participated in most of the races, which makes it difficult to use the jockey as a feature in our training model.

Figure 4 Jockey Participation
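The concentration shown in Figure 4 can be summarised with a single number: the share of all rides taken by the most active jockeys. The sketch below assumes a flat list with one jockey name per ride. The function name and the default `top_fraction` of 20% are arbitrary illustrative choices.

```python
from collections import Counter


def participation_share(rides, top_fraction=0.2):
    """Return the share of all rides taken by the most active `top_fraction`
    of jockeys, given one jockey name per ride."""
    counts = Counter(rides)                          # jockey -> number of rides
    ordered = sorted(counts.values(), reverse=True)  # busiest jockeys first
    k = max(1, round(len(ordered) * top_fraction))   # always include at least one jockey
    return sum(ordered[:k]) / len(rides)
```

A value close to 1 means most rides belong to a few jockeys. Most other jockey categories then have little data behind them.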

5.3 Trainer

The following figure shows the distribution of the number of horses trained by each trainer over the past 15 years. A small number of trainers trained most of the horses, which makes it hard to use the trainer's name as a feature in our training model.

Figure 6 Trained Horse Per Trainer
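Counting horses per trainer needs distinct counting, since one horse appears in many race records. The sketch below assumes (trainer, horse_id) pairs taken from the race records. The function name is hypothetical.

```python
from collections import defaultdict


def horses_per_trainer(records):
    """Count distinct horses per trainer from (trainer, horse_id) pairs,
    sorted from the trainer with the most horses to the fewest."""
    horses = defaultdict(set)
    for trainer, horse_id in records:
        horses[trainer].add(horse_id)  # a set, so repeated starts count a horse once
    return sorted(((t, len(h)) for t, h in horses.items()), key=lambda x: (-x[1], x[0]))
```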

5.4 Draw

The draw is the starting position. The track is a stretched circle, and unlike in track and field, starting positions are not adjusted for horses that start closer to or farther from the inner rail. We might therefore believe that the smaller the draw number, the higher the chance of winning the race, since the horse has a shorter distance to run. In anticipate race, the higher the rank of a horse, the larger the number of draw. The following figure shows the winning percentage of each draw over the past 15 years. The result generally agrees with our belief: the smaller the draw, the higher the chance of winning. One exception is that draw 5 has a higher chance of winning than draw 4.

Figure 7 Win Percentage of Different Draw
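The win percentage per draw in Figure 7 is the number of wins from each draw divided by the number of starts from that draw. The sketch below assumes (draw, place) pairs, with place 1 meaning a win. The function name is hypothetical.

```python
from collections import defaultdict


def win_percentage_by_draw(starts):
    """starts: iterable of (draw, place) pairs, one per horse per race.
    Returns {draw: percentage of starts from that draw that won}, ordered by draw."""
    wins = defaultdict(int)
    runs = defaultdict(int)
    for draw, place in starts:
        runs[draw] += 1
        wins[draw] += int(place == 1)
    return {d: 100.0 * wins[d] / runs[d] for d in sorted(runs)}
```

Large draw numbers only occur in races with many runners, where every horse's chance of winning is lower. The raw percentage per draw therefore mixes the effect of the draw with the effect of field size.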