Predicting Horse Racing Result

Using TensorFlow

LYU1603 Final Year Project Term 1 Report

CHENG Tsz Tung (1155051298)

LAU Ming Hei (1155051346)

Supervised by Prof. LYU Rung Tsong Michael


Table of Contents

Abstract
1 Introduction
1.1 Motivation
1.2 Background
- 1.2.1 Horse Racing
- 1.2.2 Hong Kong Jockey Club
- 1.2.3 Pari-mutuel betting
- 1.2.4 Types of bets
1.3 Objective
2 Data Collection
2.1 Fast Approach
2.2 Web Crawling
3 Data Storage
3.1 Tools
- 3.1.1 PostgreSQL
- 3.1.2 Postico
3.2 Database Structure
4 More Data
4.1 Extract Features
5 Data Analysis
5.1 Jockey
5.2 Horse
5.3 Trainer
5.4 Draw
5.5 Win odds
5.6 Actual Weight
5.7 Declare Weight
5.8 Horse’s Age
5.9 Time Since Last Race
5.10 Weight Different
5.11 Horse Recent Performance on the Same Track
5.12 Public Intelligence
- 5.12.1 Data Preparation
- 5.12.2 Closing Odds Model
- 5.12.3 Accuracy
6 Elo Rating System
6.1 Elo Rating System for multiple player
- 6.1.1 Estimation score function
- 6.1.2 Scoring function
- 6.1.3 Ranking Function
6.2 Compute the elo
- 6.2.1 Parameter Tuning
- 6.2.2 Result
7 Possible ways to model the problem
7.1 Strength of a horse
7.2 Probability of a horse to win the race
7.3 Finishing Time
7.4 Which horse will win in a race
8 Data pre-process and normalization
8.1 Data Filtering
8.2 Data Normalization
- 8.2.1 Real Value Data
- 8.2.2 Categorical Data
- 8.2.3 Crossed Categorical Data
- 8.3.3 Categorize real value data
9 Model Training
9.1 Pattern Matching Model
- 9.1.1 Build Index File
- 9.1.2 Find similar k-races
- 9.1.3 Prediction Race Result
- 9.1.4 Accuracy
9.2 Linear Model
- 9.2.1 Data Preparation
- 9.2.2 Unbalanced dataset
- 9.2.3 Training
- 9.2.4 Result
9.3 Deep Neural Network Model
- 9.3.1 Data Preparation
- 9.3.2 Unbalanced dataset
- 9.3.3 Training
- 9.3.4 Result
10 Models Evaluation
11 Limitation and difficulties
12 Future works
13 Conclusion
14 Acknowledgements
15 References
16 Appendix
- 15.3 Table References

Abstract

Predicting horse racing results has been a popular topic in machine learning, and whether a profit can be made depends on how accurately the probability of each horse winning a race is predicted. Thanks to the comprehensive historical data provided by the Hong Kong Jockey Club, many experiments can be carried out. This report describes how we tackle the problem through a standard data mining process, starting with collecting and analyzing data. One interesting finding is that public intelligence performs rather well in horse racing, so the objective of the project is to develop a model that performs as well as, or even outperforms, public intelligence. This report discusses the method of feature selection and normalization, the reasons for proposing new features, the possible ways to train the model, the difficulties of handling an unbalanced dataset, the method used to evaluate the model, and the results obtained from different learning algorithms. We show that it is possible to construct a model that outperforms public intelligence and that, by setting a threshold and not betting on every race, it is possible to generate profit with the model trained with a deep neural network and with the model driven by pattern matching.

1 Introduction

1.1 Motivation

Horse racing has long been a well-known topic in machine learning. The recent performance of deep neural networks is stunning, and many machine learning tools have been released recently that make it easy to apply deep learning or other machine learning algorithms. We would therefore like to conduct an experiment on predicting horse racing results.

1.2 Background

1.2.1 Horse Racing

Horse racing is a sport of running horses at speed (^1). In Hong Kong, horse racing is not purely a sport; it also has a gambling component. Each race has around 8 to 14 horses, and there is only one type of race in Hong Kong, in which the fastest horse wins. However, there are different types of bets, such as the win bet, which is a guess at the winner, and the Jockey Challenge, which is a bet on the best-performing jockey.

1.2.2 Hong Kong Jockey Club

“The Hong Kong Jockey Club (HKJC) is a non-profit organization providing horse racing, sporting and betting entertainment in Hong Kong. It holds a government-granted monopoly in providing pari-mutuel betting on horse racing. The organization is the largest taxpayer in Hong Kong, as well as the largest community benefactor.” (^2)

(^1) https://global.britannica.com/sports/horse-racing
(^2) https://en.wikipedia.org/wiki/Hong_Kong_Jockey_Club

1.3 Objective

To reduce the complexity arising from the many types of bets, we restrict our discussion to the win bet and to races in Hong Kong only. Our objective is to use TensorFlow to create a model that predicts the winner of a race and performs as well as public intelligence, or even beats it, in terms of accuracy and profit.

2 Data Collection

2.1 Fast Approach

Some companies sell historical horse racing data for Hong Kong, so a fast approach is to buy the data. However, the price is considerable and our budget is limited, so this approach is not suitable.

2.2 Web Crawling (^6)

Tailor-made Python scripts were created to crawl data from the HKJC website. Historical data and horses’ information from the 2001 to 2015 racing seasons were collected. The data were structured in CSV (^7) format, with 20 features in total. The following table describes the structure of a row record in a race.

| Feature | Description |
| --- | --- |
| Date | - |
| Location | - |
| Race Number | - |
| Class | - |
| Distance | - |
| Going | Track condition |
| Course | Track |
| Pool | Prize pool |
| Place | - |
| Horse ID | - |
| Horse | - |
| Jockey | - |
| Trainer | - |
| Actual Weight | Carried weight |
| Declare Weight | Overall weight |
| Draw | - |
| LBW | Length behind winner |
| Running Position | - |

(^6) https://www.ciencedaily.com/terms/web_crawler.htm
(^7) http://creativyst.com/Doc/Articles/CSV/CSV01.htm
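As an illustration of the row structure described above, the following is a minimal sketch of reading such CSV records in Python. The column names (`date`, `location`, `race_number`, and so on), the `RaceRecord` class, and the sample rows are assumptions made for this example; the report does not show the actual CSV header or the crawler code, and only a subset of the features is included.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class RaceRecord:
    """One horse's row in one race (hypothetical subset of the features)."""
    date: str
    location: str
    race_number: int
    horse_id: str
    jockey: str
    draw: int
    place: str  # kept as text, since some fields may hold missing values


def load_records(csv_text: str) -> List[RaceRecord]:
    """Parse CSV text with a header row into RaceRecord objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        RaceRecord(
            date=row["date"],
            location=row["location"],
            race_number=int(row["race_number"]),
            horse_id=row["horse_id"],
            jockey=row["jockey"],
            draw=int(row["draw"]),
            place=row["place"],
        )
        for row in reader
    ]


# Made-up sample rows, for illustration only.
SAMPLE = """date,location,race_number,horse_id,jockey,draw,place
2015-01-01,ST,1,H001,Jockey A,3,1
2015-01-01,ST,1,H002,Jockey B,7,2
"""

records = load_records(SAMPLE)
for record in records:
    print(record)
```

Keeping `place` as text here is a deliberate choice for the sketch, so that rows with missing or non-numeric values can still be loaded.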

3 Data Storage

3.1 Tools

3.1.1 PostgreSQL

Figure 1 PostgreSQL Logo (^8)

PostgreSQL is an open source relational database management system (^9). We chose it because it has a good OSX GUI client and because SQL is well suited to extracting data from a database.

(^8) https://twitter.com/postgresql
(^9) https://www.postgresql.org/about/

3.1.2 Postico

Figure 2 Postico User Interface (^10)

Postico is a PostgreSQL GUI client for OSX. It allows users to create tables or execute SQL statements by simply clicking buttons, and it provides a good interface for viewing data.

3.2 Database Structure

A data management system is required for further data analysis, so a local database was built. Many fields use TEXT as the data type because those fields contain missing data.

(^10) https://eggerapps.at/postico/
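Because such fields are stored as TEXT, numeric values have to be converted when they are read back for analysis. The following is a minimal sketch of one way to do this in Python; the function name `parse_optional_float` and the idea of treating blank or non-numeric text as missing are assumptions for illustration, not the authors' code.

```python
from typing import Optional


def parse_optional_float(raw: Optional[str]) -> Optional[float]:
    """Convert a TEXT field to a float, returning None when it is missing.

    Blank strings and text that cannot be parsed as a number are both
    treated as missing values (an assumption made for this sketch).
    """
    if raw is None:
        return None
    text = raw.strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


# Made-up raw values, for illustration only.
for raw in ["1050", " 12.5 ", "", "---", None]:
    print(repr(raw), "->", parse_optional_float(raw))
```

Returning `None` rather than a sentinel such as 0 keeps missing values distinguishable from real measurements in later steps.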

4 More Data

4.1 Extract Features

Now that the data are in the database, we can write functions to extract more features.

| Feature | Description |
| --- | --- |
| Age | The age of the horse |
| Time since last race | The number of days since the last race of a horse |
| Weight difference from last race | The weight difference since the last race of a horse |
| Past place record on the same track | The horse’s past performance on the same track |
| Jockey’s winning percentage | - |
| Horse’s winning percentage | - |
| Trainer’s winning percentage | - |

Table 4 Extract Features
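Three of the extracted features (time since last race, weight difference, and a winning percentage) could be computed along the following lines. This is a sketch under stated assumptions: the `Run` record, the function names, and the use of declared weight for the weight difference are hypothetical, since the report does not show its extraction functions or say which weight is compared. Only runs strictly before the race date are used, so that a race's own result does not leak into its features.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Run:
    """One past run of a horse (hypothetical structure)."""
    race_date: date
    declare_weight: float
    place: int


def earlier_runs(history: List[Run], today: date) -> List[Run]:
    """Runs that took place strictly before `today`."""
    return [run for run in history if run.race_date < today]


def previous_run(history: List[Run], today: date) -> Optional[Run]:
    """Most recent run before `today`, or None for a debut."""
    earlier = earlier_runs(history, today)
    return max(earlier, key=lambda run: run.race_date) if earlier else None


def days_since_last_race(history: List[Run], today: date) -> Optional[int]:
    last = previous_run(history, today)
    return (today - last.race_date).days if last else None


def weight_difference(history: List[Run], today: date,
                      declare_weight: float) -> Optional[float]:
    """Current declared weight minus the declared weight at the last run."""
    last = previous_run(history, today)
    return declare_weight - last.declare_weight if last else None


def win_rate(history: List[Run], today: date) -> Optional[float]:
    """Fraction of earlier runs that were wins (place == 1)."""
    earlier = earlier_runs(history, today)
    if not earlier:
        return None
    return sum(1 for run in earlier if run.place == 1) / len(earlier)


# Made-up history, for illustration only.
history = [
    Run(date(2015, 1, 1), 1050.0, 1),
    Run(date(2015, 1, 15), 1060.0, 4),
]
today = date(2015, 2, 1)
print(days_since_last_race(history, today))
print(weight_difference(history, today, 1055.0))
print(win_rate(history, today))
```

The same `win_rate` logic would apply to a jockey or a trainer by passing that person's past runs instead of a horse's.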

5 Data Analysis

5.1 Jockey

The following is the distribution of jockey participation over the past 15 years. A small portion of jockeys participated in most of the races, which makes it difficult to use the jockey as a feature in our training model.

Figure 4 Jockey Participation
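One way to quantify how concentrated participation is would be to compute the share of all rides taken by the k most active jockeys. The sketch below assumes a flat list of jockey names, one per ride; the function name `top_share` and the sample data are hypothetical and are not taken from the report.

```python
from collections import Counter
from typing import Iterable


def top_share(jockeys: Iterable[str], k: int) -> float:
    """Fraction of all rides taken by the k most active jockeys."""
    counts = Counter(jockeys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(rides for _, rides in counts.most_common(k))
    return top / total


# Made-up rides, for illustration only: one jockey name per ride.
rides = ["A"] * 6 + ["B"] * 3 + ["C"]
print(top_share(rides, 1))
print(top_share(rides, 2))
```

A value close to 1 for a small k would indicate the kind of skew described above. The same function could be applied to trainer names.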

5.3 Trainer

The following is the distribution of the number of horses that each trainer has trained over the past 15 years. A small number of trainers trained most of the horses, which makes it hard to use the trainer’s name as a feature in our training model.

Figure 6 Trained Horse Per Trainer

5.4 Draw

Draw is the starting position. The track is a stretched circle and, unlike in track and field, starting positions are not adjusted according to how close a horse is to the inner circle. We might therefore expect that the smaller the draw number, the higher the chance of winning the race, since the horse has a shorter distance to run. In anticipate race, the higher the rank of a horse, the larger its draw number.

The following is the distribution of winning percentage by draw over the past 15 years. The result generally agrees with our belief: the smaller the draw, the higher the chance of winning the race, although draw 5 has a higher chance of winning than draw 4.

Figure 7 Win Percentage of Different Draw
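The winning percentage per draw can be computed by counting starts and wins for each draw number. The sketch below assumes the results are available as `(draw, place)` pairs, with place 1 meaning a win; the function name and the sample data are hypothetical and do not reproduce the report's figures.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def win_percentage_by_draw(results: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Map each draw number to the percentage of its starts that were wins."""
    starts: Counter = Counter()
    wins: Counter = Counter()
    for draw, place in results:
        starts[draw] += 1
        if place == 1:
            wins[draw] += 1
    return {draw: 100.0 * wins[draw] / starts[draw] for draw in sorted(starts)}


# Made-up (draw, place) pairs, for illustration only.
results = [(1, 1), (1, 3), (2, 2), (2, 1), (2, 4), (3, 5)]
percentages = win_percentage_by_draw(results)
for draw, pct in percentages.items():
    print(f"draw {draw}: {pct:.1f}%")
```

Dividing by the number of starts from each draw, rather than by the total number of races, matters because races have between 8 and 14 runners, so high draw numbers occur less often.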
