LYU 1603
Final Year Project Term 1 Report
CHENG Tsz Tung (1155051298)
LAU Ming Hei (1155051346)
Supervised by Prof. LYU Rung Tsong Michael
Estimating horse racing results has been a popular topic in the machine learning field, and the possibility of earning a profit depends on how accurately the probability of each horse winning a race is predicted. Thanks to the comprehensive historical data provided by the Hong Kong Jockey Club, many experiments can be done. This report describes how we tackled the problem through a standard data mining process, starting with collecting and analyzing data. One interesting finding is that public intelligence performs fairly well in horse racing, so the objective of the project is to develop a model that performs as well as, or even outperforms, the public intelligence. This report discusses the method of feature selection and normalization, the reasons for proposing new features, the possible ways to train the model, the difficulties of handling an unbalanced dataset, the method of evaluating the model, and the results derived from different learning algorithms. We show that it is possible to construct a model that outperforms the public intelligence, and that, by setting a threshold and not participating in every race, it is possible to generate profit with the model trained with a deep neural network and with the model driven by pattern matching.
1.1 Motivation
Horse racing has been a popular topic in the machine learning field. The recent performance of deep neural networks has been stunning, and many new machine learning tools have been released that make it easy to apply deep learning and other machine learning algorithms. We would therefore like to conduct an experiment on predicting horse racing results.
1.2 Background
1.2.1 Horse Racing
Horse racing is the sport of running horses at speed^1. In Hong Kong, horse racing is not purely a sport; it also has an associated gambling component. A race has around 8 to 14 horses, and only one type of race is held in Hong Kong: the fastest horse wins. However, there are different types of betting, such as the win bet, which is a bet on the winner of a race, and the Jockey Challenge, which is a bet on the best-performing jockey.
1.2.2 Hong Kong Jockey Club
“The Hong Kong Jockey Club (HKJC) is a non-profit organization providing horse racing, sporting and betting entertainment in Hong Kong. It holds a government-granted monopoly in providing pari-mutuel betting on horse racing. The organization is the largest taxpayer in Hong Kong, as well as the largest community benefactor.”^2
(^1) https://global.britannica.com/sports/horse-racing
(^2) https://en.wikipedia.org/wiki/Hong_Kong_Jockey_Club
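Because the win pool is pari-mutuel, the money bet on each horse determines its payout, so the odds encode the betting public's collective view; this is one way to read the "public intelligence" referred to in the abstract. The minimal sketch below assumes a single win pool given as a dict from a hypothetical horse name to the total amount bet on it, and an assumed deduction rate `takeout` (a placeholder, not the HKJC's published rate). It computes the payout per $1 for each horse and each horse's share of the pool, read as the public's implied win probability.

```python
from typing import Dict


def win_dividends(pool: Dict[str, float], takeout: float = 0.175) -> Dict[str, float]:
    """Payout per $1 win bet on each horse, if that horse wins.

    `takeout` is an assumed deduction rate used only for illustration.
    """
    total = sum(pool.values())
    net = total * (1.0 - takeout)  # amount shared among the winning tickets
    return {horse: net / stake for horse, stake in pool.items() if stake > 0}


def implied_probabilities(pool: Dict[str, float]) -> Dict[str, float]:
    """Share of the pool bet on each horse, i.e. the public's implied win probability."""
    total = sum(pool.values())
    return {horse: stake / total for horse, stake in pool.items()}


# Hypothetical three-horse win pool.
pool = {"A": 500.0, "B": 300.0, "C": 200.0}
print(win_dividends(pool))
print(implied_probabilities(pool))  # {'A': 0.5, 'B': 0.3, 'C': 0.2}
```

Under this model, a horse carrying half the pool pays back well under $2 per $1, which is why simply following the favourite rarely earns a profit.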
1.3 Objective
To reduce the complexity introduced by the many types of betting, we restrict our discussion to the win bet and to races in Hong Kong only. Our objective is to use TensorFlow to create a model that predicts the winner of a race and performs as well as, or even better than, the public intelligence in terms of accuracy and profit.
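To make "accuracy and profit" concrete, the sketch below scores a strategy that places a fixed win bet on the top-rated horse in each race, optionally skipping races whose top probability falls below a threshold (the abstract notes that not betting on every race can help). The `Race` structure, the $10 stake and the convention that a dividend includes the returned stake are assumptions for illustration, not the report's evaluation code. Feeding the same function probabilities derived from the odds (`1 / dividend`) gives a public-favourite baseline to compare against.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Race:
    probs: Dict[str, float]      # predicted win probability per horse
    dividends: Dict[str, float]  # assumed payout per $1 win bet, stake included
    winner: str


def evaluate(races: List[Race], stake: float = 10.0,
             threshold: float = 0.0) -> Tuple[float, float, int]:
    """Bet `stake` on the top-rated horse whenever its probability >= threshold.

    Returns (accuracy over the races bet on, net profit, number of bets).
    """
    hits, profit, bets = 0, 0.0, 0
    for race in races:
        pick = max(race.probs, key=race.probs.get)
        if race.probs[pick] < threshold:
            continue  # not confident enough: sit this race out
        bets += 1
        profit -= stake
        if pick == race.winner:
            hits += 1
            profit += stake * race.dividends[pick]
    return (hits / bets if bets else 0.0), profit, bets


def public_probs(race: Race) -> Dict[str, float]:
    """Unnormalised public view: a lower dividend means more money on the horse."""
    return {h: 1.0 / d for h, d in race.dividends.items()}


# Two hypothetical races.
races = [
    Race({"A": 0.60, "B": 0.40}, {"A": 1.5, "B": 3.0}, winner="A"),
    Race({"A": 0.45, "B": 0.55}, {"A": 2.0, "B": 2.2}, winner="A"),
]
public = [Race(public_probs(r), r.dividends, r.winner) for r in races]

print(evaluate(races))                 # (0.5, -5.0, 2)
print(evaluate(races, threshold=0.6))  # (1.0, 5.0, 1)
print(evaluate(public))                # (1.0, 15.0, 2)
```

Reporting accuracy over the races actually bet on, together with the number of bets, keeps a high threshold from looking good simply because it almost never bets.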
2.1 Fast Approach
Some companies sell historical horse racing data for Hong Kong, so a fast approach is to buy the data. However, the price is considerable, and due to a lack of budget this approach is not suitable for us.
2.2 Web Crawling^6
Tailor-made Python scripts were written to crawl data from the HKJC website, and historical race data and horse information from the 2001 to 2015 racing seasons were collected. The data were structured in CSV^7 format, with 20 features in total. The following table describes the structure of a row record in a race.
Feature              Description
Date                 -
Location             -
Race Number          -
Class                -
Distance             -
Going                Track condition
Course               Track
Pool                 Prize pool
Place                -
Horse ID             -
Horse                -
Jockey               -
Trainer              -
Actual Weight        Carried weight
Declared Weight      Overall weight
Draw                 -
LBW                  Length behind winner
Running Position     -
(^6) https://www.ciencedaily.com/terms/web_crawler.htm
(^7) http://creativyst.com/Doc/Articles/CSV/CSV01.htm
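The crawling scripts themselves are not shown in the report. As a rough sketch of the step that turns a results page into CSV rows, the code below uses only Python's standard library (`html.parser` and `csv`). The HTML snippet, its column order and the horse names are hypothetical; the real HKJC pages may be laid out differently, and fetching the pages over the network is left out.

```python
import csv
import io
from html.parser import HTMLParser
from typing import List


class ResultTableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell = ""
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = ""

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._row.append(self._cell.strip())
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_cell:  # ignore whitespace between tags
            self._cell += data


def table_to_csv(html: str) -> str:
    """Convert every table row in `html` into one CSV line."""
    parser = ResultTableParser()
    parser.feed(html)
    buf = io.StringIO()
    csv.writer(buf).writerows(parser.rows)
    return buf.getvalue()


# Hypothetical fragment of a race result page.
SAMPLE = """<table>
<tr><th>Place</th><th>Horse</th><th>Draw</th><th>LBW</th></tr>
<tr><td>1</td><td>HORSE A</td><td>3</td><td>-</td></tr>
<tr><td>2</td><td><a href="#">HORSE B</a></td><td>7</td><td>1-1/4</td></tr>
</table>"""
print(table_to_csv(SAMPLE))
```

Text inside nested tags such as `<a>` is still collected because the parser only tracks whether it is inside a cell.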
3.1 Tools
3.1.1 PostgreSQL^8
Figure 1 PostgreSQL Logo
PostgreSQL is an open-source relational database management system^9. We chose it because it has a good OS X GUI client and because SQL is well suited to extracting data from a database.
(^8) https://twitter.com/postgresql
(^9) https://www.postgresql.org/about/
3.1.2 Postico^10
Figure 2 Postico User Interface
Postico is a PostgreSQL GUI client for OS X. It lets users create tables or execute SQL statements with a few clicks, and it provides a good interface for viewing their data.
3.2 Database Structure
Further data analysis requires a data management system, so a local database was built with the following structure. Many fields use TEXT as their data type because those fields contain missing data.
(^10) https://eggerapps.at/postico/
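The choice of TEXT for fields with missing data can be illustrated with a small schema. The sketch below uses Python's built-in `sqlite3` module as a self-contained stand-in for PostgreSQL; the table name `race_record`, its columns and the sample rows are all hypothetical. Missing values are stored as empty strings, and numeric conversion is deferred to query time with `NULLIF` and `CAST`, both of which PostgreSQL also supports.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE race_record (
        race_date TEXT,
        race_no   INTEGER,
        horse_id  TEXT,
        draw      TEXT,  -- TEXT because the crawled value can be missing
        place     TEXT,  -- TEXT for the same reason
        lbw       TEXT
    )
""")
rows = [
    ("2015-09-06", 1, "H001", "3", "1", "-"),
    ("2015-09-06", 1, "H002", "7", "2", "1-1/4"),
    ("2015-09-06", 1, "H003", "", "", ""),  # missing values kept as empty strings
]
conn.executemany("INSERT INTO race_record VALUES (?, ?, ?, ?, ?, ?)", rows)

# NULLIF turns '' into NULL, so only real values are cast to INTEGER.
query = """
    SELECT horse_id, CAST(NULLIF(place, '') AS INTEGER) AS place_num
    FROM race_record
    WHERE NULLIF(place, '') IS NOT NULL
    ORDER BY place_num
"""
print(conn.execute(query).fetchall())  # [('H001', 1), ('H002', 2)]
```

Keeping the raw text means no record is rejected at load time, at the cost of casting in every query that needs numbers.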
4.1 Extract Features
Since our data is now stored in the database, we can write functions to extract additional features.
Feature                                Description
Age                                    The age of the horse
Time since last race                   The number of days since the horse's last race
Weight difference from last race       The weight difference since the horse's last race
Past place record on the same track    The horse's past performance on the same track
Jockey's winning percentage            -
Horse's winning percentage             -
Trainer's winning percentage           -
Table 4 Extract Features
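Several features in Table 4 depend on a horse's history, so in this sketch they are computed only from races before the current one, which keeps a race's own result out of its features. It is a minimal illustration, not the report's implementation, assuming each record is a dict with the hypothetical keys `horse_id`, `race_date`, `declared_weight` and `place`. The jockey's and trainer's winning percentages would follow the same pattern, keyed by jockey or trainer instead of horse.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional


def extract_horse_features(records: List[Dict]) -> List[Dict]:
    """Add per-horse history features computed only from earlier races.

    Each record is a dict with the hypothetical keys horse_id, race_date
    (datetime.date), declared_weight (float) and place (int).
    """
    last_race: Dict[str, Dict] = {}
    starts: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    features = []
    for rec in sorted(records, key=lambda r: r["race_date"]):
        hid = rec["horse_id"]
        prev: Optional[Dict] = last_race.get(hid)
        row = dict(rec)
        row["days_since_last_race"] = (rec["race_date"] - prev["race_date"]).days if prev else None
        row["weight_diff"] = rec["declared_weight"] - prev["declared_weight"] if prev else None
        row["horse_win_pct"] = wins[hid] / starts[hid] if starts[hid] else None
        features.append(row)
        # Update the history only after the features are computed, so a race
        # never contributes to its own features.
        starts[hid] += 1
        wins[hid] += int(rec["place"] == 1)
        last_race[hid] = rec
    return features


records = [
    {"horse_id": "H001", "race_date": date(2015, 9, 6), "declared_weight": 1100.0, "place": 1},
    {"horse_id": "H001", "race_date": date(2015, 9, 20), "declared_weight": 1108.0, "place": 4},
    {"horse_id": "H001", "race_date": date(2015, 10, 4), "declared_weight": 1104.0, "place": 2},
]
for row in extract_horse_features(records):
    print(row["days_since_last_race"], row["weight_diff"], row["horse_win_pct"])
# None None None
# 14 8.0 1.0
# 14 -4.0 0.5
```

A horse's first recorded race has no history, so its features are `None`; how to fill those gaps is a separate modelling choice.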
5.1 Jockey
The following figure shows the distribution of jockey participation over the past 15 years. A small portion of the jockeys participated in most of the races, so most jockeys appear in only a few races. This makes it difficult to use the jockey as one of the features in our training model.
Figure 4 Jockey Participation
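The concentration shown in Figure 4 can be quantified from the crawled records. This sketch, assuming one jockey name per horse per race, reports the fraction of rides taken by the k busiest jockeys and lists jockeys with too few rides to estimate a reliable effect. The names, the sample data and the `min_rides` cut-off are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def top_k_share(jockeys: Iterable[str], k: int) -> float:
    """Fraction of all rides taken by the k jockeys with the most rides."""
    counts = Counter(jockeys)
    total = sum(counts.values())
    return sum(n for _, n in counts.most_common(k)) / total if total else 0.0


def sparse_jockeys(jockeys: Iterable[str], min_rides: int) -> List[str]:
    """Jockeys with fewer than `min_rides` rides, sorted by name."""
    counts = Counter(jockeys)
    return sorted(j for j, n in counts.items() if n < min_rides)


# One entry per ride (horse per race); names are hypothetical.
rides = ["J1"] * 6 + ["J2"] * 2 + ["J3", "J4"]
print(top_k_share(rides, 2))     # 0.8
print(sparse_jockeys(rides, 2))  # ['J3', 'J4']
```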
5.3 Trainer
The following figure shows the distribution of the number of horses each trainer has trained over the past 15 years. A small number of trainers trained most of the horses, which makes it hard to use the trainer's name as one of the features in our training model.
Figure 6 Trained Horses per Trainer
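Figure 6 counts horses per trainer rather than races, so a horse that races many times should count once per trainer. A minimal sketch, assuming hypothetical `(trainer, horse_id)` pairs taken from the race records:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def horses_per_trainer(entries: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct horses per trainer, largest stable first.

    A horse that moved between stables is counted once for each trainer.
    """
    horses: Dict[str, Set[str]] = defaultdict(set)
    for trainer, horse_id in entries:
        horses[trainer].add(horse_id)
    ranked = sorted(horses.items(), key=lambda kv: -len(kv[1]))
    return {trainer: len(hs) for trainer, hs in ranked}


# One entry per horse per race; identifiers are hypothetical.
entries = [("T1", "H1"), ("T1", "H1"), ("T1", "H2"), ("T1", "H3"),
           ("T2", "H4"), ("T2", "H4"), ("T3", "H5")]
print(horses_per_trainer(entries))  # {'T1': 3, 'T2': 1, 'T3': 1}
```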
5.4 Draw
The draw is the starting position. The track is a stretched circle, and unlike in track and field, starting positions are not adjusted to compensate for horses being closer to or farther from the inner side of the track. We might therefore believe that the smaller the draw number, the higher the chance of winning the race, since the horse has a shorter distance to run. In anticipate race, the higher the rank of a horse, the larger its draw number. The following figure shows the winning percentage of each draw over the past 15 years. The result generally agrees with our belief that the smaller the draw, the higher the chance of winning, although draw 5 has a higher chance of winning than draw 4.
Figure 7 Win Percentage of Different Draws
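The win percentage in Figure 7 can be read as wins divided by starts for each draw. Dividing by the starts from that draw, rather than by the total number of races, matters because large draw numbers tend to appear only in larger fields. A minimal sketch, assuming hypothetical `(draw, place)` pairs with missing values already removed:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def win_percentage_by_draw(entries: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Map each draw to the percentage of its starts that ended in a win."""
    starts: Dict[int, int] = defaultdict(int)
    wins: Dict[int, int] = defaultdict(int)
    for draw, place in entries:
        starts[draw] += 1
        if place == 1:
            wins[draw] += 1
    return {d: 100.0 * wins[d] / starts[d] for d in sorted(starts)}


# One (draw, place) pair per horse per race; values are hypothetical.
entries = [(1, 1), (1, 3), (2, 2), (2, 1), (2, 4), (2, 6), (3, 5)]
print(win_percentage_by_draw(entries))  # {1: 50.0, 2: 25.0, 3: 0.0}
```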