Real-time data collection

This is the first part in my series on collecting data.

I thought I would go back to the beginning and post about how I collect the data, as the reports are nothing without it. Most of the charts are based on data which is updated once a week, things like goals or minutes played, so the actual data capture is relatively simple. Some of the data, like transfers, gets updated throughout the week, so I wanted to see what insights I could get from that. When I say transfers I’m referring to the way people move players in and out of their teams every week.

Collecting real-time data is something I have experience with, as I designed a process for work, so I use the same building blocks. The overall design looks like this, but let’s break it down.

The APIs I use are available to anyone, so it’s just a case of knowing how to use them.

A WebJob runs continuously and gets the data. The data is too big to be sent to an Event Hub, so I do some work on it first: I extract the nodes I want and send those to the Event Hub, and I also send the entire JSON file to a Storage account. That way I can use the full data when I need it.
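A minimal sketch of that split in Python, assuming the response holds a list of player records. The key names (`elements`, `transfers_in`, `transfers_out`) are placeholders I made up for illustration, and the actual sending to the Event Hub and the Storage account is left out.

```python
import json

# Hypothetical node names; the real feed's structure is not shown in this post.
WANTED_KEYS = ("id", "transfers_in", "transfers_out")


def split_payload(raw_json, wanted_keys=WANTED_KEYS, list_key="elements"):
    """Return (small_events, full_document) from one API response."""
    document = json.loads(raw_json)
    # Keep only the wanted nodes from each record so the events stay small.
    events = [
        {key: item.get(key) for key in wanted_keys}
        for item in document.get(list_key, [])
    ]
    return events, document


# Made-up sample response.
sample = json.dumps({
    "elements": [
        {"id": 1, "transfers_in": 120, "transfers_out": 45, "news": "fit"},
        {"id": 2, "transfers_in": 10, "transfers_out": 300, "news": "injured"},
    ],
    "other": {"big": "payload"},
})

events, full_document = split_payload(sample)
# events would go to the Event Hub, full_document to the Storage account.
print(events)
```

The small events keep the stream light, while the full document is still there if I need a node I did not extract at the time.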

The data that is sent to the Event Hub is consumed by Stream Analytics and ends up in an Azure SQL database. Within the Stream Analytics job I can modify the data or add additional data if I wish.

In my design I also want to merge this data with other data sources, so I use Azure Data Factory to copy the data from the cloud to a SQL database on a local machine, which is used as a data warehouse and contains all the other football data. I have various processes running which collect different types of data.

At this point I have real football data and fantasy football data together, so I can start to blend them to see what insights I can get.

This is an ongoing side project which I have been working on for many years, in which time I have changed and modified all the elements. Originally I used to write stuff but now I use C#. I removed the ETLs which were based on SSIS and now I use Azure or even Python. Can’t wait till the support for Python in Function Apps is better 🙂

So that was a brief overview, but it covers the main parts and how I collect from that data source.


The next part will cover the other data sources and how we blend data before visualizing it.


Latest dashboards

I’ve been playing with Power BI again. The latest dashboards show various stats to assist with my player picking:

Points & Average points by teams and positions over a given time period.

Points & Average points by players over a given time period.

Points & Average points by players outside the top 6 teams over a given time period.

Goals scored & Assists by teams and positions over a given time period.

Goals scored & Assists by players outside the top 6 teams over a given time period.

If I drill down on the assists to show players, you can see Benteke had 4 assists prior to last week’s games. Who’d have thought that 🙂


Goalkeeper stats.

The purpose of these dashboards is to show what is happening and to give you the ability to ‘drill down’ from teams to players. I need to start looking at the next game week and think about game week 31, as that is a reduced game week in terms of games.


More APIs for Fantasy data.

After a bit of digging I found some useful APIs to get data, so I created some more Python scripts to capture it. I’m now looking at that data and considering redesigning the entire data model, because some of the data I was creating will be more accurate with the new APIs.

The new data allows me to summarise the player data by week. I have made a few more dashboards to show the data so I can see what I can do. Here they are.

This gives me a summary based on the last 6 weeks. I can see things like minutes played and goals. Here I have selected Midfield.
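As a rough sketch of the kind of summary behind that view, here is how weekly player rows could be rolled up over the last N weeks in Python. The row layout and the sample numbers are made up for illustration; they are not the real API fields or real player stats.

```python
from collections import defaultdict

STATS = ("minutes", "goals", "points")


def summarise_last_weeks(rows, latest_week, n_weeks, position=None):
    """Total each player's stats over the last n_weeks, optionally by position."""
    first_week = latest_week - n_weeks + 1
    totals = defaultdict(lambda: dict.fromkeys(STATS, 0))
    for row in rows:
        if not first_week <= row["week"] <= latest_week:
            continue  # outside the selected window
        if position is not None and row["position"] != position:
            continue  # not the selected position
        for stat in STATS:
            totals[row["player"]][stat] += row[stat]
    return dict(totals)


# Made-up weekly rows.
rows = [
    {"player": "Player A", "position": "MID", "week": 4, "minutes": 90, "goals": 1, "points": 8},
    {"player": "Player A", "position": "MID", "week": 9, "minutes": 90, "goals": 0, "points": 2},
    {"player": "Player A", "position": "MID", "week": 10, "minutes": 70, "goals": 2, "points": 13},
    {"player": "Player B", "position": "FWD", "week": 10, "minutes": 90, "goals": 1, "points": 6},
]

summary = summarise_last_weeks(rows, latest_week=10, n_weeks=6, position="MID")
print(summary)
```

Changing `n_weeks` from 6 to 3 gives the shorter window, which is the same switch as changing the week selection on the dashboard.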

If I select Hazard from the treemap I get his stats. This will enable me to see how his points are broken down.

And the big question is normally Kane or Aguero for my forward.

Based on the last 6 weeks Sergio wins 🙂 But what happens if we look at the last 3 weeks? Well, Firmino comes into the mix.

The good thing is we are getting to a point where the data is enabling us to make a decision. With footy, emotions play a major part, so having decisions based on data rather than emotions may make a difference.

What happens if we want to include price? We can add Price and Price range to the dataset. This will give us the ability to look at players within a certain budget. Having the ability to select the weeks you want to look at will also show how they are progressing, or not!

Adding in a price range, we can select our budget. I want to look at the Midfield players in the £9 million to £11 million range. As expected, the big hitters like Salah and KDB are there. In the last three weeks they have amassed 9 goals and 7 assists.
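The budget selection boils down to a filter on position and price band. A small sketch in Python, assuming prices are held in millions; the players and numbers below are placeholders, not the real data.

```python
def in_budget(players, position, low, high):
    """Players in the given position priced between low and high (inclusive),
    best points first."""
    matches = [
        p for p in players
        if p["position"] == position and low <= p["price"] <= high
    ]
    return sorted(matches, key=lambda p: p["points"], reverse=True)


# Made-up players; price is in millions of pounds.
players = [
    {"player": "Player A", "position": "MID", "price": 10.5, "points": 30},
    {"player": "Player B", "position": "MID", "price": 6.0, "points": 22},
    {"player": "Player C", "position": "FWD", "price": 10.0, "points": 28},
    {"player": "Player D", "position": "MID", "price": 9.0, "points": 35},
]

premium_mids = in_budget(players, "MID", 9.0, 11.0)
print([p["player"] for p in premium_mids])
```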


If I select KDB I can see that he has scored once and has 4 assists. The bottom right hand chart shows me he has had a price increase as well.

But if my budget was less, in the £5 million to £7 million range, it would look like this.

The top two are Prowse and Ramsey, so let’s look at them. 1 goal and 2 assists for Prowse.

3 goals for Ramsey in a single game, and his game time is less.

So in this situation I would probably pick Prowse, and he is cheaper.

In summary, leveraging Power BI to visualise the data will hopefully give me more of a competitive edge going forward 🙂 Remember: spend time on getting the data right, then the Power BI part is easier.

Next I need to use Python to provide some stats prior to displaying in Power BI.


Home Team dashboard

I’ve spent more time playing with dashboards, and it appears the more time I spend the more options I want to include :). I decided to go with a Home Team dashboard.

You pick a League and a season and it gives you the home info.

If you select a Team you get the details.

If we look at the games in detail we can see that home draws are going to cost Liverpool.

You can look at the data across multiple seasons as well. Here I selected 3 seasons and it gave me this data.

I’ve built quite a few dashboards now. I need to build the away dashboard next, then I’m on to the predictive analysis stuff. I will then start to use R/Python to really push the data. The current predictions are not accurate enough yet, so I’m busy building in some other variables.

It’s amazing what you can add in, and luckily with some of the variables I can retest all the previous results to see what effect they have. Some of the variables I can’t use yet as I only have certain data points, but going forward that will grow.

Working on this side project has made me realise the importance of not just collecting data but actually using it.



Working on the Prediction dashboard

A new dashboard has been added. This time it is based on real football and it includes Predictions 🙂

This one allows you to select a home team and an away team and you can see stats based on that fixture. In the example I selected ‘Manchester City’ and ‘Newcastle’. Looking at the stats, it’s going to be a home win.


The prediction is also a home win, at 67.74%. The predictions are still being tested as I have changed the algorithm a lot this season.

The stats in the tiles are based on historical fixtures between the 2 teams. Currently I’m using all data but I will modify this once I see how relevant all the data is.
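To give a feel for the tile stats, here is a small Python sketch that summarises the historical results for one home/away pairing. This is only an illustration of head-to-head counting, not the prediction algorithm itself; the dictionary layout and sample results are made up, with FTHG and FTAG standing for full-time home and away goals.

```python
def head_to_head(results, home, away):
    """Count outcomes of past fixtures where `home` hosted `away`."""
    games = [r for r in results if r["home"] == home and r["away"] == away]
    summary = {"played": len(games), "home_wins": 0, "draws": 0, "away_wins": 0}
    for game in games:
        if game["FTHG"] > game["FTAG"]:
            summary["home_wins"] += 1
        elif game["FTHG"] == game["FTAG"]:
            summary["draws"] += 1
        else:
            summary["away_wins"] += 1
    # Share of home wins as a percentage; None when the teams have never met.
    summary["home_win_pct"] = (
        round(100 * summary["home_wins"] / len(games), 2) if games else None
    )
    return summary


# Made-up historical results.
results = [
    {"home": "Team A", "away": "Team B", "FTHG": 2, "FTAG": 0},
    {"home": "Team A", "away": "Team B", "FTHG": 1, "FTAG": 1},
    {"home": "Team B", "away": "Team A", "FTHG": 3, "FTAG": 1},
    {"home": "Team A", "away": "Team B", "FTHG": 3, "FTAG": 1},
]

tiles = head_to_head(results, "Team A", "Team B")
print(tiles)
```

Restricting `results` to recent seasons before calling the function would be one way to test how relevant the older data is.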

If we change the teams to Arsenal/Palace we can see at one point this would have been a guaranteed Home win. But recent form shows a different story. The Prediction is 60.78% for a home win but I think it might be a draw.

If we use the players dashboard we can see it might be closer.

After this week’s games I will be doing some more analysis before I update all the data.

Watch out for another post coming out very soon.

Power BI and Fantasy footy data

The joys of data and hopefully some useful insights 🙂 Here is my latest dashboard on the Fantasy Footy Data.

3 week diff

This chart shows the data over a three week period. The purpose is to see in-form players, and looking at three weeks shows more than looking at a single week. It shows the data in the following way:

  • Data by teams
  • Data by players
  • Data by position

It can be filtered by team or cost band.


Goals & Assists

These charts show goals and assists by players, in the same format as the last chart. Interestingly, you can see that only two teams have not had a defender score, and Chelsea have 11 goals from defenders.


This chart is showing goals and assists per minute played. The size of the bubble is total points. Bottom left quadrant is the one to be in.

This can be filtered by position.
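The per-minute figures are simple ratios. A minimal Python sketch, assuming goals, assists and minutes are season totals for one player; the sample numbers are made up.

```python
def per_minute(goals, assists, minutes):
    """Return (goals per minute, assists per minute); zeros if no minutes played."""
    if minutes == 0:
        return 0.0, 0.0  # avoid dividing by zero for unused players
    return goals / minutes, assists / minutes


# Made-up totals: 3 goals and 1 assist in 300 minutes.
goal_rate, assist_rate = per_minute(3, 1, 300)
print(goal_rate, assist_rate)
```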

If we zoom in we can see the stats for Mo Salah.


The drill down capabilities show us the records which make up the charts. If we right click on Chelsea and DEF we can see who has scored.


Drill down records

The records are showing us that Alonso has scored 6.


The purpose of these charts is to help me select which players to bring into my team.

I will be posting once a week from now on, as we are in a position where the data is flowing and we are seeing some good insights.


More analysis on players using Power BI – up to and including week 9

So in my quest to see if I can use stats to improve my Fantasy league team I have come up with a few more dashboards.

Team and Players

The first dashboard shows me the overall points for the teams, then the weekly points for the teams and then the players. Ignore the up/down as that is based on limited data, it will become more important as the weeks go by.

By selecting the team it will filter all the charts so you can see what each player got.

As well as the dashboards based on overall points etc., I wanted to see if I could create a ‘weighting’ which could be applied so you can see if a player is worth buying, hence the Bargain column. It will be interesting to see the stats after I have uploaded this week’s data.
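The post does not spell out how the weighting is worked out, so the sketch below uses a stand-in: points per £1 million of price. That formula is purely an assumption for illustration, not the actual Bargain calculation, and the players are made up.

```python
def bargain_score(total_points, price_millions):
    """Assumed weighting: points per £1m of price. Higher suggests better value."""
    if price_millions <= 0:
        raise ValueError("price must be positive")
    return total_points / price_millions


# Made-up players.
players = [
    {"player": "Player A", "points": 60, "price": 12.0},
    {"player": "Player B", "points": 45, "price": 6.0},
]

for p in players:
    p["bargain"] = round(bargain_score(p["points"], p["price"]), 2)

ranked = sorted(players, key=lambda p: p["bargain"], reverse=True)
print([(p["player"], p["bargain"]) for p in ranked])
```

With this stand-in the cheaper player ranks higher despite fewer points, which is the sort of thing a Bargain column is meant to surface.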



I will be updating the data and posting the latest stats later.


Power BI and fantasy league data

Back to playing with Power BI. I’m using the data from my Fantasy league team and seeing what insights I can get. This is the first post of many on this subject. Future posts will have more details in them.


The idea is to see the data from a detail and KPI perspective.

KPIs are based on what I would like my team to score, so Captain points are 15 and other players just below 5. I came up with these numbers based on:

  • Team needs to get 60 points
  • Captain needs 15 points
  • 45 points shared between the other 10 players.
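The ‘just below 5’ figure falls out of those three numbers. A quick Python check of the arithmetic, using only the targets listed above:

```python
TEAM_TARGET = 60      # points the team needs
CAPTAIN_TARGET = 15   # points the captain needs
OTHER_PLAYERS = 10    # the rest of the first 11

# Points left for the other players, shared equally.
remaining = TEAM_TARGET - CAPTAIN_TARGET
per_player_target = remaining / OTHER_PLAYERS

print(remaining, per_player_target)  # 45 4.5
```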

In the first set of charts the waterfall chart has drill down capabilities.

Top level, which shows points by week.

First level down shows points by player.

Bottom level shows points by bench and first 11. ‘Y’ is bench.


I’m hoping this data will help me get higher in the leagues.

Small project start to finish – Footy data

So far I have been focused on showing you the charts that I created. Here I am going to build a small project from the start. The goal is to produce a visualization that shows me FTHG (full-time home goals) against FTAG (full-time away goals) so I can see at a glance which teams will do well.

Starting off – Data

Getting the data right is the basis for a trouble-free Power BI experience, and as I’m using SQL Server it makes the job easier. You can data wrangle in Power BI but I prefer to do as much as possible before. My data model is made up of a single Fact table and various Dimension tables. The Fact is the results etc. and the Dimensions are the Teams, Season, Leagues etc. For the purpose of this I have not used Fact or Dim as either a Schema or in the naming convention. Normally I would 🙂


Note: The Team table I have turned into 2 views as I want to select Home and Away teams. There are other ways of doing this but this is a simple way.
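Here is a rough sketch of the two-view idea. To keep it runnable anywhere it uses Python’s built-in sqlite3 rather than SQL Server, and the table, view and column names are assumptions for illustration, not my actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Team (TeamId INTEGER PRIMARY KEY, TeamName TEXT);
    CREATE TABLE Result (
        HomeTeamId INTEGER, AwayTeamId INTEGER, FTHG INTEGER, FTAG INTEGER
    );

    -- One Team table exposed twice, so each side of a fixture has its own lookup.
    CREATE VIEW HomeTeam AS
        SELECT TeamId AS HomeTeamId, TeamName AS HomeTeamName FROM Team;
    CREATE VIEW AwayTeam AS
        SELECT TeamId AS AwayTeamId, TeamName AS AwayTeamName FROM Team;

    -- Made-up sample data.
    INSERT INTO Team VALUES (1, 'Team A'), (2, 'Team B');
    INSERT INTO Result VALUES (1, 2, 2, 0);
""")

row = conn.execute("""
    SELECT h.HomeTeamName, a.AwayTeamName, r.FTHG, r.FTAG
    FROM Result AS r
    JOIN HomeTeam AS h ON h.HomeTeamId = r.HomeTeamId
    JOIN AwayTeam AS a ON a.AwayTeamId = r.AwayTeamId
""").fetchone()
print(row)
```

Each view can then relate to its own column on the results table, which is what lets you select Home and Away teams independently.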


Power BI

Within Power BI I import the Tables and Views I require, and check the relationships between them. All the lines are solid so I’m happy.


The Dashboard is simply 2 charts and a selector. The selector is for the Leagues. The charts are a stacked bar chart and a quadrant chart. The stacked bar is based on Goal Difference. I created a new measure for this.

The Quadrant is based on FTHG against FTAG.

Now, you can select either of the charts and it will filter the other. Here I selected a specific value on the Stacked bar, Aston Villa -21, and it filtered the Quadrant chart.


If you hover over the Quadrant bubble for Aston Villa you can see FTHG 14 and FTAG 35, which gives you -21.
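The Goal Difference measure is just total FTHG minus total FTAG over the selected rows. A small Python sketch of the same sum, using the Aston Villa totals quoted above as a single aggregated row:

```python
def goal_difference(matches):
    """Sum of FTHG minus sum of FTAG over the given rows."""
    total_fthg = sum(m["FTHG"] for m in matches)
    total_ftag = sum(m["FTAG"] for m in matches)
    return total_fthg - total_ftag


# Totals from the Quadrant bubble, treated as one aggregated row.
aston_villa = [{"FTHG": 14, "FTAG": 35}]
print(goal_difference(aston_villa))  # -21
```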



This is a very simple project with just one dashboard, but even from this simple data we could produce quite a lot of charts.



More new charts

I have been going through the data and seeing what I could come up with. I have created some new charts based on various pieces of data.

Here is a chart showing the last 6 seasons’ worth of data, highlighting average scored and conceded goals. It doesn’t really show anything that we didn’t already know.








This chart is based on the promoted sides and how they did in the first six games. This is more interesting, as I always said don’t bet on the first 6 games as this is when the promoted teams do better, or are perceived to do better, as the other teams are getting used to them.
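The first-six-games figure is a points total over the opening results. A minimal Python sketch, assuming each team’s season is stored as a string of W/D/L results in match order; the teams and results are made up.

```python
POINTS = {"W": 3, "D": 1, "L": 0}  # standard league points per result


def points_after(results, n_games=6):
    """Points earned in the first n_games of a results string such as 'WDL'."""
    return sum(POINTS[result] for result in results[:n_games])


# Made-up results in match order.
promoted = {
    "Promoted A": "WDWLLDLLW",
    "Promoted B": "LLDLWLDWW",
}

first_six = {team: points_after(form) for team, form in promoted.items()}
print(first_six)
```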











This chart goes into more detail and shows how the promoted teams did in the following season. We can see things like:

  • Which teams from 3 to 6 actually got promoted.
  • Where the promoted teams finished in the season after being promoted. I also grouped this by where they finished in the promotion season.
  • How many promoted teams stayed up.
  • Some stats on points.









These are just the first set of charts. I am starting to go through the data to see what it tells me.