Real-time data collection

This is the first part in my series on collecting data.

I thought I would go back to the beginning and post about how I collect the data, as the reports are nothing without it. Most of the charts are based on data that is updated once a week, things like goals or minutes played, so that data capture is relatively simple. Some of the data, however, like transfers, is updated throughout the week, and I wanted to see what insights I could get from that. When I say transfers, I'm referring to the way people move players in and out of their teams every week.

Collecting real-time data is something I have experience with, as I designed a similar process for work, so I reuse the same building blocks. The overall design looks like this, but let's break it down.

The APIs I use are publicly available, so it's just a case of knowing how to use them.

A WebJob runs continuously and gets the data. The payload is too big to be sent to an Event Hub in one message, so I do some work on it first: I extract the nodes I want and send those to the Event Hub, but I also send the entire JSON file to a Storage account, the reason being that I can use that raw data whenever I need it.
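As a rough illustration, the core of that WebJob loop looks something like this in C# with the Azure SDKs. The URL, hub and container names, and the "players" JSON property are placeholders rather than my actual setup:

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;
using Azure.Storage.Blobs;

class Collector
{
    static readonly HttpClient Http = new HttpClient();

    static async Task Main()
    {
        var producer = new EventHubProducerClient(
            Environment.GetEnvironmentVariable("EVENTHUB_CONNECTION"), "transfers");
        var container = new BlobContainerClient(
            Environment.GetEnvironmentVariable("STORAGE_CONNECTION"), "raw-json");
        await container.CreateIfNotExistsAsync();

        while (true)
        {
            // Fetch the full JSON payload from the public API (placeholder URL).
            string json = await Http.GetStringAsync("https://example.com/api/players");

            // Archive the complete file in a Storage account for later use.
            string blobName = $"snapshot-{DateTime.UtcNow:yyyyMMddHHmmss}.json";
            await container.UploadBlobAsync(blobName, new BinaryData(json));

            // The full payload is too big for one Event Hub message, so only
            // the node we care about ("players" is illustrative) is sent on.
            using JsonDocument doc = JsonDocument.Parse(json);
            string slim = doc.RootElement.GetProperty("players").GetRawText();

            using EventDataBatch batch = await producer.CreateBatchAsync();
            batch.TryAdd(new EventData(Encoding.UTF8.GetBytes(slim)));
            await producer.SendAsync(batch);

            await Task.Delay(TimeSpan.FromMinutes(5)); // polling interval
        }
    }
}
```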

The data sent to the Event Hub is consumed by a Stream Analytics job and ends up in an Azure SQL database. Within the Stream Analytics query I can modify the data or add additional columns if I so wish.
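That transformation is written in Stream Analytics' SQL-like query language. A simplified sketch of the kind of query I mean (the input/output aliases and column names are illustrative, not my actual job definition):

```sql
SELECT
    playerId,
    transfersIn,
    transfersOut,
    transfersIn - transfersOut AS netTransfers, -- extra column derived in-stream
    System.Timestamp() AS capturedAt            -- when the event was processed
INTO
    [sql-output]       -- the Azure SQL database output
FROM
    [eventhub-input]   -- the Event Hub input
```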

In my design I also want to merge this data with other data sources, so I use Azure Data Factory to copy the data from the cloud to a SQL database on a local machine, which acts as a data warehouse and contains all the other football data. I have various processes running that collect different types of data.
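Under the hood that copy is just a Data Factory pipeline with a Copy activity, with the local machine reached through a self-hosted integration runtime bound to the sink dataset. A trimmed-down sketch of the pipeline JSON (all the names here are illustrative):

```json
{
  "name": "CopyTransfersToWarehouse",
  "properties": {
    "activities": [
      {
        "name": "CloudToOnPremCopy",
        "type": "Copy",
        "inputs":  [ { "referenceName": "AzureSqlTransfers",  "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "WarehouseTransfers", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "AzureSqlSource" },
          "sink":   { "type": "SqlServerSink" }
        }
      }
    ]
  }
}
```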

At this point I have real football data and fantasy football data together, so I can start to blend them and see what insights I can get.

This is an ongoing side project I have been working on for many years, and in that time I have changed and modified all the elements. Originally I wrote everything in VB.NET, but now I use C#; I removed the ETLs that were based on SSIS and now use Azure, or even Python. I can't wait until the support for Python in Function Apps is better 🙂

So that was a brief overview, but it gives you the main parts and how I collect from this data source.


The next part will cover the other data sources and how we blend the data before visualizing it.
