F1 Data Analysis Part I: Lap comparison
Dive into the available data on Formula One events and perform your own analysis. First part of a series.
Using Fast-F1
We all know there is an enormous amount of data generated in Formula One. And we know this data is kept private by all the teams. But some information is available for all races, including some basic telemetry data for all participating cars. Isn’t that cool?! This data is available through the Live Timing website of F1.com. Fortunately, Oehrly did a tremendous amount of work developing a wrapper around this live timing interface, called Fast-F1.
Using this package, it is possible to obtain all information on historic grand prix (2018 to now), and even follow live timing for a running event. Examples can be found on GitHub and in the documentation.
Before developing our own visualizations, let’s look at an example usage of Fast-F1 comparing the speed of the fastest qualifying lap of Norris and Ricciardo during the Styrian Grand Prix of 2021:
Lines 5 and 6 set up the look and feel of the matplotlib graphs and enable caching of downloaded data. Enabling the cache is advised since it saves a serious amount of time. Line 8 obtains a Session from Fast-F1, in this case the qualifying session of the Styrian GP in 2021. This Session object is the entry point to obtain the recorded laps of all drivers (line 9). This Laps dataframe contains the meta information of all recorded laps, including:
Time - Time the record was created
DriverNumber - Car number of the driver
Driver - 3 letter abbreviation of the driver
Team - Team the car belongs to
LapTime - Lap time record for this lap
LapNumber - Sequence Number within this session
Stint - Sequence number of stints
Sector1Time - Sector 1 time for this lap
Sector2Time - Sector 2 time for this lap
Sector3Time - Sector 3 time for this lap
Compound - Tyre compound used during this lap
TyreLife - Tyre age in laps
Besides providing this metadata, Laps also supports filtering with .pick_driver(driver) and .pick_fastest(). Combining these two results in the fastest lap of the specified driver in this session (line 16). We can then request the telemetry data for this lap with .get_telemetry(), resulting in the most detailed information available, including:
SessionTime - Time in the session this entry was recorded
DriverAhead - Which driver is ahead on track (Driver number)
DistanceToDriverAhead - Distance to the driver ahead
Time - Running time for current lap, starts at zero for each lap
Distance - Running distance for current lap
RPM - RPM of the engine at this moment
Speed - Speed of the car
nGear - Current selected gear
Throttle - Amount of throttle applied (0 - 100)
Brake - Amount of braking applied (0 - 100)
DRS - DRS Enabled
X,Y,Z - Location of the car in the X,Y,Z plane of the circuit
Plotting the speed against the distance travelled during this lap results in the following graph for the best qualifying lap of Norris and Ricciardo:
The graph shows that Norris is faster in most parts of the circuit, especially in the corners. The lap time of Norris is 1:03.768 and that of Ricciardo 1:04.719, a difference of 0.951 seconds.
When we replace the distance on the x-axis by the time, we see this more clearly:
The difference in speed, and thus reaching corners at a different time, reduces the overlap of the lines, especially towards the end of the lap. Lines are easier to compare on this level, but checking, for example, the braking point will be more difficult. A braking point is a fixed position on the track and not time dependent. So it depends on the analysis you want to perform which graph better suits the goal.
Zooming in on the last two corners of the lap shows the difference in corner speed, especially in the last corner before the final straight. The difference in speed is reduced by Ricciardo (so he might be running with less wing, resulting in lower corner speed but faster acceleration) but overall this is insufficient:
The lowest speed by Norris is 201 km/h at 3966 metres, and by Ricciardo 194 km/h at 3982 metres. Norris starts accelerating 16 metres earlier than Ricciardo. This data is obtained with the following code:
nor = laps.pick_driver('NOR').pick_fastest().get_telemetry()
nor = nor[(nor.Distance > 3500) & (nor.Distance < 4300)]
nor_low = nor['Speed'].min()
nor_low_pos = (nor[nor.Speed == nor_low]['Distance'])
ric = laps.pick_driver('RIC').pick_fastest().get_telemetry()
ric = ric[(ric.Distance > 3500) & (ric.Distance < 4300)]
ric_low = ric['Speed'].min()
ric_low_pos = (ric[ric.Speed == ric_low]['Distance'])
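The same window-and-minimum logic can be tried without downloading any data. The sketch below is a plain-Python illustration, not the Fast-F1 API: the function name lowest_speed_in_window and the sample values are made up for this example, and telemetry is reduced to (distance, speed) pairs.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (distance in metres, speed in km/h)


def lowest_speed_in_window(samples: List[Sample], start: float, end: float) -> Sample:
    """Return the (distance, speed) sample with the lowest speed
    among the samples with start < distance < end."""
    window = [s for s in samples if start < s[0] < end]
    if not window:
        raise ValueError("no samples inside the requested window")
    # min() returns the first sample that has the lowest speed
    return min(window, key=lambda s: s[1])


# Made-up samples for illustration only, not real telemetry
samples = [
    (3400.0, 250.0),
    (3900.0, 230.0),
    (3950.0, 206.0),
    (3960.0, 200.0),
    (3990.0, 212.0),
    (4400.0, 180.0),  # slower, but outside the window
]

distance, speed = lowest_speed_in_window(samples, 3500, 4300)
print(f"Lowest speed {speed} km/h at {distance} metres")
```

Returning the distance together with the speed avoids the second lookup on the speed value, which could match more than one sample when the minimum occurs twice.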
This is only the tip of the iceberg!
Creating a wrapper class
After working with Fast-F1 for a while, the need arose for a wrapper class that reduces the amount of code to write, improves code readability and implements an in-memory cache. Without an in-memory cache, writing multiple analyses of the same event, over different sessions and for multiple drivers, still takes a while.
This wrapper class can also contain the written comparisons and overview generating methods developed. The full code is available on GitHub in lmeulen/F1Analysis.
The wrapper will contain the following data fields:
season - Year of the season analysed
event - Name of the current event analysed
events - Information on all events in the given season
drivers - Information on all drivers active in the given season
sessions - Dictionary acting as a cache of all downloaded sessions
The sessions dictionary acts as the in-memory cache for all downloaded sessions. These can span multiple events, but are all within one season. This choice keeps the amount of memory used acceptable, but the cache can relatively easily be extended over multiple years or reduced to only one event.
Getting the laps of a specific session:
f1h = F1Helper(2021, 'Sty')
laps = f1h.get_session(session='Q')
After the call to .get_session(), all sessions of the Styrian Grand Prix are stored in the in-memory cache. The first execution will take a bit more time to download all sessions (lines 54–59), but following calls will be fast because the results are cached (line 51). The speed makes it easier to start with a clean slate for every analysis by calling the .get_session() method. If no event is given, the default event is used (line 48).
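The caching idea can be sketched independently of Fast-F1. The class below is a minimal illustration and not the code from the repository: SessionCache, its loader argument and the fake loader are hypothetical names, and unlike the wrapper described above it loads one session at a time instead of all sessions of an event.

```python
from typing import Any, Callable, Dict, List, Tuple

Key = Tuple[int, str, str]  # (season, event, session identifier)


class SessionCache:
    """Keep loaded sessions in a dictionary so each one is loaded only once."""

    def __init__(self, loader: Callable[[int, str, str], Any]) -> None:
        self._loader = loader  # function that performs the slow load
        self._sessions: Dict[Key, Any] = {}

    def get(self, season: int, event: str, session: str) -> Any:
        key = (season, event, session)
        if key not in self._sessions:
            # Cache miss: do the slow load and remember the result
            self._sessions[key] = self._loader(season, event, session)
        return self._sessions[key]


# A fake loader stands in for the real download (assumption for this sketch)
calls: List[Key] = []


def fake_loader(season: int, event: str, session: str) -> str:
    calls.append((season, event, session))
    return f"laps for {season} {event} {session}"


cache = SessionCache(fake_loader)
first = cache.get(2021, "Styrian Grand Prix", "Q")
second = cache.get(2021, "Styrian Grand Prix", "Q")  # served from memory
print(len(calls))  # the loader ran only once
```

Passing the loader in as a function keeps the cache testable without network access; in the wrapper, the loader would be the code that obtains the session through Fast-F1.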
Setting the default event tries to match the given string with the event names, the country names and the city names where the events take place. This makes it easier to select a given event without having to specify the full official name, which also reduces the chance of a typing error.
For downloading events and drivers per year, the excellent online API of Ergast is used. Fast-F1 also uses this API. Note that the actual implementation on GitHub contains more error checking and parameter validation. For illustrative purposes the shorter version is shown in this article.
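A minimal sketch of such a lookup is shown below. It is an assumption about how the matching could work, not the implementation from the repository: the function name find_event, the dictionary keys and the small hand-written event list are all made up for this example.

```python
from typing import Dict, List, Optional

Event = Dict[str, str]


def find_event(query: str, events: List[Event]) -> Optional[Event]:
    """Return the first event whose name, country or city contains the query.

    All event names are checked first, then all countries, then all cities,
    so 'Aus' selects the Austrian Grand Prix by name before any event is
    matched through the country 'Austria'.
    """
    needle = query.strip().lower()
    for field in ("name", "country", "city"):
        for event in events:
            if needle in event[field].lower():
                return event
    return None


# Small hand-written sample, not the full season calendar
events = [
    {"name": "Styrian Grand Prix", "country": "Austria", "city": "Spielberg"},
    {"name": "Austrian Grand Prix", "country": "Austria", "city": "Spielberg"},
    {"name": "British Grand Prix", "country": "UK", "city": "Silverstone"},
]

print(find_event("Sty", events)["name"])
print(find_event("Aus", events)["name"])
print(find_event("silverstone", events)["name"])
```

The order of the checks matters: matching per event instead of per field would return the Styrian Grand Prix for 'Aus', because its country contains the same letters.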
Lap comparison visualisation
Now it is the moment to make the first visualisation. This will be a lap comparison visual:
We will compare the fastest qualifying laps of Alonso and Vettel in the qualifying session of the Austrian GP 2021. So first we obtain the lap information for the laps to compare:
f1h = F1Helper(2021, 'Aus')
laps = f1h.get_session(session='Q')
alo_lap = laps.pick_driver("ALO").pick_fastest()
vet_lap = laps.pick_driver("VET").pick_fastest()
The laps instance is of object type Laps from Fast-F1. So we can use the pick_driver and pick_fastest as in the previous example.
Before we can start creating the image, we need some helper functions. The lap information contains limited information on the drivers, so we need the information retrieved from Ergast in the f1h.drivers cache.
Before we can compose the image above, we need some helper methods to translate data to printable strings. These include time2str to translate a lap time to a string and timediff2str to translate the difference between two times (lap times or sector times) to a string. For the driver we need to translate his code (3 letters) to his full name (first name camel-case, last name uppercase) and to transform the tyre compound to a compound identifier, sometimes with and sometimes without brackets:
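As an illustration, the helpers could look roughly like the sketch below. The names time2str and timediff2str come from the text, but their bodies and output formats are assumptions; driver_name and compound_id are hypothetical names for the other two translations. Times are assumed to behave like standard datetime.timedelta values.

```python
from datetime import timedelta

ONE_MS = timedelta(milliseconds=1)


def time2str(t: timedelta) -> str:
    """Format a lap or sector time as m:ss.mmm (sub-millisecond part dropped)."""
    total_ms = t // ONE_MS
    minutes, rest = divmod(total_ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes}:{seconds:02d}.{millis:03d}"


def timediff2str(a: timedelta, b: timedelta) -> str:
    """Format the difference a - b as a signed number of seconds, e.g. +0.951."""
    diff = a - b
    sign = "-" if diff < timedelta(0) else "+"
    seconds, millis = divmod(abs(diff) // ONE_MS, 1000)
    return f"{sign}{seconds}.{millis:03d}"


def driver_name(first: str, last: str) -> str:
    """First name capitalised, family name in uppercase."""
    return f"{first.title()} {last.upper()}"


def compound_id(compound: str, brackets: bool = False) -> str:
    """Reduce a compound name such as 'SOFT' to its first letter (assumed format)."""
    letter = compound[:1].upper()
    return f"({letter})" if brackets else letter


# Lap times quoted earlier in this article
nor = timedelta(minutes=1, seconds=3, milliseconds=768)
ric = timedelta(minutes=1, seconds=4, milliseconds=719)
print(time2str(nor), timediff2str(ric, nor))
print(driver_name("fernando", "alonso"), compound_id("SOFT", brackets=True))
```

Working in whole milliseconds with integer division avoids the rounding surprises that floating-point seconds can introduce.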
The .get_driver(did) method checks the parameter against the driver code, his family name and his starting number. So with the Lap data frames and these helper functions we can create all the pieces of the visualisation we want to make.
We will make the graph using Pillow, a Python package to edit (and create) images. We will start with a black canvas of 700x200 (width x height) and then add the text parts. The layout and distances are as follows:
The left side (200x200 pixels) gives information on the first lap and the driver and is left aligned. The right side is the same for the second lap but right aligned. In the middle the comparison per sector and lap is presented.
With an inter-line spacing of 30 pixels and the first line at 20 pixels, there are five lines available to display text. A canvas is created with Image.new(mode, (width, height), colour) and text is added through a drawing context with ImageDraw.Draw(image).text((x, y), text, fill, font). The fonts used closely resemble the official F1 fonts; they are created by Logo Smith and free to download.
Two methods are created to add text to the canvas. The add_text(…) method adds a text at the specified location, with the specified font and colour. The text can be aligned left or right to the specified location. If the text is aligned left it simply starts at the location. Right alignment means the text ends at the specified location and thus starts at the location minus the text length. The add_time_diff() method is a wrapper around the add_text() method that determines the colour by looking at the first character. If this is a ‘+’ the text will be green, if it is a ‘-’ it will be yellow:
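The alignment and colour rules can be written down as two small pure functions. This is a sketch under assumptions, not the article's add_text() and add_time_diff() methods: the names text_start_x and diff_colour, the RGB values and the white fallback for text without a sign are made up for this example.

```python
from typing import Tuple

Colour = Tuple[int, int, int]

GREEN: Colour = (0, 255, 0)
YELLOW: Colour = (255, 255, 0)
WHITE: Colour = (255, 255, 255)  # fallback colour, an assumption of this sketch


def text_start_x(x: int, text_width: int, align: str = "left") -> int:
    """Return the x position where drawing must start.

    Left-aligned text starts at x; right-aligned text ends at x and
    therefore starts at x minus the width of the text.
    """
    if align == "left":
        return x
    if align == "right":
        return x - text_width
    raise ValueError(f"unknown alignment: {align!r}")


def diff_colour(text: str) -> Colour:
    """Pick the colour of a time difference from its first character."""
    if text.startswith("+"):
        return GREEN
    if text.startswith("-"):
        return YELLOW
    return WHITE


print(text_start_x(680, 120, "right"))
print(diff_colour("+0.951"))
```

With Pillow, the text width in pixels would have to be measured for the chosen font, for example with the textlength() method of an ImageDraw object in recent Pillow versions.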
So now that all helper functions are in place, it is time to create the actual visual. The canvas is created in lines 13 and 14. The information in the left part of the visual is added in lines 18 to 28 and the right side in lines 34 to 45. Two vertical separators (lines) are added for visual ease (lines 30–32). The lap time comparison is done in two parts: first the static labels are added (47–51) and then the actual comparison (52–60). These two are split to get the alignment correct. A combined string cannot be aligned left for the first part and right for the second part.
So now the visual is created by calling the compare_laps_graphic() method:
f1h = F1Helper(2021, 'Aus')
laps = f1h.get_session(session='Q')
alo_lap = laps.pick_driver("ALO").pick_fastest()
vet_lap = laps.pick_driver("VET").pick_fastest()
f1h.compare_laps_graphic(alo_lap, vet_lap)
Et voilà:
Now that feels good :-) I know this visual does not have the slick look and feel of the official F1 overlays used during broadcasts (fancy colours, driver photos, etc.) but it gives a lot of satisfaction to make your own.
What is next?
While analysing some qualifying laps it became clear that the laps carry no indication of which part of qualifying they belong to. It is one big set of laps, not a set for Q1, a set for Q2 and a set for Q3.
So in the next part it is time to split the laps over the three parts and present the qualifying results accordingly. Sounds simple, but it is a lot more work than expected.
The complete code can be found on GitHub.
Final words
I hope you enjoyed this article. For more inspiration check some of my other articles:
- Side-by-side comparison of strings in Python
- Remove personal information from text with Python
- Parallel web requests with Python
- All public transport leads to Utrecht, not Rome
- Visualization of travel times with OTP and QGIS
Disclaimer: The views and opinions included in this article belong only to the author.