Eyes on Court
And the Hunger for Tennis Data
Perspectives matter. Where you’re positioned and the language(s) you speak affect both what you perceive and how you perceive it. This is as true of tennis as of any other endeavor you pursue. In tennis the perspectives you take may include player/opponent, parent, friend, coach, psychologist, massage therapist, club, academy, federation, sponsor, journalist/broadcaster, quant, and vendor (equipment / software). Each of these perspectives has something to say about what happens on a tennis court, and each is hungry for data related to what happened before, during, or after time on court.
In this article I hope to strengthen the case that developing a set of data standards for tennis is the best path forward. There has been much “buzz” lately about tennis ratings (a topic I addressed recently), and it distracts from more fundamental issues which should be addressed first. I think it is a mistake to see a single rating as an attribute that is universal in scope, as some sort of linchpin for the tennis ecosystem. Ratings are not the most important factor in the development and sustenance of a tennis player, and their misapplication could create a situation where less innovation is possible.
Let me explain.
Tennis has been described as a “fragmented, decentralized landscape,” but, as with many other sports, there are a growing number of efforts to knit the Tenniverse together (which is having the effect of making the landscape more “lumpy”). Generically speaking, there are “Competition Management Platforms” (example, example, example, example, example) which seek to cater to all sports with tools for managing, among other things, participants, courts, event calendars and competition formats. There are also platforms dedicated to racquet sports, such as Tencap and Setteo, and specifically to tennis, such as Str8 Sets. All of these are closed ecosystems. The knitting together of perspectives in these platforms views each actor as a customer, while data is centrally managed. The important point here is that data does not flow between actors/perspectives so much as it is presented to end users who represent different roles. These roles are both defined and constrained by the platform, and the data is contained within the platform.
Contrast the idea of a closed application ecosystem, where data flows in proprietary internal formats, with the idea that data could be produced and consumed easily (if not freely) by all actors within an open network. Independent actors with access to data, in standard formats, from many sources, can more effectively pursue innovative views of goals and concerns informed by their unique perspective and by the role that they play in the development of a player, a team, or the sport in general. Innovation is fueled by data; different sets of eyes see differently and are hungry for data that feeds their particular area of interest. Public datasets are increasing in number, as are explorations of how this data can be effectively analyzed and presented. There’s no reason that all members of the tennis family shouldn’t benefit from the vibrant Open Data and Open Source movements that are transforming how data is managed and how software is created.
At the moment, data flows related to tennis are rather limited. Live scores are officially fed to betting platforms and various end-user applications, but the vast majority of data related to tennis is “scraped”, meaning that data is pulled from websites such as ITFTennis.com, ATP.com, WTA.com and national federations, including tennislink.USTA.com, by parsing the HTML or otherwise intercepting the data that is intended for presentation on these websites. Matchstats.com, Coretennis.net, Universaltennis.com, UltimateTennisStatistics.com, TennisAbstract.com, TennisRecruiting.net and dozens more that are country specific (e.g. icircolideltennis.it), as well as independent researchers, all rely on scraped data. Ironically, some sites (including Universaltennis.com) prohibit the scraping of their (mostly scraped) data in their Terms of Service; though to be fair the ratings calculated by a proprietary algorithm are the product of wholly owned intellectual capital and deserve to be safeguarded.
Many of the sites which “scrape” data can be considered innovative. They are adding value to data by combining it with data from other sources and making it easier or more pleasant (interactive data visualizations) to access. Some are adding historical views unavailable on the original sites, and some are generating unique statistics and player ratings (Elo-based/UTR). There is an incredible amount of inefficiency in each of these actors duplicating the effort of scraping data. Why shouldn’t they be able to access tournament results in a standard format? Further, imagine the innovation that could occur were other types of tennis data to be made accessible in standard formats (calendars, match tracking & results of video analysis, equipment sensors & biometrics, development goals, etc.). Software vendors who don’t aspire to becoming a closed ecosystem or being beholden to dominant interests could collaborate to create something greater than the sum of their parts.
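To make the idea of a standard format concrete, here is a minimal sketch in Python of what an exchangeable tournament-results record might look like. It is not an existing standard: the class names, every field name, and the choice of JSON as the wire format are assumptions made purely for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Match:
    # All field names here are hypothetical, not part of any published standard.
    round_name: str
    winner_id: str
    loser_id: str
    score: str  # free-text score, e.g. "6-4 7-6"


@dataclass
class TournamentResults:
    tournament_id: str
    name: str
    start_date: str          # ISO 8601 date, e.g. "2018-03-01"
    sanctioning_body: str
    matches: List[Match] = field(default_factory=list)


def to_json(results: TournamentResults) -> str:
    """Serialize to JSON; any producer could publish this text."""
    return json.dumps(asdict(results), ensure_ascii=False, sort_keys=True)


def from_json(text: str) -> TournamentResults:
    """Parse the same JSON back; any consumer could read it without scraping."""
    raw = json.loads(text)
    matches = [Match(**m) for m in raw.pop("matches")]
    return TournamentResults(matches=matches, **raw)


example = TournamentResults(
    tournament_id="T-001",
    name="Example Open",
    start_date="2018-03-01",
    sanctioning_body="Example Federation",
    matches=[Match("Final", "P-1", "P-2", "6-4 7-6")],
)
```

The point of the sketch is the round trip: a record written by one actor can be read back unchanged by another, with no HTML parsing in between.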
Dozens of articles can be written about new tennis applications that could be pursued were there a set of standards in place (and vendors who perceived the benefit of supporting such standards). I alluded to a number of such possibilities in “TOSS: Tennis Open Software Standards”. With respect for the TL;DR meme, I’m choosing to focus now on one specific fragment of the possible: match results (including tournament contexts), and what can be derived from them. This focus is appropriate both because match results make up the bulk of tennis data which is transported over the internet, and because match results are the basis for both player rankings and player ratings, which are currently attracting a surprisingly large percentage of the “eyes on court”.
An Engine for Ratings and Rankings
Ratings are typically seen as a measure indicating players’ “level of play,” while Rankings seek to define where an individual stands within a category (usually an age or gendered group). Of course there are substantial criticisms of the numerous rating algorithms and ranking methodologies that exist… and one salient criticism is that in many countries they don’t exist at all!
Constructing tennis rank lists, in many places, is still a laborious manual process of collating points from tournaments which themselves are “categorized” by the relative strength of players who are expected to participate; tennis ratings, until very recently, have mostly been based on subjective evaluation of player skills. Some countries, even with sufficient information technology in place, generate rank lists monthly, quarterly or even annually! The problem is data management, and stems in part from lack of resources, but more often from lack of in-house technical sophistication.
Now that ratings calculated directly from match results have become a hot commodity, with the ITF board even approving an initiative to launch a new “International Tennis Rating,” it appears that there may be some near term relief for the frustrations that have been building among the constituents of national federations. (The crucial difference is that ratings calculated from match results are seen as tools which can be used by players, their families, coaches, clubs, academies, and universities, all of whom are hungry for data; they are no longer simply a metric used by organizations/federations for placing players into categories.) It seems simple, really… match results just need to make their way to a central data store from which a rating can be calculated and fed back to interested parties.
It seems simple, but the manner in which this vision is achieved will have profound implications for what other perspectives are possible, for how other points of view can be accommodated going forward. If match results are vacuumed up into a closed ecosystem (as now appears to be the case with Universal Tennis), if data pipelines are put into place without regard for other stakeholders who may usefully tap into the data streams, then creativity and innovation will be suppressed. My concern is that the “ratings hook” may be leveraged in the short term into a data architecture that excludes many existing interests and frustrates future attempts to crack it open.
Instead of the idea of a single rating (be it UTR or the ITF’s new rating) driving the consolidation of global match results, imagine the idea of a “Ratings and Rankings Engine” layered on top of a global results repository. Such an engine could be constructed so that users (including clubs, federations, researchers, vendors, etc.) would be able to filter match results according to any number of criteria and apply different ratings algorithms or ranking calculations to the resulting dataset, to support existing competition structures (some federations use different ratings for different player categories) or in the pursuit of new competition “products”. In this architecture a federation which has no existing ranking (or rating) system could, simply by adding match results, select from a menu of possible metrics. A rating could be applied to results from all events in a given time period, or only to events sanctioned by a selection of governing bodies; additional attributes could determine how dynamically ratings change with the addition of new data. National rankings in European countries, for instance, could easily be specified such that they include results from all Tennis Europe events (as is reportedly done in Germany). Third parties such as TennisRecruiting.net and UniversalTennis.com could tap into such a repository to offer their unique services, with the assurance that official changes to match results would percolate through the system, obviating the need for customer service departments to field calls from around the world.
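A minimal sketch of such an engine, in Python, might look like the following. Everything in it is an assumption for illustration: the MatchResult fields, the rate function, and the two sample metrics. The Elo calculation is the basic textbook form, not the algorithm used by UTR, the ITF, or any federation.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class MatchResult:
    # Hypothetical minimal record; a real repository would carry more context.
    played_on: date
    sanctioning_body: str
    winner_id: str
    loser_id: str


Ratings = Dict[str, float]
Algorithm = Callable[[Iterable[MatchResult]], Ratings]


def elo(k: float = 32.0, start: float = 1500.0) -> Algorithm:
    """Basic Elo; k sets how dynamically ratings move with new results."""
    def run(matches: Iterable[MatchResult]) -> Ratings:
        ratings: Ratings = {}
        for m in sorted(matches, key=lambda m: m.played_on):
            rw = ratings.get(m.winner_id, start)
            rl = ratings.get(m.loser_id, start)
            expected_win = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
            delta = k * (1.0 - expected_win)
            ratings[m.winner_id] = rw + delta
            ratings[m.loser_id] = rl - delta
        return ratings
    return run


def win_count(matches: Iterable[MatchResult]) -> Ratings:
    """A deliberately naive alternative metric: total wins per player."""
    return {p: float(n) for p, n in Counter(m.winner_id for m in matches).items()}


def rate(repository: Iterable[MatchResult], algorithm: Algorithm,
         since: Optional[date] = None, until: Optional[date] = None,
         bodies: Optional[Set[str]] = None) -> Ratings:
    """Filter the shared repository, then apply whichever metric was chosen."""
    selected = [
        m for m in repository
        if (since is None or m.played_on >= since)
        and (until is None or m.played_on <= until)
        and (bodies is None or m.sanctioning_body in bodies)
    ]
    return algorithm(selected)


repo = [
    MatchResult(date(2018, 1, 10), "Tennis Europe", "A", "B"),
    MatchResult(date(2018, 2, 5), "National", "B", "A"),
    MatchResult(date(2018, 3, 1), "Tennis Europe", "A", "C"),
]
tennis_europe_only = rate(repo, elo(), bodies={"Tennis Europe"})
all_events_wins = rate(repo, win_count)
```

The design choice worth noticing is that the repository and the metric are separate: the same results can feed a federation's ranking, a third party's rating, or a researcher's experiment, each with its own filter and its own algorithm.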
Most significantly, perhaps, a set of standards for the aggregation of match results (and the accompanying ancillary data such as tournament dates and locations) would mean that existing vendors could be invited to participate in the construction of the platform and existing investments on the part of federations, academies, and clubs could be leveraged to put the platform into place. Existing vendors benefit from a number of the attributes which a global results repository needs to define, and by becoming consumers as well as producers of the data made available could broaden their reach and add depth to their offerings.
Prerequisites & Perspective Taking
Properly curating player results, globally, requires a unique “Global ID” for every player. At present one of the biggest problems aggregators of match results face is that players have multiple identification numbers (regional, national, international; sometimes even club and custom organizer IDs). This problem is further compounded when there are no IDs available to scrape; the presentation of players’ names can vary widely, and international sites tend to use a Latin alphabet whereas national sites often use local alphabets; names are often written differently when translated to the Latin alphabet and diacritics are removed. Putting a strategy in place for the rollout of a Global ID is the first step that should be taken before any rating which purports to be international can be taken seriously.
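The name-matching part of this problem can be sketched in a few lines of Python. This is only an illustration of why a Global ID is needed, not a proposal for how to assign one: the record layout, the use of birth year as a second matching attribute, and the function names are all assumptions. The sketch strips combining diacritics and normalizes case and spacing; it does not transliterate non-Latin alphabets, handle letters that have no decomposition (such as "Đ" or "ø"), or reorder "Surname, Given" forms, and its output should be read as candidates for human review rather than confirmed matches.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (source, local_id, displayed_name, birth_year); a hypothetical record layout.
PlayerRecord = Tuple[str, str, str, int]


def name_key(name: str) -> str:
    """Lowercase, strip combining diacritics, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def candidate_groups(
    records: Iterable[PlayerRecord],
) -> Dict[Tuple[str, int], List[Tuple[str, str]]]:
    """Group local IDs that may refer to the same person."""
    groups: Dict[Tuple[str, int], List[Tuple[str, str]]] = defaultdict(list)
    for source, local_id, name, birth_year in records:
        groups[(name_key(name), birth_year)].append((source, local_id))
    # Only keys seen under more than one local ID are worth reviewing.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


records = [
    ("international", "X1", "Petra Kvitová", 1990),
    ("national", "77", "petra  KVITOVA", 1990),
    ("international", "X2", "Simona Halep", 1991),
]
duplicates = candidate_groups(records)
```

Even this simple key catches the most common variation (diacritics dropped, case changed), but every limitation listed above is a reason heuristics cannot substitute for an identifier issued once and carried everywhere.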
If a global results repository were put into place we can imagine how such a resource could be viewed from the various perspectives which different stakeholders bring to the court, or apply to the data generated by what happens on court. The International Tennis Federation could have a broad view of how various competition structures are used around the world and cross-sectional analysis could better identify factors related to player retention, for instance; this would inform the guidance they provide their constituents. Understanding the frequency with which players have matches which can be considered competitive is relatively low-hanging fruit and can deliver great value even before the application of ratings is considered.
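As a sketch of how low that fruit hangs, the following Python computes the share of a player's matches that were competitive. The definition of "competitive" is my assumption for illustration only: a match counts if the loser won at least 40% of the games played. The threshold, the tuple layout, and the function names are hypothetical, and any federation would want to choose its own definition.

```python
from typing import Iterable, Optional, Sequence, Tuple

# (games won by the match winner, games won by the match loser) for one set
SetScore = Tuple[int, int]
# (winner_id, loser_id, sets); a hypothetical minimal match record
Match = Tuple[str, str, Sequence[SetScore]]


def is_competitive(sets: Sequence[SetScore], min_loser_share: float = 0.4) -> bool:
    """Assumed definition: the loser won at least min_loser_share of all games."""
    total_games = sum(w + l for w, l in sets)
    if total_games == 0:
        return False  # walkovers and unplayed matches carry no information
    loser_games = sum(l for _, l in sets)
    return loser_games / total_games >= min_loser_share


def competitive_share(matches: Iterable[Match], player_id: str,
                      min_loser_share: float = 0.4) -> Optional[float]:
    """Fraction of a player's matches (won or lost) that were competitive."""
    played = [m for m in matches if player_id in (m[0], m[1])]
    if not played:
        return None  # no data for this player
    competitive = sum(
        1 for _, _, sets in played if is_competitive(sets, min_loser_share)
    )
    return competitive / len(played)


matches = [
    ("A", "B", [(6, 0), (6, 1)]),   # one-sided
    ("C", "A", [(7, 6), (6, 4)]),   # close
    ("B", "C", [(6, 3), (6, 3)]),   # loser won a third of the games
]
```

Note that no rating is involved: the measure depends only on scores, which is why it could deliver value before any rating is applied.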
Regional Associations (Tennis Europe, Asian Tennis Federation, Confederacion Sudamericana de Tenis, Confédération africaine de tennis) represent the pooled interest of federations within connected geographical areas, particularly with respect to junior tennis, and can benefit by hosting a subset of a global datastore, giving them an ability to perform more comprehensive analysis of player activities within their domains. At present they are mostly confined to data generated by tournaments which are conducted in their name and they have little or no visibility into national events. Such visibility could better inform the coordination of event calendars and facilitate inter-country integration, where desirable, at any level.
Aggregated match and tournament data can be seen as a lattice or a trellis on top of which the vines of player development may be trained. The perspectives of coaches and players suggest that a global repository of results provides a framework which, if properly architected, can be used to add context to data generated by other activities related to match play. Imagine if the data islands generated by match trackers, racquet sensors, video analysis, and coaches’ training programs could all be brought together into a combined view so that players, parents and coaches could navigate smoothly between all of the components brought to bear on a development plan.
How the eye is trained
Marketing can often make us feel like we’re in the middle of a maelstrom of claims, swirling about us. At the moment Ratings for Tennis appear to be taking up a lot of space in the various “feeds” that fill our screens. My position is that while I’m a fan of ratings and am convinced that they can be useful, it is a mistake to try to apply them “universally”. I side with Marshall McLuhan in believing that amusement is the proper strategy for ensuring our survival in the face of a marketing onslaught, and that, like Poe’s sailor trapped in a vortex, observing the relative velocities of memes by taking a position of rational detachment can be extremely informative, and an antidote to tunnel vision. Ratings have accelerated too rapidly, due to aggressive marketing, and are distracting from a broader view of what can be achieved if an open architecture and a set of data standards are put into place which can benefit all. An eye trained to use peripheral vision can observe not only the trajectory of the ball, but also the movement of the player across the net. Ratings may be coming at you, but you will reap more rewards with a view of the whole court, and the context within which a contest occurs.
About the Author — Generating Perspectives
Prior to becoming somewhat conscious of the “Tenniverse”, I pursued systems theory, actor-network theory, cybernetics and permaculture while reflecting on a professional career built upon a foundation of systems integration. Now I’m pursuing a holistic view of data and the information flows that relate to tennis, so I’m interested in what people who take different perspectives (actors in social networks related to tennis) have to say to each other, and whether they can effectively communicate. This interest inspired a proposal for a set of software standards for tennis, and I chose the name CourtHive for my tennis-themed Open Source software development efforts to evoke both an image of a court encompassing many points of view and the idea of the hive mind. I also see a bee’s compound eyes, and the fact that bees have five eyes, as suggestive of there being multiple perspectives, multiple ways that the same object, person, or event can be perceived.