C-squares


C-squares is a system of spatially unique, location-based identifiers for areas on the surface of the earth, represented as cells from a latitude- and longitude-based Discrete Global Grid at a hierarchical set of resolution steps, obtained by progressively subdividing 10×10 degree World Meteorological Organization squares; the term "c-square" can also be used to designate any individual cell of the grid. Individual cell identifiers incorporate literal values of latitude and longitude in an interleaved notation, together with additional digits that support intermediate grid resolutions of 5, 0.5, 0.05 degrees, etc.
The system was initially designed to represent data "footprints" or spatial extents in a more flexible manner than a standard minimum bounding rectangle, and to support "lightweight", text-based spatial querying; it can also provide a set of identifiers for grid cells used for assembly, storage and analysis of spatially organised data, in a unified notation that transcends national or jurisdictional boundaries. Dataset extents expressed in c-squares notation can be visualised using a web-based utility, the c-squares mapper, an online instance of which is currently provided by CSIRO Oceans and Atmosphere in Australia. C-squares codes and associated published software are free to use and the software is released under version 2 of the GNU General Public License, a licence of the Free Software Foundation.

History

The c-squares method was developed by Tony Rees at CSIRO Oceans and Atmosphere in Australia in 2001–2, initially as a method for spatial indexing, rapid query, and compact storage and visualization of dataset spatial "footprints" in an agency-specific metadata directory; it was first publicly announced at the "EOGEO" Technical Workshop held at Ispra, Italy in May 2002. A more complete description was published in the scientific literature in 2003, together with a web-accessible mapping utility entitled the "c-squares mapper" for visualisation of data extents expressed in the c-squares notation. Since that time, a number of projects and international collaborations have employed c-squares to support spatial indexing and/or map production, including FishBase, the Ocean Biogeographic Information System and AquaMaps, as well as data analysis supporting the designation of marine biogeographic realms, multi-national fisheries data collation by the Scientific, Technical and Economic Committee for Fisheries of the European Commission, and data reporting by ICES. For its application in displaying and modelling global biodiversity data, c-squares was one of four components cited in the award of the Ebbe Nielsen Prize to Rees by the Global Biodiversity Information Facility in 2014. The concept of representing dataset "footprints" as grid cells of this nature and alignment was stated to have been inspired by two sources: the data addressing method of the U.S. National Oceanographic Data Center "World Ocean Database" product, which uses 10 degree World Meteorological Organization squares to organise its data content, and the set of 1:100,000 topographic maps issued by the national mapping agency for Australia, each of which covers a 0.5 degree square and, with its associated mapsheet label, can notionally be used as a unit of spatial identification. The method has been discussed further in texts on georeferencing, including those by Hill, 2006 and Guo et al., 2020.
The system name "c-squares" was chosen because it can be read as an acronym (for "Concise Spatial Query and Representation System") and also because it signals that this method belongs to a notional group of similarly named, latitude-longitude gridded subdivisions of the Globe that includes World Meteorological Organization squares and Marsden squares, in contrast to other tessellations of the Globe that use differently shaped basic units such as rectangles, triangles, diamonds, and hexagons. It is also intended that any individual component cell of the grid can be referred to as a "c-square".

Rationale

Spatial data are inherently 2-dimensional; without additional indexing, a numeric range query in 2 dimensions is required to retrieve data items within a particular area. Such queries are computationally expensive, so it can be beneficial to pre-process the data in some manner that reduces the inherent dimensionality from two to one, for example as labelled cells of a grid. The grid labels can then be indexed by standard, one-dimensional methods for rapid search and retrieval, and/or searched by simple alphanumeric text searches. C-squares is an example of such a grid, where the cell identifiers are designed to be human- as well as machine-readable, and to be concordant with recognizable and commonly used intervals of latitude and longitude.
Other areas where a grid-based approach to spatial indexing can be beneficial include the representation of data "footprints" in support of spatial search; data binning, which reduces complex and potentially voluminous data into "blocks" that can then be more easily compared and summarised; and a hierarchical approach wherein finer resolutions of the grid are nested within coarser ones using a shared notation. A jurisdiction-independent grid such as c-squares can also be used to integrate data across national boundaries, in contrast to the national grids of various countries such as those of the United Kingdom and Ireland, which differ in their approach and may be inconsistent where such grids overlap, or leave gaps where they fail to meet.
A potential disadvantage of "equal angle" grids, which are based on standardised units of latitude and longitude, is that the length of the "sides" and the shape of the grid cells are not constant on the ground, and particular effects are noticeable at the poles, where the cells become 3- rather than 4-sided in practice. These disadvantages can be offset by the advantages that data transformation in and out of grid notation can be accomplished by relatively straightforward steps, that the results are congruent with conventional maps that show intervals of latitude and longitude, and that the concepts of "1-degree squares" and "0.5 degree squares" may have familiarity and meaning to human users, in a way that non-square, purely mathematically derived shapes and sizes may not.

The c-squares global grid notation

Initial 10 degree squares

10-degree c-squares are specified as being identical to equivalent World Meteorological Organization square codes, refer illustration at right. These squares are aligned with 10-degree subdivisions of the global latitude–longitude grid, which for c-squares use is specified as employing the WGS84 datum. WMO squares are encoded with four digits, in the series 1xxx, 3xxx, 5xxx and 7xxx. The leading digit indicates the "global quadrant" with 1 for north-east, 3 for south-east, 5 for south-west and 7 for north-west. The next digit, 0 through 8, corresponds to the tens of latitude degrees either north or south; while the remaining 2 digits, 00 through 17, correspond to the tens of longitude degrees either east or west. Thus the 10 degree cell with its lower left corner at 0,0 is encoded 1000, and acts as a bin to contain all spatial data between 0 and 10 degrees north and 0 and 9.999... degrees east; the 10 degree cell with its lower left corner at 80 N, 170 E is encoded 1817, and acts as a bin to contain all spatial data between 80 and 90 degrees north and 170 and 179.999... degrees east.
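As a minimal sketch of the 10-degree step described above, the Python function below derives the 4-digit code from a point's latitude and longitude. The function name `wmo_square` is illustrative only, and two details are assumptions rather than taken from the specification: points on the equator or prime meridian are treated as north or east, and exact values of 90 and 180 degrees are clamped into the last row and column.

```python
def wmo_square(lat: float, lon: float) -> str:
    """Return the 4-digit 10-degree c-square (WMO square) code for a point."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    # Global quadrant: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    # Assumption: the equator counts as north and the prime meridian as east.
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5
    # Tens of degrees, clamped so that 90 and 180 fall in the last row/column.
    lat_tens = min(int(abs(lat) // 10), 8)
    lon_tens = min(int(abs(lon) // 10), 17)
    # Longitude tens are always written as two digits (00 to 17).
    return f"{quadrant}{lat_tens}{lon_tens:02d}"
```

For example, `wmo_square(85.0, 175.0)` returns `"1817"`, the square with its lower left corner at 80 N, 170 E. Because the tens digit is taken from the absolute value, a point lying exactly on a 10-degree line is assigned to the square further from 0,0; this agrees with the north-east examples above, but boundary handling in the other quadrants should be checked against the published specification.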

Subsequent recursive subdivision

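C-squares extends the initial WMO 10×10 degree square notation via a recursive series of "cycles" separated by the colon character; each cycle is 3 digits long, except that a single-digit cycle may be appended to encode the intermediate 5, 0.5, 0.05 degree (etc.) steps, so that the number of characters indicates the resolution encoded, as per these examples: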
  • 1000... 10×10 degree square
  • 1000:1... 5×5 degree square
  • 1000:100... 1×1 degree square
  • 1000:100:1... 0.5×0.5 degree square
  • 1000:100:100... 0.1×0.1 degree square
  • 1000:100:100:1... 0.05×0.05 degree square
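The examples above follow a simple pattern: every resolution step adds exactly two characters (either ":" plus a 1-digit cycle, or the two digits that complete a 3-digit cycle), so the resolution can be read from the length of a code, and truncating a code yields the coarser cell that contains it. The following Python sketch illustrates both properties; the function names and the regular expression used for validation are assumptions made for this illustration, not part of any published c-squares library.

```python
import re
from decimal import Decimal

# Assumed pattern for one code: a 4-digit 10-degree square, any number of
# 3-digit cycles, optionally ending in a single-digit (halving) cycle.
CSQUARE_RE = re.compile(r"[1357][0-8](?:0\d|1[0-7])(?::[1-4]\d\d)*(?::[1-4])?")


def resolution_level(code: str) -> int:
    """0 = 10 deg, 1 = 5 deg, 2 = 1 deg, 3 = 0.5 deg, 4 = 0.1 deg, ..."""
    if not CSQUARE_RE.fullmatch(code):
        raise ValueError(f"not a c-square code: {code!r}")
    return (len(code) - 4) // 2


def cell_size(code: str) -> Decimal:
    """Edge length of the cell in degrees, read from the code's length."""
    level = resolution_level(code)
    size = Decimal(10) / Decimal(10) ** (level // 2)  # 10, 1, 0.1, ...
    return size / 2 if level % 2 else size            # odd levels: 5, 0.5, ...


def parent(code: str, level: int) -> str:
    """Coarser cell containing `code`: simply a prefix of the string."""
    if not 0 <= level <= resolution_level(code):
        raise ValueError("level must not be finer than the code itself")
    return code[: 4 + 2 * level]
```

For instance, `parent("1000:100:100:1", 2)` gives the 1 degree square `"1000:100"`. The same prefix property is what allows the wildcard searches described under "Spatial searching" to work on plain text.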
Cell size is typically selected to suit the nature of the data to be encoded, the overall spatial extent of the area in question, the desired spatial resolution of the resulting grid, and the computing resources available. For example, relatively generalised, global compilations may be best suited to aggregate data by 10- or 5- degree cells, while more local gridded areas may favour 1-, 0.5- or 0.1- degree cells, as appropriate.
The nominal sizes given above reflect the fact that at the equator, 1 degree of both latitude and longitude corresponds to around 110 km, with the actual value for longitude declining between there and the poles, where it becomes zero; at a sample northern hemisphere latitude such as that of London, a 1×1 degree square measures approximately 111×69 km.
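The arithmetic behind these figures can be sketched as follows, assuming a spherical Earth with about 111.32 km per degree of arc (an approximation; the WGS84 ellipsoid gives slightly different values). The north-south extent of a cell is then roughly constant, while its east-west extent scales with the cosine of latitude. The function name is hypothetical.

```python
import math

# Assumption: spherical Earth, ~111.32 km per degree of arc.
KM_PER_DEGREE = 111.32


def cell_dimensions_km(lat_deg: float, size_deg: float) -> tuple[float, float]:
    """Approximate (north-south, east-west) extent in km of a cell at a latitude."""
    north_south = KM_PER_DEGREE * size_deg
    # Meridians converge towards the poles, so east-west length shrinks by cos(lat).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west
```

At latitude 51.5 degrees this gives a 1×1 degree cell of about 111 × 69 km, in line with the figures above; at 90 degrees the east-west extent falls to effectively zero, which is why cells adjoining the poles become three-sided in practice.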
To produce the 1 or 3 digits in any cycle following the initial 4-digit, 10-degree square identifier, an "intermediate quadrant", 1 through 4, is first designated, where 1 indicates low absolute values of both latitude and longitude, 2 indicates low latitude and high longitude, 3 indicates high latitude and low longitude, and 4 indicates high values for both; "low" and "high" are judged on the relevant digit of the value being gridded (0 to 4 being low, 5 to 9 high). This leading digit in a cycle is then followed simply by the next applicable digit of first latitude and then longitude: thus an input value of latitude +11.0, longitude +12.0 degrees will be encoded as the 5 degree c-square code 1101:1 and the 1 degree code 1101:112. Inspection of this code shows that the latitude value 11 can be recovered directly from 1101:112, its tens digit being the second digit of 1101 and its units digit the second digit of 112, while the longitude value 12 is held in the last two digits of 1101 (tens) and the last digit of 112 (units); both values are positive, as indicated by the leading digit 1 of the 10-degree square code.
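The encoding steps described above can be expressed as a short loop. The Python sketch below is an illustration under stated assumptions, not the reference c-squares software: it uses decimal arithmetic so that a value such as 0.88 is not distorted by binary floating point, treats the equator and prime meridian as north and east, clamps the poles and the 180th meridian into the last cell, and supports only the cell sizes listed in the hypothetical `SIZES` tuple.

```python
from decimal import Decimal

SIZES = ("10", "5", "1", "0.5", "0.1", "0.05", "0.01")  # assumption: sizes handled here


def csquare(lat: float, lon: float, size: str = "1") -> str:
    """Encode a point as a c-square code at the given cell size (degrees, as text)."""
    if size not in SIZES:
        raise ValueError(f"unsupported cell size: {size}")
    # Decimal arithmetic keeps digits such as 0.88 exact.
    alat, alon = abs(Decimal(str(lat))), abs(Decimal(str(lon)))
    if alat > 90 or alon > 180:
        raise ValueError("latitude/longitude out of range")
    # Global quadrant as for the WMO square (equator/prime meridian = north/east).
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5
    # Assumption: keep the pole and the 180th meridian inside the last cell.
    alat = min(alat, Decimal("89.999999"))
    alon = min(alon, Decimal("179.999999"))
    code = f"{quadrant}{int(alat // 10)}{int(alon // 10):02d}"
    target, cell = Decimal(size), Decimal(10)
    rlat, rlon = alat % 10, alon % 10          # position inside the 10-degree square
    while cell > target:
        step = cell / 10                       # 1, 0.1, 0.01, ... degrees
        dlat, dlon = int(rlat // step), int(rlon // step)
        # Intermediate quadrant: 1 low/low, 2 low lat/high lon,
        # 3 high lat/low lon, 4 high/high.
        code += f":{2 * (dlat >= 5) + (dlon >= 5) + 1}"
        cell /= 2                              # halving step: 5, 0.5, 0.05, ...
        if cell <= target:
            break
        code += f"{dlat}{dlon}"                # next latitude digit, then longitude digit
        cell = step
        rlat, rlon = rlat % step, rlon % step  # position inside the new cell
    return code
```

With the worked example above, `csquare(11.0, 12.0, "5")` gives `"1101:1"` and `csquare(11.0, 12.0, "1")` gives `"1101:112"`. Each pass of the loop first emits the intermediate-quadrant digit (a halving step) and then, only if a finer cell is requested, the next latitude and longitude digits (a tenfold step).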
From 2002 onwards, an online "" is available at the website of CSIRO Marine Research which will convert input values of latitude and longitude to the equivalent c-square code at user selectable resolutions from 10 to 0.1 degree cell size. Alternatively it is a comparatively simple task to program from first principles according to the c-squares specification; an example is available .

C-squares strings, and the c-squares mapper

A set of c-squares can be represented as a concatenated list of individual square codes, separated by the "pipe" character, thus: 7500:110:3|7500:110:1|1500:110:3|1500:110:1. This set of squares can then serve as an indication of a dataset extent, similar in function to a MultiPolygon in the Well-known text representation of geometry, the functional difference being that the vertices forming a polygon boundary can take any value, while c-square boundaries are constrained to fixed intervals set by the grid resolution in use. If these strings are stored, for example as "long text" within a field of a conventional text storage system, they can be used for spatial searches.
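To compare a c-squares string with polygon-based representations, or to draw it on a map, each code can be decoded back to the bounding coordinates of its cell. The sketch below does this in Python by reversing the encoding steps; it assumes well-formed codes without the "*" compact notation, and the function names are hypothetical rather than those used by the c-squares mapper.

```python
from decimal import Decimal

Box = tuple[Decimal, Decimal, Decimal, Decimal]  # (south, west, north, east)


def csquare_bounds(code: str) -> Box:
    """Decode one c-square code to the degree bounds of its cell."""
    head, *cycles = code.split(":")
    if len(head) != 4 or head[0] not in "1357":
        raise ValueError(f"not a c-square code: {code!r}")
    quadrant = int(head[0])
    # Corner of the cell nearest 0,0, in absolute degrees.
    lat = Decimal(int(head[1]) * 10)
    lon = Decimal(int(head[2:4]) * 10)
    size = Decimal(10)
    for cycle in cycles:
        if len(cycle) == 3:
            # Tenfold step: 2nd and 3rd digits are the next latitude and
            # longitude digits (the leading quadrant digit is implied by them).
            size /= 10
            lat += int(cycle[1]) * size
            lon += int(cycle[2]) * size
        else:
            # Halving step: 3 or 4 = upper latitude half, 2 or 4 = upper longitude half.
            size /= 2
            if cycle in ("3", "4"):
                lat += size
            if cycle in ("2", "4"):
                lon += size
    zero = Decimal(0)  # subtracting from zero avoids a "-0" result
    south, north = (lat, lat + size) if quadrant in (1, 7) else (zero - lat - size, zero - lat)
    west, east = (lon, lon + size) if quadrant in (1, 3) else (zero - lon - size, zero - lon)
    return south, west, north, east


def parse_csquare_string(text: str) -> list[Box]:
    """Split a pipe-separated c-squares string and decode each code."""
    return [csquare_bounds(code) for code in text.split("|") if code]
```

Applied to the example string above, the four 0.5 degree squares decode to boxes that together span 51 to 52 degrees north and 0.5 degrees west to 0.5 degrees east.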
C-squares strings can also be used directly as input to an instance of the "c-squares mapper", a web-based utility in operation since 2002 at CSIRO in Australia and also at other global locations. To visualize the position of any set of squares on a map, the current syntax to address an installation of the "c-squares mapper" is :
.
It should be noted here that the above call to the c-squares mapper is a simple one, with only a single parameter, which produces a simple "default map"; the mapper is in fact highly customizable, capable of accepting up to seven c-squares strings concurrently, plotting them in user-specified colours, with a choice of empty or filled squares, user-selectable base map, etc.; a full list of available input parameters is provided on the mapper "technical information" page. A more sophisticated map produced using a larger number of the available parameters is the colour-coded example at right. Commencing in 2006, an upgrade of the mapper incorporating the independently written Xplanet software also allows the plots of supplied c-squares to be displayed on a user-rotatable and zoomable globe, which can offer a more realistic view of Pacific Ocean- or polar-centred data than is possible with a flat map projection.
The c-squares mapper is one of several options currently available for real-time mapping of fish point data records in FishBase, for example for the species Salmo trutta; similar options are also available for other marine species via SeaLifeBase. Since 2006, the mapper has also produced in excess of 100,000 species maps for the AquaMaps project.

Spatial searching

In a system that uses c-squares codes as units of spatial indexing, a text-based search on any of these square identifiers will retrieve data associated with the relevant square. If a wildcard search is supported, a search on "7500%" will retrieve all data items in that ten degree square, a search on "7500:1%" will retrieve all data items in that five degree square, etc.
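The same prefix search can be illustrated with Python's built-in `sqlite3` module, where "%" in a `LIKE` pattern matches any sequence of characters, as in the text. The table name, column name and sample codes below are hypothetical, and a real system would typically hold many records per square.

```python
import sqlite3

# Hypothetical table: one row per data record, tagged with its 0.5-degree c-square.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, csquare TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO records (csquare) VALUES (?)",
    [("7500:110:3",), ("7500:110:1",), ("7500:351:1",), ("1500:110:3",)],
)


def search(prefix: str) -> list[str]:
    """Return codes of all records whose c-square lies within the `prefix` square."""
    rows = conn.execute(
        "SELECT csquare FROM records WHERE csquare LIKE ? ORDER BY csquare",
        (prefix + "%",),  # e.g. "7500:1%"
    )
    return [row[0] for row in rows]
```

Here `search("7500:1")` returns only the two records in 5-degree square 7500:1, while `search("7500")` also returns the record in intermediate quadrant 3 of the same 10-degree square.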
The asterisk character "*" has a special meaning in c-squares notation, being a "compact" notation indicating that all finer cells within a higher level cell are included, to the level of resolution indicated by the number of asterisks. Using the square from the example above, "7500:*" would indicate that all 4 five-degree cells within parent ten-degree cell "7500" are filled, "7500:***" would indicate that all 100 one-degree cells within parent ten-degree cell "7500" are filled, etc. In many cases this allows contiguous blocks of cells to be represented with an economy of characters, which is useful for efficient storage and transfer of c-squares codes.
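The compact asterisk notation can be expanded back into explicit cell codes, for example before a text search. In the Python sketch below (function names hypothetical), a "*" in a 1-digit cycle stands for any intermediate quadrant 1 to 4, while a "*" in a 3-digit cycle stands for any digit; because the quadrant digit of a 3-digit cycle is implied by the latitude and longitude digits that follow it, "7500:***" yields 100 one-degree cells rather than 4 × 10 × 10. Partially starred cycles such as "7500:1**" are also accepted, which is an extrapolation from the examples in the text rather than something stated there.

```python
from itertools import product


def _cycle_options(pattern: str) -> list[str]:
    """All concrete values one cycle pattern can take ('*' = any digit)."""
    if len(pattern) == 1:
        return ["1", "2", "3", "4"] if pattern == "*" else [pattern]
    q, a, b = pattern
    options = []
    for lat_digit in (range(10) if a == "*" else [int(a)]):
        for lon_digit in (range(10) if b == "*" else [int(b)]):
            # The quadrant digit is implied by the two digits that follow it.
            quadrant = 2 * (lat_digit >= 5) + (lon_digit >= 5) + 1
            if q == "*" or int(q) == quadrant:
                options.append(f"{quadrant}{lat_digit}{lon_digit}")
    return options


def expand(code: str) -> list[str]:
    """Expand compact '*' notation into the explicit list of cell codes."""
    head, *cycles = code.split(":")
    if "*" in head:
        raise ValueError("'*' is only handled after the 10-degree square here")
    combos = product(*(_cycle_options(c) for c in cycles))
    return [":".join((head, *combo)) for combo in combos]
```

For example, `expand("7500:*")` returns the four codes `"7500:1"` to `"7500:4"`.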

Spatial data reporting, assembly, and analysis

C-squares has been employed at a range of resolutions for data reporting, assembly and analysis on scales ranging from global to local, also incorporating multi-national data compilations where a gridded data system is required that is not tied to the boundaries of any single jurisdiction. Examples include:
  • 5×5 degree squares:
    • production of the first world-scale map of marine biogeographic realms based on distributions of 65,000 marine species, by Costello et al., 2017
  • 1×1 degree squares:
    • geographic presentation of marine mammal, seabird and sea turtle data by the OBIS-SEAMAP project
  • 0.5×0.5 degree squares:
    • modelling of marine species distributions by the AquaMaps project, plus associated spatial search; the AquaMaps front page https://www.aquamaps.org/ offers a "click on a map" spatial search facility based on 0.5×0.5 degree c-squares, example spatial search result . AquaMaps have been further employed in subsequent studies such as the Ecological Assessment of the Sustainable Impacts of Fisheries approach.
    • reporting and collation of fishing activity by member states by the Scientific, Technical and Economic Committee for Fisheries of the European Commission. Data contributed by 23 member States is available as a data product "Fisheries landings & effort: data by c-square", further discussed in a 2020 STECF Working Group Report.
    • analysing and forecasting fisheries time series data in the Indian Ocean
    • delineating high priority areas for marine biodiversity conservation in the Coral Triangle, bordered by both the Pacific and Indian Oceans
    • AquaMaps makes available its base data coverages of global marine environmental variables as c-squares gridded data at 0.5 degree resolution
  • 0.1×0.1 degree squares:
    • fish catch reporting for the purpose of stock assessment by the New South Wales Department of Primary Industries/Fisheries New South Wales in Australia; examples:
  • 0.05×0.05 degree squares:
    • Vessel monitoring system data and fishing logbook data for the International Council for the Exploration of the Sea and others, as also implemented in the ICES "FishFrame" regional database
    • identification of vulnerable marine ecosystems in the North-East Atlantic for the EU-funded Horizon 2020 ATLAS Project. According to Turner et al., 2021, "ATLAS partners helped develop a data aggregation approach, the VME Index, to help identify areas where vulnerable marine ecosystems are known or are likely to occur. The VME Index is a single metric based on a multi-criteria assessment method that combines VME indicator records within a C-square based on the abundance/presence of VME indicator taxa and how reliable the underlying data are.... ICES has been using the VME Index since 2018 to provide advice concerning the protection of vulnerable marine ecosystems."
    • a 2019 ICES Report "Working Group on Spatial Fisheries Data" contains a number of example maps plotted using 0.05×0.05 degree c-squares, and also a discussion of whether or not a move towards 0.01×0.01 degree square reporting would be beneficial or detrimental
    • a 2023 study of the effects of bottom trawlers in the southern Baltic Sea by H. Corell et al.
    • identification of at-risk benthic areas and habitats in the Eastern Mediterranean Sea by Smith et al., 2023.
  • 0.01×0.01 degree squares:
    • a survey of spatial patterns in deep-sea trawling off the Portuguese continental coast by Campos et al., 2021.
C-squares labelled cells were adopted as the underlying grid for analysis by the European Union-funded MINOUW project, via their web application, in support of spatial data supplied by project researchers across different European countries in a range of formats, in combination with layers of spatial information from external sources.

Target audience/potential users

According to its design principles, the principal target audience for c-squares is data custodians who wish to organise spatial data by latitude-longitude grid squares at any of the resolutions supported by the system, namely any decimal subdivision of either 10×10 or 5×5 degree squares, to support associated data query, retrieval, analysis, representation, and potential external data exchange and aggregation. Fine-resolution c-squares may also be used as a general "location encoder": the developers of the Google Open Location Code method have discussed the desirable attributes of such encoders, and the c-squares method satisfies the majority of the criteria set out in that discussion document. As evidenced by the references cited in this article, principal adopters of the method to date have been concerned with marine data in particular; this most likely stems from the fact that the oceans are trans-national in their governance, so established local or national grids are unsuitable for analysis of ocean or fisheries data on anything other than a local scale. Although initially deployed in marine-related systems, the system is in essence terrain-agnostic and is equally applicable to marine and terrestrial data.
An additional aspect of c-squares, noted by Larsen et al., 2009 and either implicit or explicit in other equivalent "data aggregation methods", is the use of such frameworks to "allow general level analyses without exposing the precise coordinates of potentially sensitive information". For example, real-time data on the exact location of fishing vessels is frequently considered "commercial in confidence", to avoid revealing to competitors the best fishing localities for a given resource, which may itself be continually moving; likewise, for biodiversity data it may not be desirable to release the exact location of individuals or nests of rare species to the public. Using grid cells or similar methods to represent the general location of data points accurately without revealing their exact location, while still keeping the data available for statistical analysis, is a recognised useful approach in such situations, refer e.g. Chapman, 2020.
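Aggregating point records to grid cells in this way can be sketched in a few lines of Python: each record is replaced by the code of the 1×1 degree c-square that contains it, and only per-cell counts are kept. The compact encoder below handles the 1 degree resolution only; like the other names here, `csquare_1deg` and `bin_points` are hypothetical, and the clamping at the poles and the 180th meridian is an assumption.

```python
from collections import Counter


def csquare_1deg(lat: float, lon: float) -> str:
    """1×1 degree c-square code for a point (compact, 1-degree-only encoder)."""
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5
    # Whole degrees of the absolute values; clamp 90 and 180 (assumption).
    alat, alon = min(int(abs(lat)), 89), min(int(abs(lon)), 179)
    lat_tens, lat_units = divmod(alat, 10)
    lon_tens, lon_units = divmod(alon, 10)
    # Intermediate quadrant from the units digits (0-4 low, 5-9 high).
    q = 2 * (lat_units >= 5) + (lon_units >= 5) + 1
    return f"{quadrant}{lat_tens}{lon_tens:02d}:{q}{lat_units}{lon_units}"


def bin_points(points):
    """Count (lat, lon) records per 1-degree c-square, discarding exact positions."""
    return Counter(csquare_1deg(lat, lon) for lat, lon in points)
```

For instance, two records at (11.2, 12.7) and (11.9, 12.1) both reduce to the single entry "1101:112" with a count of 2, so their individual positions within that square are no longer disclosed.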

Congruence with other latitude-longitude geocoding systems

At its coarsest resolution, the 10 degree c-square grid is congruent with both World Meteorological Organization squares and Marsden squares, which share the same boundaries but use a different notation. Both 1 degree and 0.5 degree c-squares are partially congruent with "standard resolution" ICES Statistical Rectangles, which use grid cells of 1 degree of longitude by 0.5 degree of latitude over a restricted portion of the Globe: 2 vertically adjacent ICES rectangles are exactly equivalent to a single 1 degree c-square, while, if needed, the content of a single ICES rectangle can be apportioned between 2 horizontally adjacent 0.5 degree c-squares for data interchange at that resolution.
A separate system, QDGC or Quarter Degree Grid Cells, has been developed for interchange of some biodiversity data in Africa, and later extended to cope with data across the Equator and Prime Meridian. QDGC cells, at 0.25×0.25 degrees, lie between the 0.5×0.5 and 0.1×0.1 degree resolution steps of the c-squares system and are thus not exactly compatible with it, although the "parent" squares of the QDGC grid from which they are derived, at 1×1 and 0.5×0.5 degrees, are congruent with the equivalent c-squares grid cells, albeit using a different notation. In their proposal for an "extended" QDGC system, Larsen et al. additionally describe the potential subdivision of 0.25×0.25 degree QDGC cells by a recursive factor of 2, giving cell sizes of 0.125, 0.0625, 0.03125 degrees, etc., which progressively depart further from the "decimal degrees" concept incorporated into c-squares.

Licensing and software availability

There is no licence required to use the c-squares method, which has been openly published in the scientific literature since 2003. Source code for the mapper, etc., available via the SourceForge website, is released under the GNU General Public License version 2.0, which permits free use, redistribution, and modification for any purpose so long as that licence is retained with the product and any subsequent modifications; in other words, any released modified versions must also be free software.