pygmt.select

pygmt.select(data=None, output_type='pandas', outfile=None, **kwargs)[source]

Select data table subsets based on multiple spatial criteria.

This is a filter that reads (x, y) or (longitude, latitude) positions from the first 2 columns of data and uses a combination of 1-7 criteria to pass or reject the records. Records can be selected based on whether or not they:

  1. are inside a rectangular region (region [and projection])

  2. are within dist of any point in pointfile (dist2pt)

  3. are within dist of any line in linefile (dist2line)

  4. are inside one of the polygons in polygonfile (polygon)

  5. are inside geographical features (based on coastlines)

  6. have z-values within a given range

  7. are inside bins of a grid mask whose nodes are non-zero

The sense of the tests can be reversed for each of these 7 criteria by using the reverse parameter.
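
For instance, the rectangular-region test (criterion 1) can be inverted with reverse="r". The sketch below uses a small, made-up table purely for illustration:

>>> import pandas as pd
>>> import pygmt
>>> # Illustrative table with (longitude, latitude) in the first two columns
>>> df = pd.DataFrame({"longitude": [244.0, 248.0, 252.0], "latitude": [18.0, 22.0, 26.0]})
>>> # Keep only the records that fall OUTSIDE the rectangular region
>>> out = pygmt.select(data=df, region=[245, 255, 20, 30], reverse="r")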

Full option list at https://docs.generic-mapping-tools.org/6.5/gmtselect.html

Aliases:

  • A = area_thresh

  • C = dist2pt

  • D = resolution

  • F = polygon

  • G = gridmask

  • I = reverse

  • J = projection

  • L = dist2line

  • N = mask

  • R = region

  • V = verbose

  • Z = z_subregion

  • b = binary

  • d = nodata

  • e = find

  • f = coltypes

  • g = gap

  • h = header

  • i = incols

  • o = outcols

  • s = skiprows

  • w = wrap

Parameters:
  • data (str, numpy.ndarray, pandas.DataFrame, xarray.Dataset, or geopandas.GeoDataFrame) – Pass in either a file name to an ASCII data table, a 2-D numpy.ndarray, a pandas.DataFrame, an xarray.Dataset made up of 1-D xarray.DataArray data variables, or a geopandas.GeoDataFrame containing the tabular data.

  • output_type (Literal['pandas', 'numpy', 'file'], default: 'pandas') –

    Desired output type of the result data.

    • pandas will return a pandas.DataFrame object.

    • numpy will return a numpy.ndarray object.

    • file will save the result to the file specified by the outfile parameter.

  • outfile (str | None, default: None) – File name for saving the result data. Required if output_type="file". If specified, output_type will be forced to be "file".
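
    For example, to write the selected records to disk instead of returning them in memory (both file names below are placeholders):

    >>> # Hypothetical input and output files; returns None because outfile is set
    >>> pygmt.select(data="table.txt", region=[246, 247, 20, 21], outfile="subset.txt")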

  • area_thresh (float or str) – min_area[/min_level/max_level][+a[g|i][s|S]][+l|r][+ppercent]. Features with an area smaller than min_area in km² or of hierarchical level that is lower than min_level or higher than max_level will not be used [Default is "0/0/4" (all features)].

  • dist2pt (str) – pointfile|lon/lat+ddist. Pass all records whose locations are within dist of any of the points in the ASCII file pointfile. If dist is zero, the 3rd column of pointfile must have each point’s individual radius of influence. If you only have a single point, you can specify lon/lat instead of pointfile. Distances are Cartesian and in user units. Alternatively, if region and projection are used, the geographic coordinates are projected to map coordinates (in centimeters, inches, meters, or points, as determined by PROJ_LENGTH_UNIT) before Cartesian distances are compared to dist.
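
    As a sketch, keeping records within 100 user units of a single point could look like this (the input file name and coordinates are hypothetical):

    >>> # Keep records within a Cartesian distance of 100 from the point (30, 10)
    >>> out = pygmt.select(data="table.txt", dist2pt="30/10+d100")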

  • dist2line (str) – linefile+ddist[+p]. Pass all records whose locations are within dist of any of the line segments in the ASCII multiple-segment file linefile. If dist is zero, we will scan each sub-header in linefile for an embedded -Ddist setting that sets each line’s individual distance value. Distances are Cartesian and in user units. Alternatively, if region and projection are used, the geographic coordinates are projected to map coordinates (in centimeters, inches, meters, or points, as determined by PROJ_LENGTH_UNIT) before Cartesian distances are compared to dist. Append +p to ensure only points whose orthogonal projections onto the nearest line-segment fall within the segment’s endpoints [Default considers points “beyond” the line’s endpoints].
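
    A sketch of this criterion, assuming a hypothetical multiple-segment line file "lines.txt":

    >>> # Keep records within 50 user units of any line segment; +p restricts the
    >>> # test to points that project onto a segment between its endpoints
    >>> out = pygmt.select(data="table.txt", dist2line="lines.txt+d50+p")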

  • polygon (str) – polygonfile. Pass all records whose locations are within one of the closed polygons in the ASCII multiple-segment file polygonfile. For spherical polygons (lon, lat), make sure no consecutive points are separated by 180 degrees or more in longitude.
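
    For example, with a hypothetical polygon file "polygons.txt":

    >>> # Keep records inside any closed polygon; use reverse="f" to keep the
    >>> # records outside the polygons instead
    >>> out = pygmt.select(data="table.txt", polygon="polygons.txt")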

  • resolution (str) – resolution[+f]. Ignored unless mask is set. Select the resolution of the coastline dataset to use: (f)ull, (h)igh, (i)ntermediate, (l)ow, or (c)rude; the resolution drops off by ~80% between datasets [Default is l]. Append +f to automatically select a lower resolution should the one requested not be available [Default is to abort if not found]. Note that because the coastlines differ in detail, it is not guaranteed that a point will remain inside [or outside] when a different resolution is selected.

  • gridmask (str) – Pass all locations that are inside the valid data area of the grid gridmask. Nodes that are outside are either NaN or zero.
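
    A minimal sketch, assuming a hypothetical mask grid "mask.nc" that is non-zero over the area of interest:

    >>> # Keep records falling in grid cells that are neither NaN nor zero
    >>> out = pygmt.select(data="table.txt", gridmask="mask.nc")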

  • reverse (str) –

    [cfglrsz]. Reverse the sense of the test for each of the criteria specified:

    • c select records NOT inside any point’s circle of influence.

    • f select records NOT inside any of the polygons.

    • g select records inside cells of the grid mask (gridmask) whose node value equals zero.

    • l select records NOT within the specified distance of any line.

    • r select records NOT inside the specified rectangular region.

    • s select records NOT considered inside as specified by mask (and area_thresh, resolution).

    • z select records NOT within the range specified by z_subregion.

  • projection (str) – projcode[projparams/]width|scale. Select map projection.

  • mask (str or list) –

    Pass all records whose location is inside specified geographical features. Specify if records should be skipped (s) or kept (k) using 1 of 2 formats:

    • wet/dry.

    • ocean/land/lake/island/pond.

    [Default is s/k/s/k/s (i.e., s/k), which passes all points on dry land].
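
    For instance, to keep only records located over water (the input file is hypothetical and geographic lon/lat coordinates are assumed), mask can be combined with resolution and area_thresh:

    >>> # "k/s" keeps wet (ocean) points and skips dry (land) points, using the
    >>> # low-resolution coastline and ignoring features smaller than 1000 km²
    >>> out = pygmt.select(data="table.txt", mask="k/s", resolution="l", area_thresh=1000)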

  • region (str or list) – xmin/xmax/ymin/ymax[+r][+uunit]. Specify the region of interest.

  • verbose (bool or str) –

    Select verbosity level [Default is w], which modulates the messages written to stderr. Choose among 7 levels of verbosity:

    • q - Quiet, not even fatal error messages are produced

    • e - Error messages only

    • w - Warnings [Default]

    • t - Timings (report runtimes for time-intensive algorithms)

    • i - Informational messages (same as verbose=True)

    • c - Compatibility warnings

    • d - Debugging messages

  • z_subregion (str or list) – min[/max][+a][+ccol][+i]. Pass all records whose 3rd column (z; col = 2) lies within the given range or is NaN (use skiprows to skip NaN records). If max is omitted then we test if z equals min instead; equality here means agreement within 5 ULPs (unit of least precision; http://en.wikipedia.org/wiki/Unit_in_the_last_place). The input file must have at least three columns. To indicate no limit on min or max, specify a hyphen (-). If your 3rd column is absolute time then remember to supply coltypes="2T". To test another column, append +ccol; to specify several tests, pass a list with one argument per column to test. Note: When more than one z_subregion argument is given, reverse="z" cannot be used. In the case of multiple tests you may also use these modifiers: +a passes any record that passes at least one of your z tests [Default is that all tests must pass], and +i reverses the tests to pass records whose z values are NOT in the given range. Finally, if +c is not used then it is automatically incremented for each new z_subregion argument, starting with 2.
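
    A brief sketch, assuming a hypothetical file whose 3rd column holds depths in meters:

    >>> # Keep records whose 3rd column lies between -5000 and -3000
    >>> out = pygmt.select(data="table.txt", z_subregion="-5000/-3000")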

  • binary (bool or str) –

    i|o[ncols][type][w][+l|b]. Select native binary input (using binary="i") or output (using binary="o"), where ncols is the number of data columns of type, which must be one of:

    • c - int8_t (1-byte signed char)

    • u - uint8_t (1-byte unsigned char)

    • h - int16_t (2-byte signed int)

    • H - uint16_t (2-byte unsigned int)

    • i - int32_t (4-byte signed int)

    • I - uint32_t (4-byte unsigned int)

    • l - int64_t (8-byte signed int)

    • L - uint64_t (8-byte unsigned int)

    • f - 4-byte single-precision float

    • d - 8-byte double-precision float

    • x - use to skip ncols anywhere in the record

    For records with mixed types, append additional comma-separated combinations of ncols type (no space). The following modifiers are supported:

    • w after any item to force byte-swapping.

    • +l|b to indicate that the entire data file should be read as little- or big-endian, respectively.

    Full documentation is at https://docs.generic-mapping-tools.org/6.5/gmt.html#bi-full.

  • nodata (str) – i|onodata. Substitute specific values with NaN (for tabular data). For example, nodata="-9999" will replace all values equal to -9999 with NaN during input and all NaN values with -9999 during output. Prepend i to the nodata value for input columns only. Prepend o to the nodata value for output columns only.

  • find (str) – [~]“pattern” | [~]/regexp/[i]. Only pass records that match the given pattern or regular expressions [Default processes all records]. Prepend ~ to the pattern or regexp to instead only pass data expressions that do not match the pattern. Append i for case insensitive matching. This does not apply to headers or segment headers.
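
    For example, assuming the trailing text of each record contains a feature name:

    >>> # Keep only records whose text matches the regular expression "ridge",
    >>> # ignoring case
    >>> out = pygmt.select(data="table.txt", find="/ridge/i")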

  • coltypes (str) – [i|o]colinfo. Specify data types of input and/or output columns (time or geographical data). Full documentation is at https://docs.generic-mapping-tools.org/6.5/gmt.html#f-full.

  • gap (str or list) –

    x|y|z|d|X|Y|Dgap[u][+a][+ccol][+n|p]. Examine the spacing between consecutive data points in order to impose breaks in the line. To specify multiple criteria, provide a list with each item containing a string describing one set of criteria.

    • x|X - define a gap when there is a large enough change in the x coordinates (upper case to use projected coordinates).

    • y|Y - define a gap when there is a large enough change in the y coordinates (upper case to use projected coordinates).

    • d|D - define a gap when there is a large enough distance between coordinates (upper case to use projected coordinates).

    • z - define a gap when there is a large enough change in the z data. Use +ccol to change the z data column [Default col is 2 (i.e., 3rd column)].

    A unit u may be appended to the specified gap:

    • For geographic data (x|y|d), the unit may be arc-d(egrees), m(inutes), and s(econds), or (m)e(ters), f(eet), k(ilometers), M(iles), or n(autical miles) [Default is (m)e(ters)].

    • For projected data (X|Y|D), the unit may be i(nches), c(entimeters), or p(oints).

    Append modifier +a to specify that all the criteria must be met [default imposes breaks if any one criterion is met].

    One of the following modifiers can be appended:

    • +n - specify that the previous value minus the current column value must exceed gap for a break to be imposed.

    • +p - specify that the current value minus the previous value must exceed gap for a break to be imposed.

  • header (str) –

    [i|o][n][+c][+d][+msegheader][+rremark][+ttitle]. Specify that input and/or output file(s) have n header records [Default is 0]. Prepend i if only the primary input should have header records. Prepend o to control the writing of header records, with the following modifiers supported:

    • +d to remove existing header records.

    • +c to add a header comment with column names to the output [Default is no column names].

    • +m to add a segment header segheader to the output after the header block [Default is no segment header].

    • +r to add a remark comment to the output [Default is no comment]. The remark string may contain \n to indicate line-breaks.

    • +t to add a title comment to the output [Default is no title]. The title string may contain \n to indicate line-breaks.

    Blank lines and lines starting with # are always skipped.

  • incols (str or 1-D array) –

    Specify data columns for primary input in arbitrary order. Columns can be repeated and columns not listed will be skipped [Default reads all columns in order, starting with the first (i.e., column 0)].

    • For 1-D array: specify individual columns in input order (e.g., incols=[1,0] for the 2nd column followed by the 1st column).

    • For str: specify individual columns or column ranges in the format start[:inc]:stop, where inc defaults to 1 if not specified, with columns and/or column ranges separated by commas (e.g., incols="0:2,4+l" to input the first three columns followed by the log-transformed 5th column). To read from a given column until the end of the record, leave off stop when specifying the column range. To read trailing text, add the column t. Append the word number to t to ingest only a single word from the trailing text. Instead of specifying columns, use incols="n" to simply read numerical input and skip trailing text. Optionally, append one of the following modifiers to any column or column range to transform the input columns:

      • +l to take the log10 of the input values.

      • +d to divide the input values by the factor divisor [Default is 1].

      • +s to multiply the input values by the factor scale [Default is 1].

      • +o to add the given offset to the input values [Default is 0].
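
    For instance, if longitude and latitude were stored in the 4th and 5th columns of a hypothetical file, they could be promoted to the first two input columns like this:

    >>> # Read columns 3 and 4 (0-based) as the (x, y) input columns
    >>> out = pygmt.select(data="table.txt", incols=[3, 4], region=[246, 247, 20, 21])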

  • outcols (str or 1-D array) –

    cols[,…][,t[word]]. Specify data columns for primary output in arbitrary order. Columns can be repeated and columns not listed will be skipped [Default writes all columns in order, starting with the first (i.e., column 0)].

    • For 1-D array: specify individual columns in output order (e.g., outcols=[1,0] for the 2nd column followed by the 1st column).

    • For str: specify individual columns or column ranges in the format start[:inc]:stop, where inc defaults to 1 if not specified, with columns and/or column ranges separated by commas (e.g., outcols="0:2,4" to output the first three columns followed by the 5th column). To write from a given column until the end of the record, leave off stop when specifying the column range. To write trailing text, add the column t. Append the word number to t to write only a single word from the trailing text. Instead of specifying columns, use outcols="n" to write numerical output only and skip trailing text. Note: If incols is also used then the columns given to outcols correspond to the order after the incols selection has taken place.

  • skiprows (bool or str) –

    [cols][+a][+r]. Suppress output for records whose z-value equals NaN [Default outputs all records]. Optionally, supply a comma-separated list of all columns or column ranges to consider for this NaN test [Default only considers the third data column (i.e., cols = 2)]. Column ranges must be given in the format start[:inc]:stop, where inc defaults to 1 if not specified. The following modifiers are supported:

    • +r to reverse the suppression, i.e., only output the records whose z-value equals NaN.

    • +a to suppress the output of the record if just one or more of the columns equal NaN [Default skips record only if values in all specified cols equal NaN].

  • wrap (str) –

    y|a|w|d|h|m|s|cperiod[/phase][+ccol]. Convert the input x-coordinate to a cyclical coordinate, or a different column if selected via +ccol. The following cyclical coordinate transformations are supported:

    • y - yearly cycle (normalized)

    • a - annual cycle (monthly)

    • w - weekly cycle (day)

    • d - daily cycle (hour)

    • h - hourly cycle (minute)

    • m - minute cycle (second)

    • s - second cycle (second)

    • c - custom cycle (normalized)

    Full documentation is at https://docs.generic-mapping-tools.org/6.5/gmt.html#w-full.

Return type:

DataFrame | ndarray | None

Returns:

ret – Return type depends on outfile and output_type:

  • None if outfile is set (output will be stored in file set by outfile)

  • pandas.DataFrame or numpy.ndarray if outfile is not set (depends on output_type)

Example

>>> import pygmt
>>> # Load a table of ship observations of bathymetry off Baja California
>>> ship_data = pygmt.datasets.load_sample_data(name="bathymetry")
>>> # Only return the data points that lie within the region between
>>> # longitudes 246 and 247 and latitudes 20 and 21
>>> out = pygmt.select(data=ship_data, region=[246, 247, 20, 21])