Module bastionlab.polars

Sub-modules

Functions

train_test_split(*arrays: "List['RemoteArray']", train_size: Optional[float] = None, test_size: Optional[float] = 0.25, shuffle: Optional[bool] = False, random_state: Optional[int] = None) -> List[[bastionlab.polars.frame](frame.md).RemoteArray]

Split RemoteArrays into train and test subsets.

Args:
  • train_size (Optional[float], optional): Should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split. If None, the value is automatically set to the complement of the test size.
  • test_size (Optional[float], optional): Should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If None, the value is set to the complement of the train size. If train_size is also None, it is set to 0.25. Defaults to 0.25.
  • shuffle (Optional[bool], optional): Whether or not to shuffle the data before splitting.
  • random_state (Optional[int], optional): Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
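The defaulting rules for train_size and test_size can be sketched in plain Python. The helper below is hypothetical (`resolve_split_sizes` is not part of BastionLab) and only mirrors the rules described above; the library's own validation may differ.

```python
from typing import Optional, Tuple


def resolve_split_sizes(
    train_size: Optional[float] = None,
    test_size: Optional[float] = 0.25,
) -> Tuple[float, float]:
    """Hypothetical helper: apply the documented complement rules."""
    if train_size is None and test_size is None:
        test_size = 0.25  # documented fallback when both are None
    if train_size is None:
        train_size = 1.0 - test_size  # complement of the test size
    elif test_size is None:
        test_size = 1.0 - train_size  # complement of the train size
    for name, value in (("train_size", train_size), ("test_size", test_size)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return train_size, test_size


print(resolve_split_sizes())                                # default call
print(resolve_split_sizes(train_size=0.6, test_size=None))  # test_size derived
```

Note that the sketch does not check whether the two proportions add up to 1.0 when both are given explicitly.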

Classes

BastionLabPolars()

Main BastionLabPolars API class.

This class contains all the endpoints allowed on the BastionLab server for Polars. It is instantiated by the bastionlab.Client class and is accessible through the bastionlab.Client.polars property.

Methods

RemoteArray(self, identifier: Optional[str] = None, reference: Optional[bastionlab_pb2.Reference] = None) -> [bastionlab.polars.frame](frame.md).RemoteArray :

get_df(self, identifier: str) -> FetchableLazyFrame

Returns a FetchableLazyFrame from a BastionLab DataFrame identifier.

Args: identifier (str): A unique identifier for the Remote DataFrame.

Returns: FetchableLazyFrame

list_dfs(self) -> List[FetchableLazyFrame]

Lists all the DataFrames available on the BastionLab server.

Returns: List[FetchableLazyFrame]

send_df(self, df: polars.internals.dataframe.frame.DataFrame, policy: [bastionlab.polars.policy](policy.md).Policy = Policy(safe_zone=Aggregation(min_agg_size=10), unsafe_handling=Review(), savable=True), sanitized_columns: List[str] = []) -> FetchableLazyFrame

Sends a pl.DataFrame to the BastionLab server.

It accepts a pl.DataFrame together with the policy to apply to it and a list of sensitive columns.

Args:
  • df (pl.DataFrame): Polars DataFrame.
  • policy (Policy, optional): BastionLab Remote DataFrame policy. This specifies which operations can be performed on the DataFrame and is set by the data owner.
  • sanitized_columns (List[str], optional): The (sensitive) columns of the DataFrame that are to be removed when a data scientist fetches the result of a query performed on the DataFrame.

Returns: FetchableLazyFrame

Facet()

Namespace for matplotlib functions

Class variables

col: Optional[str] :

inner_rdf: [bastionlab.polars.frame](frame.md).RemoteLazyFrame :

kwargs: dict :

row: Optional[str] :

Methods

barplot(self: LDF, x: Optional[str] = None, y: Optional[str] = None, hue: Optional[str] = None, ax: mat.axes = None, estimator: str = 'mean', vertical: bool = True, title: str = None, auto_label: bool = True, x_label: str = None, y_label: str = None, colors: Union[str, list[str]] = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'], width: float = 0.75, **kwargs) -> mat.axes

Draws a bar chart for each subset in row/column facet grid.

barplot filters data down to necessary columns only and then calls Seaborn's barplot function.

Args:
  • x (str) = None: The name of the column to be used for the x axis.
  • y (str) = None: The name of the column to be used for the y axis.
  • hue (str) = None: The name of the column to be used for a grouped barplot.
  • ax (matplotlib.axes) = None: Matplotlib axes to be used for the plot; a new axes is generated if not supplied.
  • estimator (str) = "mean": String representation of the estimator to be used in the aggregated query. Options are: "mean", "median", "count", "max", "min", "std" and "sum".
  • vertical (bool) = True: Option for a vertical (True) or horizontal (False) barplot.
  • title (str) = None: String title for the plot.
  • auto_label (bool) = True: If True, labels for the axes are derived from the x/y columns automatically. If False, the x_label and y_label arguments are used.
  • x_label (str) = None: Label for the x axis if auto_label is set to False.
  • y_label (str) = None: Label for the y axis if auto_label is set to False.
  • colors (Union[str, list[str]]) = Palettes.dict["standard"]: Colors for the bars.
  • **kwargs: Other keyword arguments that will be passed to Matplotlib's bar/barh() function.

Raises:
  • ValueError: Incorrect column name given, no x or y values provided, or estimator function not recognized.
  • RequestRejected: Could not continue in function as the data owner rejected a required access request.
  • various exceptions: Note that exceptions may be raised from Seaborn when the barplot function is called, for example, where kwargs keywords are not expected. See Seaborn documentation for further details.

Returns: The Matplotlib Axes object with the plot drawn onto it.

histplot(self: Facet, x: str = 'count', y: str = 'count', bins: int = 10, colors: Union[str, list[str]] = ['lightblue'], **kwargs) -> mat.axes

Draws a histplot for each subset in row/column facet grid.

Facet's histplot iterates over each possible combination of row/column values in the dataset, filters the dataset to rows where the values match this combination of row/column values and applies histplot to this dataset.

Args:
  • x (str): The name of the column to be used for the x axis. Default value is "count", which triggers pl.count() to be used on this axis.
  • y (str): The name of the column to be used for the y axis. Default value is "count", which triggers pl.count() to be used on this axis.
  • bins (int): An integer bin value which the x axis will be grouped by. Default value is 10.
  • colors (Union[str, list[str]]) = ["lightblue"]: Colors to be used for the barplot.
  • **kwargs: Other keyword arguments that will be passed to Matplotlib's bar function, in the case of one column being supplied, or imshow function, where both x and y columns are supplied.

Raises:
  • ValueError: Incorrect column name given.
  • RequestRejected: Could not continue in function as the data owner rejected a required access request.
  • various exceptions: Note that exceptions may be raised from Matplotlib when the bar or imshow function is called. See Matplotlib's documentation for further details.

Returns: The Matplotlib Axes object with the plot drawn onto it.
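The iteration over row/column combinations described for Facet's histplot can be illustrated with plain Python. This is a sketch only: the toy rows, the column names and the helper `facet_subsets` are made up for illustration, and BastionLab performs the filtering remotely rather than on local dictionaries.

```python
from itertools import product

# Toy rows standing in for a dataset; column names are hypothetical.
rows = [
    {"sex": "F", "class": 1, "age": 29},
    {"sex": "M", "class": 1, "age": 40},
    {"sex": "F", "class": 2, "age": 35},
    {"sex": "F", "class": 1, "age": 31},
]


def facet_subsets(rows, row, col):
    """Yield ((row_value, col_value), subset) for every row/column combination."""
    row_values = sorted({r[row] for r in rows})
    col_values = sorted({r[col] for r in rows})
    for rv, cv in product(row_values, col_values):
        # Keep only the rows that match this combination of facet values.
        subset = [r for r in rows if r[row] == rv and r[col] == cv]
        yield (rv, cv), subset


subsets = dict(facet_subsets(rows, row="sex", col="class"))
for key, subset in subsets.items():
    print(key, [r["age"] for r in subset])
```

Each subset would then be passed to the plotting routine for one cell of the grid; combinations with no matching rows simply produce an empty subset.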

lineplot(self: LDF, x: str, y: str, **kwargs) -> None

Draws a lineplot based on x and y values for each subset in row/column facet grid. Lineplot filters data down to necessary columns only and then calls Seaborn's lineplot function on the rows of the dataset whose values match each combination of row/column values.

Args:
  • x (str): The name of the column to be used for the x axis.
  • y (str): The name of the column to be used for the y axis.
  • **kwargs: Other keyword arguments that will be passed to Seaborn's lineplot function.

Raises:
  • ValueError: Incorrect column name given.
  • various exceptions: Note that exceptions may be raised from Seaborn when the lineplot function is called, for example, where kwargs keywords are not expected. See Seaborn documentation for further details.

scatterplot(self: Facet, x: str = None, y: str = None, hue: str = None, ax: mat.axes = None, colors: Union[str, list[str]] = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'], **kwargs) -> None

Draws a scatter plot for each subset in row/column facet grid. Scatterplot filters data down to necessary columns only before calling Seaborn's scatterplot function on the rows of the dataset whose values match each combination of row/column values.

Args:
  • x (str): The name of the column to be used for the x axis.
  • y (str): The name of the column to be used for the y axis.
  • hue (str) = None: The name of the column to be used for grouped scatterplots.
  • colors (Union[str, list[str]]) = Palettes.dict["standard"]: Color(s) for the plot(s).
  • ax (matplotlib.axes) = None: Matplotlib axes to be used for the plot; a new axes is generated if not supplied.
  • **kwargs: Other keyword arguments that will be passed to Matplotlib.pyplot's scatter function.

Raises:
  • ValueError: Incorrect column name given.
  • RequestRejected: Could not continue in function as the data owner rejected a required access request.
  • various exceptions: Note that exceptions may be raised from Matplotlib.pyplot when the scatter function is called. See Matplotlib's documentation for further details.

Returns: The Matplotlib Axes object with the plot drawn onto it.

FetchableLazyFrame()

A class to represent a FetchableLazyFrame, which can be retrieved as a Polars DataFrame via the fetch() method.

Ancestors (in MRO)

Instance variables

identifier: str

Gets identifier

Return: returns identifier

Methods

delete(self) :

fetch(self) -> polars.internals.dataframe.frame.DataFrame
Fetches your FetchableLazyFrame and returns it as a Polars DataFrame. Returns: polars.DataFrame: a Polars DataFrame instance of your FetchableLazyFrame

save(self) :

to_array(self: "'FetchableLazyFrame'") -> [bastionlab.polars.frame](frame.md).RemoteArray

Converts a FetchableLazyFrame into a RemoteArray

Returns: RemoteArray

RemoteArray()

Intermediate representation for conversion between Tensors and DataFrames.

Ancestors (in MRO)

Methods

to_tensor(self) -> 'RemoteTensor'

Converts RemoteArray to RemoteTensor

A RemoteArray is BastionLab's internal intermediate representation. It is akin to a NumPy array, but is essentially a pointer to a DataFrame on the server. Calling to_tensor converts that DataFrame to a Tensor on the server.

Returns: RemoteTensor

RemoteLazyFrame()

A class to represent a RemoteLazyFrame.

Delegated attributes

dtypes : dict[str, pl.DataType] Get dtypes of columns in LazyFrame.
schema : dict[str, pl.DataType] The dataframe's schema.

Descendants

Static methods

sql(query: str, *rdfs: LDF) -> ~LDF
Parses the given SQL query and interpolates {} placeholders with the given RemoteLazyFrames.

Args:
  • query (str): The SQL query.
  • rdfs (RemoteLazyFrame): DataFrames used in the SQL query.

Returns: RemoteLazyFrame: The resulting RemoteLazyFrame

Instance variables

columns :

composite_plan: str
Gets composite_plan. Returns: composite_plan as str

dtypes :

schema :

Methods

apply_udf(self: LDF, columns: List[str], udf: Callable) -> ~LDF
Applies a user-defined function to selected columns of the RemoteLazyFrame and returns the result.

Args:
  • columns (List[str]): List of columns that the user-defined function should be applied to.
  • udf (Callable): User-defined function to be applied to the columns; it must be a compatible input for the torch.jit.script function.

Returns: RemoteLazyFrame: An updated RemoteLazyFrame after the udf has been applied
barplot(self: LDF, x: str = None, y: str = None, hue: str = None, ax: mat.axes = None, estimator: str = 'mean', vertical: bool = True, title: str = None, auto_label: bool = True, x_label: str = None, y_label: str = None, colors: Union[str, list[str]] = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'], width: float = 0.75, **kwargs) -> mat.axes
Draws a bar chart.

barplot calculates the bars' data using aggregated queries and then plots it using Matplotlib's bar()/barh() function.

Args:
  • x (str) = None: The name of the column to be used for the x axis.
  • y (str) = None: The name of the column to be used for the y axis.
  • hue (str) = None: The name of the column to be used for a grouped barplot.
  • ax (matplotlib.axes) = None: Matplotlib axes to be used for the plot; a new axes is generated if not supplied.
  • estimator (str) = "mean": String representation of the estimator to be used in the aggregated query. Options are: "mean", "median", "count", "max", "min", "std" and "sum".
  • vertical (bool) = True: Option for a vertical (True) or horizontal (False) barplot.
  • title (str) = None: String title for the plot.
  • auto_label (bool) = True: If True, labels for the axes are derived from the x/y columns automatically. If False, the x_label and y_label arguments are used.
  • x_label (str) = None: Label for the x axis if auto_label is set to False.
  • y_label (str) = None: Label for the y axis if auto_label is set to False.
  • colors (Union[str, list[str]]) = Palettes.dict["standard"]: Colors for the bars.
  • **kwargs: Other keyword arguments that will be passed to Matplotlib's bar/barh() function.

Raises:
  • ValueError: Incorrect column name given, no x or y values provided, or estimator function not recognized.
  • RequestRejected: Could not continue in function as the data owner rejected a required access request.
  • various exceptions: Note that exceptions may be raised from Seaborn when the barplot function is called, for example, where kwargs keywords are not expected. See Seaborn documentation for further details.

Returns: The Matplotlib Axes object with the plot drawn onto it.
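The idea of computing bar heights with an aggregated query, rather than plotting raw rows, can be sketched locally. The estimator names below are the documented options; the mapping to Python functions and the helper `aggregate_bars` are illustrative assumptions, not BastionLab's implementation.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Documented estimator names mapped to local reducers (illustrative only).
ESTIMATORS = {
    "mean": mean,
    "median": median,
    "count": len,
    "max": max,
    "min": min,
    "std": stdev,  # sample standard deviation; needs at least two values per group
    "sum": sum,
}


def aggregate_bars(x_values, y_values, estimator="mean"):
    """Group y by x and reduce each group to a single bar height."""
    if estimator not in ESTIMATORS:
        raise ValueError(f"Estimator not recognized: {estimator}")
    groups = defaultdict(list)
    for x, y in zip(x_values, y_values):
        groups[x].append(y)
    return {x: ESTIMATORS[estimator](ys) for x, ys in groups.items()}


bars = aggregate_bars(["a", "b", "a", "b"], [1.0, 2.0, 3.0, 6.0], estimator="mean")
print(bars)
```

Only the aggregated values (one number per category) are needed to draw the bars, which is what keeps the amount of data leaving the server small.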
boxplot(self: LDF, x: str = None, y: str = None, colors: Union[str, list[str]] = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'], vertical: bool = True, ax: "'mat.axes'" = None, widths: float = 0.75, median_linestyle: str = '-', median_color: str = 'black', median_linewidth: float = 0.75, **kwargs)

Draws a boxplot based on x and y values.

boxplot uses aggregated queries to get the data necessary to create a boxplot using Matplotlib's boxplot.

kwargs arguments are forwarded to Matplotlib's Axes.bxp boxplot function.

Args:
  • x (str): The name of the column to be used for the x axis.
  • y (str): The name of the column to be used for the y axis.
  • colors (Union[str, list[str]]): The color(s) or name of a builtin BastionLab color palette to be used for the boxes.
  • vertical (bool): Option for vertical or horizontal orientation.
  • ax (matplotlib.axes): Axes to plot on. A new axes is created if set to None.
  • widths (float): The boxes' widths.
  • median_linestyle (str): Linestyle for the median line.
  • median_color (str): Color for the median line.
  • median_linewidth (float): Line width for the median line.
  • **kwargs: Keyword arguments that will be passed to Matplotlib's bxp function.

Raises:
  • ValueError: Incorrect column name given.
  • various exceptions: Note that exceptions may be raised from Matplotlib when the bxp function is called, for example, where kwargs keywords are not expected. See Matplotlib's documentation for further details.
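Matplotlib's Axes.bxp draws boxes from precomputed statistics (a list of dictionaries with keys such as med, q1, q3, whislo and whishi), which is why aggregated values are enough to build the plot. The sketch below computes such a dictionary locally; the helper `box_stats`, the quartile method and the choice to put whiskers at the minimum and maximum are assumptions for illustration, not BastionLab's actual queries.

```python
from statistics import median, quantiles


def box_stats(values, label=""):
    """Reduce a column to the summary numbers a boxplot needs."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4)  # default 'exclusive' method: an assumption
    return {
        "label": label,
        "whislo": data[0],   # whiskers placed at min/max here for simplicity
        "q1": q1,
        "med": median(data),
        "q3": q3,
        "whishi": data[-1],
        "fliers": [],        # no outliers are computed in this sketch
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7], label="age")
print(stats)
```

A list of such dictionaries, one per box, is the form of input that Axes.bxp accepts.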

cache(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.cache.
clone(self: LDF) -> ~LDF
Clones the RemoteLazyFrame. Returns: RemoteLazyFrame: clone of the current RemoteLazyFrame
collect(self: LDF) -> ~LDF
Runs any pending queries/actions on the RemoteLazyFrame that have not yet been performed. Returns: FetchableLazyFrame: FetchableLazyFrame of the dataframe after any queries have been performed
describe(self: LDF) -> polars.internals.dataframe.frame.DataFrame
Provides the following summary statistics for the RemoteLazyFrame: count, null count, mean, std, min, max and median. Raises: Exception: Where the queries needed to get the statistical information are rejected by the data owner. Returns: A Polars DataFrame containing statistical information
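For orientation, the statistics listed for describe can be computed for a single local column as follows. This is a minimal sketch: `describe_column` is a hypothetical helper, None stands in for a null value, and the choices to count all entries and to use the sample standard deviation are assumptions rather than documented behaviour.

```python
from statistics import mean, median, stdev


def describe_column(values):
    """Summary statistics for one column; None marks a null entry."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),                       # assumption: nulls are counted
        "null_count": len(values) - len(present),
        "mean": mean(present),
        "std": stdev(present),                      # assumption: sample std (n - 1)
        "min": min(present),
        "max": max(present),
        "median": median(present),
    }


summary = describe_column([2, 4, None, 6])
print(summary)
```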
drop(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.drop.
drop_nulls(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.drop_nulls.
explode(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.explode.
facet(self: LDF, col: Optional[str] = None, row: Optional[str] = None, **kwargs) -> <built-in function any>

Creates a multi-plot grid for plotting conditional relationships.

Args:
  • col (Optional[str] = None): Column value for the grid.
  • row (Optional[str] = None): Row value for the grid.
  • **kwargs: Any additional keywords to be sent to the Facet class, to be applied to matplotlib pyplot's subplot function.

Returns: Facet instance created based on arguments given

Raises: ValueError: Incorrect col/row argument provided

fill_nan(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.fill_nan.
fill_null(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.fill_null.
filter(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.filter.
first(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.first.
groupby(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.groupby.
groupby_dynamic(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.groupby_dynamic.
groupby_rolling(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.groupby_rolling.
head(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.head.
histplot(self: LDF, x: str = 'count', y: str = 'count', ax: mat.axes = None, bins: int = 10, colors: Union[str, list[str]] = ['lightblue'], **kwargs) -> mat.axes

Histplot plots a univariate histogram, where only one of the x or y axes is provided, or a bivariate histogram, where both x and y values are supplied.

Histplot filters a RemoteLazyFrame down to the necessary columns only, groups the x axis into bins and performs aggregated queries before calling either matplotlib.pyplot's bar function (for univariate histograms) or imshow function (for bivariate histograms). This keeps the data retrieved from the server to a minimum.

Args:
  • x (str): The name of the column to be used for the x axis. Default value is "count", which triggers pl.count() to be used on this axis.
  • y (str): The name of the column to be used for the y axis. Default value is "count", which triggers pl.count() to be used on this axis.
  • ax (matplotlib.axes) = None: Matplotlib axes to be used for the plot; a new axes is generated if not supplied.
  • bins (int): An integer bin value which the x axis will be grouped by. Default value is 10.
  • colors (Union[str, list[str]]) = ["lightblue"]: Colors to be used for the barplot.
  • **kwargs: Other keyword arguments that will be passed to Matplotlib's bar function, in the case of one column being supplied, or imshow function, where both x and y columns are supplied.

Raises:
  • ValueError: Incorrect column name given.
  • RequestRejected: Could not continue in function as the data owner rejected a required access request.
  • various exceptions: Note that exceptions may be raised from Matplotlib when the bar or imshow function is called. See Matplotlib's documentation for further details.

Returns: The Matplotlib Axes object with the plot drawn onto it.
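Grouping the x axis into bins and counting per bin can be sketched in a few lines. The sketch reads bins as a bin width, which is one plausible reading of "an integer bin value which the x axis will be grouped by"; that reading, and the helper `bin_counts`, are assumptions and may not match how BastionLab bins its data.

```python
from collections import Counter


def bin_counts(values, bins=10):
    """Map each value to the lower edge of its bin, then count per bin."""
    lower_edges = ((v // bins) * bins for v in values)
    return dict(sorted(Counter(lower_edges).items()))


counts = bin_counts([3, 7, 12, 18, 25, 29, 41], bins=10)
print(counts)
```

Only the per-bin counts are needed to draw the bars of a univariate histogram, so the individual values never have to be retrieved.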

interpolate(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.interpolate.
join(self: LDF, other: LDF, left_on: Union[str, pl.Expr, Sequence[Union[str, pl.Expr]], None] = None, right_on: Union[str, pl.Expr, Sequence[Union[str, pl.Expr]], None] = None, on: Union[str, pl.Expr, Sequence[Union[str, pl.Expr]], None] = None, how: pl.internals.type_aliases.JoinStrategy = 'inner', suffix: str = '_right', allow_parallel: bool = True, force_parallel: bool = False) -> ~LDF
Joins columns of another DataFrame.

Args:
  • other (RemoteLazyFrame): The other RemoteLazyFrame you want to join your current dataframe with.
  • left_on (Union[str, pl.Expr, Sequence[Union[str, pl.Expr]], None] = None): Name(s) of the left join column(s).
  • right_on (Union[str, pl.Expr, Sequence[Union[str, pl.Expr]], None] = None): Name(s) of the right join column(s).
  • on (Union[str, pl.Expr, Sequence[Union[str, pl.Expr]], None] = None): Name(s) of the join columns in both DataFrames.
  • how (str = 'inner'): Join strategy, which can be either 'inner', 'left', 'outer', 'semi', 'anti' or 'cross'.
  • suffix (str = '_right'): Suffix to append to columns with a duplicate name.
  • allow_parallel (bool = True): Whether to allow the physical plan to evaluate the computation of both RemoteLazyFrames up to the join in parallel.
  • force_parallel (bool = False): Whether to force the physical plan to evaluate the computation of both RemoteLazyFrames up to the join in parallel.

Raises:
  • Exception: Where the remote dataframes are from two different servers.

Returns: RemoteLazyFrame: An updated RemoteLazyFrame after the join has been performed
join_asof(self: LDF, other: LDF, left_on: Union[str, None] = None, right_on: Union[str, None] = None, on: Union[str, None] = None, by_left: Union[str, Sequence[str], None] = None, by_right: Union[str, Sequence[str], None] = None, by: Union[str, Sequence[str], None] = None, strategy: pl.internals.type_aliases.AsofJoinStrategy = 'backward', suffix: str = '_right', tolerance: Union[str, int, float, None] = None, allow_parallel: bool = True, force_parallel: bool = False) -> ~LDF

Performs an asof join, which is similar to a left-join but matches on nearest key rather than equal keys.

Args:
  • other (RemoteLazyFrame): The other RemoteLazyFrame you want to join your current dataframe with.
  • left_on (Union[str, None] = None): Name of the left join column.
  • right_on (Union[str, None] = None): Name of the right join column.
  • on (Union[str, None] = None): Name of the join column in both DataFrames.
  • by_left (Union[str, Sequence[str], None] = None): Join on these columns before doing the asof join.
  • by_right (Union[str, Sequence[str], None] = None): Join on these columns before doing the asof join.
  • by (Union[str, Sequence[str], None] = None): Join on these columns before doing the asof join.
  • strategy (str = "backward"): Join strategy, which can be either 'backward' or 'forward'.
  • suffix (str = "_right"): Suffix to append to columns with a duplicate name.
  • tolerance (Union[str, int, float, None] = None): Numeric tolerance. If set, the join is only done when the near keys are within this distance.
  • allow_parallel (bool = True): Whether to allow the physical plan to evaluate the computation of both RemoteLazyFrames up to the join in parallel.
  • force_parallel (bool = False): Whether to force the physical plan to evaluate the computation of both RemoteLazyFrames up to the join in parallel.

Raises:
  • Exception: Where the remote dataframes are from two different servers.

Returns: RemoteLazyFrame: An updated RemoteLazyFrame after the join has been performed
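The "nearest key rather than equal key" matching of a backward asof join, including the effect of tolerance, can be sketched on plain lists of (key, value) pairs. The helper `asof_join_backward` and the sample data are hypothetical and only illustrate the matching rule; the real join runs remotely and returns a RemoteLazyFrame.

```python
from bisect import bisect_right


def asof_join_backward(left, right, tolerance=None):
    """Attach to each left row the right value with the largest key <= the left key.

    `right` must be sorted by key. Returns (key, left_value, right_value_or_None).
    """
    right_keys = [k for k, _ in right]
    joined = []
    for key, left_value in left:
        pos = bisect_right(right_keys, key) - 1  # index of last right key <= key
        match = None
        if pos >= 0:
            right_key, right_value = right[pos]
            # With a tolerance, the nearest earlier key must also be close enough.
            if tolerance is None or key - right_key <= tolerance:
                match = right_value
        joined.append((key, left_value, match))
    return joined


trades = [(1, "t1"), (5, "t2"), (10, "t3")]
quotes = [(2, "q1"), (4, "q2"), (9, "q3")]
unbounded = asof_join_backward(trades, quotes)
strict = asof_join_backward(trades, quotes, tolerance=0)
print(unbounded)
print(strict)
```

As in a left join, every left row is kept; rows without a suitable earlier key get no match.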

last(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.last.
limit(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.limit.
lineplot(self: LDF, x: str, y: str, hue: str = None, size: str = None, style: str = None, units: str = None, **kwargs)

Draws a lineplot based on x and y values.

Lineplot filters data down to necessary columns only and then calls Seaborn's lineplot function with this scaled down dataframe.

Lineplot accepts any additional options supported by Seaborn's lineplot as kwargs, which can be viewed in Seaborn's documentation.

Args:
  • x (str): The name of the column to be used for the x axis.
  • y (str): The name of the column to be used for the y axis.
  • hue (str = None): The name of the column to be used as a grouping variable that will produce lines with different colors.
  • size (str = None): The name of the column to be used as a grouping variable that will produce lines with different widths.
  • style (str = None): The name of the column to be used as a grouping variable that will produce lines with different dashes and/or markers.
  • units (str = None): The name of the column to be used as a grouping variable identifying sampling units.
  • **kwargs: Other keyword arguments that will be passed to Seaborn's lineplot function.

Raises:
  • ValueError: Incorrect column name given.
  • RequestRejected: Could not continue in function as the data owner rejected a required access request.
  • various exceptions: Note that exceptions may be raised from Seaborn when the lineplot function is called, for example, where kwargs keywords are not expected. See Seaborn documentation for further details.

max(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.max.
max_abs_scale(self: LDF, cols: Union[str, List[str]]) -> ~LDF

Rescales each data point to between -1 and 1 by dividing it by the maximum absolute value of its column.

Args:
  • cols (Union[str, List[str]]): The name of the column(s) to which scaling should be applied.

Returns: Copy of the original RemoteLazyFrame with scaling applied to the specified column(s).

Raises:
  • ValueError: A column named in the cols argument was not found in the dataset.
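As a local illustration of max-abs scaling on one column (a sketch on a plain list; `max_abs_scale_values` is a hypothetical helper, not the remote method):

```python
def max_abs_scale_values(values):
    """Divide every value by the largest absolute value in the column."""
    max_abs = max(abs(v) for v in values)
    return [v / max_abs for v in values]


scaled = max_abs_scale_values([-4.0, 2.0, 1.0])
print(scaled)
```

The sign of each value is preserved, and a column consisting only of zeros would need special handling because the divisor would be zero.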

mean(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.mean.
mean_scale(self: LDF, cols: Union[str, List[str]]) -> ~LDF

Similar to the Min/Max scaling method, but subtracts the overall mean value of the data instead of the minimum value.

Args:
  • cols (Union[str, List[str]]): The name of the column(s) to which scaling should be applied.

Returns: Copy of the original RemoteLazyFrame with scaling applied to the specified column(s).

Raises:
  • ValueError: A column named in the cols argument was not found in the dataset.
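Read literally, this means subtracting the mean and dividing by the range (maximum minus minimum), as in Min/Max scaling. A minimal local sketch under that reading, with `mean_scale_values` as a hypothetical helper:

```python
from statistics import mean


def mean_scale_values(values):
    """Subtract the column mean, then divide by (max - min)."""
    center = mean(values)
    spread = max(values) - min(values)
    return [(v - center) / spread for v in values]


scaled = mean_scale_values([2.0, 4.0, 6.0])
print(scaled)
```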

median(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.median.
median_quantile_scale(self: LDF, cols: Union[str, List[str]]) -> ~LDF

Rescales data by subtracting the median value from data points and dividing the result by the IQR (inter-quartile range).

Args:
  • cols (Union[str, List[str]]): The name of the column(s) to which scaling should be applied.

Returns: Copy of the original RemoteLazyFrame with scaling applied to the specified column(s).

Raises:
  • ValueError: A column named in the cols argument was not found in the dataset.
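A local sketch of median/IQR scaling on one column follows. The helper `median_quantile_scale_values` is hypothetical, and the quartile definition used (Python's default "exclusive" method) is an assumption; the server may compute quartiles differently.

```python
from statistics import median, quantiles


def median_quantile_scale_values(values):
    """Subtract the median, then divide by the inter-quartile range (Q3 - Q1)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile method is an assumption
    iqr = q3 - q1
    center = median(values)
    return [(v - center) / iqr for v in values]


scaled = median_quantile_scale_values([1, 2, 3, 4, 5, 6, 7])
print(scaled)
```

Because the median and IQR are less sensitive to extreme values than the mean and range, this scaling is commonly used when a column contains outliers.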

melt(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.melt.
min(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.min.
minmax_scale(self: LDF, cols: Union[str, List[str]]) -> ~LDF

Rescales data to a range of [0, 1] using the Min/Max (normalization) method: the overall minimum value of the data is subtracted, and the result is divided by the difference between the maximum and minimum values.

Args:
  • cols (Union[str, List[str]]): The name of the column(s) to which scaling should be applied.

Returns: Copy of the original RemoteLazyFrame with scaling applied to the specified column(s).

Raises:
  • ValueError: A column named in the cols argument was not found in the dataset.
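The Min/Max formula, applied locally to one column (a sketch; `minmax_scale_values` is a hypothetical helper, not the remote method):

```python
def minmax_scale_values(values):
    """Map values to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


scaled = minmax_scale_values([2.0, 4.0, 6.0])
print(scaled)
```

The minimum maps to 0 and the maximum to 1; a constant column would need special handling because max - min is zero.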

pieplot(self: LDF, parts: str, title: str = None, labels: Union[str, list[str]] = None, ax: List[str] = None, fig_kwargs: dict = None, pie_labels: bool = True, key: bool = True, key_loc: str = 'center left', key_title: Optional[str] = None, key_bbox=(1, 0, 0.5, 1)) -> None
Draws a pie chart based on the values within a single column.

pieplot collects the necessary data only and calculates percentage values before calling matplotlib pyplot's pie function to create a pie chart.

Args:
  • parts (str): The name of the column containing the pie chart segment values.
  • title (str = None): Title to be displayed with the pie chart.
  • labels (Union[str, list[str]] = None): The labels of the segments in the pie chart. Either a list of string labels following the same order as the values in your parts column, or the name of a column containing the labels.
  • ax (List[str]): Your own matplotlib axis, if required. Note that if you supply this, the fig_kwargs arguments will not be used.
  • fig_kwargs (dict = None): A dictionary of any kwargs you wish to be forwarded to matplotlib.pyplot.subplots() when creating the figure that the pie chart will be displayed on.
  • pie_labels (bool = True): Set to False if you do not wish to label the segments of your pie chart.
  • key (bool = True): Specifies whether you want a color map key placed to the side of your pie chart.
  • key_loc (str = "center left"): The location of the segment color key on your pie chart, forwarded to matplotlib's legend function.
  • key_title (Optional[str] = None): A title for the segment color key, forwarded to matplotlib's legend function.
  • key_bbox (tuple = (1, 0, 0.5, 1)): bbox_to_anchor argument, forwarded to matplotlib's legend function.

Raises:
  • ValueError: Incorrect column name given as the parts or labels argument.
  • various exceptions: Note that exceptions may be raised from matplotlib pyplot's pie or subplots functions, for example if fig_kwargs keywords are not valid.
quantile(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.quantile.
rename(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.rename.
reverse(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.reverse.
scatterplot(self: LDF, x: str, y: str, hue: str = None, ax: mat.axes = None, colors: Union[str, list[str]] = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'], **kwargs) -> mat.axes
Draws a scatter plot.

Scatterplot filters data down to necessary columns only and then calls Seaborn's scatterplot function.

Args:
  • x (str): The name of the column to be used for the x axis.
  • y (str): The name of the column to be used for the y axis.
  • hue (str) = None: The name of the column to be used for grouped scatterplots.
  • ax (matplotlib.axes) = None: Matplotlib axes to be used for the plot; a new axes is generated if not supplied.
  • colors (Union[str, list[str]]) = Palettes.dict["standard"]: Color(s) for the plot(s).
  • **kwargs: Other keyword arguments that will be passed to Matplotlib.pyplot's scatter function.

Raises:
  • ValueError: Incorrect column name given.
  • RequestRejected: Could not continue in function as the data owner rejected a required access request.
  • various exceptions: Note that exceptions may be raised from Matplotlib.pyplot when the scatter function is called. See Matplotlib's documentation for further details.

Returns: The Matplotlib Axes object with the plot drawn onto it.
select(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.select.
shift(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.shift.
shift_and_fill(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.shift_and_fill.
slice(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.slice.
sort(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.sort.
std(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.std.
sum(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.sum.
tail(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.tail.
take_every(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.take_every.
unique(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.unique.
unnest(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.unnest.
var(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.var.
vstack(self: LDF, df2: LDF) -> ~LDF
Appends df2 to the current RemoteLazyFrame, provided their columns have the same names and types. Args: df2 (RemoteLazyFrame): The RemoteLazyFrame you wish to append to your current RemoteLazyFrame. Returns: RemoteLazyFrame: The combined RemoteLazyFrame resulting from the vstack
with_column(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.with_column.
with_columns(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.with_columns.
with_context(_self, *args, **kwargs)
See the polars documentation for pl.LazyFrame.with_context.
with_row_count(self: LDF, name: str = 'index') -> ~LDF
Adds a new column containing the row count. Args: name (str): The name of the new index column. Returns: RemoteLazyFrame: The RemoteLazyFrame with the new row count/index column
zscore_scale(self: LDF, cols: Union[str, List[str]]) -> ~LDF

Rescales data by subtracting the mean from data points and then dividing the result by the standard deviation of the data.

Args:
  • cols (Union[str, List[str]]): The name of the column(s) to which scaling should be applied.

Returns: Copy of the original RemoteLazyFrame with scaling applied to the specified column(s).

Raises:
  • ValueError: A column named in the cols argument was not found in the dataset.
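A local sketch of z-score scaling on one column follows. The helper `zscore_scale_values` is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption; a population standard deviation would give slightly different values.

```python
from statistics import mean, stdev


def zscore_scale_values(values):
    """Subtract the column mean, then divide by the standard deviation."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation: an assumption
    return [(v - center) / spread for v in values]


scaled = zscore_scale_values([2.0, 4.0, 6.0])
print(scaled)
```

After scaling, the column has a mean of zero and a standard deviation of one under the same definition of standard deviation.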

RemoteLazyGroupBy()

Builder for a GroupBy operation on a RemoteLazyFrame.

Ancestors (in MRO)

  • typing.Generic

Methods

agg(_self, *args, **kwargs) :

head(_self, *args, **kwargs) :

tail(_self, *args, **kwargs) :