Module bastionlab.polars

Sub-modules

Functions

train_test_split(*arrays: "List['RemoteArray']", train_size: Optional[float] = None, test_size: Optional[float] = 0.25, shuffle: Optional[bool] = False, random_state: Optional[int] = None) ‑> List[[bastionlab.polars.frame](frame.md).RemoteArray]

Split RemoteArrays into train and test subsets.

Args:
train_size (Optional[float], optional): Should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split. If None, the value is automatically set to the complement of the test size.
test_size (Optional[float], optional): Should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If None, the value is set to the complement of the train size. If train_size is also None, it is set to 0.25. Defaults to 0.25.
shuffle (Optional[bool], optional): Whether or not to shuffle the data before splitting. Defaults to False.
random_state (Optional[int], optional): Controls the shuffling applied to the data before the split. Pass an int for reproducible output across multiple function calls.
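The size rules above can be illustrated locally. The following is a minimal sketch using plain Python lists in place of RemoteArrays; it is not BastionLab's implementation. The helpers `split_sizes` and `local_train_test_split` are hypothetical, and two details are assumptions borrowed from scikit-learn: the test count is rounded up and the train count rounded down, and the output lists train/test pairs per input array.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def split_sizes(n: int, train_size: Optional[float] = None,
                test_size: Optional[float] = None) -> Tuple[int, int]:
    """Resolve the documented defaults into (n_train, n_test) row counts."""
    if train_size is None and test_size is None:
        test_size = 0.25  # documented fallback when both sizes are None
    for name, value in (("train_size", train_size), ("test_size", test_size)):
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    # Rounding rule (ceil for test, floor for train) is an assumption from scikit-learn.
    n_test = math.ceil(test_size * n) if test_size is not None else None
    n_train = math.floor(train_size * n) if train_size is not None else None
    if n_train is None:
        n_train = n - n_test  # complement of the test size
    if n_test is None:
        n_test = n - n_train  # complement of the train size
    if n_train + n_test > n:
        raise ValueError("train_size + test_size exceeds the dataset size")
    return n_train, n_test


def local_train_test_split(*arrays: Sequence, train_size: Optional[float] = None,
                           test_size: Optional[float] = None, shuffle: bool = False,
                           random_state: Optional[int] = None) -> List[list]:
    """Split equally long sequences into [a_train, a_test, b_train, b_test, ...]."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    n_train, n_test = split_sizes(n, train_size, test_size)
    indices = list(range(n))
    if shuffle:
        # A seeded generator makes the shuffle reproducible across calls.
        random.Random(random_state).shuffle(indices)
    train_idx, test_idx = indices[:n_train], indices[n_train:n_train + n_test]
    out: List[list] = []
    for a in arrays:
        out.append([a[i] for i in train_idx])
        out.append([a[i] for i in test_idx])
    return out


X = list(range(10))
y = [2 * v for v in X]
X_train, X_test, y_train, y_test = local_train_test_split(X, y)
print(X_train, X_test)  # [0, 1, 2, 3, 4, 5, 6] [7, 8, 9]
```

With shuffle=False the rows keep their original order, so the last quarter (rounded up) becomes the test split; passing shuffle=True together with a fixed random_state gives the same permutation on every call.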

Classes

BastionLabPolars()

Main BastionLabPolars API class.

This class contains all the endpoints allowed on the BastionLab server for Polars. It is instantiated by the bastionlab.Client class and is accessible through the bastionlab.Client.polars property.

Methods

RemoteArray(self, identifier: Optional[str] = None, reference: Optional[bastionlab_pb2.Reference] = None) ‑> [bastionlab.polars.frame](frame.md).RemoteArray

get_df(self, identifier: str) ‑> FetchableLazyFrame

Returns a FetchableLazyFrame from a BastionLab DataFrame identifier.

Args: identifier (str): A unique identifier for the Remote DataFrame.

Returns: FetchableLazyFrame

list_dfs(self) ‑> List[FetchableLazyFrame]

Lists all the DataFrames available on the BastionLab server.

Returns: List[FetchableLazyFrame]

send_df(self, df: polars.internals.dataframe.frame.DataFrame, policy: [bastionlab.polars.policy](policy.md).Policy = Policy(safe_zone=Aggregation(min_agg_size=10), unsafe_handling=Review(), savable=True), sanitized_columns: List[str] = []) ‑> FetchableLazyFrame

Sends a pl.DataFrame to the BastionLab server.

It accepts a pl.DataFrame along with the DataFrame's policy and a list of sensitive columns.

Args:
df (pl.DataFrame): The Polars DataFrame to send.
policy (Policy, optional): The BastionLab Remote DataFrame policy, specified by the data owner. It determines which operations can be performed on the DataFrame. Defaults to Policy(safe_zone=Aggregation(min_agg_size=10), unsafe_handling=Review(), savable=True).
sanitized_columns (List[str], optional): The (sensitive) columns in the DataFrame that are removed when a Data Scientist fetches the result of a query performed on the DataFrame. Defaults to [].

Returns: FetchableLazyFrame
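To make the sanitized_columns semantics concrete, here is a local sketch that models a fetched query result as a dict of column lists and drops the sanitized columns before handing it back. It takes the description above at face value (columns are removed); on a real server this happens server-side, and the helper `apply_sanitization` and the sample column names are hypothetical.

```python
from typing import Dict, List, Sequence

# A fetched result modelled as {column name: column values}.
Columns = Dict[str, List[object]]


def apply_sanitization(result: Columns, sanitized_columns: Sequence[str]) -> Columns:
    """Return a copy of the result with every sanitized column removed."""
    hidden = set(sanitized_columns)
    return {name: values for name, values in result.items() if name not in hidden}


query_result = {"Name": ["Alice", "Bob"], "Age": [34, 29], "Survived": [1, 0]}
print(apply_sanitization(query_result, ["Name"]))  # {'Age': [34, 29], 'Survived': [1, 0]}
```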

Facet()

Namespace for Matplotlib plotting functions applied over a row/column facet grid.

Class variables

col: Optional[str]

inner_rdf: [bastionlab.polars.frame](frame.md).RemoteLazyFrame

kwargs: dict

row: Optional[str]

Methods

barplot(self: LDF, x: Optional[str] = None, y: Optional[str] = None, hue: Optional[str] = None, ax: mat.axes = None, estimator: str = 'mean', vertical: bool = True, title: str = None, auto_label: bool = True, x_label: str = None, y_label: str = None, colors: Union[str, list[str]] = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'], width: float = 0.75, **kwargs) ‑> mat.axes

Draws a bar chart for each subset in row/column facet grid.

barplot filters the data down to the necessary columns only and then calls Seaborn's barplot function.

Args:
x (str) = None: The name of the column to be used for the x-axis.
y (str) = None: The name of the column to be used for the y-axis.
hue (str) = None: The name of the column to be used for a grouped barplot.
ax (matplotlib.axes) = None: Matplotlib axes to be used for the plot. A new axes is generated if not supplied.
estimator (str) = "mean": String representation of the estimator to be used in the aggregated query. Options are "mean", "median", "count", "max", "min", "std" and "sum".
vertical (bool) = True: Option for a vertical (True) or horizontal (False) barplot.
title (str) = None: String title for the plot.
auto_label (bool) = True: If True, labels for the axes are derived automatically from the x/y columns. If False, the x_label and y_label arguments are used.
x_label (str) = None: Label for the x-axis if auto_label is set to False.
y_label (str) = None: Label for the y-axis if auto_label is set to False.
colors (Union[str, list[str]]) = Palettes.dict["standard"]: Colors for the bars.
**kwargs: Other keyword arguments that will be passed to Matplotlib's bar()/barh() function.

Raises:
ValueError: Incorrect column name given, no x or y values provided, or estimator function not recognized.
RequestRejected: Could not continue in function as the data owner rejected a required access request.
Various exceptions: Exceptions may be raised from Seaborn when the barplot function is called, for example, when unexpected kwargs keywords are passed. See the Seaborn documentation for further details.

Returns: The Matplotlib Axes object with the plot drawn onto it.

histplot(self: Facet, x: str = 'count', y: str = 'count', bins: int = 10, colors: Union[str, list[str]] = ['lightblue'], **kwargs) ‑> mat.axes

Draws a histplot for each subset in row/column facet grid.

Facet's histplot iterates over each possible combination of row/column values in the dataset, filters the dataset to rows where the values match this combination of row/column values and applies histplot to this dataset.

Args:
x (str): The name of the column to be used for the x-axis. The default value is "count", which triggers pl.count() to be used on this axis.
y (str): The name of the column to be used for the y-axis. The default value is "count", which triggers pl.count() to be used on this axis.
bins (int): The number of bins into which the x-axis values are grouped. The default value is 10.
colors (Union[str, list[str]]) = ["lightblue"]: Colors to be used for the bars.
**kwargs: Other keyword arguments that will be passed to Matplotlib's bar function when one column is supplied, or to its imshow function when both x and y columns are supplied.

Raises:
ValueError: Incorrect column name given.
RequestRejected: Could not continue in function as the data owner rejected a required access request.
Various exceptions: Exceptions may be raised from Matplotlib when the bar or imshow function is called. See Matplotlib's documentation for further details.

Returns: The Matplotlib Axes object with the plot drawn onto it.
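For the one-column case, grouping x into `bins` bins and counting rows per bin is what a "count" y-axis amounts to. The sketch below shows that locally; equal-width bins over [min, max], with the maximum value placed in the last bin, are an assumption, and `bin_counts` is a hypothetical helper rather than BastionLab's query.

```python
from typing import List, Sequence, Tuple


def bin_counts(values: Sequence[float], bins: int = 10) -> Tuple[List[float], List[int]]:
    """Return (bin edges, row count per bin) for equal-width bins over [min, max]."""
    if bins < 1:
        raise ValueError("bins must be a positive integer")
    if not values:
        return [], []
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero when all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin instead of overflowing.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


edges, counts = bin_counts([1, 2, 2, 3, 9, 10], bins=3)
print(edges, counts)  # [1.0, 4.0, 7.0, 10.0] [4, 0, 2]
```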

lineplot(self: LDF, x: str, y: str, **kwargs) ‑> None

Draws a lineplot based on x and y values for each subset in the row/column facet grid. lineplot filters the data down to the necessary columns only and then calls Seaborn's lineplot function on the rows of the dataset whose values match each combination of row/column values.

Args:
x (str): The name of the column to be used for the x-axis.
y (str): The name of the column to be used for the y-axis.
**kwargs: Other keyword arguments that will be passed to Seaborn's lineplot function.

Raises:
ValueError: Incorrect column name given.
Various exceptions: Exceptions may be raised from Seaborn when the lineplot function is called, for example, when unexpected kwargs keywords are passed. See the Seaborn documentation for further details.
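The per-subset behaviour shared by the Facet methods (iterate over every combination of row/column values and keep the matching rows) can be sketched locally with `itertools.product`. Rows are modelled as dicts here, and the helper `facet_subsets` and the sample column names are hypothetical; BastionLab performs the filtering as remote queries.

```python
from itertools import product
from typing import Dict, Iterator, List, Optional, Tuple

Row = Dict[str, object]


def facet_subsets(rows: List[Row], row: Optional[str] = None,
                  col: Optional[str] = None) -> Iterator[Tuple[Tuple[object, object], List[Row]]]:
    """Yield ((row_value, col_value), matching_rows) for every facet cell."""

    def levels(name: Optional[str]) -> list:
        # Distinct values in first-seen order; an unused facet contributes one None level.
        return list(dict.fromkeys(r[name] for r in rows)) if name else [None]

    for r_val, c_val in product(levels(row), levels(col)):
        subset = [r for r in rows
                  if (row is None or r[row] == r_val) and (col is None or r[col] == c_val)]
        yield (r_val, c_val), subset


data = [
    {"sex": "male", "pclass": 1, "age": 40},
    {"sex": "female", "pclass": 1, "age": 30},
    {"sex": "male", "pclass": 3, "age": 22},
]
for cell, subset in facet_subsets(data, row="sex", col="pclass"):
    print(cell, [r["age"] for r in subset])
# ('male', 1) [40]
# ('male', 3) [22]
# ('female', 1) [30]
# ('female', 3) []
```

Empty cells are still yielded, so a grid of plots keeps a consistent shape even when a combination has no rows.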

scatterplot(self: Facet, x: str = None, y: str = None, hue: str = None, ax: mat.axes = None, colors: Union[str, list[str]] = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'], **kwargs) ‑> None

Draws a scatter plot for each subset in the row/column facet grid. scatterplot filters the data down to the necessary columns only before calling Seaborn's scatterplot function on the rows of the dataset whose values match each combination of row/column values.

Args:
x (str): The name of the column to be used for the x-axis.
y (str): The name of the column to be used for the y-axis.
hue (str) = None: The name of the column to be used for grouped scatterplots.
colors (Union[str, list[str]]) = Palettes.dict["standard"]: Colors for the points.
ax (matplotlib.axes) = None: Matplotlib axes to be used for the plot. A new axes is generated if not supplied.
**kwargs: Other keyword arguments that will be passed to Matplotlib.pyplot's scatter function.

Raises:
ValueError: Incorrect column name given.
RequestRejected: Could not continue in function as the data owner rejected a required access request.
Various exceptions: Exceptions may be raised from Matplotlib.pyplot when the scatter function is called. See Matplotlib's documentation for further details.

Returns: The Matplotlib Axes object with the plot drawn onto it.

FetchableLazyFrame()

A class to represent a FetchableLazyFrame, which can be accessed as a Polars DataFrame via the fetch() method.

Ancestors (in MRO)

bastionlab.polars.frame.RemoteLazyFrame

Instance variables

identifier: str

Gets the identifier.

Returns: The identifier.

Methods

delete(self)

fetch(self) ‑> polars.internals.dataframe.frame.DataFrame

Fetches your FetchableLazyFrame and returns it as a Polars DataFrame.

Returns: polars.DataFrame: A Polars DataFrame instance of your FetchableLazyFrame.

save(self)

to_array(self: "'FetchableLazyFrame'") ‑> [bastionlab.polars.frame](frame.md).RemoteArray

Converts a FetchableLazyFrame into a RemoteArray

Returns: RemoteArray

RemoteArray()

Intermediate representation for conversion between Tensors and DataFrames.

Ancestors (in MRO)

bastionlab.polars.frame.RemoteLazyFrame

Methods

to_tensor(self) ‑> 'RemoteTensor'

Converts a RemoteArray to a RemoteTensor.

RemoteArray is BastionLab's internal intermediate representation. It is akin to a NumPy array, but is essentially a pointer to a DataFrame on the server; when to_tensor is called, the server converts that DataFrame into a Tensor.

Returns: RemoteTensor

RemoteLazyFrame()

A class to represent a RemoteLazyFrame.

Delegated attributes

dtypes : dict[str, pl.DataType] Gets the dtypes of the columns in the LazyFrame.
schema : dict[str, pl.DataType] The DataFrame's schema.

Descendants

bastionlab.polars.frame.FetchableLazyFrame
bastionlab.polars.frame.RemoteArray

Static methods

sql(query: str, *rdfs: LDF) ‑> ~LDF

Parses the given SQL query and interpolates {} placeholders with the given RemoteLazyFrames.

Args:
query (str): The SQL query.
rdfs (RemoteLazyFrame): DataFrames used in the SQL query.

Returns: RemoteLazyFrame: The resulting RemoteLazyFrame.
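The placeholder interpolation can be pictured with a local sketch in which each `{}` is replaced, in order, by a name standing in for the corresponding RemoteLazyFrame. Representing frames by table-name strings and using `str.format` are assumptions for illustration; `interpolate_sql` is hypothetical and BastionLab's parser may behave differently.

```python
from typing import Sequence


def interpolate_sql(query: str, table_names: Sequence[str]) -> str:
    """Replace each {} placeholder, in order, with the matching table name."""
    expected = query.count("{}")
    if expected != len(table_names):
        raise ValueError(
            f"query has {expected} placeholders but {len(table_names)} frames were given"
        )
    return query.format(*table_names)


print(interpolate_sql("SELECT * FROM {} JOIN {} USING (id)", ["df_0", "df_1"]))
# SELECT * FROM df_0 JOIN df_1 USING (id)
```

With the documented signature, the frames themselves follow the query, e.g. `RemoteLazyFrame.sql("SELECT * FROM {}", rdf)` for a hypothetical RemoteLazyFrame `rdf`.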

Instance variables

columns

composite_plan: str

Gets composite_plan.

Returns: composite_plan as str

dtypes

schema

Methods

apply_udf(self: LDF, columns: List[str], udf: Callable) ‑> ~LDF

Applies a user-defined function to the selected columns of the RemoteLazyFrame and returns the result.

Args:
columns (List[str]): List of columns that the user-defined function should be applied to.
udf (Callable): User-defined function to be applied to the columns. It must be a compatible input for the torch.jit.script function.

Returns: RemoteLazyFrame: An updated RemoteLazyFrame after the udf has been applied.

barplot(self: LDF, x: str = None, y: str = None, hue: str = None, ax: mat.axes = None, estimator: str = 'mean', vertical: bool = True, title: str = None, auto_label: bool = True, x_label: str = None, y_label: str = None, colors: Union[str, list[str]] = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'], width: float = 0.75, **kwargs) ‑> mat.axes

Draws a bar chart.

barplot calculates the bars' data using aggregated queries and then plots it using Matplotlib's bar()/barh() function.

Args:
x (str) = None: The name of the column to be used for the x-axis.
y (str) = None: The name of the column to be used for the y-axis.
hue (str) = None: The name of the column to be used for a grouped barplot.
ax (matplotlib.axes) = None: Matplotlib axes to be used for the plot. A new axes is generated if not supplied.
estimator (str) = "mean": String representation of the estimator to be used in the aggregated query. Options are "mean", "median", "count", "max", "min", "std" and "sum".
vertical (bool) = True: Option for a vertical (True) or horizontal (False) barplot.
title (str) = None: String title for the plot.
auto_label (bool) = True: If True, labels for the axes are derived automatically from the x/y columns. If False, the x_label and y_label arguments are used.
x_label (str) = None: Label for the x-axis if auto_label is set to False.
y_label (str) = None: Label for the y-axis if auto_label is set to False.
colors (Union[str, list[str]]) = Palettes.dict["standard"]: Colors for the bars.
**kwargs: Other keyword arguments that will be passed to Matplotlib's bar()/barh() function.

Raises:
ValueError: Incorrect column name given, no x or y values provided, or estimator function not recognized.
RequestRejected: Could not continue in function as the data owner rejected a required access request.
Various exceptions: Exceptions may be raised from Matplotlib when the bar()/barh() function is called, for example, when unexpected kwargs keywords are passed. See the Matplotlib documentation for further details.

Returns: The Matplotlib Axes object with the plot drawn onto it.
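The estimator strings map naturally onto aggregation functions applied per x group. Below is a local, standard-library sketch of that lookup, including the ValueError for an unrecognised estimator; on the server the aggregation runs as a grouped query instead. The `ESTIMATORS` table and `aggregate_bars` helper are hypothetical, and treating "std" as the sample standard deviation (ddof=1) is an assumption.

```python
import statistics
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

ESTIMATORS: Dict[str, Callable[[List[float]], float]] = {
    "mean": statistics.mean,
    "median": statistics.median,
    "count": len,
    "max": max,
    "min": min,
    "std": statistics.stdev,  # sample standard deviation; needs at least two values
    "sum": sum,
}


def aggregate_bars(x: Sequence[str], y: Sequence[float],
                   estimator: str = "mean") -> Dict[str, float]:
    """Group y by x and reduce each group with the named estimator."""
    try:
        agg = ESTIMATORS[estimator]
    except KeyError:
        raise ValueError(
            f"estimator {estimator!r} not recognized; choose from {sorted(ESTIMATORS)}"
        ) from None
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in zip(x, y):
        groups[key].append(value)
    return {key: agg(values) for key, values in groups.items()}


print(aggregate_bars(["a", "a", "b"], [1, 3, 10], estimator="sum"))  # {'a': 4, 'b': 10}
```

Each resulting key/value pair corresponds to one bar's label and height.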

boxplot(self: LDF, x: str = None, y: str = None, colors: Union[str, list[str]] = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'], vertical: bool = True, ax: "'mat.axes'" = None, widths: float = 0.75, median_linestyle: str = '-', median_color: str = 'black', median_linewidth: float = 0.75, **kwargs)

Draws a boxplot based on x and y values.

boxplot uses aggregated queries to get the data necessary to create a boxplot with Matplotlib.

kwargs arguments are forwarded to Matplotlib's Axes.bxp boxplot function.

Args:
x (str): The name of the column to be used for the x-axis.
y (str): The name of the column to be used for the y-axis.
colors (Union[str, list[str]]): The color(s), or the name of a built-in BastionLab color palette, to be used for the boxes.
vertical (bool): Option for vertical or horizontal orientation.
ax (matplotlib.axes): Axes to plot on. A new axes is created if set to None.
widths (float): The boxes' widths.
median_linestyle (str): Linestyle for the median line.
median_color (str): Color for the median line.
median_linewidth (float): Linewidth for the median line.
**kwargs: Keyword arguments that will be passed to Matplotlib's bxp function.

Raises:
ValueError: Incorrect column name given.
Various exceptions: Exceptions may be raised from Matplotlib when the bxp function is called, for example, when unexpected kwargs keywords are passed. See the Matplotlib documentation for further details.
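Matplotlib's Axes.bxp draws boxes from precomputed statistics rather than raw data, which is why boxplot can work from aggregated queries. The sketch below computes one such statistics dict locally with the standard library; the keys med, q1, q3, whislo, whishi and the optional fliers and label are the ones bxp reads. Using the "inclusive" quantile method and whiskers at 1.5 × IQR clipped to the data are assumptions, and `box_stats` is a hypothetical helper, not BastionLab's code.

```python
import statistics
from typing import Dict, Sequence


def box_stats(values: Sequence[float], label: str = "", whis: float = 1.5) -> Dict[str, object]:
    """Compute one bxp-style statistics dict for a single box."""
    if len(values) < 2:
        raise ValueError("at least two values are needed to compute quartiles")
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_limit, hi_limit = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers reach the most extreme data points still inside the limits.
    inside = [v for v in values if lo_limit <= v <= hi_limit]
    return {
        "label": label,
        "med": med,
        "q1": q1,
        "q3": q3,
        "whislo": min(inside),
        "whishi": max(inside),
        "fliers": [v for v in values if v < lo_limit or v > hi_limit],
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], label="y")
print(stats["q1"], stats["med"], stats["q3"], stats["fliers"])  # 3.25 5.5 7.75 [100]
```

A list of such dicts, one per box, is what `ax.bxp(...)` takes as its first argument.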

cache(_self, *args, **kwargs)