Module create_objects.py

Contains functions that load the Excel file and create the pd.DataFrame objects used in the project.

create_objects.create_average_nota_by_region(dataframe: DataFrame) → DataFrame

Calculates the average evaluation scores (nota) by region and returns a DataFrame.

Parameters: dataframe (pd.DataFrame) – The DataFrame containing evaluation scores and state information.
Returns: A DataFrame with the average evaluation scores by region.
Return type: pd.DataFrame

Examples

>>> sample_data = pd.DataFrame({
...     'Sigla da UF ': ['SP', 'RJ', 'MG', 'BA', 'AM', 'PA'],
...     ' Nota Padronizada - Organização Didático-Pedagógica': [8.5, 7.2, 6.8, 8.0, 7.5, 6.2],
...     ' Nota Padronizada - Infraestrutura e Instalações Físicas': [7.8, 7.5, 6.4, 8.2, 6.9, 6.1],
...     ' Nota Padronizada - Oportunidade de Ampliação da Formação': [7.2, 7.0, 6.3, 7.9, 6.5, 5.9],
...     ' Nota Padronizada - Regime de Trabalho': [8.1, 7.8, 7.0, 8.3, 7.1, 6.4]
... })
>>> avg_nota_df = create_average_nota_by_region(sample_data)
>>> isinstance(avg_nota_df, pd.DataFrame)
True
>>> len(avg_nota_df) > 0
True
>>> ' Nota Padronizada - Organização Didático-Pedagógica' in avg_nota_df.columns
True
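
The region averaging described above can be illustrated with a minimal sketch. It is not the project's implementation: the name average_nota_by_region_sketch, the partial UF_TO_REGION lookup, the 'Região' output column, and the choice to pick score columns by the "Nota Padronizada" prefix are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical lookup covering only the states used in the example above.
UF_TO_REGION = {
    "SP": "Sudeste", "RJ": "Sudeste", "MG": "Sudeste",
    "BA": "Nordeste", "AM": "Norte", "PA": "Norte",
}


def average_nota_by_region_sketch(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Average every 'Nota Padronizada' column per region (illustrative only)."""
    # Assumption: score columns share the 'Nota Padronizada' prefix,
    # possibly preceded by a space as in the sample data.
    nota_columns = [
        column for column in dataframe.columns
        if column.strip().startswith("Nota Padronizada")
    ]
    # Translate each state abbreviation into its region name.
    # States missing from the lookup become NaN and are dropped by groupby.
    regions = dataframe["Sigla da UF "].map(UF_TO_REGION).rename("Região")
    # Group the score columns by region and take the mean of each column.
    return dataframe[nota_columns].groupby(regions).mean().reset_index()
```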

create_objects.create_mean_of_general_score(dataframe: DataFrame) → DataFrame

Creates a DataFrame with the mean Enade score for each state, considering only public universities.

Parameters: dataframe (pd.DataFrame) – The DataFrame containing Enade score data and university information.
Returns: A DataFrame with the mean Enade score for each state.
Return type: pd.DataFrame

Examples

>>> sample_data = pd.DataFrame({
...     'Sigla da UF ': ['SP', 'RJ', 'MG', 'BA', 'AM', 'PA'],
...     'Categoria Administrativa': ['Pública Federal', 'Pública Estadual', 'Privada', 'Pública Municipal', 'Pública Federal', 'Pública Estadual'],
...     ' Conceito Enade (Contínuo)': [3.5, 4.0, 3.2, 4.5, 3.9, 4.2]
... })
>>> mean_score_df = create_mean_of_general_score(sample_data)
>>> isinstance(mean_score_df, pd.DataFrame)
True
>>> mean_score_df.to_dict()
{'Sigla da UF ': {0: 'AM', 1: 'BA', 2: 'PA', 3: 'RJ', 4: 'SP'}, ' Conceito Enade (Contínuo)': {0: 3.9, 1: 4.5, 2: 4.2, 3: 4.0, 4: 3.5}}
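
A minimal sketch of one way to obtain the output shown above. It assumes that public institutions are the rows whose 'Categoria Administrativa' value starts with "Pública"; that rule, and the name mean_general_score_sketch, are illustrative assumptions rather than the project's actual code.

```python
import pandas as pd


def mean_general_score_sketch(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Mean Enade score per state, public institutions only (illustrative)."""
    # Assumption: every public category label begins with 'Pública'
    # (e.g. 'Pública Federal', 'Pública Estadual', 'Pública Municipal').
    is_public = dataframe["Categoria Administrativa"].str.startswith("Pública")
    # Keep the public rows, then average the score within each state.
    # as_index=False keeps the state abbreviation as a regular column.
    return (
        dataframe[is_public]
        .groupby("Sigla da UF ", as_index=False)[" Conceito Enade (Contínuo)"]
        .mean()
    )
```

With the sample data from the example, the private institution in MG is filtered out, which is why MG is absent from the documented result.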

create_objects.create_non_attendance_df(dataframe: DataFrame) → DataFrame

Creates a DataFrame with the mean non-attendance rate for each course.

Parameters: dataframe (pd.DataFrame) – The DataFrame containing the enrolled and participating student counts for each course.
Returns: A DataFrame with the average non-attendance rate for each course.
Return type: pd.DataFrame

Examples

>>> sample_data = pd.DataFrame({
...     'Área de Avaliação': ['Course A', 'Course B', 'Course A', 'Course C'],
...     ' Nº de Concluintes Inscritos': [100, 120, 90, 80],
...     ' Nº de Concluintes Participantes': [80, 110, 70, 70]
... })
>>> df_desistence = create_non_attendance_df(sample_data)
>>> isinstance(df_desistence, pd.DataFrame)
True
>>> 'Taxa de Desistência Média' in df_desistence.columns
True
>>> len(df_desistence) > 0
True
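
The documentation does not state how the rate is computed. The sketch below assumes it is the fraction of enrolled students who did not participate, averaged over the rows of each course; the formula, the fractional (not percentage) scale, and the name non_attendance_sketch are assumptions.

```python
import pandas as pd


def non_attendance_sketch(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Average non-attendance rate per course (illustrative assumption)."""
    enrolled = dataframe[" Nº de Concluintes Inscritos"]
    present = dataframe[" Nº de Concluintes Participantes"]
    # Assumed formula: share of enrolled students who did not take the exam.
    rate = (enrolled - present) / enrolled
    # Average the per-row rates within each course and name the result
    # after the column checked in the example above.
    return (
        rate.groupby(dataframe["Área de Avaliação"])
        .mean()
        .rename("Taxa de Desistência Média")
        .reset_index()
    )
```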

create_objects.create_region_column_df(dataframe: DataFrame, uf_column: str) → Series

Maps each state abbreviation to its region, adds the result to the given DataFrame as a new 'Region' column, and returns it as a Series.

Parameters:

dataframe (pd.DataFrame) – The DataFrame containing state data.
uf_column (str) – The name of the column in the DataFrame that contains state abbreviations.

Returns:

A new Pandas Series with region names corresponding to each state abbreviation.

Return type:

pd.Series

Examples

>>> sample_data = pd.DataFrame({
...     'State Abbreviation': ['SP', 'RJ', 'MG'],
...     'Number': [45500, 17500, 21300]
... })
>>> region_series = create_region_column_df(sample_data, 'State Abbreviation')
>>> isinstance(region_series, pd.Series)
True
>>> len(region_series) == len(sample_data)
True
>>> 'Region' in sample_data.columns
True
>>> region_series.tolist()
['Sudeste', 'Sudeste', 'Sudeste']
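
The mapping step can be sketched with a plain dictionary and Series.map. The grouping of the 27 federative units into five regions is the standard one; the names UF_TO_REGION and region_column_sketch are hypothetical, and the in-place addition of a 'Region' column mirrors what the example above implies rather than the confirmed implementation.

```python
import pandas as pd

# Standard grouping of Brazilian federative units into five regions.
REGION_TO_UFS = {
    "Norte": ["AC", "AP", "AM", "PA", "RO", "RR", "TO"],
    "Nordeste": ["AL", "BA", "CE", "MA", "PB", "PE", "PI", "RN", "SE"],
    "Centro-Oeste": ["DF", "GO", "MT", "MS"],
    "Sudeste": ["ES", "MG", "RJ", "SP"],
    "Sul": ["PR", "RS", "SC"],
}
# Invert the table into a UF -> region lookup.
UF_TO_REGION = {
    uf: region for region, ufs in REGION_TO_UFS.items() for uf in ufs
}


def region_column_sketch(dataframe: pd.DataFrame, uf_column: str) -> pd.Series:
    """Map state abbreviations to regions (illustrative only)."""
    # Unknown abbreviations map to NaN.
    region = dataframe[uf_column].map(UF_TO_REGION)
    # Assumption: the caller's DataFrame is modified in place.
    dataframe["Region"] = region
    return region
```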

create_objects.load_data_as_df(file_path: str) → DataFrame

Loads the data from a .xlsx file (or, as in the second example below, a .csv file) and returns a pandas DataFrame.

Parameters: file_path (str) – The path to the data file.
Returns: A pandas DataFrame with the data contained in the file.
Return type: pd.DataFrame

Examples

>>> sample_file_xlsx = "./data/dataframes/resultados_cpc_2021.xlsx"
>>> df_xlsx = load_data_as_df(sample_file_xlsx)
>>> isinstance(df_xlsx, pd.DataFrame)
True
>>> len(df_xlsx) > 0
True

>>> sample_file_csv = "./data/dataframes/resultados_cpc_2021.csv"
>>> df_csv = load_data_as_df(sample_file_csv)
>>> isinstance(df_csv, pd.DataFrame)
True
>>> len(df_csv) > 0
True
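
Since the examples load both an .xlsx and a .csv path, one plausible approach is to choose the pandas reader from the file extension. The sketch below is an assumption about how such a loader could look, not the project's code; load_table_sketch is a hypothetical name, and pd.read_excel needs an Excel engine such as openpyxl to be installed.

```python
from pathlib import Path

import pandas as pd


def load_table_sketch(file_path: str) -> pd.DataFrame:
    """Load a table from an .xlsx or .csv file (illustrative only)."""
    suffix = Path(file_path).suffix.lower()
    if suffix == ".xlsx":
        # Requires an Excel engine (for example openpyxl).
        return pd.read_excel(file_path)
    if suffix == ".csv":
        return pd.read_csv(file_path)
    # Fail early on extensions this sketch does not handle.
    raise ValueError(f"Unsupported file extension: {suffix!r}")
```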

create_objects.load_data_as_geodf(file_path: str) → GeoDataFrame

Loads the map data from a .json file and returns a geopandas GeoDataFrame.

Parameters: file_path (str) – The path to the map data file.
Returns: A geopandas GeoDataFrame with the map data for plotting.
Return type: gpd.GeoDataFrame

Examples

>>> sample_geojson_file = "./data/map/brasil_estados.json"
>>> geodf = load_data_as_geodf(sample_geojson_file)
>>> isinstance(geodf, gpd.GeoDataFrame)
True
>>> len(geodf) > 0
True
