Module create_objects.py

Contains functions that load the Excel file and create the pd.DataFrame objects used in the project.

create_objects.create_average_nota_by_region(dataframe: DataFrame) → DataFrame

Calculates the average evaluation scores (nota) by region and returns a DataFrame.

Parameters:

dataframe (pd.DataFrame) – The DataFrame containing evaluation scores and state information.

Returns:

A DataFrame with the average evaluation scores by region.

Return type:

pd.DataFrame

Examples

>>> sample_data = pd.DataFrame({
...     'Sigla da UF ': ['SP', 'RJ', 'MG', 'BA', 'AM', 'PA'],
...     ' Nota Padronizada - Organização Didático-Pedagógica': [8.5, 7.2, 6.8, 8.0, 7.5, 6.2],
...     ' Nota Padronizada - Infraestrutura e Instalações Físicas': [7.8, 7.5, 6.4, 8.2, 6.9, 6.1],
...     ' Nota Padronizada - Oportunidade de Ampliação da Formação': [7.2, 7.0, 6.3, 7.9, 6.5, 5.9],
...     ' Nota Padronizada - Regime de Trabalho': [8.1, 7.8, 7.0, 8.3, 7.1, 6.4]
... })
>>> avg_nota_df = create_average_nota_by_region(sample_data)
>>> isinstance(avg_nota_df, pd.DataFrame)
True
>>> len(avg_nota_df) > 0
True
>>> ' Nota Padronizada - Organização Didático-Pedagógica' in avg_nota_df.columns
True

create_objects.create_mean_of_general_score(dataframe: DataFrame) → DataFrame

Creates a DataFrame with the mean Enade score for each state, only for public universities.

Parameters:

dataframe (pd.DataFrame) – The DataFrame containing Enade score data and university information.

Returns:

A DataFrame with the mean Enade score for each state.

Return type:

pd.DataFrame

Examples

>>> sample_data = pd.DataFrame({
...     'Sigla da UF ': ['SP', 'RJ', 'MG', 'BA', 'AM', 'PA'],
...     'Categoria Administrativa': ['Pública Federal', 'Pública Estadual', 'Privada', 'Pública Municipal', 'Pública Federal', 'Pública Estadual'],
...     ' Conceito Enade (Contínuo)': [3.5, 4.0, 3.2, 4.5, 3.9, 4.2]
... })
>>> mean_score_df = create_mean_of_general_score(sample_data)
>>> isinstance(mean_score_df, pd.DataFrame)
True
>>> mean_score_df.to_dict()
{'Sigla da UF ': {0: 'AM', 1: 'BA', 2: 'PA', 3: 'RJ', 4: 'SP'}, ' Conceito Enade (Contínuo)': {0: 3.9, 1: 4.5, 2: 4.2, 3: 4.0, 4: 3.5}}
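The filter-then-aggregate behaviour shown in the doctest can be approximated as follows. This is a minimal sketch that assumes public institutions are those whose 'Categoria Administrativa' value starts with "Pública", as the sample data suggests; the function name `mean_of_general_score_sketch` is hypothetical.

```python
import pandas as pd


def mean_of_general_score_sketch(dataframe):
    """Hypothetical sketch: mean Enade score per state, public institutions only."""
    # Assumption: every public category label begins with "Pública".
    is_public = dataframe["Categoria Administrativa"].str.startswith("Pública")
    public_only = dataframe[is_public]
    # groupby sorts the state abbreviations alphabetically by default.
    return (
        public_only
        .groupby("Sigla da UF ")[" Conceito Enade (Contínuo)"]
        .mean()
        .reset_index()
    )


sample = pd.DataFrame({
    "Sigla da UF ": ["SP", "RJ", "MG", "BA", "AM", "PA"],
    "Categoria Administrativa": [
        "Pública Federal", "Pública Estadual", "Privada",
        "Pública Municipal", "Pública Federal", "Pública Estadual",
    ],
    " Conceito Enade (Contínuo)": [3.5, 4.0, 3.2, 4.5, 3.9, 4.2],
})
result = mean_of_general_score_sketch(sample)
```

With this sample, the private institution in MG is dropped, so MG does not appear in the result.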

create_objects.create_non_attendance_df(dataframe: DataFrame) → DataFrame

Creates a DataFrame with the mean non-attendance rate of each course.

Parameters:

dataframe (pd.DataFrame) – The DataFrame containing, for each course, the number of enrolled and participating graduating students.

Returns:

A pandas DataFrame with the average non-attendance rate for each course.

Return type:

pd.DataFrame

Examples

>>> sample_data = pd.DataFrame({
...     'Área de Avaliação': ['Course A', 'Course B', 'Course A', 'Course C'],
...     ' Nº de Concluintes Inscritos': [100, 120, 90, 80],
...     ' Nº de Concluintes Participantes': [80, 110, 70, 70]
... })
>>> df_desistence = create_non_attendance_df(sample_data)
>>> isinstance(df_desistence, pd.DataFrame)
True
>>> 'Taxa de Desistência Média' in df_desistence.columns
True
>>> len(df_desistence) > 0
True
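The documentation does not state how the rate is computed. The sketch below assumes one plausible definition: the per-row rate is (enrolled - participants) / enrolled, expressed as a fraction, and the rows are then averaged per course. Both that formula and the name `non_attendance_sketch` are assumptions rather than the module's confirmed behaviour.

```python
import pandas as pd


def non_attendance_sketch(dataframe):
    """Hypothetical sketch: mean non-attendance rate per course."""
    enrolled = dataframe[" Nº de Concluintes Inscritos"]
    attended = dataframe[" Nº de Concluintes Participantes"]
    # Assumed definition: share of enrolled students who did not take part.
    rate = ((enrolled - attended) / enrolled).rename("Taxa de Desistência Média")
    # Average the per-row rates within each course.
    return rate.groupby(dataframe["Área de Avaliação"]).mean().reset_index()


sample = pd.DataFrame({
    "Área de Avaliação": ["Course A", "Course B", "Course A", "Course C"],
    " Nº de Concluintes Inscritos": [100, 120, 90, 80],
    " Nº de Concluintes Participantes": [80, 110, 70, 70],
})
result = non_attendance_sketch(sample)
```

Averaging per-row rates weights every row equally. Dividing the summed absences by the summed enrollments per course would weight larger rows more heavily, so the two definitions can give different values.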

create_objects.create_region_column_df(dataframe: DataFrame, uf_column: str) → Series

Maps state abbreviations to region names, adds the result to the given DataFrame as a 'Region' column, and returns it as a new Series.

Parameters:
  • dataframe (pd.DataFrame) – The DataFrame containing state data.

  • uf_column (str) – The name of the column in the DataFrame that contains state abbreviations.

Returns:

A new Pandas Series with region names corresponding to each state abbreviation.

Return type:

pd.Series

Examples

>>> sample_data = pd.DataFrame({
...     'State Abbreviation': ['SP', 'RJ', 'MG'],
...     'Number': [45500, 17500, 21300]
... })
>>> region_series = create_region_column_df(sample_data, 'State Abbreviation')
>>> isinstance(region_series, pd.Series)
True
>>> len(region_series) == len(sample_data)
True
>>> 'Region' in sample_data.columns
True
>>> region_series.tolist()
['Sudeste', 'Sudeste', 'Sudeste']
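A lookup of this kind can be built from a dictionary of Brazil's five regions and their states. The sketch below is one possible approach: the names `REGION_TO_UFS`, `UF_TO_REGION`, and `region_column_sketch` are hypothetical, and the in-place addition of a 'Region' column is inferred from the doctest rather than from the module's source.

```python
import pandas as pd

# Brazil's 26 states plus the Federal District, grouped by region.
REGION_TO_UFS = {
    "Norte": ["AC", "AP", "AM", "PA", "RO", "RR", "TO"],
    "Nordeste": ["AL", "BA", "CE", "MA", "PB", "PE", "PI", "RN", "SE"],
    "Centro-Oeste": ["DF", "GO", "MT", "MS"],
    "Sudeste": ["ES", "MG", "RJ", "SP"],
    "Sul": ["PR", "RS", "SC"],
}
# Invert the dictionary so each abbreviation points to its region.
UF_TO_REGION = {
    uf: region for region, ufs in REGION_TO_UFS.items() for uf in ufs
}


def region_column_sketch(dataframe, uf_column):
    """Hypothetical sketch: derive a region Series and attach it as 'Region'."""
    region_series = dataframe[uf_column].map(UF_TO_REGION)
    # Mutates the caller's DataFrame, matching the behaviour the doctest implies.
    dataframe["Region"] = region_series
    return region_series


sample = pd.DataFrame({
    "State Abbreviation": ["SP", "RJ", "MG"],
    "Number": [45500, 17500, 21300],
})
regions = region_column_sketch(sample, "State Abbreviation")
```

Abbreviations missing from the dictionary map to NaN under `Series.map`, so a caller may want to check for missing values before grouping by region.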

create_objects.load_data_as_df(file_path: str) → DataFrame

Loads the data from a .xlsx file and returns a pandas DataFrame.

Parameters:

file_path (str) – The path to the file containing the data.

Returns:

A pandas DataFrame with the data contained in the Excel file.

Return type:

pd.DataFrame

Examples

>>> sample_file_xlsx = "./data/dataframes/resultados_cpc_2021.xlsx"
>>> df_xlsx = load_data_as_df(sample_file_xlsx)
>>> isinstance(df_xlsx, pd.DataFrame)
True
>>> len(df_xlsx) > 0
True
>>> sample_file_csv = "./data/dataframes/resultados_cpc_2021.csv"
>>> df_csv = load_data_as_df(sample_file_csv)
>>> isinstance(df_csv, pd.DataFrame)
True
>>> len(df_csv) > 0
True
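The description mentions only .xlsx files, while the examples also load a .csv file. One way a loader could support both is to dispatch on the file extension, as in the sketch below. The function name `load_data_sketch` and the dispatch logic are assumptions, not the module's confirmed implementation.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_data_sketch(file_path):
    """Hypothetical sketch: pick a pandas reader based on the file extension."""
    suffix = Path(file_path).suffix.lower()
    if suffix == ".xlsx":
        # Reading .xlsx requires an Excel engine such as openpyxl.
        return pd.read_excel(file_path)
    if suffix == ".csv":
        return pd.read_csv(file_path)
    raise ValueError(f"Unsupported file extension: {suffix!r}")


# Round-trip a small CSV through a temporary directory to exercise the loader.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "sample.csv"
    pd.DataFrame({"uf": ["SP", "RJ"]}).to_csv(csv_path, index=False)
    loaded = load_data_sketch(str(csv_path))
```

Raising `ValueError` for unknown extensions is a design choice in this sketch; it surfaces a wrong path early instead of letting a reader fail with a less specific error.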

create_objects.load_data_as_geodf(file_path: str) → GeoDataFrame

Loads the map data from a .json file and returns a geopandas GeoDataFrame.

Parameters:

file_path (str) – The path to the file containing the map data.

Returns:

A geopandas GeoDataFrame with the map data for plotting.

Return type:

gpd.GeoDataFrame

Examples

>>> sample_geojson_file = "./data/map/brasil_estados.json"
>>> geodf = load_data_as_geodf(sample_geojson_file)
>>> isinstance(geodf, gpd.GeoDataFrame)
True
>>> len(geodf) > 0
True