Module create_objects.py
Contains functions that load the Excel file and create the pd.DataFrame objects used in the project.
- create_objects.create_average_nota_by_region(dataframe: DataFrame) → DataFrame
Calculates the average evaluation scores (nota) by region and returns a DataFrame.
- Parameters:
dataframe (pd.DataFrame) – The DataFrame containing evaluation scores and state information.
- Returns:
A DataFrame with the average evaluation scores by region.
- Return type:
pd.DataFrame
Examples
>>> sample_data = pd.DataFrame({
...     'Sigla da UF ': ['SP', 'RJ', 'MG', 'BA', 'AM', 'PA'],
...     ' Nota Padronizada - Organização Didático-Pedagógica': [8.5, 7.2, 6.8, 8.0, 7.5, 6.2],
...     ' Nota Padronizada - Infraestrutura e Instalações Físicas': [7.8, 7.5, 6.4, 8.2, 6.9, 6.1],
...     ' Nota Padronizada - Oportunidade de Ampliação da Formação': [7.2, 7.0, 6.3, 7.9, 6.5, 5.9],
...     ' Nota Padronizada - Regime de Trabalho': [8.1, 7.8, 7.0, 8.3, 7.1, 6.4]
... })
>>> avg_nota_df = create_average_nota_by_region(sample_data)
>>> isinstance(avg_nota_df, pd.DataFrame)
True
>>> len(avg_nota_df) > 0
True
>>> ' Nota Padronizada - Organização Didático-Pedagógica' in avg_nota_df.columns
True
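As a rough illustration of the "average by region" idea, the sketch below maps each state to a region and averages every column whose name contains "Nota Padronizada". The function name `average_nota_by_region_sketch`, the partial `UF_TO_REGION` lookup, the `Região` column name, and the sample values are all hypothetical; the project's actual implementation may differ.

```python
import pandas as pd

# Hypothetical lookup covering only the states used in this sketch.
UF_TO_REGION = {
    "SP": "Sudeste", "RJ": "Sudeste", "MG": "Sudeste",
    "BA": "Nordeste", "AM": "Norte", "PA": "Norte",
}


def average_nota_by_region_sketch(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Average every 'Nota Padronizada' column per region (illustrative only)."""
    # Select the score columns by name pattern (an assumption of this sketch).
    nota_columns = [c for c in dataframe.columns if "Nota Padronizada" in c]
    # Build a region label for each row from the state abbreviation.
    regions = dataframe["Sigla da UF "].map(UF_TO_REGION).rename("Região")
    # Group the score columns by region and take the mean of each group.
    return dataframe[nota_columns].groupby(regions).mean().reset_index()


# Made-up sample data with a single score column.
sample = pd.DataFrame({
    "Sigla da UF ": ["SP", "RJ", "BA", "AM"],
    " Nota Padronizada - Regime de Trabalho": [8.0, 7.0, 8.3, 7.1],
})
result = average_nota_by_region_sketch(sample)
print(result)
```

Grouping by a separate Series keeps the input DataFrame unmodified, which may or may not match what the documented function does.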
- create_objects.create_mean_of_general_score(dataframe: DataFrame) → DataFrame
Creates a DataFrame with the mean Enade score for each state, only for public universities.
- Parameters:
dataframe (pd.DataFrame) – The DataFrame containing Enade score data and university information.
- Returns:
A DataFrame with the mean Enade score for each state, computed over public universities only.
- Return type:
pd.DataFrame
Examples
>>> sample_data = pd.DataFrame({
...     'Sigla da UF ': ['SP', 'RJ', 'MG', 'BA', 'AM', 'PA'],
...     'Categoria Administrativa': ['Pública Federal', 'Pública Estadual', 'Privada', 'Pública Municipal', 'Pública Federal', 'Pública Estadual'],
...     ' Conceito Enade (Contínuo)': [3.5, 4.0, 3.2, 4.5, 3.9, 4.2]
... })
>>> mean_score_df = create_mean_of_general_score(sample_data)
>>> isinstance(mean_score_df, pd.DataFrame)
True
>>> mean_score_df.to_dict()
{'Sigla da UF ': {0: 'AM', 1: 'BA', 2: 'PA', 3: 'RJ', 4: 'SP'}, ' Conceito Enade (Contínuo)': {0: 3.9, 1: 4.5, 2: 4.2, 3: 4.0, 4: 3.5}}
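One way such a result could be produced is a filter followed by a group-by, sketched below. It assumes that "public" means any 'Categoria Administrativa' value starting with "Pública", which matches the sample data above but is not confirmed by the source; `mean_of_general_score_sketch` is a hypothetical name, not the project's function.

```python
import pandas as pd


def mean_of_general_score_sketch(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Mean Enade score per state, public institutions only (illustrative)."""
    # Assumption: public categories are those whose label starts with 'Pública'.
    is_public = dataframe["Categoria Administrativa"].str.startswith("Pública")
    # as_index=False keeps the state abbreviation as a regular column.
    return (
        dataframe[is_public]
        .groupby("Sigla da UF ", as_index=False)[" Conceito Enade (Contínuo)"]
        .mean()
    )


sample = pd.DataFrame({
    "Sigla da UF ": ["SP", "RJ", "MG", "BA", "AM", "PA"],
    "Categoria Administrativa": [
        "Pública Federal", "Pública Estadual", "Privada",
        "Pública Municipal", "Pública Federal", "Pública Estadual",
    ],
    " Conceito Enade (Contínuo)": [3.5, 4.0, 3.2, 4.5, 3.9, 4.2],
})
result = mean_of_general_score_sketch(sample)
print(result)
```

The private institution in MG is dropped by the filter, so MG does not appear in the grouped result.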
- create_objects.create_non_attendance_df(dataframe: DataFrame) → DataFrame
Creates a DataFrame with the mean non-attendance rate for each course.
- Parameters:
dataframe (pd.DataFrame) – The DataFrame containing, for each course, the number of enrolled and participating graduating students.
- Returns:
A pandas DataFrame with the average non-attendance rate for each course.
- Return type:
pd.DataFrame
Examples
>>> sample_data = pd.DataFrame({
...     'Área de Avaliação': ['Course A', 'Course B', 'Course A', 'Course C'],
...     ' Nº de Concluintes Inscritos': [100, 120, 90, 80],
...     ' Nº de Concluintes Participantes': [80, 110, 70, 70]
... })
>>> df_desistence = create_non_attendance_df(sample_data)
>>> isinstance(df_desistence, pd.DataFrame)
True
>>> 'Taxa de Desistência Média' in df_desistence.columns
True
>>> len(df_desistence) > 0
True
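The source does not state how the non-attendance rate is computed. The sketch below assumes it is the fraction of enrolled students who did not participate, (enrolled - participants) / enrolled, averaged per course. Both the formula and the name `non_attendance_sketch` are assumptions, and the real function may instead report a percentage.

```python
import pandas as pd


def non_attendance_sketch(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Mean share of enrolled students who did not attend, per course."""
    enrolled = dataframe[" Nº de Concluintes Inscritos"]
    present = dataframe[" Nº de Concluintes Participantes"]
    # Assumed definition: row-wise fraction of enrolled students who were absent.
    rates = ((enrolled - present) / enrolled).rename("Taxa de Desistência Média")
    # Average the row-wise rates within each course.
    return rates.groupby(dataframe["Área de Avaliação"]).mean().reset_index()


sample = pd.DataFrame({
    "Área de Avaliação": ["Course A", "Course B", "Course A", "Course C"],
    " Nº de Concluintes Inscritos": [100, 120, 90, 80],
    " Nº de Concluintes Participantes": [80, 110, 70, 70],
})
result = non_attendance_sketch(sample)
print(result)
```

Averaging row-wise rates weights every row equally; dividing summed absences by summed enrollments per course would be a different, equally plausible definition.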
- create_objects.create_region_column_df(dataframe: DataFrame, uf_column: str) → Series
Creates a new series that maps state abbreviations to regions and adds it to the given DataFrame.
- Parameters:
dataframe (pd.DataFrame) – The DataFrame containing state data.
uf_column (str) – The name of the column in the DataFrame that contains state abbreviations.
- Returns:
A new Pandas Series with region names corresponding to each state abbreviation.
- Return type:
pd.Series
Examples
>>> sample_data = pd.DataFrame({
...     'State Abbreviation': ['SP', 'RJ', 'MG'],
...     'Number': [45500, 17500, 21300]
... })
>>> region_series = create_region_column_df(sample_data, 'State Abbreviation')
>>> isinstance(region_series, pd.Series)
True
>>> len(region_series) == len(sample_data)
True
>>> 'Region' in sample_data.columns
True
>>> region_series.tolist()
['Sudeste', 'Sudeste', 'Sudeste']
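A mapping like this can be sketched with a plain dictionary and `Series.map`. The `UF_TO_REGION` dictionary and the function name `region_column_sketch` are illustrative assumptions; the `Region` column name is taken from the example above. How the real function treats unknown abbreviations is not documented, and in this sketch they simply become NaN.

```python
import pandas as pd

# Brazil's 26 states plus the Federal District, grouped by official region.
REGION_TO_UFS = {
    "Norte": ["AC", "AP", "AM", "PA", "RO", "RR", "TO"],
    "Nordeste": ["AL", "BA", "CE", "MA", "PB", "PE", "PI", "RN", "SE"],
    "Centro-Oeste": ["DF", "GO", "MT", "MS"],
    "Sudeste": ["ES", "MG", "RJ", "SP"],
    "Sul": ["PR", "RS", "SC"],
}
# Invert the grouping into an abbreviation -> region lookup.
UF_TO_REGION = {uf: region for region, ufs in REGION_TO_UFS.items() for uf in ufs}


def region_column_sketch(dataframe: pd.DataFrame, uf_column: str) -> pd.Series:
    """Map state abbreviations to regions and store them in a 'Region' column."""
    regions = dataframe[uf_column].map(UF_TO_REGION)
    # Mutates the caller's DataFrame, mirroring the documented side effect.
    dataframe["Region"] = regions
    return regions


sample = pd.DataFrame({
    "State Abbreviation": ["SP", "RJ", "MG"],
    "Number": [45500, 17500, 21300],
})
region_series = region_column_sketch(sample, "State Abbreviation")
print(region_series.tolist())
```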
- create_objects.load_data_as_df(file_path: str) → DataFrame
Loads the data from an .xlsx file and returns a pandas DataFrame. The examples below also pass a .csv path.
- Parameters:
file_path (str) – The path to the data file.
- Returns:
A pandas DataFrame with the data contained in the Excel file.
- Return type:
pd.DataFrame
Examples
>>> sample_file_xlsx = "./data/dataframes/resultados_cpc_2021.xlsx"
>>> df_xlsx = load_data_as_df(sample_file_xlsx)
>>> isinstance(df_xlsx, pd.DataFrame)
True
>>> len(df_xlsx) > 0
True

>>> sample_file_csv = "./data/dataframes/resultados_cpc_2021.csv"
>>> df_csv = load_data_as_df(sample_file_csv)
>>> isinstance(df_csv, pd.DataFrame)
True
>>> len(df_csv) > 0
True
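Since the examples load both an .xlsx and a .csv path, one plausible approach is to dispatch on the file extension. The sketch below is an assumption about how that could work, not the project's code. `load_table_sketch` is a hypothetical name, and raising `ValueError` for other extensions is a choice made here for illustration.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd


def load_table_sketch(file_path: str) -> pd.DataFrame:
    """Load a table from .xlsx or .csv, chosen by file extension."""
    extension = Path(file_path).suffix.lower()
    if extension == ".xlsx":
        # Reading .xlsx requires an Excel engine such as openpyxl to be installed.
        return pd.read_excel(file_path)
    if extension == ".csv":
        return pd.read_csv(file_path)
    raise ValueError(f"Unsupported file extension: {extension!r}")


# Demonstrate the CSV branch with a temporary file of made-up data.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "demo.csv")
    pd.DataFrame({"Sigla da UF ": ["SP", "RJ"]}).to_csv(csv_path, index=False)
    loaded = load_table_sketch(csv_path)

print(len(loaded))
```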
- create_objects.load_data_as_geodf(file_path: str) → GeoDataFrame
Loads the map data from a .json file and returns a geopandas GeoDataFrame.
- Parameters:
file_path (str) – The path to the map data file.
- Returns:
A geopandas GeoDataFrame with the map data for plotting.
- Return type:
gpd.GeoDataFrame
Examples
>>> sample_geojson_file = "./data/map/brasil_estados.json"
>>> geodf = load_data_as_geodf(sample_geojson_file)
>>> isinstance(geodf, gpd.GeoDataFrame)
True
>>> len(geodf) > 0
True