Skip to content

Excel

Excel Module for autopylot. This module contains functions for working with excel and spreadsheets

Examples:

>>> excel.get_row_column_count(df=df)
    (10, 5)
>>> excel.get_single_cell(df=df,column_name="Column1", cell_number=1)
    "abc"
>>> excel.excel_create_file(output_folder="C:\Users\user\Desktop", output_filename="test.xlsx", output_sheetname="Sheet1")

This module contains the following functions:

  • authenticate_google_spreadsheet(credential_file_path): This creates authentication object for google spreadsheet.
  • get_dataframe_from_google_spreadsheet(auth, spreadsheet_url, sheet_name): Get dataframe from google spreadsheet.
  • tabular_data_from_website(website_url, table_number): Get tabular data from website.
  • upload_dataframe_to_google_spreadsheet(auth, spreadsheet_url, sheet_name, df): Upload dataframe to google spreadsheet.
  • create_file(output_folder, output_filename, output_sheetname): Create excel file.
  • to_dataframe(input_filepath, input_sheetname, header): Convert excel file to dataframe.
  • get_row_column_count(df): Get row and column count of dataframe.
  • dataframe_to_excel(df, output_folder, output_filename, output_sheetname, mode): Convert dataframe to excel file.
  • set_single_cell(df, column_name, cell_number, value): Set single cell value in excel file.
  • get_single_cell(df, column_name, cell_number, header): Get single cell value from excel file.
  • get_all_header_columns(df): Get all header columns from excel file.
  • get_all_sheet_names(input_filepath): Get all sheet names from excel file.
  • drop_columns(df, cols): Drop columns from data frame.
  • clear_sheet(df): Clear sheet from excel file.
  • remove_duplicates(df, column_name): Remove duplicates from excel file.
  • is_NaN(value): Check if value is NaN.
  • df_from_list(list_of_lists, column_names): Create dataframe from list of lists.
  • df_from_string(df_string, word_delimiter, line_delimiter, column_names): Create dataframe from string.
  • df_extract_sub_df(df, row_start, row_end, column_start, column_end): Extract sub dataframe from dataframe.
  • set_value_in_df(df, row_number, column_number, value): Set value in dataframe.
  • get_value_in_df(df, row_number, column_number): Get value from dataframe.
  • df_drop_rows(df, row_start, row_end): Drop rows from dataframe.
  • df_vlookup(df1, df2, column_name, how): Vlookup in dataframe.
  • df_convert_column_to_type(df, column_name, column_type): Convert column to type in dataframe.
  • group_by_column_values_n_split(df, column_name, output_folder, show_output): Group by column values and split dataframe.
  • if_value_exists(df, columns, value): Check if value exists in dataframe.
  • apply_template_format(raw_data_filepath, raw_data_sheetname,template_filepath, template_sheetname, output_folder, output_filename): Apply template format to excel file.
  • concat_all_sheets(input_filepath, sheet_names): Concatenate all sheets from excel file.
  • merge_all_files(input_folder, output_folder, output_filename): Merge all excel files in a folder.

apply_template_format(raw_data_filepath, raw_data_sheetname, template_filepath, template_sheetname, output_folder=None, output_filename=None)

Applies the template format to the excel raw data

Parameters:

Name Type Description Default
raw_data_filepath str || WindowsPath

The raw data filepath.

required
raw_data_sheetname str

The raw data sheetname.

required
template_filepath str || WindowsPathp

The template filepath.

required
template_sheetname str

The template sheetname.

required
output_folder str || WindowsPath

The output folder.

None
output_filename str

The output filename.

None

Returns:

Type Description
None

None

Examples:

>>> apply_template_format(raw_data_filepath='my_autopylot
aw_data
aw_data.xlsx', input_sheetname='Sheet1', input_template_filepath='my_autopylot
aw_data emplate.xlsx', input_template_sheetname='Sheet1', output_folder='my_autopylot
aw_data', output_filename='raw_data.xlsx')
None

authenticate_google_spreadsheet(credential_file_path)

Creates authentication object for google spreadsheet.

Parameters:

Name Type Description Default
credential_file_path str || WindowsPath

Credential file path.

required

Returns:

Name Type Description
auth_object object

Authentication object.

Examples:

>>> auth = excel.authenticate_google_spreadsheet(credential_file_path="C:\Users\user\Desktop\credentials.json")

clear_sheet(df)

Clears the sheet

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required

Returns:

Name Type Description
df pd.DataFrame

Modified dataframe

Examples:

>>> excel.clear_sheet(df=df)
df

concat_all_sheets(input_filepath, sheet_names=None)

Concat all sheets of given excel file

Parameters:

Name Type Description Default
input_filepath str || WindowsPath

The excel file path.

required
sheet_names list

The sheet names as list.

None

Returns:

Name Type Description
data pd.DataFrame

The data.

Examples:

>>> concat_all_sheets(input_filepath='my_autopylot
aw_data
aw_data.xlsx', sheet_names=['Sheet1', 'Sheet2'])
data

create_file(output_folder, output_filename, output_sheetname='Sheet1')

Creates an excel file with a sheet in the specified folder.

Parameters:

Name Type Description Default
output_folder str || WindowsPath

Output folder path.

required
output_filename str

Output file name.

required
output_sheetname str || list

Output sheet name.

'Sheet1'

Returns:

Type Description
None

None

Examples:

>>> excel.create_file(output_folder="C:\Users\user\Desktop", output_filename="test.xlsx", output_sheetname="Sheet1")

dataframe_to_excel(df, output_folder, output_filename, output_sheetname='Sheet1', mode='a')

Converts the dataframe to excel file

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required
output_folder str || WindowsPath

Output folder path.

required
output_filename str

Output file name.

required
output_sheetname str

Output sheet name.

'Sheet1'
mode str

Mode of the excel file.

'a'

Returns:

Type Description
None

None

Examples:

>>> excel.dataframe_to_excel(df=df, output_folder="C:\Users\user\Desktop", output_filename="test.xlsx", output_sheetname="Sheet1", mode='a')

df_convert_column_to_type(df, column_name, column_type)

Converts the column to the given type.

Parameters:

Name Type Description Default
df pd.DataFrame

Input dataframe.

required
column_name str

Name of the column.

required
column_type str

Type of the column.

required

Returns:

Name Type Description
df pd.DataFrame

Converted dataframe.

Examples:

>>> df_convert_column_to_type(df=df, column_name='Column 1', column_type='str')
df

df_drop_rows(df, row_start, row_end)

Drops rows from dataframe

Parameters:

Name Type Description Default
df pd.DataFrame

dataframe

required
row_start int

row start (inclusive)

required
row_end int

row end (exclusive)

required

Returns:

Name Type Description
df pd.DataFrame

dataframe with rows dropped

Examples:

>>> df = excel.df_drop_rows(df=df, row_start=1, row_end=2)

df_extract_sub_df(df, row_start, row_end, column_start, column_end)

Extracts sub dataframe from the given dataframe

Parameters:

Name Type Description Default
df pd.DataFrame

dataframe

required
row_start int

row start (inclusive)

required
row_end int

row end (exclusive)

required
column_start int

column start (inclusive)

required
column_end int

column end (exclusive)

required

Returns:

Name Type Description
df pd.DataFrame

sub dataframe

Examples:

>>> excel.df_extract_sub_df(df=df,row_start= 1, row_end=2, column_start=3, column_end=4)
sub_dataframe

df_from_list(list_of_lists, column_names)

Converts list of lists to dataframe

Parameters:

Name Type Description Default
list_of_lists list

list of lists to be converted to dataframe

required
column_names list

column names

required

Returns:

Name Type Description
df pd.DataFrame

dataframe

Examples:

>>> excel.df_from_list(list_of_lists=[[1,2,3],[4,5,6]], column_names=["col1", "col2", "col3"])
dataframe
..   col1  col2  col3
0     1     2     3
1     4     5     6

df_from_string(df_string, word_delimiter=' ', line_delimiter='\n', column_names=None)

Converts string to dataframe

Parameters:

Name Type Description Default
df_string str

string to be converted to dataframe

required
word_delimiter str

word delimiter.

' '
line_delimiter str

line delimiter.

'\n'
column_names list

column names.

None

Returns:

Name Type Description
df pd.DataFrame

dataframe

Examples:

>>> print(excel.df_from_string(df_string="a b c;d e f",word_delimiter=" ",line_delimiter= ";",column_names= ["Column 1","Column 2","Column 3"]))
dataframe 
..   Column 1  Column 2  Column 3
0     a     b     c
1     d     e     f

df_vlookup(df1, df2, column_name, how='left')

Performs vlookup operation on two dataframes.

Parameters:

Name Type Description Default
df1 pd.DataFrame

First dataframe.

required
df2 pd.DataFrame

Second dataframe.

required
column_name str

Column name on which vlookup is to be performed.

required
how str

Type of vlookup. Default is 'left'.

'left'

Returns:

Name Type Description
df pd.DataFrame

Dataframe after vlookup operation.

Examples:

>>> df_vlookup(df1=df1, df2=df2, column_name='Column 1', how='left')

drop_columns(df, cols)

Drops the columns from the excel file

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required
cols str || list

Column name to be dropped.

required

Returns:

Name Type Description
df pd.DataFrame

Modified dataframe

Examples:

>>> excel.drop_columns(df=df, cols="column_name")
df

get_all_header_columns(df)

Gets all header columns from the excel file

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required

Returns:

Name Type Description
data list

List of header columns

Examples:

>>> excel.get_all_header_columns(df=df)
["Column1", "Column2"]

get_all_sheet_names(input_filepath)

Gets the sheet names from the excel file

Parameters:

Name Type Description Default
input_filepath str || Windowspath

Path of the excel file.

required

Returns:

Name Type Description
data list

List of sheet names

Examples:

>>> excel.get_all_sheet_names(input_filepath="demo")
["Sheet1", "Sheet2"]

get_dataframe_from_google_spreadsheet(auth, spreadsheet_url, sheet_name='Sheet1')

Get dataframe from google spreadsheet

Parameters:

Name Type Description Default
auth object

Authentication object.

required
spreadsheet_url str

Spreadsheet URL.

required
sheet_name str

Sheet name.

'Sheet1'

Returns:

Name Type Description
df pd.DataFrame

Dataframe object.

Examples:

>>> excel.get_dataframe_from_google_spreadsheet(auth=auth,spreadsheet_url="https://docs.google.com/spreadsheets/d/1X2X3X4X5X6X7X8X9X/edit#gid=0", sheet_name="Sheet1")
df

get_row_column_count(df)

Returns the row and column count of the dataframe

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required

Returns:

Name Type Description
Count tuple(int, int)

Row and column count of the dataframe.

Examples:

>>> excel.get_row_column_count(df=df)
(10, 5)

get_single_cell(df, column_name, cell_number, header=1)

Gets the text from the desired column/cell number for the given excel file

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required
column_name str

Column name of the excel file.

required
cell_number int

Cell number of the excel file.

required
header int

Header row number.

1

Returns:

Name Type Description
data str

Text from the desired column/cell number for the given excel file

Examples:

>>> excel.get_single_cell(df=df, column_name="Column 1",cell_number= 1)
"abc"

get_value_in_df(df, row_number, column_number)

Gets value from dataframe

Parameters:

Name Type Description Default
df pd.DataFrame

dataframe

required
row_number int

Row number of the cell

required
column_number int

Column number of the cell

required

Returns:

Name Type Description
value str

value in the cell

Examples:

>>> excel.get_value_in_df(df=df, row_number=1, column_number=2)
abc

group_by_column_values_n_split(df, column_name, output_folder, show_output=False)

Groups the dataframe by the given column and splits the dataframe into multiple excel files.

Parameters:

Name Type Description Default
df pd.DataFrame

Input dataframe.

required
column_name str

Name of the column.

required
output_folder str

Output folder path.

required
show_output bool

If True, shows the output. Default is False.

False

Returns:

Type Description
None

None

Examples:

>>> group_by_column_values_n_split(df=df, column_name='Column 1', output_folder='\output_folder')

if_value_exists(df, columns, value)

Checks if the given value exists in the given column.

Parameters:

Name Type Description Default
df pd.DataFrame

Input dataframe.

required
columns str

Name of the column.

required
value str

Value to be searched.

required

Returns:

Name Type Description
Result bool

True if value exists, else False.

Examples:

>>> if_value_exists(df=df, columns='Column 1', value='demo')
True

is_NaN(value)

Checks if the value is NaN (Not a Number)

Parameters:

Name Type Description Default
value str of number

value to be checked

required

Returns:

Name Type Description
Result bool

True if value is NaN, False otherwise

Examples:

>>> excel.is_NaN(value="abc")
False

merge_all_files(input_folder, output_folder, output_filename)

Merges all the excel files in the given folder.

Parameters:

Name Type Description Default
input_folder str || Windows Path

Path of the input folder.

required
output_folder str || Windows Path

Path of the output folder.

required
output_filename str

Name of the output file.

required

Returns:

Type Description
None

None

Examples:

>>> merge_all_files(input_folder='C:\Users\abc\Desktop\input_folder', output_folder='C:\Users\abc\Desktop\output_folder', output_filename='output_file')

remove_duplicates(df, column_name)

Removes the duplicates from the given column

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required
column_name str || int || list

Column name of the excel file..

required

Returns:

Name Type Description
df pd.DataFrame

Modified dataframe

Examples:

>>> excel.remove_duplicates(df=df, column_name="column 1")
df

set_single_cell(df, column_name, cell_number, value)

Description

Writes the given text to the desired column/cell number for the given excel file

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required
column_name str

Column name of the excel file.

required
cell_number int

Cell number of the excel file.

required
value str

Text to be written to the excel file.

required

Returns:

Name Type Description
df pd.DataFrame

Modified dataframe

Examples:

>>> df=excel.set_single_cell(df=df, column_name="Column 1",cell_number= 1, value="abc")
df

set_value_in_df(df, row_number, column_number, value)

Sets value in dataframe

Parameters:

Name Type Description Default
df pd.DataFrame

dataframe to be modified

required
row_number int

Row number of the cell

required
column_number int

Column number of the cell

required
value str

value to be set in the cell

required

Returns:

Name Type Description
df pd.DataFrame

dataframe with value set

Examples:

>>> excel.set_value_in_df(df=df, row_number=1, column_number=2, value="abc")
modified_dataframe

tabular_data_from_website(website_url, table_number=1)

Returns a dataframe from a website table.

Parameters:

Name Type Description Default
website_url str

Website URL.

required
table_number int

Table number.

1

Returns:

Name Type Description
df pd.DataFrame

Dataframe containing tabular data from website.

Examples:

>>> excel.tabular_data_from_website(website_url='https://en.wikipedia.org/wiki/Wiki')

to_dataframe(input_filepath, input_sheetname, header=1)

Converts excel file to dataframe.

Parameters:

Name Type Description Default
input_filepath str || WindowsPath

Input file path.

required
input_sheetname str

Input sheet name.

required
header int

Header row number.

1

Returns:

Name Type Description
df pd.DataFrame

Dataframe of the excel file.

Examples:

>>> excel.to_dataframe(input_filepath="C:\Users\user\Desktop\test.xlsx", input_sheetname="Sheet1")
dataframe

upload_dataframe_to_google_spreadsheet(auth, spreadsheet_url, sheet_name, df)

Uploads a dataframe to a google spreadsheet.

Parameters:

Name Type Description Default
auth object

Authentication object.

required
spreadsheet_url str

Spreadsheet URL.

required
sheet_name str

Sheet name.

required
df pd.DataFrame

Dataframe object.

required

Returns:

Type Description
None

None

Examples:

>>> excel.upload_dataframe_to_google_spreadsheet(auth=auth, spreadsheet_url="https://docs.google.com/spreadsheets/d/1X2X3X4X5X6X7X8X9X/edit#gid=0", sheet_name="Sheet1", df=df)