Fixing tags on Azure invoices

Merging two CSV files using Pandas in Python 3 on Linux is a quick and easy process that can save you a lot of time and effort, however in my case I used other script to extract specific tag values from a CSV file, converts them into a JSON object, and writes them to a new CSV file, in that way I provided a workaround for those tags wanted.

The following code extracts the content from the specified column(tags) and stores it in a new DataFrame, then it writes the new DataFrame to a CSV file named tags.csv.

#!/bin/python3

import csv
import json
import os

import pandas as pd

# VARIABLES
columna = 1 # columna de tags
tags = ['Accounting', 'Contact', 'Name', 'costCenter', 'Entity']
rowcount = 0
filename="tags.csv" # name of the new csv file
path=os.getcwd()

# read the CSV file into a pandas DataFrame
df = pd.read_csv(path+"/demo.csv")

# extract the whole content from the "column_name" column
column_content = df['Tags']

df = pd.DataFrame(column_content)
df.to_csv('tags.csv')

# Open the CSV file and create a reader object
with open('tags.csv', 'r') as f:
    reader = csv.reader(f)
    
# Extract the desired column and store it in a list
    column = []
    for row in reader:
        column.append(row[columna])

#iterating through the whole file
for row in open("tags.csv"):
  rowcount+= 1 # more and less 62209 rows

with open('file1.csv', mode='w', newline='') as file:
    writer = csv.writer(file)
    # writing headers
    writer.writerow(tags)

    for i in range(1,rowcount):
        row = column[i]

        # print(row) # debug

        jaxon = "{" + row + "}"

        # print(" ") # debug

        # Parse the content string as a JSON object
        json_content = json.loads(jaxon)
        n = 0
        account = []
        for x in tags:
            try:
                value = json_content[tags[n]]
                n = n+1
            except (KeyError,IndexError,TypeError):
                value = "-"
                n = n+1
            account.append(value)
            # print(value)
            # print("")
            # print(account) 

        writer.writerow(account)

# delete .csv
os.remove(filename)

# read from file1
df = pd.read_csv('file1.csv')
df.index.name = 'index'
# write + index from file1 to file2
df.to_csv('file1.csv')

The following script reads the .csv file named demo.csv using Pandas, sets the index name of the DataFrame, and writes the DataFrame to a new .csv file named file2.csv.

#!/bin/python3

import os

import pandas as pd

# pandas parameters:
pd.options.display.max_columns = None
pd.options.display.max_rows = None
path=os.getcwd()

# read the CSV file into a pandas DataFrame
df = pd.read_csv(path+"/demo.csv")
# write + index from file1 to file2
df.index.name = 'index'
df.to_csv('file2.csv')

And finally the next script merges two CSV files named file2.csv and file1.csv, and writes the merged DataFrame to a new .csv file named merged_file.csv, also I deleted the original .csv files to avoid confusion and redundancy.

#!/bin/python3

import os

import pandas as pd

# Load the two CSV files into dataframes
df1 = pd.read_csv("file2.csv")
df2 = pd.read_csv("file1.csv")

# Merge the dataframes based on the shared column
merged_df = pd.merge(df1, df2, on="index")

# Write the merged dataframe to a new CSV file
merged_df.to_csv("merged_file.csv", index=False)

# delete .csv
os.remove("file2.csv")
os.remove("file1.csv")

You can use a main.py to call the modules:

#!/bin/python3

from modules import builder_f1, control_f2, merge

Hope the above solution it can be useful.