Merging two CSV files using Pandas in Python 3 on Linux is a quick and easy process that can save you a lot of time and effort, however in my case I used other script to extract specific tag values from a CSV file, converts them into a JSON object, and writes them to a new CSV file, in that way I provided a workaround for those tags wanted.
The following code extracts the content from the specified column(tags) and stores it in a new DataFrame, then it writes the new DataFrame to a CSV file named tags.csv.
#!/bin/python3
import csv
import json
import os
import pandas as pd
# VARIABLES
columna = 1 # columna de tags
tags = ['Accounting', 'Contact', 'Name', 'costCenter', 'Entity']
rowcount = 0
filename="tags.csv" # name of the new csv file
path=os.getcwd()
# read the CSV file into a pandas DataFrame
df = pd.read_csv(path+"/demo.csv")
# extract the whole content from the "column_name" column
column_content = df['Tags']
df = pd.DataFrame(column_content)
df.to_csv('tags.csv')
# Open the CSV file and create a reader object
with open('tags.csv', 'r') as f:
reader = csv.reader(f)
# Extract the desired column and store it in a list
column = []
for row in reader:
column.append(row[columna])
#iterating through the whole file
for row in open("tags.csv"):
rowcount+= 1 # more and less 62209 rows
with open('file1.csv', mode='w', newline='') as file:
writer = csv.writer(file)
# writing headers
writer.writerow(tags)
for i in range(1,rowcount):
row = column[i]
# print(row) # debug
jaxon = "{" + row + "}"
# print(" ") # debug
# Parse the content string as a JSON object
json_content = json.loads(jaxon)
n = 0
account = []
for x in tags:
try:
value = json_content[tags[n]]
n = n+1
except (KeyError,IndexError,TypeError):
value = "-"
n = n+1
account.append(value)
# print(value)
# print("")
# print(account)
writer.writerow(account)
# delete .csv
os.remove(filename)
# read from file1
df = pd.read_csv('file1.csv')
df.index.name = 'index'
# write + index from file1 to file2
df.to_csv('file1.csv')
The following script reads the .csv file named demo.csv using Pandas, sets the index name of the DataFrame, and writes the DataFrame to a new .csv file named file2.csv.
#!/bin/python3
import os
import pandas as pd
# pandas parameters:
pd.options.display.max_columns = None
pd.options.display.max_rows = None
path=os.getcwd()
# read the CSV file into a pandas DataFrame
df = pd.read_csv(path+"/demo.csv")
# write + index from file1 to file2
df.index.name = 'index'
df.to_csv('file2.csv')
And finally the next script merges two CSV files named file2.csv and file1.csv, and writes the merged DataFrame to a new .csv file named merged_file.csv, also I deleted the original .csv files to avoid confusion and redundancy.
#!/bin/python3
import os
import pandas as pd
# Load the two CSV files into dataframes
df1 = pd.read_csv("file2.csv")
df2 = pd.read_csv("file1.csv")
# Merge the dataframes based on the shared column
merged_df = pd.merge(df1, df2, on="index")
# Write the merged dataframe to a new CSV file
merged_df.to_csv("merged_file.csv", index=False)
# delete .csv
os.remove("file2.csv")
os.remove("file1.csv")
You can use a main.py to call the modules:
#!/bin/python3
from modules import builder_f1, control_f2, merge
Hope the above solution it can be useful.