
How to Document Metadata: Google Storage Handling with Git and Jenkins


In most businesses, document versions and the metadata connected to them can change daily.

For example, the functional documentation of an application may need updating, or a big data project may require changing a large amount of metadata – all of it across multiple storage locations.


Modern businesses handle and store their documentation in one of the well-known cloud storage services. Adding, updating, or removing such documents and changing their metadata can be tedious and error-prone work when done manually.

Let’s imagine a project where you have to change data versioning, data security parameters, or even just a data change date or any other part of the documentation. Moreover, you have to make sure the change is made across different storage locations.

In this blog, I show how to effectively automate document metadata handling and the whole process around it using Google Storage, a few shell scripts, Git, and the Jenkins CI tool.

Introducing the tools

Google Storage (GS) is a service for storing files of any kind in Google Cloud – images, documents, zip archives, and so on. In the GS hierarchy, files are stored inside buckets, buckets belong to a project, and projects are grouped within an organization.

Besides storing files with the basic metadata (identifying properties of objects – date and time, file type, permissions, etc.), GS also lets you create custom metadata, which will be crucial for this demonstration.

Jenkins is an open-source automation server that is used for building, testing, and deploying parts of the software program. What I like most about it is that it’s quite user-friendly. Also, there are a lot of manuals and examples on the internet so it’s not hard to figure out how it works.

Everyone has heard of Git, so I don’t have anything special to say about it that you don’t already know. For this purpose, just the basic features of Git will be used.

How to effectively automate document metadata handling?

The goal is to create an application that can read and download documents. 

During application design, we distinguish two types of users: developers and testers. Accordingly, we distinguish the environments in which they work.

For developers, we will use the DEVenv. For testers, we will use the QAenv.

Accordingly, we create two GS buckets: one for the DEVenv (the developers’ bucket) and another for the QAenv (the QAs’ bucket). Both buckets contain identical files with the same metadata.

Each file and its corresponding metadata are added manually to the GS bucket. Whenever a change is needed, we have to go through the same boring manual routine:

  • search for the file on GS,
  • open the object overflow menu,
  • choose Edit metadata,
  • add the new metadata key-value pairs,
  • and press Save.

Moreover, we can have hundreds of such documents that we constantly delete and re-upload manually, first to one GS bucket and then to the other. Doing this by hand across many files is very prone to errors. In other words – hard and tedious work.

Building an automated process of adding files on Google Storage

First, we create a bucket inside GS for the DEVenv and call it dev-storage. Then we do the same for the QAenv and call it qa-storage.

The next step is to make a directory within the Git repository named google-storage-handler.

Now, let’s create two new directories within the project directory – backup and files. The backup directory stores copies of documents with their corresponding metadata from the GS bucket. The files directory stores the documents that will be uploaded to GS – the currently active version of the documents stored on GS.

Within the project directory, we create metadata.json, which stores the metadata for the documents. This metadata is what we actually want uploaded to GS.

We also have three shell scripts that run in sequence: download.sh, delete.sh, and upload.sh. We explain these in the next section.

The metadata.json file stores metadata values like this:

{
  "filename_key": {
    "category": "",
    "title": "",
    "lang": ""
  }
}

Reading JSON files inside shell script is possible by installing jq:

  • on macOS (using Homebrew) – brew install jq
  • on Linux (using apt-get) – apt-get install jq

Using jq, we can easily process JSON. For example, in our scripts we will use it in combination with sed to extract and clean up metadata values from JSON files. More on these tools can be found in the jq manual and the sed documentation.
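To illustrate the jq-plus-sed combination, here is a minimal sketch (the JSON content and key names are made-up examples): jq emits string values with surrounding quotes, and the sed expressions strip them.

```shell
# Extract one metadata value with jq; jq prints string values quoted,
# e.g. "guide", so sed strips the leading and trailing quote characters.
category=$(printf '%s' '{"doc1": {"category": "guide"}}' \
  | jq '.doc1.category' \
  | sed -e 's/^"//' -e 's/"$//')
echo "$category"   # guide
```

The same quote-stripping pattern appears in the setmeta commands later in this post.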

The scripts

We use the download.sh script to make a new directory inside the backup directory, named with the current date and time – this is our copy of the files from GS at that moment.

mkdir -p -- "$(date +%Y-%m-%d\ %H:%M:%S)"

Next, we sort all directories inside the backup directory to get the newest one, and save all files from GS into the newly added directory with the command:

gsutil -m cp -R "gs://$1/" "$fullpathtonewdirectory"
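The "sort to get the newest one" step can be sketched like this (the directory names below are made-up examples): because the backup directories are named with the %Y-%m-%d %H:%M:%S pattern, plain lexical sorting is also chronological sorting, so the last entry is the newest.

```shell
# Simulate two timestamped backup directories as created by download.sh.
mkdir -p "backup/2024-01-01 10:00:00" "backup/2024-02-01 10:00:00"

# Lexical sort equals chronological sort for this naming scheme,
# so the last line of the sorted listing is the newest backup.
newest=$(ls -1 backup | sort | tail -n 1)
fullpathtonewdirectory="backup/${newest}"
echo "$fullpathtonewdirectory"   # backup/2024-02-01 10:00:00
```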

At the end of the script, we go through the newly added directory and, for each newly downloaded document from GS (doc), save the corresponding metadata with the command:

gsutil ls -L "gs://$1/${doc}" >> "${doc}.metadata"

In delete.sh, we loop through the files and delete the corresponding objects on the GS bucket that share the file’s name:

gsutil rm "gs://$1/${doc}"
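A minimal sketch of that loop, shown here in dry-run form (the gsutil command is echoed instead of executed, and the file name is a made-up example):

```shell
bucket="dev-storage"            # in delete.sh this comes from "$1"

# Simulate the project's files directory with one document.
mkdir -p files
touch files/handbook.pdf

# For every document in files/, build the command that would delete
# the object with the same name from the bucket.
for path in files/*; do
  doc=$(basename "$path")
  cmd="gsutil rm gs://${bucket}/${doc}"
  echo "$cmd"                   # dry run: print instead of executing
done
```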

The last script, upload.sh, checks the files directory and adds all documents to an array, then loops through the array and uploads each file to the GS bucket with the command:

gsutil cp "files/${doc}" "gs://$1/${doc}"

For each document added to GS, the script adds three metadata items – category, title, and lang. The variable filename_key contains the name of the file, which serves as the primary key inside metadata.json. The subkeys containing the final metadata values are category, title, and lang. The commands for adding metadata are:

gsutil setmeta -h "x-goog-meta-Category:$(jq ".${filename_key} | .category" metadata.json | sed -e 's/^"//' -e 's/"$//')" "gs://$1/${doc}"
gsutil setmeta -h "x-goog-meta-Title:$(jq ".${filename_key} | .title" metadata.json | sed -e 's/^"//' -e 's/"$//')" "gs://$1/${doc}"
gsutil setmeta -h "x-goog-meta-Lang:$(jq ".${filename_key} | .lang" metadata.json | sed -e 's/^"//' -e 's/"$//')" "gs://$1/${doc}"

In the examples above, we use an input argument ($1) indicating which bucket the script will run against – dev-storage or qa-storage.
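Putting the upload steps together, here is a dry-run sketch (gsutil commands are echoed, not executed; the file name and metadata values are made-up, and deriving filename_key as the file name without its extension is my assumption – the real script may map names differently):

```shell
bucket="dev-storage"                       # in upload.sh this comes from "$1"

# Simulate the project layout: one document plus its metadata.json entry.
mkdir -p files
touch files/handbook.pdf
printf '%s' '{"handbook": {"category": "guide", "title": "Handbook", "lang": "en"}}' > metadata.json

for path in files/*; do
  doc=$(basename "$path")
  filename_key="${doc%.*}"                 # assumption: key = file name minus extension
  # Look up the category value and strip jq's surrounding quotes.
  category=$(jq ".${filename_key} | .category" metadata.json | sed -e 's/^"//' -e 's/"$//')
  # Dry run: print the upload and setmeta commands instead of executing them.
  echo gsutil cp "files/${doc}" "gs://${bucket}/${doc}"
  echo gsutil setmeta -h "x-goog-meta-Category:${category}" "gs://${bucket}/${doc}"
done
```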

Running with Jenkins CI

Finally, let’s use Jenkins to run all these scripts automatically. We are just a few steps from getting there.

The first thing to do is add a new Jenkinsfile within the project directory for Jenkins and configure Jenkins deployment.

Setting up the deployment pipeline with parameters

stage('Setup deployment') {
 steps {
   script {
     echo "Start editing files for google storage handler"
     targetEnv="${deployEnvironment}"
     if (targetEnv == 'dev') {
       credsId = "credential for google storage bucket on dev-storage"
       bucketName = "dev-storage"
     } else {
       credsId = "credential for google storage bucket on qa-storage"
       bucketName = "qa-storage"
     }
   }
 }
}

Next, we run the scripts in order: download.sh, delete.sh, upload.sh. The stages are nearly identical – only the script name changes from download.sh to delete.sh and then to upload.sh:

stage('Backup google storage bucket') {
 steps {
   withCredentials([file(credentialsId: credsId, variable: 'keyjson')]) {
     sh "gcloud auth activate-service-account --key-file=${keyjson}"
     sh """
       sh download.sh ${bucketName}
    """
   }
 }
}
stage('Clear google storage bucket') {
 steps {
   withCredentials([file(credentialsId: credsId, variable: 'keyjson')]) {
     sh "gcloud auth activate-service-account --key-file=${keyjson}"
     sh """
       sh delete.sh ${bucketName}
    """
   }
 }
}
stage('Transfer files directory to google storage bucket') {
 steps {
   withCredentials([file(credentialsId: credsId, variable: 'keyjson')]) {
     sh "gcloud auth activate-service-account --key-file=${keyjson}"
     sh """
         sh upload.sh ${bucketName}
      """
   }
 }
}

And for our new Jenkins pipeline, we can use a dropdown menu for choosing the parameters.
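That dropdown can be defined in the Jenkinsfile with a choice parameter – a sketch assuming the deployEnvironment name used in the setup stage above:

```
parameters {
  choice(name: 'deployEnvironment', choices: ['dev', 'qa'], description: 'Environment (bucket) to deploy to')
}
```

Jenkins then renders the choices as a dropdown on the "Build with Parameters" page.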


Pipelines like this are usually run by developers, since project managers typically neither have access to such a tool nor know how to use it. So, when new documents are added to the project, developers run the pipeline at the project manager’s request, updating the storage, and the documents then appear in the application as well.

Conclusion

To sum up, with so many automation tools available nowadays – like the ones described in this blog – a lot can be done to save time (and some nerves) by avoiding manual work that is very prone to human error.

I hope you found this blog helpful and learned something new!
