Project Based Data Processing
  • 30 Apr 2024

Article Summary

Using a single Tegsoft system for multiple projects or brands (multi-tenant usage) is very common. As a result, you may sometimes need to move data between databases, or to list, delete, or anonymize call detail record (CDR) data.

This article describes project-based data operations, and its content is technical. For project definitions and management, see the "Project Management" article.

Workspace and Environment

All data management processes are handled by the “Tegsoft Touch Data Processing” service. This component can run as a Docker container or from a command line interface. Both modes require a config file to be prepared.

Data Processing Config File

The config file is in JSON format and has several sections. The file can be loaded from an HTTP URL or from a local file.

A sample config file is shown below:

{
   "activePeriods":[
      {
         "timeBegin":0,
         "timeEnd":2359
      }
   ],
   "projectRules":[
      {
         "name":"Project Summary",
         "description":"Display the project summary",
         "source":{
            "dbUser":"tobe",
            "dbPassword":"ab2037ef5bb349a1a46116581cb8fec5",
            "dbDriver":"com.ibm.db2.jcc.DB2Driver",
            "dbUrl":"jdbc:db2://TEGSOFT_DATABASE_SERVER_IP:50000/tobe",
            "PBXID":"ebd37af5-260a-436f-9bdf-44bcc9b1b946",
            "UNITUID":"4a55c1e3-edd5-46ef-b66f-d74634e8469a"
         },
         "projectId":"a0696b51-2ea1-4566-be89-58df485cab2d",
         "projectRuleType":"projectSummary",
         "beginDate":"20191027",
         "beginTime":"00:00:00",
         "endDate":"20251128",
         "endTime":"00:00:00"
      },
      {
         "name":"List project CDR",
         "description":"List project CDR",
         "source":{
            "dbUser":"tobe",
            "dbPassword":"ab2037ef5bb349a1a46116581cb8fec5",
            "dbDriver":"com.ibm.db2.jcc.DB2Driver",
            "dbUrl":"jdbc:db2://arge29.tegsoftcloud.com:50000/tobe",
            "PBXID":"ebd37af5-260a-436f-9bdf-44bcc9b1b946",
            "UNITUID":"4a55c1e3-edd5-46ef-b66f-d74634e8469a"
         },
         "projectId":"a0696b51-2ea1-4566-be89-58df485cab2d",
         "projectRuleType":"listCDR",
         "beginDate":"20191027",
         "beginTime":"00:00:00",
         "endDate":"20251128",
         "endTime":"00:00:00",
         "initialFetchSize":300,
         "periodDurationDays":3
      }
   ]
}

Execution Periods

Execution of the process can be limited to one or more specific periods via the “activePeriods” definition. The definition is an array of allowed periods. Each period is a block of conditions with beginning and ending markers. If a period has multiple conditions, such as a “time between” and a “date between” condition, all of them need to match.

If any period matches, whether it has a single condition or multiple conditions, execution can start. If no period matches, the process will not execute. If no period is defined, the process will execute.

The “activePeriods” element can be used globally in the config file or under each rule individually. If the global conditions don’t match, nothing is executed; if the conditions under a rule don’t match, only that rule is skipped.

Syntax:

"activePeriods":[
      { // Period 1
         Condition 1, Condition 2, ... Condition N
      },
      {
         // Period 2
      },
....
      {
         // Period n
      }
]

Both the begin and end values are inclusive.

timeBegin: The beginning time of the period in 24-hour format. The value must be written as a decimal number (hours followed by two-digit minutes, without leading zeros). Examples:

11pm or 23:00 → 2300

8:30am or 08:30 → 830

One minute past midnight, 00:01 → 1

timeEnd: The ending time of the period in 24-hour format. The value must be written as a decimal number (hours followed by two-digit minutes, without leading zeros). Examples:

11pm or 23:00 → 2300

8:30am or 08:30 → 830

One minute past midnight, 00:01 → 1

time: The exact time of the period in 24-hour format. The value must be written as a decimal number (hours followed by two-digit minutes, without leading zeros). Examples:

11pm or 23:00 → 2300

8:30am or 08:30 → 830

One minute past midnight, 00:01 → 1

The time period is compared with the current time. For example, with timeBegin: 1 and timeEnd: 2359, the times 22:10 and 08:30 match, but 00:00 does not.
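As a rough illustration of the decimal time form and the inclusive comparison described above, the Python sketch below converts a clock time to that form and checks it against a period. The function names are hypothetical and this is not the service's actual implementation; periods that cross midnight are not considered here.

```python
from datetime import time


def to_decimal_time(t: time) -> int:
    """Encode a clock time as hours * 100 + minutes, e.g. 08:30 -> 830."""
    return t.hour * 100 + t.minute


def time_in_period(now: time, time_begin: int, time_end: int) -> bool:
    """Inclusive check: both the begin and the end value count as a match."""
    return time_begin <= to_decimal_time(now) <= time_end


# Example from the text: timeBegin 1, timeEnd 2359
print(time_in_period(time(22, 10), 1, 2359))  # True
print(time_in_period(time(0, 0), 1, 2359))    # False
```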

dayOfMonthBegin: This is the starting point of the “day of the month” condition. This field takes integer values from 1 to 31.

dayOfMonthEnd: This is the ending point of the “day of the month” condition. This field takes integer values from 1 to 31.

dayOfMonth: This is the exact day of the month. This field takes integer values from 1 to 31.

dayOfWeekBegin: This is the starting point of the “day of the week” condition. This field takes integer values from 1 to 7. The week starts on Monday and ends on Sunday. Monday: 1 and Sunday: 7.

dayOfWeekEnd: This is the ending point of the “day of the week” condition. This field takes integer values from 1 to 7. The week starts on Monday and ends on Sunday. Monday: 1 and Sunday: 7.

dayOfWeek: This is the exact day of the week condition. This field takes integer values from 1 to 7. The week starts on Monday and ends on Sunday. Monday: 1 and Sunday: 7.

dateBegin: This is the starting point of the date condition. This field takes a string value in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

dateEnd: This is the ending point of the date condition. This field takes a string value in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

date: This is the exact date condition. This field takes a string value in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.
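The matching rules can be sketched in Python as follows: every condition inside a period must hold, any single matching period is enough, and an empty list always allows execution. This is only an illustration under assumptions (hypothetical function names, only a subset of the fields handled, no wrap-around ranges), not the service's real logic.

```python
from datetime import datetime


def period_matches(period: dict, now: datetime) -> bool:
    """Return True only if every condition present in the period holds."""
    hhmm = now.hour * 100 + now.minute   # decimal time form, e.g. 830
    weekday = now.isoweekday()           # Monday = 1 ... Sunday = 7
    today = now.strftime("%Y%m%d")       # YYYYMMDD string

    checks = []
    if "timeBegin" in period:
        checks.append(period["timeBegin"] <= hhmm)
    if "timeEnd" in period:
        checks.append(hhmm <= period["timeEnd"])
    if "dayOfWeekBegin" in period:
        checks.append(period["dayOfWeekBegin"] <= weekday)
    if "dayOfWeekEnd" in period:
        checks.append(weekday <= period["dayOfWeekEnd"])
    if "dateBegin" in period:
        checks.append(period["dateBegin"] <= today)
    if "dateEnd" in period:
        checks.append(today <= period["dateEnd"])
    return all(checks)


def may_execute(active_periods: list, now: datetime) -> bool:
    """No periods defined -> execute; otherwise any matching period suffices."""
    if not active_periods:
        return True
    return any(period_matches(p, now) for p in active_periods)


# Monday to Friday between 08:30 and 18:00 (hypothetical example values).
periods = [{"timeBegin": 830, "timeEnd": 1800,
            "dayOfWeekBegin": 1, "dayOfWeekEnd": 5}]
print(may_execute(periods, datetime(2023, 7, 27, 9, 0)))  # a Thursday morning
print(may_execute(periods, datetime(2023, 7, 29, 9, 0)))  # a Saturday morning
```

Comparing the YYYYMMDD strings directly works because that format sorts in the same order as the dates it represents.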

Project Rules

This section defines the set of actions to execute for the projects. Most of this article covers topics related to this section.

name: It is always good to have a rule name so that the process can be tracked, managed, and monitored. The name needs to be unique among all rules in the file.

description: An optional description area for taking notes and sharing more information about the rule.

source: Defines source database connection parameters including dbUser, dbPassword (encrypted), dbDriver, dbUrl, PBXID and UNITUID.

projectId: Unique ID of the project whose data the process will access.

projectRuleType: This field defines the process that will be executed. The value must be one of the following: projectSummary, listCDR, listAudioFiles, copyAudioFiles, moveAudioFiles, deleteAudioFiles, deleteProjectCDR, annonymizeCDR. Each process is described in this article.
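Because rule names must be unique within the file, a quick pre-flight check can catch duplicates before the service runs. The Python sketch below is one hypothetical way to do that; the helper name is not part of the product.

```python
import json
from collections import Counter


def duplicate_rule_names(config: dict) -> list:
    """Return the rule names that appear more than once in projectRules."""
    names = [rule.get("name") for rule in config.get("projectRules", [])]
    return sorted(name for name, count in Counter(names).items()
                  if name is not None and count > 1)


# Minimal config fragment reusing the rule names from the sample config.
config = json.loads("""
{
  "projectRules": [
    {"name": "Project Summary", "projectRuleType": "projectSummary"},
    {"name": "List project CDR", "projectRuleType": "listCDR"}
  ]
}
""")
print(duplicate_rule_names(config))  # an empty list means all names are unique
```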

Data Processing Docker Image

The Data Processing Service can run in a Docker container. The service is dockerized and published in public Docker repositories. You can start the container on any Docker manager solution with a command like the one below; for the exact syntax, please check your Docker manager’s technical documentation.

This kind of usage is good for the execution of continuous processes.


docker run \
      -d --restart unless-stopped \
      -p 8380:80 \
      --env configUrl=https://CONFIG_SERVER_URL/repconfig.json \
      --env TZ=DESIRED_TIMEZONE \
      tegsoft/tegsofttouchdataprocessingserver:68

The configUrl environment variable needs to be set to a valid URL that points to a valid configuration.

The TZ parameter sets the timezone of the data processing server; please check the link for the correct timezone value.

Command Line Execution

The Data Processing Service can also be executed from a command line interface. Before you start, the command line environment needs to be prepared. The environment can be prepared on a Tegsoft Compute Instance; some actions (projectSummary, listCDR, deleteProjectCDR, annonymizeCDR) may be executed on a Tegsoft Database Instance.


mkdir -p /root/dataProcessingWorkspace
cd /root/dataProcessingWorkspace

bash <(curl -s https://setup.tegsoftcloud.com/resources/dataProcessingWorkspace/updateOnline.sh)

This kind of usage is good for one-time process execution, like moving data, listing, or deleting project recording files.

You can always run the command below for help.

/root/dataProcessingWorkspace/help.sh

Once you have prepared your config file, you can execute the process with the commands listed below (change the config file name if needed).


export configUrl=/root/dataProcessingWorkspace/replicationConfig.json
nohup /root/dataProcessingWorkspace/runProcess.sh > output.txt 2>&1 &

During execution, you can exit the shell. The execution will continue and logs will be accessible via the output.txt file.

If you want to skip the logs and see only the output, run:

grep 'START\|DATA_OUTPUT\|DATA_FORMAT\|PROJECT_COMPLETE\|DONE' /root/dataProcessingWorkspace/output.txt
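The same filtering can be done in Python, for example when the output needs further processing. This sketch simply keeps the lines that contain one of the markers used in the grep command above; the function names are hypothetical.

```python
MARKERS = ("START", "DATA_OUTPUT", "DATA_FORMAT", "PROJECT_COMPLETE", "DONE")


def filter_output(lines):
    """Keep only lines containing one of the markers (same idea as the grep)."""
    return [line for line in lines if any(m in line for m in MARKERS)]


def read_filtered(path="/root/dataProcessingWorkspace/output.txt"):
    """Read the log file and return only the marker lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return filter_output(handle.read().splitlines())
```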

Display Project Summary

This action type is used to create a summary table for the project. The following parameters are valid.

projectRuleType: must be "projectSummary"

beginDate: The summary table is calculated from project data within the given period. This value is the beginning date of that period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

beginTime: This optional parameter defines the starting time of the period. If the value is null, the “begin date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

endDate: This value is the upper limit date of the data period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

endTime: This optional parameter defines the ending time of the period. If the value is null, the “end date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

initialFetchSize: This parameter is optional and 300 is the default value. Changing this value requires advanced technical skills.

periodDurationDays: This parameter is optional and 10 is the default value. Changing this value requires advanced technical skills.
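To illustrate the date and time formats, the following Python sketch checks a rule's period fields before the rule is written to the config file. The helper is hypothetical; it assumes both date fields are present and treats an absent time field the same as null.

```python
import re
from datetime import datetime


def check_period_fields(rule: dict) -> None:
    """Raise ValueError if a date or time field has the wrong format."""
    for key in ("beginDate", "endDate"):
        value = rule[key]
        if not re.fullmatch(r"\d{8}", value):
            raise ValueError(f"{key} must be in YYYYMMDD format: {value!r}")
        datetime.strptime(value, "%Y%m%d")    # also rejects impossible dates
    for key in ("beginTime", "endTime"):
        value = rule.get(key)
        if value is None:                     # time fields are optional
            continue
        if not re.fullmatch(r"\d{2}:\d{2}:\d{2}", value):
            raise ValueError(f"{key} must be in HH:MM:SS format: {value!r}")
        datetime.strptime(value, "%H:%M:%S")  # also rejects e.g. 25:00:00


rule = {
    "name": "Project Summary",
    "projectRuleType": "projectSummary",
    "beginDate": "20230727",
    "beginTime": "17:05:22",
    "endDate": "20230801",
}
check_period_fields(rule)  # passes silently; a bad value raises ValueError
```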

Listing Project Call Details

This action type is used to list project call details in a given period. The following parameters are valid.

projectRuleType: must be "listCDR"

beginDate: The process accesses project data within the given period. This value is the beginning date of that period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

beginTime: This optional parameter defines the starting time of the period. If the value is null, the “begin date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

endDate: This value is the upper limit date of the data period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

endTime: This optional parameter defines the ending time of the period. If the value is null, the “end date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

initialFetchSize: This parameter is optional and 300 is the default value. Changing this value requires advanced technical skills.

periodDurationDays: This parameter is optional and 10 is the default value. Changing this value requires advanced technical skills.

listType: This parameter is optional. To export the rows to an Excel file, set this parameter to EXCEL.

destinationFileName: This parameter is mandatory if the “listType” parameter value is EXCEL. This value is the absolute path of the Excel file, for example “/parent/folder/of/the/excel/excelFileName.xlsx”.

sheetName: This parameter is optional and “dataExport” is the default value. You can use this parameter to customize the name of the Excel sheet.
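The dependency between “listType” and “destinationFileName” can be checked up front. The Python sketch below is a hypothetical pre-flight check, not part of the service; it only verifies that the path is present and absolute, and it assumes POSIX-style paths.

```python
import posixpath


def check_excel_export(rule: dict) -> None:
    """Raise ValueError if an EXCEL export is requested without a usable path."""
    if rule.get("listType") != "EXCEL":
        return  # no Excel export requested, nothing to check
    path = rule.get("destinationFileName")
    if not path:
        raise ValueError("destinationFileName is mandatory when listType is EXCEL")
    if not posixpath.isabs(path):
        raise ValueError("destinationFileName must be an absolute path")


rule = {
    "name": "List project CDR",
    "projectRuleType": "listCDR",
    "listType": "EXCEL",
    "destinationFileName": "/parent/folder/of/the/excel/excelFileName.xlsx",
    "sheetName": "dataExport",
}
check_excel_export(rule)  # passes silently; a missing or relative path raises
```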

Listing Project Audio Files

This action type is used to list project audio files in a given period. The process will fetch project CDR data and will search audio files against that CDR list. Search will be performed on all archive destinations. The following parameters are valid.

projectRuleType: must be "listAudioFiles"

beginDate: The process accesses project data within the given period. This value is the beginning date of that period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

beginTime: This optional parameter defines the starting time of the period. If the value is null, the “begin date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

endDate: This value is the upper limit date of the data period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

endTime: This optional parameter defines the ending time of the period. If the value is null, the “end date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

initialFetchSize: This parameter is optional and 300 is the default value. Changing this value requires advanced technical skills.

periodDurationDays: This parameter is optional and 10 is the default value. Changing this value requires advanced technical skills.

listType: This parameter is optional. To export the rows to an Excel file, set this parameter to EXCEL.

destinationFileName: This parameter is mandatory if the “listType” parameter value is EXCEL. This value is the absolute path of the Excel file, for example “/parent/folder/of/the/excel/excelFileName.xlsx”.

sheetName: This parameter is optional and “dataExport” is the default value. You can use this parameter to customize the name of the Excel sheet.

Copy Project Audio Files

This action type is used to copy project audio files in a given period to a new destination. The process will fetch project CDR data and will search audio files against that CDR list. The search will be performed on all archive destinations, and the audio files found will be copied to the new destination. The following parameters are valid.

projectRuleType: must be "copyAudioFiles"

beginDate: The process accesses project data within the given period. This value is the beginning date of that period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

beginTime: This optional parameter defines the starting time of the period. If the value is null, the “begin date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

endDate: This value is the upper limit date of the data period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

endTime: This optional parameter defines the ending time of the period. If the value is null, the “end date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

initialFetchSize: This parameter is optional and 300 is the default value. Changing this value requires advanced technical skills.

periodDurationDays: This parameter is optional and 10 is the default value. Changing this value requires advanced technical skills.

destinationFolder: This is the new location where audio files will be copied. This parameter is mandatory. The destination can be a local file system, a mounted NFS storage area, or an FTP or SFTP-like destination. Any kind of mountable destination can be used.
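Since the destination is accessed as a mounted path, it can be worth confirming that the folder is reachable before starting a long copy. The Python sketch below is a hypothetical pre-flight check; whether the service creates a missing folder itself is not covered here, so the check simply assumes the folder should already exist.

```python
import os


def destination_ready(folder: str) -> bool:
    """True if the folder exists and the current user can write into it."""
    return os.path.isdir(folder) and os.access(folder, os.W_OK | os.X_OK)
```

For example, destination_ready("/mnt/archive") could be called with the value planned for “destinationFolder” (the path here is only a placeholder).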

Move Project Audio Files

This action type is used to move project audio files in a given period to a new destination. The process will fetch project CDR data and will search audio files against that CDR list. The search will be performed on all archive destinations, and the audio files found will be copied to the new destination. After a successful copy, the source files will be deleted. The following parameters are valid.

projectRuleType: must be "moveAudioFiles"

beginDate: The process accesses project data within the given period. This value is the beginning date of that period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

beginTime: This optional parameter defines the starting time of the period. If the value is null, the “begin date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

endDate: This value is the upper limit date of the data period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

endTime: This optional parameter defines the ending time of the period. If the value is null, the “end date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

initialFetchSize: This parameter is optional and 300 is the default value. Changing this value requires advanced technical skills.

periodDurationDays: This parameter is optional and 10 is the default value. Changing this value requires advanced technical skills.

destinationFolder: This is the new location where audio files will be moved. This parameter is mandatory. The destination can be a local file system, a mounted NFS storage area, or an FTP or SFTP-like destination. Any kind of mountable destination can be used.

Delete Project Audio Files

This action type is used to delete project audio files in a given period. The process will fetch project CDR data and will search audio files against that CDR list. The search will be performed on all archive destinations, and the audio files found will be deleted. The following parameters are valid.

projectRuleType: must be "deleteAudioFiles"

beginDate: The process accesses project data within the given period. This value is the beginning date of that period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

beginTime: This optional parameter defines the starting time of the period. If the value is null, the “begin date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

endDate: This value is the upper limit date of the data period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

endTime: This optional parameter defines the ending time of the period. If the value is null, the “end date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

initialFetchSize: This parameter is optional and 300 is the default value. Changing this value requires advanced technical skills.

periodDurationDays: This parameter is optional and 10 is the default value. Changing this value requires advanced technical skills.

Delete Project Call Detail Records

This action type is used to delete project call detail records (CDR) in a given period. The process will fetch project CDR data and delete the resulting data. The following parameters are valid.

projectRuleType: must be "deleteCDR"

beginDate: The process accesses project data within the given period. This value is the beginning date of that period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

beginTime: This optional parameter defines the starting time of the period. If the value is null, the “begin date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

endDate: This value is the upper limit date of the data period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

endTime: This optional parameter defines the ending time of the period. If the value is null, the “end date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

initialFetchSize: This parameter is optional and 300 is the default value. Changing this value requires advanced technical skills.

periodDurationDays: This parameter is optional and 10 is the default value. Changing this value requires advanced technical skills.

Anonymize Project Call Detail Records

This action type is used to anonymize project call detail records (CDR) in a given period. The process will fetch project CDR data and anonymize the resulting data. Most values will be converted to 9999. The following parameters are valid.

projectRuleType: must be "anonymizeCDR"

beginDate: The process accesses project data within the given period. This value is the beginning date of that period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

beginTime: This optional parameter defines the starting time of the period. If the value is null, the “begin date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

endDate: This value is the upper limit date of the data period. The value must be in YYYYMMDD format; for example, 20230727 is the 27th of July 2023.

endTime: This optional parameter defines the ending time of the period. If the value is null, the “end date” parameter is used without a time. The value must be in “HH:MM:SS” format; for example, "17:05:22" is five minutes and 22 seconds past 5 pm.

initialFetchSize: This parameter is optional and 300 is the default value. Changing this value requires advanced technical skills.

periodDurationDays: This parameter is optional and 10 is the default value. Changing this value requires advanced technical skills.


