OBJECTIVE
To apply my data warehousing, data mining, statistics, and machine learning education and skills to challenging and interesting industry projects. My goal is to assist with data analysis and data modeling, build data warehouses, and eventually mine the data to produce interesting patterns for decision support. Industry positions I qualify for (based on past experience) and am interested in:
1) ETL Developer, or any position involving ETL experience
2) Data Analyst / Data Mining Analyst / Data Warehouse Analyst / Business Intelligence Analyst
3) Data Modeling
4) Reporting
I am looking for positions in Dallas, Texas.

PROFILE
Over 5 years of IT experience in analysis, design, development, and implementation of software applications, data mining, and data warehousing in the Engineering, Financial, Sales, Human Resources, Health Care, and Call Center industries. Over 3 years of experience with data mining, data warehousing, and business intelligence applications:
ETL:
a) Informatica (PowerCenter/PowerMart) 7.x/6.x/5.1.1 (Mapping Designer, Warehouse Designer, Mapplet Designer, Transformation Developer, Repository Manager, Server Manager, Workflow Manager, Workflow Monitor), OLAP and OLTP.
b) Microsoft Integration Services (SSIS): Control Flow tasks, Data Flow tasks, Transformations, database administration tasks.
c) Cognos Decision Stream (Fact/Dimension Builds, Job Streams).
d) Business Objects Data Integrator.
Modeling:
a) Data modeling experience using Cognos Framework Manager / Business Objects Universe; well versed in star and snowflake schema modeling, fact and dimension tables, and logical and physical data modeling.
Reporting:
a) Extensive reporting experience using Cognos ReportNet (Query Studio, Report Studio) and Business Objects.
Machine Learning / Data Mining / Predictive Analytics:
a) Excellent working knowledge of data mining models (Classification, Association Rule Mining, Clustering, Genetic Algorithms).
b) Extensive experience using the data mining tools WEKA (Analyzer, Experimenter, Feature Extraction, Visualization) and Microsoft Analysis Services.
Extensive experience working with relational databases such as Oracle 9i/8i/8.x/7.x, MS SQL Server 2000/2005, and MS Access 7.0/2000, using SQL, PL/SQL, SQL*Plus, SQL*Loader, and Developer 2000.
Extensive experience in CAD/CAE (Computer Aided Design and Engineering) software development using C/C++ and UNIX shell scripting/programming.
Strong analytical, presentation, problem-solving, and interpersonal skills; perform well as part of a team.

EDUCATION
M.S. in Computer Science (Specialization: Databases, Data Mining), University of Calgary, Alberta, Canada
M.S. in Mathematics, Indian Institute of Technology (IIT) Madras, Chennai, India
B.S. in Mathematics, University of Madras, Chennai, India
SOFTWARE COURSES
Diploma Equivalent (one year) – UNIX, C, and Oracle (Tata InfoTech Computer Education, India)
Certificate Course (2 months) – Data warehousing using Informatica, Business Objects (Texas Technologies, India)
Business Objects XI: Certificate Course – ETL using Data Integrator, Universe design and report development using Business Objects (Infosol Inc, Phoenix, Arizona, USA)
Cognos ReportNet and Cognos 8: Certificate Course – ETL, Modeling, Reporting (Calgary, Canada)

PROFESSIONAL CERTIFICATIONS
Cognos ReportNet Product User
Cognos Advanced Report Authoring Professional
Cognos ReportNet Modeling Professional

SKILLS
Relational Databases: Oracle 9i/8i/7.x, SQL Server (2000, 2005), MS Access
Querying/Reporting: PL/SQL, SQL*Plus, SQL*Loader, TOAD
Reporting Tools: Cognos ReportNet, Cognos 8, Business Objects, Oracle Reports
Modeling Tools: Cognos Framework Manager, Business Objects Universe, Rational Rose, UML
ETL Tools: Informatica PowerCenter/PowerMart (7.x, 6.x, 5.1), Business Objects Data Integrator, Cognos Decision Stream, Microsoft Integration Services (SSIS)
Data Mining Tools: WEKA, Microsoft Analysis Services
Data Mining Models: Classification, Association Rule Mining, Clustering
Knowledge Discovery Models: Genetic Algorithms
Programming Languages: C, C++, UNIX shell scripting, PL/SQL, Python
Documentation Tools: LaTeX, MS Office (including MS Project, Visio)
Web Portal Technologies / Front End: HTML, Visual Studio, SharePoint
Operating Systems: Windows (98, 2000, NT, XP), UNIX, Linux

PROFESSIONAL EXPERIENCE
Take Solutions Inc, New Jersey, USA June '06 – Ongoing
Business Intelligence Consultant
Client: Infosol Inc, Phoenix, Arizona, USA (Partner of Take Solutions Inc)
Working for Banner Health, Phoenix, Arizona, USA (end client)
ETL Developer/Consultant
Description: Banner Health is the administrative unit of Banner Hospital in Phoenix, Arizona.
Worked on three projects with the Enterprise Data Services (EDS) group. The first project, Quality Book (QBOOK), integrated performance data from 11 sources (administrative units/health facilities) against metrics (measures) defined by management; the final objective was to report and produce dashboards on initiatives (and key metrics) by unit. The second project, Nursing Advisory Board, produced 10 independent patient data extracts from 10 different sources. The third project, Glucose, was a Banner initiative to analyze patients' glucose levels and eventually build a decision support system. Patients visit the hospital for events such as diagnoses, treatments, or medication orders; when admitted for treatment, a patient moves between ICU and non-ICU units and goes through several clinical events. This project tracked patients in nursing units along with their glucose levels and insulin orders.
The challenge in the QBOOK project was integrating 11 sources to build 2 fact tables and 6 dimension tables, which would also serve as the starting point for a flexible and scalable enterprise data warehouse. The challenge in Nursing Advisory Board was processing the source data through complex transformations to produce the extracts. The challenge in the Glucose project was handling 2 million records daily: from an ETL standpoint, significant query optimization was required to load the data warehouse tables in a timely manner, along with substantial data cleansing of date-time and varchar fields.
Responsibilities:
- Extract, transform, and load source data into the respective target tables to build the required data marts.
- Understand the user requirements and the data model defined and developed by the business analysts and data architects.
- Communicate with the business analyst and data modeler on a regular basis to stay informed of changing requirements and data model design alterations.
- Stage (extract) the source data from files, SQL Server, and DB2 sources into STAGE tables.
- Worked with flat file (pipe-delimited) sources and implemented error handling routines.
- Worked with flat file data source connections, ODBC data source connections (to SQL Server sources), and DB2 data source connections.
- Built lookup (LKP) tables (Code Set and Code Value) to identify correlations between facilities defined by different administrative units/departments.
- Used an incremental approach in the work flows and data flows to integrate source data into warehouse tables using the required transformations.
- Used Query, SQL, LKP, Merge, Case, and Validation transforms to transform the data (as needed) before loading into the warehouse tables.
- Wrote custom functions to produce derived fields from the source table fields for each source record.
- Used date functions, string functions, and database functions to manipulate and format source data as part of the data cleansing and data profiling process.
- Worked with date/time manipulation and arithmetic in DB2, using DB2 built-in functions as well as Data Integrator date functions.
- Implemented history preserving using Table Comparison, History Preserving, and Key Generation transformations in the warehouse/dimension workflows.
- As part of optimization:
  o Used MAP operations to route UPDATE and INSERT records in warehouse workflows
  o Created the necessary indexes (for fields in WHERE clauses)
  o Ran stats on all tables
  o Chose the right transformations and the right design in data flows
  o Implemented an incremental load process
- Built data marts (fact and dimension tables) from the warehouse tables.
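The MAP-style routing and history-preserving steps above can be sketched, tool-agnostically, in plain Python. This is a minimal illustration only, not the Data Integrator implementation; the table and field names (facility_id, is_current) are hypothetical.

```python
def route_records(source_rows, dim_rows, key="facility_id"):
    """Route incoming rows to INSERT/UPDATE sets the way a Table
    Comparison + Map Operation pair would, preserving history:
    a changed row expires the current version and inserts a new one."""
    # Index the current (non-expired) dimension rows by business key
    current = {r[key]: r for r in dim_rows if r.get("is_current", True)}
    inserts, updates = [], []
    for row in source_rows:
        existing = current.get(row[key])
        if existing is None:
            inserts.append({**row, "is_current": True})        # brand-new key
        elif any(existing.get(f) != row.get(f) for f in row if f != key):
            updates.append({**existing, "is_current": False})  # expire old version
            inserts.append({**row, "is_current": True})        # insert new version
    return inserts, updates

dim = [{"facility_id": 1, "name": "Unit A", "is_current": True}]
src = [{"facility_id": 1, "name": "Unit A West"},
       {"facility_id": 2, "name": "Unit B"}]
ins, upd = route_records(src, dim)
# ins: new version of facility 1 plus new facility 2; upd: expired old facility 1 row
```

In the actual tool this comparison runs against the target table itself, with the Key Generation transform supplying surrogate keys for the inserted versions.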
- Handled the cross-schema (Stage, Warehouse, Data Mart) challenges between the different environments (Development, QA, and Production) by granting permissions on tables using database functions in scripts before running work flows.
- As part of QA testing, worked with the ECS team, the dashboard developer, and the data modeler to test the ETL process and populate the dashboard successfully.
- Prepared ETL design documents for the implemented work flows and loaded the documents onto the project SharePoint site.
- Configured batch jobs and set repository schedules using the Data Integrator Management Console.
Environment: Data Integrator 11.7.2 (Business Objects ETL), SQL Server 2000, DB2 (server on UNIX), TOAD for DB2, Data Integrator Management Console

Online Business Systems, Calgary, Alberta, Canada Dec '06 – Present
Business Intelligence Consultant
Client: ConAgra Foods, Toronto, Canada April '06 – Ongoing
Data Warehousing/Cognos Consultant (focused on reporting, inclusive of ETL using Cognos Decision Stream and modeling using Cognos Framework Manager)
Description: ConAgra Foods is North America's largest packaged food company. The source system used to capture their sales numbers is BPCS. The pre-built Sales Data Model package was bought from Cognos and customized to meet their specific business requirements. I was involved in developing reports in Cognos Report Studio on key sales metrics, the related ETL work using Decision Stream, and modeling enhancements/customization in Cognos Framework Manager. The target data warehouse tables were SQL Server 2000 tables.
Responsibilities:
- Understand the sales domain, ConAgra business processes, and the Cognos pre-built sales data model (dimensional, star schema).
- Analyzed the business logic built into the warehouse and gathered user requirements for the reporting work.
- Documented requirements and performed gap analysis to identify the gaps.
- Created the required fact/dimension tables in SQL Server 2000 and exported tables from development to the production server using the SQL Server 2000 import/export wizard.
- Implemented the business logic that identifies the required sales transactions in the data source SQL code.
- Used the Data Stream and Transformation Model components in Decision Stream to extract the required attributes from the source and map them to the target elements.
- Built the dimension and fact builds in Decision Stream, then executed the builds and job streams to load the required fact and dimension data from the BPCS source into the target SQL Server fact/dimension tables.
- Enhanced the sales data model query subjects in Cognos Framework Manager by selecting the required query items and creating complex calculated model elements (attributes) and fact measures.
- Published the sales package to the Cognos ReportNet server.
- Used Cognos Report Studio and developed complex queries to pull sales information from different fact and dimension tables.
- Implemented the required prompts, filters, and groups in the reports (based on user requirements).
- Used tabular references from different queries and tabular models to reuse summary calculations in reporting.
- Used data formatting techniques and complex SQL calculations (specifically in grouping) to present the data in the appropriate format.
- Backed up XML specifications of reports and used the local save option in Cognos to transfer reports between environments/servers (Development/Production).
- Balanced sales numbers between the reports and the BPCS source system (for validation/testing) on a daily basis.
Environment: BPCS ERP system, SQL Server
2000, Cognos Decision Stream (ETL/data warehousing/dimensional modeling), Cognos Framework Manager (version 7), Cognos Report Studio (ReportNet), Windows XP

Client: City of Calgary, Calgary, Alberta, Canada Dec '06 – Feb '07
Data Warehousing/Business Intelligence Consultant (Informatica, Business Objects)
Description: The City of Calgary offers services to residents through a 3-1-1 phone line. Citizens can use this line to request information on the services provided or any other form of assistance offered by the government. Call center employees record caller information through a source CSR (Contact Center Representative) system. The source information, stored in an Oracle database, is loaded into the CSR data warehouse using the ETL tool Informatica, and Business Objects is used to report on the warehouse information, providing business intelligence for end users. Worked on 7 key enhancement problems related to the business intelligence solutions offered by the CSR data warehouse. The solutions covered all areas of business intelligence, including data warehouse design, ETL (Informatica) design, Business Objects Universe design, and Business Objects reporting.
Responsibilities:
- Understand the call center data warehouse and the underlying dimensional data model.
- Used TOAD for Oracle (menu-driven interface) to query the call center data mart (in Oracle).
- Extensive use of SQL queries to understand the underlying data model.
- Performed technical analysis of the data warehouse and related it to user requirements to find the source of the problems.
- Analyzed the business logic built into the warehouse and gathered user requirements for the enhancement problems.
- Enhanced the data model; designed and developed technical solutions (complying with the underlying data model) for business users.
- Addressed core issues of the data warehouse, including data mart security, and implemented solutions.
- Understand the ETL (Informatica) used to transform and load data between the source system and the data mart.
- Designed the ETL process flow and developed data mapping spreadsheets to define transformation rules for each stage of the ETL process.
- Enhanced and developed new ETL processes for the enhancement solutions; worked with transformations in Informatica.
- Responsible for monitoring scheduled, running, completed, and failed sessions in Informatica.
- Enhanced the Business Objects Universe design to add new tables, identify joins and contexts, and define new classes and objects for the enhancement solutions.
- Provided sample reports using Business Objects to clients as proof-of-concept solutions.
- Worked with different data providers in Business Objects, including TAB-delimited and CSV data providers.
- Enhanced Visual Basic macros to modify CSV source file data providers.
- Worked with processes involving the refresh of the data warehouse and the Business Objects Universe, and executed Informatica workflows.
- Performed unit and integration testing of the developed solutions across the data warehouse, ETL, Business Objects Universe, and reporting.
- Designed, implemented, and documented QA processes for user acceptance testing.
- Documented the requirements gathering and the design of the solutions provided.
- Communicated with clients and provided status reports on a weekly basis.
- Project management, project planning, and utilization of resources.
Environment: CSR system, Oracle 9, ETL/data warehousing/dimensional modeling, Informatica 7.4, Business Objects (5.1 and 6.5), Windows XP, TOAD for Oracle

Enterprise Reporting, Information Technology, University of Calgary, Calgary, Alberta, Canada Apr '06 – Nov '06
Data Warehouse Analyst/Developer
Description: The University of Calgary Enterprise Reporting team offers data warehousing and reporting services to internal customers such as Finance, HR, Student, and other administrative departments. The data warehouse is SQL Server, and the reporting tools are Cognos ReportNet and Cognos 8. PeopleSoft ERP stores all the source data, and Decision Stream as well as Microsoft Integration Services (SSIS) are used to extract, transform, and load between the source and target databases.
COGNOS 8 TRAINING: Underwent a 2-day Report Studio and Query Studio (Cognos 8) training from Cognos, Ottawa, Canada.
Responsibilities:
- Data/requirements design and analysis of data warehouse portal projects.
- Understand the physical data structures and contents of the data marts in the warehouse and all related metadata and business logic.
- Interacted with business users and translated requirements to developers.
- Provided technical support (hardware and software) to Cognos ReportNet users within the University.
- Communicated with the user base and addressed user-related issues (user access, training, communication).
- Understand user issues; tracked down problems from the Oracle source to the PeopleSoft systems to the warehouse data marts using SQL queries.
- Logged user reports and maintained and resolved user bugs using an action request software application.
- Identified frequently asked problems and prepared documentation.
- Maintained status updates of user problems using MS Excel spreadsheets.
- Extracted, transformed, and loaded data using the Microsoft ETL tool (SSIS) from stage tables between different source and target databases (Oracle, SQL Server 2005).
- Integrated T-SQL stored procedures inside SSIS to automate backend monitoring tasks in SQL Server 2005.
- Used property expressions in SSIS to dynamically connect to multiple servers at runtime (to run backend jobs).
- Implemented error handling using SSIS to report job failures.
- Prepared test cases and test plans for system/module testing (unit, integration, and regression tests) for accomplished projects.
- Created corporate report templates in ReportNet with customized headers and footers.
- Created list, crosstab, and chart reports; added business rules, filters, calculations, and prompts to reports; set reports up as drill-through targets from other Cognos reports.
- Managed reports in Cognos Connection; tested reports for data validation, synchronization, and performance optimization.
- Involved in report testing in different testing environments: integration testing, system testing, and user acceptance testing.
- Project charter, planning, and preparation of design documents for all accomplished projects.
- Used the Visual SourceSafe version control system to maintain project versions.
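In SSIS, the dynamic multi-server pattern above is driven by property expressions that rewrite the connection string at run time, with failures routed to an error-handling path instead of stopping the run. The same control flow can be simulated in stand-alone Python as a rough sketch; the server names and the job body here are invented for illustration.

```python
def run_backend_jobs(servers, job):
    """Run `job` against each server config; collect failures for a
    report (like an SSIS error path) instead of aborting the loop."""
    failures = []
    for server in servers:
        try:
            job(server)                      # e.g. execute a stored procedure
        except Exception as exc:             # route the error to the report
            failures.append((server["name"], str(exc)))
    return failures

servers = [{"name": "sql-dev"}, {"name": "sql-prod"}]

def check_job(server):
    if server["name"] == "sql-prod":
        raise RuntimeError("login timeout")  # simulated failure

fails = run_backend_jobs(servers, check_job)
# fails == [("sql-prod", "login timeout")]
```

The point of the design is that one package iterates over a server list supplied at run time, so adding a server means editing configuration, not the package.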
Environment: PeopleSoft source system, OLAP, Cognos ReportNet, ETL/data warehousing, Oracle, SQL Server 2005, Query Analyzer, PL/SQL, Visual SourceSafe 6.1, Windows XP, Visual Studio

Advanced Database Systems, Department of Computer Science, University of Calgary, Calgary, Alberta, Canada Jan '02 – Dec '04, Feb '06 – Apr '06
Graduate Assistant (Data Mining)
Description: The Database Laboratory at the University of Calgary undertakes data mining research focused on applications in business intelligence. The laboratory also has several industry tie-ups to ensure the real-world practicability of the work. I presented the related work at the International Conference on Data Mining (Las Vegas, June 2006), and earlier work was published at the International Conference on Machine Learning and Applications (Los Angeles, December 2005); see Publications.
Responsibilities:
- Designed, developed, and validated a data mining framework for knowledge discovery in databases.
- Developed solutions (using data mining techniques) that process and manipulate large amounts of data into information for business decisions.
- Worked with health care (breast cancer) and business datasets, small and large.
- Able to bring a business problem within the framework of data mining and find solutions.
- Proposed a new methodology and devised and developed new algorithms for data pre-processing and post-processing; the functionality was implemented in C on the UNIX platform.
- Implementation also included in-depth usage of the WEKA data mining tool, written in Java; provided basic technical support for WEKA users and enhanced the Java functionality to improve performance.
- Extensive UNIX shell programming to interface (pass messages) between the Java tool and the C software to exploit existing decision support techniques.
- Demonstrated sound knowledge of data mining and data analytics concepts (refer to publications).
- Experience in predictive modeling and data analysis.
- Used appropriate tools and techniques to analyze a variety of data sources.
- Extensive extraction, data cleansing, data pre-processing, and feature extraction (using pre-processing tools) from databases for mining.
- Qualitative data analysis to find the right data split and data selection for training and validation datasets.
- Experience using random sampling, stratified random sampling, bootstrap, and cross-validation techniques for data splits.
- Used data mining tools to find patterns or rules answering decision support problems.
- Extensive experience with classification (including decision trees and neural networks), association rule mining, and clustering.
- Data post-processing using genetic algorithms and rule extraction with relevant data mining tools to find an efficient set of accurate, intelligent (new) patterns.
- Used post-processing visualization tools.
- Maintained data quality and data integrity.
- Created and presented analytical reports, both written and graphical.
- Exposure to data warehousing methodologies.
- Trained students in programming and data structures (Pascal, C, C++, and Python) on the UNIX platform; helped with coding and debugging as well.
Environment: UNIX, C, WEKA data mining tool, genetic algorithms

Tom Baker Cancer Centre, Alberta Cancer Board, Calgary, Alberta, Canada Jan '03 – Dec '03
Data Mining Consultant
Description: The Tom Baker Cancer Centre is a unit of the Alberta Cancer Board. They were using statistical models to predict breast cancer recurrence risk. While the quantitative data analysis was effective to an extent, they wanted to make better predictions by combining it with qualitative analysis such as data mining. I worked on a project to detect risk factor patterns and rules for high-risk cancer patients in Alberta using data mining techniques.
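The data-split techniques mentioned above (cross-validation in particular) are what WEKA performs internally when it evaluates a classifier. A tool-free sketch of the same k-fold split, written in plain Python with synthetic records (the fields and labels are invented for illustration):

```python
import random

def k_fold_splits(records, k=5, seed=42):
    """Shuffle once, then yield k (train, test) pairs such that every
    record lands in exactly one test fold -- standard k-fold CV."""
    rows = records[:]
    random.Random(seed).shuffle(rows)          # reproducible shuffle
    folds = [rows[i::k] for i in range(k)]     # round-robin fold assignment
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test

# 40 synthetic records with a made-up binary label
data = [{"age": a, "recurrence": a > 50} for a in range(30, 70)]
sizes = [(len(tr), len(te)) for tr, te in k_fold_splits(data)]
# each pair is (32, 8): 4/5 of the data trains, 1/5 tests, rotating
```

Stratified variants additionally shuffle within each label class so every fold preserves the class ratio, which matters for imbalanced medical data like recurrence outcomes.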
Responsibilities:
- Understand the underlying breast cancer data and bring the problem within the framework of the WEKA data mining tool.
- Performed data pre-processing and transformation (including cleaning missing values) to make the data suitable for mining.
- Performed data splits using random sampling and cross-validation techniques.
- Converted the schema to a form handled by data mining tools (CSV format); experience working with flat file (CSV) schemas and discrete attributes.
- Used the Analyzer component of WEKA to check file formats.
- Used the Experimenter component of WEKA to run several classification algorithms (including decision trees and neural networks).
- Used the tool's feature extraction techniques to discover the most important features affecting the problem.
- Compared the performance of the algorithms, including boosting and bagging techniques.
- Used the association rule mining technique (Apriori) in the tool to discover patterns.
- Used post-processing strategies to discover intelligent and accurate patterns.
- Validated the results with the Alberta Cancer Board; prepared written reports and graphical results submitted to the Board.
- Presented and communicated the benefits and process of data mining to a medical audience (at the Tom Baker Cancer Centre and Calgary Health Region).

Fluent India Pvt Ltd Aug '98 – Jun '01
Subsidiary of Fluent Inc, New Hampshire, USA
Senior Software Engineer
Description: Fluent Inc, based in New Hampshire, develops computational fluid dynamics software. Their clients span engineering CAD/CAE (Computer Aided Design and Engineering) applications. The products developed are a pre-processor, a solver, and post-processors (for visualization).
Training at Fluent Inc, USA: Underwent a 2-month training program at Fluent Inc, New Hampshire, USA (Nov '98 – Dec '99).
Responsibilities:
- As an application developer, worked through the software development lifecycle to design and build the integration of the Fluent solver with other CAD/CAE software products.
- Understand the underlying Fluent code and the import/export data file formats.
- Understand the file formats of several CAE software products such as ANSYS, Abaqus, Data Explorer, and I-DEAS.
- Requirements analysis and design of functionality facilitating data import/export between products.
- Communicated with users and translated business requirements.
- Implemented product functionality.
- Worked with several file formats (flat files, text files, binary files, CSV files).
- Experience working with conversion tools, PDF, PS, LaTeX, and HTML for text and graphical display.
- Experience understanding and interpreting the underlying high-level software code (C, C++, LISP, Fortran) of different CAE products.
- Experience linking software code (using libraries).
- Experience using the CVS version control system to maintain code.
- Ported the software code to different UNIX and Linux platforms.
- Developed and debugged prototypes on multiple platforms to ensure maximum usability.
- Developed the graphical user interface for the integration functionality of the Fluent software.
- Performed unit testing, integration testing, and acceptance testing of the developed functionality.
- Performed regression testing of key functionality of the Fluent solver; as part of this work, wrote UNIX shell scripts to automate the testing process (which otherwise involved significant manual time).
- Demonstrated strong writing skills through extensive technical documentation of the POLYFLOW user manuals (one of the software products) for end users, using LaTeX.
- Collaborated well with developers, application staff, and users of Fluent Inc across the globe (USA, UK, Belgium).
- Installed Fluent application (and supporting) software on Windows/UNIX workstations; provided support to users during training/work sessions.
- Presented developed functionality (using MS Office) to developers on the team and worldwide.
- Experience working with support staff, developers, and technical consultants at different CAE software companies in India, North America, and Europe (the work necessitated considerable interaction).
Environment: UNIX, C, C++, LaTeX, MS Office (Word, PowerPoint)

Tata InfoTech Computer Education, India July '97 – July '98
Software Training
Accomplished Projects:

Automating Time Sheet Updates
The time sheet system is user-friendly software used to check whether employees have logged their time sheets weekly. Developed a system using shell programming whose functionality includes:
- Keep track of users' time sheets.
- Check whether users have updated their time sheets on the specified day of the week.
- If users have not updated their time sheets, send a reminder.
- Once the time sheets are updated in the users' local directories, copy them to a common time sheet repository.
- Send a report to the team manager on the status of the time sheets.
Created variables and used UNIX commands (echo, ls, wc, cp, etc.). Proficient in regular expressions using grep, find, expr, sed, and awk. Used file manipulation commands such as cat, touch, and sort. Very proficient in pipes, redirection, and shell programming constructs (if-then, case).
Environment: UNIX shell programming

Automated Inventory System
Involved in the design and development of an application to automate government identity cards (for the purchase of items in a government-run unit). This interactive application has the following features:
- Informs customers about the availability of items and ordering.
- Keeps track of consumer payments.
- Accepts orders from consumers and allots goods taking into account the priority or demand of each consumer.
Involved in configuration management and quality assurance. Generated simple reports on daily statistics using status reports. Responsible for development of the front end using Developer 2000. Stored backend data in Oracle; wrote triggers to pop event messages using PL/SQL and Oracle Reports.
Environment: Oracle, PL/SQL, Oracle Designer, Forms 4.5 and 5

Library Management System
The library management system is user-friendly software for maintaining a library. It maintained book transactions, member records, and the list of returnable books. It provided reports categorized by member, group code, author, and publisher, as well as listings of members, issues, and outstanding items, and all the facilities and options required for an efficient library management system.
- Analyzed the requirements and prepared the analysis report.
- Designed and developed the functionality: registration, item management, lending, search for books, search for members.
- Used basic and advanced C programming techniques (functions, procedures, pointers, linked lists, and memory management) on the UNIX platform.
Environment: UNIX and C (pointers, linked lists, memory management)

PUBLICATIONS
J. Gopalan, E. Korkmaz, R. Alhajj and K. Barker, "Effective Data Mining by Integrating Genetic Algorithm into the Data Preprocessing Phase," Proceedings of the International Conference on Machine Learning and Applications, Los Angeles, CA, USA, Dec. 2005.
J. Gopalan, R. Alhajj and K. Barker, "Post-processing Rule Sets Using Genetic Algorithms," Proceedings of the International Conference on Data Mining, Las Vegas, USA, June 2006.