The document outlines the job role and certification exam guide for a Google Certified Professional - Data Engineer. A data engineer enables data-driven decision making by collecting, transforming, and visualizing data to design, build, maintain, and troubleshoot reliable data processing systems. The certification exam tests knowledge across 7 sections: designing flexible and scalable data systems; building and maintaining data structures and pipelines; analyzing and transforming data for machine learning; modeling business processes; ensuring reliability; visualizing data; and designing for security and compliance.
A Google Certified Professional - Data Engineer enables data-driven decision making by collecting, transforming, and visualizing data. The data engineer should be able to design, build, maintain, and troubleshoot data processing systems with a particular emphasis on the security, reliability, fault-tolerance, scalability, fidelity, and efficiency of such systems. The data engineer should also be able to analyze data to gain insight into business outcomes, build statistical models to support decision-making, and create machine learning models to automate and simplify key business processes.
Certification Exam Guide
Section 1: Designing data processing systems
1.1 Designing flexible data representations. Considerations include:
future advances in data technology; changes to business requirements; awareness of current state and how to migrate the design to a future state; data modeling; tradeoffs; distributed systems; schema design
1.2 Designing data pipelines. Considerations include:
future advances in data technology; changes to business requirements; awareness of current state and how to migrate the design to a future state; data modeling; tradeoffs; system availability; distributed systems; schema design; common sources of error (e.g., removing selection bias)
1.3 Designing data processing infrastructure. Considerations include:
future advances in data technology; changes to business requirements; awareness of current state and how to migrate the design to the future state; data modeling; tradeoffs; system availability; distributed systems; schema design; capacity planning; different types of architectures: message brokers, message queues, middleware, service-oriented
Section 2: Building and maintaining data structures and databases
2.1 Building and maintaining flexible data representations
2.2 Building and maintaining pipelines. Considerations include:
data cleansing; batch and streaming; transformation; acquire and import data; testing and quality control; connecting to new data sources
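To illustrate the data cleansing and testing-and-quality-control considerations in 2.2, here is a minimal Python sketch. It is not taken from the exam guide and is not tied to any Google Cloud API; the record fields (`user_id`, `country`, `amount`), the repair rules, and the 10% reject threshold are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Record:
    # Hypothetical schema used only for this example.
    user_id: str
    country: str
    amount: float


def cleanse(raw: dict) -> Optional[Record]:
    """Normalize one raw row; return None if it cannot be repaired."""
    user_id = str(raw.get("user_id") or "").strip()
    if not user_id:
        return None  # a row without a key cannot be joined downstream
    # Normalize country codes; fall back to a sentinel instead of dropping the row.
    country = str(raw.get("country") or "").strip().upper() or "UNKNOWN"
    try:
        amount = float(raw.get("amount"))  # None -> TypeError, "x" -> ValueError
    except (TypeError, ValueError):
        return None
    return Record(user_id, country, amount)


def run_stage(rows: Iterable[dict]) -> Tuple[List[Record], int]:
    """Cleanse a batch, returning the clean records and the number rejected."""
    clean: List[Record] = []
    rejected = 0
    for row in rows:
        rec = cleanse(row)
        if rec is None:
            rejected += 1
        else:
            clean.append(rec)
    return clean, rejected


def quality_check(clean: List[Record], rejected: int,
                  max_reject_ratio: float = 0.1) -> None:
    """Batch-level quality gate: raise ValueError if the batch looks unhealthy."""
    total = len(clean) + rejected
    if total == 0:
        raise ValueError("empty batch")
    ratio = rejected / total
    if ratio > max_reject_ratio:
        raise ValueError(f"reject ratio {ratio:.2%} exceeds {max_reject_ratio:.0%}")
    # Domain rule (assumed): amounts must never be negative.
    if any(r.amount < 0 for r in clean):
        raise ValueError("negative amounts found")
```

A gate like `quality_check` lets a pipeline fail fast instead of silently loading a mostly rejected batch. The same `cleanse` function could be applied per element in a streaming pipeline, with the ratio check computed over a window instead of a batch.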
2.3 Building and maintaining processing infrastructure. Considerations include:
provisioning resources; monitoring pipelines; adjusting pipelines; testing and quality control
Section 3: Analyzing data and enabling machine learning
3.1 Analyzing data. Considerations include:
data profiling; data correlation; patterns and insights; anomaly detection; statistical models; machine learning; assessing the statistical relevance of conclusions
3.2 Transforming data to enable machine learning and pattern discovery. Considerations include:
repeatability; generalization; distributed computing; improved model accuracy
3.3 Identifying or building data visualization and reporting tools. Considerations include:
automation; decision support; data summarization; enabling patterns and insights
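For the data profiling and anomaly detection considerations in 3.1, the following Python sketch computes a simple column profile and flags values whose z-score exceeds a threshold. The z-score rule is one common, simple technique chosen here as an assumption, not a method prescribed by the guide; the function names and the default threshold are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def profile(values: Sequence[float]) -> Dict[str, float]:
    """Basic column profile: count, min, max, mean, population std dev."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
    }


def zscore_anomalies(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return the indices of values more than `threshold` std devs from the mean."""
    stats = profile(values)
    if stats["stdev"] == 0:
        return []  # constant column: nothing stands out
    return [
        i for i, v in enumerate(values)
        if abs(v - stats["mean"]) / stats["stdev"] > threshold
    ]
```

On small samples a single extreme value inflates the standard deviation and can mask itself: for `[10, 11, 9, 10, 12, 10, 11, 9, 10, 100]` the value `100` has a z-score just under 3, so the default threshold flags nothing while `threshold=2.5` flags index 9. A lower threshold or a robust variant (for example, one based on the median absolute deviation) may be preferable in such cases.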
Section 4: Modeling business processes for analysis and optimization
4.1 Mapping business requirements to data representations. Considerations include:
working with business users; gathering business requirements
4.2 Optimizing data representations, data infrastructure performance and cost. Considerations include:
resizing and scaling resources; data cleansing; distributed systems; high-performance algorithms; common sources of error (e.g., removing selection bias)
Section 5: Ensuring reliability
5.1 Performing quality control. Considerations include:
verification; building and running test suites; pipeline monitoring
5.2 Assessing, troubleshooting, and improving data representations and data processing infrastructure.
5.3 Recovering data. Considerations include:
planning (e.g., fault-tolerance); executing (e.g., rerunning failed jobs, performing retrospective re-analysis); stress testing data recovery plans and processes
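For the "rerunning failed jobs" consideration in 5.3, a common pattern is to retry a failed job with exponential backoff before escalating. The sketch below is a generic Python illustration under that assumption; `run_with_retries` and its parameters are hypothetical, and the injectable `sleep` argument exists only so the behaviour can be tested without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(job: Callable[[], T], max_attempts: int = 3,
                     base_delay_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Rerun a failing job with exponential backoff; re-raise after the last attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the original error
            # Wait base, 2*base, 4*base, ... between attempts.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Retrying is only safe when the job is idempotent, for example when it overwrites its output partition rather than appending to it; otherwise a rerun can duplicate data.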
Section 6: Visualizing data and advocating policy
6.1 Building (or selecting) data visualization and reporting tools. Considerations include:
automation; decision support; data summarization (e.g., translation up the chain, fidelity, trackability, integrity)
6.2 Advocating policies and publishing data and reports.
Section 7: Designing for security and compliance
7.1 Designing secure data infrastructure and processes. Considerations include:
Identity and Access Management (IAM); data security; penetration testing; Separation of Duties (SoD); security control
7.2 Designing for legal compliance. Considerations include:
Health Insurance Portability and Accountability Act (HIPAA), Children's Online