Google Certified Professional - Data Engineer

Job Role Description


A Google Certified Professional - Data Engineer enables data-driven decision making by collecting,
transforming, and visualizing data. The data engineer should be able to design, build, maintain, and
troubleshoot data processing systems with a particular emphasis on the security, reliability,
fault-tolerance, scalability, fidelity, and efficiency of such systems. The data engineer should also be able
to analyze data to gain insight into business outcomes, build statistical models to support
decision-making, and create machine learning models to automate and simplify key business processes.

Certification Exam Guide


Section 1: Designing data processing systems
1.1 Designing flexible data representations. Considerations include:
future advances in data technology
changes to business requirements
awareness of current state and how to migrate the design to a future state
data modeling
tradeoffs
distributed systems
schema design
1.2 Designing data pipelines. Considerations include:
future advances in data technology
changes to business requirements
awareness of current state and how to migrate the design to a future state
data modeling
tradeoffs
system availability
distributed systems
schema design
common sources of error (e.g., removing selection bias)

1.3 Designing data processing infrastructure. Considerations include:
future advances in data technology
changes to business requirements
awareness of current state and how to migrate the design to a future state
data modeling
tradeoffs
system availability
distributed systems
schema design
capacity planning
different types of architectures: message brokers, message queues, middleware, service-oriented

Section 2: Building and maintaining data structures and databases


2.1 Building and maintaining flexible data representations
2.2 Building and maintaining pipelines. Considerations include:
data cleansing
batch and streaming
transformation
acquiring and importing data
testing and quality control
connecting to new data sources
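
As an illustration of the data cleansing and batch/streaming items above, here is a minimal sketch in Python. It is not part of the exam guide. The field names `user_id` and `amount` and the helpers `clean_record` and `clean_stream` are hypothetical, chosen only for the example. The same generator can consume a list (batch) or any iterator (streaming).

```python
from typing import Iterable, Iterator, Optional


def clean_record(raw: dict) -> Optional[dict]:
    """Return a normalized record, or None if the record cannot be repaired."""
    uid = raw.get("user_id")  # hypothetical field name
    user_id = "" if uid is None else str(uid).strip()
    if not user_id:
        return None  # drop records with no usable key
    try:
        amount = float(raw.get("amount"))  # hypothetical field name
    except (TypeError, ValueError):
        return None  # drop records whose amount is missing or not numeric
    return {"user_id": user_id, "amount": amount}


def clean_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Yield cleaned records one at a time, skipping bad rows and exact duplicates."""
    seen = set()
    for raw in records:
        rec = clean_record(raw)
        if rec is None:
            continue
        key = (rec["user_id"], rec["amount"])
        if key in seen:
            continue
        seen.add(key)
        yield rec
```

Keeping the per-record rule (`clean_record`) separate from the iteration (`clean_stream`) makes the rule easy to unit test, which ties in with the testing and quality control item.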

2.3 Building and maintaining processing infrastructure. Considerations include:
provisioning resources
monitoring pipelines
adjusting pipelines
testing and quality control

Section 3: Analyzing data and enabling machine learning


3.1 Analyzing data. Considerations include:
data profiling
data correlation
patterns and insights
anomaly detection
statistical models
machine learning
assessing the statistical relevance of conclusions
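
To make the anomaly detection item above concrete, the following is a minimal sketch of one simple approach, a z-score rule. It is not prescribed by the exam guide. The function name `zscore_outliers` and the default threshold are assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold` population
    standard deviations away from the mean of `values`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing can be flagged
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

With small samples a single extreme value inflates the standard deviation, so a threshold of 3.0 may flag nothing; a lower threshold or a robust statistic such as the median absolute deviation is often used instead.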
3.2 Transforming data to enable machine learning and pattern discovery. Considerations include:
repeatability
generalization
distributed computing
improved model accuracy
3.3 Identifying or building data visualization and reporting tools. Considerations include:
automation
decision support
data summarization
enabling patterns and insights

Section 4: Modeling business processes for analysis and optimization


4.1 Mapping business requirements to data representations. Considerations include:
working with business users
gathering business requirements
4.2 Optimizing data representations, data infrastructure performance and cost. Considerations include:
resizing and scaling resources
data cleansing, distributed systems
high performance algorithms
common sources of error (e.g., removing selection bias)

Section 5: Ensuring reliability

5.1 Performing quality control. Considerations include:
verification
building and running test suites
pipeline monitoring
5.2 Assessing, troubleshooting, and improving data representations and data processing infrastructure.
5.3 Recovering data. Considerations include:
planning (e.g., fault tolerance)
executing (e.g., rerunning failed jobs, performing retrospective re-analysis)
stress testing data recovery plans and processes
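
The "rerunning failed jobs" item above can be sketched as a retry wrapper with exponential backoff. This is an illustrative assumption rather than anything the exam guide specifies. The helper `run_with_retries` and its parameters are hypothetical, and the `sleep` argument exists only so that a recovery plan can be exercised in tests without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    job: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `job`, rerunning it after a failure with exponentially growing delays.

    The last failure is re-raised once `max_attempts` runs have been made.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

A job rerun this way should be idempotent, so that a run which failed partway through does not leave duplicated or partial output behind.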

Section 6: Visualizing data and advocating policy


6.1 Building (or selecting) data visualization and reporting tools. Considerations include:
automation
decision support
data summarization (e.g., translation up the chain, fidelity, trackability, integrity)
6.2 Advocating policies and publishing data and reports.

Section 7: Designing for security and compliance


7.1 Designing secure data infrastructure and processes. Considerations include:
Identity and Access Management (IAM)
data security
penetration testing
Separation of Duties (SoD)
security control
7.2 Designing for legal compliance. Considerations include:
Health Insurance Portability and Accountability Act (HIPAA), Children's Online Privacy Protection Act (COPPA), etc.
audits
