Struktur Organisasi Data 1

Tugas Pengulangan Kelas
Agung Prastyo Wibowo | 32109534 | 2DB20


DATA MINING
Data mining (the analysis step of the knowledge discovery in databases process,[1] or KDD), a relatively young and interdisciplinary field of computer science,[2][3] is the process of discovering new patterns from large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems.[2] The goal of data mining is to extract knowledge from a data set in a human-understandable structure[2] and involves database and data management, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of found structure, visualization and online updating.[2]

The term is a buzzword, and is frequently misused to mean any form of large-scale data or information processing (collection, extraction, warehousing, analysis and statistics), but it is also generalized to any kind of computer decision support system, including artificial intelligence, machine learning and business intelligence. In the proper use of the word, the key term is discovery, commonly defined as "detecting something new". Even the popular book "Data mining: Practical machine learning tools and techniques with Java"[4] (which covers mostly machine learning material) was originally to be named just "Practical machine learning", and the term "data mining" was only added for marketing reasons.[5] Often the more general terms "(large scale) data analysis" or "analytics", or, when referring to actual methods, "artificial intelligence" and "machine learning", are more appropriate.

The actual data-mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining). This usually involves using database techniques such as spatial indexes. These patterns can then be seen as a kind of summary of the input data, and used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Neither the data collection, data preparation nor result interpretation and reporting are part of the data mining step, but do belong to the overall KDD process as additional steps. The related terms data dredging, data fishing and data snooping refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the larger data populations.

The manual extraction of patterns from data has occurred for centuries. Early methods of identifying patterns in data include Bayes' theorem (1700s) and regression analysis (1800s). The proliferation, ubiquity and increasing power of computer technology have increased data collection, storage and manipulation. As data sets have grown in size and complexity, direct hands-on data analysis has increasingly been augmented with indirect, automatic data processing. This has been aided by other discoveries in computer science, such as neural networks, clustering, genetic algorithms (1950s), decision trees (1960s) and support vector machines (1990s).

Data mining is the process of applying these methods to data with the intention of uncovering hidden patterns.[6] It has been used for many years by businesses, scientists and governments to sift through volumes of data such as airline passenger trip records, census data and supermarket scanner data to produce market research reports. (Note, however, that reporting is not always considered to be data mining.)

A primary reason for using data mining is to assist in the analysis of collections of observations of behavior. Such data is vulnerable to collinearity because of unknown interrelations. An unavoidable fact of data mining is that the (sub-)set(s) of data being analyzed may not be representative of the whole domain, and therefore may not contain examples of certain critical relationships and behaviors that exist across other parts of the domain. To address this sort of issue, the analysis may be augmented using experiment-based and other approaches, such as choice modelling for human-generated data. In these situations, inherent correlations can be either controlled for, or removed altogether, during the construction of the experimental design.

Data mining involves six common classes of tasks:[1]

Anomaly detection (Outlier/change/deviation detection) – The identification of unusual data records that might be interesting, or of data errors that require further investigation.

Association rule learning (Dependency modeling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.

Clustering – The task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.

Classification – The task of generalizing known structure to apply to new data. For example, an email program might attempt to classify an email as legitimate or as spam.

Regression – Attempts to find a function which models the data with the least error.

Summarization – Provides a more compact representation of the data set, including visualization and report generation.[i]
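As a brief, hedged illustration of two of these tasks, the Python sketch below clusters unlabeled points and classifies labeled toy records with the scikit-learn library; the data, features and parameter choices are invented for this example and are not part of the source text.

# A minimal sketch of clustering and classification using scikit-learn.
# All data here is toy data made up purely for illustration.
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Clustering: discover groups without using known labels.
points = [[1.0, 1.1], [0.9, 1.0], [8.0, 8.2], [8.1, 7.9]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)                      # two discovered groups, e.g. [0 0 1 1]

# Classification: generalize known labels to a new record.
emails = [[0, 1], [1, 0], [0, 0], [1, 1]]  # hypothetical feature vectors
labels = ["spam", "ham", "ham", "spam"]    # known structure to learn from
clf = DecisionTreeClassifier(random_state=0).fit(emails, labels)
print(clf.predict([[1, 1]]))               # classify a previously unseen email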


DATA MINING (PENGGALIAN DATA)

Data mining (Indonesian: penggalian data) is the extraction of interesting patterns from large amounts of data.[1] A pattern is said to be interesting if it is non-trivial, implicit, previously unknown and useful. The patterns presented must be easy to understand, hold for the data to be predicted with some degree of certainty, be useful and be novel. Data mining has several alternative names, although their exact definitions differ, such as KDD (knowledge discovery in databases), pattern analysis, data archaeology, information harvesting and business intelligence. Data mining is needed when the available data is too plentiful (for example, data obtained from corporate database systems, e-commerce, stock data and bioinformatics data) but it is not known what patterns could be obtained from it.

Data mining is one part of the pattern discovery process. The pattern discovery process runs in the following order (a small sketch of the first four steps follows the list):
1. Data cleaning: removing noise and filling in missing data.
2. Data integration: combining multiple data sources.
3. Data selection: selecting the relevant data.
4. Data transformation: transforming the data into a format suitable for mining.
5. Data mining: applying intelligent methods to extract patterns.
6. Pattern evaluation: identifying only the interesting patterns.
7. Pattern presentation: visualizing the patterns for the user.
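As a minimal sketch of steps 1 through 4 under assumed toy data, the following Python fragment uses the pandas library; the column names and values are hypothetical.

# Steps 1-4 of the pattern discovery pipeline, sketched with pandas.
import pandas as pd

sales = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, None, 30.0]})
customers = pd.DataFrame({"id": [1, 2, 3], "region": ["A", "B", "A"]})

# 1. Data cleaning: fill the missing amount (one simple imputation choice).
sales["amount"] = sales["amount"].fillna(sales["amount"].mean())

# 2. Data integration: combine the two sources on their shared key.
data = sales.merge(customers, on="id")

# 3. Data selection: keep only the columns relevant to mining.
data = data[["amount", "region"]]

# 4. Data transformation: encode the categorical column as numbers.
data["region"] = data["region"].astype("category").cat.codes
print(data)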


Rapid developments in data collection and storage technology across many fields have produced databases that are exceedingly large. Yet the collected data is rarely looked at again, because doing so is too long, tedious and uninteresting. Often, decisions that are supposedly based on data are no longer made from the data, but from the decision makers' intuition. Hence this branch of science, data mining, was born.

Analyzing data without the automation of data mining is no longer feasible when: 1) there is too much data, 2) the dimensionality of the data is too large, or 3) the data is too complex to analyze manually (for example: time series data, spatiotemporal data, multimedia data, data streams).

Fundamentally, data mining is divided into two functionalities, description and prediction. Some frequently used data mining functionalities are the following:

Characterization and discrimination: generalizing, summarizing, and contrasting the characteristics of the data.

Frequent pattern mining: searching for association patterns (association rules), or intra-transaction patterns, that is, purchasing patterns that occur within a single transaction.

Classification: building a model that can classify an object based on its attributes. The target classes are already present in the data, so the focus is on learning the existing data so that the classifier can classify new objects on its own.

Prediction: predicting unknown or missing values using the model from classification.

Clustering/cluster analysis: grouping a set of data objects based on their similarity. Target classes are not present in the data beforehand, so the focus is on maximizing intra-class similarity and minimizing inter-class similarity.

Outlier analysis: the process of recognizing data that do not conform to the general behavior of the rest of the data. Example: recognizing noise and exceptions in data.

Trend and evolution analysis: covers regression analysis, sequential pattern mining, periodicity analysis and similarity-based analysis.[ii]


Data Security

Encryption is a process that transforms code that can be understood into code that cannot be understood (is unreadable). Encryption can also be understood as a code or cipher. Data security actually covers many interrelated aspects, but in this paper the author will specifically discuss encryption methods and the security of data protection in some common application programs.

Almost all application programs, such as MS Word, WordPerfect, Excel and PKZip, provide password-based data protection, but this facility is actually easy to break. Even specialized data protection programs such as Norton Diskreet (perhaps rarely used now), which protect data with the DES method or a faster "proprietary" method, are actually far from secure. The DES implementation used contains an error that greatly reduces the method's effectiveness.

Although it can accept passwords of up to 40 characters, these characters are then converted to all uppercase and reduced to 8 characters. This causes a very large reduction in the number of possible encryption keys: not only is the number of possible passwords limited, but there are also a large number of equivalent keys that can be used to decrypt the file. For example, a file encrypted with the key 'xxxxxxx' can be decrypted with 'xxxxxx', 'xxxxyy' or 'yyyyxx'. PC Tools (which may also be hard to find now) is another example of a software package whose data protection is very unsafe. The DES implementation in this program reduces the number of DES rounds from the specified 16 to 2, which makes it very easy to break.
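To make the scale of that reduction concrete, the Python sketch below mimics the normalization described above (uppercase, then truncate to 8 characters); the character-set sizes are rough assumptions for illustration, not figures from the source.

# Why uppercasing and truncating a password destroys the keyspace.
def normalize(password: str) -> str:
    # The rule described in the text: uppercase, keep only 8 characters.
    return password.upper()[:8]

full_keyspace = 95 ** 40    # assumed: up to 40 printable ASCII characters
effective = 69 ** 8         # 8 characters with lowercase folded away (95 - 26)
print(f"keyspace shrinks by a factor of roughly {full_keyspace / effective:.1e}")

# Equivalent keys: distinct passwords collapse to one effective key.
for pw in ["SecretPassword", "secretpaSSWORD", "SECRETPA"]:
    print(pw, "->", normalize(pw))      # all print 'SECRETPA'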

For data protection that is important enough, there is no other way but to use a dedicated data protection/encryption program. Many excellent dedicated data protection programs are now in circulation, whether freeware, shareware or commercial. In general, these programs provide not just a single method but several, so that we can choose the one we consider most secure.
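As one hedged example of such a dedicated tool, the sketch below uses the Python cryptography package's Fernet recipe (authenticated symmetric encryption); this particular library is an assumption, and any well-vetted alternative would serve the same purpose.

# Encrypting data with a dedicated library instead of an application password.
from cryptography.fernet import Fernet

key = Fernet.generate_key()       # a fresh random key, base64-encoded
f = Fernet(key)

token = f.encrypt(b"confidential report")   # ciphertext plus integrity tag
print(f.decrypt(token))                     # b'confidential report'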


Data Security (Keamanan Data)

Encryption is a process that transforms code that can be understood into code that cannot be understood (is unreadable). Encryption can also be understood as a code or cipher. Data security actually covers many interrelated aspects, but in this paper the author will specifically discuss encryption methods and the security of data protection in some common application programs. Almost all application programs, such as MS Word, WordPerfect, Excel and PKZip, provide password-based data protection, but this facility is actually easy to break. Even specialized data protection programs such as Norton Diskreet (perhaps rarely used now), which protect data with the DES method or a faster "proprietary" method, are actually far from secure. The DES implementation used contains an error that greatly reduces its effectiveness.

Although it can accept passwords of up to 40 characters, these characters are then converted to all uppercase and reduced to 8 characters. This causes a very large reduction in the number of possible encryption keys, so that not only is the number of possible passwords limited, but there are also a large number of equivalent keys that can decrypt the file. For example, a file encrypted with the key 'xxxxxxx' can be decrypted with 'xxxxxx', 'xxxxyy' or 'yyyyxx'. PC Tools (which may also be hard to find now) is another example of a software package whose data protection facility is very unsafe. The DES implementation in this program reduces the number of DES rounds from the specified 16 to 2, which makes it very easy to break.

For data protection that is important enough, there is no other way but to use a dedicated data protection/encryption program. Many excellent dedicated data protection programs are now in circulation, whether freeware, shareware or commercial. In general, these programs provide not just a single method but several, so that we can choose the one we consider most secure.[iii]


FILE SYSTEMS (SISTEM FILE)

Definition of a File System
A file system is the mechanism for on-line storage of, and access to, both data and programs residing in the operating system. There are two important parts to a file system:
• A collection of files, as the place where data is stored, and
• A directory structure, which organizes and provides information about all the files in the system.

Basic File Concepts
A computer can store information on several different storage media, such as magnetic disks, magnetic tapes and optical disks. So that the computer is convenient to use, the operating system provides a systematically uniform view of storage. The operating system abstracts away the physical properties of its storage media and defines a logical storage unit, the file. Files are mapped onto the physical media by the operating system.

These storage media are generally non-volatile, so their contents are not lost in the event of a power failure or a system reboot. A file is a named collection of related information recorded on secondary storage. From the user's point of view, a file is the smallest unit of logical storage; that is, data cannot be written to secondary storage unless it is inside a file.

Files usually represent programs and data. The data in a file can be numeric, alphabetic, alphanumeric or binary. A file's format can also be free-form, for example a text file, or rigidly formatted. In general, a file is a sequence of bits, bytes, lines or records whose meaning is defined by the file's creator and user.


The information in a file is determined by its creator. Many different kinds of information can be stored in a file. This is because each file has a particular structure according to its type. For example:
• Text file: a sequence of characters organized into lines.
• Source file: a sequence of subroutines and functions, each of which is then declared.
• Object file: a sequence of bytes organized into blocks recognized by the system's linker.
• Executable file: a series of code sections that the loader can bring into memory and execute.[iv]


FILE SYSTEM

What Is A Computer File System?
Hard drives are divided into sectors of about 512 bytes each. Sectors in turn are grouped into clusters. Clusters, also known as allocation units, have a defined size of 512 bytes to 64 kilobytes, so they usually contain multiple sectors. A cluster represents a continuous block of space on the disk. Operating systems rely on a file system to organize the clustered storage space. The file system maintains a database that records the status of each cluster. In essence, the file system shows the operating system in which cluster(s) a file is stored and where space is available to store new data.

Which File Systems Should I Know Of?
The prevalent Windows file systems are FAT (File Allocation Table), FAT32, and NTFS (New Technology File System). Briefly, NTFS supports files larger than 4 GB and partitions larger than 32 GB, it manages available space better than FAT or FAT32 and thus causes less fragmentation, and it comes with a number of security-related features, including on-the-fly file encryption. Compared to NTFS, FAT file systems take up less space, they perform fewer write operations to the drive, which makes them faster and a better fit for small flash drives, and they are cross-platform compatible. The biggest drawbacks of FAT and FAT32 are a partition size limit of 32 GB and file sizes limited to 2 GB or 4 GB, respectively. A new file system predominantly used for flash drives is exFAT (Extended File Allocation Table), also known as FAT64. Like NTFS, it supports files larger than 4 GB and partitions larger than 32 GB, and its file management avoids fragmentation. At the same time it is fast and optimized for mobile personal storage and handling media files.
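The limits above can be encoded as data so a program can check them; the Python sketch below is a hypothetical helper, with the figures taken from the paragraph and NTFS/exFAT treated as effectively unlimited at these scales.

# Checking whether a file fits within a filesystem's maximum file size.
GiB = 2 ** 30
MAX_FILE = {"FAT": 2 * GiB, "FAT32": 4 * GiB, "NTFS": None, "exFAT": None}

def fits(filesystem: str, file_bytes: int) -> bool:
    limit = MAX_FILE[filesystem]
    return limit is None or file_bytes <= limit

print(fits("FAT32", 5 * GiB))   # False: a 5 GiB file exceeds FAT32's limit
print(fits("exFAT", 5 * GiB))   # True: exFAT accepts files larger than 4 GB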

Which Operating Systems Can Handle These File Systems?
While FAT and FAT32 are recognized by almost all operating systems, formatting a drive with NTFS used to be a sure way to make the device unusable outside Windows. Meanwhile, NTFS read/write is supported natively by most Linux distributions. A hack is available to enable NTFS read/write on Mac OS X version 10.6; however, it appears to be unstable, so the use of MacFuse is recommended instead. exFAT, on the other hand, requires drivers for both Windows XP and Linux, while it is supported in the latest versions of Windows Vista (SP1), Windows 7, and Mac OS X.


Why Is Cluster Size Important?
If you have ever formatted a drive, you will know that you can choose the allocation unit size, also known as cluster size. Depending on the cluster size (from 512 bytes to 64 kilobytes), a single file can be stored in one cluster or across hundreds or thousands of clusters. When a file is smaller than the actual cluster size, the remaining space is lost, a phenomenon known as wasted or slack space. Thus a large cluster size will lead to a lot of slack space if many small files are stored on that drive. Choosing a small cluster size, on the other hand, means that large files are split up into many small pieces. This in turn can slow down the drive, as it takes longer to read the respective file. In other words, choose the cluster size wisely.[v]
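The slack-space arithmetic is easy to make concrete; in the Python sketch below, the file and cluster sizes are illustrative assumptions.

# Bytes wasted in the last, partially filled cluster of a file.
import math

def slack_bytes(file_size: int, cluster_size: int) -> int:
    clusters = math.ceil(file_size / cluster_size)   # clusters occupied
    return clusters * cluster_size - file_size       # unused tail of the last one

print(slack_bytes(1, 64 * 1024))   # 65535: a 1-byte file wastes almost 64 KB
print(slack_bytes(1, 512))         # 511: the same file on small clusters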

DATA STRUCTURE
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.[1][2] Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks. For example, B-trees are particularly well-suited for the implementation of databases, while compiler implementations usually use hash tables to look up identifiers.

Data structures are used in almost every program or software system. Data structures provide a means to manage huge amounts of data efficiently, such as large databases and internet indexing services. Usually, efficient data structures are a key to designing efficient algorithms. Some formal design methods and programming languages emphasize data structures, rather than algorithms, as the key organizing factor in software design.

An array stores a number of elements of the same type in a specific order. They are accessed using an integer to specify which element is required (although the elements may be of almost any type). Arrays may be fixed-length or expandable.

Record (also called tuple or struct). Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.

A hash or dictionary or map is a more flexible variation on a record, in which name-value pairs can be added and deleted freely.

Union. A union type definition will specify which of a number of permitted primitive types may be stored in its instances, e.g. "float or long integer". Contrast this with a record, which could be defined to contain both a float and an integer; in a union, there is only one value at a time.

A tagged union (also called a variant, variant record, discriminated union, or disjoint union) contains an additional field indicating its current type, for enhanced type safety.

A set is an abstract data structure that can store certain values, without any particular order and with no repeated values. Values themselves are not retrieved from sets; rather, one tests a value for membership to obtain a boolean "in" or "not in".

An object contains a number of data fields, like a record, and also a number of program code fragments for accessing or modifying them. Data structures not containing code, like those above, are called plain old data structures.
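To ground the structures described above, the short Python sketch below shows an array, a record, a dictionary and a set in that language's terms; the names and values are invented for illustration.

# The basic structures above, expressed in Python.
from dataclasses import dataclass

scores = [10, 20, 30]             # array: same-type elements, integer-indexed
print(scores[1])                  # 20

@dataclass
class Point:                      # record/struct: fixed, named fields
    x: float
    y: float
print(Point(1.0, 2.0).x)          # 1.0

config = {"host": "localhost"}    # hash/dictionary/map: name-value pairs
config["port"] = 8080             # pairs can be added...
del config["host"]                # ...and deleted freely

seen = {1, 2, 3}                  # set: unordered, no repeated values
print(2 in seen)                  # True -- the boolean membership test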


Many others are possible, but they tend to be further variations and compounds of the above.

Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by an address, a bit string that can itself be stored in memory and manipulated by the program. Thus the record and array data structures are based on computing the addresses of data items with arithmetic operations, while the linked data structures are based on storing addresses of data items within the structure itself. Many data structures use both principles, sometimes combined in non-trivial ways, as in the XOR linking sketched below.
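Since Python has no raw pointers, the hypothetical example below uses small integers as stand-in addresses into a node table, with 0 reserved as the null address; real XOR lists apply the same trick to machine addresses.

# XOR linked list: one link field stores prev_address XOR next_address.
nodes = {1: "a", 2: "b", 3: "c", 4: "d"}        # payloads at "addresses" 1..4
link = {1: 0 ^ 2, 2: 1 ^ 3, 3: 2 ^ 4, 4: 3 ^ 0}

def traverse(start: int) -> list:
    prev, cur, out = 0, start, []
    while cur != 0:
        out.append(nodes[cur])
        prev, cur = cur, link[cur] ^ prev       # recover next from prev and link
    return out

print(traverse(1))   # ['a', 'b', 'c', 'd'] from half the usual link storage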

The implementation of a data structure usually requires writing a set of procedures that create and manipulate instances of that structure. The efficiency of a data structure cannot be analyzed separately from those operations. This observation motivates the theoretical concept of an abstract data type, a data structure that is defined indirectly by the operations that may be performed on it, and the mathematical properties of those operations (including their space and time cost).
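As a minimal sketch of that idea, the Python class below defines a stack purely through its operations; the list-based representation is one interchangeable implementation choice, hidden from clients.

# An abstract data type: defined by its operations, not its representation.
class Stack:
    def __init__(self):
        self._items = []          # hidden representation detail

    def push(self, item) -> None: # O(1) amortized
        self._items.append(item)

    def pop(self):                # O(1); removes and returns the newest item
        return self._items.pop()

    def is_empty(self) -> bool:
        return not self._items

s = Stack()
s.push(1)
s.push(2)
print(s.pop())                    # 2: LIFO behavior regardless of representation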

Most assembly languages and some low-level languages, such as BCPL, lack support for data structures. Many high-level programming languages, and some higher-level assembly languages such as MASM, on the other hand, have special syntax or other built-in support for certain data structures, such as vectors (one-dimensional arrays) in the C language or multi-dimensional arrays in Pascal. Most programming languages feature some sort of library mechanism that allows data structure implementations to be reused by different programs. Modern languages usually come with standard libraries that implement the most common data structures. Examples are the C++ Standard Template Library, the Java Collections Framework, and Microsoft's .NET Framework.

Modern languages also generally support modular programming, the separation between the interface of a library module and its implementation. Some provide opaque data types that allow clients to hide implementation details. Object-oriented programming languages, such as C++, Java and the .NET Framework, use classes for this purpose. Many known data structures have concurrent versions that allow multiple computing threads to access the data structure simultaneously.[vi]

Endnotes:
[i] http://en.wikipedia.org/wiki/Data_mining
[ii] http://id.wikipedia.org/wiki/Penggalian_data
[iii] http://menirlina.blogspot.com/2010/11/keamanan-data.html
[iv] http://lulu.staff.gunadarma.ac.id/Downloads/files/14932/SISTEM+FILE.pdf
[v] http://www.makeuseof.com/tag/file-system-find-runs-drives/
[vi] http://en.wikipedia.org/wiki/Data_structure
