_-_JUST FOR FUN_-

Big Data

Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk.

Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on."^[2] Scientists, business executives, practitioners of medicine, advertising and governments alike regularly meet difficulties with large data sets in areas including Internet search, finance and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics,^[3] connectomics, complex physics simulations, biology and environmental research.^[4]

Data sets are growing rapidly in part because they are increasingly gathered by cheap and numerous information-sensing mobile devices, aerial (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks.^[5]^[6] The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s;^[7] as of 2012, every day 2.5 exabytes (2.5×10¹⁸ bytes) of data are created.^[8] One question for large enterprises is determining who should own big data initiatives that affect the entire organization.^[9]
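As a rough illustration of what "doubling every 40 months" implies, the arithmetic can be sketched in C++. The function name growthFactor is made up for this example; only the 40-month doubling period comes from the text above.

```cpp
#include <cmath>

// Hypothetical helper: how many times larger the storage capacity becomes
// after `months` months, assuming it doubles every 40 months.
double growthFactor(double months) {
    const double doublingPeriodMonths = 40.0;  // figure quoted in the text
    return std::pow(2.0, months / doublingPeriodMonths);
}
```

Under that assumption, growthFactor(120.0) (ten years) works out to 2 to the power 3, that is, roughly an eightfold increase per decade.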

Relational database management systems and desktop statistics and visualization packages often have difficulty handling big data. The work instead requires "massively parallel software running on tens, hundreds, or even thousands of servers".^[10] What is considered "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make big data a moving target. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."

Cloud

Cloud computing, also known as on-demand computing, is a kind of Internet-based computing that provides shared processing resources and data to computers and other devices on demand. It is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources.^[1]^[2] Cloud computing and storage solutions provide users and enterprises with various capabilities to store and process their data in third-party data centers.^[3] It relies on sharing of resources to achieve coherence and economies of scale, similar to a utility (like the electricity grid) over a network. At the foundation of cloud computing is the broader concept of converged infrastructure and shared services.

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort.

Proponents claim that cloud computing allows companies to avoid upfront infrastructure costs, and focus on projects that differentiate their businesses instead of on infrastructure.^[4] Proponents also claim that cloud computing allows enterprises to get their applications up and running faster, with improved manageability and less maintenance, and enables IT to more rapidly adjust resources to meet fluctuating and unpredictable business demand.^[4]^[5]^[6] Cloud providers typically use a "pay as you go" model. This can lead to unexpectedly high charges if administrators do not adapt to the cloud pricing model.^[7]

The present availability of high-capacity networks, low-cost computers and storage devices as well as the widespread adoption of hardware virtualization, service-oriented architecture, and autonomic and utility computing have led to a growth in cloud computing.^[8]^[9]^[10] Companies can scale up as computing needs increase and then scale down again as demands decrease.

Cloud computing has become a highly demanded service or utility due to the advantages of high computing power, low cost of services, high performance, scalability, accessibility and availability. Some cloud vendors are experiencing growth rates of 50% per annum,^[11] but because cloud computing is still in its infancy, it has pitfalls that need proper attention to make cloud computing services more reliable and user friendly.^[12]^[13]

A. Program Design Methodology

• Programming is really all about solving problems...but we use a computer to do that for us! With small problems -- you may be able to get away with just thinking of the solution to the problem in your head.

• In fact, many of you probably are writing your algorithms as an afterthought instead of using an algorithm as a road map to actually solving the problem: breaking it down into major tasks & then subtasks, and not coding until you have worked out those subtasks in very fine detail FIRST WITH WORDS/pseudocode.

• However, as we progress we will be concentrating on solving larger problems where the first step MUST be to design an algorithm to break down the problem into subtasks and then we will implement the subtasks one at a time!

• Therefore, in order to solve the problems on a computer -- we must know how to design good algorithms. You all know what a basic algorithm is ... but now we have to concentrate on getting good at it! Because if you do your algorithm wrong (i.e., you go about solving the problem incorrectly)...then you might get the whole program wrong! When dealing with a big project where there may be multiple engineers developing code for one big program -- picking the wrong solution could lead to schedule delays...budget overruns...and overall your boss's lack of confidence in your work!

• Once you have an algorithm that REALLY WORKS - then programming is a very simple task - dealing primarily with the syntax we have been learning.

• The only problem is that designing algorithms is a very creative process...and might even be considered to be an art form. It is patterned after how we think...so that no two algorithms for a complex program will be alike!

• Nonetheless - there are some basic steps we should be very familiar with when solving a problem and writing an algorithm. There is no guarantee that by following these steps you will create an accurate algorithm ... but it should help you to take the problem a step at a time and break it down into bite-size pieces!

B. Steps to Developing an Algorithm

First....

• Make sure you have a complete specification. Look at what the specification states...does it make sense? Is it clear? When you read it do you have questions?

• Double check that the specification clearly outlines what input into the program is required...and what its format is (is all of the data on one line?). Look at what output is required...is it completely defined?

• Next look at error conditions. Under what cases does the specification require error messages - and how should they appear? What type of data is correct versus incorrect? Should the program compute answers even if the data is incorrect?

• Make sure you know how the program should act when it ends. Is there a final message that gets printed to signify that the end of the program has been reached?

• Once you've made a list of all of the problems and questions you can come up with about the specification -- go to your instructor, supervisor, or project leader and get them ironed out. Either participate in rewriting the specification or request that a new specification be provided that covers all of your concerns.

• So - Having a COMPLETE, CONCISE, and CORRECT specification is a NECESSARY first step! Ensuring this will enable you to not leave out vital information in your program.

Next....

• Formulate a precise statement of the problem to be solved by the algorithm.

• Break the problem into subproblems (subtasks).

• For each subtask...

formulate a precise statement of the problem the subtask is to solve

use existing algorithms if they have already been developed

or use standard techniques for solving the problem if they exist

design the data structure necessary to organize the data involved

write out each step of the subtask in English/pseudocode. These steps should be at such a low level that it is clear how to develop a C++ program to do each step. The solution at this point for each step you write out should be very obvious.

Then....

• When implementing the C++ code -- the modules (functions) designed should match the major subtasks developed!

• When a problem is divided into subtasks --- you can design algorithms for each subtask -- and therefore code and debug each subtask separately. This way we can test/debug a portion of the entire problem a step at a time to make sure that we are correctly solving each step.

• This approach is called procedural abstraction - where we build functions that match our major subtasks -- which can become self-contained pieces. Once they are designed and debugged - we no longer have to worry about their inner workings and only need to know what their (a) purpose is, (b) input parameters are, and (c) output parameters are.
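To make the idea of procedural abstraction a little more concrete, here is a minimal sketch of one self-contained function whose comment states its purpose, input parameters, and output parameters. The function findRange and its parameter names are invented for illustration and are not part of any assignment.

```cpp
#include <cstddef>

// Purpose:           find the smallest and largest value in an array.
// Input parameters:  values (the array), count (number of elements, must be >= 1)
// Output parameters: smallest, largest
void findRange(const int values[], std::size_t count,
               int& smallest, int& largest) {
    smallest = values[0];
    largest = values[0];
    for (std::size_t i = 1; i < count; ++i) {
        if (values[i] < smallest) smallest = values[i];
        if (values[i] > largest) largest = values[i];
    }
}
```

Once a function like this has been debugged, a caller only needs the three facts in the comment; the loop inside can be forgotten.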

Now Let's Look at an Example --- Let's design an algorithm:

1. What if our problem specification was:

Write a modular program that maintains a database of bibliographic references in the form Author, Title, Journal, Volume, Page, Year. There may be 40 characters in the Author's name, in the book Title, and in the Journal's name. The Volume, Page, and Year are all integers. Design a program that allows users to enter a list of references, view the list, search for a desired reference by author or title, and delete references.

2. So, our first step is to make sure our specification covers everything we need to know in order to design an algorithm:

• What about error checking?

• How does the user indicate what task he wants to do?

• Does this program run forever or is there a way to terminate it?

• What type of message should be printed at the end of the program?

• When responding to a request to view the list -- how should it appear?

• How many references can be in our database?

3. Once we get answers to these questions, it is time to start designing our algorithm. Let's first precisely state what the program will do:

Maintain a database of bibliographic references which can be interactively modified (references added, deleted), viewed, or searched.

4. The major tasks would be:

1. Introduce the user to our interactive bibliographic references program

2. Find out what the user wants to do: Enter a reference, View all references, View a specific reference, Delete a reference, or quit.

3. Either ...

a. Enter a reference,

b. View all references,

c. View a specific reference,

d. Delete a reference, or

e. If the user wants to quit, provide a shut down message and thank him for using our system!

4. Unless the user wants to quit -- continue with Step #2.

5. Now let's break this down into subtasks...

6. And, design the data structures
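One possible way to sketch the data structure and a few of the subtasks for the bibliographic example is shown here. This is only a sketch under assumptions: the capacity MAX_REFS is a guess (the specification does not say how many references are allowed), and all of the names (Reference, Database, addReference, findByAuthor, deleteReference, makeReference) are made up for illustration.

```cpp
#include <cstddef>
#include <cstring>

const std::size_t MAX_TEXT = 40;   // characters allowed by the specification
const std::size_t MAX_REFS = 100;  // ASSUMED capacity; not given in the specification

struct Reference {
    char author[MAX_TEXT + 1];   // +1 leaves room for the terminating '\0'
    char title[MAX_TEXT + 1];
    char journal[MAX_TEXT + 1];
    int volume;
    int page;
    int year;
};

struct Database {
    Reference refs[MAX_REFS];
    std::size_t count;           // number of references currently stored
};

// Subtask: start with an empty database.
void initDatabase(Database& db) { db.count = 0; }

// Helper: copy at most MAX_TEXT characters and always terminate the string.
void copyField(char dest[], const char src[]) {
    std::strncpy(dest, src, MAX_TEXT);
    dest[MAX_TEXT] = '\0';
}

// Helper: build one reference from its six parts.
Reference makeReference(const char author[], const char title[],
                        const char journal[], int volume, int page, int year) {
    Reference ref;
    copyField(ref.author, author);
    copyField(ref.title, title);
    copyField(ref.journal, journal);
    ref.volume = volume;
    ref.page = page;
    ref.year = year;
    return ref;
}

// Subtask: enter a reference. Returns false when the database is full.
bool addReference(Database& db, const Reference& ref) {
    if (db.count >= MAX_REFS) return false;
    db.refs[db.count] = ref;
    ++db.count;
    return true;
}

// Subtask: search by author. Returns the index of the first match, or -1.
int findByAuthor(const Database& db, const char author[]) {
    for (std::size_t i = 0; i < db.count; ++i) {
        if (std::strcmp(db.refs[i].author, author) == 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Subtask: delete the reference at `index` by shifting later entries down.
bool deleteReference(Database& db, std::size_t index) {
    if (index >= db.count) return false;
    for (std::size_t i = index + 1; i < db.count; ++i) {
        db.refs[i - 1] = db.refs[i];
    }
    --db.count;
    return true;
}
```

Each function lines up with one of the major tasks listed above, so each can be coded and debugged on its own before the menu loop that ties them together is written.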

• Once the algorithm is designed...it is time to test it. Choose test data that will expose any possible errors. Use boundary values -- using the largest and smallest possible values allowable. Use erroneous data and see if the correct results occur. Double check that all of your loops iterate the appropriate number of times. Make sure that all cases are fully exercised. A good rule of thumb is to design (YES DESIGN!) your tests to execute every line of code in addition to checking the above conditions.

• After we fully test and fix our bugs (called debugging), we should look at our code and verify that it is indeed portable. Since programs represent a large investment in programming time, it really does pay to make sure it will work on a variety of computers. One way this is done is to use standard C++ syntax and not tricks that may be available on one system but not on another. This doesn't ensure portability, but it is a step in the right direction.

• This means we should not assume that our system automatically initializes our variables to zeros or blanks. We should ALWAYS explicitly initialize our variables.

• When you do need to use a feature that depends on a particular computer, try to isolate it in a particular module so that when you have to maintain the program, or port it to another system, you can easily see what portions have to be touched. It is always a good idea to treat input/output using this approach.
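The testing advice above (boundary values, erroneous data, explicit initialization) might look something like the following sketch. The 40-character limit comes from the example specification; the names MAX_FIELD_LENGTH, fitsInField and runBoundaryTests are invented for this illustration.

```cpp
#include <cstddef>
#include <string>

const std::size_t MAX_FIELD_LENGTH = 40;  // limit from the example specification

// Returns true when `text` fits in a 40-character field.
bool fitsInField(const std::string& text) {
    return text.size() <= MAX_FIELD_LENGTH;
}

// Runs the boundary-value checks and returns the number of failed checks.
int runBoundaryTests() {
    int failures = 0;  // explicitly initialized; never rely on a default of zero

    // Smallest possible input: an empty string must be accepted.
    if (!fitsInField("")) ++failures;

    // Largest allowable input: exactly 40 characters must be accepted.
    if (!fitsInField(std::string(MAX_FIELD_LENGTH, 'x'))) ++failures;

    // Erroneous data: 41 characters, one past the limit, must be rejected.
    if (fitsInField(std::string(MAX_FIELD_LENGTH + 1, 'x'))) ++failures;

    return failures;
}
```

A return value of 0 from runBoundaryTests means every boundary case behaved as designed; the same pattern can be repeated for each function in the program.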

Software Engineering

• In this class our programs are very small in comparison to most industry software systems. In industry - a large amount of planning needs to be done before any code is written down. Plus - the software isn't DONE when you think you are finished debugging. Customers need to take a look at the software (in what may be called an alpha/beta release) and see if it really meets their needs. Then the code needs to be officially released, maintained, and then will evolve as customers' needs change and develop.

• This development process is actually called the software life cycle. It has phases of:

Specification

Design

Implementation

Testing

Alpha/Beta Release

Revision/Testing/Debugging

Release of Software & Documentation

Maintenance

Evolution

Obsolescence

C. Pseudocode

Pseudocode is an informal high-level description of the operating principle of a computer program or other algorithm.

It uses the structural conventions of a programming language, but is intended for human reading rather than machine reading. Pseudocode typically omits details that are essential for machine understanding of the algorithm, such as variable declarations, system-specific code and some subroutines. The programming language is augmented with natural language description details, where convenient, or with compact mathematical notation. The purpose of using pseudocode is that it is easier for people to understand than conventional programming language code, and that it is an efficient and environment-independent description of the key principles of an algorithm. It is commonly used in textbooks and scientific publications that are documenting various algorithms, and also in planning of computer program development, for sketching out the structure of the program before the actual coding takes place.

No standard for pseudocode syntax exists, as a program in pseudocode is not an executable program. Pseudocode resembles, but should not be confused with, skeleton programs, which can be compiled without errors. Flowcharts, drakon-charts and Unified Modeling Language (UML) charts can be thought of as a graphical alternative to pseudocode, but take up more space on paper.
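As a small illustration of how pseudocode relates to real code, the sketch below keeps the pseudocode as comments and writes the matching C++ underneath. The task (averaging a list of numbers) and the function name average are chosen only for this example, and the pseudocode wording is just one of many acceptable styles.

```cpp
#include <cstddef>

// Pseudocode:
//   set total to 0
//   for each value in the list
//       add the value to total
//   if the list is empty, the average is 0
//   otherwise the average is total divided by the number of values
double average(const double values[], std::size_t count) {
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    if (count == 0) return 0.0;
    return total / static_cast<double>(count);
}
```

Notice what the pseudocode leaves out: types, the loop counter, and the cast. Those are exactly the machine-level details that the C++ version has to add.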

Tuesday, March 15, 2016

SUMMARY FOR MEETING 2

Monday, February 29, 2016

Sumarry for Meeting 1 Kevin S 190147630
