ML Fundamentals
Building a machine learning model involves:
- Data collection and preparation
- Selecting an appropriate algorithm
- Training the model on the prepared data
- Evaluating its performance through testing and iteration
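A minimal sketch of this build-train-evaluate loop using scikit-learn (the dataset, algorithm, and metric here are illustrative assumptions, not choices prescribed by these notes):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data collection and preparation (toy dataset, split into train/test)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Select an appropriate algorithm (logistic regression as an example)
model = LogisticRegression(max_iter=200)

# 3. Train the model on the prepared data
model.fit(X_train, y_train)

# 4. Evaluate its performance; iterate on data or algorithm if results are poor
predictions = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, predictions))
```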
Training data:
An ML model is only as good as the data used to train it; training on bad data produces bad predictions ("garbage in, garbage out").
- Labeled data - each input instance is paired with a label that represents the expected output, e.g., for classification
- Unlabeled data - input instances without associated output labels
- Structured data - data with a predefined format, typically organized as tables or databases with rows and columns
  - Tabular data - data stored in spreadsheets, databases, or CSV files
  - Time-series data - sequences of values measured at successive points in time, such as stock prices, sensor readings, or weather data
- Unstructured data - data that lacks a predefined structure or format, such as text, images, audio, and video
  - Text data - documents, articles, social media posts, and other textual data
  - Image data - digital images, photographs, and video frames
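A small sketch of how a few of these categories look in practice (the column names and values are made up purely for illustration):

```python
import pandas as pd

# Structured, labeled tabular data: each row has features plus a label column.
labeled_tabular = pd.DataFrame({
    "sq_feet": [850, 1200, 640],       # feature
    "bedrooms": [2, 3, 1],             # feature
    "price": [210000, 340000, 150000]  # label (expected output)
})

# Structured, unlabeled data: the same features without the output column.
unlabeled_tabular = labeled_tabular.drop(columns=["price"])

# Unstructured text data: no predefined row/column schema.
text_sample = "Great product, arrived quickly and works as described."

print(labeled_tabular)
print(unlabeled_tabular)
print(text_sample)
```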
Machine learning process:
- In supervised learning, the algorithms are trained on labeled data. The goal is to learn a mapping function that can predict the output for new, unseen input data.
- Unsupervised learning refers to algorithms that learn from unlabeled data. The goal is to discover inherent patterns, structures, or relationships within the input data.
- In reinforcement learning, the machine is given only a performance score as guidance: feedback is provided in the form of rewards or penalties for its actions, and the machine learns from this feedback to improve its decision-making over time.
- In semi-supervised learning, only a portion of the training data is labeled, and the model learns from both the labeled and unlabeled examples.
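A short sketch contrasting supervised and unsupervised learning with scikit-learn (the specific models and toy data are illustrative assumptions):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [1.2, 0.9], [8.0, 8.2], [7.9, 8.1]])

# Supervised: labels are provided, and the model learns a mapping from inputs to outputs.
y = np.array([0, 0, 1, 1])
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[1.1, 1.0]]))   # predicts the label of a new, unseen point

# Unsupervised: no labels; the algorithm discovers structure (here, two clusters).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                  # cluster assignment for each point
```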
Inferencing: the process of using the information that a model has learned to make predictions or decisions
- Synchronous inferencing - the request is processed and answered immediately
  Client request -> AI -> immediate response
- Asynchronous inferencing - longer processing time; the request is acknowledged and the result is returned later
  Client request -> AI (ack) -> response delivered later
- Batch inferencing - analyzes a large amount of data all at once; used where the speed of decision-making is less critical than the accuracy of the results
  Eg: data analysis
- Real-time inferencing - processes incoming data and makes a decision almost instantaneously, without taking the time to analyze a large dataset
  Eg: chatbots or self-driving cars
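A rough sketch of the two call patterns (the function names, in-memory queue, and dummy model are hypothetical placeholders, not a specific service API):

```python
import queue
import threading

def predict(record):
    # Stand-in for a trained model's prediction logic.
    return {"input": record, "score": 0.9}

# Real-time / synchronous: the caller blocks and gets an answer right away.
def predict_sync(record):
    return predict(record)

# Batch / asynchronous: requests are queued, acknowledged, and processed later.
request_queue = queue.Queue()

def submit_async(record):
    request_queue.put(record)
    return "accepted"          # acknowledgement; the result arrives later

def batch_worker():
    while True:
        record = request_queue.get()
        print("batch result:", predict(record))
        request_queue.task_done()

threading.Thread(target=batch_worker, daemon=True).start()

print(predict_sync({"text": "hello"}))      # immediate response
print(submit_async({"text": "large doc"}))  # only an acknowledgement
request_queue.join()                        # queued work finishes in the background
```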