Predictive Model Markup Language (PMML)


Category: August 2018 - Predictive Algorithms & Native Ads

The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format that lets analytic applications describe and exchange predictive models produced by data mining and machine learning algorithms. Version 1.0 was published in 1999, and version 4.0 followed ten years later, in 2009.

Its latest version, PMML 4.3, was released on August 23, 2016, with new features including new model types (Gaussian Process, Bayesian Network), new built-in functions, usage clarifications, and documentation improvements.

In general, a PMML file can be described by the following components:

Header: contains general information about the PMML document, such as copyright information for the model, its description, and the name and version of the application used to generate the model. It can also carry a timestamp (the Timestamp element) that records the date of model creation.
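
As a rough illustration, the Header details described above could be read with Python's standard XML parser. The fragment below is a hypothetical example: the copyright, description, application name, version, and timestamp values are invented, the helper name read_header is an assumption, and the XML namespace that real PMML files declare is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical Header fragment; all values are made up for illustration.
HEADER_XML = """
<Header copyright="Example Corp" description="Demo churn model">
  <Application name="ExampleModeler" version="1.2"/>
  <Timestamp>2018-08-01T12:00:00</Timestamp>
</Header>
"""

def read_header(xml_text):
    """Return the main Header details as a plain dictionary."""
    header = ET.fromstring(xml_text.strip())
    app = header.find("Application")
    stamp = header.find("Timestamp")
    has_stamp = stamp is not None and stamp.text
    return {
        "copyright": header.get("copyright"),
        "description": header.get("description"),
        "application": app.get("name") if app is not None else None,
        "version": app.get("version") if app is not None else None,
        "timestamp": stamp.text.strip() if has_stamp else None,
    }

header_info = read_header(HEADER_XML)
print(header_info)
```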

Data Dictionary: contains definitions for all the possible fields used by the model. It is here that a field is defined as continuous, categorical, or ordinal (attribute optype). Depending on this definition, the appropriate value ranges are then defined, as well as the data type (such as string or double).
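
A small sketch of what such a dictionary might look like, and how its field definitions could be summarized in Python, is shown here. The field names (age, plan, rating) and their values are invented for illustration, the helper summarize_fields is hypothetical, and the namespace declaration of a real PMML file is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical DataDictionary with one field of each operational type.
DICTIONARY_XML = """
<DataDictionary numberOfFields="3">
  <DataField name="age" optype="continuous" dataType="double">
    <Interval closure="closedClosed" leftMargin="18" rightMargin="99"/>
  </DataField>
  <DataField name="plan" optype="categorical" dataType="string">
    <Value value="basic"/>
    <Value value="premium"/>
  </DataField>
  <DataField name="rating" optype="ordinal" dataType="string">
    <Value value="low"/>
    <Value value="medium"/>
    <Value value="high"/>
  </DataField>
</DataDictionary>
"""

def summarize_fields(xml_text):
    """Map each field name to its optype, data type, and listed values."""
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for field in root.findall("DataField"):
        summary[field.get("name")] = {
            "optype": field.get("optype"),
            "dataType": field.get("dataType"),
            "values": [v.get("value") for v in field.findall("Value")],
        }
    return summary

fields = summarize_fields(DICTIONARY_XML)
for name, info in fields.items():
    print(name, info)
```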

Data Transformations: allow for the mapping of user data into a more desirable form to be used by the mining model. PMML defines several kinds of simple data transformations.

Normalization: map values to numbers; the input can be continuous or discrete.
Discretization: map continuous values to discrete values.
Value mapping: map discrete values to discrete values.
Functions (custom and built-in): derive a value by applying a function to one or more parameters.
Aggregation: used to summarize or collect groups of values.
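
The first three transformations above can be sketched as plain Python functions. This is only an illustration of the ideas: the function names and sample data are hypothetical, and PMML itself expresses these transformations declaratively as XML elements rather than as code. As a simplification, the normalization sketch rejects inputs outside the defined range instead of extrapolating.

```python
def norm_continuous(x, points):
    """Normalization: piecewise-linear mapping through (original, normalized)
    pairs, which must be sorted by the original value."""
    for (o1, n1), (o2, n2) in zip(points, points[1:]):
        if o1 <= x <= o2:
            return n1 + (x - o1) * (n2 - n1) / (o2 - o1)
    raise ValueError("input lies outside the defined range")

def discretize(x, bins, default=None):
    """Discretization: map a continuous value to the label of the first
    left-closed, right-open interval (left, right, label) containing it."""
    for left, right, label in bins:
        if left <= x < right:
            return label
    return default

def map_value(value, table, default=None):
    """Value mapping: translate one discrete value into another."""
    return table.get(value, default)

# Hypothetical sample data.
AGE_BINS = [(0, 18, "minor"), (18, 65, "adult"), (65, float("inf"), "senior")]
PLAN_CODES = {"basic": "B", "premium": "P"}

print(norm_continuous(50, [(0, 0.0), (100, 1.0)]))
print(discretize(25, AGE_BINS))
print(map_value("premium", PLAN_CODES))
```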

Model: contains the definition of the data mining model. For instance, a multi-layered feedforward neural network is represented in PMML by a "NeuralNetwork" element, which carries attributes such as the model name, function name, algorithm name, activation function, and number of layers. These attributes are followed by three kinds of child elements that specify the architecture of the neural network represented in the PMML document: NeuralInputs, NeuralLayer, and NeuralOutputs. Besides neural networks, PMML can represent many other types of models, including support vector machines, association rules, Naive Bayes classifiers, clustering models, text models, decision trees, and various regression models.
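
To make the structure more concrete, here is a heavily simplified, hypothetical NeuralNetwork fragment together with a short Python sketch that reads its attributes and layer sizes. The fragment is not schema-valid PMML: it leaves out the mining schema and the field references that real input and output neurons carry, and all weights, biases, and names are invented. The helper describe_network is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical network: 2 inputs, one hidden layer, 1 output.
NETWORK_XML = """
<NeuralNetwork modelName="demo_net" functionName="classification"
               algorithmName="backpropagation" activationFunction="logistic"
               numberOfLayers="2">
  <NeuralInputs numberOfInputs="2">
    <NeuralInput id="0"/>
    <NeuralInput id="1"/>
  </NeuralInputs>
  <NeuralLayer numberOfNeurons="2">
    <Neuron id="2" bias="0.1">
      <Con from="0" weight="0.5"/>
      <Con from="1" weight="-0.3"/>
    </Neuron>
    <Neuron id="3" bias="-0.2">
      <Con from="0" weight="0.8"/>
      <Con from="1" weight="0.4"/>
    </Neuron>
  </NeuralLayer>
  <NeuralLayer numberOfNeurons="1">
    <Neuron id="4" bias="0.0">
      <Con from="2" weight="1.0"/>
      <Con from="3" weight="-1.0"/>
    </Neuron>
  </NeuralLayer>
  <NeuralOutputs numberOfOutputs="1">
    <NeuralOutput outputNeuron="4"/>
  </NeuralOutputs>
</NeuralNetwork>
"""

def describe_network(xml_text):
    """Collect the top-level attributes and the neuron count of each layer."""
    net = ET.fromstring(xml_text.strip())
    return {
        "attributes": dict(net.attrib),
        "inputs": len(net.findall("NeuralInputs/NeuralInput")),
        "layer_sizes": [len(layer.findall("Neuron"))
                        for layer in net.findall("NeuralLayer")],
        "outputs": len(net.findall("NeuralOutputs/NeuralOutput")),
    }

network_info = describe_network(NETWORK_XML)
print(network_info)
```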

Mining Schema: a list of all fields used in the model. This can be a subset of the fields as defined in the data dictionary. It contains specific information about each field, such as:

Name: must refer to a field in the data dictionary.
Usage type: defines the way a field is to be used in the model. Typical values are active, predicted, and supplementary. Predicted fields are those whose values are predicted by the model.
Outlier Treatment: defines the outlier treatment to be used. In PMML, outliers can be treated as missing values, as extreme values (based on the definition of high and low values for a field), or as is.
Missing Value Replacement Policy: if this attribute is specified, a missing value is automatically replaced by the given value.
Missing Value Treatment: indicates how the missing value replacement was derived (e.g. as value, mean, or median).
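
The outlier and missing-value rules above can be sketched as a small preprocessing function. This is a minimal illustration under stated assumptions: the function name prepare_value and its parameters are hypothetical, the option strings mirror the three outlier treatments described in the text, and the order of steps (outliers first, then missing-value replacement) is one plausible reading rather than a statement of what any particular scoring engine does.

```python
def prepare_value(value, low=None, high=None, outliers="asIs",
                  replacement=None):
    """Apply outlier treatment, then missing-value replacement.

    value       -- raw input, or None if missing
    low, high   -- bounds that define what counts as an outlier
    outliers    -- "asIs", "asMissingValues", or "asExtremeValues"
    replacement -- value substituted for a missing input, if any
    """
    if value is not None:
        below = low is not None and value < low
        above = high is not None and value > high
        if below or above:
            if outliers == "asMissingValues":
                value = None                   # treat the outlier as missing
            elif outliers == "asExtremeValues":
                value = low if below else high  # clamp to the nearest bound
            # "asIs": leave the value unchanged
    if value is None:
        value = replacement
    return value

# Hypothetical field with valid range 0..100 and replacement value 42.
print(prepare_value(150, low=0, high=100, outliers="asExtremeValues"))
print(prepare_value(150, low=0, high=100, outliers="asMissingValues",
                    replacement=42))
print(prepare_value(None, replacement=42))
```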

Targets: allows for post-processing of the predicted value in the form of scaling if the output of the model is continuous. Targets can also be used for classification tasks. In this case, the attribute priorProbability specifies a default probability for the corresponding target category. It is used if the prediction logic itself did not produce a result, which can happen, for example, if an input value is missing and there is no other method for treating missing values.
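
Both uses of Targets can be illustrated with a short Python sketch. The function names, parameter names, and sample numbers are hypothetical: the scaling is modeled as a simple linear rescale (multiply by a factor, then add a constant), and the classification fallback simply picks the category with the highest prior probability when the model produced no scores.

```python
def rescale(predicted, rescale_factor=1.0, rescale_constant=0.0):
    """Post-process a continuous prediction with a linear rescale."""
    return predicted * rescale_factor + rescale_constant

def classify_with_priors(scores, priors):
    """Return the best-scoring category, or fall back to the category
    with the highest prior probability if the model gave no result."""
    source = scores if scores else priors
    return max(source, key=source.get)

# Hypothetical prior probabilities for a two-class target.
PRIORS = {"churn": 0.2, "stay": 0.8}

print(rescale(0.5, rescale_factor=100.0, rescale_constant=10.0))
print(classify_with_priors({"churn": 0.7, "stay": 0.3}, PRIORS))
print(classify_with_priors({}, PRIORS))
```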

Output: this element can be used to name all the desired output fields expected from the model. These are features of the predicted field and so are typically the predicted value itself, the probability, cluster affinity (for clustering models), standard error, etc.

Source: Wikipedia
