ML Kmeans: using a custom Kmeans function to automatically cluster several coordinate points (four user-defined points) in at most 10 iterations

2024-06-10 08:20:59

Output

Core code

#!/usr/bin/python
# -*- coding:utf-8 -*-

import numpy as np
# ML Kmeans: cluster several coordinate points (four user-defined points)
# with a custom kmeans function, running at most 10 iterations.

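def kmeans(X, k, maxIt):
    # X: (numPoints, numDim) array of points; k: number of clusters;
    # maxIt: maximum number of iterations.
    numPoints, numDim = X.shape

    # Append one extra column that holds each point's cluster label.
    dataSet = np.zeros((numPoints, numDim + 1))
    dataSet[:, :-1] = X

    # Pick k distinct points as the initial centroids. Sampling without
    # replacement avoids duplicate centroids, which would leave a cluster empty.
    centroids = dataSet[np.random.choice(numPoints, size=k, replace=False), :]
    #centroids = dataSet[0:2, :]
    # Label the initial centroids 1..k
    centroids[:, -1] = range(1, k + 1)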

    iterations = 0
    oldCentroids = None  

    # Run the main k-means algorithm
    while not shouldStop(oldCentroids, centroids, iterations, maxIt):
        print ("iteration: \n", iterations)
        print ("dataSet: \n", dataSet)
        print ("centroids: \n", centroids)
        # Save old centroids for convergence test. Book keeping.
        oldCentroids = np.copy(centroids)
        iterations += 1                    

        # Assign labels to each datapoint based on centroids
        updateLabels(dataSet, centroids)    

        # Assign centroids based on datapoint labels
        centroids = getCentroids(dataSet, k) 

    # The last column of dataSet holds the cluster label of each point.
    return dataSet
# Function: Should Stop
# -------------
# Returns True if k-means is done, False otherwise. K-means terminates either
# because it has run the maximum number of iterations OR the centroids
# stop changing.
def shouldStop(oldCentroids, centroids, iterations, maxIt):
    # ">=" caps the loop at exactly maxIt iterations (">" would allow maxIt + 1).
    if iterations >= maxIt:
        return True
    return np.array_equal(oldCentroids, centroids)
# Function: Update Labels
# -------------
# Update the label of each data point in the dataset (in place).
def updateLabels(dataSet, centroids):
    # For each element in the dataset, choose the closest centroid.
    # Make that centroid's label the element's label.
    numPoints, numDim = dataSet.shape
    for i in range(0, numPoints):
        dataSet[i, -1] = getLabelFromClosestCentroid(dataSet[i, :-1], centroids)  

def getLabelFromClosestCentroid(dataSetRow, centroids):
    label = centroids[0, -1]
    minDist = np.linalg.norm(dataSetRow - centroids[0, :-1])
    for i in range(1 , centroids.shape[0]):
        dist = np.linalg.norm(dataSetRow - centroids[i, :-1])
        if dist < minDist:
            minDist = dist
            label = centroids[i, -1]
    print ("minDist:", minDist)
    return label

# Function: Get Centroids
# -------------
# Returns the k updated centroids, each with its label in the last column.
def getCentroids(dataSet, k):
    # Each centroid is the arithmetic mean of the points that
    # have that centroid's label. Note: an empty cluster (no points have
    # that centroid's label) is not handled here; np.mean of zero rows yields
    # NaN, so such a centroid should be randomly re-initialized.
    result = np.zeros((k, dataSet.shape[1]))
    for i in range(1, k + 1):
        oneCluster = dataSet[dataSet[:, -1] == i, :-1]
        result[i - 1, :-1] = np.mean(oneCluster, axis = 0)
        result[i - 1, -1] = i
    return result

x1 = np.array([1, 1])
x2 = np.array([2, 1])
x3 = np.array([4, 3])
x4 = np.array([5, 4])
testX = np.vstack((x1, x2, x3, x4))  

result = kmeans(testX, 2, 10)
print ("final result:")
print (result)
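
The comment in getCentroids says an empty cluster should be randomly re-initialized. A minimal sketch of one way to do that follows. The function name get_centroids_safe, its rng parameter and the sample array are assumptions made for illustration and are not part of the original code; the data layout (coordinates first, label 1..k in the last column) is the same as above.

```python
import numpy as np

def get_centroids_safe(data_set, k, rng=None):
    """Recompute k centroids; re-seed an empty cluster from a random data point."""
    rng = np.random.default_rng() if rng is None else rng
    result = np.zeros((k, data_set.shape[1]))
    for label in range(1, k + 1):
        # Coordinates of all points currently assigned to this label.
        members = data_set[data_set[:, -1] == label, :-1]
        if members.shape[0] == 0:
            # Empty cluster: fall back to a randomly chosen data point
            # instead of taking the mean of zero rows (which would be NaN).
            row = rng.integers(data_set.shape[0])
            result[label - 1, :-1] = data_set[row, :-1]
        else:
            result[label - 1, :-1] = members.mean(axis=0)
        result[label - 1, -1] = label
    return result

# Hypothetical check: every point carries label 1, so cluster 2 is empty.
points = np.array([[1.0, 1.0, 1.0],
                   [2.0, 1.0, 1.0],
                   [4.0, 3.0, 1.0],
                   [5.0, 4.0, 1.0]])
centroids = get_centroids_safe(points, 2, rng=np.random.default_rng(0))
print(centroids)
```

The first centroid is the mean of all four points, (3.0, 2.25). The second is whichever data point the generator happens to pick, so it differs between seeds but is never NaN.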
